This page outlines a simple set of techniques, each with a process to follow, for manually auditing and improving the performance of an app, with a particular emphasis on Firefox OS apps.
Techniques listed first have a higher priority and make sense to do earlier. For example, fixing over-invalidation and responsiveness will greatly reduce checkerboarding, so these should be addressed first.
Over-invalidation
Over-invalidation means that we're repainting content that isn't changing, leading to higher CPU usage & bandwidth than necessary.
Verification Steps
- In the Firefox OS Developer settings turn on Flash repainted area.
- Perform all common interactions with the app. Look for areas of the page that are flashing but where the content hasn't changed.
Examples
The first example is good: the animated GIF invalidates every frame, and the links invalidate on mouse over because of a :hover style change. The static content, however, is not repainted.
The second example is good: when scrolling, new content is painted, but it is not repainted again once it is on the screen.
The third example is bad: Clicking on any part of the interface repaints the whole page.
Debugging
- Run your app through the App Manager or WebIDE and tweak the structure and styles of the page to trigger the desired behavior. Positioning, z-index, and opacity styles are common culprits here.
- Use dump-painting (Example: Display list).
- Try to work out alternative display methodologies that will reduce/remove unnecessary repainting.
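One possible app-level way to reduce unnecessary repainting is to skip DOM writes whose value has not changed. The following is only a minimal sketch: createTextUpdater is a hypothetical helper, not part of Gaia or Gecko, and whether a redundant write actually triggers a repaint in your app is an assumption that should be checked with Flash repainted area.

```javascript
// Hypothetical helper (not a Gaia or Gecko API): only touch the DOM when the
// displayed value actually changes, so unchanged content is not rewritten.
function createTextUpdater(node) {
  let lastText = null; // last value written through this updater

  return function update(text) {
    if (text === lastText) {
      return false; // nothing changed, so nothing is written to the DOM
    }
    node.textContent = text;
    lastText = text;
    return true; // a DOM write happened
  };
}
```

For example, a clock that ticks every second but only displays minutes could call update() on every tick and would write to the DOM only once per minute.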
Useful data to report for Gaia issues
- Report back on how you think repainting could be reduced.
- Report any relevant bugs as completed once all excessive painting is resolved.
Platform Contact
Layout team
Reflows/restyles
Note: All the following information applies to restyles. Consider minimizing restyles, avoiding sync style queries, and batching all style changes in a single frame (or event if possible).
Reflowing the page is the process of deciding where all the DOM elements will be positioned on the page. Any change that could affect the flow of the page forces the page to be reflowed. The larger the DOM, the longer this can potentially take.
Reflowing is expected as the page/app is being loaded. Once the page is ready, app authors should carefully consider what interactions are necessary that will make changes that affect the flow of the document. If changes to the page are required they should happen in the same frame/refresh tick as all changes will be validated at the same time. Reflows should not happen on scrolling unless using a "virtual list" approach.
Sync reflow happens when the position information of a DOM node is queried (for example via scrollLeft/scrollTop/clientWidth/clientHeight) while there are pending changes to the flow of the document. The query prevents the reflow from being delayed until the next refresh tick: it blocks all current execution and reflows the document immediately.
Sometimes reflows can be avoided by using features like CSS transforms that don't affect the flow of the page.
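The advice above on batching changes and avoiding sync layout queries can be sketched as a small read/write batcher. This is an illustrative sketch only: createBatcher and its schedule parameter are hypothetical names, and requestAnimationFrame is assumed as the default scheduler when it is available. All queued reads run before all queued writes, so a layout query never lands between two style changes.

```javascript
// Hypothetical batcher: queue layout reads and style writes, then run every
// read before every write inside a single scheduled callback.
function createBatcher(schedule) {
  // Assumption: use requestAnimationFrame in a browser, setTimeout elsewhere.
  const run = schedule ||
    (typeof requestAnimationFrame === "function"
      ? (cb) => requestAnimationFrame(cb)
      : (cb) => setTimeout(cb, 0));
  let reads = [];
  let writes = [];
  let scheduled = false;

  function flush() {
    const pendingReads = reads;
    const pendingWrites = writes;
    reads = [];
    writes = [];
    scheduled = false;
    pendingReads.forEach((fn) => fn());  // layout queries first
    pendingWrites.forEach((fn) => fn()); // then all style changes together
  }

  function request() {
    if (!scheduled) {
      scheduled = true;
      run(flush); // at most one flush is scheduled at a time
    }
  }

  return {
    read(fn) { reads.push(fn); request(); },
    write(fn) { writes.push(fn); request(); },
  };
}
```

A caller would wrap a query such as clientWidth in batcher.read() and the style change that depends on it in batcher.write(), rather than interleaving the two directly.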
Verification Steps
- In the Firefox OS Developer Settings Developer HUD, turn on the Reflows counter.
- Perform all common interactions with the app, and work out reflows that could be removed. Reflows should occur only when necessary, i.e. when the structure of the app must change.
Note: You can find some useful information in Paul Rouget's article Firefox OS: tracking reflows and event loop lags.
Useful data to report for Gaia issues
Report back on reflows you think could be removed, and how.
Platform Contact
Layout team.
Event loop responsiveness
The event loop is the list of events that have occurred during the running of a browser that require processing by its rendering engine. These run in sequence from oldest to newest. Paul Rouget's Firefox OS: tracking reflows and event loop lags contains further description.
Each Firefox OS app runs in its own process. Each process has its own event loop. The event loop runs in the main thread. There are other threads running (for network requests, composition, media decoding, …), but the UI of the app is rendered in the main thread. So if the event loop is blocked, the UI of the app is frozen. This is barely noticeable on desktop browsers, but with Firefox OS (and on any mobile device in general) any slow operation in the event loop will make the app choppy.
Keeping the event loop delay low is essential for a smooth and responsive app, so this technique is very important. It is listed third only to make sure the simplest problems have been resolved first.
Verification steps
- In the Firefox OS Developer settings Developer HUD, turn on the Jank threshold counter.
- Perform all common interactions with the app. Track the 'Jank' number for each interaction. When that number is above 10ms, your app's scrolling will be affected. Once that number is 100ms painting will fall behind and input will be delayed. 500ms+ indicates that this interaction will be noticeably jerky.
- Understand the worst-case responsiveness for each interaction of your app. Consider various workloads for your app, like hundreds of pictures in the Gallery App.
Note: Workloads can be seeded in Gaia apps using the Reference workload make options.
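If the Developer HUD is not available, event loop lag can be roughly approximated from inside the app by measuring how late a timer fires. The sketch below is not how the HUD computes its jank number; classifyJank and sampleEventLoopLag are hypothetical helpers, and only the thresholds are taken from the verification steps above.

```javascript
// Thresholds in milliseconds, taken from the verification steps above.
function classifyJank(ms) {
  if (ms >= 500) return "noticeably jerky";
  if (ms >= 100) return "painting falls behind, input delayed";
  if (ms > 10) return "scrolling affected";
  return "ok";
}

// Hypothetical sampler: a timer that fires late means the event loop was busy.
// Returns a function that stops the sampling.
function sampleEventLoopLag(intervalMs, onSample) {
  let expected = Date.now() + intervalMs;
  const id = setInterval(() => {
    const now = Date.now();
    const lag = Math.max(0, now - expected); // how late this tick fired
    expected = now + intervalMs;
    onSample(lag, classifyJank(lag));
  }, intervalMs);
  return () => clearInterval(id);
}
```

Timer resolution limits the precision of this approach, so treat the samples as a rough indication and prefer the HUD numbers when reporting.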
Debugging
- Profile the app's main thread. The built-in profiler is useful if you can't make an educated guess.
- For JavaScript related slowdown, add function duration logging throughout the code.
- Disable some of the functionality of your app to see if that reduces the event loop lag (jank).
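The function duration logging mentioned above could look something like the following wrapper. It is a minimal sketch: timed, and its optional log and now parameters, are hypothetical names introduced here for illustration, and only synchronous time spent inside the wrapped function is measured.

```javascript
// Hypothetical wrapper that logs how long each call to `fn` takes.
// `log` and `now` are optional and exist mainly to make the helper testable.
function timed(label, fn, log, now) {
  const clock = now || (() => Date.now());
  const out = log || ((msg) => console.log(msg));

  return function (...args) {
    const start = clock();
    try {
      return fn.apply(this, args);
    } finally {
      // Runs even if `fn` throws, so slow failing calls are logged too.
      out(label + " took " + (clock() - start) + "ms");
    }
  };
}
```

Wrapping a suspect function, for example renderThumbnails = timed("renderThumbnails", renderThumbnails) with a hypothetical renderThumbnails, makes each call report its duration to the log.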
Useful data to report for Gaia issues
Report the jank numbers for various interactions, and the reference workloads used.
Platform Contact
This can vary, depending on the problem: use profilers to narrow down the issue or contact the Performance team.
Load time
This is the duration, in milliseconds, required to load the app. This is usually pretty difficult to measure because the definition of when the app is loaded varies from app to app. Generally though, we're talking about the time between the user launching the app and the visible UI being fully drawn and responsive. We put an emphasis on the visible UI because many apps require lots of computation and/or I/O to initialize their state enough to draw the full UI for the initial app screen. If every app waited until it was fully initialized before drawing the UI, they would all appear to be very slow and unresponsive.
The trick to getting a fast load time is to do as little as possible before putting up the visible part of the UI. All long term calculations and I/O should be delayed into an idle timer callback if possible. Apps with list UIs (e.g. contacts, sms, email) should only load enough data to display the UI that is initially visible to the user. The rest of the list can then be loaded in an idle timer callback.
The real key here is to make sure that the UI is drawn as early as possible. Preferably, show a UI that isn't going to have a lot of box size re-adjustments due to reflows. Having multiple box resizes during load makes the app feel like a web page rather than a native app. The goal here is to provide a native app experience using web technologies.
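The staged loading of list UIs described above can be sketched as follows. This is only an illustration: renderListInStages, renderItems and defer are hypothetical names, and setTimeout with a zero delay stands in for whatever idle timer callback your app uses.

```javascript
// Hypothetical staged renderer: draw what is initially visible right away,
// then hand the rest of the list to a deferred callback.
function renderListInStages(items, visibleCount, renderItems, defer) {
  // Assumption: setTimeout(cb, 0) stands in for an idle timer callback.
  const later = defer || ((cb) => setTimeout(cb, 0));

  renderItems(items.slice(0, visibleCount)); // first screen, drawn immediately

  if (items.length > visibleCount) {
    later(() => renderItems(items.slice(visibleCount))); // the remainder
  }
}
```

The same shape applies to data loading: fetch only the records needed for the first screen before drawing, and start the remaining I/O from the deferred callback.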
Verification steps
The easiest way to verify this is to use the built-in profiler: Since there are no obvious markers in the profiles to indicate where your app load ends, it will be up to you to either output data to the log or call a dummy function where the loading should be done and then find the function call in the profile. The latter method is tricky because of the asynchronous nature of the callbacks and the rendering code. It all comes down to knowing your app's code well and then spending time to do a careful examination of the profiles using the Cleopatra tool.
If you are working on a B2G app, the datazilla tool measures cold load time for each of those. That's a good place to start to see if there has been a regression.
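The two marking approaches mentioned above (logging, or calling a recognizable function) can be combined in one small helper. This is a sketch under assumptions: createLoadMarker and appVisuallyLoaded are hypothetical names, and the start time is simply the moment the helper is created, which is later than the moment the user actually launched the app.

```javascript
// Hypothetical load marker. Create it as early as possible in startup code,
// then call the returned function once the first screen is drawn and usable.
function createLoadMarker(now, log) {
  const clock = now || (() => Date.now());
  const out = log || ((msg) => console.log(msg));
  const startedAt = clock();
  let loadedAfterMs = null;

  // The function is named so that it is easy to search for in a profile.
  return function appVisuallyLoaded() {
    if (loadedAfterMs === null) {
      loadedAfterMs = clock() - startedAt;
      out("app visually loaded after " + loadedAfterMs + "ms");
    }
    return loadedAfterMs; // repeated calls return the first measurement
  };
}
```

Because rendering is asynchronous, where to call the marker remains a judgment call that depends on knowing your app's code.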
Useful data to report for Gaia issues
- Capture profiles using the built-in profiler.
- Upload them to Cleopatra and put the Cleopatra URL in the relevant bug.
- Contact the performance team if you're stuck.
Platform Contact
This can vary, depending on the problem: use profilers to narrow down the issue or contact the Performance team.
Layer tree
Having a good layer tree means we can perform most changes without having to repaint the page. Think of a cartoon on the television like The Simpsons: Most cartoons will divide their scene into a few layers and move the layers separately instead of redrawing the scene for every frame of the cartoon. Using layers to perform animations means we can avoid redrawing the page on every frame and can simply move these layers around.
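As a small illustration of moving content rather than redrawing it, an element can be repositioned with a CSS transform instead of changing properties that affect flow. moveWithTransform is a hypothetical helper, and whether Gecko actually gives the element its own layer is an assumption to verify with the Layer border option.

```javascript
// Hypothetical helper: reposition an element with a CSS transform, which
// does not affect the flow of the page, instead of changing left/top.
function moveWithTransform(element, x, y) {
  element.style.transform = "translate(" + x + "px, " + y + "px)";
}
```

Calling moveWithTransform(panel, 0, -120), with a hypothetical panel element, shifts it 120 pixels up without touching its position in the document flow.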
Verification Steps
Note: Verifying an optimal layer tree is a complex task that requires a lot of internal knowledge of Gecko. Be ready to ask for help.
- In the Firefox OS Developer settings, turn on the Layer border and Flash repainted area options.
- Perform all common interactions with the app.
- Look for borders around the content that is animating.
- Verify that paint flashing is kept to a minimum, as shown in the Over-invalidation section.
Further optional steps:
- Enable layers.dump. On each frame the exact layer structure will be printed to adb logcat.
- Perform all common interactions with the app.
Look for the following (Note that they might not be bugs):
- Layers that are created around content that isn't animating.
- Layers that are created and destroyed because something changes once.
- Large layers that are created at the start of — or during — an important user interaction.
- Excessive numbers of layers.
- RGBA layers in the layers.dump logcat output when the animated content is opaque.
Debugging
- Run the app with the App Manager or WebIDE and tweak the structure and styles of the page to trigger the desired behavior.
- Get a display list dump. Build b2g with export B2G_DUMP_PAINTING=1 set in your .userconfig file (note that this is set automatically if you have a debug build). Run your b2g with the preference user_pref("layout.display-list.dump", true); enabled.
- Look for the display list dump for the particular process at the moment of interest. It will contain a list of display items and the mappings between layers and their original frames.
Notes on creating/destroying layers
Continuously creating and destroying layers can hurt performance. Setting the pref layers.flash-borders when layer borders are activated will add a fading animation to layer borders when layers are created. Immediately after a layer is created its border is black; it then fades into its usual color (green for most layers) over a couple of seconds. This can highlight counter-intuitive behaviors, such as elements that are already "layerized" having their layer destroyed and recreated after a style change (for example, an animated element that moves around and then fades to transparent currently gets its layer recreated at the start of its opacity transition).
Beware that this triggers full-tilt compositing (the compositor will try to composite continuously at 60fps regardless of whether anything has changed on the screen). So while this can be useful to understand the layerization behavior in some cases, it should not be used during performance measurements.
Platform Contact
Layout & Graphics
Checkerboarding
Checkerboarding is when an app fails to keep up with the panning happening in the main process. When this happens a solid background color will be shown temporarily until the page catches up.
Note: Ideally at this point you shouldn't experience much checkerboarding. If the Over-invalidation, Reflows/restyles, and Event loop responsiveness sections return favourable results, checkerboarding should be minimal to none. Find a more in-depth explanation in Mason Chang's Wow. Such Checkerboard article.
Verification Steps
- Perform all common interactions with the app, including heavy scrolling.
- Look for the background color.
- Consider first repeating some of the previous steps to rule out over-invalidation, reflows and non-responsive event loops.
Platform Contact
Graphics; verify previous steps first.
Compositor FPS
This means checking that the generated layer tree can be composited by the GPU on every frame.
Verification Steps
- Turn on the Frames per second option from the Firefox OS Developer Settings Developer HUD.
- Watch the leftmost FPS counter during animations and scrolling. This number should remain very close to the refresh rate of the display (~60 FPS for most devices). Large layer transformations (check with layer border/layers.dump) will cause a dip in this counter.
Optional further steps
- Turn on the Frames per second option from the Firefox OS Developer Settings Developer HUD.
- Set layers.offmainthreadcomposition.frame-rate to 0. (TBD: NOT SURE HOW THIS IS DONE)
- Watch the leftmost FPS counter while your app is idle at various stages. This number should remain very close to the refresh rate of the display (~60 FPS for most devices). There should be little to no dips below 60 FPS here.
Platform Contact
Graphics
Memory usage
This technique checks that memory consumption is not excessive.
Verification Steps
- Use tools/get_about_memory.py to get a memory reports dump while the app is running — do this both shortly after start-up, and after using the app for a while.
- Load the dumps in about:memory in Desktop Firefox. Check for any measurements that seem to be excessive, particularly "heap-unclassified".
- If "heap-unclassified" is high, re-run tools/get_about_memory.py with a DMD-enabled build, which will produce data that can be used to understand where additional memory reporters need to be added.
Platform Contact
Memshrink
Profiling
At this point most common issues should have been fixed. It's important to check some of the other things first because bugs like over-invalidation can sometimes incorrectly report that the page is expensive to paint whereas really we're simply painting too much.
With sampling profiling we can get a good approximation of where the remaining CPU time is spent and look for stages in the pipeline that are exceeding their budget or find things that are running that simply shouldn't be.
Note: See Profiling with the built-in profiler for more information on how to profile threads.
Verification Steps
- Start by profiling only the main thread at a high resolution.
- Profile the app and the compositor together, using the following command:
./profile.sh start b2g -t Compositor && ./profile.sh start APP_NAME_OR_PID
Platform Contact
Performance Team & Graphics
Power usage
Power usage for a given change is difficult to measure. Power usage is directly correlated with CPU usage, and also with the use of various hardware features of the phone.
Verification Steps
If you have a power harness available to you, you can do before and after power measurements by following the Powertool instructions. If you don't have a power harness, please contact the performance team (#fxos-perf on IRC) for help.
Platform Contact
Performance Team