Firefox now has a built-in profiler. Having a profiler in the code base lets us, among other things, better measure responsiveness, attribute performance costs more accurately, and run in environments and platforms in which external profilers aren't available, such as in the user's environment or on a locked Android device. This article details how to use the profiler.
Getting the Profiler Add-on
The built-in profiler has two interfaces. For Web developers there is a simplified profiler that can be opened from the menu Tools > Web Developer > Performance. A more advanced interface for developers of Mozilla's internals can be obtained by installing Benoit Girard's Gecko Profiler add-on. Once it's installed, customize the Firefox toolbar (using the "Customize" option in the hamburger menu) and add the Profiler icon to it.
Using the Add-on
Reporting a Performance Problem has a step-by-step guide for obtaining a profile when requested by Firefox developers.
Reporting a Thunderbird Performance Problem has a step-by-step guide for obtaining a profile when requested by Thunderbird developers.
The profiler uses a fixed-size buffer to store a few seconds' worth of samples. When it runs out of space, it discards old entries. When you stop the profiler, it throws away its buffer; when you start the profiler, it creates the buffer again and begins to fill it. When a profile is taken, it's sent to a web-based application called Cleopatra for viewing.
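The buffer behaviour described above (fixed capacity, oldest entries discarded when full, buffer thrown away on stop and recreated on start) can be modelled as a small ring buffer. This is a hypothetical Python sketch for illustration only; the class and method names are invented and do not correspond to the profiler's actual C++ implementation.

```python
from collections import deque


class SampleBuffer:
    """Hypothetical model of the profiler's fixed-size sample buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._samples = None  # no buffer exists while the profiler is stopped

    def start(self):
        # Starting the profiler creates a fresh, empty buffer.
        self._samples = deque(maxlen=self.capacity)

    def stop(self):
        # Stopping the profiler throws the buffer away.
        self._samples = None

    def record(self, sample):
        if self._samples is None:
            return  # profiler not running: the sample is ignored
        # A deque with maxlen silently drops its oldest entry when full.
        self._samples.append(sample)

    def snapshot(self):
        # Taking a profile copies whatever is currently buffered.
        return list(self._samples) if self._samples is not None else []


buf = SampleBuffer(capacity=3)
buf.start()
for i in range(5):
    buf.record(i)
print(buf.snapshot())  # [2, 3, 4]
buf.stop()
print(buf.snapshot())  # []
```

Only the most recent samples survive, which is why you should take the profile soon after the slow action you are investigating.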
Click the profiler icon in your toolbar to open the profiler panel. To take a profile you can use the buttons in the profiler panel or the keyboard shortcuts.
- Ctrl+Shift+1: Start/stop the profiler
- Ctrl+Shift+2: Take a profile and launch Cleopatra to view it
Understanding Cleopatra Profiles
To learn more about how to read and understand performance profiles in Cleopatra (the primary front end), see the Cleopatra page.
Profiling Boot to Gecko (with a real device)
There is a script called profile.sh in the root of the B2G tree that simplifies most of the steps of grabbing profile information from the phone. It will profile both Gecko and the running JavaScript. To use this script:
- You need to have a local build of B2G. Make sure you build with export B2G_PROFILING=1 in your .userconfig file (see Customizing your .userconfig). This will not work with prebuilt binaries. Note: if you have a debug build (export B2G_DEBUG=1 in .userconfig), there is no need to additionally export B2G_PROFILING.
- You need to have your phone plugged into your PC and accessible via ADB.
The general steps for using the profiler are:
- Start the app you want to profile and perform all the steps required to get you up to the point just before the slow action that you want to investigate.
- Start the profiler.
- Perform all the actions you want to investigate on the phone.
- Capture the profile and stop the profiler.
- Upload the profile to Cleopatra and share it.
Start the Profiler
TL;DR: Read the instructions below for details, but the general command structure for profiling rendering is ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p YOUR_APP_NAME_HERE, for example ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings. You can find the list of running apps with ./profile.sh ps. Run ./profile.sh capture after performing the actions you want to profile. You have to profile the B2G app at all times.
Starting the profiler is done separately for each process. The general guideline is to start the profiler on the B2G parent process's compositor thread (effectively the window manager) and on the app you want to profile. For example, ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings will start the profiler on the B2G compositor and on the Settings app. If you start the profiler with this command, you should get output like this:
    ./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Settings
    Process: b2g
    Using default features js,leaf
    Starting profiling PID 500..
    Profiler started
    Process: Settings
    Using default features js,leaf
    Starting profiling PID 611..
    Profiler started
Note: Running ./profile.sh start without any arguments will reset the device and start the profiler on all processes once the phone reboots. This mode is deprecated.
If you don't know the name of your app, or you get an error saying a process doesn't exist, try running ./profile.sh ps. You should get something like this:
    ./profile.sh ps
      PID Name
    ----- ----------------
      845 b2g              profiler not running
      893 (Nuwa)           profiler not running
      909 Homescreen       profiler not running
      937 Messages         profiler not running
     1024 (Preallocated a  profiler not running
Note: Process names are CASE SENSITIVE.
If you want to profile the startup of an app, rather than something in an already running instance of the app, use the PID of the process named "(Preallocated a". That is the process that will be assigned to the app that is launched next. Instead of using the name in the profile.sh command, just use the PID. For the example above: ./profile.sh start -p 1024.
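If you script this step, the PID of the preallocated process can be pulled out of the ./profile.sh ps output. The sketch below assumes the column layout shown above (PID, then name, then the profiler status, with a header and a dashed separator line); the function names are hypothetical, and the exact layout may differ between versions of the script.

```python
import re

# Assumed row layout: "<pid> <name> profiler <status>", e.g.
# " 1024 (Preallocated a  profiler not running"
ROW = re.compile(r"^\s*(\d+)\s+(.*?)\s+profiler\s+(.*\S)\s*$")


def parse_ps(output):
    """Return (pid, name, status) tuples from `./profile.sh ps` text."""
    rows = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:  # the header and separator lines don't match and are skipped
            rows.append((int(m.group(1)), m.group(2), m.group(3)))
    return rows


def preallocated_pid(output):
    """PID of the process that the next launched app will take over."""
    for pid, name, _status in parse_ps(output):
        if name.startswith("(Preallocated"):
            return pid
    return None


sample = """  PID Name
----- ----------------
  845 b2g              profiler not running
  909 Homescreen       profiler not running
 1024 (Preallocated a  profiler not running"""
print(preallocated_pid(sample))  # 1024
```

Remember that process names are case sensitive, so the match on "(Preallocated" is deliberately exact.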
Profiler Options
./profile.sh start
Starts the profiler on a specific process/thread. For example, profiling the 'Compositor' thread on the B2G process is useful for profiling scrolling/drawing. The associated flags are:
- -p: Which process to profile (b2g, Email, etc.).
- -e: The number of profile entries to capture, which determines how much profile data the profiler keeps. This is a circular buffer.
- -s: The stack scan mode, as detailed under MOZ_PROFILER_STACK_SCAN below.
- -i: The sampling interval, in milliseconds.
- -m: The profiler mode.
- -f: The features to enable for the profiler. The default is to profile JavaScript and leaf mode.
- -t: Which threads to profile. To specify multiple threads, use a single argument with comma-separated names, e.g. -t Compositor,GeckoMain. (Do not specify a second -t argument; it will just override the first.) Note that the thread name of the Gecko main thread is GeckoMain.
./profile.sh ps
Shows running B2G processes and whether the profiler is enabled for each of them.

./profile.sh capture [pid or name]
Initiates a capture. If you don't specify any arguments, all currently running B2G processes will be captured; otherwise the B2G process with the indicated pid or name will be captured. The profile script will pull the profile files from the phone and add symbols.
- The profile script uses some variables from the file .var.profile, which is generated by the build. These allow the script to locate your objdir-gecko tree, the appropriate toolchain, and the out/target/product/<phone> tree to get symbols for the Android libraries.
- The .txt files are renamed when pulled to the host and follow the pattern profile_HHMM_PID_NAME.txt (or .sym). If your capture includes multiple processes, they'll all have the same HHMM portion. PID is the PID of the process, and NAME is the app name (as per the ps output).
- The .sym files (which are the .txt files with symbols added) can then be uploaded to the Cleopatra UI.

./profile.sh stop
Kills the currently running b2g and restarts it normally (i.e. with profiling disabled).
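Because a second -t replaces the first, a wrapper that profiles several threads has to join the thread names into a single argument. The following hypothetical Python helper only assembles the argument list for ./profile.sh start (it does not run anything) and uses only the -p and -t flags documented above; the function name is invented for illustration.

```python
def start_command(process, threads=()):
    """Build the argv for `./profile.sh start` (hypothetical helper)."""
    # -p accepts a process name or a PID, so convert PIDs to strings.
    argv = ["./profile.sh", "start", "-p", str(process)]
    if threads:
        # A single -t with comma-separated names; a second -t would
        # override the first instead of adding to it.
        argv += ["-t", ",".join(threads)]
    return argv


print(start_command("b2g", threads=["Compositor", "GeckoMain"]))
# ['./profile.sh', 'start', '-p', 'b2g', '-t', 'Compositor,GeckoMain']
```

The resulting list can be passed to subprocess.run(..., cwd=<path to your B2G tree>) once per process, matching the "one start per process" rule described earlier.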
Some extra commands available in the profile script (these will not be needed normally, but can be useful if you're working on the script):
./profile.sh ls
Shows all of the profile files stored on the phone (it looks in the /data/local/tmp directory).

./profile.sh ps
Shows all Gecko processes and whether they are being profiled.

./profile.sh signal [pid]
Triggers the profiler to store the current profile buffers to files on the phone.

./profile.sh pull pid [NAME [HHMM]]
Pulls the profile file for the indicated pid and renames it as described above.

./profile.sh symbolicate filename
Takes the profile_HHMM_PID_NAME.txt file and creates profile_HHMM_PID_NAME.sym, which has symbols in it.

./profile.sh help
Prints all of the commands currently supported by the script.
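The profile_HHMM_PID_NAME.txt naming scheme is regular enough to parse, which helps when sorting files from several captures. This is a minimal sketch with hypothetical helper names; it assumes HHMM is always four digits and treats everything between the PID and the extension as NAME, so names without the HHMM part (for example profile_500_b2g.txt) are simply not recognized.

```python
import re
from typing import NamedTuple, Optional

PATTERN = re.compile(r"^profile_(\d{4})_(\d+)_(.+)\.(txt|sym)$")


class ProfileFile(NamedTuple):
    hhmm: str
    pid: int
    name: str
    ext: str


def parse_profile_name(filename: str) -> Optional[ProfileFile]:
    """Split profile_HHMM_PID_NAME.txt/.sym into its parts, or return None."""
    m = PATTERN.match(filename)
    if not m:
        return None
    return ProfileFile(m.group(1), int(m.group(2)), m.group(3), m.group(4))


def symbolicated_name(filename: str) -> str:
    """Name of the .sym file that symbolication produces from a .txt file."""
    info = parse_profile_name(filename)
    if info is None or info.ext != "txt":
        raise ValueError(f"not a raw profile file: {filename}")
    return f"profile_{info.hhmm}_{info.pid}_{info.name}.sym"


print(symbolicated_name("profile_1432_611_Settings.txt"))
# profile_1432_611_Settings.sym
```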
Other Ways to Profile B2G:
Linux's perf tool can be used; see bug 831611, comments 53 and 60. Do not set BRANCH= when running config.sh via perf; just run ./config.sh.
Capture the Profile
Once you have finished all the actions under investigation, you need to capture the profile by running ./profile.sh capture. Your output should be something like this:
    ./profile.sh capture
    Signaling Profiled Processes: 500 611
    Stabilizing 500 b2g ...
    Pulling /data/local/tmp/profile_0_500.txt into profile_500_b2g.txt
    Adding symbols to profile_500_b2g.txt and creating profile_500_b2g.sym ...
    Stabilizing 611 Settings ...
    Pulling /data/local/tmp/profile_2_611.txt into profile_611_Settings.txt
    Adding symbols to profile_611_Settings.txt and creating profile_611_Settings.sym ...
    Merging profile: profile_500_b2g.sym profile_611_Settings.sym
    Results: profile_captured.sym
    Removing old profile files (from device) ... done
Important: If you do not see the line Results: profile_captured.sym, YOUR PROFILE WAS NOT SUCCESSFULLY CAPTURED. Try it again. This should be very uncommon now.
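If you run captures from a script, you can make that check automatic. The sketch below runs ./profile.sh capture from the B2G tree and looks for the Results: profile_captured.sym line; apart from that one line, nothing about the output format is assumed, and both function names are hypothetical. Since it isn't documented which stream the line goes to, both stdout and stderr are searched.

```python
import subprocess


def capture_succeeded(output: str) -> bool:
    """True if the capture output contains the success line."""
    return any(line.strip() == "Results: profile_captured.sym"
               for line in output.splitlines())


def capture(b2g_root: str = ".") -> bool:
    """Run `./profile.sh capture` in the B2G tree and report success."""
    result = subprocess.run(["./profile.sh", "capture"], cwd=b2g_root,
                            capture_output=True, text=True)
    return capture_succeeded(result.stdout + "\n" + result.stderr)
```

A wrapper could simply retry capture() once when it returns False, which matches the advice to try again.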
If you use a build from pvtbuilds and you don't have the symbols locally, you can capture using the -s parameter:
./profile.sh capture -s https://symbolapi.mozilla.org
Stop the Profiler
You can stop the profiler now: run ./profile.sh stop, and your phone should reboot.
    ./profile.sh stop
    Profiler appears to be running.
    Killing b2g ........
    b2g doesn't seem to want to go away. Try rebooting.
Upload and Share the Profile to Cleopatra
You should now have a file called profile_captured.sym. Head over to Cleopatra to view the results: under Upload your profile here, click the Browse... button and select the profile_captured.sym file.
After a few seconds, Cleopatra should display your profile.
The next best thing you can do is to share it. Click the Share button at the bottom left of the interface.
After it finishes uploading, the URL should be something like https://people.mozilla.org/~bgirard/cleopatra/#report=03e8dc46769c50751c23cbb9d707e980f96f56b5. You can now send that link to whoever you want to share the results with!
Profiling local Windows builds
If you built Firefox for Windows locally and you would like to use the local symbols with the profiler, you will need to run an additional tool; see Profiling with the Built-in Profiler and Local Symbols on Windows.
Profiling Firefox mobile
- For local builds of Fennec, build with optimization and STRIP_FLAGS="--strip-debug", but NOT with --enable-profiling. Nightly builds are already built with the appropriate flags.
- You'll need to have adb and arm-eabi-addr2line (which is part of the Android NDK) in your bash PATH, so use locate arm-eabi-addr2line (on Linux) or mdfind name:arm-eabi-addr2line (on OS X) to find it, and add its location to PATH with an export line in ~/.bash_profile. The extension will invoke bash to use adb and addr2line.
- Install the latest pre-release build of the add-on in the Firefox browser on your host machine (the machine that has your phone reachable via ADB). This will add an icon in the top right of the browser.
- Set devtools.debugger.remote-enabled to true in about:config for Fennec.
- Select target Mobile USB and press Connect. The first run will take about an extra minute to pull in the required system libraries.
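Before connecting, it can save time to confirm that the tools the extension shells out to are actually on your PATH. A small sketch using Python's shutil.which; note that it checks the PATH of the Python process, which may differ from the PATH your bash login shell gets from ~/.bash_profile, so run it from a fresh login shell.

```python
import shutil

REQUIRED_TOOLS = ("adb", "arm-eabi-addr2line")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("adb and arm-eabi-addr2line found")
```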
Profiling JS benchmark (xpcshell)
- You'll need a custom build of xpcshell that includes the following patches: the 100µs sampling patch (bug 807854) and, on Linux, the experimental patch to enable native stacks (bug 812946).
- To profile the script run.js with IonMonkey (-I), type inference (-n) and JägerMonkey (-m), use the following command:
    $ xpcshell -m -I -n -e '
        const Ci = Components.interfaces;
        const Cc = Components.classes;
        var profiler = Cc["@mozilla.org/tools/profiler;1"].getService(Ci.nsIProfiler);
        profiler.StartProfiler(
          10000000 /* = profiler memory */,
          1 /* = sample rate: 100µs with patch, 1ms without */,
          ["stackwalk", "js"], 2 /* = features, and number of features. */
        );
      ' -f ./run.js -e '
        var profileObj = profiler.getProfileData();
        print(JSON.stringify(profileObj));
      ' | tail -n 1 > run.cleo
  xpcshell outputs all of the benchmark information, and on its last line it outputs the result of the profiling. You can filter that line with tail -n 1 and redirect it to a file to avoid printing it in your shell. The expected size of the output is around 100 MB.
- To add symbols from your build to the profile, you need to call ./scripts/profile-symbolicate.py, available in the B2G repository. If libraries are not found, you will need to patch the script with bug 812063's attachment.
    $ GECKO_OBJDIR=<objdir> PRODUCT_OUT=<objdir> TARGET_TOOLS_PREFIX= \
        ./scripts/profile-symbolicate.py -o run.symb.cleo run.cleo
- Clone Cleopatra and start the server with ./run_webserver.sh.
- Access Cleopatra from your web browser by loading the page localhost:8000, and upload run.symb.cleo to render the profile with most of the symbol information.
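The tail -n 1 step keeps only the final line of the xpcshell output, which is where the profile JSON ends up. If you would rather keep the benchmark output in a log and extract the profile afterwards, a sketch like the following should work; it assumes the last non-empty line of the log is the JSON printed by print(JSON.stringify(profileObj)), and the script and file names are placeholders.

```python
import json
import sys


def extract_profile(log_text: str):
    """Return the parsed profile from the last non-empty line of the log."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty xpcshell output")
    return json.loads(lines[-1])  # raises json.JSONDecodeError if not JSON


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_profile.py xpcshell.log run.cleo
    with open(sys.argv[1], encoding="utf-8") as src:
        profile = extract_profile(src.read())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        json.dump(profile, dst)
```

Parsing the line also catches a truncated or interleaved profile early, before you try to symbolicate it.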
Native stack vs. Pseudo stack
The profiler periodically samples the stack(s) of thread(s) in Firefox, collecting a stack trace, and presents the aggregated results using the Cleopatra UI. Stack traces can be collected in two different ways: Pseudostack (the default) or Nativestack.
Pseudostack
With Pseudostack, we sidestep the difficulties and performance overheads of unwinding stacks in a robust and platform-independent way by using function entry/exit tags added by hand to important points in the code base. The stacks you see in the UI are chains of these tags. This gives robust stacks that work on all platforms, but they miss out on unannotated areas of the code base and give no visibility into system libraries or drivers.
Tagging is done by adding macros of the form PROFILER_LABEL("NAMESPACE", "NAME"). These add RAII helpers, which the profiler uses to track entries/exits of the annotated functions. For this to be effective, you need to use PROFILER_LABEL liberally throughout the code. See GeckoProfiler.h for more variations like PROFILER_LABEL_PRINTF.
Because the instrumentation has a small overhead, sample labels shouldn't be placed inside hot loops. A profile reporting that a large portion of time is spent in "Unknown" code indicates that the area being executed doesn't have any sample labels. As we focus on using this tool and add more sample labels, coverage should improve.
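To see why unannotated code shows up as "Unknown", it helps to model the label stack. The sketch below is a Python analogue, not Gecko code: a context manager plays the role of the RAII helper that PROFILER_LABEL creates, pushing a label on entry and popping it on exit, and a sampler simply copies the current label stack. All names here are hypothetical.

```python
from contextlib import contextmanager

pseudo_stack = []  # labels of the currently active annotated scopes


@contextmanager
def profiler_label(namespace, name):
    # Analogue of the RAII object created by PROFILER_LABEL: push on entry,
    # pop on exit (even if the body raises).
    pseudo_stack.append(f"{namespace}::{name}")
    try:
        yield
    finally:
        pseudo_stack.pop()


def take_sample():
    # A sampler only sees labels: time in unlabeled code is attributed to
    # the innermost enclosing label, or to "Unknown" if there is none.
    return list(pseudo_stack) or ["Unknown"]


def paint():
    with profiler_label("Layout", "Paint"):
        return take_sample()


print(paint())        # ['Layout::Paint']
print(take_sample())  # ['Unknown']
```

The model also shows the cost trade-off: every labeled scope does a push and a pop, which is why labels belong around meaningful units of work rather than inside hot loops.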
Nativestack
Nativestack is an optional, platform specific feature that isn't complete yet. The goal is to provide "native" — that is, real — stacktraces on platforms that support it. Having this feature will give us detailed stacks and help us analyze problems where we're spending time in drivers and system libraries. We're working on building the proper stack walking and symbolization required to make this step work, and are looking for help with this feature.
Note: On Windows XP, the native and pseudostacks do not interleave properly. There is, however, a workaround in the associated bug.
Availability
The profiler will operate in either Pseudostack or Nativestack mode depending on your environment. See above for details on these.
Platform | Custom Build | Nightly | Release (Gecko 15.0+) |
---|---|---|---|
Windows | Native stack (Custom steps) | Native stack | Pseudo stack |
Mac | Native stack | Native stack | Pseudo stack |
Linux | Pseudo stack (Bug for Native stack) | Pseudo stack (Bug for Native stack) | Pseudo stack |
Fennec | Pseudo stack (Bug for Native stack) | Pseudo stack (Bug for Native stack) | Pseudo stack (19+) |
B2G | Native stack (EHABI unwinds) | Pseudo stack | None (Bug) |
Using native stack unwinding on 32- and 64-bit Linux
Nightly builds for 32- and 64-bit Linux now have native stack unwinding via Breakpad available. This is controlled by a set of environment variables if you profile using a clean reboot of the phone (e.g. ./profile.sh start); otherwise, these variables are passed in via the profile.sh script. Here are some recommended settings. I suggest you use all of them and adjust as appropriate.
- MOZ_PROFILER_VERBOSE=1: This makes the logging output a bit more verbose, which helps to diagnose possible problems reading or using the DWARF CFI (unwind information).
- MOZ_PROFILER_INTERVAL=50: This sets the sampling interval to the specified number of milliseconds. You can reduce this down to 1 millisecond, but I'd recommend doing some trial runs at 50 milliseconds and gradually reducing the interval. Native unwinding can be expensive, so Firefox or Fennec can become unresponsive if you set the interval too low.
- MOZ_PROFILER_MODE=native: This controls how stack unwinding is done, and can take four values: help, native, pseudo and combined. With native, Breakpad alone is used to unwind the stacks. With pseudo, the stacks are pseudostacks only, as described above. With combined, both a native and a pseudo stack trace are obtained for each sample point, and they are interleaved based on observed stack pointer values to create a combined trace. Setting it to help prints a summary of all of these options.
- MOZ_PROFILER_STACK_SCAN=0 (zero): Breakpad has multiple schemes for unwinding the stack, of varying levels of trustworthiness: using DWARF CFI data, using frame pointers, and scanning the stack looking for probable return addresses. This last scheme is used when nothing else works. It can generate useful data, but can also add frames that are not really present, which is very confusing. By default, stack scanning is disallowed. You can selectively re-enable it by changing the value to 1, 2, 3, etc. This limits the number of frames obtained by stack scanning to the specified number, and truncates the trace if any more stack-scanned frames are found. This is best left at the default setting (zero). If, however, you absolutely need the profiler to unwind through some library in which it is getting stuck, try increasing it gradually, but be aware that you may get bogus stack traces as a result.
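The recommended settings above can be applied from a small launcher that also rejects values outside the documented ranges. This is a hypothetical sketch: the variable names and values come from the list above, while the helper itself, and the idea of launching the browser from Python, are assumptions about your setup.

```python
import os

VALID_MODES = {"help", "native", "pseudo", "combined"}


def profiler_env(interval_ms=50, mode="native", stack_scan=0, verbose=True):
    """Return a copy of os.environ with the native-unwinding settings applied."""
    if mode not in VALID_MODES:
        raise ValueError(f"MOZ_PROFILER_MODE must be one of {sorted(VALID_MODES)}")
    if interval_ms < 1:
        raise ValueError("the sampling interval cannot go below 1 ms")
    if stack_scan < 0:
        raise ValueError("MOZ_PROFILER_STACK_SCAN must be 0 or a positive count")
    env = dict(os.environ)
    env["MOZ_PROFILER_INTERVAL"] = str(interval_ms)
    env["MOZ_PROFILER_MODE"] = mode
    env["MOZ_PROFILER_STACK_SCAN"] = str(stack_scan)
    if verbose:
        env["MOZ_PROFILER_VERBOSE"] = "1"
    else:
        env.pop("MOZ_PROFILER_VERBOSE", None)
    return env
```

For example, subprocess.run(["firefox"], env=profiler_env(interval_ms=50)) would start a browser (assuming "firefox" is on your PATH) at the suggested 50 ms interval, which you can then lower gradually.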
If you have problems getting a native stacktrace instead of a pseudostack still, try enabling both "Stackwalk" and "Breakpad" options in the profiler options.
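The MOZ_PROFILER_STACK_SCAN limit described above can be modelled as a filter over an unwound trace. The sketch below is a hypothetical illustration of that policy, not Breakpad's code: each frame records which scheme produced it, and the trace is cut off as soon as more stack-scanned frames appear than the limit allows (with the default of 0, at the first scanned frame).

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    pc: int
    how: str  # "cfi", "frame_pointer" or "scan" (hypothetical labels)


def apply_scan_limit(trace: List[Frame], limit: int) -> List[Frame]:
    """Keep frames until more than `limit` stack-scanned frames are seen."""
    kept, scanned = [], 0
    for frame in trace:
        if frame.how == "scan":
            scanned += 1
            if scanned > limit:
                break  # truncate: this frame and all outer frames are dropped
        kept.append(frame)
    return kept


trace = [Frame(0x10, "cfi"), Frame(0x20, "scan"), Frame(0x30, "cfi"),
         Frame(0x40, "scan")]
print(len(apply_scan_limit(trace, 0)))  # 1
print(len(apply_scan_limit(trace, 1)))  # 3
print(len(apply_scan_limit(trace, 2)))  # 4
```

Raising the limit trades shorter, trustworthy traces for longer ones that may contain bogus frames, which is why the text recommends increasing it only gradually.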
Profile Fails to Upload
You can upload profiles of up to about 10 MB to the public central storage (AppEngine). For bigger profiles, you will have to download the profile and then either:
- Share the file, or
- Host the file yourself while allowing Access-Control-Allow-Origin *. For Apache (people.mozilla.org), use:
    $ echo "Header set Access-Control-Allow-Origin *" > .htaccess
  and share the URL https://people.mozilla.com/~bgirard/cleopatra/?customProfile=<URL>, replacing <URL> with the location of your profile file.
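Building the share URL by hand is error-prone if the profile's own URL contains characters such as & or ?. The minimal sketch below percent-encodes the profile location and adds a rough size check; the base URL is copied from the text above, while the helper names and the exact byte threshold are assumptions. It also assumes Cleopatra percent-decodes the customProfile value, as standard query parsing does; if it doesn't, paste the raw URL instead.

```python
import os
from urllib.parse import quote

CLEOPATRA = "https://people.mozilla.com/~bgirard/cleopatra/"
UPLOAD_LIMIT = 10 * 1024 * 1024  # "about 10 MB"; the exact limit is an assumption


def needs_self_hosting(path: str) -> bool:
    """True if the profile is probably too big for the central storage."""
    return os.path.getsize(path) > UPLOAD_LIMIT


def custom_profile_url(profile_url: str) -> str:
    # Encode everything, including ':' and '/', so the profile URL survives
    # as a single query parameter value.
    return CLEOPATRA + "?customProfile=" + quote(profile_url, safe="")


print(custom_profile_url("https://example.com/p.sym?x=1&y=2"))
```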
Profiling a hung process
It is possible to get profiles from hung Firefox processes using lldb.
- After the process has hung, attach lldb.
- Type in:
    p (void)mozilla_sampler_save_profile_to_file("somepath/profile.txt")
- Clone mstange’s handy profile analysis repository.
- Run:
    python symbolicate_profile.py somepath/profile.txt
  to graft symbols into the profile. mstange’s scripts do some fairly clever things to get those symbols: if your Firefox was built by Mozilla, they will retrieve the symbols from the Mozilla symbol server. If you built Firefox yourself, they will attempt to use some cleverness to grab the symbols from your binary.
Your profile will now, hopefully, be updated with symbols.
Then, load up Cleopatra, and upload the profile.
I haven’t yet had the opportunity to try this, but I hope to next week. I’d be eager to hear people’s experience giving this a go – it might be a great tool in determining what’s going on in Firefox when it’s hung!
Profiling Threads
SPS has rudimentary support for profiling multiple threads. To enable it, check the 'Multi-Thread' box then enter one or more thread names into the textbox beside it. Thread names are the strings passed to the base::Thread class at initialization. At present there is no central list of these thread names, but you can find them by grepping the source.
If the filter you entered is invalid, no threads will be profiled. You can identify this by hitting Analyze (Cleopatra will show you an error message). If the filter is left empty, only the main thread is captured (as if you had not enabled Multi-Thread).
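The filter rules above (comma-separated names, an empty filter meaning the main thread only, and an invalid filter meaning no threads at all) can be summarised in a few lines. This is a hypothetical Python model of the behaviour as described here, not SPS's implementation; in particular, treating "invalid" as "matches no registered thread" is an assumption.

```python
def threads_to_profile(filter_text, registered):
    """Model of the Multi-Thread filter as described (hypothetical)."""
    names = [n.strip() for n in filter_text.split(",") if n.strip()]
    if not names:
        # Empty filter: only the main thread, as if Multi-Thread were off.
        return ["GeckoMain"]
    # Only threads registered with the profiler can be sampled.
    # Assumption: a filter that matches no registered thread is "invalid"
    # and results in no threads being profiled.
    return [t for t in registered if t in names]


registered = ["GeckoMain", "Compositor"]
print(threads_to_profile("", registered))                      # ['GeckoMain']
print(threads_to_profile("Compositor, GeckoMain", registered))  # ['GeckoMain', 'Compositor']
print(threads_to_profile("NoSuchThread", registered))          # []
```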
Profiler Features
The profiler supports several features. These are options to gather additional data in your profiles. Each option increases the performance overhead of profiling, so to reduce distortion, activate only the options that provide useful information for your particular problem.
Jank-only
This feature is deprecated. The goal was to only record samples while the browser was not responsive.
Stackwalk
When taking a sample, the profiler will attempt to unwind the stack using platform-specific code appropriate for the ABI. This will provide an accurate call stack for most samples. On ABIs where frame pointers are not available, this will have a significant performance impact.
JS Profiling
JavaScript call stacks will be generated and interleaved with the C++ call stacks. This introduces overhead when running JS.
GC Stats
Embeds GC stats from 'javascript.options.mem.notify' in the profile.
Breakpad
This feature is currently deprecated.
Main Thread IO
This will interpose on file I/O and report it in the profiles.
Multi-Thread
This will sample other threads. This field accepts a comma-separated list of thread names. A thread can only be profiled if it is registered with the profiler.
Power
Use the Intel Power Gadget driver to tag each sample with the power state of the CPU.
GPU
This will insert a timer query during compositing and show the result in the Frames view. This will approximate how much GPU time was spent compositing each frame.
Layers & Texture
The profiler can be used to view the layer tree at each composite, optionally with texture data. This can be used to debug correctness problems.
Viewing the Layer Tree
To view the layer tree, the layers.dump pref must be set to true in the Firefox or B2G program being profiled.
Note: in B2G, layer dumping can also be enabled from the Developer menu in Settings.
In addition, both the compositor thread and the content thread (in the case of B2G, the content thread of whichever app you're interested in) must be profiled. For example, on B2G, when profiling the Homescreen app, you might start the profiler with:
./profile.sh start -p b2g -t Compositor && ./profile.sh start -p Homescreen
Having gotten a profile this way, you can see the layer tree for a composite by clicking on a composite in the "Frames" section of Cleopatra (you may need to select a sub-range of samples to make individual composites large enough to be clicked). This will activate the "LayerTree" tab.
In this screenshot, Composite #143 has been selected. The layer tree structure can be seen in the left panel. It contains, for each layer, the type of the layer, and various metrics about the layer, such as the visible region and any transforms. In the right panel, a visualization of the layer tree (based entirely on the aforementioned metrics) is shown. Hovering over a layer in the left panel highlights the layer in the right panel. This is useful for identifying what content each layer corresponds to. Here, I'm hovering over the last layer in the layer tree (a PaintedLayerComposite), and a strip at the top of the right panel is highlighted, telling me that this layer is for the system notification bar in B2G.
Viewing Textures
Sometimes, it's useful to see not only the structure of the layer tree for each composite, but also the rendered textures for each layer. This can be achieved by additionally setting the layers.dump-texture pref to true, or by adding -f layersdump to the profiler command line (the latter implies both the layers.dump and layers.dump-texture prefs).
Warning: Dumping texture data slows performance considerably, and requires a lot of storage for the profile files. Expect rendering to happen at a significantly reduced frame rate when profiling this way, and keep the duration of the capture short, to ensure the samples of interest aren't overwritten.
Here's how the Layer Tree view looks in Cleopatra with texture data:
This time, the visualization in the right panel shows the actual textures rather than just the outlines of the layers. This can be very useful for debugging correctness problems such as a temporary visual/rendering glitch, because it allows you to find the precise composite that shows the glitch and look at the layer tree for that composite.
Visualizing a layer tree without a profile
If you have a layer dump from somewhere (such as from adb logcat on B2G), you can get Cleopatra to visualize it (just the structure, of course, not textures) without needing a profile. To do so, paste the layer dump into the "Enter your profile data here" text field on the front page of Cleopatra.
The resulting "profile" will have the Layer Tree view enabled (but nothing else). This is useful in cases where you want to gain a quick visual understanding of a layer dump without having to take a profile.
On B2G, each line of a layer dump in adb logcat output is prefixed with something like I/Gecko (30593):. Cleopatra doesn't currently understand this prefix, so it needs to be removed before pasting.
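Removing the prefix is easy to script. A minimal sketch, assuming every prefix has the I/Gecko (PID): shape shown above; lines from other log levels or tags would need a wider pattern. Only the prefix and the single space after it are removed, so the indentation that encodes the layer tree structure is preserved.

```python
import re

# Matches a prefix such as "I/Gecko (30593): " at the start of a line.
# Assumption: only Info-level lines tagged "Gecko" need stripping.
PREFIX = re.compile(r"^I/Gecko\s*\(\s*\d+\):\s?")


def strip_logcat_prefix(text: str) -> str:
    """Remove the logcat prefix from each line, keeping the dump's indentation."""
    return "\n".join(PREFIX.sub("", line) for line in text.splitlines())
```

For example, save the log with adb logcat -d > dump.txt, then paste the result of strip_logcat_prefix(open("dump.txt").read()) into Cleopatra.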
Display List
Dump the display list after each refresh with the texture data. This can be used to debug correctness problems.
Contribute
- Platform tracking bugs can be found in bug 713227. Source is located in tools/profiler.
- The profiler add-on repository can be found here: https://github.com/bgirard/Gecko-Profiler-Addon.
- The Cleopatra repository can be found here: https://github.com/bgirard/cleopatra