Power profiling

This article covers important background information about power profiling, with an emphasis on Intel processors used in desktop and laptop machines. It serves as a starting point for anybody doing power profiling for the first time.

Basic physics concepts

In physics, power is the rate of doing work. It is equivalent to an amount of energy consumed per unit time. In SI units, energy is measured in joules and power is measured in watts, where one watt is one joule per second.

Although power is an instantaneous concept, in practice measurements of it are determined in a non-instantaneous fashion, i.e. by dividing an energy amount by a non-infinitesimal time period. Strictly speaking, such a computation gives the average power but this is often referred to as just the power when context makes it clear.
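
As a minimal sketch of that computation, the following Python divides an energy difference by the time over which it was consumed. The function name and the readings are illustrative assumptions, not output from any particular tool.

```python
def average_power_watts(energy_start_j, energy_end_j, elapsed_s):
    """Average power over an interval, from two cumulative energy readings.

    energy_start_j and energy_end_j are in joules; elapsed_s is in seconds.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (energy_end_j - energy_start_j) / elapsed_s


# Hypothetical readings: 120 J consumed at the first sample and 150 J
# consumed at a second sample taken 2 s later.
print(average_power_watts(120.0, 150.0, 2.0))  # 15.0
```

The shorter the interval, the closer this average gets to the instantaneous power, but the more the act of measuring can itself disturb the result.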

In the context of computing, a fully-charged mobile device battery (as found in a laptop or smartphone) holds a certain amount of energy, and the speed at which that stored energy is depleted depends on the power consumption of the mobile device. That in turn depends on the software running on the device. Web browsers are popular applications and can be power-intensive, and therefore can significantly affect battery life. As a result, it is worth optimizing (i.e. reducing) the power consumption caused by Firefox and Firefox OS.
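
The relationship between stored energy, power consumption and battery life can be sketched as follows. The capacity and power figures are made-up examples, and the model assumes constant power draw and ignores battery ageing and conversion losses.

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 Wh is 1 W sustained for 3600 s


def battery_life_hours(capacity_wh, average_power_w):
    """Estimate how long a full battery lasts at a constant average power."""
    if average_power_w <= 0:
        raise ValueError("average_power_w must be positive")
    energy_j = capacity_wh * JOULES_PER_WATT_HOUR
    seconds = energy_j / average_power_w
    return seconds / 3600.0


# Hypothetical 50 Wh laptop battery.
print(battery_life_hours(50.0, 10.0))  # 5.0
print(battery_life_hours(50.0, 8.0))   # 6.25
```

In this example, cutting the average power from 10 W to 8 W extends the estimated battery life by 1.25 hours.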

Intel processor basics

Processor layout

The following diagram (from the Intel Power Governor documentation) shows how machines using recent Intel processors are constructed.

[Figure: power-planes.jpg, from the Intel Power Governor documentation]

The important points are as follows.

  • The processor has one or more packages. These are part of the actual processor that you buy from Intel. Client processors (e.g. Core i3/i5/i7) have one package. Server processors (e.g. Xeon) typically have two or more packages.

  • Each package contains multiple cores.

  • Each core typically has hyper-threading, which means it contains two logical CPUs.

  • The part of the package outside the cores is called the uncore or system agent. It contains various components, including the L3 cache, the memory controller and, for processors that have one, the integrated GPU.

  • RAM is separate from the processor.
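
The layout above determines how many logical CPUs the operating system sees. The sketch below multiplies the three levels together; the machine configurations are hypothetical, and the helper name is an assumption made for illustration.

```python
import os


def logical_cpu_count(packages, cores_per_package, threads_per_core):
    """Number of logical CPUs implied by a package/core/thread layout."""
    return packages * cores_per_package * threads_per_core


# Hypothetical client machine: 1 package, 4 cores, hyper-threading enabled.
print(logical_cpu_count(1, 4, 2))  # 8

# Hypothetical server machine: 2 packages, 16 cores each, hyper-threading.
print(logical_cpu_count(2, 16, 2))  # 64

# For comparison, Python reports the logical CPU count of the current
# machine (or None if it cannot be determined).
print(os.cpu_count())
```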

C-states

Intel processors have aggressive power-saving features. The first is the ability to switch frequently (thousands of times per second) between active and idle states, and there are actually several different kinds of idle states. These different states are called C-states. C0 is the active/busy state, where instructions are being executed. The other states have higher numbers and reflect increasingly deep idle states. The deeper an idle state is, the less power it uses, but the longer it takes to wake up from.

Note: the ACPI standard specifies four states, C0, C1, C2 and C3. Intel maps these to processor-specific states such as C0, C1, C2, C6 and C7, and many tools report C-states using the latter names. The exact relationship is confusing, and chapter 13 of the Intel optimization manual has more details. The important thing is that C0 is always the active state, and for the idle states a higher number always means less power consumption.

The other thing to note about C-states is that they apply both to cores and the entire package — i.e. if all cores are idle then the entire package can also become idle, which reduces power consumption even further.

The fraction of time that a package or core spends in an idle C-state is called the C-state residency. This is a misleading term — the active state, C0, is also a C-state — but one that is nonetheless common.

Intel processors have model-specific registers (MSRs) containing measurements of how much time is spent in different C-states, and tools such as powermetrics (Mac), powertop and turbostat (Linux) can expose this information.
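
To make the residency definition concrete, here is a small sketch that turns per-state time totals into fractions. The state names and durations are invented sample data, not output from powermetrics, powertop or turbostat, and the sketch does not read MSRs itself.

```python
def cstate_fractions(time_in_state):
    """Fraction of the measured interval spent in each C-state."""
    total = sum(time_in_state.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {state: t / total for state, t in time_in_state.items()}


def idle_residency(time_in_state):
    """Fraction of the interval spent in any idle state (everything but C0)."""
    total = sum(time_in_state.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    active = time_in_state.get("C0", 0.0)
    return (total - active) / total


# Hypothetical time (in ms) that one core spent in each state over 1 s.
sample = {"C0": 50.0, "C1": 100.0, "C6": 850.0}
print(cstate_fractions(sample)["C0"])  # 0.05
print(idle_residency(sample))          # 0.95
```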

A wakeup occurs when a core or package transitions from an idle state to the active state. This happens when the OS schedules a process to run due to some kind of event. Common causes of wakeups include scheduled timers going off and blocked I/O system calls receiving data. Maintaining high C-state residency is crucial to keeping power consumption low, and so reducing wakeup frequency is one of the best ways to reduce power consumption.
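
As a rough illustration of how repeating timers translate into wakeups, the sketch below sums the firing rates of a set of timers. It is an upper bound that assumes no two timers ever fire together (no coalescing), and the intervals are hypothetical.

```python
def wakeups_per_second(timer_intervals_ms):
    """Upper bound on wakeups per second caused by repeating timers.

    Each interval is the period of one repeating timer, in milliseconds.
    Assumes every firing causes its own wakeup.
    """
    for interval in timer_intervals_ms:
        if interval <= 0:
            raise ValueError("timer intervals must be positive")
    return sum(1000.0 / interval for interval in timer_intervals_ms)


# Hypothetical timers firing every 4 ms, 16 ms and 1000 ms.
print(wakeups_per_second([4.0, 16.0, 1000.0]))  # 313.5

# Dropping the 4 ms timer removes most of the wakeups.
print(wakeups_per_second([16.0, 1000.0]))  # 63.5
```

The shortest-period timer dominates the total, which is why removing or slowing a single high-frequency timer can have an outsized effect.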

One consequence of the existence of C-states is that observations made during power profiling can disturb what is being observed, even more than with other kinds of profiling. For example, the Gecko Profiler takes samples at 1000 Hz using a timer. Each of these samples can trigger a wakeup, which consumes power and obscures Firefox’s natural wakeup patterns. For this reason, integrating power measurements into the Gecko Profiler is unlikely to be useful, and other power profiling tools typically use much lower sampling rates (e.g. 1 Hz).

P-states

Intel processors also support multiple P-states. P0 is the state where the processor is operating at maximum frequency and voltage, and higher-numbered P-states operate at a lower frequency and voltage to reduce power consumption. Processors can have dozens of P-states, but the transitions are controlled by the hardware and OS and so P-states are of less interest to application developers than C-states.

Power profiling how-to

This section aims to put together all the above information and provide a set of strategies for finding, diagnosing and fixing cases of high power consumption.

  • First of all, all measurements are best done on a quiet machine that is running little other than the program of interest. Global measurements in particular can be completely skewed and unreliable if this is not the case.

  • Find or confirm a test case where Firefox’s power consumption is high. “High” can most easily be gauged by comparing against other browsers. Use power measurements or estimates (e.g. via tools/power/rapl, or mach power on Mac, or Intel Power Gadget on Windows) for the comparisons. Avoid lower-quality measurements, especially Activity Monitor’s “Energy Impact”.

  • Try using differential profiling to narrow down the cause.

    • Try turning hardware acceleration on or off; e10s on or off; Flash on or off.

    • Try putting the relevant tab in the foreground vs. in the background.

    • If the problem manifests on a particular website, try saving a local copy of the site and then manually removing HTML elements to see if a particular page feature is causing the problem.

  • Many power problems are caused by either high CPU usage or high wakeup frequency. Use one of the low-context tools to determine if this is the case (e.g. on Mac, use powermetrics). If so, follow that up by using a tool that gives high-context measurements, which hopefully will identify the cause of the problem.

    • For high CPU usage, many profilers can be used: Firefox’s dev tools profiler, the Gecko Profiler, or generic performance profilers.

    • For high wakeup counts, use dtrace or perf or TimerFirings logging.

  • On Mac workloads that use graphics, Activity Monitor’s “Energy” tab can tell you if the high-performance GPU is being used, which uses more power than the integrated GPU.

  • If neither CPU usage nor wakeup frequency identifies the problem, more ingenuity may be needed. Looking at other measurements (C-state residency, GPU usage, etc.) may be helpful.

  • Animations are sometimes the cause of high power consumption. The animation inspector in the Firefox Devtools can identify them. Alternatively, here is an explanation of how one developer diagnosed two animation-related problems the hard way (which required genuine platform expertise).

  • The approximate cause of power problems often isn’t that hard to find. Fixing them is often the hard part. Good luck.

  • If you do fix a problem by improving a proxy measurement, you should verify that it also improves a power measurement or estimate. That way you know the fix had a genuine effect.
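
The comparison steps above (against other browsers, between configurations, or before and after a fix) can be sketched as a small helper that summarizes repeated power readings. The readings are invented, and the helper is only an illustration of the bookkeeping, not part of any Firefox tool.

```python
from statistics import mean, stdev


def compare_power(baseline_w, candidate_w):
    """Summarize two sets of repeated power readings, in watts.

    Returns the mean of each set, the spread of each set, and the relative
    change of the candidate with respect to the baseline.
    """
    base_mean = mean(baseline_w)
    cand_mean = mean(candidate_w)
    return {
        "baseline_mean_w": base_mean,
        "candidate_mean_w": cand_mean,
        "baseline_stdev_w": stdev(baseline_w),
        "candidate_stdev_w": stdev(candidate_w),
        "change_percent": 100.0 * (cand_mean - base_mean) / base_mean,
    }


# Hypothetical package power readings taken on a quiet machine,
# before and after a fix.
before = [10.0, 10.5, 9.5, 10.0]
after = [8.0, 8.25, 7.75, 8.0]

summary = compare_power(before, after)
print(summary["change_percent"])  # -20.0
```

If the difference between the means is small compared with the spread of either set, the readings are probably too noisy to support a conclusion, and more runs on a quieter machine are needed.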

Further reading

Chapter 13 of the Intel optimization manual has many details about optimizing for power consumption. Section 13.5 (“Tuning Software for Intelligent Power Consumption”) in particular is worth reading.