Talos

Talos is a cross-platform Python performance testing framework built specifically for Firefox on desktop. New performance tests should be added to the newer framework, mozperftest, unless limitations there (highly unlikely) make it absolutely necessary to add them to Talos. Talos is named after the bronze automaton from Greek myth.

Talos tests are run in a similar manner to xpcshell and mochitests. They are started via the command mach talos-test. A python script then launches Firefox, which runs the tests via JavaScript special powers. The test timing information is recorded in a text log file, e.g. browser_output.txt, and then processed into the JSON format supported by Perfherder.

Talos bugs can be filed in Testing::Talos.

Talos infrastructure is still mostly documented on the Mozilla Wiki. In addition, there are plans to surface all of the individual tests using PerfDocs. This work is tracked in Bug 1674220.

Examples of current Talos runs can be found in Treeherder by searching for “Talos”. If none are immediately available, then scroll to the bottom of the page and load more test runs. The tests all share a group symbol starting with a T, for example T(c d damp g1) or T-gli(webgl).

Running Talos Locally

Running tests locally is most likely only useful for debugging what is going on in a test, as the test output is only reported as raw JSON. The CLI is documented via:

./mach talos-test --help

To quickly try out the ./mach talos-test command, the following can be run to do a single run of the DevTools’ simple netmonitor test.

# Run the "simple.netmonitor" test very quickly with 1 cycle, and 1 page cycle.
./mach talos-test --activeTests damp --subtests simple.netmonitor --cycles 1 --tppagecycles 1

The --print-suites and --print-tests flags are helpful for figuring out which suites and tests are available to run.

# Print out the suites:
./mach talos-test --print-suites

# Available suites:
#  bcv                          (basic_compositor_video)
#  chromez                      (about_preferences_basic:tresize)
#  dromaeojs                    (dromaeo_css:kraken)
#  flex                         (tart_flex:ts_paint_flex)
# ...

# Run all of the tests in the "bcv" test suite:
./mach talos-test --suite bcv

# Print out the tests:
./mach talos-test --print-tests

# Available tests:
# ================
#
# a11yr
# -----
# This test ensures basic a11y tables and permutations do not cause
# performance regressions.
#
# ...

# Run the tests in "a11yr" listed above
./mach talos-test --activeTests a11yr

Running Talos on Try

Talos runs can be generated through the mach try fuzzy finder:

./mach try fuzzy

The following is an example output at the time of this writing. Refine the query for the platform and test suites of your choosing.

| test-windows10-64-qr/opt-talos-bcv-swr-e10s
| test-linux64-shippable/opt-talos-webgl-e10s
| test-linux64-shippable/opt-talos-other-e10s
| test-linux64-shippable-qr/opt-talos-g5-e10s
| test-linux64-shippable-qr/opt-talos-g4-e10s
| test-linux64-shippable-qr/opt-talos-g3-e10s
| test-linux64-shippable-qr/opt-talos-g1-e10s
| test-windows10-64/opt-talos-webgl-gli-e10s
| test-linux64-shippable/opt-talos-tp5o-e10s
| test-linux64-shippable/opt-talos-svgr-e10s
| test-linux64-shippable/opt-talos-flex-e10s
| test-linux64-shippable/opt-talos-damp-e10s
> test-windows7-32/opt-talos-webgl-gli-e10s
| test-linux64-shippable/opt-talos-bcv-e10s
| test-linux64-shippable/opt-talos-g5-e10s
| test-linux64-shippable/opt-talos-g4-e10s
| test-linux64-shippable/opt-talos-g3-e10s
| test-linux64-shippable/opt-talos-g1-e10s
| test-linux64-qr/opt-talos-bcv-swr-e10s

  For more shortcuts, see mach help try fuzzy and man fzf
  select: <tab>, accept: <enter>, cancel: <ctrl-c>, select-all: <ctrl-a>, cursor-up: <up>, cursor-down: <down>
  1379/2967
> talos

At a glance

Test lifecycle

  • Taskcluster schedules talos jobs

  • Taskcluster runs a Talos job on a hardware machine when one is available - this is bootstrapped by mozharness

  • Treeherder displays a green (all OK) status and has a link to Perfherder

  • 13 pushes later, analyze_talos.py is run; it compares your push to the previous 12 pushes and the next 12 pushes to look for a regression

Test types

There are two different species of Talos tests:

  • Startup: Start the browser, wait for either the load event or the paint event, then exit, measuring the elapsed time

  • Page load: Load a manifest of pages

In addition we have some variations on existing tests:

  • Heavy: Run tests with the heavy user profile instead of a blank one

  • Web extension: Run tests with a web extension installed to see the performance impact extensions have

  • Real-world WebExtensions: Run tests with a set of 5 popular real-world WebExtensions installed and enabled.

Some tests measure different things:

  • Paint: These measure browser events such as MozAfterPaint

  • ASAP: These tests go really fast and typically measure how many frames we can render in a time window

  • Benchmarks: These are benchmarks that measure specific items and report a summarized score

Startup

Startup tests launch Firefox and measure the time to the onload or paint events. We run these in a series of cycles (20 by default) to generate a full set of data. The tests that are currently startup tests are:

Page load

Many of the Talos tests use the page loader to load a manifest of pages. These tests load a specific page and measure the time it takes to load the page, scroll the page, draw the page, etc. In order to run a page load test, you need a manifest of pages to run. The manifest is simply a list of page URLs, one per line, e.g.:

https://www.mozilla.org
https://www.mozilla.com

Example: svgx.manifest

Manifests may also specify that a test computes its own data by prefixing the line with a %:

% https://www.mozilla.org
% https://www.mozilla.com

Example: v8.manifest
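
To make the format concrete, here is a minimal, hypothetical sketch of how such a manifest could be parsed. The function name and the exact handling of the % flag are illustrative assumptions, not the actual pageloader implementation:

# Hypothetical sketch of reading a Talos-style page manifest; the real
# pageloader's parsing logic may differ.
def parse_manifest(path):
    """Yield (url, computes_own_data) pairs from a manifest file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # A leading % marks a page that computes its own data.
            computes_own_data = line.startswith("%")
            yield line.lstrip("% "), computes_own_data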

The file you created should be referenced in your test's config inside test.py. For example, open test.py and look for the lines referring to the test you want to run:

tpmanifest = '${talos}/page_load_test/svgx/svgx.manifest'
tpcycles = 1 # run a single cycle
tppagecycles = 25 # load each page 25 times before moving onto the next page

Heavy

All of our testing is done with empty, blank profiles, which is not ideal for finding issues that affect end users. We recently undertook a task to create a profile that is updated daily so that it stays modern and relevant. It browses a variety of web pages and accumulates history and cache, giving us a more realistic scenario.

The toolchain is documented on github and was added to Talos in bug 1407398.

Currently we have issues with this on Windows (it takes too long to unpack the files from the profile), so we have turned this off there. Our goal is to run this on the basic pageload and startup tests.

Web extension

Firefox has switched to WebExtensions, which use different code paths and APIs than legacy add-ons. Historically we don't test with add-ons (other than our test add-ons) and so miss out on common slowdowns. In 2017 we started running some startup and basic pageload tests with a web extension in the profile (bug 1398974). We have updated the extension to be more real-world and will continue to do so.

Real-world WebExtensions

We’ve added a variation on our test suite that automatically downloads, installs and enables 5 popular WebExtensions. This is used to measure things like the impact of real-world WebExtensions on start-up time.

Currently, the following extensions are installed:

  • Adblock Plus (3.5.2)

  • Cisco Webex Extension (1.4.0)

  • Easy Screenshot (3.67)

  • NoScript (10.6.3)

  • Video DownloadHelper (7.3.6)

Note that these add-ons and versions are “pinned” by being held in a compressed file that’s hosted in an archive by our test infrastructure and downloaded at test runtime. To update the add-ons in this set, one must provide a new ZIP file to someone on the test automation team. See this comment in Bugzilla.

Paint

Paint tests measure the time until both the MozAfterPaint and OnLoad events have been received, instead of just the OnLoad event. Most tests now look for this unless they are an ASAP test or an internal benchmark.

ASAP

We have a variety of tests which we now run in ASAP mode, where we render as fast as possible by disabling vsync and letting rendering iterate as fast as it can using requestAnimationFrame. Some of the original tests have been replaced with ‘x’ versions that measure in this mode.

ASAP tests are:

Benchmarks

Many tests have internal benchmarks which we report as accurately as possible. These are the exceptions to the general rule of calculating the suite score as a geometric mean of the subtest values (which are median values of the raw data from the subtests).
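
As an illustration of that general rule (to which benchmarks are the exception), here is a minimal sketch, assuming positive subtest values; this is not the actual Perfherder code:

import math
import statistics

# Sketch: each subtest is summarized as the median of its raw
# replicates, and the suite score is the geometric mean of those
# medians. Illustration only, not the Perfherder implementation.
def suite_score(subtest_replicates):
    """subtest_replicates: dict of subtest name -> list of raw values."""
    medians = [statistics.median(vals) for vals in subtest_replicates.values()]
    return math.exp(sum(math.log(m) for m in medians) / len(medians))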

Tests which are imported benchmarks are:

Row major vs. column major

To get more stable numbers, tests are run multiple times. There are two ways that we do this: row major and column major. Row major means each test is run multiple times before we move on to the next test (which is in turn run multiple times). Column major means each test is run once, one after the other, and then the whole sequence of tests is run again.
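
The difference is easiest to see as two loop orders. A minimal sketch, where run_test and the test names are hypothetical:

tests = ["ts_paint", "tart", "damp"]
cycles = 3

def row_major(run_test):
    # Finish all repeats of one test before moving to the next.
    for test in tests:
        for _ in range(cycles):
            run_test(test)

def column_major(run_test):
    # Run the whole sequence once, then repeat the sequence.
    for _ in range(cycles):
        for test in tests:
            run_test(test)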

More background information about these approaches can be found in Joel Maher’s Reducing the Noise in Talos blog post.

Page sets

We run our tests 100% offline, but serve pages via a local webserver. This means we need to store the offline pages we use for testing and make them available to the test machines.

tp5pages

Some tests make use of a set of 50 “real world” pages, known as the tp5n set. These pages are not part of the talos repository, but without them the tests which use them won’t run.

  • To add these pages to your local setup, download the latest tp5n zip from tooltool, and extract it such that tp5n ends up as testing/talos/talos/tests/tp5n. You can also obtain it by running a Talos test locally, which fetches the zip into testing/talos/talos/tests/, e.g. ./mach talos-test --suite damp

  • See also tp5o.

Talos Tests

Talos test lists

Extra Talos Tests

about_newtab_with_snippets

Note: add test details

File IO

File IO is tested using the tp5 test set in the xperf test.

Possible regression causes

  • nonmain_startup_fileio opt (with or without e10s) on windows7-32 (bug 1274018): This test seems to consistently report a higher result for mozilla-central compared to Try even for an identical revision, due to extension signing checks. In other words, if you are comparing Try and mozilla-central you may see a false-positive regression on Perfherder. Graphs: non-e10s, e10s

Xres (X Resource Monitoring)

A memory metric tracked during tp5 test runs, sampled every 20 seconds. This metric is collected on Linux only.

See the xres man page.

% CPU

CPU usage tracked during tp5 test runs, sampled every 20 seconds. This metric is collected on Windows only.
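
As an illustration of this kind of periodic sampling (not the actual Talos collector), a minimal sketch using the third-party psutil package:

import time
import psutil

# Illustrative 20-second sampler; the real Talos collectors use
# platform-specific mechanisms rather than psutil.
def sample_cpu(duration_s, interval_s=20):
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        # System-wide CPU utilization since the previous call.
        samples.append(psutil.cpu_percent(interval=None))
        time.sleep(interval_s)
    return samples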

Responsiveness

contact: :jimm, :overholt

Measures the delay for the event loop to process a tracer event. For more details, see bug 631571.

The score on this benchmark is proportional to the sum of squares of all event delays that exceed a 20ms threshold. Lower is better.

We collect 8000+ data points from the browser during the test and apply this formula to the results:

return sum([float(x)*float(x) / 1000000.0 for x in val_list])
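
Putting the description together, a self-contained sketch; whether the 20 ms threshold is applied by filtering the samples (as assumed here) or earlier during collection is not shown in the snippet above:

# Sketch of the responsiveness score described above. The threshold
# filtering step is an assumption based on the prose; the actual
# harness may apply it at collection time instead.
def responsiveness_score(delays_ms, threshold_ms=20.0):
    val_list = [d for d in delays_ms if d > threshold_ms]
    return sum([float(x) * float(x) / 1000000.0 for x in val_list])

# Lower is better, e.g.:
# responsiveness_score([5, 25, 40]) == (625 + 1600) / 1000000.0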

tpaint

Warning

This test no longer exists

Talos test name: tpaint

Description: twinopen, but measuring the time after we receive both the MozAfterPaint and OnLoad events.

Tests the amount of time it takes to open a new window. This test does not include startup time. Multiple test windows are opened in succession; the reported results are the average amount of time required to create and display a window in the running instance of the browser. (Measures ctrl-n performance.)

Example Data

[209.219, 222.180, 225.299, 225.970, 228.090, 229.450, 230.625, 236.315, 239.804, 242.795, 244.5, 244.770, 250.524, 251.785, 253.074, 255.349, 264.729, 266.014, 269.399, 326.190]

Possible regression causes

  • None listed yet. If you fix a regression for this test and have some tips to share, this is a good place for them.

xperf

These tests only run on Windows builds. See this active-data query for an up-to-date set of platforms that xperf runs on. If the query is not found, use the following on the query page:

{
    "from":"task",
    "groupby":["run.name","build.platform"],
    "limit":2000,
    "where":{"regex":{"run.name":".*xperf.*"}}
}

Talos will turn orange for ‘x’ jobs on Windows 7 if your changeset accesses files during startup which are not predefined in the allowlist; specifically, before the “sessionstore-windows-restored” Firefox event. If your job turns orange, you will see a list of files in Treeherder (or in the log file) which were accessed unexpectedly, similar to this:

TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0
TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768
TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0
TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768

If these files are expected to be accessed during startup by your changeset, then we can add them to the allowlist.
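
Conceptually, the check is a set difference between the files accessed before “sessionstore-windows-restored” and the allowlist. A hedged sketch; the names and the pattern matching are illustrative, and the real xperf harness may match entries differently:

import fnmatch

# Illustrative allowlist check, not the actual xperf harness.
def unexpected_files(accessed_files, allowlist_patterns):
    return [
        f for f in accessed_files
        if not any(fnmatch.fnmatch(f, pat) for pat in allowlist_patterns)
    ]

# Files returned here would be the ones reported as TEST-UNEXPECTED-FAIL.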

Xperf runs tp5 while collecting xperf metrics for disk IO and network IO. The providers we listen for are:

The values we collect during stackwalk are:

Notes:

Build metrics

These are not part of the Talos code, but like Talos they are benchmarks that record data using the graphserver and are analyzed by the same scripts for regressions.

Number of constructors (num_ctors)

This test runs at build time and measures the number of static initializers in the compiled code. Reducing this number is helpful for startup optimizations.
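
On ELF builds, one way to approximate this count is to divide the size of the binary's .init_array section by the pointer size. A hedged sketch; this is an illustration, not necessarily how num_ctors is actually computed:

import subprocess

# Approximate the number of static initializers in an ELF binary by
# counting pointer-sized entries in .init_array. Illustration only.
def count_static_initializers(binary, pointer_size=8):
    out = subprocess.check_output(["readelf", "-SW", binary], text=True)
    for line in out.splitlines():
        fields = line.split()
        if ".init_array" in fields:
            # readelf -SW columns after the name: Type, Address, Off, Size.
            size_hex = fields[fields.index(".init_array") + 4]
            return int(size_hex, 16) // pointer_size
    return 0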

Platform microbenchmark

IsASCII and IsUTF8 gtest microbenchmarks

Tests whose names start with PerfIsASCII test the performance of the XPCOM string IsASCII function with ASCII inputs of different lengths.

Tests whose names start with PerfIsUTF8 test the performance of the XPCOM string IsUTF8 function with ASCII inputs of different lengths.

Possible regression causes

  • The --enable-rust-simd option accidentally getting turned off in automation.

  • Changes to encoding_rs internals.

  • LLVM optimizations regressing between updates to the copy of LLVM included in the Rust compiler.

Microbench

  • contact: :bholley

  • source: MozGTestBench.cpp

  • type: Custom GTest micro-benchmarking

  • data: Time taken for a GTest function to execute

  • summarization: Not a Talos test. This suite provides a way to add low-level platform performance regression tests for things that are not suited to being tested by Talos.

PerfStrip Tests

PerfStripWhitespace - call StripWhitespace() on 5 different test cases, 20k times each

PerfStripCharsWhitespace - call StripChars("\f\t\r\n") on 5 different test cases, 20k times each

PerfStripCRLF - call StripCRLF() on 5 different test cases, 20k times each

PerfStripCharsCRLF - call StripChars("\r\n") on 5 different test cases, 20k times each

Stylo gtest microbenchmarks

  • contact: :bholley, :SimonSapin

  • source: gtest

  • type: Microbench

  • reporting: intervals in ms (lower is better)

  • data: each test is run and measured 5 times

  • summarization: take the median of the 5 data points; source: MozGTestBench.cpp

Servo_StyleSheet_FromUTF8Bytes_Bench parses a sample stylesheet 20 times with Stylo’s CSS parser that is written in Rust. It starts from an in-memory UTF-8 string, so that I/O or UTF-16-to-UTF-8 conversion is not measured.

Gecko_nsCSSParser_ParseSheet_Bench does the same with Gecko’s previous CSS parser that is written in C++, for comparison.

Servo_DeclarationBlock_SetPropertyById_Bench parses the string “10px” with Stylo’s CSS parser and sets it as the value of a property in a declaration block, a million times. This is similar to animations that are based on JavaScript code modifying Element.style instead of using CSS @keyframes.

Servo_DeclarationBlock_SetPropertyById_WithInitialSpace_Bench is the same, but with the string “ 10px”, which has an initial space. That initial space is less typical of JS animations, but is almost always there in stylesheets or full declarations like “width: 10px”. This microbenchmark was used to test the effect of some specific code changes. Regressions here may be acceptable if Servo_StyleSheet_FromUTF8Bytes_Bench is not affected.

History of tp tests

tp

The original tp test created by Mozilla to test browser page load time. Cycled through 40 pages. The pages were copied from the live web during November, 2000. Pages were cycled by loading them within the main browser window from a script that lived in content.

tp2/tp_js

The same tp test but loading the individual pages into a frame instead of the main browser window. Still used the old 40 page, year 2000 web page test set.

tp3

An update to both the page set and the method by which pages are cycled. The page set is now 393 pages from December, 2006. The pageloader is re-built as an extension that is pre-loaded into the browser chrome/components directories.

tp4

Updated web page test set to 100 pages from February 2009.

tp4m

This is a smaller pageset (21 pages) designed for mobile Firefox. This is a blend of regular and mobile friendly pages.

We landed on this on April 18th, 2011 in bug 648307. This runs for Android and Maemo mobile builds only.

tp5

Updated web page test set to 100 pages from April 8th, 2011. Effort was made for the pages to no longer be splash screens/login pages/home pages but to be pages that better reflect the actual content of the site in question. There are two test page data sets for tp5 which are used in multiple tests (i.e. awsy, xperf, etc.): (i) an optimized data set called tp5o, and (ii) the standard data set called tp5n.

tp6

Created June 2017 with pages recorded via mitmproxy from modern Google, Amazon, YouTube, and Facebook. Ideally this will contain more realistic user accounts that have full content; in addition, we would have more than 4 sites, up to the top 10 or maybe top 20.

These were migrated to Raptor between 2018 and 2019.