author | Allan Sandfeld Jensen <allan.jensen@qt.io> | 2020-10-12 14:27:29 +0200 |
---|---|---|
committer | Allan Sandfeld Jensen <allan.jensen@qt.io> | 2020-10-13 09:35:20 +0000 |
commit | c30a6232df03e1efbd9f3b226777b07e087a1122 (patch) | |
tree | e992f45784689f373bcc38d1b79a239ebe17ee23 /chromium/docs/speed | |
parent | 7b5b123ac58f58ffde0f4f6e488bcd09aa4decd3 (diff) | |
download | qtwebengine-chromium-85-based.tar.gz | |
BASELINE: Update Chromium to 85.0.4183.140
Change-Id: Iaa42f4680837c57725b1344f108c0196741f6057
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
Diffstat (limited to 'chromium/docs/speed')
3 files changed, 150 insertions, 10 deletions
diff --git a/chromium/docs/speed/bot_health_sheriffing/how_to_disable_a_story.md b/chromium/docs/speed/bot_health_sheriffing/how_to_disable_a_story.md
index 496391e9c78..3eb10fc993d 100644
--- a/chromium/docs/speed/bot_health_sheriffing/how_to_disable_a_story.md
+++ b/chromium/docs/speed/bot_health_sheriffing/how_to_disable_a_story.md
@@ -57,10 +57,10 @@ whereas an entry disabling a benchmark on an entire platform might look like:
 Once you've committed your changes locally, your CL can be submitted with:
-- `No-Try:True`
-- `Tbr:`someone from [`tools/perf/OWNERS`](https://cs.chromium.org/chromium/src/tools/perf/OWNERS?q=tools/perf/owners&sq=package:chromium&dr)
-- `CC:`benchmark owner found in [this spreadsheet](https://docs.google.com/spreadsheets/u/1/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
-- `Bug:`tracking bug
+- `No-Try: True`
+- `Tbr: `someone from [`tools/perf/OWNERS`](https://cs.chromium.org/chromium/src/tools/perf/OWNERS?q=tools/perf/owners&sq=package:chromium&dr)
+- `CC: `benchmark owner found in [this spreadsheet](https://docs.google.com/spreadsheets/u/1/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+- `Bug: `tracking bug
 *Please make sure to CC the benchmark owner so that they're aware that they've lost coverage.*
diff --git a/chromium/docs/speed/good_toplevel_metrics.md b/chromium/docs/speed/good_toplevel_metrics.md
index 4bf6a249da5..111c9e70b24 100644
--- a/chromium/docs/speed/good_toplevel_metrics.md
+++ b/chromium/docs/speed/good_toplevel_metrics.md
@@ -33,6 +33,13 @@ To initially evaluate accuracy of a quality of experience metric, we rely heavil
 * Use the metric implementation to sort the samples.
 * Use [Spearman's rank-order correlation](https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php) to evaluate how similar the metric implementation is to the hand ordering.
+## Incentivizes the right optimizations
+
+Ideally developers optimize their sites' performance on metrics by improving the user experience.
+But if developers can easily improve their performance on a metric without improving the actual user experience, the metric does not incentivize the right things.
+
+For example, if we use the onload event as the time at which we consider a web page to be fully loaded, developers will shift work after the onload event to improve their page load time. In many cases, this is the right thing to do. But since the onload event doesn't correspond to any real user-visible milestone in loading the page, it's easy to just keep shifting work after it, until eventually the entire page is loaded after onload. So instead we work to write metrics that capture user experience in a way that it's clearer to developers how they should optimize.
+
 ## Stable
 A metric is stable if the result doesn’t vary much between successive runs on similar input. This can be quantitatively evaluated, ideally using Chrome Trace Processor and cluster telemetry on the top 10k sites. Eventually we hope to have a concrete threshold for a specific spread metric here, but for now, we gather the stability data, and analyze it by hand.
@@ -63,9 +70,17 @@ This is frequently at odds with the interpretability requirement. For example, F
 If your metric involves thresholds (such as the 50ms task length threshold in TTI), or heuristics (looking at the largest jump in the number of layout objects in FMP), it’s likely to be non-elastic.
-## Realtime
+## Performant to compute
+
+If a metric is to be made available in a real-user monitoring context, it must be performant enough to compute that computing the metric does not slow down the user's browsing experience. Some metrics, like [Speed Index](https://web.dev/speed-index/), are very difficult to compute quickly enough for real-user monitoring.
+
+## Immediate
+
+Ideally we would know the metric's value *at the time it occurred*. For example, as soon as there is a contentful paint, we know that First Contentful Paint has occurred. But when the largest image or text paints to the screen, while we know it is the Largest Contentful Paint *so far*, we do not know if there will be another, larger contentful paint later on. So we can't know the value of Largest Contentful Paint until an input, scroll, or page unload.
+
+This isn’t always attainable, but when possible, it avoids some classes of [survivorship bias](https://en.wikipedia.org/wiki/Survivorship_bias), which makes metrics easier to analyze.
-We’d like to have metrics which we can compute in realtime. For example, if we’re measuring First Meaningful Paint, we’d like to know when First Meaningful Paint occurred *at the time it occurred*. This isn’t always attainable, but when possible, it avoids some classes of [survivorship bias](https://en.wikipedia.org/wiki/Survivorship_bias), which makes metrics easier to analyze.
+It also makes it easier for developers to reason about simple things like when to send a beacon to analytics, and more complex things like deferring work until after a metric representing a major milestone, like the main content being displayed.
 ## Orthogonal
@@ -90,18 +105,22 @@ We'd like to have metrics that correlate well in the wild and in the lab, so tha
 * Summary [here](https://docs.google.com/document/d/1GGiI9-7KeY3TPqS3YT271upUVimo-XiL5mwWorDUD4c/edit#heading=h.iqlwzaf6lqrh), analysis [here](https://docs.google.com/document/d/1pZsTKqcBUb1pc49J89QbZDisCmHLpMyUqElOwYqTpSI/edit#bookmark=id.4euqu19nka18). Overall, based on manual investigation of 25 sites, our approach fired uncontroversially at the right time 64% of the time, and possibly too late the other 36% of time. We split TTI in two to allow this metric to be quite pessimistic about when TTI fires, so we’re happy with when this fires for all 25 sites. A few issues with this research:
   * Ideally someone less familiar with our approach would have performed the evaluation.
   * Ideally we’d have looked at more than 25 sites.
+* Incentivizes the right optimizations
+  * In real-user monitoring, Time To Interactive often isn't fully measured before the page is loaded. If users leave the page before TTI is complete, the value isn't tracked. This means that sites could accidentally improve the metric if the slowest users leave the page earlier. This doesn't incentivize the right thing, which is part of the reason we recommend [First Input Delay](https://web.dev/fid) for real-user monitoring instead.
 * Stable
   * Analysis [here](https://docs.google.com/document/d/1GGiI9-7KeY3TPqS3YT271upUVimo-XiL5mwWorDUD4c/edit#heading=h.27s41u6tkfzj).
 * Interpretable
   * Time to Interactive is easy to explain. We report the first 5 second window where the network is roughly idle and no tasks are greater than 50ms long.
 * Elastic
-  * Time to Interactive is generally non-elastic. We’re investigating another metric which will quantify how busy the main thread is between FMP and TTI, which should be a nice elastic proxy metric for TTI.
+  * Time to Interactive is generally non-elastic. This is the reason we now recommend Total Blocking Time (TBT) for lab monitoring. Analysis [here](https://docs.google.com/document/d/1xCERB_X7PiP5RAZDwyIkODnIXoBk-Oo7Mi9266aEdGg/edit).
 * Simple
   * Time To Interactive has a reasonable amount of complexity, but is much simpler than Time to First Interactive. Time to Interactive has 3 parameters:
     * Number of allowable requests during network idle (currently 2).
     * Length of allowable tasks during main thread idle (currently 50ms).
     * Window length (currently 5 seconds).
-* Realtime
-  * Time To Interactive is definitely not realtime, as it needs to wait until it’s seen 5 seconds of idle time before declaring that we became interactive at the start of the 5 second window.
+* Immediate
+  * Time To Interactive is definitely not immediate, as it needs to wait until it’s seen 5 seconds of idle time before declaring that we became interactive at the start of the 5 second window. First Input Delay is an immediate alternative.
+* Performant to Compute
+  * Time To Interactive is performant enough in Chrome that it can be used for real-user monitoring, but we recommend [First Input Delay](https://web.dev/fid) due to its issues with incentivizing the right optimizations and elasticity.
 * Orthogonal
-  * Time to Interactive aims to represent interactivity during page load, which is also what [First Input Delay](https://web.dev/fid/) aims to represent. The reason is that we haven't found a way to accurately represent this across the lab (TTI) and wild (FID) with a single metric.
+  * Time to Interactive aims to represent interactivity during page load, which is also what [First Input Delay](https://web.dev/fid/) aims to represent. The reason is that we haven't found a way to accurately represent this across the lab (TBT/TTI) and wild (FID) with a single metric.
diff --git a/chromium/docs/speed/graphics_metrics_definitions.md b/chromium/docs/speed/graphics_metrics_definitions.md
new file mode 100644
index 00000000000..b23e5774a5d
--- /dev/null
+++ b/chromium/docs/speed/graphics_metrics_definitions.md
@@ -0,0 +1,121 @@
+# Graphics metrics: Definitions
+
+We need to have a metric to understand the smoothness of a particular
+interaction (e.g. scroll, animation, etc.). We also need to understand the
+latency of such interactions (e.g. touch-on-screen to
+scroll-displayed-on-screen), and the throughput during the interaction.
+
+[TOC]
+
+We define these metrics as follows:
+
+## Responsiveness / Latency
+
+Responsiveness is a measure of how quickly the web-page responds to an event.
+Latency is defined as the time between when an event happens (e.g. moving a
+touch-point on screen) and when the screen is updated directly in response to
+that event [1]. For example, the event could be a moving touch-point on the
+touchscreen, and the update would be scrolled content in response to that
+(may only require the compositor frame update). If a rAF callback was
+registered, the event would be the one that caused the current script execution
+(e.g. a click event which triggered rAF), and the update would be the displayed
+frame after the rAF callback is run and the content from the main-thread has
+been presented on screen.
+
+## Throughput
+
+The ratio between the number of times the screen is updated for a particular
+interaction (e.g. scroll, animation, etc.), and the number of times the screen
+was expected to be updated (see examples below). On a 60Hz display, there would
+ideally be 60 frames produced during a scroll or an animation.
+
+## DroppedFrames / SkippedFrames
+
+The ratio between the number of dropped/skipped frames for a particular
+interaction, and the number of times the screen was expected to be updated.
+This is the complement of Throughput, so it is a "lower-is-better" metric and
+works better with our current perf tools.
+
+## Smoothness / Jank
+
+Smoothness is a measure of how consistent the throughput is. Jank during an
+interaction is defined as a change in the throughput for consecutive frames.
+To explain this further:
+
+Consider the following presented frames:
+
+**f1**
+**f2**
+**f3**
+**f4**
+**f5**
+**f6**
+**f7**
+**f8**
+**f9**
+
+Each highlighted **fn** indicates a frame that contained a response from the
+renderer [2]. So in the above example, there were no janks, and throughput was
+100%: i.e. all the presented frames included updated content.
+
+Consider the following frames:
+
+**f1**
+**f2**
+f3
+**f4**
+f5
+**f6**
+**f7**
+**f8**
+**f9**
+
+In this case, frames `f3` and `f5` did not include any updates (either the
+display compositor was unable to submit a new frame, or the frame it submitted
+did not include any updates from the renderer). Therefore, the throughput
+during this interaction is 78%.
+
+To explain the jank, during the first two frames `[f1, f2]`, the throughput is
+100%. Because of the subsequently missed frame `f3`, the throughput for
+`[f2, f4]` drops to 67%. The throughput for `[f4, f6]` is also 67%. For
+subsequent frames, the throughput goes back up to 100%. Therefore, there was a
+single jank.
+
+Consider the following two sequences:
+
+**f1**
+**f2**
+**f3**
+**f4**
+f5
+f6
+f7
+f8
+**f9**
+
+**f1**
+f2
+**f3**
+f4
+**f5**
+f6
+**f7**
+f8
+**f9**
+
+In both cases, throughput is 55%, since only 5 out of 9 frames are displayed.
+In the first sequence, there is a jank (`[f1, f2]`, `[f2, f3]`, and `[f3, f4]`
+each have 100% throughput, but `[f4, f9]` has a throughput of 33%). However, in
+the second sequence, there are no janks, since `[f1, f3]` `[f3, f5]` `[f5, f7]`
+`[f7, f9]` all have 67% throughput.
+
+[1]: Indirect updates in response to an event, e.g. an update from a
+setTimeout() callback registered by an event-handler, would not be associated
+with that event.
+
+[2]: Note that the response could be either an update to the content, or a
+notification that no update is expected for that frame. For example, for a 30fps
+animation in this frame-sequence, only frames `f1` `f3` `f5` `f7` `f9` will have
+actual updates from the animation, and frames `f2` `f4` `f6` `f8` should still
+have notification from the client that no update is expected.
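
As a worked illustration of the accuracy evaluation described in `good_toplevel_metrics.md` above (hand-order the filmstrips, order them again with the candidate metric, then compare the two orderings), here is a minimal Python sketch. The site names and metric values are hypothetical, and using `scipy.stats.spearmanr` is just one convenient way to compute the rank-order correlation; this is not part of the Chromium tooling.

```python
# Minimal sketch: compare a hand ordering of filmstrips against the ordering
# implied by a candidate metric using Spearman's rank-order correlation.
# Site names and metric values below are hypothetical.
from scipy.stats import spearmanr

# Filmstrips hand-sorted from best to worst experience (rank 1 = best).
hand_order = ["site_c", "site_a", "site_d", "site_b"]

# Values the candidate metric assigned to the same filmstrips
# (lower = better, e.g. milliseconds until some loading milestone).
metric_values = {"site_a": 1800, "site_b": 4100, "site_c": 1200, "site_d": 2600}

hand_ranks = list(range(1, len(hand_order) + 1))
metric_in_hand_order = [metric_values[site] for site in hand_order]

# spearmanr ranks both inputs internally; rho == 1.0 means the metric
# reproduces the hand ordering exactly, values near 0 mean little agreement.
rho, p_value = spearmanr(hand_ranks, metric_in_hand_order)
print(f"Spearman rho = {rho:.2f}")
```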
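
The throughput, dropped-frame, and jank examples in the new `graphics_metrics_definitions.md` can also be reproduced with a small script. The sketch below is only one reading of those definitions (in particular, it counts a jank only when the windowed throughput between consecutive presented frames drops); it is not Chromium's implementation, and the function names are illustrative.

```python
# Sketch: throughput, dropped frames, and jank counts for a frame sequence,
# following the definitions and worked examples above. Frames are numbered
# 1..expected; `presented` lists the frames that contained a renderer response.

def throughput(presented, expected):
    """Fraction of expected frames that were actually presented."""
    return len(presented) / expected

def dropped_frames(presented, expected):
    """Lower-is-better complement of throughput."""
    return 1 - throughput(presented, expected)

def jank_count(presented):
    """Count drops in windowed throughput between consecutive presented frames.

    The window between consecutive presented frames f_i and f_j spans
    j - i + 1 vsyncs and contains 2 presented frames, so its throughput is
    2 / (j - i + 1). A jank is recorded whenever that value drops relative to
    the previous window; a return to higher throughput is not a jank.
    """
    windows = [2 / (b - a + 1) for a, b in zip(presented, presented[1:])]
    return sum(1 for prev, cur in zip(windows, windows[1:]) if cur < prev)

# Second example above: f3 and f5 missing -> ~78% throughput, a single jank.
seq = [1, 2, 4, 6, 7, 8, 9]
print(round(100 * throughput(seq, 9)), jank_count(seq))          # 78 1

# Final example: both sequences sit at ~55% throughput, but only the first
# one janks (100% windows followed by a 33% window).
print(jank_count([1, 2, 3, 4, 9]), jank_count([1, 3, 5, 7, 9]))  # 1 0
```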