Diffstat (limited to 'chromium/docs/testing')
-rw-r--r--chromium/docs/testing/android_test_instructions.md4
-rw-r--r--chromium/docs/testing/rendering_representative_perf_tests.md79
-rw-r--r--chromium/docs/testing/web_tests.md57
-rw-r--r--chromium/docs/testing/web_tests_in_content_shell.md13
4 files changed, 100 insertions, 53 deletions
diff --git a/chromium/docs/testing/android_test_instructions.md b/chromium/docs/testing/android_test_instructions.md
index ec2786c0c04..6a51945bb9e 100644
--- a/chromium/docs/testing/android_test_instructions.md
+++ b/chromium/docs/testing/android_test_instructions.md
@@ -172,10 +172,10 @@ For example, adding a test to `chrome_junit_tests` requires to update
ninja -C out/Default chrome_junit_tests
# Run the test suite.
-out/Default/run_chrome_junit_tests
+out/Default/bin/run_chrome_junit_tests
# Run a subset of tests. You might need to pass the package name for some tests.
-out/Default/run_chrome_junit_tests -f "org.chromium.chrome.browser.media.*"
+out/Default/bin/run_chrome_junit_tests -f "org.chromium.chrome.browser.media.*"
```
### Debugging
diff --git a/chromium/docs/testing/rendering_representative_perf_tests.md b/chromium/docs/testing/rendering_representative_perf_tests.md
index c4b8c5a4b9f..da3600b0370 100644
--- a/chromium/docs/testing/rendering_representative_perf_tests.md
+++ b/chromium/docs/testing/rendering_representative_perf_tests.md
@@ -1,10 +1,14 @@
# Representative Performance Tests for Rendering Benchmark
-`rendering_representative_perf_tests` runs a sub set of stories from rendering
+`rendering_representative_perf_tests` run a subset of stories from rendering
benchmark on CQ, to prevent performance regressions. For each platform there is
a `story_tag` which describes the representative stories used in this test.
-These stories will be tested using the [`run_benchmark`](../../tools/perf/run_benchmark) script. Then the recorded values for `frame_times` will be
-compared with the historical upper limit described in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`](../../testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json).
+These stories will be tested using the [`run_benchmark`](../../tools/perf/run_benchmark) script, and then the recorded values for the target metric (currently `frame_times`) will be
+compared with the historical upper limit described in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json].
+
+These tests currently run on CQ on the following builders:
+* mac-rel
+* win10_chromium_x64_rel_ng
[TOC]
@@ -18,13 +22,60 @@ Example:`animometer_webgl_attrib_arrays higher average frame_times(21.095) compa
This means that the animometer_webgl_attrib_arrays story has an average frame_times of 21 ms, while the recorded upper limit for the story (on the tested platform) is 17 ms.
In these cases the failed story will be run three more times to make sure this was not a flake, and the new result (the average of the three runs) will be reported in the same format.
-For deeper investigation of such cases you can find the traces of the runs in the isolated outputs of the test. In the isolated outputs directory look at output.json for the initial run and at re_run_failures/output.json for the three re-runs.
+For deeper investigation of such cases, you can find the traces of the runs in the isolated outputs of the test. In the isolated outputs directory, look at `output.json` for the initial run and at `re_run_failures/output.json` for the traces recorded from the three re-runs.
+If the failure is the result of an expected regression, please follow the instructions in the "Updating Expectations" section below.
In the `output.json` file, you can find the name of the story and under the trace.html field of the story a gs:// link to the trace ([Example](https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=f8961d773cdf0bf121525f29024c0e6c19d42e61&as=output.json)).
To download the trace run: `gsutil cp gs://link_from_output.json trace_name.html`
Also, if tests fail with no specific message in the output, it is useful to check the {benchmark}/benchmark_log.txt file in the isolated outputs directory for a more detailed log of the failure.
+### Running representative perf tests locally
+
+You can run the representative perf tests locally for further investigation, but note that the values may differ from those reported on the bots, since results vary across hardware.
+
+```
+./testing/scripts/run_rendering_benchmark_with_gated_performance.py ./tools/perf/run_benchmark \
+--benchmark rendering.desktop --isolated-script-test-output /tmp/temp_dir/ \
+--isolated-script-test-perf-output /tmp/temp_dir
+```
+*Use `rendering.mobile` as the benchmark to run the mobile representatives.*
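+
+For example, a sketch of the equivalent invocation for the mobile representatives (same temporary output paths as above, only the benchmark name changes):
+
+```
+./testing/scripts/run_rendering_benchmark_with_gated_performance.py ./tools/perf/run_benchmark \
+--benchmark rendering.mobile --isolated-script-test-output /tmp/temp_dir/ \
+--isolated-script-test-perf-output /tmp/temp_dir
+```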
+
+
+For investigating crashes (or when comparing the values is not the focus), it may be easier to run the benchmark directly with one of the representative story tags:
+* `representative_win_desktop (benchmark: rendering.desktop)`
+* `representative_mac_desktop (benchmark: rendering.desktop)`
+* `representative_mobile (benchmark: rendering.mobile)`
+
+```
+./tools/perf/run_benchmark rendering.desktop --story-tag-filter representative_win_desktop
+```
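+
+Similarly, a sketch of the mobile equivalent, based on the story tags listed above:
+
+```
+./tools/perf/run_benchmark rendering.mobile --story-tag-filter representative_mobile
+```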
+
+## Updating Expectations
+
+There are several reasons to skip a story in the representative perf tests, such as:
+* Test results are flaky and the story needs to be skipped.
+* A change with an expected regression is landing, so the story needs to be skipped for the change to pass CQ, and the upper limit adjusted later.
+* A new story is being added to the representative set and the appropriate upper limit is not known yet.
+
+In these cases the story should not cause a failure, but its values still need to be recorded for later adjustments.
+As a result, the preferred way to skip a story in the representative perf tests is to mark it as experimental in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json], along with a bug referring to the cause of the suppression (flakiness, a change with an expected regression, or experimentation with new representatives). This way the test will still run, but its values will not be considered for failing the test.
+
+To do so, find the story under the affected platform in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json] and add the `"experimental": true` field to it.
+
+```
+"platform": {
+ "story_name": {
+ "ci_095": 0.377,
+ "avg": 31.486,
+ "experimental": true,
+ "_comment": "crbug.com/bug_id"
+ },
+}
+```
+*[Example CL](https://chromium-review.googlesource.com/c/chromium/src/+/2208294)*
+
+
## Maintaining Representative Performance Tests
### Clustering the Benchmark and Choosing Representatives
@@ -43,27 +94,15 @@ managed by adding and removing story tags above to stories in [rendering benchma
### Updating the Upper Limits
The upper limits for averages and confidence interval (CI) ranges of
-`frame_times` described in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`](../../testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json)
-are used to passing or failing a test. These values are the 95 percentile of
-the past 30 runs of the test on each platform (for both average and CI).
+`frame_times` described in [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json] are used to pass or fail a test. These values are the 95th percentile of the past 30 runs of the test on each platform (for both the average and the CI).
This helps with catching sudden regressions which result in values higher
than the upper limits. But in the case of gradual regressions, the upper limits
may not be useful if not updated frequently. Updating these upper limits also
helps with adapting to improvements.
-Updating these values can be done by running [`src/tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py`](../../tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py)and committing the changes.
+Updating these values can be done by running [`src/tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py`](../../tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py) and committing the changes.
The script will create a new JSON file using the values of recent runs in place
-of [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`](../../testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json).
-
-### Updating Expectations
-
-To skip any of the tests, update
-[`src/tools/perf/expectations.config`](../../tools/perf/expectations.config) and
-add the story under rendering benchmark (Examples [1](https://chromium-review.googlesource.com/c/chromium/src/+/2055681), [2](https://chromium-review.googlesource.com/c/chromium/src/+/1901357)).
-This expectations file disables the story on the rendering benchmark, which rendering_representative_perf_tests are part of.
-So please add the a bug for each skipped test and link to `Internals>GPU>Metrics`.
+of [`src/testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json`][rep_perf_json].
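+
+A minimal sketch of that workflow, run from the `src` checkout root (assuming the script needs no additional arguments):
+
+```
+# Regenerate the upper limits JSON from recent runs.
+python tools/perf/experimental/representative_perf_test_limit_adjuster/adjust_upper_limits.py
+# Review the regenerated limits before committing.
+git diff testing/scripts/representative_perf_test_data/
+```
+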
-If the test is part of representative perf tests on Windows or MacOS, this
-should be done under rendering.desktop benchmark and if it's a test on Android
-under rendering.mobile.
\ No newline at end of file
+[rep_perf_json]: ../../testing/scripts/representative_perf_test_data/representatives_frame_times_upper_limit.json
\ No newline at end of file
diff --git a/chromium/docs/testing/web_tests.md b/chromium/docs/testing/web_tests.md
index 5e4d188e293..faa428ca938 100644
--- a/chromium/docs/testing/web_tests.md
+++ b/chromium/docs/testing/web_tests.md
@@ -55,23 +55,26 @@ strip ./xcodebuild/{Debug,Release}/content_shell.app/Contents/MacOS/content_shel
TODO: mention `testing/xvfb.py`
-The test runner script is in
-`third_party/blink/tools/run_web_tests.py`.
+The test runner script is in `third_party/blink/tools/run_web_tests.py`.
To specify which build directory to use (e.g. out/Default, out/Release,
out/Debug) you should pass the `-t` or `--target` parameter. For example, to
use the build in `out/Default`, use:
```bash
-python third_party/blink/tools/run_web_tests.py -t Default
+third_party/blink/tools/run_web_tests.py -t Default
```
For Android (if your build directory is `out/android`):
```bash
-python third_party/blink/tools/run_web_tests.py -t android --android
+third_party/blink/tools/run_web_tests.py -t android --android
```
+*** promo
+Windows users need to use `third_party/blink/tools/run_web_tests.bat` instead.
+***
+
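+For instance, on Windows the command above would look roughly like this (a sketch, assuming the `.bat` wrapper forwards the same flags to the Python runner):
+
+```bash
+third_party\blink\tools\run_web_tests.bat -t Default
+```
+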
Tests marked as `[ Skip ]` in
[TestExpectations](../../third_party/blink/web_tests/TestExpectations)
won't be run by default, generally because they cause some intractable tool error.
@@ -97,13 +100,13 @@ arguments to `run_web_tests.py` relative to the web test directory
use:
```bash
-python third_party/blink/tools/run_web_tests.py fast/forms
+third_party/blink/tools/run_web_tests.py fast/forms
```
Or you could use the following shorthand:
```bash
-python third_party/blink/tools/run_web_tests.py fast/fo\*
+third_party/blink/tools/run_web_tests.py fast/fo\*
```
*** promo
@@ -111,7 +114,7 @@ Example: To run the web tests with a debug build of `content_shell`, but only
test the SVG tests and run pixel tests, you would run:
```bash
-[python] third_party/blink/tools/run_web_tests.py -t Default svg
+third_party/blink/tools/run_web_tests.py -t Default svg
```
***
@@ -143,7 +146,7 @@ for more details of running `content_shell`.
To see a complete list of arguments supported, run:
```bash
-python third_party/blink/tools/run_web_tests.py --help
+third_party/blink/tools/run_web_tests.py --help
```
*** note
@@ -217,7 +220,7 @@ There are two ways to run web tests with additional command-line arguments:
* Using `--additional-driver-flag`:
```bash
- python run_web_tests.py --additional-driver-flag=--blocking-repaint
+ third_party/blink/tools/run_web_tests.py --additional-driver-flag=--blocking-repaint
```
This tells the test harness to pass `--blocking-repaint` to the
@@ -382,10 +385,10 @@ tips for finding the problem.
spacing or box sizes are often unimportant, especially around fonts and
form controls. Differences in wording of JS error messages are also
usually acceptable.
- * `python run_web_tests.py path/to/your/test.html` produces a page listing
- all test results. Those which fail their expectations will include links
- to the expected result, actual result, and diff. These results are saved
- to `$root_build_dir/layout-test-results`.
+ * `third_party/blink/tools/run_web_tests.py path/to/your/test.html` produces
+ a page listing all test results. Those which fail their expectations will
+ include links to the expected result, actual result, and diff. These
+ results are saved to `$root_build_dir/layout-test-results`.
* Alternatively the `--results-directory=path/for/output/` option allows
you to specify an alternative directory for the output to be saved to.
* If you're still sure it's correct, rebaseline the test (see below).
@@ -428,8 +431,7 @@ tips for finding the problem.
To run the server manually to reproduce/debug a failure:
```bash
-cd src/third_party/blink/tools
-python run_blink_httpd.py
+third_party/blink/tools/run_blink_httpd.py
```
The web tests are served from `http://127.0.0.1:8000/`. For example, to
@@ -473,18 +475,12 @@ machine?
### Debugging DevTools Tests
-* Add `debug_devtools=true` to `args.gn` and compile: `autoninja -C out/Default devtools_frontend_resources`
- > Debug DevTools lets you avoid having to recompile after every change to the DevTools front-end.
* Do one of the following:
* Option A) Run from the `chromium/src` folder:
- `third_party/blink/tools/run_web_tests.sh
- --additional-driver-flag='--debug-devtools'
- --additional-driver-flag='--remote-debugging-port=9222'
- --time-out-ms=6000000`
+ `third_party/blink/tools/run_web_tests.py --additional-driver-flag='--remote-debugging-port=9222' --additional-driver-flag='--debug-devtools' --time-out-ms=6000000`
* Option B) If you need to debug an http/tests/inspector test, start httpd
as described above. Then, run content_shell:
- `out/Default/content_shell --debug-devtools --remote-debugging-port=9222 --run-web-tests
- http://127.0.0.1:8000/path/to/test.html`
+ `out/Default/content_shell --remote-debugging-port=9222 --additional-driver-flag='--debug-devtools' --run-web-tests http://127.0.0.1:8000/path/to/test.html`
* Open `http://localhost:9222` in a stable/beta/canary Chrome, click the single
link to open the devtools with the test loaded.
* In the loaded devtools, set any required breakpoints and execute `test()` in
@@ -546,8 +542,7 @@ read on.
***
```bash
-cd src/third_party/blink
-python tools/run_web_tests.py --reset-results foo/bar/test.html
+third_party/blink/tools/run_web_tests.py --reset-results foo/bar/test.html
```
If there are current expectation files for `web_tests/foo/bar/test.html`,
@@ -569,12 +564,18 @@ Though we prefer the Rebaseline Tool to local rebaselining, the Rebaseline Tool
doesn't support rebaselining flag-specific expectations.
```bash
-cd src/third_party/blink
-python tools/run_web_tests.py --additional-driver-flag=--enable-flag --reset-results foo/bar/test.html
+third_party/blink/tools/run_web_tests.py --additional-driver-flag=--enable-flag --reset-results foo/bar/test.html
```
+*** promo
+You can use `--flag-specific=config` as a shorthand for
+`--additional-driver-flag=--enable-flag` if `config` is defined in
+`web_tests/FlagSpecificConfig`.
+***
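+
+For example, the rebaseline command above could then roughly be written as follows (a sketch, reusing the placeholder `config` name from `web_tests/FlagSpecificConfig`):
+
+```bash
+third_party/blink/tools/run_web_tests.py --flag-specific=config --reset-results foo/bar/test.html
+```
+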
New baselines will be created in the flag-specific baselines directory, e.g.
-`web_tests/flag-specific/enable-flag/foo/bar/test-expected.{txt,png}`.
+`web_tests/flag-specific/enable-flag/foo/bar/test-expected.{txt,png}`
+or
+`web_tests/flag-specific/config/foo/bar/test-expected.{txt,png}`
Then you can commit the new baselines and upload the patch for review.
diff --git a/chromium/docs/testing/web_tests_in_content_shell.md b/chromium/docs/testing/web_tests_in_content_shell.md
index 9b4c4229481..ecdb57aabfd 100644
--- a/chromium/docs/testing/web_tests_in_content_shell.md
+++ b/chromium/docs/testing/web_tests_in_content_shell.md
@@ -14,11 +14,15 @@ You can run web tests using `run_web_tests.py` (in
`src/third_party/blink/tools`).
```bash
-python third_party/blink/tools/run_web_tests.py storage/indexeddb
+third_party/blink/tools/run_web_tests.py -t <build directory> storage/indexeddb
```
To see a complete list of arguments supported, run with `--help`.
-***promo
+*** promo
+Windows users need to use `third_party/blink/tools/run_web_tests.bat` instead.
+***
+
+*** promo
You can add `<path>/third_party/blink/tools` into `PATH` so that you can
run it from anywhere without the full path.
***
@@ -85,6 +89,8 @@ Then run the test with a localhost URL:
out/Default/content_shell --run-web-tests http://localhost:8000/<test>
```
+It may be necessary to specify [--enable-blink-features](https://source.chromium.org/search?q=%22--enable-blink-features%3D%22) explicitly for some tests.
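+
+A sketch of what that might look like, with `SomeFeature` standing in as a hypothetical placeholder for whatever runtime feature the test depends on:
+
+```bash
+out/Default/content_shell --run-web-tests --enable-blink-features=SomeFeature http://localhost:8000/<test>
+```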
+
#### Running WPT Tests in Content Shell
Similar to HTTP tests, many WPT (a.k.a. web-platform-tests under
@@ -92,8 +98,9 @@ Similar to HTTP tests, many WPT (a.k.a. web-platform-tests under
tests require some setup before running in Content Shell:
```bash
-python third_party/blink/tools/run_blink_wptserve.py
+python third_party/blink/tools/run_blink_wptserve.py -t <build directory>
```
+
Then run the test:
```bash