# VaAPI

This page documents tracing and debugging the Video Acceleration API (VaAPI or
VA-API) on ChromeOS. The VA-API is an open-source library and API specification,
providing access to graphics hardware acceleration capabilities for video and
image processing. The VaAPI is used on ChromeOS on both Intel and AMD platforms.

[TOC]

## Overview

VaAPI code is developed upstream in the [VaAPI GitHub repository], of which
ChromeOS is a downstream client via the [libva] package, with packaged backends
for both [Intel] and [AMD].

[VaAPI GitHub repository]: https://github.com/intel/libva
[libva]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/main/x11-libs/libva/
[Intel]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/main/x11-libs/libva-intel-driver/
[AMD]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/main/media-libs/libva-amdgpu-driver/

## Tracing VaAPI video decoding

A simplified diagram of the buffer circulation is provided below. The "client"
is always a Renderer process, communicating via Mojo/IPC. Essentially, the VaAPI
Video Decode Accelerator ([VaVDA]) receives encoded BitstreamBuffers from the
"client" and sends them to the "va internals", which eventually produce decoded
video in PictureBuffers. The VaVDA may or may not use the `Vpp` unit for pixel
format adaptation, depending on the codec used, the silicon generation and other
specifics.

```
      K BitstreamBuffers   +-----+    +-------------------+
 C   --------------------->| Va  | ----->                 |
 L   <---------------------| VDA | <----     va internals |
 I      (encoded stuff)    |     |    |                   |
 E                         |     |    | +-----+       +----+
 N   <---------------------|     | <----|     |<------| lib|
 T   --------------------->|     | ---->| Vpp |------>| va |
                 N         +-----+    +-+-----+   M   +----+
           PictureBuffers                      VASurfaces
           (decoded stuff)
```
*** aside
PictureBuffers are created by the "client" but allocated and filled in by the
VaVDA. `K` is unrelated to both `M` and `N`.
***

[VaVDA]: https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.h?type=cs&q=vaapivideodecodeaccelerator&sq=package:chromium&g=0&l=57

### Tracing memory consumption

Tracing memory consumption is done via the [MemoryInfra] system. Please take a
minute to read that document (in particular the [difference between
`effective_size` and `size`]). The VaAPI lives inside the GPU process (a.k.a.
the Viz process), so please familiarize yourself with the [GPU Memory Tracing]
document. The VaVDA provides information by implementing the [Memory Dump
Provider] interface, but the information provided varies with the execution
mode, as explained next.

#### Internal VASurfaces accountancy

The usage of the `Vpp` unit is controlled by the member variable
[`|decode_using_client_picture_buffers_|`]; skipping this extra stage is very
advantageous in terms of CPU, power and memory consumption (see
[crbug.com/822346]).

* When [`|decode_using_client_picture_buffers_|`] is false, `libva` uses a set
  of internally allocated VASurfaces that are accounted for in the
  `gpu/vaapi/decoder` tracing category (see screenshot below). Each of these
  VASurfaces is backed by a Buffer Object large enough to hold, at least, the
  decoded image in YUV semiplanar format. In the diagram above, `M` varies: 4
  for VP8, 9 for VP9, 4-12 for H264/AVC1 (see [`GetNumReferenceFrames()`]).

![](https://i.imgur.com/UWAuAli.png)

* When [`|decode_using_client_picture_buffers_|`] is true, `libva` can decode
  directly on the client's PictureBuffers, `M = 0`, and the `gpu/vaapi/decoder`
  category is not present in the GPU MemoryInfra.

[MemoryInfra]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/memory-infra/README.md#memoryinfra
[difference between `effective_size` and `size`]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/memory-infra#effective_size-vs_size
[GPU Memory Tracing]: ../memory-infra/probe-gpu.md
[Memory Dump Provider]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/memory-infra/adding_memory_infra_tracing.md
[`|decode_using_client_picture_buffers_|`]: https://cs.chromium.org/search/?q=decode_using_client_picture_buffers_&sq=package:chromium&type=cs
[crbug.com/822346]: https://crbug.com/822346
[`GetNumReferenceFrames()`]: https://cs.chromium.org/search/?q=GetNumReferenceFrames+file:%5Esrc/media/gpu/+package:%5Echromium$+file:%5C.cc&type=cs

#### PictureBuffers accountancy

VaVDA allocates storage for the N PictureBuffers provided by the client by means
of VaapiPicture{NativePixmapOzone}s, backed by NativePixmaps, themselves backed
by DmaBufs (the client only knows about the client Texture IDs). The GPU's
TextureManager accounts for these textures, but:
- They are not correctly identified as being backed by NativePixmaps (see
  [crbug.com/514914]).
- They are not correctly linked back to the Renderer or ARC++ client on whose
  behalf the allocation took place, as is done in e.g. [the probe-gpu example]
  (see [crbug.com/721674]).

See e.g. the following ToT example for 10 1920x1080p textures (32bpp); finding
the desired `context_group` can be tricky.

![](https://i.imgur.com/3tJThzL.png)

[crbug.com/514914]: https://crbug.com/514914
[the probe-gpu example]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/memory-infra/probe-gpu.md#example
[crbug.com/721674]: https://crbug.com/721674

### Tracing power consumption

Power consumption measurements are available on ChromeOS test/dev images via
the command line binary [`dump_intel_rapl_consumption`]; this tool averages the
power consumption of the four SoC domains over a configurable period of time,
usually a few seconds. These domains are, in the order presented by the tool:

* `pkg`: estimated power consumption of the whole SoC; in particular, this is a
  superset of `pp0` and `pp1`, including all accessory silicon, e.g. video
  processing.
* `pp0`: the CPU set.
* `pp1`/`gfx`: the integrated GPU or GPUs.
* `dram`: estimated power consumption of the DRAM, from the bus activity.

Googlers can read more about this topic under
[go/power-consumption-meas-in-intel].

`dump_intel_rapl_consumption` is usually run while a given workload is active
(e.g. a video playback) with an interval larger than a second to smooth out all
kinds of system services that would show up in smaller periods, e.g. WiFi.

```shell
dump_intel_rapl_consumption --interval_ms=2000 --repeat --verbose
```

E.g. on a nocturne main1, the average power consumptions in watts while playing
back the first minute of a 1080p VP9 [video] are:

|`pkg` |`pp0` |`pp1`/`gfx` |`dram`|
| ---: | ---: | ---:       | ---: |
| 2.63 | 1.44 | 0.29       | 0.87 |

As can be seen, `pkg` ~= `pp0` + `pp1` + 1W (2.63 ≈ 1.44 + 0.29 + 0.90 in the
table above); this extra watt is the cost of all the associated silicon, e.g.
bridges, bus controllers, caches, and the media processing engine.

[`dump_intel_rapl_consumption`]: https://chromium.googlesource.com/chromiumos/platform2/+/main/power_manager/tools/dump_intel_rapl_consumption.cc
[video]: https://commons.wikimedia.org/wiki/File:Big_Buck_Bunny_4K.webm
[go/power-consumption-meas-in-intel]: http://go/power-consumption-meas-in-intel

### Tracing CPU cycles and instantaneous buffer usage

TODO(mcasas): fill in this section.

## Verifying VaAPI installation and usage

### <a name="verify-driver"></a> Verify the VaAPI is correctly installed and can be loaded

`vainfo` is a small command line utility used to enumerate the supported
operation modes; it is developed in the [libva-utils] repository and is
available on ChromeOS dev images ([media-video/libva-utils] package) and on
Debian systems ([vainfo]). `vainfo` will try to load the appropriate backend
driver for the system and/or GPUs, and will fail if it cannot find/load it.

[libva-utils]: https://github.com/intel/libva-utils
[media-video/libva-utils]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/main/media-video/libva-utils
[vainfo]: https://packages.debian.org/sid/main/vainfo
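
As a quick smoke test, `vainfo` can be run with no arguments; the annotations
below are a rough sketch of what to expect rather than verbatim output:

```shell
# On a ChromeOS dev image or a Debian system with the package installed.
vainfo
# On success it prints the VA-API/driver versions and a list of supported
# VAProfile/VAEntrypoint pairs, e.g. "VAProfileVP9Profile0 : VAEntrypointVLD".
# On failure it reports an error such as "vaInitialize failed", the same class
# of error surfaced in the chrome://gpu "Log Messages" Section.
```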

### <a name="verify-vaapi"></a> Verify the VaAPI supports and/or uses a given codec

A few steps are customary to verify the support and use of a given codec.

To verify that the build and platform support video acceleration, launch
Chromium and navigate to `chrome://gpu`, then:
* Search for the "Video Acceleration Information" Section: this should
  enumerate the available accelerated codecs and resolutions.
* If this section is empty, oftentimes the "Log Messages" Section immediately
  below might indicate an associated error, e.g.:

    > vaInitialize failed: unknown libva error

  that can usually be reproduced with `vainfo`, see the [previous
  section](#verify-driver).

To verify that a given video is being played back using the accelerated video
decoding backend:
* Navigate to a URL that causes a video to be played. Leave it playing.
* Navigate to the `chrome://media-internals` tab.
* Find the entry associated with the video-playing tab.
* Scroll down to "`Player Properties`" and check the "`video_decoder`" entry:
  it should say "GpuVideoDecoder".

### VaAPI on Linux

This configuration is **unsupported** (see [docs/linux/hw_video_decode.md]);
the following instructions are provided only as a reference for developers to
test the code paths on a Linux machine.

* Follow the instructions under the [Linux build setup] document, adding the GN
  argument `use_vaapi=true` to the args.gn file (please refer to the [Setting up
  the build] Section).
* To support proprietary codecs, e.g. H264/AVC1, add the options
  `proprietary_codecs = true` and `ffmpeg_branding = "Chrome"` to the GN args;
  a combined `gn gen` invocation is sketched after this list.
* Build Chromium as usual.
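
As a hedged example, the GN arguments above can be passed directly to `gn gen`
(the output directory name `out/gn` matches the invocation further below and is
otherwise an arbitrary choice):

```shell
# Generate a build directory with VaAPI and proprietary codec support.
gn gen out/gn --args='use_vaapi=true proprietary_codecs=true ffmpeg_branding="Chrome"'
# Build Chromium as usual, e.g.:
autoninja -C out/gn chrome
```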

At this point you should make sure the appropriate VA driver backend is working
correctly; try running `vainfo` from the command line and verify no errors show
up.

To run Chromium using the VaAPI, three arguments are necessary:
* `--enable-features=VaapiVideoDecoder`
* `--ignore-gpu-blocklist`
* `--use-gl=desktop` or `--use-gl=egl`

```shell
./out/gn/chrome --enable-features=VaapiVideoDecoder --ignore-gpu-blocklist --use-gl=egl
```

Note that you can set the environment variable `MESA_GLSL_CACHE_DISABLE=false`
if you want the GPU process to run in sandboxed mode (see
[crbug.com/264818](https://crbug.com/264818)). To check whether the running GPU
process is sandboxed, open `chrome://gpu` and search for `Sandboxed` in the
driver information table. In addition, passing `--gpu-sandbox-failures-fatal=yes`
will prevent the GPU process from running in non-sandboxed mode.
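
As a hedged sketch, the environment variable and the flags discussed above can
be combined in a single invocation:

```shell
# Keep the GPU process sandboxed (MESA_GLSL_CACHE_DISABLE=false) and make
# sandbox failures fatal, together with the VaAPI-related flags.
MESA_GLSL_CACHE_DISABLE=false ./out/gn/chrome \
    --enable-features=VaapiVideoDecoder \
    --ignore-gpu-blocklist \
    --use-gl=egl \
    --gpu-sandbox-failures-fatal=yes
```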

Refer to the [previous section](#verify-vaapi) to verify support and use of
the VaAPI.

[docs/linux/hw_video_decode.md]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/linux/hw_video_decode.md
[Linux build setup]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/linux/build_instructions.md
[Setting up the build]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/linux/build_instructions.md#setting-up-the-build