# Intel Platform Shared Resource Monitoring/Control in xl

This document introduces Intel Platform Shared Resource Monitoring/Control
technologies, their basic concepts, and the corresponding xl interfaces.

## Cache Monitoring Technology (CMT)

Cache Monitoring Technology (CMT) is a new feature available on Intel
Haswell and later server platforms that allows an OS or Hypervisor/VMM to
determine the usage of the cache (currently only the L3 cache is supported)
by applications running on the platform. A Resource Monitoring ID (RMID) is
the abstraction of the application(s) whose cache usage will be monitored.
The CMT hardware tracks cache utilization of memory accesses according to
the RMID and reports the monitored data via a counter register.

For more detailed information please refer to Intel SDM chapter
"Platform Shared Resource Monitoring: Cache Monitoring Technology".

In Xen's implementation, each domain in the system can be assigned an RMID
independently, while RMID=0 is reserved for domains that don't have the CMT
service attached. The RMID is opaque to xl/libxl and is used only inside the
hypervisor.
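
Under the hood, the counter register mentioned above is a pair of MSRs
described in the SDM: IA32_QM_EVTSEL selects an (RMID, event) pair and
IA32_QM_CTR returns the monitored data. Below is a minimal C sketch of that
protocol, assuming hypervisor-style `rdmsr()`/`wrmsr()` helpers; it is not
the actual Xen code:

```c
/* Minimal sketch of the SDM-described CMT counter protocol; not the
 * actual Xen code.  rdmsr()/wrmsr() are assumed hypervisor helpers. */
#include <stdint.h>

extern void wrmsr(uint32_t msr, uint64_t val);   /* assumed helper */
extern uint64_t rdmsr(uint32_t msr);             /* assumed helper */

#define MSR_IA32_QM_EVTSEL   0x0c8d
#define MSR_IA32_QM_CTR      0x0c8e

#define QM_EVT_L3_OCCUPANCY  0x01                /* L3 occupancy event ID */

#define QM_CTR_ERROR         (1ULL << 63)        /* invalid RMID/event */
#define QM_CTR_UNAVAILABLE   (1ULL << 62)        /* no data available */

/* Read the raw L3 occupancy count for an RMID.  Multiply the result by
 * the upscaling factor from CPUID leaf 0xF to convert it to bytes. */
static int read_l3_occupancy(uint32_t rmid, uint64_t *count)
{
    uint64_t val;

    /* EVTSEL layout: event ID in bits 7:0, RMID in bits 41:32. */
    wrmsr(MSR_IA32_QM_EVTSEL,
          ((uint64_t)rmid << 32) | QM_EVT_L3_OCCUPANCY);
    val = rdmsr(MSR_IA32_QM_CTR);

    if ( val & (QM_CTR_ERROR | QM_CTR_UNAVAILABLE) )
        return -1;

    *count = val & ((1ULL << 62) - 1);           /* data in bits 61:0 */
    return 0;
}
```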

### xl interfaces

A domain is assigned an RMID implicitly by attaching it to the CMT service:

`xl psr-cmt-attach <domid>`

After that, cache usage for the domain can be shown by:

`xl psr-cmt-show cache-occupancy <domid>`

Once monitoring is no longer needed, the domain can be detached from the
CMT service by:

`xl psr-cmt-detach <domid>`

An attach may fail if no free RMID is available. In that case, unused
RMIDs can be freed by detaching the corresponding domains from the CMT
service.

The maximum RMID and the supported monitor types in the system can be
obtained by:

`xl psr-hwinfo --cmt`
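
The values reported by this command ultimately come from CPUID leaf 0xF
(what a given domain sees depends on how CPUID is virtualized). As a hedged
illustration, the raw hardware enumeration can be read like this:

```c
/* Hedged sketch: read the CMT/MBM enumeration from CPUID leaf 0xF,
 * the source of the data `xl psr-hwinfo --cmt` reports.  Build with
 * GCC/Clang on x86. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Sub-leaf 0: EBX = max RMID (any resource), EDX bit 1 = L3 monitoring. */
    if ( !__get_cpuid_count(0xf, 0, &eax, &ebx, &ecx, &edx) ||
         !(edx & (1u << 1)) )
    {
        puts("L3 monitoring not supported");
        return 1;
    }
    printf("Max RMID (any resource): %u\n", ebx);

    /* Sub-leaf 1: L3 details.  EBX = upscaling factor (bytes per count),
     * ECX = max RMID for L3, EDX = supported event bitmap. */
    __get_cpuid_count(0xf, 1, &eax, &ebx, &ecx, &edx);
    printf("L3 max RMID: %u, upscaling factor: %u\n", ecx, ebx);
    printf("cache-occupancy: %s\n", (edx & 1) ? "yes" : "no");
    printf("total-mem-bandwidth: %s\n", (edx & 2) ? "yes" : "no");
    printf("local-mem-bandwidth: %s\n", (edx & 4) ? "yes" : "no");
    return 0;
}
```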

## Memory Bandwidth Monitoring (MBM)

Memory Bandwidth Monitoring (MBM) is a new hardware feature available on
Intel Broadwell and later server platforms which builds on the CMT
infrastructure to allow monitoring of system memory bandwidth. It introduces
two new monitoring event types for total and local memory bandwidth. The
same RMID can be used to monitor both cache usage and memory bandwidth at
the same time.

For more detailed information please refer to Intel SDM chapter
"Overview of Cache Monitoring Technology and Memory Bandwidth Monitoring".

In Xen's implementation, MBM shares the same underlying monitoring service
with CMT and can be used to monitor memory bandwidth on a per-domain basis.
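
Unlike cache occupancy, a bandwidth figure cannot be read directly: the MBM
counters (read with the same IA32_QM_EVTSEL/IA32_QM_CTR protocol sketched
earlier, using event IDs 0x02 for total and 0x03 for local bandwidth) are
free-running, so a consumer samples them twice and scales the delta. A
minimal sketch of that arithmetic, with counter wrap handling omitted:

```c
/* Minimal sketch: derive bytes/second from two raw MBM counter samples
 * taken `usecs` microseconds apart.  `upscale` is the factor from CPUID
 * leaf 0xF sub-leaf 1 (EBX).  Counter wrap handling is omitted. */
#include <stdint.h>

static uint64_t mem_bandwidth(uint64_t count_prev, uint64_t count_now,
                              uint64_t usecs, uint32_t upscale)
{
    return (count_now - count_prev) * upscale * 1000000ULL / usecs;
}
```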

The xl interfaces are the same as those of CMT. The difference is that the
monitor type is the corresponding memory bandwidth type (`local-mem-bandwidth`
or `total-mem-bandwidth` instead of `cache-occupancy`). E.g., after an
`xl psr-cmt-attach`:

`xl psr-cmt-show local-mem-bandwidth <domid>`

`xl psr-cmt-show total-mem-bandwidth <domid>`

## Cache Allocation Technology (CAT)

Cache Allocation Technology (CAT) is a new feature available on Intel
Broadwell and later server platforms that allows an OS or Hypervisor/VMM to
partition cache allocation (i.e. L3/L2 cache) based on application priority
or Class of Service (COS). Each COS is configured using a capacity bitmask
(CBM) which represents cache capacity and indicates the degree of overlap
and isolation between classes. The system cache is divided into a number of
minimum portions, subsets of which form the cache partitions. Each portion
corresponds to a bit in the CBM; a set bit means the corresponding cache
portion is available to that class.

For example, assuming a system with 8 portions and 3 domains:

 * A CBM of 0xff for every domain means each domain can access the whole cache.
   This is the default.

 * Giving one domain a CBM of 0x0f and the other two domains 0xf0 means that
   the first domain gets exclusive access to half of the cache (half of the
   portions) and the other two share the other half.

 * Giving one domain a CBM of 0x0f, one 0x30 and the last 0xc0 would give the
   first domain exclusive access to half the cache, and the other two exclusive
   access to one quarter each.
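
The overlap and isolation relationships in these examples are plain bitwise
relationships between the masks, as the small sketch below shows:

```c
/* Minimal sketch: the overlap/isolation semantics of the example CBMs
 * above are plain bitwise relationships. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t cbm_a = 0x0f, cbm_b = 0xf0;    /* the second example above */

    if ( (cbm_a & cbm_b) == 0 )
        puts("fully isolated: no shared cache portions");
    else
        printf("shared portions mask: 0x%x\n", cbm_a & cbm_b);
    return 0;
}
```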

For more detailed information please refer to Intel SDM chapter
"Platform Shared Resource Control: Cache Allocation Technology".

In Xen's implementation, the CBM can be configured with the libxl/xl
interfaces, but the COS is maintained in the hypervisor only. The cache
partition granularity is per domain: each domain has COS=0 assigned by
default, whose corresponding CBM is all ones, meaning all of the cache can
be used by default.

### xl interfaces

System CAT information such as maximum COS and CBM length can be obtained by:

`xl psr-hwinfo --cat`

The simplest way to change a domain's CBM from its default is running:

`xl psr-cat-set [OPTIONS] <domid> <cbm>`

where cbm is a number representing which cache subsets can be used. A cbm
is valid only when:

 * Set bits only exist in the range of [0, cbm_len), where cbm_len can be
   obtained with `xl psr-hwinfo --cat`.
 * All the set bits are contiguous.
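
These two rules amount to a simple bit test, sketched below (this is not the
actual libxl check; it assumes cbm_len is less than 64 and treats an empty
mask as invalid):

```c
/* Sketch of the CBM validity rules above; not the actual libxl check.
 * Assumes cbm_len < 64; an empty mask is treated as invalid. */
#include <stdbool.h>
#include <stdint.h>

static bool cbm_is_valid(uint64_t cbm, unsigned int cbm_len)
{
    if ( cbm == 0 || (cbm >> cbm_len) != 0 )  /* bits outside [0, cbm_len) */
        return false;

    cbm >>= __builtin_ctzll(cbm);             /* drop trailing zero bits */
    return (cbm & (cbm + 1)) == 0;            /* contiguous iff 2^n - 1 */
}
```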

In a multi-socket system, the same cbm will be set on each socket by
default. A per-socket cbm can be specified with the `--socket SOCKET`
option.

Different systems support different cache levels, e.g. L3 cache or L2
cache. A per-cache-level cbm can be specified with the `--level LEVEL`
option.

Setting the CBM may fail if insufficient COSes are available. In that case,
unused COSes may be freed by setting the CBM of all related domains back to
the default value (all ones).

Per domain CBM settings can be shown by:

`xl psr-cat-show [OPTIONS] <domid>`

Different systems support different cache levels, e.g. L3 cache or L2
cache. The cache level to show can likewise be specified with the
`--level LEVEL` option.

## Code and Data Prioritization (CDP)

Code and Data Prioritization (CDP) Technology is an extension of CAT, which
is available on Intel Broadwell and later server platforms. CDP enables
isolation and separate prioritization of code and data fetches to the L3
cache in a software configurable manner, which can enable workload
prioritization and tuning of cache capacity to the characteristics of the
workload. CDP extends Cache Allocation Technology (CAT) by providing
separate code and data masks per Class of Service (COS).

CDP can be enabled by adding `psr=cdp` to the Xen command line.

When CDP is enabled,

 * the CAT masks are re-mapped into interleaved pairs of masks for data or
   code fetches.

 * the range of COS for CAT is re-indexed, with the lower-half of the COS
   range available for CDP.

CDP allows the OS or Hypervisor to partition cache allocation in a more
fine-grained manner. Code cache and data cache allocations can be specified
independently. With CDP enabled, one COS corresponds to two CBMs (a code CBM
and a data CBM). Since the total number of CBMs is fixed, the number of
available COSes is halved when CDP is on.
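
Concretely, per the SDM, COS `n` maps to the CAT mask register at index `2n`
for data and `2n + 1` for code. A hedged sketch of this index arithmetic for
the L3 mask MSR array:

```c
/* Hedged sketch of the CDP re-mapping described above: with CDP on,
 * COS `n` uses two CAT mask registers, one for data and one for code. */
#include <stdint.h>

#define MSR_IA32_L3_QOS_MASK(n)  (0x0c90 + (n))  /* CAT mask MSR array */

static inline uint32_t cdp_data_mask_msr(unsigned int cos)
{
    return MSR_IA32_L3_QOS_MASK(cos * 2);        /* even index: data CBM */
}

static inline uint32_t cdp_code_mask_msr(unsigned int cos)
{
    return MSR_IA32_L3_QOS_MASK(cos * 2 + 1);    /* odd index: code CBM */
}

/* With the number of mask registers fixed, only half as many COSes are
 * addressable: cos_max_cdp = (cos_max_cat + 1) / 2 - 1. */
```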

For more detailed information please refer to Intel SDM chapter
"Platform Shared Resource Control: Cache Allocation Technology".

The xl interfaces are the same as those of CAT. The difference is that the
CBM type can be passed as an option to set either the code CBM or the data
CBM.

When CDP is enabled, the `-c` (or `--code`) option is available to set the
code CBM for the domain, and the `-d` (or `--data`) option is available to
set the data CBM.

If neither the `-c` nor the `-d` option is specified when CDP is on, the
same CBM will be set as both the code CBM and the data CBM for the domain.
Passing both the `-c` and `-d` options is invalid.

Examples:

Setting code CBM for a domain:
`xl psr-cat-set -c <domid> <cbm>`

Setting data CBM for a domain:
`xl psr-cat-set -d <domid> <cbm>`

Setting the same code and data CBM for a domain:
`xl psr-cat-set <domid> <cbm>`

## Memory Bandwidth Allocation (MBA)

Memory Bandwidth Allocation (MBA) is a new feature available on Intel
Skylake and later server platforms that allows an OS or Hypervisor/VMM to
slow down misbehaving apps/VMs by using a credit-based throttling mechanism.
To enforce a bandwidth limit on a specific domain, set a throttling value
(THRTL) in its Class of Service (COS). MBA provides two THRTL modes: linear
mode and non-linear mode.

In linear mode the input precision is defined as 100-(THRTL_MAX). Values
that are not an even multiple of the precision (e.g., 12%) will be rounded
down (e.g., to a 10% delay) by the hardware.

If linear values are not supported then the input delay values are powers
of two from zero to the THRTL_MAX value from CPUID. In this case any value
that is not a power of two will be rounded down to the nearest power of
two.
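
Both rounding behaviours are easy to express in code. A hedged sketch
(illustrative only; `thrtl_max` is the maximum THRTL reported by
`xl psr-hwinfo --mba`, assumed to be below 100 in linear mode):

```c
/* Sketch of the hardware rounding described above; illustrative only. */

/* Linear mode: precision = 100 - thrtl_max; values are rounded down to
 * a multiple of the precision.  Assumes thrtl_max < 100. */
static unsigned int mba_linear_round(unsigned int thrtl,
                                     unsigned int thrtl_max)
{
    unsigned int precision = 100 - thrtl_max;

    return thrtl - (thrtl % precision);
}

/* Non-linear mode: values are rounded down to the nearest power of two;
 * 0 stays 0, meaning no throttling. */
static unsigned int mba_nonlinear_round(unsigned int thrtl)
{
    unsigned int p;

    if ( thrtl == 0 )
        return 0;
    for ( p = 1; p * 2 <= thrtl; p *= 2 )
        ;
    return p;
}

/* E.g. with thrtl_max = 90 the precision is 10, so 12 rounds down to 10
 * in linear mode; in non-linear mode 12 (0xC) rounds down to 8. */
```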

For example, assuming a system with 2 domains:

 * A THRTL of 0x0 for every domain means each domain can access memory
   without any throttling delay. This is the default.

 * Linear mode: Giving one domain a THRTL of 0xC and the other domain 0
   means that the first domain gets a 10% delay on its memory accesses and
   the other one no delay.

 * Non-linear mode: Giving one domain a THRTL of 0xC and the other domain 0
   means that the first domain gets an 8% delay on its memory accesses and
   the other one no delay.

For more detailed information please refer to Intel SDM chapter
"Introduction to Memory Bandwidth Allocation".

In Xen's implementation, the THRTL can be configured with the libxl/xl
interfaces, but the COS is maintained in the hypervisor only. The bandwidth
allocation granularity is per domain: each domain has COS=0 assigned by
default, whose corresponding THRTL is 0, which means memory can be accessed
without any throttling delay.

### xl interfaces

System MBA information such as maximum COS and maximum THRTL can be obtained by:

`xl psr-hwinfo --mba`

The simplest way to change a domain's THRTL from its default is running:

`xl psr-mba-set [OPTIONS] <domid> <thrtl>`

In a multi-socket system, the same thrtl will be set on each socket by
default. A per-socket thrtl can be specified with the `--socket SOCKET`
option.

Setting the THRTL may fail if insufficient COSes are available. In that
case, unused COSes may be freed by setting the THRTL of all related domains
back to the default value (0).

Per domain THRTL settings can be shown by:

`xl psr-mba-show [OPTIONS] <domid>`

For linear mode, it shows the decimal value. For non-linear mode, it shows
the hexadecimal value.

## Reference

[1] Intel SDM
(http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html).