summaryrefslogtreecommitdiff
path: root/docs/features/intel_psr_mba.pandoc
blob: 86df661ba8866b2d0a099682230ffad1b97f4408 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
% Intel Memory Bandwidth Allocation (MBA) Feature
% Revision 1.8

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Tech Preview**

Architecture(s): Intel x86

   Component(s): Hypervisor, toolstack

       Hardware: MBA is supported on Skylake Server and beyond
---------------- ----------------------------------------------------

# Terminology

* CAT         Cache Allocation Technology
* CBM         Capacity BitMasks
* CDP         Code and Data Prioritization
* COS/CLOS    Class of Service
* HW          Hardware
* MBA         Memory Bandwidth Allocation
* MSRs        Machine Specific Registers
* PSR         Intel Platform Shared Resource
* THRTL       Throttle value or delay value

# Overview

The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
control over memory bandwidth available per-core. This feature provides OS/
hypervisor the ability to slow misbehaving apps/domains by using a credit-based
throttling mechanism.

# User details

* Feature Enabling:

  Add "psr=mba" to boot line parameter to enable MBA feature.

* xl interfaces:

  1. `psr-mba-show [domain-id|domain-name]`:

     Show memory bandwidth throttling for domain. Under different modes, it
     shows different type of data.

     There are two modes:
     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
     delay applied) by HW automatically. The response of throttling value is
     linear.

     Non-linear mode: input delay values are powers-of-two from zero to the
     MBA_MAX value from CPUID. In this case any values not a power of two will
     be rounded down the next nearest power of two by HW automatically. The
     response of throttling value is non-linear.

     For linear mode, it shows the decimal value. For non-linear mode, it shows
     hexadecimal value.

  2. `psr-mba-set [OPTIONS] <domain-id|domain-name> <throttling>`:

     Set memory bandwidth throttling for domain.

     Options:
     '-s': Specify the socket to process, otherwise all sockets are processed.

     Throttling value set in register implies the approximate amount of delaying
     the traffic between core and memory. Higher throttling value result in
     lower bandwidth. The max throttling value (MBA_MAX) supported can be
     obtained through CPUID inside hypervisor. Users can fetch the MBA_MAX value
     using the `psr-hwinfo` xl command.

# Technical details

MBA is a member of Intel PSR features, it shares the base PSR infrastructure
in Xen.

## Hardware perspective

  MBA defines a range of MSRs to support specifying a delay value (Thrtl) per
  COS, with details below.

  ```
   +----------------------------+----------------+
   | MSR (per socket)           |    Address     |
   +----------------------------+----------------+
   | IA32_L2_QOS_Ext_BW_Thrtl_0 |     0xD50      |
   +----------------------------+----------------+
   | ...                        |  ...           |
   +----------------------------+----------------+
   | IA32_L2_QOS_Ext_BW_Thrtl_n |     0xD50+n    |
   +----------------------------+----------------+
  ```

  When context switch happens, the COS ID of domain is written to per-hyper-
  thread MSR `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation
  according to the throttling value stored in the Thrtl MSR register.

## The relationship between MBA and CAT/CDP

  Generally speaking, MBA is completely independent of CAT/CDP, and any
  combination may be applied at any time, e.g. enabling MBA with CAT
  disabled.

  But it needs to be noticed that MBA shares COS infrastructure with CAT,
  although MBA is enumerated by different CPUID leaf from CAT (which
  indicates that the max COS of MBA may be different from CAT). In some
  cases, a domain is permitted to have a COS that is beyond one (or more)
  of PSR features but within the others. For instance, let's assume the max
  COS of MBA is 8 but the max COS of L3 CAT is 16, when a domain is assigned
  9 as COS, the L3 CAT CBM associated to COS 9 would be enforced, but for MBA,
  the HW works as default value is set since COS 9 is beyond the max COS (8)
  of MBA.

## Design Overview

* Core COS/Thrtl association

  When enforcing Memory Bandwidth Allocation, all cores of domains have
  the same default Thrtl MSR (COS0) which stores the same Thrtl (0). The
  default Thrtl MSR is used only in hypervisor and is transparent to tool stack
  and user.

  System administrators can change PSR allocation policy at runtime by
  using the tool stack. Since MBA shares COS ID with CAT/CDP, a COS ID
  corresponds to a 2-tuple, like [CBM, Thrtl] with only-CAT enabled, when CDP
  is enabled, the COS ID corresponds to a 3-tuple, like [Code_CBM, Data_CBM,
  Thrtl]. If neither CAT nor CDP is enabled, things are easier, since one COS
  ID corresponds to one Thrtl.

* VCPU schedule

  This part reuses CAT COS infrastructure.

* Multi-sockets

  Different sockets may have different MBA capabilities (like max COS)
  although it is consistent on the same socket. So the capability
  of per-socket MBA is specified.

  This part reuses CAT COS infrastructure.

## Implementation Description

* Hypervisor interfaces:

  1. Boot line param: "psr=mba" to enable the feature.

  2. SYSCTL:
          - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information.

  3. DOMCTL:
          - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain.
          - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain.

* xl interfaces:

  1. psr-mba-show [domain-id]
          Show system/domain runtime MBA throttling value. For linear mode,
          it shows the decimal value. For non-linear mode, it shows hexadecimal
          value.
          => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL

  2. psr-mba-set [OPTIONS] <domain-id> <throttling>
          Set bandwidth throttling for a domain.
          => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL

  3. psr-hwinfo
          Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA.
          => XEN_SYSCTL_PSR_MBA_get_info

* Key data structure:

  1. Feature HW info

     ```
     struct {
         unsigned int thrtl_max;
         bool linear;
     } mba;

     - Member `thrtl_max`

       `thrtl_max` is the max throttling value to be set, i.e. MBA_MAX.

     - Member `linear`

       `linear` means the response of delay value is linear or not.

     As mentioned above, MBA is a member of Intel PSR features, it shares the
     base PSR infrastructure in Xen. For example, the 'cos_max' is a common HW
     property for all features. So, for other data structure details, please
     refer to 'intel_psr_cat_cdp.pandoc'.

# Limitations

MBA can only work on HW which supports it (check CPUID).

# Testing

We can execute these commands to verify MBA on different HWs supporting them.

For example:
  1. User can get the MBA hardware info through 'psr-hwinfo' command. From
     result, user can know if this hardware works under linear mode or non-
     linear mode, the max throttling value (MBA_MAX) and so on.

    root@:~$ xl psr-hwinfo --mba
    Memory Bandwidth Allocation (MBA):
    Socket ID       : 0
    Linear Mode     : Enabled
    Maximum COS     : 7
    Maximum Throttling Value: 90
    Default Throttling Value: 0

  2. Then, user can set a throttling value to a domain. For example, set '10',
     i.e 10% delay.

    root@:~$ xl psr-mba-set 1 10

  3. User can check the current configuration of the domain through
     'psr-mab-show'. For linear mode, the decimal value is shown.

    root@:~$ xl psr-mba-show 1
    Socket ID       : 0
    Default THRTL   : 0
       ID                     NAME            THRTL
        1                 ubuntu14             10

# Areas for improvement

N/A

# Known issues

N/A

# References

"INTEL RESOURCE DIRECTOR TECHNOLOGY (INTEL RDT) ALLOCATION FEATURES" [Intel 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)

# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2017-01-10 1.0      Xen 4.9  Design document written
2017-07-10 1.1      Xen 4.10 Changes:
                             1. Modify data structure according to latest
                                codes;
                             2. Add content for 'Areas for improvement';
                             3. Other minor changes.
2017-08-09 1.2      Xen 4.10 Changes:
                             1. Remove a special character to avoid error when
                                building pandoc.
2017-08-15 1.3      Xen 4.10 Changes:
                             1. Add terminology 'HW'.
                             2. Change 'COS ID of VCPU' to 'COS ID of domain'.
                             3. Change 'COS register' to 'Thrtl MSR'.
                             4. Explain the value shown for 'psr-mba-show' under
                                different modes.
                             5. Remove content in 'Areas for improvement'.
2017-08-16 1.4      Xen 4.10 Changes:
                             1. Add '<>' for mandatory argument.
2017-08-30 1.5      Xen 4.10 Changes:
                             1. Modify words in 'Overview' to make it easier to
                                understand.
                             2. Explain 'linear/non-linear' modes before mention
                                them.
                             3. Explain throttling value more accurate.
                             4. Explain 'MBA_MAX'.
                             5. Correct some words in 'Design Overview'.
                             6. Change 'mba_info' to 'mba' according to code
                                changes. Also, modify contents of it.
                             7. Add context in 'Testing' part to make things
                                more clear.
                             8. Remove 'n<64' to avoid out-of-sync.
2017-09-21 1.6      Xen 4.10 Changes:
                             1. Add 'domain-name' as parameter of 'psr-mba-show/
                                psr-mba-set'.
                             2. Fix some wordings.
                             3. Explain how user can know the MBA_MAX.
                             4. Move the description of 'Linear mode/Non-linear
                                mode' into section of 'psr-mba-show'.
                             5. Change 'per-thread' to 'per-hyper-thread'.
2017-09-29 1.7      Xen 4.10 Changes:
                             1. Correct some words.
                             2. Change 'xl psr-mba-set 1 0xa' to
                                'xl psr-mba-set 1 10'
2017-10-08 1.8      Xen 4.10 Changes:
                             1. Correct some words.
---------- -------- -------- -------------------------------------------