summaryrefslogtreecommitdiff
path: root/Documentation/userspace-api
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2023-02-21 18:24:12 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2023-02-21 18:24:12 -0800
commit5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
treecc5c2d0a898769fd59549594fedb3ee6f84e59a0 /Documentation/userspace-api
parent36289a03bcd3aabdf66de75cb6d1b4ee15726438 (diff)
parentd1fabc68f8e0541d41657096dc713cb01775652d (diff)
downloadlinux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
Diffstat (limited to 'Documentation/userspace-api')
-rw-r--r--Documentation/userspace-api/netlink/c-code-gen.rst107
-rw-r--r--Documentation/userspace-api/netlink/genetlink-legacy.rst178
-rw-r--r--Documentation/userspace-api/netlink/index.rst6
-rw-r--r--Documentation/userspace-api/netlink/intro-specs.rst80
-rw-r--r--Documentation/userspace-api/netlink/specs.rst425
5 files changed, 796 insertions, 0 deletions
diff --git a/Documentation/userspace-api/netlink/c-code-gen.rst b/Documentation/userspace-api/netlink/c-code-gen.rst
new file mode 100644
index 000000000000..89de42c13350
--- /dev/null
+++ b/Documentation/userspace-api/netlink/c-code-gen.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+==============================
+Netlink spec C code generation
+==============================
+
+This document describes how Netlink specifications are used to render
+C code (uAPI, policies etc.). It also defines the additional properties
+allowed in older families by the ``genetlink-c`` protocol level,
+to control the naming.
+
+For brevity this document refers to ``name`` properties of various
+objects by the object type. For example ``$attr`` is the value
+of ``name`` in an attribute, and ``$family`` is the name of the
+family (the global ``name`` property).
+
+The upper case is used to denote literal values, e.g. ``$family-CMD``
+means the concatenation of ``$family``, a dash character, and the literal
+``CMD``.
+
+The names of ``#defines`` and enum values are always converted to upper case,
+and with dashes (``-``) replaced by underscores (``_``).
+
+If the constructed name is a C keyword, an extra underscore is
+appended (``do`` -> ``do_``).
+
+Globals
+=======
+
+``c-family-name`` controls the name of the ``#define`` for the family
+name, default is ``$family-FAMILY-NAME``.
+
+``c-version-name`` controls the name of the ``#define`` for the version
+of the family, default is ``$family-FAMILY-VERSION``.
+
+``max-by-define`` selects if max values for enums are defined as a
+``#define`` rather than inside the enum.
+
+Definitions
+===========
+
+Constants
+---------
+
+Every constant is rendered as a ``#define``.
+The name of the constant is ``$family-$constant`` and the value
+is rendered as a string or integer according to its type in the spec.
+
+Enums and flags
+---------------
+
+Enums are named ``$family-$enum``. The full name can be set directly
+or suppressed by specifying the ``enum-name`` property.
+Default entry name is ``$family-$enum-$entry``.
+If ``name-prefix`` is specified it replaces the ``$family-$enum``
+portion of the entry name.
+
+Boolean ``render-max`` controls creation of the max values
+(which are enabled by default for attribute enums).
+
+Attributes
+==========
+
+Each attribute set (excluding fractional sets) is rendered as an enum.
+
+Attribute enums are traditionally unnamed in netlink headers.
+If naming is desired ``enum-name`` can be used to specify the name.
+
+The default attribute name prefix is ``$family-A`` if the name of the set
+is the same as the name of the family and ``$family-A-$set`` if the names
+differ. The prefix can be overridden by the ``name-prefix`` property of a set.
+The rest of the section will refer to the prefix as ``$pfx``.
+
+Attributes are named ``$pfx-$attribute``.
+
+Attribute enums end with two special values ``__$pfx-MAX`` and ``$pfx-MAX``
+which are used for sizing attribute tables.
+These two names can be specified directly with the ``attr-cnt-name``
+and ``attr-max-name`` properties respectively.
+
+If ``max-by-define`` is set to ``true`` at the global level ``attr-max-name``
+will be specified as a ``#define`` rather than an enum value.
+
+Operations
+==========
+
+Operations are named ``$family-CMD-$operation``.
+If ``name-prefix`` is specified it replaces the ``$family-CMD``
+portion of the name.
+
+Similarly to attribute enums operation enums end with special count and max
+attributes. For operations those attributes can be renamed with
+``cmd-cnt-name`` and ``cmd-max-name``. Max will be a define if ``max-by-define``
+is ``true``.
+
+Multicast groups
+================
+
+Each multicast group gets a define rendered into the kernel uAPI header.
+The name of the define is ``$family-MCGRP-$group``, and can be overwritten
+with the ``c-define-name`` property.
+
+Code generation
+===============
+
+uAPI header is assumed to come from ``<linux/$family.h>`` in the default header
+search path. It can be changed using the ``uapi-header`` global property.
diff --git a/Documentation/userspace-api/netlink/genetlink-legacy.rst b/Documentation/userspace-api/netlink/genetlink-legacy.rst
new file mode 100644
index 000000000000..3bf0bcdf21d8
--- /dev/null
+++ b/Documentation/userspace-api/netlink/genetlink-legacy.rst
@@ -0,0 +1,178 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=================================================================
+Netlink specification support for legacy Generic Netlink families
+=================================================================
+
+This document describes the many additional quirks and properties
+required to describe older Generic Netlink families which form
+the ``genetlink-legacy`` protocol level.
+
+The spec is a work in progress, some of the quirks are just documented
+for future reference.
+
+Specification (defined)
+=======================
+
+Attribute type nests
+--------------------
+
+New Netlink families should use ``multi-attr`` to define arrays.
+Older families (e.g. ``genetlink`` control family) attempted to
+define array types reusing attribute type to carry information.
+
+For reference the ``multi-attr`` array may look like this::
+
+ [ARRAY-ATTR]
+ [INDEX (optionally)]
+ [MEMBER1]
+ [MEMBER2]
+ [SOME-OTHER-ATTR]
+ [ARRAY-ATTR]
+ [INDEX (optionally)]
+ [MEMBER1]
+ [MEMBER2]
+
+where ``ARRAY-ATTR`` is the array entry type.
+
+array-nest
+~~~~~~~~~~
+
+``array-nest`` creates the following structure::
+
+ [SOME-OTHER-ATTR]
+ [ARRAY-ATTR]
+ [ENTRY]
+ [MEMBER1]
+ [MEMBER2]
+ [ENTRY]
+ [MEMBER1]
+ [MEMBER2]
+
+It wraps the entire array in an extra attribute (hence limiting its size
+to 64kB). The ``ENTRY`` nests are special and have the index of the entry
+as their type instead of normal attribute type.
+
+type-value
+~~~~~~~~~~
+
+``type-value`` is a construct which uses attribute types to carry
+information about a single object (often used when array is dumped
+entry-by-entry).
+
+``type-value`` can have multiple levels of nesting, for example
+genetlink's policy dumps create the following structures::
+
+ [POLICY-IDX]
+ [ATTR-IDX]
+ [POLICY-INFO-ATTR1]
+ [POLICY-INFO-ATTR2]
+
+Where the first level of nest has the policy index as it's attribute
+type, it contains a single nest which has the attribute index as its
+type. Inside the attr-index nest are the policy attributes. Modern
+Netlink families should have instead defined this as a flat structure,
+the nesting serves no good purpose here.
+
+Operations
+==========
+
+Enum (message ID) model
+-----------------------
+
+unified
+~~~~~~~
+
+Modern families use the ``unified`` message ID model, which uses
+a single enumeration for all messages within family. Requests and
+responses share the same message ID. Notifications have separate
+IDs from the same space. For example given the following list
+of operations:
+
+.. code-block:: yaml
+
+ -
+ name: a
+ value: 1
+ do: ...
+ -
+ name: b
+ do: ...
+ -
+ name: c
+ value: 4
+ notify: a
+ -
+ name: d
+ do: ...
+
+Requests and responses for operation ``a`` will have the ID of 1,
+the requests and responses of ``b`` - 2 (since there is no explicit
+``value`` it's previous operation ``+ 1``). Notification ``c`` will
+use the ID of 4, operation ``d`` 5 etc.
+
+directional
+~~~~~~~~~~~
+
+The ``directional`` model splits the ID assignment by the direction of
+the message. Messages from and to the kernel can't be confused with
+each other so this conserves the ID space (at the cost of making
+the programming more cumbersome).
+
+In this case ``value`` attribute should be specified in the ``request``
+``reply`` sections of the operations (if an operation has both ``do``
+and ``dump`` the IDs are shared, ``value`` should be set in ``do``).
+For notifications the ``value`` is provided at the op level but it
+only allocates a ``reply`` (i.e. a "from-kernel" ID). Let's look
+at an example:
+
+.. code-block:: yaml
+
+ -
+ name: a
+ do:
+ request:
+ value: 2
+ attributes: ...
+ reply:
+ value: 1
+ attributes: ...
+ -
+ name: b
+ notify: a
+ -
+ name: c
+ notify: a
+ value: 7
+ -
+ name: d
+ do: ...
+
+In this case ``a`` will use 2 when sending the message to the kernel
+and expects message with ID 1 in response. Notification ``b`` allocates
+a "from-kernel" ID which is 2. ``c`` allocates "from-kernel" ID of 7.
+If operation ``d`` does not set ``values`` explicitly in the spec
+it will be allocated 3 for the request (``a`` is the previous operation
+with a request section and the value of 2) and 8 for response (``c`` is
+the previous operation in the "from-kernel" direction).
+
+Other quirks (todo)
+===================
+
+Structures
+----------
+
+Legacy families can define C structures both to be used as the contents
+of an attribute and as a fixed message header. The plan is to define
+the structs in ``definitions`` and link the appropriate attrs.
+
+Multi-message DO
+----------------
+
+New Netlink families should never respond to a DO operation with multiple
+replies, with ``NLM_F_MULTI`` set. Use a filtered dump instead.
+
+At the spec level we can define a ``dumps`` property for the ``do``,
+perhaps with values of ``combine`` and ``multi-object`` depending
+on how the parsing should be implemented (parse into a single reply
+vs list of objects i.e. pretty much a dump).
diff --git a/Documentation/userspace-api/netlink/index.rst b/Documentation/userspace-api/netlink/index.rst
index b0c21538d97d..26f3720cb3be 100644
--- a/Documentation/userspace-api/netlink/index.rst
+++ b/Documentation/userspace-api/netlink/index.rst
@@ -10,3 +10,9 @@ Netlink documentation for users.
:maxdepth: 2
intro
+ intro-specs
+ specs
+ c-code-gen
+ genetlink-legacy
+
+See also :ref:`Documentation/core-api/netlink.rst <kernel_netlink>`.
diff --git a/Documentation/userspace-api/netlink/intro-specs.rst b/Documentation/userspace-api/netlink/intro-specs.rst
new file mode 100644
index 000000000000..a3b847eafff7
--- /dev/null
+++ b/Documentation/userspace-api/netlink/intro-specs.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=====================================
+Using Netlink protocol specifications
+=====================================
+
+This document is a quick starting guide for using Netlink protocol
+specifications. For more detailed description of the specs see :doc:`specs`.
+
+Simple CLI
+==========
+
+Kernel comes with a simple CLI tool which should be useful when
+developing Netlink related code. The tool is implemented in Python
+and can use a YAML specification to issue Netlink requests
+to the kernel. Only Generic Netlink is supported.
+
+The tool is located at ``tools/net/ynl/cli.py``. It accepts
+a handul of arguments, the most important ones are:
+
+ - ``--spec`` - point to the spec file
+ - ``--do $name`` / ``--dump $name`` - issue request ``$name``
+ - ``--json $attrs`` - provide attributes for the request
+ - ``--subscribe $group`` - receive notifications from ``$group``
+
+YAML specs can be found under ``Documentation/netlink/specs/``.
+
+Example use::
+
+ $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \
+ --do rings-get \
+ --json '{"header":{"dev-index": 18}}'
+ {'header': {'dev-index': 18, 'dev-name': 'eni1np1'},
+ 'rx': 0,
+ 'rx-jumbo': 0,
+ 'rx-jumbo-max': 4096,
+ 'rx-max': 4096,
+ 'rx-mini': 0,
+ 'rx-mini-max': 4096,
+ 'tx': 0,
+ 'tx-max': 4096,
+ 'tx-push': 0}
+
+The input arguments are parsed as JSON, while the output is only
+Python-pretty-printed. This is because some Netlink types can't
+be expressed as JSON directly. If such attributes are needed in
+the input some hacking of the script will be necessary.
+
+The spec and Netlink internals are factored out as a standalone
+library - it should be easy to write Python tools / tests reusing
+code from ``cli.py``.
+
+Generating kernel code
+======================
+
+``tools/net/ynl/ynl-regen.sh`` scans the kernel tree in search of
+auto-generated files which need to be updated. Using this tool is the easiest
+way to generate / update auto-generated code.
+
+By default code is re-generated only if spec is newer than the source,
+to force regeneration use ``-f``.
+
+``ynl-regen.sh`` searches for ``YNL-GEN`` in the contents of files
+(note that it only scans files in the git index, that is only files
+tracked by git!) For instance the ``fou_nl.c`` kernel source contains::
+
+ /* Documentation/netlink/specs/fou.yaml */
+ /* YNL-GEN kernel source */
+
+``ynl-regen.sh`` will find this marker and replace the file with
+kernel source based on fou.yaml.
+
+The simplest way to generate a new file based on a spec is to add
+the two marker lines like above to a file, add that file to git,
+and run the regeneration tool. Grep the tree for ``YNL-GEN``
+to see other examples.
+
+The code generation itself is performed by ``tools/net/ynl/ynl-gen-c.py``
+but it takes a few arguments so calling it directly for each file
+quickly becomes tedious.
diff --git a/Documentation/userspace-api/netlink/specs.rst b/Documentation/userspace-api/netlink/specs.rst
new file mode 100644
index 000000000000..6ffe8137cd90
--- /dev/null
+++ b/Documentation/userspace-api/netlink/specs.rst
@@ -0,0 +1,425 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+
+=========================================
+Netlink protocol specifications (in YAML)
+=========================================
+
+Netlink protocol specifications are complete, machine readable descriptions of
+Netlink protocols written in YAML. The goal of the specifications is to allow
+separating Netlink parsing from user space logic and minimize the amount of
+hand written Netlink code for each new family, command, attribute.
+Netlink specs should be complete and not depend on any other spec
+or C header file, making it easy to use in languages which can't include
+kernel headers directly.
+
+Internally kernel uses the YAML specs to generate:
+
+ - the C uAPI header
+ - documentation of the protocol as a ReST file
+ - policy tables for input attribute validation
+ - operation tables
+
+YAML specifications can be found under ``Documentation/netlink/specs/``
+
+This document describes details of the schema.
+See :doc:`intro-specs` for a practical starting guide.
+
+Compatibility levels
+====================
+
+There are four schema levels for Netlink specs, from the simplest used
+by new families to the most complex covering all the quirks of the old ones.
+Each next level inherits the attributes of the previous level, meaning that
+user capable of parsing more complex ``genetlink`` schemas is also compatible
+with simpler ones. The levels are:
+
+ - ``genetlink`` - most streamlined, should be used by all new families
+ - ``genetlink-c`` - superset of ``genetlink`` with extra attributes allowing
+ customization of define and enum type and value names; this schema should
+ be equivalent to ``genetlink`` for all implementations which don't interact
+ directly with C uAPI headers
+ - ``genetlink-legacy`` - Generic Netlink catch all schema supporting quirks of
+ all old genetlink families, strange attribute formats, binary structures etc.
+ - ``netlink-raw`` - catch all schema supporting pre-Generic Netlink protocols
+ such as ``NETLINK_ROUTE``
+
+The definition of the schemas (in ``jsonschema``) can be found
+under ``Documentation/netlink/``.
+
+Schema structure
+================
+
+YAML schema has the following conceptual sections:
+
+ - globals
+ - definitions
+ - attributes
+ - operations
+ - multicast groups
+
+Most properties in the schema accept (or in fact require) a ``doc``
+sub-property documenting the defined object.
+
+The following sections describe the properties of the most modern ``genetlink``
+schema. See the documentation of :doc:`genetlink-c <c-code-gen>`
+for information on how C names are derived from name properties.
+
+genetlink
+=========
+
+Globals
+-------
+
+Attributes listed directly at the root level of the spec file.
+
+name
+~~~~
+
+Name of the family. Name identifies the family in a unique way, since
+the Family IDs are allocated dynamically.
+
+version
+~~~~~~~
+
+Generic Netlink family version, default is 1.
+
+protocol
+~~~~~~~~
+
+The schema level, default is ``genetlink``, which is the only value
+allowed for new ``genetlink`` families.
+
+definitions
+-----------
+
+Array of type and constant definitions.
+
+name
+~~~~
+
+Name of the type / constant.
+
+type
+~~~~
+
+One of the following types:
+
+ - const - a single, standalone constant
+ - enum - defines an integer enumeration, with values for each entry
+ incrementing by 1, (e.g. 0, 1, 2, 3)
+ - flags - defines an integer enumeration, with values for each entry
+ occupying a bit, starting from bit 0, (e.g. 1, 2, 4, 8)
+
+value
+~~~~~
+
+The value for the ``const``.
+
+value-start
+~~~~~~~~~~~
+
+The first value for ``enum`` and ``flags``, allows overriding the default
+start value of ``0`` (for ``enum``) and starting bit (for ``flags``).
+For ``flags`` ``value-start`` selects the starting bit, not the shifted value.
+
+Sparse enumerations are not supported.
+
+entries
+~~~~~~~
+
+Array of names of the entries for ``enum`` and ``flags``.
+
+header
+~~~~~~
+
+For C-compatible languages, header which already defines this value.
+In case the definition is shared by multiple families (e.g. ``IFNAMSIZ``)
+code generators for C-compatible languages may prefer to add an appropriate
+include instead of rendering a new definition.
+
+attribute-sets
+--------------
+
+This property contains information about netlink attributes of the family.
+All families have at least one attribute set, most have multiple.
+``attribute-sets`` is an array, with each entry describing a single set.
+
+Note that the spec is "flattened" and is not meant to visually resemble
+the format of the netlink messages (unlike certain ad-hoc documentation
+formats seen in kernel comments). In the spec subordinate attribute sets
+are not defined inline as a nest, but defined in a separate attribute set
+referred to with a ``nested-attributes`` property of the container.
+
+Spec may also contain fractional sets - sets which contain a ``subset-of``
+property. Such sets describe a section of a full set, allowing narrowing down
+which attributes are allowed in a nest or refining the validation criteria.
+Fractional sets can only be used in nests. They are not rendered to the uAPI
+in any fashion.
+
+name
+~~~~
+
+Uniquely identifies the attribute set, operations and nested attributes
+refer to the sets by the ``name``.
+
+subset-of
+~~~~~~~~~
+
+Re-defines a portion of another set (a fractional set).
+Allows narrowing down fields and changing validation criteria
+or even types of attributes depending on the nest in which they
+are contained. The ``value`` of each attribute in the fractional
+set is implicitly the same as in the main set.
+
+attributes
+~~~~~~~~~~
+
+List of attributes in the set.
+
+Attribute properties
+--------------------
+
+name
+~~~~
+
+Identifies the attribute, unique within the set.
+
+type
+~~~~
+
+Netlink attribute type, see :ref:`attr_types`.
+
+.. _assign_val:
+
+value
+~~~~~
+
+Numerical attribute ID, used in serialized Netlink messages.
+The ``value`` property can be skipped, in which case the attribute ID
+will be the value of the previous attribute plus one (recursively)
+and ``0`` for the first attribute in the attribute set.
+
+Note that the ``value`` of an attribute is defined only in its main set.
+
+enum
+~~~~
+
+For integer types specifies that values in the attribute belong
+to an ``enum`` or ``flags`` from the ``definitions`` section.
+
+enum-as-flags
+~~~~~~~~~~~~~
+
+Treat ``enum`` as ``flags`` regardless of its type in ``definitions``.
+When both ``enum`` and ``flags`` forms are needed ``definitions`` should
+contain an ``enum`` and attributes which need the ``flags`` form should
+use this attribute.
+
+nested-attributes
+~~~~~~~~~~~~~~~~~
+
+Identifies the attribute space for attributes nested within given attribute.
+Only valid for complex attributes which may have sub-attributes.
+
+multi-attr (arrays)
+~~~~~~~~~~~~~~~~~~~
+
+Boolean property signifying that the attribute may be present multiple times.
+Allowing an attribute to repeat is the recommended way of implementing arrays
+(no extra nesting).
+
+byte-order
+~~~~~~~~~~
+
+For integer types specifies attribute byte order - ``little-endian``
+or ``big-endian``.
+
+checks
+~~~~~~
+
+Input validation constraints used by the kernel. User space should query
+the policy of the running kernel using Generic Netlink introspection,
+rather than depend on what is specified in the spec file.
+
+The validation policy in the kernel is formed by combining the type
+definition (``type`` and ``nested-attributes``) and the ``checks``.
+
+operations
+----------
+
+This section describes messages passed between the kernel and the user space.
+There are three types of entries in this section - operations, notifications
+and events.
+
+Operations describe the most common request - response communication. User
+sends a request and kernel replies. Each operation may contain any combination
+of the two modes familiar to netlink users - ``do`` and ``dump``.
+``do`` and ``dump`` in turn contain a combination of ``request`` and
+``response`` properties. If no explicit message with attributes is passed
+in a given direction (e.g. a ``dump`` which does not accept filter, or a ``do``
+of a SET operation to which the kernel responds with just the netlink error
+code) ``request`` or ``response`` section can be skipped.
+``request`` and ``response`` sections list the attributes allowed in a message.
+The list contains only the names of attributes from a set referred
+to by the ``attribute-set`` property.
+
+Notifications and events both refer to the asynchronous messages sent by
+the kernel to members of a multicast group. The difference between the
+two is that a notification shares its contents with a GET operation
+(the name of the GET operation is specified in the ``notify`` property).
+This arrangement is commonly used for notifications about
+objects where the notification carries the full object definition.
+
+Events are more focused and carry only a subset of information rather than full
+object state (a made up example would be a link state change event with just
+the interface name and the new link state). Events contain the ``event``
+property. Events are considered less idiomatic for netlink and notifications
+should be preferred.
+
+list
+~~~~
+
+The only property of ``operations`` for ``genetlink``, holds the list of
+operations, notifications etc.
+
+Operation properties
+--------------------
+
+name
+~~~~
+
+Identifies the operation.
+
+value
+~~~~~
+
+Numerical message ID, used in serialized Netlink messages.
+The same enumeration rules are applied as to
+:ref:`attribute values<assign_val>`.
+
+attribute-set
+~~~~~~~~~~~~~
+
+Specifies the attribute set contained within the message.
+
+do
+~~~
+
+Specification for the ``doit`` request. Should contain ``request``, ``reply``
+or both of these properties, each holding a :ref:`attr_list`.
+
+dump
+~~~~
+
+Specification for the ``dumpit`` request. Should contain ``request``, ``reply``
+or both of these properties, each holding a :ref:`attr_list`.
+
+notify
+~~~~~~
+
+Designates the message as a notification. Contains the name of the operation
+(possibly the same as the operation holding this property) which shares
+the contents with the notification (``do``).
+
+event
+~~~~~
+
+Specification of attributes in the event, holds a :ref:`attr_list`.
+``event`` property is mutually exclusive with ``notify``.
+
+mcgrp
+~~~~~
+
+Used with ``event`` and ``notify``, specifies which multicast group
+message belongs to.
+
+.. _attr_list:
+
+Message attribute list
+----------------------
+
+``request``, ``reply`` and ``event`` properties have a single ``attributes``
+property which holds the list of attribute names.
+
+Messages can also define ``pre`` and ``post`` properties which will be rendered
+as ``pre_doit`` and ``post_doit`` calls in the kernel (these properties should
+be ignored by user space).
+
+mcast-groups
+------------
+
+This section lists the multicast groups of the family.
+
+list
+~~~~
+
+The only property of ``mcast-groups`` for ``genetlink``, holds the list
+of groups.
+
+Multicast group properties
+--------------------------
+
+name
+~~~~
+
+Uniquely identifies the multicast group in the family. Similarly to
+Family ID, Multicast Group ID needs to be resolved at runtime, based
+on the name.
+
+.. _attr_types:
+
+Attribute types
+===============
+
+This section describes the attribute types supported by the ``genetlink``
+compatibility level. Refer to documentation of different levels for additional
+attribute types.
+
+Scalar integer types
+--------------------
+
+Fixed-width integer types:
+``u8``, ``u16``, ``u32``, ``u64``, ``s8``, ``s16``, ``s32``, ``s64``.
+
+Note that types smaller than 32 bit should be avoided as using them
+does not save any memory in Netlink messages (due to alignment).
+See :ref:`pad_type` for padding of 64 bit attributes.
+
+The payload of the attribute is the integer in host order unless ``byte-order``
+specifies otherwise.
+
+.. _pad_type:
+
+pad
+---
+
+Special attribute type used for padding attributes which require alignment
+bigger than standard 4B alignment required by netlink (e.g. 64 bit integers).
+There can only be a single attribute of the ``pad`` type in any attribute set
+and it should be automatically used for padding when needed.
+
+flag
+----
+
+Attribute with no payload, its presence is the entire information.
+
+binary
+------
+
+Raw binary data attribute, the contents are opaque to generic code.
+
+string
+------
+
+Character string. Unless ``checks`` has ``unterminated-ok`` set to ``true``
+the string is required to be null terminated.
+``max-len`` in ``checks`` indicates the longest possible string,
+if not present the length of the string is unbounded.
+
+Note that ``max-len`` does not count the terminating character.
+
+nest
+----
+
+Attribute containing other (nested) attributes.
+``nested-attributes`` specifies which attribute set is used inside.