diff options
author | Thomas Haller <thaller@redhat.com> | 2021-04-06 21:49:59 +0200 |
---|---|---|
committer | Thomas Haller <thaller@redhat.com> | 2021-04-08 16:35:21 +0200 |
commit | e095e77b46b394a8409f55ba8ce773908bbd9962 (patch) | |
tree | a68a435d8ebcce3decafbd5221141bee4b0ad81e | |
parent | 79903065e4d7610b45b521f35cd9bc839bbf293e (diff) | |
download | NetworkManager-e095e77b46b394a8409f55ba8ce773908bbd9962.tar.gz |
l3cfg: add README file about l3cfg
Add a README with thoughts about the L3Cfg rework. As this is a large
rework, we should have an understanding what is happening. This text
is subject to future improvements.
-rw-r--r-- | src/core/README.l3cfg.md | 286 |
1 files changed, 286 insertions, 0 deletions
diff --git a/src/core/README.l3cfg.md b/src/core/README.l3cfg.md new file mode 100644 index 0000000000..80de67b911 --- /dev/null +++ b/src/core/README.l3cfg.md @@ -0,0 +1,286 @@ +L3Cfg Rework +============ + +NMDevice is complex. Together with NMManager, NMDevice does too much. + +The goal is to rework the IP configuration (Layer 3) to be a more separate +part of the code that is better maintainable, easier to understand and +extend and more correct. + +Current Situation +----------------- + +- [NMManager](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/nm-manager.c): + this is the main object (a singleton) that drives most things. + Among many other things, it creates NMDevice instances and coordinates. + +- [NMDevice](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c): + this represents a device. This is a subclass of NMDBusObject, + it is thus directly exported on D-Bus (as D-Bus objects like + `/org/freedesktop/NetworkManager/Devices/1`). + It also manages all aspects of the device. It has an overall state + (`NM_DEVICE_STATE`) but lots of more specific states (e.g. current state + of DHCP configuration). As always, the hardest part in programming are + stateful objects, and NMDevice has *a lot* of state. The code is huge and + hard to understand and the class has (too) many responsibilities. \ + \ + NMDevice also has subclasses, which are determined based on the "device type". That + means, there are subclasses like NMDeviceEthernet and NMDeviceBridge. As such, the + subclasses also export additional D-Bus interfaces. These subclasses also handle + the Layer 2 specific aspects of the device. For this aspect, delegation probably + would have been a better choice. On the other hand, IP configuration is almost entirely + managed by the parent class. Which is good, because the IP configuration is common to all + device types, but is bad because NMDevice already does so many things. + +- [NMIP4Config](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/nm-ip4-config.c) (and NMIP6Config): + these are also subclasses of NMDBusObject + and exported on D-Bus on paths like `/org/freedesktop/NetworkManager/IP4Config/1`. + The device's `IP4Config` property refers to these objects. They contain + the runtime IP information of that device. I don't think these objects + should exist on the D-Bus API, as NMDevice could directly expose these properties. + But for historic reasons, such is our D-Bus API. + Other than that, NMIP4Config objects are also used internally for tracking + IP configuration. For example, [when](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/dhcp/nm-dhcp-nettools.c#L563) + we receive a DHCP lease, we construct a NMIP4Config object with the addresses, DNS settings, + and so on. These + instances are then [tracked by](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L519) + NMDevice, and [merged](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L8928) + into an instance that is then exposed on D-Bus. As such, this class has two + mostly independent purposes. + +- [NMDhcpClient](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/dhcp/nm-dhcp-client.c): + our DHCP "library". It's a simple object with a clear API that + abstracts most of the complexity of handling DHCP. But still, NMDevice + needs to drive the DHCP client instance. Meaning, it needs to create (start) and stop + them and hook up signals for changes (new lease) and timeout. This is mostly + fine and unavoidable. The point is that while specific tasks are well abstracted + (like the details of DHCP), there is still some state in NMDevice that is related + to manage these tasks. DHCP is one of many such tasks, like also + link local addresses, SLAAC or LLDP. + This leads to the increased complexity of NMDevice, which manages a large variety + of such tasks. + +### Problems: + +1. first the sheer code size of [nm-device.c](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L19030). + It's hard to understand and maintain, and this results in misbehaviours. Also, features that should be easy to implement + are not. Also, there are inefficiencies that are hard to fix. + +2. NMDevice and NMIP4Config are both exported on D-Bus while having other responsibilities. + Being subclasses of NMDBusObject, they are glued to the D-Bus API. For example, NMIP4Config is + also used for other purposes (for tracking IP configuration internally). + +3. NMDevice simply does too much. IP configuration should be a separate, encapsulated + API to make allow NMDevice to be smaller and the IP configuration part better + testable, understandable and smaller too. + +4. in the current model, NMDevice can be related to zero, one or two ifindexes. For example, + for ethernet devices, there is commonly only one actual netdev device (and one ifindex). + For OVS devices, there is no ifindex. For NMDeviceModem or NMDeviceBluetooth there is + a NMDevice instance that has initially no ifindex (it represents the tty serial port + or the bluetooth device) but during activation it gets and ip ifindex. With PPPoE, + the ethernet device can even have two ifindexes (one for the underlying ethernet and + one for the PPP device). That is all utterly confusing, inconsistent and limited. + For example, not all interfaces you see in `ip link` can be found in the D-Bus API. + The D-Bus API also does not give access to the ifindex (which is the real identifier + for a netdev devices). It only exposes the IpInterface name. That should be improved too, + but even such seemingly simple things are not done for years, because it's not trivially + clear what the right ifindex is. + Also a device instance on D-Bus significantly changes its meaning when it activates/deactivates + and it starts/stops being responsible for an ifindex. + In the future there should be devices that represent exactly one netdev device (an ifindex) + and devices that don't have an ifindex. That is follow up work and hinted by + [rhbz#1066703](https://bugzilla.redhat.com/show_bug.cgi?id=1066703). But simplifying + the IP configuration is a requisite before addressing that rework. + With this we will have controller and controlled devices. That means, a controller devices + (that for example represents a bluetooth device) will need to configure IP address on the + controlled IP device. That would be doable by injecting the IP config on that device, + but as the device already does so much, it would be better if this would be a separate + IP configuration manager for that ifindex. + +5. NMIP4Config exports properties on D-Bus like [AddressData](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/introspection/org.freedesktop.NetworkManager.IP4Config.xml#L26). + which are the currently configured IP addresses. These should be directly obtained + from the NMPlatform cache, which contains the correct list of addresses as kernel + exposes them via rtnetlink. Instead, whenever there are changes in platform we + [generate](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L14223) + an NMIP4Config instance, then we merge, intersect and subtract this captured information + with the IP configs we want to configure. Finally we merge them together again + and sync the result to platform. This is bad, wrong and inefficient. + We must not mix "what is configured" with "what we want to configure". The current + approach also re-generates these IP config instance whenever something in platform changes. + That does not scale. If we have any hope to handle thousands of routes, this needs to change. + +6. The NMIP4Config objects are mutable, and they are heavily mutated. When we create an NMIP4Config + instance that represent a DHCP lease, we will [subtract](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L14236) + addresses that were externally removed. That is wrong, because during a reapply we + will need to know these addresses again. The solution for that is not to mutate this + data, but track whether IP addresses are removed separately. + +7. NMDevice also does ACD, but it can only do it for addresses received via DHCP. + It implicitly also does ACD for IPv4LL, but that is via using the n_ipv4ll library. + It would be good to have an option that we can configure IPv4LL for any address. + Also, if you manually configure an address like 192.168.2.5 (for which we don't do + ACD) and the same address is obtained via DHCP, then doing ACD for the address is wrong. + There needs to be link-wide view of the addresses, and not only looking at individual + addresses when deciding to do ACD. + +8. As IP configuration is done by NMDevice, VPN connections have limited capabilities + in this regard. + When a VPN has IP addresses, then it injects them into NMDevice by + [providing](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L13696) + an NMIP4Config. However, that means VPNs cannot do DHCP or IPv4LL, because it can + only inject known configuration. That would be very useful for example with a tap + device with openvpn. The real problem here is that NMVpnConnection are + treated special, when they should be more like devices. That should be reworked in the future, + by reworking VPN plugins. Regardless, having IP configuration handled by NMDevice is limiting. + +9. NetworkManager currently supports `ipv4.method` which can be "manual", "disabled" or + "auto". This scheme does not allow for example to enable IPv4LL together with DHCPv4. + As a special case, you can configure `ipv4.method=auto` together with static + addresses in `ipv4.addresses`, so combining DHCP and static addressing works. But in general, + this scheme is limited. In the future we should have more flexible schemes, where + addressing methods can be independently enabled or disabled. Also, we currently + have `ipv4.may-fail`, but that is limited as well. For example, + `ipv4.may-fail=yes` and `ipv6.may-fail=yes` still means that at least one of the + address families must succeed. That makes sense for certain use cases, but it + means, you cannot have truly best-effort, opportunistic DHCP with this way. + As workaround for that there is `ipv4.dhcp-timeout=infinity`. In + general it is not only useful to enable methods independently, we also configure + independently whether they are required or optional (and possibly, that they are optional + but at least one of several optional methods must succeed). Anyway. The point + is there is a need to make IP configuration more flexible. Currently it is not. + Such a seemingly simple extension would be surprisingly difficult to implement + because [the code](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L6616) is + all over the place. The way how NMDevice tracks the overall activation state is + hard to understand. This should be improved and possibly could be improved in a + smaller refactoring effort. But instead of a smaller effort, we will use the big hammer + with L3Cfg rework. + +10. There are two classes NMIP4Config and NMIP6Config. Handling both address families is + commonly similar, so there is lot of similar code in both. They should be unified + so that similar code can handle both address families. + + +Solution and Future +------------------- + +NML3Cfg work is supposed to simplify some part of NMDevice: the part related to +IP configuration. This is a huge rework of a core part of NetworkManager. Arguably, +some parts maybe could be done more evolutionary, but the fundamental problems require +to rip out NMIP4Config and replace it by something better. Doing that is a large rework +that changes NMDevice heavily. That is also the opportunity to get the smaller issues +right. + +There is already a new class [NML3Cfg](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3cfg.h#L141) +(currently unused). An NML3Cfg instance is responsible for handling IP configuration +of an ifindex. Consequently, we can ask NMNetns to [get](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-netns.c#L142) +(or create) a NML3Cfg instance for an ifindex. +The idea is that there can be multiple users (NMDevice and NMVpnConnection and future controller devices) +that use the same NML3Cfg instance. Especially with a future rework of NMDevice where +a NMDevice only manages one ifindex (or none), there is a need that multiple +devices manage the IP configuration on the same device. Independent users can cooperate +to configure IP configuration on the same device. Already now with Libreswan VPN, where the VPN "injects" +its NMIP4Config in NMDevice. Or with PPPoE, where the NMDeviceEthernet is both about IP configuration +for the PPPoE device. + +There is also a new class [NML3ConfigData](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3-config-data.h). +This replaces some aspect of NMIP4Config/NMIP6Config. A NML3ConfigData object is immutable and has no real logic +(or state). It has some "logic", like comparing two NML3ConfigData instances, logging it, or merging two (immutable) +instances into a new instance. But as the class is immutable, that logic is rather simple. This class is +used to track information. As it's immutable, anybody who is interested can keep a reference +for it's own purpose. For example, NMDhcpClient will generate a NML3ConfigData with the information +of the lease. It may keep the reference, but it will also tell NMDevice about it. The NMDevice +will then itself tell NML3Cfg to accept this configuration. This works by calling +[add()/remove()](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3cfg.c#L2654). +One NML3ConfigData can also track both IPv4 and IPv6 information. It's a general set of IP related +configuration, that has some address specific properties. Those are then duplicated for both address +families and implemented in a way to minimize code duplication and encourage to treat them the same. +As this replaces an aspect of NMIP4Config, NMIP4Config can focus on it's other purpose: to expose data on D-Bus. + +What NML3Cfg then does, is to merge all NML3ConfigData, and "commit" it to platform. Thereby it knows +which addresses it configured the last time (if they no longer are to be configured, they must be removed). +This is done [here](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3cfg.c#L3442). + +As independent users should be able to cooperate, it is not appropriate that they call "commit". +Instead, they set a commit type ([here](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3cfg.c#L3476), +and whenever something changes, NML3Cfg knows the aggregated commit type. That is necessary +because when we activate a device, we may want to preserve the existing IP configuration (e.g. after +a restart of NetworkManager). During that time is the NML3Cfg instance set to a reduced commit +mode (ASSUME). + +NML3Cfg will also handle IPv4 ACD. Any user of NML3Cfg registers/unregisters NML3ConfigData instances +that should be configured. Thereby they also say whether ACD should be done for the IPv4 addresses. +NML3Cfg then keeps state for each IPv4 address, whether ACD should be performed, and whether the +address is ready to be configured. NML3Cfg does not do DHCP or similar. That is still the responsibility +of NMDevice to run a NMDhcpClient. But it does run ACD, because whether to perform ACD on an address +requires a holistic view of all addresses of that interface. For example, if you configure a static +IP address 192.168.2.5 (with ACD disabled) and you also get the same address via DHCP, then ACD should +not performed for that address (even if the user configured ACD with DHCP). Of course, that is a very +unlikely example. More likely is that NetworkManager is restarted and it leaves the addresses (that passed +ACD) configured. After restart, DHCP finds the same addresses and no new ACD should be performed. This shows +that the ACD state depends all the IP addresses on an interface, +and thus it's done by NML3Cfg. The API for this is very simple. Users enable/disable ACD during nm_l3cfg_add_config() +and receive events like [NM_L3_CONFIG_NOTIFY_TYPE_ACD_EVENT](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3cfg.c#L303). +Another advantage is that ACD now works for any kinds of addresses. Currently it only works for addresses +from DHCP and link local addresses. + +NML3Cfg does not implement or drive DHCP. However, as it already does ACD it gained it's own IPv4LL +"library": [NML3IPv4LL](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/2a88de2280c9c5dcbb35fed02f766cd3f6911303/src/core/nm-l3cfg.c#L3624). +This will replace nettools' n-ipv4ll library, because that library also does ACD internally, while we want +to use the holistic view that NML3Cfg has. What this means, is that the user (NMDevice) +can request a NML3IPv4LL handle from the NML3Cfg instance, and it just does it with a simple API. +All the user might do is to enable/disable the handle and to react to signals (if it cares to find +out whether IPv4LL fails). + +The general parts of NML3Cfg are already implemented. It has unit tests and can be tested independently. +You might note that NML3Cfg is not trivial already, but the API that it provides is as simple as possible: +create immutable NML3ConfigData instance, and add/remove them. Optionally, handle the ACD events and +listen to some events. The complexity that NML3Cfg has, will lead in the same amount simplify NMDevice. + +What is missing is NMDevice using this new API. Instead of creating and tracking NMIP4Config instances, +it needs to track NML3ConfigData instances. In principle that sounds simple, in practice that changes +large part of "nm-device.c". + +Thereby also the state machine for NM_DEVICE_STATE will be improved. It's anyway a rewrite. This will lay the +groundwork for more flexible configuration of IP methods, with different failure modes (opportunistic or +mandatory). + +What then also should be easier, to combine IPv4LL with other addressing methods. In Windows AFAIK, if you +don't get a DHCP address it will configure a IPv4LL address. That is also what RFC suggests, but which we +currently don't support. + +In general, change the way how external IP addresses/routes are tracked. This merge, intersect, subtract +approach does not perform well. Currently we react on signals and it's hard to understand what happens +in response to that, or whether it's really the correct thing to do. See yourself starting from +[here](https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/af360238be198257934b89d42f24431401550721/src/core/devices/nm-device.c#L14214). + + +Related Bugs +------------ + +* Main bug: + + - [rh#1868254](https://bugzilla.redhat.com/show_bug.cgi?id=1868254): + "refactor NetworkManager's IP configuration done by NMDevice" + +* Follow up but to improve model of devices: + + - [rh#1066703](https://bugzilla.redhat.com/show_bug.cgi?id=1066703): + "\[RFE\] Handle parent/child relationships more cleanly" + +* Flexible IP methods: + + - [rh#1791624](https://bugzilla.redhat.com/show_bug.cgi?id=1791624): + "NetworkManager must not remove used bridge" + +* Improving performance issues, this will lay ground work: + + - [rh#1847125](https://bugzilla.redhat.com/show_bug.cgi?id=1847125): + "\[RFE\] Improve 20% performance on creating 1000 bridge over 1000 VLANs" + + - [rh#1861527](https://bugzilla.redhat.com/show_bug.cgi?id=1861527): + "Excessive memory and CPU usage on a router with IPv6 BGP feed" + + - [rh#1753677](https://bugzilla.redhat.com/show_bug.cgi?id=1753677): + "High cpu usage while non-controlled interface is mangling tc filters" + |