diff options
author | Thomas Haller <thaller@redhat.com> | 2022-02-16 09:59:56 +0100 |
---|---|---|
committer | Thomas Haller <thaller@redhat.com> | 2022-02-16 09:59:56 +0100 |
commit | 1971e6a7bb9155b69d0f42501d070f137adb2776 (patch) | |
tree | 9b8372c79993c111157ffe817cc26c138786f42c | |
parent | 67ad9a62b1333158951588ddc46b5953ed631b92 (diff) | |
parent | dac12a8d6178a6d82fc0913ad8825c8556380ba8 (diff) | |
download | NetworkManager-1971e6a7bb9155b69d0f42501d070f137adb2776.tar.gz |
platform: merge branch 'th/platform-ip6-multipath-routes'
https://bugzilla.redhat.com/show_bug.cgi?id=1837254
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1093
-rw-r--r-- | TODO | 5 | ||||
-rw-r--r-- | src/libnm-platform/README.md | 56 | ||||
-rw-r--r-- | src/libnm-platform/nm-linux-platform.c | 177 |
3 files changed, 132 insertions, 106 deletions
@@ -2,11 +2,6 @@ So you're interested in hacking on NetworkManager? Here's some cool stuff you could do... -* IPv6 Multihop Routes in Platform - -See src/libnm-platform/README.md - - * Use netlink API instead of ioctl based ethtool. NetworkManager uses ethtool API to set/obtain certain settings of network diff --git a/src/libnm-platform/README.md b/src/libnm-platform/README.md index d09975bc3d..c8e7cf7679 100644 --- a/src/libnm-platform/README.md +++ b/src/libnm-platform/README.md @@ -20,59 +20,3 @@ This depends on the following helper libraries - [../libnm-udev-aux/](../libnm-udev-aux/) - [../libnm-log-core/](../libnm-log-core/) - [../linux-headers/](../linux-headers/) - - -TODO and Bugs -============= - -IPv6 Multi-hop Routes ---------------------- - -NMPlatform has a cache (dictionary) with netlink objects, which can also be augmented -with additional information like the WifiData or the udev device. A dictionary requires -that objects have an identity, which they can be compared and hash. In other words, -a set of properties that determines that the object is something distinctly recognizable. - -Route routes and routing policy routes, from point of view of kernel there is not -a simple set of properties/attributes that determine the identity of the route/rule. Rather, -most attributes are part of the ID, but not all. See `NM_PLATFORM_IP_ROUTE_CMP_TYPE_ID` -and `NM_PLATFORM_ROUTING_RULE_CMP_TYPE_ID`. - -For routes, we currently ignore all multi-hop routes (ECMP). For IPv4, that is fine because -kernel also treats the next hops (of any number) to be the part of the ID of a route. For -example, you can see in `ip route` two IPv4 routes that only differ by their next-hops. -As NetworkManager currently doesn't configure multi-hop routes, ignoring those routes and -not caching them is no problem. - -For IPv6 routes that is different. When you add two IPv6 routes that only differ by their -next hops, then kernel will merge them into a multi-hop route (as you can see in `ip -6 route`). -Likewise, if you remove a (single or multi-hop) route, then kernel will "subtract" those -hops from the multi-hop route. In a way, kernel always mangles the result into a multi-hop route. -If you logically consider the hops of an IPv6 part of the identity of a route, then adding a route, -can create a new (because distinct as by their ID) route while removing the previously existing route -(without sending a RTM_DELROUTE message). As NetworkManager currently ignores all multi-hop routes, -this easily leads to an inconsistent cache, because NetworkManager does not understand that the -addition/removal of an IPv6 route, interferes with an entirely different route (from point of view of -the identity). -So you could say the problem is that the ID of a route changes (by merging the next hops). But that -makes no sense, because the ID identifies the route, it cannot change without creating a different -route. So the alternative to see this problem is that adding a route can create a different route -and deleting the previous one, but there are not sufficient netlink events to understand which -route got mangled (short of searching the cache). But also, the RTM_NEWROUTE command no longer -necessarily results in the addition of the route we requested and a RTM_DELROUTE event does -not necessarily notify about the route that was removed (rather, it notifies about the part -that got subtracted). - -Another way to see kernel's bogus behavior is to pretend that there are only single-hop routes. -That makes everything simple, the only speciality is that a RTM_NEWROUTE now can contain -(with this point of view of the identity) multiple routes, one for each hop. - -To solve the problem of platform cache inconsistencies for IPv6 routes, NetworkManager should -only honor IPv6 single-path routes, but with the twist that one RTM_NEWROUTE can announce multiple -routes at once. - -This alternative view that we should implement is possibly a deviation from kernel's view. -Usually we avoid modelling things differently than kernel, but in this case it makes more -sense as this is more how it appears on the netlink API (based on the events that we get). - -See also: https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20 diff --git a/src/libnm-platform/nm-linux-platform.c b/src/libnm-platform/nm-linux-platform.c index 93935b98d4..ce21ebbb7a 100644 --- a/src/libnm-platform/nm-linux-platform.c +++ b/src/libnm-platform/nm-linux-platform.c @@ -1210,6 +1210,21 @@ _linktype_get_type(NMPlatform *platform, * libnl unility functions and wrappers ******************************************************************/ +typedef struct { + /* The first time, we are called with "iter_more" false. If there is only + * one message to parse, the callee can leave this at false and be done + * (meaning, it can just ignore the potential parsing of multiple messages). + * If there are multiple message, then set this to TRUE. We will continue + * the parsing as long as this flag stays TRUE and an object gets returned. */ + bool iter_more; + + union { + struct { + guint next_multihop; + } ip6_route; + }; +} ParseNlmsgIter; + #define NLMSG_TAIL(nmsg) ((struct rtattr *) (((char *) (nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len))) /* copied from iproute2's addattr_l(). */ @@ -3374,7 +3389,7 @@ _new_from_nl_addr(struct nlmsghdr *nlh, gboolean id_only) /* Copied and heavily modified from libnl3's rtnl_route_parse() and parse_multipath(). */ static NMPObject * -_new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) +_new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only, ParseNlmsgIter *parse_nlmsg_iter) { static const struct nla_policy policy[] = { [RTA_TABLE] = {.type = NLA_U32}, @@ -3387,17 +3402,21 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) [RTA_METRICS] = {.type = NLA_NESTED}, [RTA_MULTIPATH] = {.type = NLA_NESTED}, }; + guint multihop_idx; const struct rtmsg *rtm; struct nlattr *tb[G_N_ELEMENTS(policy)]; + int addr_family; gboolean IS_IPv4; nm_auto_nmpobj NMPObject *obj = NULL; int addr_len; struct { - gboolean is_present; + gboolean found; + gboolean has_more; int ifindex; NMIPAddr gateway; } nh = { - .is_present = FALSE, + .found = FALSE, + .has_more = FALSE, }; guint32 mss; guint32 window = 0; @@ -3407,6 +3426,10 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) guint32 mtu = 0; guint32 lock = 0; + nm_assert((parse_nlmsg_iter->iter_more && parse_nlmsg_iter->ip6_route.next_multihop > 0) + || (!parse_nlmsg_iter->iter_more && parse_nlmsg_iter->ip6_route.next_multihop == 0)); + multihop_idx = parse_nlmsg_iter->ip6_route.next_multihop; + if (!nlmsg_valid_hdr(nlh, sizeof(*rtm))) return NULL; @@ -3416,9 +3439,11 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) * only handle ~supported~ routes. *****************************************************************/ - if (rtm->rtm_family == AF_INET) + addr_family = rtm->rtm_family; + + if (addr_family == AF_INET) IS_IPv4 = TRUE; - else if (rtm->rtm_family == AF_INET6) + else if (addr_family == AF_INET6) IS_IPv4 = FALSE; else return NULL; @@ -3436,57 +3461,76 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) /*****************************************************************/ - addr_len = IS_IPv4 ? sizeof(in_addr_t) : sizeof(struct in6_addr); + addr_len = nm_utils_addr_family_to_size(addr_family); if (rtm->rtm_dst_len > (IS_IPv4 ? 32 : 128)) return NULL; - /***************************************************************** - * parse nexthops. Only handle routes with one nh. - *****************************************************************/ - if (tb[RTA_MULTIPATH]) { - size_t tlen = nla_len(tb[RTA_MULTIPATH]); + size_t tlen; struct rtnexthop *rtnh; + guint idx; + tlen = nla_len(tb[RTA_MULTIPATH]); if (tlen < sizeof(*rtnh)) goto rta_multipath_done; rtnh = nla_data_as(struct rtnexthop, tb[RTA_MULTIPATH]); - if (tlen < rtnh->rtnh_len) goto rta_multipath_done; + idx = 0; while (TRUE) { - if (nh.is_present) { - /* we don't support multipath routes. */ - return NULL; - } - - nh.is_present = TRUE; - nh.ifindex = rtnh->rtnh_ifindex; - - if (rtnh->rtnh_len > sizeof(*rtnh)) { - struct nlattr *ntb[G_N_ELEMENTS(policy)]; + if (idx == multihop_idx) { + nh.found = TRUE; + nh.ifindex = rtnh->rtnh_ifindex; + if (rtnh->rtnh_len > sizeof(*rtnh)) { + struct nlattr *ntb[RTA_MAX + 1]; + + if (nla_parse_arr(ntb, + (struct nlattr *) RTNH_DATA(rtnh), + rtnh->rtnh_len - sizeof(*rtnh), + NULL) + < 0) + return NULL; - if (nla_parse_arr(ntb, - (struct nlattr *) RTNH_DATA(rtnh), - rtnh->rtnh_len - sizeof(*rtnh), - policy) - < 0) + if (_check_addr_or_return_null(ntb, RTA_GATEWAY, addr_len)) + memcpy(&nh.gateway, nla_data(ntb[RTA_GATEWAY]), addr_len); + } + } else if (nh.found) { + /* we just parsed a nexthop, but there is yet another hop afterwards. */ + nm_assert(idx == multihop_idx + 1); + if (IS_IPv4) { + /* for IPv4, multihop routes are currently not supported. + * + * If we ever support them, then the next-hop list is part of the NMPlatformIPRoute, + * that is, for IPv4 we truly have multihop routes. Unlike for IPv6. + * + * For now, just error out. */ return NULL; + } - if (_check_addr_or_return_null(ntb, RTA_GATEWAY, addr_len)) - memcpy(&nh.gateway, nla_data(ntb[RTA_GATEWAY]), addr_len); + /* For IPv6 multihop routes, we need to remember to iterate again. + * For each next-hop, we will create a distinct single-hop NMPlatformIP6Route. */ + nh.has_more = TRUE; + break; } if (tlen < RTNH_ALIGN(rtnh->rtnh_len) + sizeof(*rtnh)) - goto rta_multipath_done; + break; tlen -= RTNH_ALIGN(rtnh->rtnh_len); rtnh = RTNH_NEXT(rtnh); + idx++; } -rta_multipath_done:; + } + +rta_multipath_done: + + if (!nh.found && multihop_idx > 0) { + /* something is wrong. We are called back to collect multi_idx, but the index + * is not there. We messed up the book keeping. */ + return nm_assert_unreachable_val(NULL); } if (tb[RTA_OIF] || tb[RTA_GATEWAY] || tb[RTA_FLOW]) { @@ -3498,18 +3542,21 @@ rta_multipath_done:; if (_check_addr_or_return_null(tb, RTA_GATEWAY, addr_len)) memcpy(&gateway, nla_data(tb[RTA_GATEWAY]), addr_len); - if (!nh.is_present) { + if (!nh.found) { /* If no nexthops have been provided via RTA_MULTIPATH * we add it as regular nexthop to maintain backwards * compatibility */ - nh.ifindex = ifindex; - nh.gateway = gateway; - nh.is_present = TRUE; + nh.ifindex = ifindex; + nh.gateway = gateway; + nh.found = TRUE; } else { /* Kernel supports new style nexthop configuration, * verify that it is a duplicate and ignore old-style nexthop. */ - if (nh.ifindex != ifindex || memcmp(&nh.gateway, &gateway, addr_len) != 0) + if (nh.ifindex != ifindex || memcmp(&nh.gateway, &gateway, addr_len) != 0) { + /* we have a RTA_MULTIPATH attribute that does not agree. + * That seems not right. Error out. */ return NULL; + } } } @@ -3518,7 +3565,7 @@ rta_multipath_done:; * * Well, actually, for IPv6 kernel will always say that the device is * 1 (lo). Of course it does!! */ - if (nh.is_present) { + if (nh.found) { if (IS_IPv4) { if (nh.ifindex != 0 || nh.gateway.addr4 != 0) { /* we only accept kernel to notify about the ifindex/gateway, if it @@ -3536,7 +3583,7 @@ rta_multipath_done:; } } } else { - if (!nh.is_present) { + if (!nh.found) { /* a "normal" route needs a device. This is not the route we are looking for. */ return NULL; } @@ -3639,6 +3686,12 @@ rta_multipath_done:; obj->ip_route.r_rtm_flags = rtm->rtm_flags; obj->ip_route.rt_source = nmp_utils_ip_config_source_from_rtprot(rtm->rtm_protocol); + if (nh.has_more) { + parse_nlmsg_iter->iter_more = TRUE; + parse_nlmsg_iter->ip6_route.next_multihop = multihop_idx + 1; + } else + parse_nlmsg_iter->iter_more = FALSE; + return g_steal_pointer(&obj); } @@ -4080,7 +4133,8 @@ static NMPObject * nmp_object_new_from_nl(NMPlatform *platform, const NMPCache *cache, struct nl_msg *msg, - gboolean id_only) + gboolean id_only, + ParseNlmsgIter *parse_nlmsg_iter) { struct nlmsghdr *msghdr; @@ -4102,7 +4156,7 @@ nmp_object_new_from_nl(NMPlatform *platform, case RTM_NEWROUTE: case RTM_DELROUTE: case RTM_GETROUTE: - return _new_from_nl_route(msghdr, id_only); + return _new_from_nl_route(msghdr, id_only, parse_nlmsg_iter); case RTM_NEWRULE: case RTM_DELRULE: case RTM_GETRULE: @@ -6977,12 +7031,13 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events gboolean is_del = FALSE; gboolean is_dump = FALSE; NMPCache *cache = nm_platform_get_cache(platform); - - msghdr = nlmsg_hdr(msg); + ParseNlmsgIter parse_nlmsg_iter; if (!handle_events) return; + msghdr = nlmsg_hdr(msg); + if (NM_IN_SET(msghdr->nlmsg_type, RTM_DELLINK, RTM_DELADDR, @@ -6995,7 +7050,11 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events is_del = TRUE; } - obj = nmp_object_new_from_nl(platform, cache, msg, is_del); + parse_nlmsg_iter = (ParseNlmsgIter){ + .iter_more = FALSE, + }; + + obj = nmp_object_new_from_nl(platform, cache, msg, is_del, &parse_nlmsg_iter); if (!obj) { _LOGT("event-notification: %s: ignore", nl_nlmsghdr_to_str(msghdr, buf_nlmsghdr, sizeof(buf_nlmsghdr))); @@ -7023,7 +7082,7 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events NULL, 0)); - { + while (TRUE) { nm_auto_nmpobj const NMPObject *obj_old = NULL; nm_auto_nmpobj const NMPObject *obj_new = NULL; @@ -7085,7 +7144,10 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events * * This is to help with the performance overhead of a huge number of * routes, for example with the bird BGP software, that adds routes - * with RTPROT_BIRD protocol. */ + * with RTPROT_BIRD protocol. + * + * Even if this is a IPv6 multipath route, we abort (parse_nlmsg_iter). There + * is nothing for us to do. */ return; } @@ -7158,6 +7220,31 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events default: break; } + + if (!parse_nlmsg_iter.iter_more) { + /* we are done. */ + return; + } + + /* There is a special case here. For IPv6 routes, kernel will merge/mangle routes + * that only differ by their next-hop, and pretend they are multi-hop routes. We + * untangle them, and pretend there are only single-hop routes. Hence, one RTM_{NEW,DEL}ROUTE + * message might be about multiple IPv6 routes (NMPObject). So, now let's parse the next... */ + + nm_assert(NM_IN_SET(msghdr->nlmsg_type, RTM_NEWROUTE, RTM_DELROUTE)); + + nm_clear_pointer(&obj, nmp_object_unref); + + obj = nmp_object_new_from_nl(platform, cache, msg, is_del, &parse_nlmsg_iter); + if (!obj) { + /* we are done. Usually we don't expect this, because we were told that + * there would be another object to collect, but there isn't one. Something + * unusual happened. + * + * the only reason why this actually could happen is if the next-hop data + * is invalid -- we didn't verify that it would be valid when we set iter_more. */ + return; + } } } |