author     Thomas Haller <thaller@redhat.com>   2021-09-01 10:31:55 +0200
committer  Thomas Haller <thaller@redhat.com>   2021-09-16 17:30:25 +0200
commit     fe80b2d1ecd94639573a944633a5d960db316f60 (patch)
tree       f2f799c1b90d346a2514826f2d7066501ead0b06
parent     0978be5e43f142ec5c6062dcfe1c2f4aa834464b (diff)
download   NetworkManager-th/cloud-setup-fix-containers.tar.gz
cloud-setup: use suppress_prefixlength rule to honor non-default-routes in the main table (th/cloud-setup-fix-containers)
Background
==========

Imagine you run a container on your machine. Then the routing table might look like:

  default via 10.0.10.1 dev eth0 proto dhcp metric 100
  10.0.10.0/28 dev eth0 proto kernel scope link src 10.0.10.5 metric 100
  [...]
  10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
  10.42.1.2 dev cali02ad7e68ce1 scope link
  10.42.1.3 dev cali8fcecf5aaff scope link
  10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
  10.42.3.0/24 via 10.42.3.0 dev flannel.1 onlink

That is, there are other interfaces with subnets and specific routes. If nm-cloud-setup now configures rules:

  0:      from all lookup local
  30400:  from 10.0.10.5 lookup 30400
  32766:  from all lookup main
  32767:  from all lookup default

and

  default via 10.0.10.1 dev eth0 table 30400 proto static metric 10
  10.0.10.1 dev eth0 table 30400 proto static scope link metric 10

then these other subnets will also be reached via the default route. This container example is just one case where this is a problem. In general, if you have specific routes on another interface, then the default route in the 30400+ table will interfere badly.

The idea of nm-cloud-setup is to automatically configure the network for secondary IP addresses. When the user has special requirements, they should disable nm-cloud-setup and configure whatever they want. But the container use case is popular and important. It is not something where the user actively configures the network. This case needs to work better, out of the box. In general, nm-cloud-setup should work better with the existing network configuration.

Change
======

Add new routing tables 30200+ with the individual subnets of the interface:

  10.0.10.0/24 dev eth0 table 30200 proto static metric 10
  [...]
  default via 10.0.10.1 dev eth0 table 30400 proto static metric 10
  10.0.10.1 dev eth0 table 30400 proto static scope link metric 10

Also add more important routing rules with priority 30200+, which select these tables based on the source address:

  30200:  from 10.0.10.5 lookup 30200

These do source based routing for the subnets on these interfaces.

Then, add a rule with priority 30350

  30350:  lookup main suppress_prefixlength 0

which processes the routes from the main table, but ignores the default routes. 30350 was chosen because it lies between the 30200+ and 30400+ rules, leaving a range for the user to configure their own rules.

Then, as before, the rules 30400+ again look at the corresponding 30400+ table, to find a default route. Finally, process the main table again, this time honoring the default route. That is for packets that have a different source address.

This change means that source based routing is used for the subnets that are configured on the interface and for the default route. Whereas, if there are any more specific routes in the main table, they will be preferred over the default route.

Apparently Amazon Linux solves this differently, by not configuring a routing table for addresses on interface "eth0". That might be an alternative, but it is not clear to me what is special about eth0 to warrant this treatment. It also would imply that we somehow recognize this primary interface. In practice that would be doable by selecting the interface with "iface_idx" zero. Instead, choose this approach.

This is remotely similar to what WireGuard does for configuring the default route [1], however WireGuard uses fwmark to match the packets instead of the source address.

[1] https://www.wireguard.com/netns/#improved-rule-based-routing
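Putting these pieces together for the example above (interface eth0 with address 10.0.10.5 in the 10.0.10.0/28 subnet and iface_idx 0), the resulting lookup order would roughly be the following sketch; the exact table numbers are 30200/30400 plus the interface index:

  # ip rule show (sketch)
  0:      from all lookup local
  30200:  from 10.0.10.5 lookup 30200
  30350:  from all lookup main suppress_prefixlength 0
  30400:  from 10.0.10.5 lookup 30400
  32766:  from all lookup main
  32767:  from all lookup default

  # ip route show table 30200 (sketch)
  10.0.10.0/28 dev eth0 proto static metric 10

  # ip route show table 30400 (sketch)
  default via 10.0.10.1 dev eth0 proto static metric 10
  10.0.10.1 dev eth0 proto static scope link metric 10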
-rw-r--r--   man/nm-cloud-setup.xml      66
-rw-r--r--   src/nm-cloud-setup/main.c   36
2 files changed, 62 insertions, 40 deletions
diff --git a/man/nm-cloud-setup.xml b/man/nm-cloud-setup.xml
index 7493cc1d7f..976fc64724 100644
--- a/man/nm-cloud-setup.xml
+++ b/man/nm-cloud-setup.xml
@@ -256,7 +256,9 @@ ln -s /etc/systemd/system/timers.target.wants/nm-cloud-setup.timer /usr/lib/syst
Also, if the device is currently not activated in NetworkManager or if the currently
activated profile has a user-data <literal>org.freedesktop.nm-cloud-setup.skip=yes</literal>,
it is skipped.</para>
- <para>Then, the tool will change the runtime configuration of the device.
+ <para>If only one interface and one address are configured, then the tool does nothing
+ and leaves in place the automatic configuration that was obtained via DHCP.</para>
+ <para>Otherwise, the tool will change the runtime configuration of the device.
<itemizedlist>
<listitem>
<para>Add static IPv4 addresses for all the configured addresses from <literal>local-ipv4s</literal> with
@@ -267,15 +269,25 @@ ln -s /etc/systemd/system/timers.target.wants/nm-cloud-setup.timer /usr/lib/syst
<para>Choose a route table 30400 + the index of the interface and
add a default route <literal>0.0.0.0/0</literal>. The gateway
is the first IP address in the CIDR subnet block. For
- example, we might get a route <literal>"0.0.0.0/0 172.16.5.1 10 table=30401"</literal>.</para>
+ example, we might get a route <literal>"0.0.0.0/0 172.16.5.1 10 table=30400"</literal>.</para>
+ <para>Also choose a route table 30200 + the interface index. This
+ contains the direct routes to the subnets of this interface.</para>
</listitem>
<listitem>
<para>Finally, add a policy routing rule for each address. For example
- <literal>"priority 30401 from 172.16.5.3/32 table 30401, priority 30401 from 172.16.5.4/32 table 30401"</literal>.</para>
+ <literal>"priority 30200 from 172.16.5.3/32 table 30200, priority 30200 from 172.16.5.4/32 table 30200"</literal>.
+ and
+ <literal>"priority 30400 from 172.16.5.3/32 table 30400, priority 30400 from 172.16.5.4/32 table 30400"</literal>
+ The 30200+ rules select the table to reach the subnet directly, while the 30400+ rules use the
+ default route. Also add a rule
+ <literal>"priority 30350 table main suppress_prefixlength 0"</literal>. This has a priority between
+ the two previous rules and causes a lookup of routes in the main table while ignoring the default
+ route. The purpose of this is so that other specific routes in the main table are honored over
+ the default route in table 30400+.</para>
</listitem>
</itemizedlist>
With above example, this roughly corresponds for interface <literal>eth0</literal> to
- <command>nmcli device modify "eth0" ipv4.addresses "172.16.5.3/24,172.16.5.4/24" ipv4.routes "0.0.0.0/0 172.16.5.1 10 table=30401" ipv4.routing-rules "priority 30401 from 172.16.5.3/32 table 30401, priority 30401 from 172.16.5.4/32 table 30401"</command>.
+ <command>nmcli device modify "eth0" ipv4.addresses "172.16.5.3/24,172.16.5.4/24" ipv4.routes "172.16.5.0/24 0.0.0.0 10 table=30200, 0.0.0.0/0 172.16.5.1 10 table=30400" ipv4.routing-rules "priority 30200 from 172.16.5.3/32 table 30200, priority 30200 from 172.16.5.4/32 table 30200, priority 30350 table main suppress_prefixlength 0, priority 30400 from 172.16.5.3/32 table 30400, priority 30400 from 172.16.5.4/32 table 30400"</command>.
Note that this replaces the previous addresses, routes and rules with the new information.
But also note that this only changes the run time configuration of the device. The
connection profile on disk is not affected.
@@ -360,14 +372,8 @@ ln -s /etc/systemd/system/timers.target.wants/nm-cloud-setup.timer /usr/lib/syst
</listitem>
<listitem>
<para>At this point, we have a list of all interfaces (by MAC address) and their configured IPv4 addresses.</para>
- <para>For each device, we lookup the currently applied connection in NetworkManager. That implies, that the device is currently activated
- in NetworkManager. If no such device was in NetworkManager, or if the profile has user-data <literal>org.freedesktop.nm-cloud-setup.skip=yes</literal>,
- we skip the device. Now for each found IP address we add a static address "$ADDR/$SUBNET_PREFIX". Also we configure policy routing
- by adding a static route "$ADDR/$SUBNET_PREFIX $GATEWAY 10, table=$TABLE" where $GATEWAY is the first IP address in the subnet and table
- is 30400 plus the interface index. Also we add a policy routing rule "priority $TABLE from $ADDR/32 table $TABLE".</para>
- <para>The effect is not unlike calling
- <command>nmcli device modify "$DEVICE" ipv4.addresses "$ADDR/$SUBNET [,...]" ipv4.routes "$ADDR/32 $GATEWAY 10 table=$TABLE" ipv4.routing-rules "priority $TABLE from $ADDR/32 table $TABLE"</command>
- for all relevant devices and all found addresses.</para>
+ <para>Then the tool configures the system in the same way as for the AWS environment, that is, using
+ source based policy routing with the 30200+/30400+ tables and rules.</para>
</listitem>
</itemizedlist>
</refsect2>
@@ -389,9 +395,10 @@ ln -s /etc/systemd/system/timers.target.wants/nm-cloud-setup.timer /usr/lib/syst
of available interface. Interfaces are identified by their MAC address.</para>
</listitem>
<listitem>
- <para>Then for each interface fetch <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/vpc-cidr-block</literal>
- , <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/private-ipv4s</literal> and
- <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/netmask</literal>.
+ <para>Then for each interface fetch <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/vpc-cidr-block</literal>,
+ <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/private-ipv4s</literal>,
+ <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/netmask</literal> and
+ <literal>http://100.100.100.200/2016-01-01/meta-data/network/interfaces/macs/$MAC/gateway</literal>.
Thereby we get a list of private IPv4 addresses, one CIDR subnet block and private IPv4 addresses prefix.</para>
</listitem>
<listitem>
@@ -399,31 +406,10 @@ ln -s /etc/systemd/system/timers.target.wants/nm-cloud-setup.timer /usr/lib/syst
If no ethernet device for the respective MAC address is found, it is skipped.
Also, if the device is currently not activated in NetworkManager or if the currently
activated profile has a user-data <literal>org.freedesktop.nm-cloud-setup.skip=yes</literal>,
- it is skipped.</para>
- <para>Then, the tool will change the runtime configuration of the device.
- <itemizedlist>
- <listitem>
- <para>Add static IPv4 addresses for all the configured addresses from <literal>private-ipv4s</literal> with
- prefix length according to <literal>netmask</literal>. For example,
- we might have here 2 IP addresses like <literal>"10.0.0.150/24,10.0.0.152/24"</literal>.</para>
- </listitem>
- <listitem>
- <para>Choose a route table 30400 + the index of the interface and
- add a default route <literal>0.0.0.0/0</literal>. The gateway
- is the default gateway retrieved from metadata server. For
- example, we might get a route <literal>"0.0.0.0/0 10.0.0.253 10 table=30400"</literal>.</para>
- </listitem>
- <listitem>
- <para>Finally, add a policy routing rule for each address. For example
- <literal>"priority 30400 from 10.0.0.150/32 table 30400, priority 30400 from 10.0.0.152/32 table 30400"</literal>.</para>
- </listitem>
- </itemizedlist>
- With above example, this roughly corresponds for interface <literal>eth0</literal> to
- <command>nmcli device modify "eth0" ipv4.addresses "10.0.0.150/24,10.0.0.152/24" ipv4.routes "0.0.0.0/0 10.0.0.253 10 table=30400" ipv4.routing-rules "priority 30400 from 10.0.0.150/32 table 30400, priority 30400 from 10.0.0.152/32 table 30400"</command>.
- Note that this replaces the previous addresses, routes and rules with the new information.
- But also note that this only changes the run time configuration of the device. The
- connection profile on disk is not affected.
- </para>
+ it is skipped. Also, if there is only one interface with one IP address, the tool does nothing.</para>
+ <para>Then the tool configures the system in the same way as for the AWS environment, that is, using
+ source based policy routing with the 30200+/30400+ tables and rules. One difference to AWS is that the gateway
+ is fetched from the metadata service instead of being derived from the first IP address in the subnet.</para>
</listitem>
</itemizedlist>
</refsect2>
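As an illustration (not part of the manual page above), the Aliyun case with the addresses 10.0.0.150/24 and 10.0.0.152/24 and the gateway 10.0.0.253 from the former example would now roughly correspond, analogous to the AWS example, to:

  nmcli device modify "eth0" ipv4.addresses "10.0.0.150/24,10.0.0.152/24" ipv4.routes "10.0.0.0/24 0.0.0.0 10 table=30200, 0.0.0.0/0 10.0.0.253 10 table=30400" ipv4.routing-rules "priority 30200 from 10.0.0.150/32 table 30200, priority 30200 from 10.0.0.152/32 table 30200, priority 30350 table main suppress_prefixlength 0, priority 30400 from 10.0.0.150/32 table 30400, priority 30400 from 10.0.0.152/32 table 30400"

This is a sketch only; the exact table numbers depend on the interface index.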
diff --git a/src/nm-cloud-setup/main.c b/src/nm-cloud-setup/main.c
index 260d111205..916f41da91 100644
--- a/src/nm-cloud-setup/main.c
+++ b/src/nm-cloud-setup/main.c
@@ -4,6 +4,8 @@
#include "libnm-client-aux-extern/nm-libnm-aux.h"
+#include <linux/rtnetlink.h>
+
#include "nm-cloud-setup-utils.h"
#include "nmcs-provider-ec2.h"
#include "nmcs-provider-gcp.h"
@@ -335,6 +337,8 @@ _nmc_mangle_connection(NMDevice * device,
* We don't need to configure policy routing in this case. */
NM_SET_OUT(out_skipped_single_addr, TRUE);
} else if (config_data->has_ipv4s && config_data->has_cidr) {
+ gs_unref_hashtable GHashTable *unique_subnets =
+ g_hash_table_new(nm_direct_hash, g_direct_equal);
NMIPAddress * addr_entry;
NMIPRoute * route_entry;
NMIPRoutingRule *rule_entry;
@@ -359,6 +363,38 @@ _nmc_mangle_connection(NMDevice * device,
((guint8 *) &gateway)[3] += 1;
}
+ for (i = 0; i < config_data->ipv4s_len; i++) {
+ in_addr_t a = config_data->ipv4s_arr[i];
+
+ a = nm_utils_ip4_address_clear_host_address(a, config_data->cidr_prefix);
+
+ G_STATIC_ASSERT_EXPR(sizeof(gsize) >= sizeof(in_addr_t));
+ if (g_hash_table_add(unique_subnets, GSIZE_TO_POINTER(a))) {
+ route_entry =
+ nm_ip_route_new_binary(AF_INET, &a, config_data->cidr_prefix, NULL, 10, NULL);
+ nm_ip_route_set_attribute(route_entry,
+ NM_IP_ROUTE_ATTRIBUTE_TABLE,
+ g_variant_new_uint32(30200 + config_data->iface_idx));
+ g_ptr_array_add(routes_new, route_entry);
+ }
+
+ rule_entry = nm_ip_routing_rule_new(AF_INET);
+ nm_ip_routing_rule_set_priority(rule_entry, 30200 + config_data->iface_idx);
+ nm_ip_routing_rule_set_from(rule_entry,
+ _nm_utils_inet4_ntop(config_data->ipv4s_arr[i], sbuf),
+ 32);
+ nm_ip_routing_rule_set_table(rule_entry, 30200 + config_data->iface_idx);
+ nm_assert(nm_ip_routing_rule_validate(rule_entry, NULL));
+ g_ptr_array_add(rules_new, rule_entry);
+ }
+
+ rule_entry = nm_ip_routing_rule_new(AF_INET);
+ nm_ip_routing_rule_set_priority(rule_entry, 30350);
+ nm_ip_routing_rule_set_table(rule_entry, RT_TABLE_MAIN);
+ nm_ip_routing_rule_set_suppress_prefixlength(rule_entry, 0);
+ nm_assert(nm_ip_routing_rule_validate(rule_entry, NULL));
+ g_ptr_array_add(rules_new, rule_entry);
+
route_entry = nm_ip_route_new_binary(AF_INET, &nm_ip_addr_zero, 0, &gateway, 10, NULL);
nm_ip_route_set_attribute(route_entry,
NM_IP_ROUTE_ATTRIBUTE_TABLE,
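A note on the subnet handling in the hunk above: each address is masked to its subnet with nm_utils_ip4_address_clear_host_address() before it is added as a route to the 30200+ table, and the unique_subnets hash table ensures that several addresses in the same subnet produce only one such route. A minimal standalone sketch of that masking step (the helper name clear_host_address() and the example values are illustrative, not NetworkManager API):

  /* Sketch: clear the host bits of an IPv4 address for a given prefix length,
   * similar in effect to nm_utils_ip4_address_clear_host_address() above. */
  #include <arpa/inet.h>
  #include <stdint.h>
  #include <stdio.h>

  static in_addr_t
  clear_host_address(in_addr_t addr, unsigned prefix)
  {
      /* addr is in network byte order; build the netmask in host order, then convert it. */
      uint32_t mask = (prefix == 0) ? 0u : htonl(0xFFFFFFFFu << (32 - prefix));

      return addr & mask;
  }

  int
  main(void)
  {
      struct in_addr addr;
      struct in_addr subnet;
      char           buf[INET_ADDRSTRLEN];

      inet_pton(AF_INET, "10.0.10.5", &addr);
      subnet.s_addr = clear_host_address(addr.s_addr, 28);
      /* prints "10.0.10.0", the destination of the table 30200 subnet route */
      printf("%s\n", inet_ntop(AF_INET, &subnet, buf, sizeof(buf)));
      return 0;
  }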