path: root/src/nspawn
Commit message [Author, Age, Files, Lines]
* tree-wide: add size limits for tmpfs mounts [Topi Miettinen, 2020-05-13, 2 files, -14/+14]
|   Limit the size of various tmpfs mounts to 10% of RAM, except for the volatile root and /var, which are limited to 25%. Another exception is made for /dev (including the /dev instances set up for PrivateDevices=) and /sys/fs/cgroup, since few or no regular files are expected there. In addition, since directories, symbolic links, device nodes and xattrs are not counted towards the size= limit, the number of inodes is limited correspondingly as well: a 4MB size translates to 1k inodes (assuming 4k per inode), 10% of RAM (using 16GB of RAM as the baseline) to 400k inodes, and 25% to 1M inodes. Because the nr_inodes= option cannot take a ratio the way size= can, there is an unfortunate side effect: on systems with little memory the inode limit may be on the large side. Also, on an extremely small device with only 256MB of RAM, 10% of RAM for /run may not be enough to re-execute PID 1, which requires 16MB of free space.
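The inode figures above are the size limit divided by an assumed 4k per inode. A minimal sketch of that arithmetic, assuming binary units (4MB = 4 MiB, 16GB = 16 GiB); `inodes_for_size()` and `percent_of()` are hypothetical helpers written for this illustration, not systemd functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of inodes matching a size limit when every
 * inode is assumed to account for `bytes_per_inode` bytes (4 KiB here). */
static uint64_t inodes_for_size(uint64_t size_bytes, uint64_t bytes_per_inode) {
        assert(bytes_per_inode > 0);
        return size_bytes / bytes_per_inode;
}

/* A size limit given as a percentage of total RAM. Multiplying before
 * dividing keeps results such as 25% of 16 GiB exact. */
static uint64_t percent_of(uint64_t total_bytes, unsigned percent) {
        return total_bytes * percent / 100;
}
```

Under these assumptions, 10% of 16 GiB yields 419430 inodes and 25% yields 1048576, which the commit message rounds to 400k and 1M; 4 MiB yields 1024 (1k).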
* Merge pull request #15623 from poettering/cmsg-cleanup [Zbigniew Jędrzejewski-Szmek, 2020-05-08, 1 file, -5/+2]
|\  various CMSG_xyz clean-ups, split out of #15571
| * tree-wide: make sure our control buffers are properly aligned [Lennart Poettering, 2020-05-07, 1 file, -5/+2]
| |   We always need to make them unions with a "struct cmsghdr" in them, so that things are properly aligned. Otherwise we might end up at an unaligned address and the counting goes all wrong, possibly making the kernel refuse our buffers. Also, let's make sure we initialize the control buffers to zero when sending, but leave them uninitialized when reading. Both the alignment and the initialization requirements are mentioned in the cmsg(3) man page.
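The pattern described above (and in cmsg(3)) can be sketched as follows. `send_one_fd()` is a hypothetical helper written for illustration, not systemd's own code; it shows the union that forces alignment and the zeroing done on the sending side. A receiver would declare the same union but, per the commit, need not initialize it.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch: send one byte of payload plus one file descriptor (SCM_RIGHTS)
 * over an AF_UNIX socket. Returns 0 on success, -errno on failure. */
static int send_one_fd(int sock, int fd) {
        char byte = 'x';
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

        /* A union with struct cmsghdr gives the buffer the alignment that
         * CMSG_FIRSTHDR() and CMSG_DATA() expect. */
        union {
                struct cmsghdr align;
                char buf[CMSG_SPACE(sizeof(int))];
        } control;

        assert(sock >= 0);
        assert(fd >= 0);

        /* When sending, zero the whole control buffer, padding included. */
        memset(&control, 0, sizeof(control));

        struct msghdr mh = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = control.buf,
                .msg_controllen = sizeof(control.buf),
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        if (sendmsg(sock, &mh, MSG_NOSIGNAL) < 0)
                return -errno;
        return 0;
}
```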
* | Merge pull request #15681 from vcaputo/buslocator [Vito Caputo, 2020-05-07, 1 file, -52/+9]
|\ \   *: switch to BusLocator-oriented helpers
| * | nspawn: switch to BusLocator-oriented helpers [Vito Caputo, 2020-05-07, 1 file, -52/+9]
| |/    Mechanical substitution reducing some verbosity
* | basic/set: let set_put_strdup() create the set with string hash ops [Zbigniew Jędrzejewski-Szmek, 2020-05-06, 1 file, -7/+3]
|/    If we're using a set with _put_strdup(), most of the time we want to use string hash ops on the set, and free the strings when done. This defines a new string_hash_ops_free structure that automatically frees the keys when the set is freed, and makes set_put_strdup() and set_put_strdupv() instantiate the set with those hash ops. hashmap_put_strdup() was already doing something similar. (It is OK to instantiate the set earlier, possibly with a different hash ops structure; set_put_strdup() will then use the existing set. It is also OK to call set_free_free() instead of set_free() on a set with string_hash_ops_free: the effect is the same, we're just overriding the override of the cleanup function.) No functional change intended.
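As a rough model of the behavior described above, the toy code below creates the set lazily with string-freeing ops on the first put, and otherwise reuses whatever set (and ops) already exists. It is an assumption-laden sketch: `ToySet`, `ToyOps` and the `toy_` functions are hypothetical, and the set is a plain array rather than the hash table systemd actually uses.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for hash ops: only a comparison and a key destructor. */
typedef struct ToyOps {
        int (*compare)(const void *a, const void *b);
        void (*free_key)(void *p);
} ToyOps;

static int string_compare(const void *a, const void *b) {
        return strcmp(a, b);
}

/* Counterpart of string_hash_ops_free: compare as strings, free() the keys. */
static const ToyOps toy_string_ops_free = { string_compare, free };

typedef struct ToySet {
        const ToyOps *ops;
        char **items;
        size_t n, allocated;
} ToySet;

static ToySet *toy_set_new(const ToyOps *ops) {
        assert(ops);
        ToySet *s = calloc(1, sizeof(ToySet));
        if (s)
                s->ops = ops;
        return s;
}

/* Frees the set; keys are freed only if the ops carry a destructor. */
static ToySet *toy_set_free(ToySet *s) {
        if (!s)
                return NULL;
        if (s->ops->free_key)
                for (size_t i = 0; i < s->n; i++)
                        s->ops->free_key(s->items[i]);
        free(s->items);
        free(s);
        return NULL;
}

/* Creates the set with toy_string_ops_free on first use, otherwise reuses the
 * existing set and its ops. Returns 1 if added, 0 if already present,
 * -ENOMEM on allocation failure. */
static int toy_set_put_strdup(ToySet **s, const char *p) {
        assert(s);
        assert(p);

        if (!*s) {
                *s = toy_set_new(&toy_string_ops_free);
                if (!*s)
                        return -ENOMEM;
        }

        for (size_t i = 0; i < (*s)->n; i++)
                if ((*s)->ops->compare((*s)->items[i], p) == 0)
                        return 0;

        if ((*s)->n == (*s)->allocated) {
                size_t na = (*s)->allocated ? (*s)->allocated * 2 : 4;
                char **items = realloc((*s)->items, na * sizeof(char *));
                if (!items)
                        return -ENOMEM;
                (*s)->items = items;
                (*s)->allocated = na;
        }

        char *copy = strdup(p);
        if (!copy)
                return -ENOMEM;
        (*s)->items[(*s)->n++] = copy;
        return 1;
}
```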
* nspawn: mount custom paths before writing to /etc [Motiejus Jakštys, 2020-05-05, 1 file, -10/+10]
|   Consider this configuration:
|
|       $ systemd-nspawn --read-only --timezone=copy --resolv-conf=copy-host \
|             --overlay="+/etc::/etc" <...>
|
|   Suppose one wants `/` to be read-only while DNS and `/etc/localtime` still work. One way to do it is to create an overlay filesystem in `/etc/`. However, systemd-nspawn tries to create `/etc/resolv.conf` and `/etc/localtime` before mounting the custom paths, while `/` (and, by extension, `/etc`) is still read-only, so it fails to create those files.
|
|   Mounting custom paths before modifying anything in `/etc/` makes this possible. Full example:
|
|   ```
|   $ debootstrap buster /var/lib/machines/t1 http://deb.debian.org/debian
|   $ systemd-nspawn --private-users=false --timezone=copy --resolv-conf=copy-host --read-only --tmpfs=/var --tmpfs=/run --overlay="+/etc::/etc" -D /var/lib/machines/t1 ping -c 1 example.com
|   Spawning container t1 on /var/lib/machines/t1.
|   Press ^] three times within 1s to kill container.
|   ping: example.com: Temporary failure in name resolution
|   Container t1 failed with error code 130.
|   ```
|
|   With the patch:
|
|   ```
|   $ sudo ./build/systemd-nspawn --private-users=false --timezone=copy --resolv-conf=copy-host --read-only --tmpfs=/var --tmpfs=/run --overlay="+/etc::/etc" -D /var/lib/machines/t1 ping -qc 1 example.com
|   Spawning container t1 on /var/lib/machines/t1.
|   Press ^] three times within 1s to kill container.
|   PING example.com (93.184.216.34) 56(84) bytes of data.
|
|   --- example.org ping statistics ---
|   1 packets transmitted, 1 received, 0% packet loss, time 0ms
|   rtt min/avg/max/mdev = 110.912/110.912/110.912/0.000 ms
|   Container t1 exited successfully.
|   ```
* nspawn: be more careful with creating/chowning directories to overmount [Lennart Poettering, 2020-04-28, 2 files, -19/+22]
|   We should never re-chown selinuxfs.
|
|   Fixes: #15475
* socket-util: introduce type-safe, dereferencing wrapper CMSG_FIND_DATA around cmsg_find() [Lennart Poettering, 2020-04-23, 1 file, -11/+2]
|   Let's take this one step further: add type safety to cmsg_find(), and imply the CMSG_DATA() macro for finding the cmsg payload.
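The idea of a type-safe, dereferencing lookup can be sketched as below. The `cmsg_find()` signature, the `(socklen_t) -1` wildcard, and the macro body are assumptions made for this illustration rather than a copy of systemd's socket-util code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Return the first control message with the given level and type whose
 * cmsg_len equals `length`, or NULL. Passing (socklen_t) -1 as `length`
 * skips the size check (an assumption of this sketch). */
static struct cmsghdr *cmsg_find(struct msghdr *mh, int level, int type, socklen_t length) {
        assert(mh);

        for (struct cmsghdr *c = CMSG_FIRSTHDR(mh); c; c = CMSG_NXTHDR(mh, c))
                if (c->cmsg_level == level &&
                    c->cmsg_type == type &&
                    (length == (socklen_t) -1 || length == c->cmsg_len))
                        return c;

        return NULL;
}

/* Type-safe, dereferencing wrapper: evaluates to a `ctype *` pointing at the
 * payload, or NULL. The expected cmsg_len is derived from sizeof(ctype), so a
 * message of the wrong size is not matched. Uses a GNU C statement
 * expression, so it needs GCC or Clang. */
#define CMSG_FIND_DATA(mh, level, type, ctype)                                      \
        ({                                                                          \
                struct cmsghdr *_found =                                            \
                        cmsg_find((mh), (level), (type), CMSG_LEN(sizeof(ctype)));  \
                (ctype *) (_found ? CMSG_DATA(_found) : NULL);                      \
        })
```

Deriving the expected `cmsg_len` from `sizeof(ctype)` is what provides the type safety here: asking for a `long long` payload when an `int` was sent yields NULL instead of a misread value.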
* Merge pull request #15504 from poettering/cmsg-find-pure [Lennart Poettering, 2020-04-23, 1 file, -6/+5]
|\  just the recvmsg_safe() stuff from #15457
| * tree-wide: use recvmsg_safe() at various places [Lennart Poettering, 2020-04-23, 1 file, -6/+5]
| |   Let's be extra careful whenever we return from recvmsg() and see MSG_CTRUNC set. This generally means we ran into a programming error, as we didn't size the control buffer large enough. It's an error condition we should at least log about, or propagate up. Hence do that.
| |
| |   This is particularly important when receiving fds, since for those the control data can be of any size. In particular on stream sockets that's nasty, because if we miss an fd because of control data truncation we cannot recover; we might not even realize that we are one off.
| |
| |   (Also, when failing early, if there's any chance the socket might be AF_UNIX, let's close all received fds, all the time. We got this right most of the time, but there were a few cases missing. God, UNIX is hard to use.)
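A minimal sketch of the rule described above: treat MSG_CTRUNC as an error and close any fds that did arrive. `recvmsg_checked()` and `cmsg_close_all()` are hypothetical names, and returning -EXFULL is a choice made for this sketch; systemd's recvmsg_safe() may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Close every fd delivered in SCM_RIGHTS messages that made it into mh. */
static void cmsg_close_all(struct msghdr *mh) {
        for (struct cmsghdr *c = CMSG_FIRSTHDR(mh); c; c = CMSG_NXTHDR(mh, c)) {
                if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
                        continue;

                size_t n = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
                for (size_t i = 0; i < n; i++) {
                        int fd;
                        memcpy(&fd, CMSG_DATA(c) + i * sizeof(int), sizeof(int));
                        close(fd);
                }
        }
}

/* recvmsg() wrapper that refuses truncated control data: if MSG_CTRUNC is
 * set, any fds that did arrive are closed and -EXFULL is returned, so the
 * caller cannot silently lose track of descriptors. */
static ssize_t recvmsg_checked(int sock, struct msghdr *mh, int flags) {
        assert(mh);

        ssize_t n = recvmsg(sock, mh, flags);
        if (n < 0)
                return -errno;

        if (mh->msg_flags & MSG_CTRUNC) {
                cmsg_close_all(mh);
                return -EXFULL;
        }

        return n;
}
```

Receiving with no control buffer at all while the peer attaches an fd is the simplest way to trigger MSG_CTRUNC on Linux, which makes the wrapper easy to exercise.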
* | nspawn: refuse politely when we are run in the non-host netns in combination with --image= [Lennart Poettering, 2020-04-23, 1 file, -0/+56]
| |   Strictly speaking this doesn't really fix #15079, but it at least means we won't hang anymore.
| |
| |   Fixes: #15079
* | nspawn: minor simplification [Lennart Poettering, 2020-04-23, 1 file, -6/+3]
|/
* Merge pull request #15516 from poettering/nspawn-resolv-conf [Zbigniew Jędrzejewski-Szmek, 2020-04-23, 3 files, -16/+37]
|\  beef up --resolv-conf= options of systemd-nspawn
| * nspawn: beef up --resolv-conf= modes [Lennart Poettering, 2020-04-22, 3 files, -16/+37]
| |   Let's add flavours for copying the stub/uplink resolv.conf versions.
| |
| |   Let's add a more brutal "replace" mode, where we'll replace any existing destination file.
| |
| |   Let's also change what "auto" means: instead of copying the static file, let's use the stub file, so that DNS search info is copied over.
| |
| |   Fixes: #15340
* | nspawn: some minor modernizations [Lennart Poettering, 2020-04-23, 1 file, -7/+11]
|/
* tree-wide: fix spelling errors [Frantisek Sumsal, 2020-04-21, 1 file, -1/+1]
|   Based on a report from Fossies.org using Codespell.
|
|   Follow-up to #15436
* tree-wide: spellcheck using codespell [Zbigniew Jędrzejewski-Szmek, 2020-04-16, 1 file, -1/+1]
|   Fixes #15436.
* *: convert amenable fdopendir() calls to take_fdopendir() [Vito Caputo, 2020-03-31, 1 file, -2/+2]
|   Some fdopendir() calls remain where safe_close() is performed manually. Those could be simplified as well by converting them to the _cleanup_close_ machinery, but that makes things less trivial to review, so it is left for a future cleanup.
* *: convert amenable fdopen calls to take_fdopen [Vito Caputo, 2020-03-31, 1 file, -4/+2]
|   Mechanical change to eliminate some cruft by using the new take_fdopen{_unlocked}() wrappers where trivial.
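Both conversions rely on the same ownership-transfer idea: once the stream wraps the descriptor, the caller's variable is reset to -1 so that a `_cleanup_close_`-style handler does not close it a second time. A sketch under that assumption; these are not systemd's exact definitions, the `_unlocked` variant is omitted, and `open_for_reading()` is a hypothetical caller.

```c
#define _POSIX_C_SOURCE 200809L   /* fdopen(), fdopendir(), O_CLOEXEC */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Wrap *fd in a FILE*. On success the stream owns the descriptor and *fd is
 * set to -1. On failure *fd is left alone and still belongs to the caller. */
static FILE *take_fdopen(int *fd, const char *mode) {
        assert(fd);

        FILE *f = fdopen(*fd, mode);
        if (!f)
                return NULL;

        *fd = -1;
        return f;
}

/* The same ownership transfer for directory streams. */
static DIR *take_fdopendir(int *dfd) {
        assert(dfd);

        DIR *d = fdopendir(*dfd);
        if (!d)
                return NULL;

        *dfd = -1;
        return d;
}

/* Example caller: the fd is closed here only if it was not handed over. */
static FILE *open_for_reading(const char *path) {
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return NULL;

        FILE *f = take_fdopen(&fd, "r");
        if (fd >= 0)
                close(fd);
        return f;
}
```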
* pid1, nspawn: voidify loopback_setup() [Yu Watanabe, 2020-03-04, 1 file, -1/+1]
|
* tree-wide: fix spelling of lookup and setup verbs [Zbigniew Jędrzejewski-Szmek, 2020-03-03, 1 file, -1/+1]
|   "set up" and "look up" are the verbs, "setup" and "lookup" are the nouns.
* nspawn: voidify umount_verbose() [Yu Watanabe, 2020-01-31, 1 file, -1/+1]
|   Fixes CID#1415122.
* nspawn: fsck all images when mounting things [Lennart Poettering, 2020-01-29, 1 file, -4/+8]
|   Also, start logging about mount errors; things are hard to debug otherwise.
* Merge pull request #14390 from poettering/gpt-var-tmp [Zbigniew Jędrzejewski-Szmek, 2020-01-14, 1 file, -2/+2]
|\  introduce GPT partition types for /var and /var/tmp and support them for auto-discovery
| * docs: import discoverable partitions spec [Lennart Poettering, 2019-12-23, 1 file, -1/+1]
| |   This was previously available here:
| |
| |   https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
| |
| |   Let's pull it into our repository.
| * dissect: introduce new recognizable partition types for /var and /var/tmp [Lennart Poettering, 2019-12-23, 1 file, -1/+1]
| |   This has been requested many times before. Let's add it finally.
| |
| |   GPT auto-discovery for /var is a bit more complex than for other partition types: the other partitions can to some degree be shared between multiple OS installations on the same disk (think: swap, /home, /srv). However, /var is inherently something bound to an installation, i.e. specific to its identity, or actually *is* its identity, and hence something that cannot be shared.
| |
| |   To deal with this, the new code is particularly careful when it comes to /var: it will not mount things blindly, but insists that the UUID of the partition matches a hashed version of the machine-id of the installation, so that each installation has a very specific /var associated with it, and would never use any other. (We actually use HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id, since machine-id is something we want to keep somewhat private.)
| |
| |   Setting the right UUID for installations takes extra care. To make things a bit simpler to set up, we skip this safety check for nspawn and RootImage= in unit files, under the assumption that such container and service images are unlikely to have multiple installations on them. The check is hence only required when booting full machines, i.e. in systemd-gpt-auto-generator.
| |
| |   To help with putting together images for full machines, PR #14368 introduces a repartition tool that can automatically fill in correctly calculated UUIDs on first boot if images have the /var partition UUID initialized to all zeroes. With that in place, systems can be put together in a way that on first boot the machine ID is determined and the partition table automatically adjusted to have the /var partition with the right UUID.
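Stated as a formula, the /var check compares the partition UUID against a keyed hash. This is a sketch that assumes the 256-bit HMAC output is truncated to the 128 bits of a UUID; how the UUID version bits are then set is left to the discoverable partitions spec.

```latex
\mathrm{UUID}_{\text{/var}} = \operatorname{trunc}_{128}\Bigl(\operatorname{HMAC\text{-}SHA256}\bigl(K = \text{machine-id},\; m = \text{GPT type UUID of /var}\bigr)\Bigr)
```

A full machine accepts a /var partition only if its UUID equals this value; an all-zero partition UUID is the placeholder that the repartition tool from PR #14368 replaces on first boot.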
* | Merge pull request #14381 from keszybz/ifindex-cleanup [Lennart Poettering, 2020-01-13, 1 file, -18/+10]
|\ \   Resolve alternative names
| * | Resolve alternative ifnames wherever we would resolve an interface name [Zbigniew Jędrzejewski-Szmek, 2020-01-12, 1 file, -16/+8]
| | |   To keep the names manageable, "ifname_or_ifindex" is replaced by "interface".
| * | tree-wide: make parse_ifindex simply return the index [Zbigniew Jędrzejewski-Szmek, 2020-01-11, 1 file, -7/+7]
| | |   We don't need a separate output parameter of type int. glibc says that the type is "unsigned", but the kernel thinks it's "int". And the "alternative names" interface also uses ints. So let's standardize on ints, since it's clearly not realistic to have interface numbers in the upper half of the unsigned int range.
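A sketch of the "return the index directly" convention, with negative errno-style values for errors. `resolve_interface()` is a hypothetical helper that also accepts a name; it consults only if_nametoindex(3) and does not attempt the alternative-name resolution described in the previous entry.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <net/if.h>
#include <stdlib.h>

/* Parse a decimal interface index and return it directly (> 0), or a
 * negative errno-style error, instead of filling in an output parameter. */
static int parse_ifindex(const char *s) {
        char *end;
        long v;

        assert(s);

        if (*s == '\0')
                return -EINVAL;

        errno = 0;
        v = strtol(s, &end, 10);
        if (*end != '\0')
                return -EINVAL;
        if (errno == ERANGE || v > INT_MAX)
                return -ERANGE;
        if (v <= 0)
                return -EINVAL;

        return (int) v;
}

/* Accept either a numeric index or an interface name. */
static int resolve_interface(const char *s) {
        assert(s);

        int r = parse_ifindex(s);
        if (r > 0)
                return r;

        unsigned idx = if_nametoindex(s);
        if (idx == 0 || idx > INT_MAX)
                return -ENODEV;

        return (int) idx;
}
```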
* | | nspawn: Correct "container" to "host" MAC setting message [rhn, 2020-01-11, 1 file, -1/+1]
|/ /
* | nspawn: set original ifname as alternative if it is truncated [Yu Watanabe, 2020-01-07, 1 file, -10/+55]
| |
* | nspawn: Make a custom mount on root imply --read-only. [Daan De Meyer, 2020-01-03, 1 file, -0/+3]
| |
* | nspawn: Don't mount read-only if we have a custom mount on root. [Daan De Meyer, 2020-01-03, 3 files, -1/+16]
| |
* | Merge pull request #14401 from DaanDeMeyer/nspawn-move-veth-back-to-host [Lennart Poettering, 2020-01-03, 3 files, -30/+92]
|\ \   nspawn: move virtual interfaces added with --network-interface back to the host
| * | nspawn: Move --network-interface interfaces back to the host. [Daan De Meyer, 2020-01-02, 3 files, -10/+48]
| | |
| * | nspawn-network: Split off udev checking from parse_interface. [Daan De Meyer, 2019-12-23, 3 files, -20/+44]
| |/
* | nspawn: Generate unique short veth names [Kai Krakow, 2020-01-02, 1 file, -10/+57]
|/    This commit lowers the chance of veth name conflicts for machines created with similar names.
|
|     Replaces: #12865
|     Fixes: #13417
* core: create inaccessible nodes for users when making runtime dirs [Anita Zhang, 2019-12-18, 2 files, -4/+8]
|   To support ProtectHome=y in a user namespace (which mounts the inaccessible nodes), the nodes need to be accessible by the user. Create these paths and devices in the user runtime directory so they can be used later if needed.
* Merge pull request #14208 from poettering/json-homed-prepare [Yu Watanabe, 2019-12-17, 1 file, -15/+4]
|\  json bits from homed PR
| * nspawn-oci: use new json_variant_strv() helper [Lennart Poettering, 2019-12-02, 1 file, -14/+3]
| |
| * json: add flags parameter to json_parse_file(), for parsing "sensitive" data [Lennart Poettering, 2019-12-02, 1 file, -1/+1]
| |   This will call json_variant_sensitive() internally while parsing for each allocated sub-variant. This is better than calling it a posteriori at the end, because partially parsed variants will always be properly erased from memory this way.
* | nspawn: fix overlay with automatic temporary tree [Lennart Poettering, 2019-12-13, 1 file, -17/+41]
| |   This makes --overlay=+/foobar::/foobar work again, i.e. where the middle parameter is left out. According to the documentation, this is supposed to generate a temporary writable work place in the middle. But it apparently never did. Weird.
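A sketch of how such a colon-separated spec might be split so that an empty middle field is detected rather than lost. The `OverlaySpec` type and `parse_overlay()` are hypothetical, escaping of `:` is ignored, the `+` prefix is kept verbatim, and the sketch only accepts the three-or-more-field form.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_OVERLAY_PATHS 8

/* Hypothetical parse result for LOWER[:LOWER...]:UPPER:DEST. The pointers
 * refer into the caller's buffer, which is modified in place. */
typedef struct OverlaySpec {
        const char *lower[MAX_OVERLAY_PATHS];
        size_t n_lower;
        const char *upper;      /* NULL if the field was empty: use a temporary tree */
        const char *dest;
} OverlaySpec;

/* Split `buf` at every ':' (empty fields are kept) and assign the fields.
 * Returns 0, -EINVAL for fewer than three fields, or -E2BIG for too many. */
static int parse_overlay(char *buf, OverlaySpec *spec) {
        const char *parts[MAX_OVERLAY_PATHS + 2];
        size_t n = 0;
        char *p = buf;

        assert(buf);
        assert(spec);

        for (;;) {
                if (n >= MAX_OVERLAY_PATHS + 2)
                        return -E2BIG;
                parts[n++] = p;

                char *colon = strchr(p, ':');
                if (!colon)
                        break;
                *colon = '\0';
                p = colon + 1;
        }

        if (n < 3)
                return -EINVAL;

        memset(spec, 0, sizeof(*spec));
        for (size_t i = 0; i < n - 2; i++)
                spec->lower[spec->n_lower++] = parts[i];
        spec->upper = parts[n - 2][0] != '\0' ? parts[n - 2] : NULL;
        spec->dest = parts[n - 1];
        return 0;
}
```

Splitting with strchr() rather than strtok() matters here: strtok() collapses adjacent separators, so the empty upper field in `+/foobar::/foobar` would silently disappear.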
* | Merge pull request #14269 from DaanDeMeyer/enable-mounts-on-root [Lennart Poettering, 2019-12-13, 3 files, -61/+58]
|\ \   nspawn: Enable specifying root as the mount target directory.
| * | nspawn-mount: Use FLAGS_SET to check flags. [Daan De Meyer, 2019-12-12, 1 file, -14/+14]
| | |
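FLAGS_SET() tests that every requested bit is present, which differs from a plain `&` test. A small sketch; the macro body reflects the usual systemd definition but should be treated as an assumption, and the `MOUNT_*` flags and `is_readonly_bind()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True only if *all* bits of `flags` are set in `v`. */
#define FLAGS_SET(v, flags) ((~(v) & (flags)) == 0)

/* Hypothetical mount flags, for illustration only. */
enum {
        MOUNT_RDONLY  = 1 << 0,
        MOUNT_BIND    = 1 << 1,
        MOUNT_OVERLAY = 1 << 2,
};

/* A plain `v & flags` would already be true if only one of the bits were set. */
static_assert(FLAGS_SET(MOUNT_RDONLY | MOUNT_BIND, MOUNT_BIND), "single bit present");
static_assert(!FLAGS_SET(MOUNT_BIND, MOUNT_RDONLY | MOUNT_BIND), "one of two bits missing");

static bool is_readonly_bind(unsigned long mount_flags) {
        return FLAGS_SET(mount_flags, MOUNT_RDONLY | MOUNT_BIND);
}
```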
| * | nspawn: Only bind-mount directory when necessary. [Daan De Meyer, 2019-12-12, 1 file, -7/+7]
| | |
| * | nspawn-mount: Remove unused parameters [Daan De Meyer, 2019-12-12, 3 files, -33/+12]
| | |
| * | nspawn: Enable specifying root as the mount target directory. [Daan De Meyer, 2019-12-12, 3 files, -13/+31]
| | |   Fixes #3847.
* | | nspawn: allow combination of private-network and network-namespace-path [Shengjing Zhu, 2019-12-12, 1 file, -3/+3]
| | |   Fixes: #14289
* | | tree-wide: use SD_ID128_STRING_MAX where appropriate [Lennart Poettering, 2019-12-10, 1 file, -1/+1]
| | |