| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var
to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and
/sys/fs/cgroup since no (or very few) regular files are expected to be used.
In addition, since directories, symbolic links, device specials and xattrs are
not counted towards the size= limit, number of inodes is also limited
correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of
RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes.
Because nr_inodes option can't use ratios like size option, there's an
unfortunate side effect that with small memory systems the limit may be on the
too large side. Also, on an extremely small device with only 256MB of RAM, 10%
of RAM for /run may not be enough for re-exec of PID1 because 16MB of free
space is required.
|
|\
| |
| | |
various CMSG_xyz clean-ups, split out of #15571
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We always need to make them unions with a "struct cmsghdr" in them, so
that things properly aligned. Otherwise we might end up at an unaligned
address and the counting goes all wrong, possibly making the kernel
refuse our buffers.
Also, let's make sure we initialize the control buffers to zero when
sending, but leave them uninitialized when reading.
Both the alignment and the initialization thing is mentioned in the
cmsg(3) man page.
|
|\ \
| | |
| | | |
*: switch to BusLocator-oriented helpers
|
| |/
| |
| |
| | |
Mechanical substitution reducing some verbosity
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.
hashmap_put_strdup() was already doing something similar.
(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)
No functional change intended.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Consider such configuration:
$ systemd-nspawn --read-only --timezone=copy --resolv-conf=copy-host \
--overlay="+/etc::/etc" <...>
Assuming one wants `/` to be read-only, DNS and `/etc/localtime` to
work. One way to do it is to create an overlay filesystem in `/etc/`.
However, systemd-nspawn tries to create `/etc/resolv.conf` and
`/etc/localtime` before mounting the custom paths, while `/` (and, by
extension, `/etc`) is read-only. Thus it fails to create those files.
Mounting custom paths before modifying anything in `/etc/` makes this
possible.
Full example:
```
$ debootstrap buster /var/lib/machines/t1 http://deb.debian.org/debian
$ systemd-nspawn --private-users=false --timezone=copy --resolv-conf=copy-host --read-only --tmpfs=/var --tmpfs=/run --overlay="+/etc::/etc" -D /var/lib/machines/t1 ping -c 1 example.com
Spawning container t1 on /var/lib/machines/t1.
Press ^] three times within 1s to kill container.
ping: example.com: Temporary failure in name resolution
Container t1 failed with error code 130.
```
With the patch:
```
$ sudo ./build/systemd-nspawn --private-users=false --timezone=copy --resolv-conf=copy-host --read-only --tmpfs=/var --tmpfs=/run --overlay="+/etc::/etc" -D /var/lib/machines/t1 ping -qc 1 example.com
Spawning container t1 on /var/lib/machines/t1.
Press ^] three times within 1s to kill container.
PING example.com (93.184.216.34) 56(84) bytes of data.
--- example.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 110.912/110.912/110.912/0.000 ms
Container t1 exited successfully.
```
|
|
|
|
|
|
| |
We should never re-chown selinuxfs.
Fixes: #15475
|
|
|
|
|
|
|
| |
around cmsg_find()
let's take this once step further, and add type-safety to cmsg_find(),
and imply the CMSG_DATA() macro for finding the cmsg payload.
|
|\
| |
| | |
just the recvmsg_safe() stuff from #15457
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
with --image=
Strictly speaking this doesn't really fix #15079, but it at least means
we won't hang anymore.
Fixes: #15079
|
|/ |
|
|\
| |
| | |
beef up --resolv-conf= options of systemd-nspawn
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Let's add flavours for copying stub/uplink resolv.conf versions.
Let's add a more brutal "replace" mode, where we'll replace any existing
destination file.
Let's also change what "auto" means: instead of copying the static file,
let's use the stub file, so that DNS search info is copied over.
Fixes: #15340
|
|/ |
|
|
|
|
|
|
| |
Based on a report from Fossies.org using Codespell.
Followup to #15436
|
|
|
|
| |
Fixes #15436.
|
|
|
|
|
|
|
| |
Some fdopendir() calls remain where safe_close() is manually
performed, those could be simplified as well by converting to
use the _cleanup_close_ machinery, but makes things less trivial
to review so left for a future cleanup.
|
|
|
|
|
| |
Mechanical change to eliminate some cruft by using the
new take_fdopen{_unlocked}() wrappers where trivial.
|
| |
|
|
|
|
| |
"set up" and "look up" are the verbs, "setup" and "lookup" are the nouns.
|
|
|
|
| |
Fixes CID#1415122.
|
|
|
|
|
| |
Also, start logging about mount errors, things are hard to debug
otherwise.
|
|\
| |
| | |
introduce GPT partition types for /var and /var/tmp and support them for auto-discovery
|
| |
| |
| |
| |
| |
| |
| |
| | |
This was previously available here:
https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/
Let's pull it into our repository.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
|
|\ \
| | |
| | | |
Resolve alternative names
|
| | |
| | |
| | |
| | | |
To keep the names manageable, "ifname_or_ifindex" is replaced by "interface".
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We don't need a seperate output parameter that is of type int. glibc() says
that the type is "unsigned", but the kernel thinks it's "int". And the
"alternative names" interface also uses ints. So let's standarize on ints,
since it's clearly not realisitic to have interface numbers in the upper half
of unsigned int range.
|
|/ / |
|
| | |
|
| | |
|
| | |
|
|\ \
| | |
| | | |
nspawn: move virtual interfaces added with --network-interface back to the host
|
| | | |
|
| |/ |
|
|/
|
|
|
|
|
|
| |
This commit lowers the chance of having veth name conflicts for machines
created with similar names.
Replaces: #12865
Fixes: #13417
|
|
|
|
|
|
| |
To support ProtectHome=y in a user namespace (which mounts the inaccessible
nodes), the nodes need to be accessible by the user. Create these paths and
devices in the user runtime directory so they can be used later if needed.
|
|\
| |
| | |
json bits from homed PR
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
This will call json_variant_sensitive() internally while parsing for
each allocated sub-variant. This is better than calling it a posteriori
at the end, because partially parsed variants will always be properly
erased from memory this way.
|
| |
| |
| |
| |
| |
| |
| | |
This makes --overlay=+/foobar::/foobar work again, i.e. where the middle
parameter is left out. According to the documentation this is supposed
to generate a temporary writable work place in the midle. But it
apparently never did. Weird.
|
|\ \
| | |
| | | |
nspawn: Enable specifying root as the mount target directory.
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | | |
Fixes #3847.
|
| | |
| | |
| | |
| | | |
Fixes: #14289
|
| | | |
|