| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When closing a loop device, the kernel will asynchronously remove
the probed partitions. This can lead to race conditions where we
try to reuse a partition device that still needs to be removed by
the kernel. To avoid such issues, let's explicitly try to remove
any partitions using BLKPG_DEL_PARTITION when we're done with an
image.
To make sure we don't try to remove partitions when we want them
to remain (e.g. systemd-dissect --mount), we add
dissected_image_relinquish() in a similar vein to loop_device_relinquish()
and decrypted_image_relinquish().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up for f470cb6d13558fc06131dc677d54a089a0b07359 which in
turn is a follow-up for a068aceafbffcba85398cce636c25d659265087a.
The latter started to honour hidden files when deciding whether a
directory is empty. The former reverted to the old behaviour to fix
issue #23220.
It introduced a bug though: when a directory contains a larger number of
hidden entries the getdents64() buffer will not suffice to read them,
since we just allocate three entries for it (which is definitely enough
if we just ignore the . + .. entries, but not ig we ignore more).
I think it's a bit confusing that dir_is_empty() can return true even if
rmdir() on the dir would return ENOTEMPTY. Hence, let's rework the
function to make it optional whether hidden files are ignored or not.
After all, I looking at the users of this function I am pretty sure in
more cases we want to honour hidden files.
|
|
|
|
| |
And port some parts over.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
No actual code changes, just splitting out of some dev_t handling
related calls from stat-util.[ch], they are quite a number already, and
deserve their own module now I think.
Also, try to settle on the name "devnum" as the name for the concept,
instead of "devno" or "dev" or "devid". "devnum" is the name exported in
udev APIs, hence probably best to stick to that. (this just renames a
few symbols to "devum", local variables are left untouched, to make the
patch not too invasive)
No actual code changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So here's something we should always keep in mind:
systemd-udevd actually does *two* things with BSD file locks on block
devices:
1. While it probes a device it takes a LOCK_SH lock. Thus everyone else
taking a LOCK_EX lock will temporarily block udev from probing
devices, which is good when making changes to it.
2. Whenever a device is closed after write (detected via inotify), udevd
will issue BLKRRPART (requesting the kernel to reread the partition
table). It does this while holding a LOCK_EX lock on the block
device. Thus anyone else taking LOCK_SH or LOCK_EX will temporarily
block udevd from issuing that ioctl. And that's quite relevant, since
the kernel will temporarily flush out all partitions while re-reading
the partition table and then create them anew. Thus it is smart to
take LOCK_SH when dissecting a block device to ensure that no
BLKRRPART is issued in the background, until we mounted the devices.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This revisits the mess around waiting for partition block devices in
the image dissection code. It implements a nice little trick:
Instead of waiting for the kernel to probe the partition table for us
and generate the block devices from it, we'll just do that ourselves.
How can we do it? Via the BLKPG_ADD_PARTITION ioctl, that the kernel has
supported for a while. This ioctl allows creating partition block
devices off "whole" block devices from userspace, without the partitions
necessarily being present in the partition table at all.
So, whenever we want a partition to be there, we'll just issue
BLKPG_ADD_PARTITION. This can either work, in which case we know the
partition is there, and can use it. Yay. Or it can fail with EBUSY,
which the kernel returns if a partition by the selected partition index
already exists (or if an existing partition overlaps with the new one).
But if that's the case, then that's also OK, because the partition will
already exist.
So, regardless if we win or the kernel wins, for us the outcome is the
same: the partition block device will exist after invoking the ioctl.
Yay.
Net effect: we are not dependent on asynchronous uevent messages to wait
for the devices. Instead we synchronously get what we need. This makes
us independent of the (apparently less than reliable) netlink transport,
and should almost always be quicker.
Hopefully addresses #17469 even on older kernels.
Fixes: #17469
|
|
|
|
|
|
|
|
|
|
|
|
| |
get_block_device_harder() returns == 0 if the fs is valid, but it is not
backed by a single devno. (As opposed to returning > 0 if the devno is
valid). Let's catch this case and log a clear message, and don't bother
open the device in that case.
This is mostly cosmetical, as either way, systemd-gpt-auto-generator
doesn't work in scenarios like that.
Prompted-by: #22504
|
|\
| |
| | |
Use new diskseq block device property
|
| |
| |
| |
| |
| |
| |
| |
| | |
DISKSEQ is a reliable way to find out if we missed a uevent or not, as
it's monotonically increasing. If we parse an event with a smaller or
no sequence number, we know we need to wait longer. If we parse an
event with a greater sequence number, we know we missed it and the
device was reused.
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Previously volatile-root was only checked if "/" wasn't backed by a
block device, but the block device isn't necessarily original root block
device (ex: if the rootfs is copied to a ext4 fs backed by zram in the
initramfs), so we always want volatile-root checked.
So shuffle the code around so volatile-root is checked first and
fallback to the automatic logic.
Fix #20557
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the auto-discovered swap partition is LUKS encrypted, decrypt it
automatically.
This aligns with the Discoverable Partitions Specification, though I've
also updated it to explicitly mention that LUKS is now supported here.
Since systemd retries any key already in the kernel keyring, if the swap
partition has the same passphrase as the root partition, the user won't
be prompted a second time for a second passphrase.
See https://github.com/systemd/systemd/issues/20019
|
| |
|
|
|
|
| |
partition flag is set
|
|
|
|
|
|
|
|
|
| |
This tries to shorten the race of device reuse a bit more: let's ignore
udev database entries that are older than the time where we started to
use a loopback device.
This doesn't fix the whole loopback device raciness mess, but it makes
the race window a bit shorter.
|
|
|
|
|
|
|
|
|
|
|
| |
Let's drop all monitor uevent that were enqueued before we actually
started setting up the device.
This doesn't fix the race, but it makes the race window smaller: since
we cannot determine the uevent seqnum and the loopback attachment
atomically, there's a tiny window where uevents might be generated by
the device which we mistake for being associated with out use of the
loopback device.
|
|
|
|
| |
--Dlibcryptsetup=false
|
|
|
|
|
|
|
|
| |
Let's make use of the new dissection in all tools where this makes
sense, which are all tools that dissect images, except for those which
inherently operate on state/configuraiton and thus where an image
without state nor configuration is useless (e.g.
systemd-tmpfiles/systemd-firstboot/… --image= switch).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just some refactoring: let's place the various verity related parameters
in a common structure, and pass that around instead of the individual
parameters.
Also, let's load the PKCS#7 signature data when finding metadata
right-away, instead of delaying this until we need it. In all cases we
call this there's not much time difference between the metdata finding
and the loading, hence this simplifies things and makes sure root hash
data and its signature is now always acquired together.
|
|
|
|
|
| |
And add a comment for the existing cases where things aren't clear
already.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On systems that boot without initrd on a btrfs root file systems the
BTRFS_IOC_DEV_INFO ioctl returns /dev/root as backing device. That
sucks, since that is not a real device visible to userspace.
Since this has been that way since forever, and it doesn't look like the
kernel will get fixed soon for this, let's at least generate a useful
error message in this case.
This is not a bug fix, just a tweak to make this more recognizable.
Once the kernel gets fixed to report the correct device nodes in this
case, in a way userspace can make sense of them things will magically
work for systemd, too.
(Note that this doesn't add a log message about this to really all cases
we call get_device() in, but just the main ones that are called in early
boot context, after all all there's no benefit in seeing this message
too many times.)
https://github.com/systemd/systemd/issues/16953
https://bugs.freedesktop.org/show_bug.cgi?id=84689
https://bugzilla.kernel.org/show_bug.cgi?id=89721
|
|
|
|
|
|
|
|
|
|
| |
Allows to specify mount options for RootImage.
In case of multi-partition images, the partition number can be prefixed
followed by colon. Eg:
RootImageOptions=1:ro,dev 2:nosuid nodev
In absence of a partition number, 0 is assumed.
|
|
|
|
|
|
|
|
| |
dm-verity support in dissect-image at the moment is restricted to GPT
volumes.
If the image a single-filesystem type without a partition table (eg: squashfs)
and a roothash/verity file are passed, set the verity flag and mark as
read-only.
|
|
|
|
|
|
|
|
|
| |
cryptsetup
Let's hook it into both cryptsetup-generator and gpt-auto-generator with
a shared implementation in generator.c
Fixes: #8472
|
|\
| |
| | |
introduce GPT partition types for /var and /var/tmp and support them for auto-discovery
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
|
|/
|
|
|
|
|
| |
Let's not assume "umask=" is a valid mount option for XBOOTLDR
partitions unconditionally.
Fixes: #14165
|
|
|
|
|
| |
As requested in
https://github.com/systemd/systemd/pull/14196#discussion_r352036184.
|
|
|
|
|
|
|
| |
The docs didn't talk about this, so let's add an explicit mention that the
boot loader must cooperate. And also make the message from the generator
notice level. This should help people who are trying to mix grub and the
gpt auto logic.
|
|
|
|
|
|
|
|
| |
If we fail to write the timeout, let's not exit. (This might happen if another
generator writes the same dropin.) No need to make this fatal.
Since this is non-fatal now and the name doesn't need to be unique, let's make
the drop-in name shorter.
|
|
|
|
|
| |
In particular, let's give a hint when we do nothing in the common case of
root= being used.
|
|
|
|
|
|
| |
open_parent_devno() which is a helper is moved out of the main "business logic"
block of various add_*() functions. And parse_proc_cmdline_item() is moved to
the end, near to run() where it is used. No functional change.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
I want to use efivars.[ch] in proc-cmdline.c, but most of the efivars stuff is
not needed in basic/. Move the file from shared/ to basic/, but then move back
most of the higher-level functions to the new shared/efi-loader.c file.
|
|
|
|
|
| |
It's a special case of strjoin(), so no need to keep both. In particular
as typing strjoin() is even shoert than strappend().
|
| |
|
| |
|
| |
|
|\
| |
| | |
add systemd-nspawn --volatile=overlay support, as well as the same for host systems
|
| |
| |
| |
| | |
when we otherwise cannot determine root device node
|
| |
| |
| |
| |
| |
| |
| | |
can include fs-util.h later
As that header also defines a function open_parent() which does
something different.
|
|/ |
|
|
|
|
|
|
| |
Instead of enabling it unconditionally and then using ConditionPathExists=/etc/fstab,
and possibly masking this condition if it should be enabled for auto gpt stuff,
just pull it in explicitly when required.
|
|\
| |
| | |
make sure to propagate GPT root partition r/w flag into mount r/w flag
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This ensures that the read/write state of the root mount matches the
read/write flag in the GPT partition table entry.
This is only used as fallback in case no ro/rw flag is specified on the
kernel cmdline, and there's no entry for the root partition in
/etc/fstab.
This is missing functionality of the GPT auto logic, as without this the
root partition was always mounted read-only — when booting with zero
configuration in /etc/fstab and /proc/cmdline —, as we defaulted to
read-only behaviour for all mounts. Moreover we honoured the r/o flag in
the partition table for all other partition types, except for the root
partition.
|
| |
| |
| |
| |
| | |
No change in behaviour, but let's track whether ro or rw are specified
on the kernel cmdline at all.
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generators run in a context where waiting for udev is not an option,
simply because it's not running there yet. Hence, let's not wait for it
in this case.
This is generally OK to do as we are operating on the root disk only
here, which should have been probed already by the time we come this
far.
An alternative fix might be to remove the udev dependency from image
dissection again in the long run (and thus replace reliance on
/dev/block/x:y somehow with something else).
Fixes: #11205
|