path: root/src/gpt-auto-generator
Commit message [Author, Date, Files changed, Lines -removed/+added]
* dissect-image: Explicitly remove partitions when done with image [Daan De Meyer, 2022-05-23, 1 file, -0/+2]
  When closing a loop device, the kernel will asynchronously remove the probed partitions. This can lead to race conditions where we try to reuse a partition device that still needs to be removed by the kernel.

  To avoid such issues, let's explicitly try to remove any partitions using BLKPG_DEL_PARTITION when we're done with an image. To make sure we don't try to remove partitions when we want them to remain (e.g. systemd-dissect --mount), we add dissected_image_relinquish() in a similar vein to loop_device_relinquish() and decrypted_image_relinquish().
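A minimal sketch of the explicit removal, assuming a Linux system with the UAPI header <linux/blkpg.h>. The BLKPG ioctl, BLKPG_DEL_PARTITION and the blkpg structures are real kernel interfaces; struct image_sketch and its helpers are hypothetical stand-ins for the dissected image object and dissected_image_relinquish(), not systemd's actual code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/blkpg.h>

/* Ask the kernel to drop partition `pno` of the whole-disk device open on `fd`.
 * Returns 0 on success or a negative errno value. */
static int partition_delete(int fd, int pno) {
        struct blkpg_partition part;
        struct blkpg_ioctl_arg arg;

        memset(&part, 0, sizeof(part));
        part.pno = pno;                 /* only the number matters for deletion */

        memset(&arg, 0, sizeof(arg));
        arg.op = BLKPG_DEL_PARTITION;
        arg.datalen = sizeof(part);
        arg.data = &part;

        if (ioctl(fd, BLKPG, &arg) < 0)
                return -errno;
        return 0;
}

/* Hypothetical stand-in for a dissected image. */
struct image_sketch {
        int fd;                 /* whole-disk (e.g. loop) device */
        int n_partitions;       /* partitions 1..n_partitions were probed */
        bool relinquished;      /* partitions must outlive us (e.g. they are mounted) */
};

static void image_sketch_relinquish(struct image_sketch *img) {
        img->relinquished = true;
}

/* Remove partitions explicitly instead of leaving it to the kernel's asynchronous
 * cleanup, unless ownership was relinquished. Returns the number of attempts. */
static int image_sketch_release(struct image_sketch *img) {
        int attempted = 0;

        if (img->relinquished)
                return 0;

        for (int pno = 1; pno <= img->n_partitions; pno++) {
                (void) partition_delete(img->fd, pno);  /* best effort, errors ignored */
                attempted++;
        }
        return attempted;
}
```

For BLKPG_DEL_PARTITION the kernel only looks at the partition number; start and length can stay zero.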
* stat-util: fix dir_is_empty() with hidden/backup files [Lennart Poettering, 2022-05-04, 1 file, -1/+1]
  This is a follow-up for f470cb6d13558fc06131dc677d54a089a0b07359 which in turn is a follow-up for a068aceafbffcba85398cce636c25d659265087a.

  The latter started to honour hidden files when deciding whether a directory is empty. The former reverted to the old behaviour to fix issue #23220. It introduced a bug though: when a directory contains a larger number of hidden entries, the getdents64() buffer will not suffice to read them, since we just allocate three entries for it (which is definitely enough if we just ignore the . + .. entries, but not if we ignore more).

  I think it's a bit confusing that dir_is_empty() can return true even if rmdir() on the dir would return ENOTEMPTY. Hence, let's rework the function to make it optional whether hidden files are ignored or not. After all, looking at the users of this function, I am pretty sure that in most cases we want to honour hidden files.
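A simplified sketch of a dir_is_empty() variant with the hidden/backup check made optional. It reads entries one at a time with readdir() rather than through a fixed-size getdents64() buffer, so any number of ignored entries can precede the first real one. The function name and the hidden/backup rule (leading dot or trailing '~') are assumptions for illustration; the real helper is more thorough.

```c
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Simplified "hidden or backup" rule: leading '.' or trailing '~'. */
static bool is_hidden_or_backup(const char *name) {
        size_t n = strlen(name);
        return name[0] == '.' || (n > 0 && name[n - 1] == '~');
}

/* Returns 1 if `path` is empty, 0 if not, or -errno if it cannot be opened. */
static int dir_is_empty_sketch(const char *path, bool ignore_hidden_or_backup) {
        DIR *d = opendir(path);
        if (!d)
                return -errno;

        int r = 1;
        struct dirent *de;
        while ((de = readdir(d))) {
                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;
                if (ignore_hidden_or_backup && is_hidden_or_backup(de->d_name))
                        continue;
                r = 0;          /* found an entry that counts */
                break;
        }
        closedir(d);
        return r;
}
```

With ignore_hidden_or_backup set to false the result agrees with rmdir(): a directory holding only ".hidden" is reported as non-empty.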
* devnum-util: define helper macros for formatting devnum major/minor pairs [Lennart Poettering, 2022-04-13, 1 file, -2/+2]
  And port some parts over.
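The idea of a paired format-string/argument macro can be sketched as follows, assuming glibc's major()/minor() from <sys/sysmacros.h>. The names DEVNUM_FMT_STR and DEVNUM_FMT_VAL are hypothetical, chosen for this example rather than copied from devnum-util.h.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Format string and matching argument list for printing a dev_t as "major:minor". */
#define DEVNUM_FMT_STR "%u:%u"
#define DEVNUM_FMT_VAL(d) major(d), minor(d)

/* Writes "major:minor" for `d` into `buf`; returns what snprintf() returns. */
static int format_devnum(char *buf, size_t size, dev_t d) {
        return snprintf(buf, size, DEVNUM_FMT_STR, DEVNUM_FMT_VAL(d));
}
```

Keeping the format string and the argument expansion in two macros lets them be embedded into larger log format strings without repeating major()/minor() at every call site.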
* basic: split out dev_t related calls into new devno-util.[ch] [Lennart Poettering, 2022-04-13, 1 file, -0/+1]
  No actual code changes, just splitting out some dev_t handling related calls from stat-util.[ch]; there are quite a number of them already, and they deserve their own module now, I think.

  Also, try to settle on the name "devnum" as the name for the concept, instead of "devno" or "dev" or "devid". "devnum" is the name exported in udev APIs, hence probably best to stick to that. (This just renames a few symbols to "devnum"; local variables are left untouched, to keep the patch from being too invasive.)

  No actual code changes.
* tree-wide: take BSD lock on loopback devices we dissect/mount/operate on [Lennart Poettering, 2022-04-10, 1 file, -0/+7]
  So here's something we should always keep in mind: systemd-udevd actually does *two* things with BSD file locks on block devices:

  1. While it probes a device it takes a LOCK_SH lock. Thus everyone else taking a LOCK_EX lock will temporarily block udev from probing devices, which is good when making changes to it.

  2. Whenever a device is closed after write (detected via inotify), udevd will issue BLKRRPART (requesting the kernel to reread the partition table). It does this while holding a LOCK_EX lock on the block device. Thus anyone else taking LOCK_SH or LOCK_EX will temporarily block udevd from issuing that ioctl. And that's quite relevant, since the kernel will temporarily flush out all partitions while re-reading the partition table and then create them anew.

  Thus it is smart to take LOCK_SH when dissecting a block device to ensure that no BLKRRPART is issued in the background until we have mounted the devices.
* dissect: rework how we wait for partition block devices [Lennart Poettering, 2022-04-10, 1 file, -1/+0]
  This revisits the mess around waiting for partition block devices in the image dissection code. It implements a nice little trick: instead of waiting for the kernel to probe the partition table for us and generate the block devices from it, we'll just do that ourselves.

  How can we do it? Via the BLKPG_ADD_PARTITION ioctl, which the kernel has supported for a while. This ioctl allows creating partition block devices off "whole" block devices from userspace, without the partitions necessarily being present in the partition table at all.

  So, whenever we want a partition to be there, we'll just issue BLKPG_ADD_PARTITION. This can either work, in which case we know the partition is there, and can use it. Yay. Or it can fail with EBUSY, which the kernel returns if a partition with the selected partition index already exists (or if an existing partition overlaps with the new one). But if that's the case, then that's also OK, because the partition will already exist.

  So, regardless of whether we win or the kernel wins, for us the outcome is the same: the partition block device will exist after invoking the ioctl. Yay.

  Net effect: we are not dependent on asynchronous uevent messages to wait for the devices. Instead we synchronously get what we need. This makes us independent of the (apparently less than reliable) netlink transport, and should almost always be quicker.

  Hopefully addresses #17469 even on older kernels.

  Fixes: #17469
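The "EBUSY means the partition is already there" rule can be sketched like this, assuming the Linux UAPI header <linux/blkpg.h>. partition_ensure() and blkpg_add_result() are hypothetical names. BLKPG takes the start offset and length in bytes, not sectors.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/blkpg.h>

/* Success and EBUSY (same index or an overlapping partition already exists)
 * both mean the partition block device exists afterwards. */
static int blkpg_add_result(int ioctl_ret, int err) {
        if (ioctl_ret >= 0)
                return 0;
        if (err == EBUSY)
                return 0;       /* the kernel (or someone else) won the race: fine */
        return -err;
}

/* Create partition `pno` covering [start, start + size) bytes on the whole-disk
 * device open on `fd`. Returns 0 if the partition exists afterwards, else -errno. */
static int partition_ensure(int fd, int pno, uint64_t start, uint64_t size) {
        struct blkpg_partition part;
        struct blkpg_ioctl_arg arg;

        memset(&part, 0, sizeof(part));
        part.pno = pno;
        part.start = (long long) start;     /* byte offset */
        part.length = (long long) size;     /* byte length */

        memset(&arg, 0, sizeof(arg));
        arg.op = BLKPG_ADD_PARTITION;
        arg.datalen = sizeof(part);
        arg.data = &part;

        int r = ioctl(fd, BLKPG, &arg);
        return blkpg_add_result(r, r < 0 ? errno : 0);
}
```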
* gpt-auto: properly handle case where we can't determine devno of /usr/ fs [Lennart Poettering, 2022-02-14, 1 file, -2/+6]
  get_block_device_harder() returns == 0 if the fs is valid but not backed by a single devno (as opposed to returning > 0 if the devno is valid). Let's catch this case, log a clear message, and not bother opening the device in that case. This is mostly cosmetic, as either way, systemd-gpt-auto-generator doesn't work in scenarios like that.

  Prompted-by: #22504
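To illustrate the three-way return convention, here is a simplified, hypothetical block_device_of() that only inspects st_dev from stat(2): an anonymous device number (major 0, as used by procfs or tmpfs) yields 0, a real one yields 1. get_block_device_harder() does more work to find the underlying device, so treat this as a sketch of the calling convention only.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* > 0: `path` lives on a file system with a real backing devno, stored in *ret.
 *   0: the file system is valid but not backed by a single devno (*ret = 0).
 * < 0: -errno on failure. */
static int block_device_of(const char *path, dev_t *ret) {
        struct stat st;

        if (stat(path, &st) < 0)
                return -errno;

        if (major(st.st_dev) == 0) {    /* anonymous device number */
                *ret = 0;
                return 0;
        }

        *ret = st.st_dev;
        return 1;
}
```

A caller then handles three outcomes: r < 0 is an error, r == 0 gets the clear log message and the device is not opened, and only r > 0 goes on to open the devno.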
* Merge pull request #20257 from bluca/seqno [Luca Boccassi, 2021-08-31, 1 file, -0/+1]
  Use new diskseq block device property
| * dissect: use DISKSEQ when waiting for block devices [Luca Boccassi, 2021-07-28, 1 file, -0/+1]
  DISKSEQ is a reliable way to find out if we missed a uevent or not, as it's monotonically increasing. If we parse an event with a smaller or no sequence number, we know we need to wait longer. If we parse an event with a greater sequence number, we know we missed it and the device was reused.
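The comparison rule can be sketched as a small classifier. The enum and function names are hypothetical, and treating a DISKSEQ of 0 as "property missing" is an assumption of this sketch.

```c
#include <stdint.h>

enum diskseq_verdict {
        DISKSEQ_WAIT,     /* older or unknown event: keep waiting */
        DISKSEQ_MATCH,    /* the event belongs to our use of the device */
        DISKSEQ_REUSED,   /* newer than ours: we missed our event, device was reused */
};

/* `ours` is the DISKSEQ recorded when we set up the device; `event_seq` == 0
 * stands for an event without a DISKSEQ property. */
static enum diskseq_verdict diskseq_classify(uint64_t ours, uint64_t event_seq) {
        if (event_seq == 0 || event_seq < ours)
                return DISKSEQ_WAIT;
        if (event_seq == ours)
                return DISKSEQ_MATCH;
        return DISKSEQ_REUSED;
}
```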
* | gpt-auto-generator: Use volatile-root by default and automatic logic as fallback [Kristian Klausen, 2021-08-31, 1 file, -29/+24]
  Previously volatile-root was only checked if "/" wasn't backed by a block device, but the block device isn't necessarily the original root block device (e.g. if the rootfs is copied to an ext4 fs backed by zram in the initramfs), so we always want volatile-root checked.

  So shuffle the code around so that volatile-root is checked first, with the automatic logic as the fallback.

  Fix #20557
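A hypothetical sketch of the new ordering: consult the volatile-root symlink first (/run/systemd/volatile-root in practice; a parameter here so the example can be tested), and fall back to the automatically determined device only if the symlink does not exist. The function name, return convention and example device paths are invented for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the symlink target was used, 0 if the fallback was used,
 * or -errno on other failures. The chosen path is written to `buf`. */
static int pick_root_device(const char *volatile_link, const char *automatic,
                            char *buf, size_t size) {
        ssize_t n = readlink(volatile_link, buf, size - 1);
        if (n >= 0) {
                if ((size_t) n >= size - 1)
                        return -ENAMETOOLONG;   /* target may have been truncated */
                buf[n] = '\0';
                return 1;
        }
        if (errno != ENOENT)
                return -errno;

        /* No volatile-root override: use the automatic logic's result. */
        size_t len = strlen(automatic);
        if (len >= size)
                return -ENAMETOOLONG;
        memcpy(buf, automatic, len + 1);
        return 0;
}
```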
* Mount encrypted swap partitions via gpt-auto [Hugo Osvaldo Barrera, 2021-07-08, 1 file, -8/+18]
  If the auto-discovered swap partition is LUKS encrypted, decrypt it automatically.

  This aligns with the Discoverable Partitions Specification, though I've also updated it to explicitly mention that LUKS is now supported here.

  Since systemd retries any key already in the kernel keyring, if the swap partition has the same passphrase as the root partition, the user won't be prompted for a second passphrase.

  See https://github.com/systemd/systemd/issues/20019
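One way to recognize a LUKS-encrypted partition is its on-disk magic: LUKS1 and LUKS2 headers both begin with the six bytes "LUKS\xba\xbe". The sketch below checks only that. It is an illustration, not the generator's actual detection, which is not shown here and likely goes through a probing library such as libblkid. has_luks_header() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Magic bytes at offset 0 of a LUKS1 or LUKS2 header. */
static const unsigned char luks_magic[6] = { 'L', 'U', 'K', 'S', 0xba, 0xbe };

/* Returns true if the device or file at `path` starts with a LUKS header. */
static bool has_luks_header(const char *path) {
        unsigned char buf[sizeof(luks_magic)];
        FILE *f = fopen(path, "rb");
        if (!f)
                return false;
        size_t n = fread(buf, 1, sizeof(buf), f);
        fclose(f);
        return n == sizeof(buf) && memcmp(buf, luks_magic, sizeof(buf)) == 0;
}
```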
* tree-wide: "a" -> "an" [Yu Watanabe, 2021-06-30, 1 file, -1/+1]
* gpt-auto-generator: pull in systemd-growfs@.service if new GPT growfs partition flag is set [Lennart Poettering, 2021-04-23, 1 file, -5/+22]
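A minimal sketch of the flag check, assuming the grow-file-system attribute is GPT partition attribute bit 59 as defined by the Discoverable Partitions Specification. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Grow-file-system partition attribute (bit 59, per the DPS). */
#define GPT_ATTR_GROWFS (UINT64_C(1) << 59)

/* True if the partition asks for its file system to be grown on mount,
 * i.e. if systemd-growfs@.service should be pulled in for it. */
static bool want_growfs(uint64_t attrs) {
        return (attrs & GPT_ATTR_GROWFS) != 0;
}
```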
* dissect: ignore udev database entries from before the loopback attachment [Lennart Poettering, 2021-04-20, 1 file, -0/+1]
  This tries to shorten the race of device reuse a bit more: let's ignore udev database entries that are older than the time at which we started to use a loopback device. This doesn't fix the whole loopback device raciness mess, but it makes the race window a bit shorter.
* dissect: ignore old uevents when waiting for loopback partition scan [Lennart Poettering, 2021-04-20, 1 file, -0/+1]
  Let's drop all monitor uevents that were enqueued before we actually started setting up the device. This doesn't fix the race, but it makes the race window smaller: since we cannot determine the uevent seqnum and the loopback attachment atomically, there's a tiny window where uevents might be generated by the device which we mistake for being associated with our use of the loopback device.
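A sketch of a seqnum-based filter, assuming for illustration that the kernel's global uevent counter (/sys/kernel/uevent_seqnum) is read right before the loop device is attached; the path is a parameter so the example can be tested. Function names are hypothetical. As noted above, the snapshot and the attachment are not atomic, so this narrows the race rather than closing it.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a decimal counter such as /sys/kernel/uevent_seqnum. Returns 0 or -errno. */
static int read_seqnum(const char *path, uint64_t *ret) {
        FILE *f = fopen(path, "r");
        if (!f)
                return -errno;
        int k = fscanf(f, "%" SCNu64, ret);
        fclose(f);
        return k == 1 ? 0 : -EINVAL;
}

/* Events numbered at or below the snapshot were queued before setup began,
 * so they cannot describe our use of the device and are dropped. */
static bool uevent_is_ours(uint64_t event_seqnum, uint64_t snapshot) {
        return event_seqnum > snapshot;
}
```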
* gpt-auto-generator: don't generate systemd-cryptsetup@.service when -Dlibcryptsetup=false [gaoyi, 2021-04-09, 1 file, -0/+4]
* tree-wide: make use of DISSECT_IMAGE_USR_NO_ROOT in various tools [Lennart Poettering, 2021-03-16, 1 file, -1/+7]
  Let's make use of the new dissection in all tools where this makes sense, which are all tools that dissect images, except for those which inherently operate on state/configuration and thus where an image without state or configuration is useless (e.g. systemd-tmpfiles/systemd-firstboot/… --image= switch).
* license: LGPL-2.1+ -> LGPL-2.1-or-later [Yu Watanabe, 2020-11-09, 1 file, -1/+1]
* dissect: wrap verity settings in new VeritySettings structure [Lennart Poettering, 2020-09-17, 1 file, -1/+1]
  Just some refactoring: let's place the various verity related parameters in a common structure, and pass that around instead of the individual parameters.

  Also, let's load the PKCS#7 signature data right away when finding metadata, instead of delaying this until we need it. In all cases where we call this, there's not much time difference between finding the metadata and loading it, hence this simplifies things and makes sure the root hash data and its signature are now always acquired together.
* tree-wide: if get_block_device() returns zero devno, check for it in all cases [Lennart Poettering, 2020-09-08, 1 file, -1/+1]
  And add a comment for the existing cases where things aren't clear already.
* btrfs: if BTRFS_IOC_DEV_INFO returns /dev/root generate a friendly error message [Lennart Poettering, 2020-09-08, 1 file, -0/+4]
  On systems that boot without an initrd on a btrfs root file system, the BTRFS_IOC_DEV_INFO ioctl returns /dev/root as the backing device. That sucks, since that is not a real device visible to userspace.

  Since this has been that way since forever, and it doesn't look like the kernel will get fixed soon, let's at least generate a useful error message in this case.

  This is not a bug fix, just a tweak to make this more recognizable. Once the kernel gets fixed to report the correct device nodes in this case, in a way userspace can make sense of, things will magically work for systemd, too.

  (Note that this doesn't add a log message about this to really all cases we call get_device() in, but just to the main ones that are called in early boot context; after all, there's no benefit in seeing this message too many times.)

  https://github.com/systemd/systemd/issues/16953
  https://bugs.freedesktop.org/show_bug.cgi?id=84689
  https://bugzilla.kernel.org/show_bug.cgi?id=89721
* service: add new RootImageOptions feature [Luca Boccassi, 2020-07-29, 1 file, -1/+1]
  Allows specifying mount options for RootImage. In case of multi-partition images, the options can be prefixed with the partition number followed by a colon. E.g.:

  RootImageOptions=1:ro,dev 2:nosuid nodev

  In absence of a partition number, 0 is assumed.
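A sketch of how one whitespace-separated RootImageOptions= entry could be split into partition number and options, with 0 assumed when there is no prefix. parse_image_option() is hypothetical and does not reproduce the real parser's validation rules.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split "[PARTITION:]OPTIONS". Returns 0 on success, -EINVAL on a bad prefix.
 * *ret_options points into `entry`; nothing is copied. */
static int parse_image_option(const char *entry, unsigned *ret_partition,
                              const char **ret_options) {
        const char *colon = strchr(entry, ':');
        if (!colon) {
                *ret_partition = 0;     /* no prefix: partition 0 */
                *ret_options = entry;
                return 0;
        }

        char *end;
        errno = 0;
        unsigned long p = strtoul(entry, &end, 10);
        if (end == entry || end != colon || errno != 0 || p > UINT_MAX)
                return -EINVAL;

        *ret_partition = (unsigned) p;
        *ret_options = colon + 1;
        return 0;
}
```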
* dissect: support single-filesystem verity images with external verity hash [Luca Boccassi, 2020-06-09, 1 file, -1/+1]
  dm-verity support in dissect-image at the moment is restricted to GPT volumes. If the image is a single-filesystem type without a partition table (e.g. squashfs) and a roothash/verity file are passed, set the verity flag and mark it as read-only.
* units: introduce blockdev@.target for properly ordering mounts/swaps against cryptsetup [Lennart Poettering, 2020-01-21, 1 file, -35/+28]
  Let's hook it into both cryptsetup-generator and gpt-auto-generator with a shared implementation in generator.c.

  Fixes: #8472
* Merge pull request #14390 from poettering/gpt-var-tmp [Zbigniew Jędrzejewski-Szmek, 2020-01-14, 1 file, -0/+12]
  introduce GPT partition types for /var and /var/tmp and support them for auto-discovery
| * dissect: introduce new recognizable partition types for /var and /var/tmp [Lennart Poettering, 2019-12-23, 1 file, -0/+12]
  This has been requested many times before. Let's add it finally.

  GPT auto-discovery for /var is a bit more complex than for other partition types: the other partitions can to some degree be shared between multiple OS installations on the same disk (think: swap, /home, /srv). However, /var is inherently something bound to an installation, i.e. specific to its identity, or actually *is* its identity, and hence something that cannot be shared.

  To deal with this, the new code is particularly careful when it comes to /var: it will not mount things blindly, but insist that the UUID of the partition matches a hashed version of the machine-id of the installation, so that each installation has a very specific /var associated with it, and would never use any other. (We actually use HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id, since machine-id is something we want to keep somewhat private.)

  Setting the right UUID for installations takes extra care. To make things a bit simpler to set up, we avoid this safety check for nspawn and RootImage= in unit files, under the assumption that such container and service images are unlikely to have multiple installations on them. The check is hence only required when booting full machines, i.e. in systemd-gpt-auto-generator.

  To help with putting together images for full machines, PR #14368 introduces a repartition tool that can automatically fill in correctly calculated UUIDs on first boot if images have the var partition UUID initialized to all zeroes. With that in place, systems can be put together in a way that on first boot the machine ID is determined and the partition table automatically adjusted to have the /var partition with the right UUID.
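The last step of the UUID derivation can be sketched as follows. It assumes the 32-byte HMAC-SHA256 digest (over the /var partition type, keyed by the machine-id) has already been computed by a crypto library, and that, as with systemd's app-specific ID helpers, the first 16 bytes are taken and marked as a version 4 UUID; both the helper names and that truncation step are assumptions. The null check mirrors the "all zeroes" placeholder that the repartition tool fills in.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Turn the first 16 bytes of a 32-byte digest into an RFC 4122 version 4 UUID. */
static void uuid_from_digest(const uint8_t digest[32], uint8_t out[16]) {
        memcpy(out, digest, 16);
        out[6] = (out[6] & 0x0F) | 0x40;   /* version 4 */
        out[8] = (out[8] & 0x3F) | 0x80;   /* RFC 4122 variant (10xx) */
}

/* An all-zero UUID marks a /var partition whose UUID still has to be filled in. */
static bool uuid_is_null(const uint8_t u[16]) {
        for (int i = 0; i < 16; i++)
                if (u[i] != 0)
                        return false;
        return true;
}
```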
* | gpt-auto: don't assume XBOOTLDR is vfat [Lennart Poettering, 2020-01-08, 1 file, -2/+15]
  Let's not assume "umask=" is a valid mount option for XBOOTLDR partitions unconditionally.

  Fixes: #14165
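A sketch of making the option conditional, assuming "vfat" is the only file system type that gets umask=. The base options string, the umask=0077 value and the function name are illustrative assumptions, not the generator's exact options.

```c
#include <stdio.h>
#include <string.h>

/* Build mount options for an XBOOTLDR partition: umask= is only understood by
 * vfat, so it is only added for that type. Returns what snprintf() returns. */
static int xbootldr_options(const char *fstype, char *buf, size_t size) {
        const char *base = "noauto,nosuid,nodev,noexec";

        if (fstype && strcmp(fstype, "vfat") == 0)
                return snprintf(buf, size, "%s,umask=0077", base);
        return snprintf(buf, size, "%s", base);
}
```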
* gpt-auto-generator: rename function for clarity [Zbigniew Jędrzejewski-Szmek, 2019-11-30, 1 file, -5/+6]
  As requested in https://github.com/systemd/systemd/pull/14196#discussion_r352036184.
* gpt-auto-generator: make it easier to notice if boot loader support is missing [Zbigniew Jędrzejewski-Szmek, 2019-11-30, 1 file, -1/+2]
  The docs didn't talk about this, so let's add an explicit mention that the boot loader must cooperate. And also make the message from the generator notice level. This should help people who are trying to mix grub and the gpt auto logic.
* gpt-auto-generator: use write_drop_in_format() helper and downgrade failure [Zbigniew Jędrzejewski-Szmek, 2019-11-30, 1 file, -11/+9]
  If we fail to write the timeout, let's not exit. (This might happen if another generator writes the same drop-in.) No need to make this fatal.

  Since this is non-fatal now and the name doesn't need to be unique, let's make the drop-in name shorter.
* gpt-auto-generator: improve debug messages a bit [Zbigniew Jędrzejewski-Szmek, 2019-11-30, 1 file, -2/+5]
  In particular, let's give a hint when we do nothing in the common case of root= being used.
* gpt-auto-generator: move functions around [Zbigniew Jędrzejewski-Szmek, 2019-11-28, 1 file, -142/+141]
  open_parent_devno(), which is a helper, is moved out of the main "business logic" block of various add_*() functions. And parse_proc_cmdline_item() is moved to the end, near run() where it is used.

  No functional change.
* tree-wide: drop stat.h or statfs.h when stat-util.h is included [Yu Watanabe, 2019-11-04, 1 file, -1/+0]
* tree-wide: drop blkid.h when blkid-util.h is included [Yu Watanabe, 2019-11-04, 1 file, -1/+0]
* tree-wide: drop missing.h [Yu Watanabe, 2019-10-31, 1 file, -1/+0]
* util-lib: split shared/efivars into basic/efivars and shared/efi-loader [Zbigniew Jędrzejewski-Szmek, 2019-09-16, 1 file, -1/+1]
  I want to use efivars.[ch] in proc-cmdline.c, but most of the efivars stuff is not needed in basic/. Move the file from shared/ to basic/, but then move back most of the higher-level functions to the new shared/efi-loader.c file.
* tree-wide: get rid of strappend() [Lennart Poettering, 2019-07-12, 1 file, -1/+1]
  It's a special case of strjoin(), so no need to keep both. In particular, as typing strjoin() is even shorter than typing strappend().
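Why strappend() is redundant becomes clear from a variadic join: appending is just the two-argument case. join_strings() is a hypothetical stand-in for strjoin(), taking a NULL-terminated argument list.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate a NULL-terminated list of strings into a newly allocated buffer.
 * The caller frees the result; returns NULL on allocation failure. */
static char *join_strings(const char *first, ...) {
        va_list ap;
        size_t len = 0;

        /* First pass: total length. */
        va_start(ap, first);
        for (const char *s = first; s; s = va_arg(ap, char *))
                len += strlen(s);
        va_end(ap);

        char *result = malloc(len + 1);
        if (!result)
                return NULL;

        /* Second pass: copy the pieces. */
        char *p = result;
        va_start(ap, first);
        for (const char *s = first; s; s = va_arg(ap, char *)) {
                size_t n = strlen(s);
                memcpy(p, s, n);
                p += n;
        }
        va_end(ap);

        *p = '\0';
        return result;
}
```

The list must end with (char *) NULL; a wrapper macro can append that sentinel automatically so callers cannot forget it.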
* tree-wide: replace strjoina() with prefix_roota() [Yu Watanabe, 2019-06-25, 1 file, -2/+2]
* tree-wide: replace strjoin() with path_join() [Yu Watanabe, 2019-06-21, 1 file, -2/+2]
* tree-wide: make use of the new WRITE_STRING_FILE_MKDIR_0755 flag [Lennart Poettering, 2019-05-08, 1 file, -2/+1]
* Merge pull request #11243 from poettering/nspawn-root-overlay [Zbigniew Jędrzejewski-Szmek, 2019-03-01, 1 file, -4/+22]
  add systemd-nspawn --volatile=overlay support, as well as the same for host systems
| * gpt-auto-generator: use new /run/systemd/volatile-root symlink as fallback when we otherwise cannot determine root device node [Lennart Poettering, 2019-03-01, 1 file, -2/+20]
| * gpt-auto-generator: rename open_parent() → open_parent_devno() so that we can include fs-util.h later [Lennart Poettering, 2019-03-01, 1 file, -2/+2]
  As that header also defines a function open_parent() which does something different.
* | gpt-auto: also load the boot loader partition during regular boots [Lennart Poettering, 2019-03-01, 1 file, -15/+71]
* Pull in systemd-remount-fs.service only when required [Zbigniew Jędrzejewski-Szmek, 2019-01-03, 1 file, -2/+5]
  Instead of enabling it unconditionally and then using ConditionPathExists=/etc/fstab, and possibly masking this condition if it should be enabled for auto gpt stuff, just pull it in explicitly when required.
* Merge pull request #10912 from poettering/gpt-root-rw [Zbigniew Jędrzejewski-Szmek, 2018-12-20, 1 file, -7/+51]
  make sure to propagate GPT root partition r/w flag into mount r/w flag
| * gpt-auto: propagate gpt partition ro/rw flag into root mount [Lennart Poettering, 2018-12-18, 1 file, -0/+43]
  This ensures that the read/write state of the root mount matches the read/write flag in the GPT partition table entry. This is only used as a fallback in case no ro/rw flag is specified on the kernel cmdline, and there's no entry for the root partition in /etc/fstab.

  This is functionality the GPT auto logic was missing: without it, the root partition was always mounted read-only when booting with zero configuration in /etc/fstab and /proc/cmdline, as we defaulted to read-only behaviour for all mounts. Moreover, we honoured the r/o flag in the partition table for all other partition types, but not for the root partition.
| * gpt-auto: make arg_root_rw a tri-state [Lennart Poettering, 2018-12-18, 1 file, -2/+2]
  No change in behaviour, but let's track whether ro or rw are specified on the kernel cmdline at all.
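A sketch of how the tri-state and the partition flag combine, assuming -1/0/1 for unset/"ro"/"rw" and GPT attribute bit 60 for read-only as in the Discoverable Partitions Specification. The /etc/fstab case is not modelled here; per the commit above, the partition flag is only consulted when fstab has no root entry. Names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read-only partition attribute (bit 60, per the DPS). */
#define GPT_ATTR_READ_ONLY (UINT64_C(1) << 60)

/* cmdline_rw: -1 = neither "ro" nor "rw" on the kernel cmdline, 0 = "ro", 1 = "rw". */
static bool root_mount_rw(int cmdline_rw, uint64_t gpt_attrs) {
        if (cmdline_rw >= 0)
                return cmdline_rw > 0;                  /* explicit setting wins */
        return !(gpt_attrs & GPT_ATTR_READ_ONLY);       /* fall back to the partition flag */
}
```

A plain boolean could not tell "ro was given" from "nothing was given", which is why the fallback needs the tri-state.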
| * gpt-auto: compare kernel cmdline args with proc_cmdline_key_streq() [Lennart Poettering, 2018-12-18, 1 file, -5/+6]
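A proc_cmdline_key_streq()-style comparison can be sketched as a comparator that treats '-' and '_' as interchangeable in kernel command line keys. This behaviour is an assumption about the helper's semantics, and cmdline_key_eq() is a hypothetical name.

```c
#include <stdbool.h>

/* Compare two kernel command line keys, treating '-' and '_' as equal,
 * so "systemd.gpt_auto" matches "systemd.gpt-auto". */
static bool cmdline_key_eq(const char *a, const char *b) {
        for (;; a++, b++) {
                bool a_sep = *a == '-' || *a == '_';
                bool b_sep = *b == '-' || *b == '_';

                if (a_sep && b_sep)
                        continue;               /* '-' and '_' match each other */
                if (*a != *b)
                        return false;
                if (*a == '\0')
                        return true;            /* both strings ended together */
        }
}
```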
* | gpt-auto-generator: don't wait for udev [Lennart Poettering, 2018-12-19, 1 file, -1/+1]
  Generators run in a context where waiting for udev is not an option, simply because it's not running there yet. Hence, let's not wait for it in this case.

  This is generally OK to do as we are operating on the root disk only here, which should have been probed already by the time we come this far.

  An alternative fix might be to remove the udev dependency from image dissection again in the long run (and thus replace reliance on /dev/block/x:y somehow with something else).

  Fixes: #11205