summaryrefslogtreecommitdiff
path: root/src/basic/cgroup-util.h
Commit message (Collapse)AuthorAgeFilesLines
* license: LGPL-2.1+ -> LGPL-2.1-or-laterYu Watanabe2020-11-091-1/+1
|
* cgroup-util: add cg_get_attribute_as_bool() helperAnita Zhang2020-10-071-0/+3
|
* core: add ManagedOOM*= properties to configure systemd-oomd on the unitAnita Zhang2020-10-071-0/+10
| | | | | This adds the hook ups so it can be read with the usual systemd utilities. Used in later commits by sytemd-oomd.
* core: introduce support for cgroup freezerMichal Sekletár2020-04-301-0/+1
| | | | | | | | | | | | | | | | | | | | With cgroup v2 the cgroup freezer is implemented as a cgroup attribute called cgroup.freeze. cgroup can be frozen by writing "1" to the file and kernel will send us a notification through "cgroup.events" after the operation is finished and processes in the cgroup entered quiescent state, i.e. they are not scheduled to run. Writing "0" to the attribute file does the inverse and process execution is resumed. This commit exposes above low-level functionality through systemd's DBus API. Each unit type must provide specialized implementation for these methods, otherwise, we return an error. So far only service, scope, and slice unit types provide the support. It is possible to check if a given unit has the support using CanFreeze() DBus property. Note that DBus API has a synchronous behavior and we dispatch the reply to freeze/thaw requests only after the kernel has notified us that requested operation was completed.
* basic/cgroup-util: introduce cg_get_keyed_attribute_full()Michal Sekletár2020-04-291-1/+23
| | | | | | Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the missing keyes in cgroup attribute file are OK or not. Also the wrappers for both strict and graceful version are provided.
* cgroup-util: cg_get_xattr_malloc helperAnita Zhang2020-03-241-0/+1
| | | | | `cg_get_xattr_malloc` to read a cgroup xattr value and allocate space to hold said value (simple helper combining existing functions).
* cgroup-util: helper to cg_get_attribute and convert to uint64_tAnita Zhang2020-03-241-0/+2
| | | | | | A common pattern in the codebase is reading a cgroup memory value and converting it to a uint64_t. Let's make it a helper and refactor a few places to use it so it's more concise.
* basic/cgroup-util: modernize cg_split_spec()Zbigniew Jędrzejewski-Szmek2020-03-101-1/+1
| | | | Those cryptic one letter variable names, yuck!
* cgroup-util: add new cg_remove_xattr() for removing xattr from cgroupLennart Poettering2019-11-201-0/+1
|
* logind: drop unused user_tasks_max fieldZbigniew Jędrzejewski-Szmek2019-11-141-3/+0
| | | | | | | | | We would only write to the field, and take the address. All *readers* were removed in 284149392755f086d0a714071c33aa609e61505e. (The explanation for why the field wasn't removed back then is that the patch underwent a few iterations, with the initial version adding translation back and forth. Later versions of the patch simply emit a warning and ignore the old value. Apparently nobody noticed that the value became unused.)
* core: make TasksMax a partially dynamic propertyZbigniew Jędrzejewski-Szmek2019-11-141-1/+0
| | | | | | | | | | | | | | | | | TasksMax= and DefaultTasksMax= can be specified as percentages. We don't actually document of what the percentage is relative to, but the implementation uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max, and /sys/fs/cgroup/pids.max (when present). When the value is a percentage, we immediately convert it to an absolute value. If the limit later changes (which can happen e.g. when systemd-sysctl runs), the absolute value becomes outdated. So let's store either the percentage or absolute value, whatever was specified, and only convert to an absolute value when the value is used. For example, when starting a unit, the absolute value will be calculated when the cgroup for the unit is created. Fixes #13419.
* Merge pull request #13246 from keszybz/add-SystemdOptions-efi-variableZbigniew Jędrzejewski-Szmek2019-10-031-27/+4
|\ | | | | Add efi variable to augment /proc/cmdline
| * util-lib: move some functions from basic/cgroup-util to shared/cgroup-setupZbigniew Jędrzejewski-Szmek2019-09-161-26/+0
| | | | | | | | | | | | | | | | | | This way less stuff needs to be in basic. Initially, I wanted to move all the parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because efivars.[ch] is in shared/. Later on, I decide to split efivars.[ch], so the move done in this patch is not necessary anymore. Nevertheless, it is still valid on its own. If at some point we want to expose libbasic, it is better to to not have stuff that belong in libshared there.
| * basic/cgroup-util: let cgroup_unified_flush() return the detected hierarchyZbigniew Jędrzejewski-Szmek2019-09-161-1/+4
| | | | | | | | | | | | | | This avoid the use of the global variable. Also rename cgroup_unified_update() to cgroup_unified_cached() and cgroup_unified_flush() to cgroup_unified() to better reflect their new roles.
* | cgroup: introduce support for cgroup v2 CPUSET controllerPavel Hrdina2019-09-241-1/+3
|/ | | | | | | | | | | | | | Introduce support for configuring cpus and mems for processes using cgroup v2 CPUSET controller. This allows users to limit which cpus and memory NUMA nodes can be used by processes to better utilize system resources. The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems where the requested configuration is written. However, it doesn't mean that the requested configuration will be actually used as parent cgroup may limit the cpus or mems as well. In order to reflect the real configuration cgroup v2 provides read-only files cpuset.cpus.effective and cpuset.mems.effective which are exported to users as well.
* service: make killmode=cgroup|mixed, SendSIGKILL=no services singletonsDaniel Black2019-01-291-1/+1
| | | | | | | | | | | | | | | | | | KillMode=mixed and control group are used to indicate that all process should be killed off. SendSIGKILL is used for services that require a clean shutdown. These are typically database service where a SigKilled process would result in a lengthy recovery and who's shutdown or startup time is quite variable (so Timeout settings aren't of use). Here we take these two factors and refuse to start a service if there are existing processes within a control group. Databases, while generally having some protection against multiple instances running, lets not stress the rigor of these. Also ExecStartPre parts of the service aren't as rigoriously written to protect against against multiple use. closes #8630
* tree-wide: always declare bitflag enums the same wayLennart Poettering2019-01-071-3/+3
| | | | | let's always use the 1 << x syntax. No change of behaviour or even of the compiled binary.
* cgroup: s/cgroups? ?v?([0-9])/cgroup v\1/gIChris Down2019-01-031-3/+3
| | | | | | | | | | Nitpicky, but we've used a lot of random spacings and names in the past, but we're trying to be completely consistent on "cgroup vN" now. Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups? ?v?([0-9])/cgroup v\1/gI'`. I manually ignored places where it's not appropriate to replace (eg. "cgroup2" fstype and in src/shared/linux).
* cgroup: be more careful with which controllers we can enable/disable on a cgroupLennart Poettering2018-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes cg_enable_everywhere() to return which controllers are enabled for the specified cgroup. This information is then used to correctly track the enablement mask currently in effect for a unit. Moreover, when we try to turn off a controller, and this works, then this is indicates that the parent unit might succesfully turn it off now, too as our unit might have kept it busy. So far, when realizing cgroups, i.e. when syncing up the kernel representation of relevant cgroups with our own idea we would strictly work from the root to the leaves. This is generally a good approach, as when controllers are enabled this has to happen in root-to-leaves order. However, when controllers are disabled this has to happen in the opposite order: in leaves-to-root order (this is because controllers can only be enabled in a child if it is already enabled in the parent, and if it shall be disabled in the parent then it has to be disabled in the child first, otherwise it is considered busy when it is attempted to remove it in the parent). To make things complicated when invalidating a unit's cgroup membershup systemd can actually turn off some controllers previously turned on at the very same time as it turns on other controllers previously turned off. In such a case we have to work up leaves-to-root *and* root-to-leaves right after each other. With this patch this is implemented: we still generally operate root-to-leaves, but as soon as we noticed we successfully turned off a controller previously turned on for a cgroup we'll re-enqueue the cgroup realization for all parents of a unit, thus implementing leaves-to-root where necessary.
* cgroup v2: Don't require CPU controller for CPU accounting in 4.15+Chris Down2018-11-181-0/+3
| | | | | | | | | | | systemd only uses functions that are as of Linux 4.15+ provided externally to the CPU controller (currently usage_usec), so if we have a new enough kernel, we don't need to set CGROUP_MASK_CPU for CPUAccounting=true as the CPU controller does not need to necessarily be enabled in this case. Part of this patch is modelled on an earlier patch by Ryutaroh Matsumoto (see PR #9665).
* cgroup: add new helper that knows which controllers are mounted togetherLennart Poettering2018-11-161-0/+9
|
* basic/cgroup-util: remove two unnecessary includesZbigniew Jędrzejewski-Szmek2018-11-141-2/+0
|
* cgroup-util: make definition of CGROUP_CONTROLLER_TO_MASK() unsignedLennart Poettering2018-10-261-1/+1
| | | | | | Otherwise doing comparing a CGroupMask (which is unsigned in effect) with the result of CGROUP_CONTROLLER_TO_MASK() will result in warnings about signedness differences.
* cgroup-util: add mask definitions for sets of controllers supported by ↵Lennart Poettering2018-10-261-0/+10
| | | | cgroupsv1 vs. cgroupsv2
* core: support cgroup v2 device controllerRoman Gushchin2018-10-091-0/+2
| | | | | | | | Cgroup v2 provides the eBPF-based device controller, which isn't currently supported by systemd. This commit aims to provide such support. There are no user-visible changes, just the device policy and whitelist start working if cgroup v2 is used.
* core: refactor bpf firewall support into a pseudo-controllerRoman Gushchin2018-10-091-0/+6
| | | | | The idea is to introduce a concept of bpf-based pseudo-controllers to make adding new bpf-based features easier.
* tree-wide: remove Lennart's copyright linesLennart Poettering2018-06-141-4/+0
| | | | | | | | | | | These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurbLennart Poettering2018-06-141-2/+0
| | | | | | | | | | | | | | | | This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* tree-wide: drop license boilerplateZbigniew Jędrzejewski-Szmek2018-04-061-13/+0
| | | | | | | | | | Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.
* cgroup-util: rework cg_get_keyed_attribute() a bitLennart Poettering2018-02-091-1/+1
| | | | | | | | | | | | | Let's make sure we don't clobber the return parameter on failure, to follow our coding style. Also, break the loop early if we have all attributes we need. This also changes the keys parameter to a simple char**, so that we can use STRV_MAKE() for passing the list of attributes to read. This also makes it possible to distuingish the case when the whole attribute file doesn't exist from one key in it missing. In the former case we return -ENOENT, in the latter we now return -ENXIO.
* cgroup-util: merge cg_set_tasks_access() and cg-set_group_access() into oneLennart Poettering2017-11-251-2/+1
| | | | | | | | | | We never use these functions seperately, hence don't bother splitting them into to. Also, simplify things a bit, and maintain tables for the attribute files to chown. Let's also update those tables a bit, and include thenew "cgroup.threads" file in it, that needs to be delegated too, according to the documentation.
* cgroup-util: move Set* allocation into cg_kernel_controllers()Lennart Poettering2017-11-211-1/+1
| | | | | Previously, callers had to do this on their own. Let's make the call do that instead, making the caller code a bit shorter.
* cgroup: move cgroup controller names def.h → cgroup-util.hLennart Poettering2017-11-211-0/+4
| | | | | These definitions are clearly cgroup specific, hence let's move them out of def.h
* Add SPDX license identifiers to source files under the LGPLZbigniew Jędrzejewski-Szmek2017-11-191-0/+1
| | | | | This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
* cgroup-util: add brief comments clarifying which controllers are v2-only and ↵Lennart Poettering2017-11-131-4/+4
| | | | which v1-only
* core: introduce cg_mask_from_string()/cg_mask_to_string()Franck Bui2017-05-041-0/+2
|
* cgroup: rename cg_unified() → cg_unified_controller()Lennart Poettering2017-02-241-1/+1
| | | | | cg_unified() is a bit generic a name, let's make clear that it checks whether a specified controller is in unified mode.
* cgroup: change cg_unified() to possibly return errors againLennart Poettering2017-02-241-3/+3
| | | | | | | | | We use our cgroup APIs in various contexts, including from our libraries sd-login, sd-bus. As we don#t control those environments we can't rely that the unified cgroup setup logic succeeds, and hence really shouldn't assert on it. This more or less reverts 415fc41ceaeada2e32639f24f134b1c248b9e43f.
* Rename cg_is_unified_systemd_controller_wanted to cg_is_hybrid_wantedZbigniew Jędrzejewski-Szmek2017-02-221-1/+1
| | | | Less typing and doesn't make the table so incredibly wide.
* core: make hybrid cgroup unified mode keep compat /sys/fs/cgroup/systemd ↵Tejun Heo2017-02-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hierarchy Currently the hybrid mode mounts cgroup v2 on /sys/fs/cgroup instead of the v1 name=systemd hierarchy. While this works fine for systemd itself, it breaks tools which expect cgroup v1 hierarchy on /sys/fs/cgroup/systemd. This patch updates the hybrid mode so that it mounts v2 hierarchy on /sys/fs/cgroup/unified and keeps v1 "name=systemd" hierarchy on /sys/fs/cgroup/systemd for compatibility. systemd itself doesn't depend on the "name=systemd" hierarchy at all. All operations take place on the v2 hierarchy as before but the v1 hierarchy is kept in sync so that any tools which expect it to be there can keep doing so. This allows systemd to take advantage of cgroup v2 process management without requiring other tools to be aware of the hybrid mode. The hybrid mode is implemented by mapping the special systemd controller to /sys/fs/cgroup/unified and making the basic cgroup utility operations - cg_attach(), cg_create(), cg_rmdir() and cg_trim() - also operate on the /sys/fs/cgroup/systemd hierarchy whenever the cgroup2 hierarchy is updated. While a bit messy, this will allow dropping complications from using cgroup v1 for process management a lot sooner than otherwise possible which should make it a net gain in terms of maintainability. v2: Fixed !cgns breakage reported by @evverx and renamed the unified mount point to /sys/fs/cgroup/unified as suggested by @brauner. v3: chown the compat hierarchy too on delegation. Suggested by @evverx. v4: [zj] - drop the change to default, full "legacy" is still the default.
* core: simplify cg_[all_]unified()Tejun Heo2017-02-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchrnously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().
* nspawn: cleanup and chown the synced cgroup hierarchy (#4223)Evgeny Vereshchagin2016-10-131-0/+4
| | | Fixes: #4181
* core: add "invocation ID" concept to service managerLennart Poettering2016-10-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.
* Merge pull request #3965 from htejun/systemd-controller-on-unifiedZbigniew Jędrzejewski-Szmek2016-08-191-1/+11
|\
| * core: use the unified hierarchy for the systemd cgroup controller hierarchyTejun Heo2016-08-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
| * core: rename cg_unified() to cg_all_unified()Tejun Heo2016-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.
* | logind: update empty and "infinity" handling for [User]TasksMax (#3835)Tejun Heo2016-08-181-0/+4
|/ | | | | | | | | | | | | | | | The parsing functions for [User]TasksMax were inconsistent. Empty string and "infinity" were interpreted as no limit for TasksMax but not accepted for UserTasksMax. Update them so that they're consistent with other knobs. * Empty string indicates the default value. * "infinity" indicates no limit. While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax handling. v2: Update empty string to indicate the default value as suggested by Zbigniew Jędrzejewski-Szmek. v3: Fixed empty UserTasksMax handling.
* core: add cgroup CPU controller support on the unified hierarchyTejun Heo2016-08-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.
* Merge pull request #3589 from brauner/cgroup_namespaceLennart Poettering2016-07-251-0/+2
|\ | | | | Cgroup namespace
| * cgroup: detect cgroup namespacesChristian Brauner2016-07-091-0/+2
| | | | | | | | | | - define CLONE_NEWCGROUP - add fun to detect whether cgroup namespaces are supported