path: root/Grow.c
Commit message | Author | Age | Files | Lines
* mdadm: fix growing containers | Nigel Croxon | 2021-04-06 | 1 | -8/+11

    This fixes growing containers, which was broken by commit 4ae96c802203ec3c ("mdadm: fix reshape from RAID5 to RAID6 with backup file"). The issue is that containers use the function wait_for_reshape_imsm, which expects a numeric value and not the string value "max". The change is to test for external metadata before setting the correct value.

    Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: Block reshape when external metadata and write-intent bitmap | Jakub Radtke | 2021-03-09 | 1 | -9/+15

    The current kernel sysfs interface for the bitmap is limited: it allows a bitmap to be applied to non-active volumes only. The reshape operation for a volume with a bitmap should therefore be blocked.

    Signed-off-by: Jakub Radtke <jakub.radtke@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: be careful of corrupt dev_roles list | NeilBrown | 2021-03-03 | 1 | -3/+12

    I've seen a case where the dev_roles list of a linear array was corrupt. ->max_dev was > 128 and > raid_disks, and the extra slots were '0', not 0xFFFE or 0xFFFF. This caused problems when a 128th device was added.

    So:
    1/ make Grow_Add_device more robust so that if numbers look wrong, it fails-safe.
    2/ make examine_super1() report details if the dev_roles array is corrupt.

    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm: fix reshape from RAID5 to RAID6 with backup file | Nigel Croxon | 2021-03-03 | 1 | -2/+5

    Reshaping a 3-disk RAID5 to a 4-disk RAID6 causes the resync to hang after the grow. Growing succeeds when a spare disk is added to avoid degrading the array, but not when a backup file is supplied on the command line. If the reshape job is not already running, set the sync_max value to max.

    Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm: Unify forks behaviour | Mariusz Tkaczyk | 2020-11-25 | 1 | -45/+7

    If mdadm is run by udev or systemd, it gets a pipe as each stream. Forks in the background may run after an event or service has been processed, when udev is detached from the pipe. As a result, the process fails quietly if any message is written. To prevent this, each fork has to close all parent streams. Leave stderr and stdout open only for debug purposes. Unify this across all forks. Introduce detection of other descriptors by scanning the /proc/self/fd directory. Add a generic method for managing systemd services.

    Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
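The descriptor scan described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `close_extra_fds` and the policy of keeping descriptors 0 to 2 open are assumptions made for the example, and it relies on a Linux-style /proc filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an mdadm function): close every descriptor
 * above stderr that is listed in /proc/self/fd.
 * Returns the number of descriptors closed, or -1 if /proc/self/fd
 * cannot be opened.
 */
int close_extra_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *de;
    int closed = 0;
    int self;

    if (!dir)
        return -1;
    /* The directory stream has a descriptor of its own; keep it open. */
    self = dirfd(dir);

    while ((de = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(de->d_name, &end, 10);

        /* Skip "." and ".." and anything that is not a plain number. */
        if (end == de->d_name || *end != '\0')
            continue;
        /* Keep stdin, stdout, stderr and the scan descriptor. */
        if (fd <= 2 || fd == self)
            continue;
        if (close((int)fd) == 0)
            closed++;
    }
    closedir(dir);
    return closed;
}
```

A forked child would call such a helper once, right after fork(), before doing any other work.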
* mdadm/Grow: prevent md's fd from being occupied during delayed time | allenpeng | 2020-06-18 | 1 | -1/+1

    If we start reshaping an md which shares sub-devices with another resyncing md, it may be forced to wait for the others to complete. mdadm occupies the md's fd during this time, so the md cannot be stopped and the filesystem cannot be mounted on the md. We can close the md's fd earlier to solve this problem.

    Reproducible Steps:

    1. create two partitions on sda, sdb, sdc, sdd
    2. create raid1 with sda1, sdb1
           mdadm -C /dev/md1 --assume-clean -l1 -n2 /dev/sda1 /dev/sdb1
    3. create raid5 with sda2, sdb2, sdc2
           mdadm -C /dev/md2 --assume-clean -l5 -n3 /dev/sda2 /dev/sdb2 /dev/sdc2
    4. start resync at md1
           echo repair > /sys/block/md1/md/sync_action
    5. reshape raid5 to raid6
           mdadm -a /dev/md2 /dev/sdd2
           mdadm --grow /dev/md2 -n4 -l6 --backup-file=/root/md2-backup
       Now mdadm is occupying the fd of md2, so md2 cannot be stopped.
    6. Try to stop md2; an error message shows
           mdadm -S /dev/md2
           mdadm: Cannot get exclusive access to /dev/md3:Perhaps a running process, mounted filesystem or active volume group?

    Reviewed-by: Alex Wu <alexwu@synology.com>
    Reviewed-by: BingJing Chang <bingjingc@synology.com>
    Reviewed-by: Danny Shih <dannyshih@synology.com>
    Signed-off-by: ChangSyun Peng <allenpeng@synology.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Fix reshape for decreasing data offset | Corey Hickey | 2019-02-13 | 1 | -2/+2

    ...when not changing the number of disks.

    This patch needs context to explain. These are the relevant parts of the original code (condensed and annotated):

        if (dir > 0) {
            /* Increase data offset (reshape backwards) */
            if (data_offset < sd->data_offset + min) {
                pr_err("--data-offset too small on %s\n", dn);
                goto release;
            }
        } else {
            /* Decrease data offset (reshape forwards) */
            if (data_offset < sd->data_offset - min) {
                pr_err("--data-offset too small on %s\n", dn);
                goto release;
            }
        }

    When this code is reached, mdadm has already decided on a reshape direction. When increasing the data offset, the reshape runs backwards (dir==1); when decreasing the data offset, the reshape runs forwards (dir==-1).

    The conditional within the backwards reshape is correct: the requested offset must be larger than the old offset plus a minimum delta; thus the reshape has room to work.

    For the forwards reshape, the requested offset needs to be smaller than the old offset minus a minimum delta; to do this correctly, the comparison must be reversed.

    Also update the error message.

    Note: I have tested this change on a RAID 5 on Linux 4.18.0 and verified that there were no errors from the kernel and that the device data remained intact. I do not know if there are considerations for different RAID levels.

    Signed-off-by: Corey Hickey <bugfood-c@fatooh.org>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
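The corrected comparison can be sketched as a standalone predicate. This is only an illustration of the logic the message describes, not the code from the patch: the name `offset_request_ok`, its parameters, and the underflow guard are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not an mdadm function): returns true when the
 * requested data offset leaves at least min_delta sectors of room for
 * the reshape.
 *   dir > 0  : data offset increases, reshape runs backwards
 *   dir <= 0 : data offset decreases, reshape runs forwards
 */
bool offset_request_ok(int dir, unsigned long long requested,
                       unsigned long long old_offset,
                       unsigned long long min_delta)
{
    if (dir > 0)
        /* New offset must be at least old + min. */
        return requested >= old_offset + min_delta;

    /* Added for this sketch: avoid unsigned wrap in old - min. */
    if (old_offset < min_delta)
        return false;

    /* New offset must be at most old - min (comparison reversed). */
    return requested <= old_offset - min_delta;
}
```

With an old offset of 2048 and a minimum delta of 512, a forwards reshape to offset 1024 is accepted, while a request for 2000 is rejected because it leaves too little room.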
* Fix spelling typos. | Dimitri John Ledkov | 2019-02-11 | 1 | -3/+3

    Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: report correct new chunk size. | NeilBrown | 2018-12-06 | 1 | -1/+1

    When using "--grow --chunk=" to change chunk size, the old chunksize is reported instead of the new.

    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: avoid overflow in compute_backup_blocks() | NeilBrown | 2018-12-06 | 1 | -1/+2

    With a chunk size of 16Meg and data drive count of 8, this calculation can easily overflow the 'int' type that is used for the multiplications. So force it to use "long" instead.

    Reported-and-tested-by: Ed Spiridonov <edo.rus@gmail.com>
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
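To see why the numbers in this message overflow, here is a small sketch. It assumes the calculation multiplies the old and new chunk sizes (in 512-byte sectors) by the old and new data-disk counts; the function name `stripe_product` is hypothetical, and the sketch widens to `unsigned long long` rather than the "long" mentioned in the message so that the result is 64-bit on every platform.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical illustration (not the mdadm function): product of the
 * old and new stripe widths in 512-byte sectors.
 */
unsigned long long stripe_product(int old_chunk, int new_chunk,
                                  int old_data, int new_data)
{
    /* Widen the first operand so every multiplication is 64-bit. */
    unsigned long long p = (unsigned long long)(old_chunk / 512);

    p *= (unsigned long long)(new_chunk / 512);
    p *= (unsigned long long)old_data;
    p *= (unsigned long long)new_data;
    return p;
}
```

A 16 MiB chunk is 32768 sectors, so with 8 data disks on both sides the product is 32768 * 32768 * 8 * 8 = 2^36, far beyond INT_MAX (2^31 - 1).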
* Grow: Frozen array can't be idle | Mariusz Tkaczyk | 2018-08-01 | 1 | -1/+2

    When the array is frozen but there is no recovery/reshape in mdstat, check_idle() will not return an error, but grow continue can still be working. Check whether the array is frozen. Do not use the sysfs sync_action parameter because it doesn't exist for Raid0; simply check metadata_version in mdstat.

    Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Coverity: Resource leak: close fd before return | Anthony Youngman | 2018-07-11 | 1 | -0/+1

    Anthony Youngman <anthony@youngman.org.uk>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm/grow: correct size and chunk_size casting | Roman Sobanski | 2018-04-27 | 1 | -1/+1

    With commit 4b74a905a67e ("mdadm/grow: Component size must be larger than chunk size") mdadm returns an incorrect message if the size given to grow is greater than 2 147 483 647 K. Cast chunk_size to "unsigned long long" instead of casting size to "int".

    Signed-off-by: Roman Sobanski <roman.sobanski@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
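The direction of the cast matters because narrowing a large size to "int" changes its value, while widening the chunk size does not. A minimal sketch of the safe comparison follows; the function name `size_below_chunk` and its parameters are assumptions for the example, not identifiers from Grow.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate (not an mdadm function): is the requested
 * component size, in KiB, smaller than the chunk size, in bytes?
 */
bool size_below_chunk(unsigned long long size_kib, int chunk_size_bytes)
{
    assert(chunk_size_bytes >= 0);

    /* Widen the chunk size; never narrow size_kib to int. */
    return (unsigned long long)chunk_size_bytes / 1024 > size_kib;
}
```

A request of 3 000 000 000 K against a 512 KiB chunk is correctly reported as large enough, which a comparison on an "int" copy of the size could not guarantee.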
* Grow.c: Block any level migration with chunk size change | Mariusz Tkaczyk | 2018-01-25 | 1 | -0/+5

    Mixing level and chunk changes in one grow operation is not supported. Mdadm performs level migration correctly and ignores new chunk, but after migration it tries to write this chunk to sysfs properties. This is dangerous and can cause unexpected behaviours. Block it before level migration starts.

    Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm/grow: correct the s->size > 1 to make 'max' work | Zhilong Liu | 2017-11-28 | 1 | -1/+1

    s->size > 1 : s->size is '1' when '--grow --size max' parameter is specified, so correct this test here.

    Fixes: 1b21c449e6f2 ("mdadm/grow: adding a test to ensure resize was required")
    Signed-off-by: Zhilong Liu <zlliu@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* To support clustered raid10 | Guoqing Jiang | 2017-11-09 | 1 | -0/+6

    We are now considering extending clustered raid to support raid10. Only the near layout is supported, so check for it when creating the array or when switching the bitmap from internal to clustered.

    Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm/grow: adding a test to ensure resize was required | Zhilong Liu | 2017-10-11 | 1 | -2/+2

    To fix the commit: 4b74a905a67e (mdadm/grow: Component size must be larger than chunk size)

    array.level > 1 : apply only to the raid levels for which chunk_size is meaningful.
    s->size > 0 : ensure that a change of component size was requested.
    array.chunk_size / 1024 > s->size : ensure the component size is always >= the current chunk_size when a resize is required; otherwise, mddev->pers->resize would set mddev->dev_sectors to '0'.

    Reported-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Suggested-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Zhilong Liu <zlliu@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: Use all 80 characters | Jes Sorensen | 2017-10-02 | 1 | -220/+197

    Try to use the full line length and avoid breaking up lines excessively. Equally break up lines that are too long for no reason.

    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: fix switching on PPL during recovery | Pawel Baldysiak | 2017-10-02 | 1 | -3/+0

    If a raid member is not in sync, it is skipped during enablement of PPL. This is not correct, since the drive that we are currently recovering to does not have ppl_size and ppl_sector properly set in sysfs. Remove this skipping, so all drives are updated when turning on the PPL.

    Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm/grow: Component size must be larger than chunk size | Zhilong Liu | 2017-10-02 | 1 | -0/+6

    Grow: For striped raid levels, a changed component size must be larger than the current chunk size; otherwise Grow_reshape() would set s->size to '0'.

    Signed-off-by: Zhilong Liu <zlliu@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: stop previous reshape process first | Tomasz Majchrzak | 2017-10-02 | 1 | -2/+2

    If an array is stopped during reshape and assembled again straight away, the reshape process in the background might still be running. systemd doesn't start a new service if one already exists. If there is a race, the previous process might terminate and a new one is not created. Reshape doesn't continue after assemble.

    Tell systemd to restart the service rather than just start it. This ensures the previous service is stopped first. If it's not running, stopping has no effect and only a new process is started.

    Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Error messages should end with a newline character. | NeilBrown | 2017-08-16 | 1 | -2/+2

    Add "\n" to the end of error messages which don't already have one. Also spell "opened" correctly.

    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: don't allow to enable PPL when reshape is in progress | Tomasz Majchrzak | 2017-06-09 | 1 | -0/+12

    Don't allow enabling the PPL consistency policy when a reshape is in progress. The current PPL implementation doesn't work while a reshape is taking place.

    Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: don't allow array geometry change with ppl enabled | Tomasz Majchrzak | 2017-06-09 | 1 | -0/+7

    Don't allow an array geometry change (size expand, disk adding) when the PPL consistency policy is enabled. The current PPL implementation doesn't work while a reshape is taking place.

    Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: set component size prior to array size | Tomasz Majchrzak | 2017-06-05 | 1 | -0/+2

    This is a partial revert of commit 758b327cf5a7 ("Grow: Remove unnecessary optimization"). For native metadata, the component size is set in the kernel for the entire disk space. As external metadata supports multiple arrays within one disk, the component size is set to the array size. If the component size is not updated prior to the array size update, the grow operation fails.

    Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm: Fixup != broken formatting | Jes Sorensen | 2017-05-16 | 1 | -6/+7

    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm: Fixup more broken logical operator formatting | Jes Sorensen | 2017-05-16 | 1 | -7/+9

    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm: Fixup a large number of bad formatting of logical operators | Jes Sorensen | 2017-05-16 | 1 | -18/+16

    Logical operators never belong at the beginning of a line.

    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* mdadm/util: unify fstat checking blkdev into function | Zhilong Liu | 2017-05-05 | 1 | -6/+4

    Declare the function fstat_is_blkdev() to consolidate the repeated fstat checks for block devices. It returns true/1 when the descriptor is a block device and false/0 when it isn't. The fd and devname parameters are required; *rdev is optional. If the dev_t *rdev pointer is valid, the device number is assigned to it; if it is NULL, it is ignored.

    Signed-off-by: Zhilong Liu <zlliu@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
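A helper with the behaviour described in this message might look roughly like the sketch below. It is written from the description alone: the name `is_blkdev_fd`, the exact parameter types, and the error messages are assumptions, and the real fstat_is_blkdev() may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper in the spirit of fstat_is_blkdev().
 * Returns 1 if fd refers to a block device, 0 otherwise.
 * If rdev is non-NULL and fd is a block device, the device number
 * is stored in *rdev; a NULL rdev is ignored.
 */
int is_blkdev_fd(int fd, const char *devname, dev_t *rdev)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        fprintf(stderr, "fstat failed for %s\n", devname);
        return 0;
    }
    if (!S_ISBLK(st.st_mode)) {
        fprintf(stderr, "%s is not a block device\n", devname);
        return 0;
    }
    if (rdev)
        *rdev = st.st_rdev;
    return 1;
}
```

Callers that only need the yes/no answer pass NULL for the last argument; callers that also need the device number pass the address of a dev_t.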
* change back 0644 permission for Grow.c | Zhilong Liu | 2017-05-03 | 1 | -0/+0

    Fixes commit: 26714713cd2b ("mdadm: Change timestamps to unsigned data type.")

    Signed-off-by: Zhilong Liu <zlliu@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: Grow_continue_command: Avoid aliasing array variable | Jes Sorensen | 2017-05-02 | 1 | -3/+3

    While this would cause a warning since the two are different types, let's avoid aliasing an existing variable.

    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow_continue_command: ensure 'content' is properly initialised. | NeilBrown | 2017-04-20 | 1 | -0/+1

    Grow_continue_command() calls verify_reshape_position(), which assumes that info->sys_name is initialised. 'info' in verify_reshape_position() is 'content' in Grow_continue_command().

    In the st->ss->external != 0 branch of that function, sysfs_init() is called to initialize content->sys_name. In the st->ss->external == 0 branch, ->sys_name is not initialized so verify_reshape_position() will not do the right thing.

    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <jsorensen@fb.com>
* Grow: Stop bothering about md driver versions older than 0.90.00 | Jes Sorensen | 2017-04-05 | 1 | -7/+0

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* sysfs: Make sysfs_init() return an error code | Jes Sorensen | 2017-03-30 | 1 | -6/+33

    Rather than have the caller inspect the returned content, return an error code from sysfs_init(). In addition make all callers actually check it.

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Grow: Do not shadow an existing variable | Jes Sorensen | 2017-03-30 | 1 | -3/+3

    Declaring 'int rv' twice within the same function is asking for trouble.

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Grow: Remove unnecessary optimization | Jes Sorensen | 2017-03-30 | 1 | -12/+0

    Per explanation by Neil, this optimization writes "size" to the attribute of each device; however, when reducing the size of devices, the size change isn't permitted until the array has been shrunk, so this will fail anyway.

    This effectively reverts 65a9798b58b4e4de0157043e2b30a738c27eff43

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* util: Introduce md_set_array_info() | Jes Sorensen | 2017-03-29 | 1 | -9/+8

    Switch from using ioctl(SET_ARRAY_INFO) to using md_set_array_info()

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* util: Introduce md_get_disk_info() | Jes Sorensen | 2017-03-29 | 1 | -7/+7

    This removes all the inline ioctl calls for GET_DISK_INFO, allowing us to switch to sysfs in one place, and improves type checking.

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* util: Introduce md_get_array_info() | Jes Sorensen | 2017-03-29 | 1 | -15/+16

    Remove most direct ioctl calls for GET_ARRAY_INFO, except for one, which will be addressed in the next patch.

    This is the start of the effort to clean up the use of ioctl calls and introduce a more structured API, which will use sysfs and fall back to ioctl for backup.

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Grow: Fixup a pile of cosmetic issues | Jes Sorensen | 2017-03-29 | 1 | -28/+32

    No code change, simply cleanup ugliness.

    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Grow: support consistency policy change | Artur Paszkiewicz | 2017-03-29 | 1 | -0/+172

    Extend the --consistency-policy parameter to work also in Grow mode. Using it changes the currently active consistency policy in the kernel driver and updates the metadata to make this change permanent. Currently this supports only changing between "ppl" and "resync" policies, that is enabling or disabling PPL at runtime.

    Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* super1: PPL support | Artur Paszkiewicz | 2017-03-29 | 1 | -1/+14

    Enable creating and assembling raid5 arrays with PPL for 1.x metadata.

    When creating, reserve enough space for PPL and store its size and location in the superblock and set MD_FEATURE_PPL bit. Write an initial empty header in the PPL area on each device. PPL is stored in the metadata region reserved for internal write-intent bitmap, so don't allow using bitmap and PPL together.

    While at it, fix two endianness issues in write_empty_r5l_meta_block() and write_init_super1().

    Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Add 'force' flag to *hot_remove_disk(). | NeilBrown | 2017-03-28 | 1 | -1/+1

    In rare circumstances, the short period that *hot_remove_disk() waits isn't long enough for IO to complete. This particularly happens when a device is failing and many retries are still happening.

    We don't want to increase the normal wait time for "mdadm --remove" as that might be used just to test if a device is active or not, and a delay would be problematic. So allow "--force" to mean that mdadm should try extra hard for a --remove to complete, waiting up to 5 seconds.

    Note that this patch fixes a comment which claimed the previous wait time was half a second, where it was really 50msec.

    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
* Retry HOT_REMOVE_DISK a few times. | NeilBrown | 2017-03-28 | 1 | -8/+1

    HOT_REMOVE_DISK can fail with EBUSY if there are outstanding IO requests that have not completed yet. It can sometimes be helpful to wait a little while for these to complete.

    We already do this in impose_level() when reshaping a device, but not in Manage.c in response to an explicit --remove request.

    So create hot_remove_disk() to centralise this code, and call it wherever it makes sense to wait for a HOT_REMOVE_DISK to succeed.

    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
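The retry pattern described here can be sketched generically. This is not the hot_remove_disk() implementation: the names `retry_while_busy` and `busy_n_times`, the callback interface, and the try count and delay parameters are all assumptions chosen so the example runs without an md device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical retry loop (not an mdadm function): call op(arg) until
 * it succeeds, fails with something other than EBUSY, or max_tries
 * attempts have been made. Sleeps delay_ms between attempts.
 * Returns the result of the last call to op.
 */
int retry_while_busy(int (*op)(void *), void *arg, int max_tries,
                     long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };
    int rv = -1;

    for (int i = 0; i < max_tries; i++) {
        rv = op(arg);
        if (rv == 0 || errno != EBUSY)
            break;
        nanosleep(&delay, NULL);
    }
    return rv;
}

/* Stand-in operation: fails with EBUSY until *remaining reaches zero. */
int busy_n_times(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

In a real caller, the operation would wrap the HOT_REMOVE_DISK ioctl, and a "force" option would simply raise the number of tries so the total wait grows.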
* Increase buffer for sysfs disk state | Tomasz Majchrzak | 2016-11-17 | 1 | -2/+4

    Bad block support has extended the sysfs disk state reported by the kernel ("external_bbl"), so it became longer than 20 bytes. This causes reshape to fail because it reads a truncated entry from sysfs. Increase the buffer so it can accommodate the string including all state values currently implemented in the kernel at the same time.

    Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* Allow level migration only for single-array container | Mariusz Dabrowski | 2016-10-19 | 1 | -0/+20

    IMSM doesn't allow changing the RAID level of an array in a container with two arrays, but the array count check is done too late (after removing disks), and in some cases (e.g. RAID 0 and RAID 1 migrated to RAID 0) both arrays become degraded. This patch adds an array count check before disks are removed.

    Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* MDADM:Check mdinfo->reshape_active more times before calling Grow_continue | Xiao Ni | 2016-06-16 | 1 | -30/+37

    When reshaping a 3-drive raid5 to a 4-drive raid5, there is a chance that the reshape can't start. If the disks do not have enough space for relocating the data_offset, mdadm needs to call start_reshape and then have systemd run mdadm --grow --continue. But mdadm --grow --continue fails because it finds that info->reshape_active is 0.

    info->reshape_active is read from the superblock of the underlying devices. The function start_reshape writes reshape to /sys/../sync_action. If mdadm --grow --continue is called before the latest superblock has been written to the underlying devices, there is a chance that info->reshape_active is still 0. We should wait longer for the superblock update before calling Grow_continue.

    Signed-off-by: Xiao Ni <xni@redhat.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* Use dev_t for devnm2devid and devid2devnm | Mike Lovell | 2016-06-03 | 1 | -1/+1

    Commit 4dd2df0966ec added a trip through makedev(), major(), and minor() for device major and minor numbers. This would cause mdadm to fail when operating on a device with a minor number bigger than (2^19)-1, due to it changing from dev_t to a signed int and back.

    This was found as a problem when an array was created with a device specified as a name like /dev/md/raidname and there were already 128 arrays on the system. In this case, mdadm would choose 1048575 ((2^20)-1) for the array and minor number. This would cause the major and minor number to become negative when generated from devnm2devid() and passed to major() and minor() in open_dev_excl(). open_dev_excl() would then call dev_open(), which would detect the negative minor number and call open() on the *char containing the major:minor pair, which isn't a valid file.

    Signed-off-by: Mike Lovell <mlovell@bluehost.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
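The effect of squeezing a dev_t through a signed int can be sketched as below. The example assumes a Linux system with glibc-style makedev()/minor() from <sys/sysmacros.h>, where high minor bits land in bit 31 of the device number; the function names are hypothetical, and the narrowing conversion to int is implementation-defined in C, so the second function shows typical rather than guaranteed behaviour.

```c
#include <assert.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Keep the device number in dev_t the whole way: the minor survives. */
unsigned int minor_via_dev_t(unsigned int maj, unsigned int min)
{
    dev_t dev = makedev(maj, min);

    return minor(dev);
}

/*
 * Store the device number in a signed int and convert it back.
 * For minors of 2^19 or more the int is typically negative, and
 * converting back to dev_t sign-extends, corrupting the fields.
 */
unsigned int minor_via_int(unsigned int maj, unsigned int min)
{
    int narrowed = (int)makedev(maj, min);

    return minor((dev_t)narrowed);
}
```

With major 9 and minor 1048575, the dev_t path returns the minor unchanged, while the int path typically does not; small minors such as 127 survive both paths, which is why the problem only shows up once many arrays exist.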
* Grow: Apply some more consistent formatting to Grow_addbitmap() | Jes Sorensen | 2016-05-12 | 1 | -20/+21

    This should be purely cosmetic and cause no functional change ... famous last words!

    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
* Grow: Simplify error paths in Grow_addbitmap() | Jes Sorensen | 2016-05-12 | 1 | -10/+10

    This gets rid of some repeated exit paths, making the code a little cleaner.

    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>