From 9b2e0016d04c6542ace0128eb82ecb3b10c97e43 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sat, 9 Jan 2021 16:02:58 +0000 Subject: bvec/iter: disallow zero-length segment bvecs zero-length bvec segments are allowed in general, but not handled by bio and down the block layer so filtered out. This inconsistency may be confusing and prevent from optimisations. As zero-length segments are useless and places that were generating them are patched, declare them not allowed. Reviewed-by: Christoph Hellwig Signed-off-by: Pavel Begunkov Reviewed-by: Ming Lei Signed-off-by: Jens Axboe --- Documentation/block/biovecs.rst | 2 ++ Documentation/filesystems/porting.rst | 7 +++++++ 2 files changed, 9 insertions(+) (limited to 'Documentation') diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst index 36771a131b56..ddb867e0185b 100644 --- a/Documentation/block/biovecs.rst +++ b/Documentation/block/biovecs.rst @@ -40,6 +40,8 @@ normal code doesn't have to deal with bi_bvec_done. There is a lower level advance function - bvec_iter_advance() - which takes a pointer to a biovec, not a bio; this is used by the bio integrity code. +As of 5.12 bvec segments with zero bv_len are not supported. + What's all this get us? ======================= diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 867036aa90b8..c722d94f29ea 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -865,3 +865,10 @@ no matter what. Everything is handled by the caller. clone_private_mount() returns a longterm mount now, so the proper destructor of its result is kern_unmount() or kern_unmount_array(). + +--- + +**mandatory** + +zero-length bvec segments are disallowed, they must be filtered out before +passed on to an iterator. -- cgit v1.2.1 From c42bca92be928ce7dece5fc04cf68d0e37ee6718 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sat, 9 Jan 2021 16:03:03 +0000 Subject: bio: don't copy bvec for direct IO The block layer spends quite a while in blkdev_direct_IO() to copy and initialise bio's bvec. However, if we've already got a bvec in the input iterator it might be reused in some cases, i.e. when new ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable performance boost, and it also reduces memory footprint. Suggested-by: Matthew Wilcox Reviewed-by: Christoph Hellwig Signed-off-by: Pavel Begunkov Reviewed-by: Ming Lei Signed-off-by: Jens Axboe --- Documentation/filesystems/porting.rst | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index c722d94f29ea..1f8cf8e10b34 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -872,3 +872,12 @@ its result is kern_unmount() or kern_unmount_array(). zero-length bvec segments are disallowed, they must be filtered out before passed on to an iterator. + +--- + +**mandatory** + +For bvec based itererators bio_iov_iter_get_pages() now doesn't copy bvecs but +uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and +page references stay until I/O has completed, i.e. until ->ki_complete() has +been called or returned with non -EIOCBQUEUED code. -- cgit v1.2.1 From 67883ade7a98a7589ca50e97b1c7b7893886d30e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 26 Jan 2021 15:52:38 +0100 Subject: f2fs: remove FAULT_ALLOC_BIO Sleeping bio allocations do not fail, which means that injecting an error into sleeping bio allocations is a little silly. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Chaitanya Kulkarni Acked-by: Damien Le Moal Signed-off-by: Jens Axboe --- Documentation/filesystems/f2fs.rst | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index dae15c96e659..624f5f3ed93e 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -179,7 +179,6 @@ fault_type=%d Support configuring fault injection type, should be FAULT_KVMALLOC 0x000000002 FAULT_PAGE_ALLOC 0x000000004 FAULT_PAGE_GET 0x000000008 - FAULT_ALLOC_BIO 0x000000010 FAULT_ALLOC_NID 0x000000020 FAULT_ORPHAN 0x000000040 FAULT_BLOCK 0x000000080 -- cgit v1.2.1 From f1836426cea77fad342aa74bec8bf489a5d64b27 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Thu, 28 Jan 2021 13:47:26 +0900 Subject: block: document zone_append_max_bytes attribute The description of the zone_append_max_bytes sysfs queue attribute is missing from Documentation/block/queue-sysfs.rst. Add it. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Martin K. Petersen Reviewed-by: Johannes Thumshirn Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.rst | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst index 2638d3446b79..edc6e6960b96 100644 --- a/Documentation/block/queue-sysfs.rst +++ b/Documentation/block/queue-sysfs.rst @@ -261,6 +261,12 @@ For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES is not supported. +zone_append_max_bytes (RO) +-------------------------- +This is the maximum number of bytes that can be written to a sequential +zone of a zoned block device using a zone append write operation +(REQ_OP_ZONE_APPEND). This value is always 0 for regular block devices. + zoned (RO) ---------- This indicates if the device is a zoned block device and the zone model of the -- cgit v1.2.1 From a805a4fa4fa376bbc145762bb8b09caa2fa8af48 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Thu, 28 Jan 2021 13:47:30 +0900 Subject: block: introduce zone_write_granularity limit Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that all writes into sequential write required zones be aligned to the device physical block size. However, NVMe ZNS does not have this constraint and allows write operations into sequential zones to be aligned to the device logical block size. This inconsistency does not help with software portability across device types. To solve this, introduce the zone_write_granularity queue limit to indicate the alignment constraint, in bytes, of write operations into zones of a zoned block device. This new limit is exported as a read-only sysfs queue attribute and the helper blk_queue_zone_write_granularity() introduced for drivers to set this limit. The function blk_queue_set_zoned() is modified to set this new limit to the device logical block size by default. NVMe ZNS devices as well as zoned nullb devices use this default value as is. The scsi disk driver is modified to execute the blk_queue_zone_write_granularity() helper to set the zone write granularity of host-managed SMR disks to the disk physical block size. The accessor functions queue_zone_write_granularity() and bdev_zone_write_granularity() are also introduced. Signed-off-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Reviewed-by: Chaitanya Kulkarni Reviewed-by: Johannes Thumshirn Signed-off-by: Jens Axboe --- Documentation/block/queue-sysfs.rst | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst index edc6e6960b96..4dc7f0d499a8 100644 --- a/Documentation/block/queue-sysfs.rst +++ b/Documentation/block/queue-sysfs.rst @@ -279,4 +279,11 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC do not support zone commands, they will be treated as regular block devices and zoned will report "none". +zone_write_granularity (RO) +--------------------------- +This indicates the alignment constraint, in bytes, for write operations in +sequential zones of zoned block devices (devices with a zoned attributed +that reports "host-managed" or "host-aware"). This value is always 0 for +regular block devices. + Jens Axboe , February 2009 -- cgit v1.2.1