<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/linux.git/include/linux/slab.h, branch proc-cmdline</title>
<subtitle>git.kernel.org: pub/scm/linux/kernel/git/torvalds/linux.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/'/>
<entry>
<title>headers: untangle kmemleak.h from mm.h</title>
<updated>2018-04-06T04:36:27+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2018-04-05T23:25:34+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=514c60324960137e74457fdc233a339b985fa8a8'/>
<id>514c60324960137e74457fdc233a339b985fa8a8</id>
<content type='text'>
Currently &lt;linux/slab.h&gt; #includes &lt;linux/kmemleak.h&gt; for no obvious
reason.  It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add &lt;linux/kmemleak.h&gt; to any users of kmemleak_* that
don't already #include it.  Also remove &lt;linux/kmemleak.h&gt; from source
files that do not use it.

This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
be good to run it through the 0day bot for other $ARCHes.  I have
neither the horsepower nor the storage space for the other $ARCHes.

Update: This patch has been extensively build-tested by both the 0day
bot &amp; kisskb/ozlabs build farms.  Both of them reported 2 build failures
for which patches are included here (in v2).

[ slab.h is the second most used header file after module.h; kernel.h is
  right there with slab.h. There could be some minor error in the
  counting due to some #includes having comments after them and I didn't
  combine all of those. ]

[akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reported-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;	[2 build failures]
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;	[2 build failures]
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Wei Yongjun &lt;weiyongjun1@huawei.com&gt;
Cc: Luis R. Rodriguez &lt;mcgrof@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Cc: John Johansen &lt;john.johansen@canonical.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently &lt;linux/slab.h&gt; #includes &lt;linux/kmemleak.h&gt; for no obvious
reason.  It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add &lt;linux/kmemleak.h&gt; to any users of kmemleak_* that
don't already #include it.  Also remove &lt;linux/kmemleak.h&gt; from source
files that do not use it.

This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
be good to run it through the 0day bot for other $ARCHes.  I have
neither the horsepower nor the storage space for the other $ARCHes.

Update: This patch has been extensively build-tested by both the 0day
bot &amp; kisskb/ozlabs build farms.  Both of them reported 2 build failures
for which patches are included here (in v2).

[ slab.h is the second most used header file after module.h; kernel.h is
  right there with slab.h. There could be some minor error in the
  counting due to some #includes having comments after them and I didn't
  combine all of those. ]

[akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reported-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;	[2 build failures]
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;	[2 build failures]
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Wei Yongjun &lt;weiyongjun1@huawei.com&gt;
Cc: Luis R. Rodriguez &lt;mcgrof@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Mimi Zohar &lt;zohar@linux.vnet.ibm.com&gt;
Cc: John Johansen &lt;john.johansen@canonical.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: make usercopy region 32-bit</title>
<updated>2018-04-06T04:36:24+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-04-05T23:21:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=7bbdb81ee3de73f2381ceec1bbee831f4c913b5c'/>
<id>7bbdb81ee3de73f2381ceec1bbee831f4c913b5c</id>
<content type='text'>
If kmem case sizes are 32-bit, then usecopy region should be too.

Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If kmem case sizes are 32-bit, then usecopy region should be too.

Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: make kmem_cache_create() work with 32-bit sizes</title>
<updated>2018-04-06T04:36:23+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-04-05T23:20:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=f4957d5bd09165b165df851fbf8c658f7fcd9922'/>
<id>f4957d5bd09165b165df851fbf8c658f7fcd9922</id>
<content type='text'>
struct kmem_cache::size and ::align were always 32-bit.

Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0.
kmem_cache_create(1UL&lt;&lt;32+1) created 1-byte cache as expected.

size_t doesn't work and never did.

Link: http://lkml.kernel.org/r/20180305200730.15812-6-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
struct kmem_cache::size and ::align were always 32-bit.

Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0.
kmem_cache_create(1UL&lt;&lt;32+1) created 1-byte cache as expected.

size_t doesn't work and never did.

Link: http://lkml.kernel.org/r/20180305200730.15812-6-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: make kmalloc_size() return "unsigned int"</title>
<updated>2018-04-06T04:36:23+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-04-05T23:20:26+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=0be70327ec8cf6dd6847cbd8b75ca51be864a6ea'/>
<id>0be70327ec8cf6dd6847cbd8b75ca51be864a6ea</id>
<content type='text'>
kmalloc_size() derives size of kmalloc cache from internal index, which
can't be negative.

Propagate unsignedness a bit.

Link: http://lkml.kernel.org/r/20180305200730.15812-3-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kmalloc_size() derives size of kmalloc cache from internal index, which
can't be negative.

Propagate unsignedness a bit.

Link: http://lkml.kernel.org/r/20180305200730.15812-3-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>slab: make kmalloc_index() return "unsigned int"</title>
<updated>2018-04-06T04:36:23+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-04-05T23:20:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=36071a279b4100afe9fbee18727ad78daa307591'/>
<id>36071a279b4100afe9fbee18727ad78daa307591</id>
<content type='text'>
kmalloc_index() return index into an array of kmalloc kmem caches,
therefore should be unsigned.

Space savings with SLUB on trimmed down .config:

	add/remove: 0/1 grow/shrink: 6/56 up/down: 85/-557 (-472)
	Function                                     old     new   delta
	calculate_sizes                              924     983     +59
	on_freelist                                  589     604     +15
	init_cache_random_seq                        122     127      +5
	ext4_mb_init                                1206    1210      +4
	slab_pad_check.part                          270     271      +1
	cpu_partial_store                            112     113      +1
	usersize_show                                 28      27      -1
		...
	new_slab                                    1871    1837     -34
	slab_order                                   204       -    -204

This patch start a series of converting SLUB (mostly) to "unsigned int".

1) Most integers in the code are in fact unsigned entities: array
   indexes, lengths, buffer sizes, allocation orders. It is therefore
   better to use unsigned variables

2) Some integers in the code are either "size_t" or "unsigned long" for
   no reason.

   size_t usually comes from people trying to maintain type correctness
   and figuring out that "sizeof" operator returns size_t or
   memset/memcpy takes size_t so should everything passed to it.

   However the number of 4GB+ objects in the kernel is very small. Most,
   if not all, dynamically allocated objects with kmalloc() or
   kmem_cache_create() aren't actually big. Maintaining wide types
   doesn't do anything.

   64-bit ops are bigger than 32-bit on our beloved x86_64,
   so try to not use 64-bit where it isn't necessary
   (read: everywhere where integers are integers not pointers)

3) in case of SLAB allocators, there are additional limitations
   *) page-&gt;inuse, page-&gt;objects are only 16-/15-bit,
   *) cache size was always 32-bit
   *) slab orders are small, order 20 is needed to go 64-bit on x86_64
      (PAGE_SIZE &lt;&lt; order)

Basically everything is 32-bit except kmalloc(1ULL&lt;&lt;32) which gets
shortcut through page allocator.

Christoph said:
:
: That changes with large base page size on power and ARM64 f.e. but then
: we do not want to encourage larger allocations through slab anyways.

Link: http://lkml.kernel.org/r/20180305200730.15812-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kmalloc_index() return index into an array of kmalloc kmem caches,
therefore should be unsigned.

Space savings with SLUB on trimmed down .config:

	add/remove: 0/1 grow/shrink: 6/56 up/down: 85/-557 (-472)
	Function                                     old     new   delta
	calculate_sizes                              924     983     +59
	on_freelist                                  589     604     +15
	init_cache_random_seq                        122     127      +5
	ext4_mb_init                                1206    1210      +4
	slab_pad_check.part                          270     271      +1
	cpu_partial_store                            112     113      +1
	usersize_show                                 28      27      -1
		...
	new_slab                                    1871    1837     -34
	slab_order                                   204       -    -204

This patch start a series of converting SLUB (mostly) to "unsigned int".

1) Most integers in the code are in fact unsigned entities: array
   indexes, lengths, buffer sizes, allocation orders. It is therefore
   better to use unsigned variables

2) Some integers in the code are either "size_t" or "unsigned long" for
   no reason.

   size_t usually comes from people trying to maintain type correctness
   and figuring out that "sizeof" operator returns size_t or
   memset/memcpy takes size_t so should everything passed to it.

   However the number of 4GB+ objects in the kernel is very small. Most,
   if not all, dynamically allocated objects with kmalloc() or
   kmem_cache_create() aren't actually big. Maintaining wide types
   doesn't do anything.

   64-bit ops are bigger than 32-bit on our beloved x86_64,
   so try to not use 64-bit where it isn't necessary
   (read: everywhere where integers are integers not pointers)

3) in case of SLAB allocators, there are additional limitations
   *) page-&gt;inuse, page-&gt;objects are only 16-/15-bit,
   *) cache size was always 32-bit
   *) slab orders are small, order 20 is needed to go 64-bit on x86_64
      (PAGE_SIZE &lt;&lt; order)

Basically everything is 32-bit except kmalloc(1ULL&lt;&lt;32) which gets
shortcut through page allocator.

Christoph said:
:
: That changes with large base page size on power and ARM64 f.e. but then
: we do not want to encourage larger allocations through slab anyways.

Link: http://lkml.kernel.org/r/20180305200730.15812-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>usercopy: Allow strict enforcement of whitelists</title>
<updated>2018-01-15T20:07:48+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-11-30T21:04:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=2d891fbc3bb681ba1f826e7ee70dbe38ca7465fe'/>
<id>2d891fbc3bb681ba1f826e7ee70dbe38ca7465fe</id>
<content type='text'>
This introduces CONFIG_HARDENED_USERCOPY_FALLBACK to control the
behavior of hardened usercopy whitelist violations. By default, whitelist
violations will continue to WARN() so that any bad or missing usercopy
whitelists can be discovered without being too disruptive.

If this config is disabled at build time or a system is booted with
"slab_common.usercopy_fallback=0", usercopy whitelists will BUG() instead
of WARN(). This is useful for admins that want to use usercopy whitelists
immediately.

Suggested-by: Matthew Garrett &lt;mjg59@google.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This introduces CONFIG_HARDENED_USERCOPY_FALLBACK to control the
behavior of hardened usercopy whitelist violations. By default, whitelist
violations will continue to WARN() so that any bad or missing usercopy
whitelists can be discovered without being too disruptive.

If this config is disabled at build time or a system is booted with
"slab_common.usercopy_fallback=0", usercopy whitelists will BUG() instead
of WARN(). This is useful for admins that want to use usercopy whitelists
immediately.

Suggested-by: Matthew Garrett &lt;mjg59@google.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>usercopy: Prepare for usercopy whitelisting</title>
<updated>2018-01-15T20:07:47+00:00</updated>
<author>
<name>David Windsor</name>
<email>dave@nullcore.net</email>
</author>
<published>2017-06-11T02:50:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=8eb8284b412906181357c2b0110d879d5af95e52'/>
<id>8eb8284b412906181357c2b0110d879d5af95e52</id>
<content type='text'>
This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472

Signed-off-by: David Windsor &lt;dave@nullcore.net&gt;
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: convert BUGs to WARNs and fail closed]
[kees: add attack surface reduction analysis to commit log]
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472

Signed-off-by: David Windsor &lt;dave@nullcore.net&gt;
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: convert BUGs to WARNs and fail closed]
[kees: add attack surface reduction analysis to commit log]
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>usercopy: Include offset in hardened usercopy report</title>
<updated>2018-01-15T20:07:45+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2018-01-10T22:48:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=f4e6e289cb9cf67885b6b18b9d56d2c3e1c714a1'/>
<id>f4e6e289cb9cf67885b6b18b9d56d2c3e1c714a1</id>
<content type='text'>
This refactors the hardened usercopy code so that failure reporting can
happen within the checking functions instead of at the top level. This
simplifies the return value handling and allows more details and offsets
to be included in the report. Having the offset can be much more helpful
in understanding hardened usercopy bugs.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This refactors the hardened usercopy code so that failure reporting can
happen within the checking functions instead of at the top level. This
simplifies the return value handling and allows more details and offsets
to be included in the report. Having the offset can be much more helpful
in understanding hardened usercopy bugs.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove __GFP_COLD</title>
<updated>2017-11-16T02:21:06+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2017-11-16T01:38:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=453f85d43fa9ee243f0fc3ac4e1be45615301e3f'/>
<id>453f85d43fa9ee243f0fc3ac4e1be45615301e3f</id>
<content type='text'>
As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of.  Juding from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact.  Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.

This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache.  Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.

The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path.  A page fault microbenchmark was
tested but it showed no sigificant difference which is not surprising
given that the __GFP_COLD branches are a miniscule percentage of the
fault path.

Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of.  Juding from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact.  Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.

This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache.  Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.

The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path.  A page fault microbenchmark was
tested but it showed no sigificant difference which is not surprising
given that the __GFP_COLD branches are a miniscule percentage of the
fault path.

Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kmemcheck: remove whats left of NOTRACK flags</title>
<updated>2017-11-16T02:21:05+00:00</updated>
<author>
<name>Levin, Alexander (Sasha Levin)</name>
<email>alexander.levin@verizon.com</email>
</author>
<published>2017-11-16T01:35:58+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=d8be75663cec0069b85f80191abd2682ce4a512f'/>
<id>d8be75663cec0069b85f80191abd2682ce4a512f</id>
<content type='text'>
Now that kmemcheck is gone, we don't need the NOTRACK flags.

Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Hansen &lt;devtimhansen@gmail.com&gt;
Cc: Vegard Nossum &lt;vegardno@ifi.uio.no&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that kmemcheck is gone, we don't need the NOTRACK flags.

Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tim Hansen &lt;devtimhansen@gmail.com&gt;
Cc: Vegard Nossum &lt;vegardno@ifi.uio.no&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
