<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/linux.git/kernel/debug/debug_core.c, branch proc-cmdline</title>
<subtitle>git.kernel.org: pub/scm/linux/kernel/git/torvalds/linux.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/'/>
<entry>
<title>sched/headers: Prepare for new header dependencies before moving code to &lt;linux/sched/nmi.h&gt;</title>
<updated>2017-03-02T07:42:30+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-08T17:51:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=38b8d208a4544c9a26b10baec89b8a21042e5305'/>
<id>38b8d208a4544c9a26b10baec89b8a21042e5305</id>
<content type='text'>
We are going to move softlockup APIs out of &lt;linux/sched.h&gt;, which
will have to be picked up from other headers and a couple of .c files.

&lt;linux/nmi.h&gt; already includes &lt;linux/sched.h&gt;.

Include the &lt;linux/nmi.h&gt; header in the files that are going to need it.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We are going to move softlockup APIs out of &lt;linux/sched.h&gt;, which
will have to be picked up from other headers and a couple of .c files.

&lt;linux/nmi.h&gt; already includes &lt;linux/sched.h&gt;.

Include the &lt;linux/nmi.h&gt; header in the files that are going to need it.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmacache, sched/headers: Introduce 'struct vmacache' and move it from &lt;linux/sched.h&gt; to &lt;linux/mm_types&gt;</title>
<updated>2017-03-02T07:42:25+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-03T10:03:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=314ff7851fc8ea66cbf48eaa93d8ebfb5ca084a9'/>
<id>314ff7851fc8ea66cbf48eaa93d8ebfb5ca084a9</id>
<content type='text'>
The &lt;linux/sched.h&gt; header includes various vmacache related defines,
which are arguably misplaced.

Move them to mm_types.h and minimize the sched.h impact by putting
all task vmacache state into a new 'struct vmacache' structure.

No change in functionality.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The &lt;linux/sched.h&gt; header includes various vmacache related defines,
which are arguably misplaced.

Move them to mm_types.h and minimize the sched.h impact by putting
all task vmacache state into a new 'struct vmacache' structure.

No change in functionality.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/debug/debug_core.c: more properly delay for secondary CPUs</title>
<updated>2016-12-15T00:04:08+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2016-12-14T23:05:49+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=2d13bb6494c807bcf3f78af0e96c0b8615a94385'/>
<id>2d13bb6494c807bcf3f78af0e96c0b8615a94385</id>
<content type='text'>
We've got a delay loop waiting for secondary CPUs.  That loop uses
loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
many tight loops make up a jiffy on all architectures.  It is quite
common to see things like this in the boot log:

  Calibrating delay loop (skipped), value calculated using timer
  frequency.. 48.00 BogoMIPS (lpj=24000)

In my case I was seeing lots of cases where other CPUs timed out
entering the debugger only to print their stack crawls shortly after the
kdb&gt; prompt was written.

Elsewhere in kgdb we already use udelay(), so that should be safe enough
to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
should give us a full second of delay (just like the old code wanted)
but allow us to notice that we're done every 1 ms.

[akpm@linux-foundation.org: simplifications, per Daniel]
Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Reviewed-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Brian Norris &lt;briannorris@chromium.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.0+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We've got a delay loop waiting for secondary CPUs.  That loop uses
loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
many tight loops make up a jiffy on all architectures.  It is quite
common to see things like this in the boot log:

  Calibrating delay loop (skipped), value calculated using timer
  frequency.. 48.00 BogoMIPS (lpj=24000)

In my case I was seeing lots of cases where other CPUs timed out
entering the debugger only to print their stack crawls shortly after the
kdb&gt; prompt was written.

Elsewhere in kgdb we already use udelay(), so that should be safe enough
to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
should give us a full second of delay (just like the old code wanted)
but allow us to notice that we're done every 1 ms.

[akpm@linux-foundation.org: simplifications, per Daniel]
Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Reviewed-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Brian Norris &lt;briannorris@chromium.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.0+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>debug: prevent entering debug mode on panic/exception.</title>
<updated>2015-02-19T18:39:03+00:00</updated>
<author>
<name>Colin Cross</name>
<email>ccross@android.com</email>
</author>
<published>2015-01-28T11:32:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=5516fd7b92a7c16b9111a88aaa1b13dd937d8ac5'/>
<id>5516fd7b92a7c16b9111a88aaa1b13dd937d8ac5</id>
<content type='text'>
On non-developer devices, kgdb prevents the device from rebooting
after a panic.

Incase of panics and exceptions, to allow the device to reboot, prevent
entering debug mode to avoid getting stuck waiting for the user to
interact with debugger.

To avoid entering the debugger on panic/exception without any extra
configuration, panic_timeout is being used which can be set via
/proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the
default value.

Setting panic_timeout indicates that the user requested machine to
perform unattended reboot after panic. We dont want to get stuck waiting
for the user input incase of panic.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: Android Kernel Team &lt;kernel-team@android.com&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Colin Cross &lt;ccross@android.com&gt;
[Kiran: Added context to commit message.
panic_timeout is used instead of break_on_panic and
break_on_exception to honor CONFIG_PANIC_TIMEOUT
Modified the commit as per community feedback]
Signed-off-by: Kiran Raparthy &lt;kiran.kumar@linaro.org&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On non-developer devices, kgdb prevents the device from rebooting
after a panic.

Incase of panics and exceptions, to allow the device to reboot, prevent
entering debug mode to avoid getting stuck waiting for the user to
interact with debugger.

To avoid entering the debugger on panic/exception without any extra
configuration, panic_timeout is being used which can be set via
/proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the
default value.

Setting panic_timeout indicates that the user requested machine to
perform unattended reboot after panic. We dont want to get stuck waiting
for the user input incase of panic.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: Android Kernel Team &lt;kernel-team@android.com&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Colin Cross &lt;ccross@android.com&gt;
[Kiran: Added context to commit message.
panic_timeout is used instead of break_on_panic and
break_on_exception to honor CONFIG_PANIC_TIMEOUT
Modified the commit as per community feedback]
Signed-off-by: Kiran Raparthy &lt;kiran.kumar@linaro.org&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kdb: Fix off by one error in kdb_cpu()</title>
<updated>2015-02-19T18:39:02+00:00</updated>
<author>
<name>Jason Wessel</name>
<email>jason.wessel@windriver.com</email>
</author>
<published>2015-01-08T21:46:55+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=df0036d117e6c9df36324e517728e33543065f9a'/>
<id>df0036d117e6c9df36324e517728e33543065f9a</id>
<content type='text'>
There was a follow on replacement patch against the prior
"kgdb: Timeout if secondary CPUs ignore the roundup".

See: https://lkml.org/lkml/2015/1/7/442

This patch is the delta vs the patch that was committed upstream:
  * Fix an off-by-one error in kdb_cpu().
  * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
    really want a static limit.
  * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
    (kgdb-next contains a patch which introduced pr_fmt() to this file
    to the tag will now be applied automatically).

Cc: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There was a follow on replacement patch against the prior
"kgdb: Timeout if secondary CPUs ignore the roundup".

See: https://lkml.org/lkml/2015/1/7/442

This patch is the delta vs the patch that was committed upstream:
  * Fix an off-by-one error in kdb_cpu().
  * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
    really want a static limit.
  * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
    (kgdb-next contains a patch which introduced pr_fmt() to this file
    to the tag will now be applied automatically).

Cc: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/debug/debug_core.c: Logging clean-up</title>
<updated>2014-11-11T15:31:53+00:00</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2014-06-12T19:30:11+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=0f16996cf2ed7c368dd95b4c517ce572b96a10f5'/>
<id>0f16996cf2ed7c368dd95b4c517ce572b96a10f5</id>
<content type='text'>
-Convert printk( to pr_foo()
-Add pr_fmt
-Coalesce formats

Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
-Convert printk( to pr_foo()
-Add pr_fmt
-Coalesce formats

Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kgdb: timeout if secondary CPUs ignore the roundup</title>
<updated>2014-11-11T15:31:53+00:00</updated>
<author>
<name>Daniel Thompson</name>
<email>daniel.thompson@linaro.org</email>
</author>
<published>2014-11-11T15:31:53+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=a1465d2f396e416a0049332b20fca5977384b9f5'/>
<id>a1465d2f396e416a0049332b20fca5977384b9f5</id>
<content type='text'>
Currently if an active CPU fails to respond to a roundup request the CPU
that requested the roundup will become stuck.  This needlessly reduces the
robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors.  It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently if an active CPU fails to respond to a roundup request the CPU
that requested the roundup will become stuck.  This needlessly reduces the
robustness of the debugger.

This patch introduces a timeout allowing the system state to be examined
even when the system contains unresponsive processors.  It also modifies
kdb's cpu command to make it censor attempts to switch to unresponsive
processors and to report their state as (D)ead.

Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arch: Mass conversion of smp_mb__*()</title>
<updated>2014-04-18T12:20:48+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-03-17T17:06:10+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=4e857c58efeb99393cba5a5d0d8ec7117183137c'/>
<id>4e857c58efeb99393cba5a5d0d8ec7117183137c</id>
<content type='text'>
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: per-thread vma caching</title>
<updated>2014-04-07T23:35:53+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-04-07T22:37:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=615d6e8756c87149f2d4c1b93d471bca002bd849'/>
<id>615d6e8756c87149f2d4c1b93d471bca002bd849</id>
<content type='text'>
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>KGDB: make kgdb_breakpoint() as noinline</title>
<updated>2014-02-26T11:16:25+00:00</updated>
<author>
<name>Vijaya Kumar K</name>
<email>Vijaya.Kumar@caviumnetworks.com</email>
</author>
<published>2014-01-28T11:20:20+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/linux.git/commit/?id=d498d4b47fb3050f2f7840cc49251f87f04d1ca9'/>
<id>d498d4b47fb3050f2f7840cc49251f87f04d1ca9</id>
<content type='text'>
The function kgdb_breakpoint() sets up break point at
compile time by calling arch_kgdb_breakpoint();
Though this call is surrounded by wmb() barrier,
the compile can still re-order the break point,
because this scheduling barrier is not a code motion
barrier in gcc.

Making kgdb_breakpoint() as noinline solves this problem
of code reording around break point instruction and also
avoids problem of being called as inline function from
other places

More details about discussion on this can be found here
http://comments.gmane.org/gmane.linux.ports.arm.kernel/269732

Signed-off-by: Vijaya Kumar K &lt;Vijaya.Kumar@caviumnetworks.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The function kgdb_breakpoint() sets up break point at
compile time by calling arch_kgdb_breakpoint();
Though this call is surrounded by wmb() barrier,
the compile can still re-order the break point,
because this scheduling barrier is not a code motion
barrier in gcc.

Making kgdb_breakpoint() as noinline solves this problem
of code reording around break point instruction and also
avoids problem of being called as inline function from
other places

More details about discussion on this can be found here
http://comments.gmane.org/gmane.linux.ports.arm.kernel/269732

Signed-off-by: Vijaya Kumar K &lt;Vijaya.Kumar@caviumnetworks.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
