libitm/libitm.texi


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774

\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libitm.info
@settitle GNU libitm
@c %**end of header


@copying
Copyright @copyright{} 2011 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libitm: (libitm).                    GNU Transactional Memory Library
@end direntry

This manual documents the GNU Transactional Memory Library.

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title The GNU Transactional Memory Library
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top
@top Introduction
@cindex Introduction

This manual documents the usage and internals of libitm, the GNU Transactional
Memory Library. It provides transaction support for accesses to a process'
memory, enabling easy-to-use synchronization of accesses to shared memory by
several threads.


@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling libitm::            How to enable libitm for your applications.
* C/C++ Language Constructs for TM::
                               Notes on the language-level interface supported
                               by gcc.
* The libitm ABI::             Notes on the external ABI provided by libitm.
* Internals::                  Notes on libitm's internal synchronization.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Index::                      Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling libitm
@c ---------------------------------------------------------------------

@node Enabling libitm
@chapter Enabling libitm

To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm}
must be specified. This enables TM language-level constructs such as
transaction statements (e.g., @code{__transaction_atomic}, @pxref{C/C++
Language Constructs for TM} for details).

@c ---------------------------------------------------------------------
@c C/C++ Language Constructs for TM
@c ---------------------------------------------------------------------

@node C/C++ Language Constructs for TM
@chapter C/C++ Language Constructs for TM

Transactions are supported in C++ and C in the form of transaction statements,
transaction expressions, and function transactions. In the following example,
both @code{a} and @code{b} will be read and the difference will be written to
@code{c}, all atomically and isolated from other transactions:

@example
__transaction_atomic @{ c = a - b; @}
@end example

Therefore, another thread can use the following code to concurrently update
@code{b} without ever causing @code{c} to hold a negative value (and without
having to use other synchronization constructs such as locks or C++11
atomics):

@example
__transaction_atomic @{ if (a > b) b++; @}
@end example

GCC follows the @uref{https://sites.google.com/site/tmforcplusplus/, Draft
Specification of Transactional Language Constructs for C++ (v1.1)} in its
implementation of transactions.

The precise semantics of transactions are defined in terms of the C++11/C11
memory model (see the specification). Roughly, transactions provide
synchronization guarantees that are similar to what would be guaranteed when
using a single global lock as a guard for all transactions. Note that like
other synchronization constructs in C/C++, transactions rely on a
data-race-free program (e.g., a nontransactional write that is concurrent
with a transactional read to the same memory location is a data race).

@c ---------------------------------------------------------------------
@c The libitm ABI
@c ---------------------------------------------------------------------

@node The libitm ABI
@chapter The libitm ABI

The ABI provided by libitm is basically equal to the Linux variant of Intel's
current TM ABI specification document (Revision 1.1, May 6 2009) but with the
differences listed in this chapter. It would be good if these changes would
eventually be merged into a future version of this specification. To ease
look-up, the following subsections mirror the structure of this specification.

@section [No changes] Objectives
@section [No changes] Non-objectives

@section Library design principles
@subsection [No changes] Calling conventions
@subsection [No changes] TM library algorithms
@subsection [No changes] Optimized load and store routines
@subsection [No changes] Aligned load and store routines

@subsection Data logging functions

The memory locations accessed with transactional loads and stores and the
memory locations whose values are logged must not overlap. This required
separation only extends to the scope of the execution of one transaction
including all the executions of all nested transactions.

The compiler must be consistent (within the scope of a single transaction)
about which memory locations are shared and which are not shared with other
threads (i.e., data must be accessed either transactionally or
nontransactionally). Otherwise, non-write-through TM algorithms would not work.

For memory locations on the stack, this requirement extends to only the
lifetime of the stack frame that the memory location belongs to (or the
lifetime of the transaction, whichever is shorter).  Thus, memory that is
reused for several stack frames could be target of both data logging and
transactional accesses; however, this is harmless because these stack frames'
lifetimes will end before the transaction finishes.

@subsection [No changes] Scatter/gather calls
@subsection [No changes] Serial and irrevocable mode
@subsection [No changes] Transaction descriptor
@subsection Store allocation

There is no @code{getTransaction} function. 

@subsection [No changes] Naming conventions

@subsection Function pointer encryption

Currently, this is not implemented.


@section Types and macros list

@code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a
transaction}.
@code{_ITM_srcLocation} is not used. 


@section Function list

@subsection Initialization and finalization functions
These functions are not part of the ABI.

@subsection [No changes] Version checking
@subsection [No changes] Error reporting
@subsection [No changes] inTransaction call

@subsection State manipulation functions
There is no @code{getTransaction} function. Transaction identifiers for
nested transactions will be ordered but not necessarily sequential (i.e., for
a nested transaction's identifier @var{IN} and its enclosing transaction's
identifier @var{IE}, it is guaranteed that @math{IN >= IE}).

@subsection [No changes] Source locations

@subsection Starting a transaction

@subsubsection Transaction code properties

@anchor{txn-code-properties}
The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}.
Iff it is set, vector register save/restore is not necessary for any target
machine.

The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating
point register save/restore is not necessary for any target machine.

@code{undoLogCode} is not supported and a fatal runtime error will be raised
if this bit is set. It is not properly defined in the ABI why barriers
other than undo logging are not present; Are they not necessary (e.g., a
transaction operating purely on thread-local data) or have they been omitted by
the compiler because it thinks that some kind of global synchronization
(e.g., serial mode) might perform better? The specification suggests that the
latter might be the case, but the former seems to be more useful.

The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic
scope?

@code{hasNoRetry} is not supported. If this bit is not set, but
@code{hasNoAbort} is set, the library can assume that transaction
rollback will not be requested.

It would be useful if the absence of externally-triggered rollbacks would be
reported for the dynamic scope as well, not just for the lexical scope
(@code{hasNoAbort}). Without this, a library cannot exploit this together
with flat nesting.

@code{exceptionBlock} is not supported because exception blocks are not used.

@subsubsection [No changes] Windows exception state
@subsubsection [No changes] Other machine state

@subsubsection [No changes] Results from beginTransaction

@subsection Aborting a transaction

@code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction}
is supported but the abort reasons @code{exceptionBlockAbort},
@code{TMConflict}, and @code{userRetry} are not supported. There are no
exception blocks in general, so the related cases also do not have to be
considered. To encode @code{__transaction_cancel [[outer]]}, compilers must
set the new @code{outerAbort} bit (@code{0x10}) additionally to the
@code{userAbort} bit in the abort reason.

@subsection Committing a transaction

The exception handling (EH) scheme is different. The Intel ABI requires the
@code{_ITM_tryCommitTransaction} function that will return even when the
commit failed and will have to be matched with calls to either
@code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast,
gcc relies on transactional wrappers for the functions of the Exception
Handling ABI and on one additional commit function (shown below). This allows
the TM to keep track of EH internally and thus it does not have to embed the
cleanup of EH state into the existing EH code in the program.
@code{_ITM_tryCommitTransaction} is not supported.
@code{_ITM_commitTransactionToId} is also not supported because the
propagation of thrown exceptions will not bypass commits of nested
transactions.

@example
void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM;
void *_ITM_cxa_allocate_exception (size_t);
void _ITM_cxa_throw (void *obj, void *tinfo, void *dest);
void *_ITM_cxa_begin_catch (void *exc_ptr);
void _ITM_cxa_end_catch (void);
@end example

@code{_ITM_commitTransactionEH} must be called to commit a transaction if an
exception could be in flight at this position in the code. @code{exc_ptr} is
the current exception or zero if there is no current exception.
The @code{_ITM_cxa...} functions are transactional wrappers for the respective
@code{__cxa...} functions and must be called instead of these in transactional
code.

To support this EH scheme, libstdc++ needs to provide one additional function
(@code{_cxa_tm_cleanup}), which is used by the TM to clean up the exception
handling state while rolling back a transaction:

@example
void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc,
                       unsigned int caught_count);
@end example

@code{unthrown_obj} is non-null if the program called
@code{__cxa_allocate_exception} for this exception but did not yet called
@code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is
currently processing a cleanup along an exception path but has not caught this
exception yet. @code{caught_count} is the nesting depth of
@code{__cxa_begin_catch} within the transaction (which can be counted by the TM
using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch});
@code{__cxa_tm_cleanup} then performs rollback by essentially performing
@code{__cxa_end_catch} that many times.


@subsection Exception handling support

Currently, there is no support for functionality like
@code{__transaction_cancel throw} as described in the C++ TM specification.
Supporting this should be possible with the EH scheme explained previously
because via the transactional wrappers for the EH ABI, the TM is able to
observe and intercept EH.


@subsection [No changes] Transition to serial--irrevocable mode
@subsection [No changes] Data transfer functions
@subsection [No changes] Transactional memory copies

@subsection Transactional versions of memmove

If either the source or destination memory region is to be accessed
nontransactionally, then source and destination regions must not be
overlapping. The respective @code{_ITM_memmove} functions are still
available but a fatal runtime error will be raised if such regions do overlap.
To support this functionality, the ABI would have to specify how the
intersection of the regions has to be accessed (i.e., transactionally or
nontransactionally).

@subsection [No changes] Transactional versions of memset
@subsection [No changes] Logging functions

@subsection User-registered commit and undo actions

Commit actions will get executed in the same order in which the respective
calls to @code{_ITM_addUserCommitAction} happened. Only
@code{_ITM_noTransactionId} is allowed as value for the
@code{resumingTransactionId} argument. Commit actions get executed after
privatization safety has been ensured.

Undo actions will get executed in reverse order compared to the order in which
the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of
undo actions w.r.t. the roll-back of other actions (e.g., data transfers or
memory allocations) is undefined.

@code{_ITM_getThreadnum} is not supported currently because its only purpose
is to provide a thread ID that matches some assumed performance tuning output,
but this output is not part of the ABI nor further defined by it.

@code{_ITM_dropReferences} is not supported currently because its semantics and
the intention behind it is not entirely clear. The
specification suggests that this function is necessary because of certain
orderings of data transfer undos and the releasing of memory regions (i.e.,
privatization). However, this ordering is never defined, nor is the ordering of
dropping references w.r.t. other events.

@subsection [New] Transactional indirect calls

Indirect calls (i.e., calls through a function pointer) within transactions
should execute the transactional clone of the original function (i.e., a clone
of the original that has been fully instrumented to use the TM runtime), if
such a clone is available. The runtime provides two functions to
register/deregister clone tables:

@example
struct clone_entry
@{
  void *orig, *clone;
@};

void _ITM_registerTMCloneTable (clone_entry *table, size_t entries);
void _ITM_deregisterTMCloneTable (clone_entry *table);
@end example

Registered tables must be writable by the TM runtime, and must be live
throughout the life-time of the TM runtime.

@strong{TODO} The intention was always to drop the registration functions
entirely, and create a new ELF Phdr describing the linker-sorted table.  Much
like what currently happens for @code{PT_GNU_EH_FRAME}.
This work kept getting bogged down in how to represent the @var{N} different
code generation variants.  We clearly needed at least two---SW and HW
transactional clones---but there was always a suggestion of more variants for
different TM assumptions/invariants.

The compiler can then use two TM runtime functions to perform indirect calls in
transactions:
@example
void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM;
void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM;
@end example

If there is a registered clone for supplied function, both will return a
pointer to the clone. If not, the first runtime function will attempt to switch
to serial--irrevocable mode and return the original pointer, whereas the second
will raise a fatal runtime error.

@subsection [New] Transactional dynamic memory management

@example
void *_ITM_malloc (size_t)
       __attribute__((__malloc__)) ITM_PURE;
void *_ITM_calloc (size_t, size_t)
       __attribute__((__malloc__)) ITM_PURE;
void _ITM_free (void *) ITM_PURE;
@end example

These functions are essentially transactional wrappers for @code{malloc},
@code{calloc}, and @code{free}. Within transactions, the compiler should
replace calls to the original functions with calls to the wrapper functions.


@section [No changes] Future Enhancements to the ABI

@section Sample code 

The code examples might not be correct w.r.t. the current version of the ABI,
especially everything related to exception handling.


@section [New] Memory model

The ABI should define a memory model and the ordering that is guaranteed for
data transfers and commit/undo actions, or at least refer to another memory
model that needs to be preserved. Without that, the compiler cannot ensure the
memory model specified on the level of the programming language (e.g., by the
C++ TM specification).

For example, if a transactional load is ordered before another load/store, then
the TM runtime must also ensure this ordering when accessing shared state. If
not, this might break the kind of publication safety used in the C++ TM
specification. Likewise, the TM runtime must ensure privatization safety.


@c ---------------------------------------------------------------------
@c Internals
@c ---------------------------------------------------------------------

@node Internals
@chapter Internals

@section TM methods and method groups

libitm supports several ways of synchronizing transactions with each other.
These TM methods (or TM algorithms) are implemented in the form of
subclasses of @code{abi_dispatch}, which provide methods for
transactional loads and stores as well as callbacks for rollback and commit.
All methods that are compatible with each other (i.e., that let concurrently
running transactions still synchronize correctly even if different methods
are used) belong to the same TM method group. Pointers to TM methods can be
obtained using the factory methods prefixed with @code{dispatch_} in
@file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and
@code{dispatch_serialirr}, that are compatible with all methods because they
run transactions completely in serial mode.

@subsection TM method life cycle

The state of TM methods does not change after construction, but they do alter
the state of transactions that use this method. However, because
per-transaction data gets used by several methods, @code{gtm_thread} is
responsible for setting an initial state that is useful for all methods.
After that, methods are responsible for resetting/clearing this state on each
rollback or commit (of outermost transactions), so that the transaction
executed next is not affected by the previous transaction.

There is also global state associated with each method group, which is
initialized and shut down (@code{method_group::init()} and @code{fini()})
when switching between method groups (see @file{retry.cc}).

@subsection Selecting the default method

The default method that libitm uses for freshly started transactions (but
not necessarily for restarted transactions) can be set via an environment
variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name
of one of the factory methods returning abi_dispatch subclasses but without
the "dispatch_" prefix (e.g., "serialirr" instead of
@code{GTM::dispatch_serialirr()}).

Note that this environment variable is only a hint for libitm and might not
be supported in the future.


@section Nesting: flat vs. closed

We support two different kinds of nesting of transactions. In the case of
@emph{flat nesting}, the nesting structure is flattened and all nested
transactions are subsumed by the enclosing transaction. In contrast,
with @emph{closed nesting}, nested transactions that have not yet committed
can be rolled back separately from the enclosing transactions; when they
commit, they are subsumed by the enclosing transaction, and their effects
will be finally committed when the outermost transaction commits.
@emph{Open nesting} (where nested transactions can commit independently of the
enclosing transactions) are not supported.

Flat nesting is the default nesting mode, but closed nesting is supported and
used when transactions contain user-controlled aborts
(@code{__transaction_cancel} statements). We assume that user-controlled
aborts are rare in typical code and used mostly in exceptional situations.
Thus, it makes more sense to use flat nesting by default to avoid the
performance overhead of the additional checkpoints required for closed
nesting. User-controlled aborts will correctly abort the innermost enclosing
transaction, whereas the whole (i.e., outermost) transaction will be restarted
otherwise (e.g., when a transaction encounters data conflicts during
optimistic execution).


@section Locking conventions

This section documents the locking scheme and rules for all uses of locking
in libitm. We have to support serial(-irrevocable) mode, which is implemented
using a global lock as explained next (called the @emph{serial lock}). To
simplify the overall design, we use the same lock as catch-all locking
mechanism for other infrequent tasks such as (de)registering clone tables or
threads. Besides the serial lock, there are @emph{per-method-group locks} that
are managed by specific method groups (i.e., groups of similar TM concurrency
control algorithms), and lock-like constructs for quiescence-based operations
such as ensuring privatization safety.

Thus, the actions that participate in the libitm-internal locking are either
@emph{active transactions} that do not run in serial mode, @emph{serial
transactions} (which (are about to) run in serial mode), and management tasks
that do not execute within a transaction but have acquired the serial mode
like a serial transaction would do (e.g., to be able to register threads with
libitm). Transactions become active as soon as they have successfully used the
serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock
implementation}). Likewise, transactions become serial transactions as soon as
they have acquired the exclusive rights provided by the serial lock (i.e.,
serial mode, which also means that there are no other concurrent active or
serial transactions). Note that active transactions can become serial
transactions when they enter serial mode during the runtime of the
transaction.

@subsection State-to-lock mapping

Application data is protected by the serial lock if there is a serial
transaction and no concurrently running active transaction (i.e., non-serial).
Otherwise, application data is protected by the currently selected method
group, which might use per-method-group locks or other mechanisms. Also note
that application data that is about to be privatized might not be allowed to be
accessed by nontransactional code until privatization safety has been ensured;
the details of this are handled by the current method group.

libitm-internal state is either protected by the serial lock or accessed
through custom concurrent code. The latter applies to the public/shared part
of a transaction object and most typical method-group-specific state.

The former category (protected by the serial lock) includes:
@itemize @bullet
@item The list of active threads that have used transactions.
@item The tables that map functions to their transactional clones.
@item The current selection of which method group to use.
@item Some method-group-specific data, or invariants of this data. For example,
resetting a method group to its initial state is handled by switching to the
same method group, so the serial lock protects such resetting as well.
@end itemize
In general, such state is immutable whenever there exists an active
(non-serial) transaction. If there is no active transaction, a serial
transaction (or a thread that is not currently executing a transaction but has
acquired the serial lock) is allowed to modify this state (but must of course
be careful to not surprise the current method group's implementation with such
modifications).

@subsection Lock acquisition order

To prevent deadlocks, locks acquisition must happen in a globally agreed-upon
order. Note that this applies to other forms of blocking too, but does not
necessarily apply to lock acquisitions that do not block (e.g., trylock()
calls that do not get retried forever). Note that serial transactions are
never return back to active transactions until the transaction has committed.
Likewise, active transactions stay active until they have committed.
Per-method-group locks are typically also not released before commit.

Lock acquisition / blocking rules:
@itemize @bullet

@item Transactions must become active or serial before they are allowed to
use method-group-specific locks or blocking (i.e., the serial lock must be
acquired before those other locks, either in serial or nonserial mode).

@item Any number of threads that do not currently run active transactions can
block while trying to get the serial lock in exclusive mode. Note that active
transactions must not block when trying to upgrade to serial mode unless there
is no other transaction that is trying that (the latter is ensured by the
serial lock implementation.

@item Method groups must prevent deadlocks on their locks. In particular, they
must also be prepared for another active transaction that has acquired
method-group-specific locks but is blocked during an attempt to upgrade to
being a serial transaction. See below for details.

@item Serial transactions can acquire method-group-specific locks because there
will be no other active nor serial transaction.

@end itemize

There is no single rule for per-method-group blocking because this depends on
when a TM method might acquire locks. If no active transaction can upgrade to
being a serial transaction after it has acquired per-method-group locks (e.g.,
when those locks are only acquired during an attempt to commit), then the TM
method does not need to consider a potential deadlock due to serial mode.

If there can be upgrades to serial mode after the acquisition of
per-method-group locks, then TM methods need to avoid those deadlocks:
@itemize @bullet
@item When upgrading to a serial transaction, after acquiring exclusive rights
to the serial lock but before waiting for concurrent active transactions to
finish (@pxref{serial-lock-impl,,Serial lock implementation} for details),
we have to wake up all active transactions waiting on the upgrader's
per-method-group locks.
@item Active transactions blocking on per-method-group locks need to check the
serial lock and abort if there is a pending serial transaction.
@item Lost wake-ups have to be prevented (e.g., by changing a bit in each
per-method-group lock before doing the wake-up, and only blocking on this lock
using a futex if this bit is not group).
@end itemize

@strong{TODO}: Can reuse serial lock for gl-*? And if we can, does it make
sense to introduce further complexity in the serial lock? For gl-*, we can
really only avoid an abort if we do -wb and -vbv.


@subsection Serial lock implementation
@anchor{serial-lock-impl}

The serial lock implementation is optimized towards assuming that serial
transactions are infrequent and not the common case. However, the performance
of entering serial mode can matter because when only few transactions are run
concurrently or if there are few threads, then it can be efficient to run
transactions serially.

The serial lock is similar to a multi-reader-single-writer lock in that there
can be several active transactions but only one serial transaction. However,
we do want to avoid contention (in the lock implementation) between active
transactions, so we split up the reader side of the lock into per-transaction
flags that are true iff the transaction is active. The exclusive writer side
remains a shared single flag, which is acquired using a CAS, for example.
On the fast-path, the serial lock then works similar to Dekker's algorithm but
with several reader flags that a serial transaction would have to check.
A serial transaction thus requires a list of all threads with potentially
active transactions; we can use the serial lock itself to protect this list
(i.e., only threads that have acquired the serial lock can modify this list).

We want starvation-freedom for the serial lock to allow for using it to ensure
progress for potentially starved transactions (@pxref{progress-guarantees,,
Progress Guarantees} for details). However, this is currently not enforced by
the implementation of the serial lock.

Here is pseudo-code for the read/write fast paths of acquiring the serial
lock (read-to-write upgrade is similar to write_lock:
@example
// read_lock:
tx->shared_state |= active;
__sync_synchronize(); // or STLD membar, or C++0x seq-cst fence
while (!serial_lock.exclusive)
  if (spinning_for_too_long) goto slowpath;

// write_lock:
if (CAS(&serial_lock.exclusive, 0, this) != 0)
  goto slowpath; // writer-writer contention
// need a membar here, but CAS already has full membar semantics
bool need_blocking = false;
for (t: all txns)
  @{
    for (;t->shared_state & active;)
      if (spinning_for_too_long) @{ need_blocking = true; break; @}
  @}
if (need_blocking) goto slowpath;
@end example

Releasing a lock in this spin-lock version then just consists of resetting
@code{tx->shared_state} to inactive or clearing @code{serial_lock.exclusive}.

However, we can't rely on a pure spinlock because we need to get the OS
involved at some time (e.g., when there are more threads than CPUs to run on).
Therefore, the real implementation falls back to a blocking slow path, either
based on pthread mutexes or Linux futexes.


@subsection Reentrancy

libitm has to consider the following cases of reentrancy:
@itemize @bullet

@item Transaction calls unsafe code that starts a new transaction: The outer
transaction will become a serial transaction before executing unsafe code.
Therefore, nesting within serial transactions must work, even if the nested
transaction is called from within uninstrumented code.

@item Transaction calls either a transactional wrapper or safe code, which in
turn starts a new transaction: It is not yet defined in the specification
whether this is allowed. Thus, it is undefined whether libitm supports this.

@item Code that starts new transactions might be called from within any part
of libitm: This kind of reentrancy would likely be rather complex and can
probably be avoided. Therefore, it is not supported.

@end itemize

@subsection Privatization safety

Privatization safety is ensured by libitm using a quiescence-based approach.
Basically, a privatizing transaction waits until all concurrent active
transactions will either have finished (are not active anymore) or operate on
a sufficiently recent snapshot to not access the privatized data anymore. This
happens after the privatizing transaction has stopped being an active
transaction, so waiting for quiescence does not contribute to deadlocks.

In method groups that need to ensure publication safety explicitly, active
transactions maintain a flag or timestamp in the public/shared part of the
transaction descriptor. Before blocking, privatizers need to let the other
transactions know that they should wake up the privatizer.

@strong{TODO} Ho to implement the waiters? Should those flags be
per-transaction or at a central place? We want to avoid one wake/wait call
per active transactions, so we might want to use either a tree or combining
to reduce the syscall overhead, or rather spin for a long amount of time
instead of doing blocking. Also, it would be good if only the last transaction
that the privatizer waits for would do the wake-up.

@subsection Progress guarantees
@anchor{progress-guarantees}

Transactions that do not make progress when using the current TM method will
eventually try to execute in serial mode. Thus, the serial lock's progress
guarantees determine the progress guarantees of the whole TM. Obviously, we at
least need deadlock-freedom for the serial lock, but it would also be good to
provide starvation-freedom (informally, all threads will finish executing a
transaction eventually iff they get enough cycles).

However, the scheduling of transactions (e.g., thread scheduling by the OS)
also affects the handling of progress guarantees by the TM. First, the TM
can only guarantee deadlock-freedom if threads do not get stopped. Likewise,
low-priority threads can starve if they do not get scheduled when other
high-priority threads get those cycles instead.

If all threads get scheduled eventually, correct lock implementations will
provide deadlock-freedom, but might not provide starvation-freedom. We can
either enforce the latter in the TM's lock implementation, or assume that
the scheduling is sufficiently random to yield a probabilistic guarantee that
no thread will starve (because eventually, a transaction will encounter a
scheduling that will allow it to run). This can indeed work well in practice
but is not necessarily guaranteed to work (e.g., simple spin locks can be
pretty efficient).

Because enforcing stronger progress guarantees in the TM has a higher runtime
overhead, we focus on deadlock-freedom right now and assume that the threads
will get scheduled eventually by the OS (but don't consider threads with
different priorities). We should support starvation-freedom for serial
transactions in the future. Everything beyond that is highly related to proper
contention management across all of the TM (including with TM method to
choose), and is future work.

@strong{TODO} Handling thread priorities: We want to avoid priority inversion
but it's unclear how often that actually matters in practice. Workloads that
have threads with different priorities will likely also require lower latency
or higher throughput for high-priority threads. Therefore, it probably makes
not that much sense (except for eventual progress guarantees) to use
priority inheritance until the TM has priority-aware contention management.


@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Index
@unnumbered Index

@printindex cp

@bye