U.S. patent application number 13/778318 was filed with the patent office on 2013-02-27 and published on 2013-07-04 for increasing functionality of a reader-writer lock.
The applicant listed for this patent is Ali-Reza Adl-Tabatabai, Bratin Saha. Invention is credited to Ali-Reza Adl-Tabatabai, Bratin Saha.
Publication Number | 20130173869 |
Application Number | 13/778318 |
Family ID | 38576899 |
Filed Date | 2013-02-27 |
Publication Date | 2013-07-04 |
United States Patent Application | 20130173869 |
Kind Code | A1 |
Saha; Bratin; et al. | July 4, 2013 |
Increasing Functionality Of A Reader-Writer Lock
Abstract
In one embodiment, the present invention includes a method for
accessing a shared memory associated with a reader-writer lock
according to a first concurrency mode, dynamically changing from
the first concurrency mode to a second concurrency mode, and
accessing the shared memory according to the second concurrency
mode. In this way, concurrency modes can be adaptively changed
based on system conditions. Other embodiments are described and
claimed.
Inventors: | Saha; Bratin (Santa Clara, CA); Adl-Tabatabai; Ali-Reza (San Jose, CA) |
Applicant: |
Name | City | State | Country | Type |
Saha; Bratin | Santa Clara | CA | US | |
Adl-Tabatabai; Ali-Reza | San Jose | CA | US | |
Family ID: | 38576899 |
Appl. No.: | 13/778318 |
Filed: | February 27, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13325688 | Dec 14, 2011 | 8407386 |
13778318 | | |
11392381 | Mar 29, 2006 | 8099538 |
13325688 | | |
Current U.S. Class: | 711/152; 718/102 |
Current CPC Class: | G06F 9/4881 20130101; G06F 9/526 20130101; G06F 12/145 20130101 |
Class at Publication: | 711/152; 718/102 |
International Class: | G06F 12/14 20060101 G06F012/14; G06F 9/48 20060101 G06F009/48 |
Claims
1. A processor comprising: a lock manager including a counter to
maintain a count of aborted transactions and an inflation logic to
control a mode of operation for a transactional memory system, the
lock manager to enable a transaction to access data associated with
a lock variable without acquiring the lock variable, use the data,
commit if a value of the lock variable has not changed from the
time of access to the time of commitment, and otherwise abort.
2. The processor of claim 1, wherein the lock manager is to
increment the count of aborted transactions if the transaction is
aborted, wherein the count is associated with a group of lock
variables including the lock variable.
3. The processor of claim 2, wherein the group corresponds to lock
variables accessed by a first thread of a plurality of threads
executing on the transactional memory system.
4. The processor of claim 2, wherein the inflation logic is to
dynamically change a mode of operation of the lock variable from an
optimistic concurrency mode to a pessimistic concurrency mode based
in part on the count.
5. The processor of claim 4, wherein the inflation logic is to set
a mode indicator in a control portion of the lock variable to
dynamically change the mode of operation.
6. The processor of claim 1, wherein the lock manager is to upgrade
a reader associated with the lock variable to a writer status.
7. The processor of claim 6, wherein the lock manager is to set an
upgrade indicator in a control portion of the lock variable to
upgrade the reader.
8. A method comprising: maintaining a count of aborted transactions
in a counter of a lock manager of a processor; controlling a mode
of operation for a transactional memory system; enabling a
transaction to access data associated with a lock variable without
acquiring the lock variable; using the data to perform an operation
of the transaction; and committing the transaction if a value of
the lock variable has not changed from the time of access to the
time of commitment, and otherwise aborting the transaction.
9. The method of claim 8, further comprising incrementing the count
of aborted transactions if the transaction is aborted, wherein the
count is associated with a group of lock variables including the
lock variable.
10. The method of claim 9, wherein the group corresponds to lock
variables accessed by a first thread of a plurality of threads
executing on the transactional memory system.
11. The method of claim 9, further comprising dynamically changing
a mode of operation of the lock variable from an optimistic
concurrency mode to a pessimistic concurrency mode based in part on
the count.
12. The method of claim 11, further comprising setting a mode
indicator in a control portion of the lock variable to dynamically
change the mode of operation.
13. The method of claim 8, further comprising upgrading a reader
associated with the lock variable to a writer status.
14. The method of claim 13, further comprising setting an upgrade
indicator in a control portion of the lock variable to upgrade the
reader.
15. At least one computer readable medium including instructions
that when executed enable a system to: maintain a count of aborted
transactions in a counter of a lock manager of a processor; control
a mode of operation for a transactional memory system; enable a
transaction to access data associated with a lock variable without
acquiring the lock variable; use the data to perform an operation
of the transaction; and commit the transaction if a value of the
lock variable has not changed from the time of access to the time
of commitment, and otherwise abort the transaction.
16. The at least one computer readable medium of claim 15, further
comprising instructions that when executed enable the system to
increment the count of aborted transactions if the transaction is
aborted, wherein the count is associated with a group of lock
variables including the lock variable and the group corresponds to
lock variables accessed by a first thread of a plurality of threads
executing on the transactional memory system.
17. The at least one computer readable medium of claim 16, further
comprising instructions that when executed enable the system to
dynamically change a mode of operation of the lock variable from an
optimistic concurrency mode to a pessimistic concurrency mode based
in part on the count, and set a mode indicator in a control portion
of the lock variable to dynamically change the mode of
operation.
18. The at least one computer readable medium of claim 15, further
comprising instructions that when executed enable the system to
upgrade a reader associated with the lock variable to a writer
status.
19. The at least one computer readable medium of claim 18, further
comprising instructions that when executed enable the system to set
an upgrade indicator in a control portion of the lock variable to
upgrade the reader.
Description
[0001] This application is a divisional of U.S. patent application
Ser. No. 13/325,688, filed Dec. 14, 2011, which is a continuation
of U.S. patent application Ser. No. 11/392,381, filed Mar. 29,
2006, now U.S. Pat. No. 8,099,538, issued Jan. 17, 2012, the
content of which is hereby incorporated by reference.
BACKGROUND
[0002] Embodiments of the present invention relate to computer
systems, and more particularly to such systems that use lock
variables to control access to data.
[0003] Computer systems including multiprocessor (MP) and single
processor systems may include a plurality of threads, each of which
executes program instructions independently from other threads. Use
of multiple processors and/or threads allows various tasks or
functions (and even multiple applications) to be handled more
efficiently and with greater speed. When using multiple threads or
processors, two or more processors or threads can share the same
data stored within the system. However, care must be taken to
maintain memory ordering when sharing data.
[0004] For data consistency purposes, if multiple threads or
processors desire to read, modify, or write data at a shared memory
location, the multiple agents may not be allowed to perform
operations on the data simultaneously. Further complicating the use
of multiple processors is that data is often stored in a cache
associated with a processor. Because such caches are typically
localized to a specific processor, multiple caches in a
multiprocessor computer system can contain different copies of a
given data item. Any agent accessing this data should receive a
valid or updated (i.e., latest) data value, and data being written
from the cache back into memory must be the current data so that
cache coherency is maintained.
[0005] Multithreaded (MT) software uses different mechanisms to
interact and coordinate between different threads. Two common forms
of synchronization are barriers and semaphores (locks). A barrier
mechanism helps a program to synchronize different threads at
predefined points in the program, where each thread waits for a
memory variable to reach a predetermined barrier level.
Synchronization is achieved once all threads have completed the
updates. When the barrier is reached, all threads can then
proceed.
[0006] A semaphore lock mechanism is used to guarantee mutual
exclusion across multiple threads while accessing a shared memory
variable or structure (i.e., a shared element). In order to provide
a unique and consistent view of the shared element, it is guarded
by a lock variable. Different types of locks exist. For example, a
spin-lock mechanism is typically implemented such that a thread
needing access to the shared element must acquire the guarding lock
(i.e., locking) via an atomic semaphore operation. When a lock is
acquired, the remaining threads can only acquire the lock after it
is released (i.e., unlocking) by the original requester. Locking is
performed by designating a particular value to represent a locked
state, and a different value to represent an unlocked state.
[0007] Reader-writer locks allow either multiple concurrent readers or a
single writer to hold the lock at any given time. Reader-writer locks
are used in sophisticated concurrent systems, for example, in
implementing a software transactional memory (STM). To design
software applications to scale for multi-core processors,
reader-writer locks may be used to allow concurrency and allow more
parallelism to be exploited.
[0008] Many modern languages include transactions as the basic
synchronization primitive. A hardware transactional memory (HTM) is
insufficient for these languages since they use nested
transactions, partial aborts, non-transactional instructions and a
number of other features. An STM implementation can provide these
features. However, the usual implementation of an STM is optimistic,
as each thread executes operations in an atomic block as if no
other threads exist. When the atomic block finishes, data accessed
by the block is checked for consistency with current data at a
given memory location. If consistency is verified, the transaction
is committed; otherwise the atomic block is aborted and must be
restarted. Typical locks, however, are not optimized for use in an
STM.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a block diagram of a lockword in accordance with
one embodiment of the present invention.
[0010] FIG. 2 is a flow diagram of a method in accordance with one
embodiment of the present invention.
[0011] FIG. 3 is a flow diagram of a method including adaptive
switching of concurrency modes in accordance with an embodiment of
the present invention.
[0012] FIGS. 4A-4K are various bit patterns to represent different
modes of a lockword in accordance with an embodiment of the present
invention.
[0013] FIG. 5 is a block diagram of an implementation of a lock and
associated shared memory in accordance with an embodiment of the
present invention.
[0014] FIG. 6 is a block diagram of a system in accordance with an
embodiment of the present invention.
DETAILED DESCRIPTION
[0015] In various embodiments, a lock for a shared memory structure
may be in the form of a data structure having two portions, namely
a first portion and a second portion. The first portion may
correspond to an identifier portion that is used to identify a
write owner of the lock or an indication of the number of reader
owners of the lock. The second portion may correspond to a control
portion that may be accessed and written to by various entities
(e.g., threads) to acquire access to the lock or to implement or
change features or modes of operation of the lock.
[0016] In many implementations, the lock may be a reader-writer
lock and may take the form of a data structure that can be sized
differently in different embodiments. In one implementation, the
lock may be a 32-bit structure that includes the first portion
(i.e., an identifier portion) and the second portion (i.e., a
control portion). In this implementation, the control portion may
correspond to the low order 4 bits, while the identifier portion
may correspond to the upper 28 bits, although the scope of the
present invention is not so limited. The term "lockword" is used
herein to refer to a lock variable in accordance with an embodiment
of the present invention. Furthermore, while the term "lockword" is
used throughout, it is to be understood that this term is not
limited to any particular size of lock variable and instead a
lockword may be any size desired for a particular implementation.
Additional structures may be associated with a lockword, including
a shared data structure that is to be accessed when a lock is
acquired. A mutual exclusion structure (MUTEX) may also be
associated with the lockword. Furthermore, wait variables and the
like may further be associated with the lockword as will be
described below.
[0017] In various implementations, the control portion of the lock
may be used to enable different lock features and modes of
operation via a single control structure. Accordingly, entities may
access the control portion, read its contents and/or write thereto
in order to acquire the lock and/or modify properties or features
of the lock. While only a few representative control mechanisms are
described herein, it is to be understood that the scope of the
present invention is not limited in this regard, and a lock may
include other features and modes of operation controlled by
elements in a control portion.
[0018] Referring now to FIG. 1, shown is a block diagram of a
lockword in accordance with one embodiment of the present
invention. As shown in FIG. 1, lockword 10 includes a first portion
30 and a second portion 20. In the embodiment of FIG. 1, first
portion 30 may correspond to an identifier portion, while second
portion 20 may correspond to a control portion. In various
implementations, the size of lockword 10 and its corresponding
portions may be any desired size; however, in one embodiment
lockword 10 may be a 32-bit word.
[0019] As further shown in FIG. 1, second portion 20 includes a
plurality of constituent elements. Specifically, a notify (N)
element 22, an upgrade (U) element 24, an inflation (I) element 26
and a reader (R) element 28 may be present. More specifically, N
element 22 may correspond to a bit 0 (b0); U element 24 may
correspond to a bit 1 (b1); I element 26 may correspond to a bit 2
(b2); and R element 28 may correspond to a bit 3 (b3) of lockword
10. Although in one embodiment each of these elements may be a
single bit and may correspond to a control indicator for different
states of lockword 10 and its modes of operation, the scope of the
present invention is not so limited.
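The layout described for FIG. 1 can be illustrated with a minimal sketch, assuming the 32-bit embodiment (28-bit identifier portion above a 4-bit control portion). The names `pack`, `unpack`, `flags` and the constants are hypothetical and chosen only for illustration.

```python
# Hypothetical names; bit positions follow FIG. 1 (N=b0, U=b1, I=b2, R=b3).
N_BIT = 1 << 0  # notify
U_BIT = 1 << 1  # upgrade
I_BIT = 1 << 2  # inflation
R_BIT = 1 << 3  # reader
CONTROL_MASK = 0xF  # low-order 4 bits form the control portion
ID_SHIFT = 4        # identifier portion occupies the upper 28 bits
ID_LIMIT = 1 << 28


def pack(identifier, control):
    """Build a 32-bit lockword from a 28-bit identifier and 4 control bits."""
    if not 0 <= identifier < ID_LIMIT:
        raise ValueError("identifier must fit in 28 bits")
    if not 0 <= control <= CONTROL_MASK:
        raise ValueError("control must fit in 4 bits")
    return (identifier << ID_SHIFT) | control


def unpack(word):
    """Split a lockword into (identifier, control)."""
    return word >> ID_SHIFT, word & CONTROL_MASK


def flags(word):
    """Name the control bits that are set in a lockword."""
    names = (("N", N_BIT), ("U", U_BIT), ("I", I_BIT), ("R", R_BIT))
    return [name for name, bit in names if word & bit]
```

For instance, `pack(0, R_BIT)` gives 0x8, a lockword with only the reader element set.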
[0020] In one embodiment, N element 22 may be used to indicate that
a reader seeks notification after a writer has acquired and
released lockword 10. In addition to writing to N element 22, a
reader may also store an identifier in a wait variable or other
location. The reader performs these operations after acquiring the
reader lock but before it has released the reader lock. This
operation may be idempotent; that is, even if multiple readers want
notification a single bit suffices to tell the writer to wake up
all readers waiting at a corresponding wait variable. Because a
reader cannot acquire the lock (and hence will not try to set the
notification bit) when a writer has acquired the lock, there is no
race condition between setting this N element and a writer waking
up the readers, since the writer wakes up the readers only at the
time of release. In one embodiment, this scheme of notification
allows an implementation via instructions to monitor a memory
region and wait for a store thereto, e.g., MONITOR and MWAIT
instructions in an Intel Architecture (IA)-32 environment. In one
embodiment, N element 22 may be written using a bit test and set
instruction (e.g., the BTS instruction in an IA-32
environment).
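The idempotence of setting the notify element can be sketched as follows. This is a pure-function model of a bit test and set operation, not an atomic one, and the function name and the starting lockword value are assumptions made for illustration.

```python
N_BIT_INDEX = 0  # notify element is bit 0 of the lockword


def bit_test_and_set(word, index):
    """Return (previous state of the bit, word with the bit set).

    Models the result of a bit test and set operation. This pure
    function is not atomic, unlike a hardware instruction would be.
    """
    mask = 1 << index
    return bool(word & mask), word | mask


# Two readers both ask for notification on the same lockword.
word = 0x28  # hypothetical state: R bit set, arbitrary upper bits
was_set_first, word = bit_test_and_set(word, N_BIT_INDEX)
was_set_second, word = bit_test_and_set(word, N_BIT_INDEX)
```

The second request finds the bit already set and leaves the lockword unchanged, which is why a single bit suffices for any number of waiting readers.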
[0021] In one embodiment, U element 24 may be used as an upgrade
indicator. If a reader needs to be upgraded to a writer, it
atomically tries to set U element 24. If it succeeds, it waits
until all readers have released their read locks. Correspondingly,
if a would-be writer or reader sees U element 24 set, it does not
try to acquire lockword 10. When all readers have released their
locks, the upgrader acquires lockword 10 as a write lock. If it
fails to atomically set U element 24, the reader may stop trying to
upgrade itself to a writer. Depending on the context in which the
reader-writer lock is being used, the reader may take further
actions; for example, if the reader is executing a software
transaction, then it may abort its transaction. In one embodiment,
to effect the abort, the reader may release all locks it has
acquired.
[0022] In one embodiment, I element 26 may be used as an inflation
indicator. It may be set to one if lockword 10 is inflated, and to
zero if lockword 10 is not inflated. Operation using I element 26
will be described further below. In one embodiment, a reader
indicator, i.e., R element 28, may always be set to zero if a
writer has acquired lockword 10; otherwise it may be set to one.
[0023] While these particular features and states for the control
elements of control portion 20 have been described, it is to be
understood that the scope of the present invention is not limited
in this regard and in other embodiments fewer, additional, or
different elements and indicators for different modes of operation
or features can be present.
[0024] Referring now to FIG. 2, shown is a flow diagram of a method
in accordance with one embodiment of the present invention. As
shown in FIG. 2, method 100 may be used to perform an upgrade of a
reader to a writer status and to acquire a write lock on the
lockword. In various embodiments, method 100 may be implemented in
software, firmware, hardware or a combination thereof. For example,
a processor core may be programmed to enable a thread to upgrade
its status to a write status and acquire the lockword in order to
write data to a shared memory associated with the lockword.
[0025] As shown in FIG. 2, method 100 may begin by receiving an
indication to upgrade a reader to a writer status (block 110). For
example, when a given thread previously having read access to a
shared memory location desires to write data to the shared memory
location, it may seek to upgrade to writer status. Accordingly, the
thread may attempt to write to an upgrade indicator of the lockword
(block 115). Next, it may be determined whether the attempt to set
the upgrade indicator was successful (diamond 120). If the attempt
was unsuccessful, e.g., the upgrade attempt failed because another
entity has already sought to be upgraded or for another reason,
control passes to block 125. There, a transaction of the thread may
be aborted (block 125). For example, the thread may be processing a
transaction, e.g., of a software transactional memory (STM). Because
it cannot fully complete the transaction (i.e., because it cannot
write data to the shared memory location associated with the
lockword), the transaction is aborted. Accordingly, various
activities to abort the transaction, e.g., rolling back data,
releasing other locks and the like may be performed. At this point,
method 100 may conclude.
[0026] Still referring to FIG. 2, if instead at diamond 120 it is
determined that the attempt to set the upgrade indicator was
successful, control passes to block 130. There, the thread may wait
for release of any reader locks on the lockword (block 130). For
example, one or more readers may have previously acquired a lock on
the lockword. Accordingly, the thread may wait for the lockword to
be released prior to performing further activities with respect to
the lockword.
[0027] Upon release of the lockword, the thread may acquire a write
lock and set the lockword with its thread identifier (TID) (block
140). In one implementation, the write lock may be acquired by
setting predetermined values for the elements or bits within the
control portion of the lockword. Furthermore, to identify itself as
the owner of the lockword, the thread may insert its thread
identifier into the first portion (i.e., the identifier portion) of the
lockword. Accordingly, at this time the thread has successfully
gained ownership of the lockword and thus may write data to the
shared memory location associated with the lockword (block
150).
[0028] After this write, the thread may release the write lock and
set the lockword to its initial value (block 160). For example, the
thread may clear its TID from the identifier portion and may further
place a predetermined value in the control portion. This
predetermined value may correspond to an initial value of the
control portion, in some embodiments. In one implementation, the
initial value may correspond to a value of 0x8, although
the scope of the present invention is not so limited. Note that the
events performed in blocks 130, 140, 150 and 160 may also
correspond to the events for obtaining a write lock (without first
upgrading from reader status).
[0029] After release of the write lock, method 100 may conclude.
While described with this particular implementation in the
embodiment of FIG. 2, it is to be understood that the scope of the
present invention is not so limited and in other embodiments, other
manners of upgrading a reader to a writer status and other manners
of acquiring a write lock and releasing the same may be
realized.
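The flow of FIG. 2 can be sketched as below, under several assumptions: a mutex stands in for the atomic operations, each held read lock counts as one unit in the upper bits of the lockword, the upgrading thread gives up its own read lock before waiting, and all class and function names are hypothetical. It is a single-process illustration, not a definitive implementation.

```python
import threading

U_BIT = 0x2
INITIAL = 0x8       # initial lockword value given in the text
ID_SHIFT = 4
READER_UNIT = 1 << ID_SHIFT  # assumption: one held read lock counts as 1 in the upper bits


class Lockword:
    """Lockword whose updates are serialized by a mutex (stand-in for atomic ops)."""

    def __init__(self, value=INITIAL):
        self.value = value
        self._guard = threading.Lock()

    def try_set_upgrade(self):
        """Set the U bit; return False if another upgrader already set it."""
        with self._guard:
            if self.value & U_BIT:
                return False
            self.value |= U_BIT
            return True

    def drop_read_lock(self):
        with self._guard:
            self.value -= READER_UNIT

    def readers(self):
        with self._guard:
            return self.value >> ID_SHIFT

    def store(self, value):
        with self._guard:
            self.value = value


def upgrade_and_write(lock, tid, shared, key, new_value):
    """Follow blocks 115-160 of FIG. 2; return False if the upgrade is refused."""
    if not lock.try_set_upgrade():      # blocks 115 and 120
        return False                    # block 125: caller aborts its transaction
    lock.drop_read_lock()               # give up our own read lock first
    while lock.readers() != 0:          # block 130: wait for remaining readers
        pass                            # busy-wait; real code would yield or sleep
    lock.store(tid << ID_SHIFT)         # block 140: write lock, TID in upper bits
    shared[key] = new_value             # block 150: write the shared data
    lock.store(INITIAL)                 # block 160: release
    return True
```

A caller that receives `False` would abort its transaction and release any other locks it holds, as described for block 125.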
[0030] In various embodiments, reader-writer locks can be used in
multiple modes of operation. More specifically, these reader-writer
locks can be used in multiple concurrency schemes, namely an
optimistic concurrency mode and a pessimistic concurrency mode. In
an optimistic concurrency mode, readers read data associated with
the shared memory of a lockword without taking any form of lock and
use the data as desired. When the reader reaches a commitment
phase (e.g., of a transaction using the data), the lockword is
analyzed to validate the data by confirming that the value of the
lockword has not changed since the reader read the data. In this
way, the reader validates that the value of the data read has not
changed. Such optimistic concurrency can be relatively efficient
and provide for improved caching effects.
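Commit-time validation in the optimistic mode can be sketched as follows. The class, the function names and the `interleave` hook (which imitates another thread acting between the read and the commit) are hypothetical; the sketch only assumes that a write changes the lockword value in some way.

```python
class SharedCell:
    """A shared value guarded by a lockword (names are illustrative)."""

    def __init__(self, data, lockword=0x8):
        self.data = data
        self.lockword = lockword


def optimistic_read(cell, use, interleave=None):
    """Read without locking, then validate at commit time.

    Returns (committed, result). `interleave` is a test hook that runs
    between the read and the commit to imitate another thread.
    """
    seen = cell.lockword          # remember the lockword when the data is read
    result = use(cell.data)       # use the data as desired
    if interleave is not None:
        interleave(cell)
    if cell.lockword != seen:     # commit phase: has the lockword changed?
        return False, None        # abort; the caller may retry
    return True, result


def concurrent_write(cell):
    """Imitate a writer that updates the data and changes the lockword."""
    cell.data += 100
    cell.lockword += 0x10  # assumption: any change to the value signals a write
```

When nothing intervenes the read commits; when `concurrent_write` runs in between, the changed lockword causes the read to be reported as aborted.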
[0031] However, optimistic concurrency can lead to a high number of
abort operations when used in an STM, at least during certain
execution periods. That is, when the data associated with a
lockword is modified after it has been read by a reader and before
the reader commits the operation that used the data, that operation
and other pending operations, e.g., of a transaction, are aborted to
avoid data inconsistencies. Accordingly, depending on given system
conditions, a lockword may be used instead in a pessimistic
concurrency scheme. In such a pessimistic concurrency scheme,
reader-writer locks enable read concurrency, but explicitly prevent
writers from accessing the data while a read lock is present. Thus
the data remains coherent; however, performance can be degraded, as
a writer cannot acquire the lockword (and the associated shared
memory) until the one or more readers have released the
lockword.
[0032] In various embodiments, an adaptive approach may be used to
switch between these different concurrency modes based on system
conditions. In some embodiments, a control element within the
control portion of the lockword may be used to enable adaptive
switching between these concurrency modes. Referring back to FIG.
1, inflation element 26 may be used to indicate a change of mode of
operation. Specifically, setting inflation element 26 may inflate
operation from an optimistic form of concurrency to a pessimistic
form of concurrency, or vice versa in some embodiments.
[0033] Different manners of providing for adaptive switching
between concurrency modes may be realized. Referring now to FIG. 3,
shown is a flow diagram of a method enabling adaptive switching of
concurrency modes in accordance with an embodiment of the present
invention. As shown in FIG. 3, method 200 may begin by accessing
shared memory associated with a lockword according to a first
concurrency mode (block 210). This first concurrency mode may
correspond to an optimistic concurrency mode in which a reader may
access data at a shared memory location without first acquiring a
lock on the lockword. Next, the reader, which may correspond to a
given thread, may use the data as desired, e.g., in performing an
operation according to an instruction. When the reader gets to the
commit phase of the transaction, it may be determined whether the
lockword value has changed (diamond 220). In one manner, when the
thread reads the data, it may also read the lockword and store the
then pending value of the lockword at a given memory location (not
shown in FIG. 3). Then at instruction commit time, the thread may
compare the previously-stored value with the current value of
the lockword to determine whether they match. If at diamond 220 it is
determined that the values match (in other words the lockword value
has not changed), the operation commits normally and normal
operation continues. As one example of such operation, control may
pass back to block 210 where the shared memory may be accessed by
the same thread or different threads, as desired.
[0034] Still referring to FIG. 3, if instead at diamond 220 it is
determined that the lockword value has changed, in other words that
the data at the shared memory location has changed, control passes
to block 230. There, a count of changed lockwords may be
incremented (block 230). For example, such a count may be
maintained for a group of lockwords. In different embodiments, the
group may correspond to some logical grouping such as lockwords
accessed by a given thread or lockwords associated with certain
transactions or the like. Note that in some embodiments, this count
may be maintained for a given time interval before the count is
reset.
[0035] Still referring to FIG. 3, next it may be determined whether
the count exceeds a threshold (diamond 240). This threshold may
correspond to a number of lockword changes (which may also
correspond to a number of transaction aborts occurring in the first
concurrency mode). If it is determined that the count does not
exceed the threshold, control passes back to block 210, where the
same thread may again access the shared memory, e.g., in an effort
to retry the transaction.
[0036] If instead at diamond 240 it is determined that the count
exceeds the threshold, control passes to block 250. There operation
of the lockword may be dynamically changed to the second
concurrency mode (block 250). This concurrency mode may correspond
to a pessimistic concurrency mode, in various implementations.
According to such a pessimistic concurrency mode, in order to read
data at a shared memory location corresponding to a lockword, the
reader must first acquire a lock. Accordingly, control passes to
block 260, where the thread may acquire a read lock in order to
access the shared memory location (block 260). After reading the
data at the shared memory location and performing other actions
(e.g., committing the transaction within which the shared data was
read), control passes to block 270, where the thread may release
the read lock (block 270). Subsequently, a writer may acquire a
lock on the lockword. Accordingly, method 200 may conclude.
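The adaptive switch of FIG. 3 can be sketched as below. This is a simplified single-threaded model with several assumptions: one abort counter is kept for a single lockword rather than for a group, the lockword is reset to its initial value with the inflation bit set when it inflates, and `VERSION_STEP`, `READER_UNIT` and all other names are hypothetical.

```python
I_BIT = 0x4
READER_UNIT = 0x10   # assumption: one read lock counts as 1 in the upper bits
VERSION_STEP = 0x10  # assumption: a write changes the upper bits by one step


class AdaptiveLockword:
    def __init__(self, threshold):
        self.word = 0x8
        self.threshold = threshold
        self.aborts = 0  # simplification: one counter for this single lockword

    def pessimistic(self):
        return bool(self.word & I_BIT)

    def note_abort(self):
        """Blocks 230-250: count the change and inflate once past the threshold."""
        self.aborts += 1
        if self.aborts > self.threshold:
            self.word = 0x8 | I_BIT  # assumption: restart from the initial value


def read_shared(lock, shared, interleave=None):
    """Return (mode, value); mode is 'committed', 'aborted' or 'locked'."""
    if lock.pessimistic():
        lock.word += READER_UNIT          # block 260: acquire a read lock
        value = shared["data"]
        lock.word -= READER_UNIT          # block 270: release it
        return "locked", value
    seen = lock.word                      # block 210: optimistic access
    value = shared["data"]
    if interleave is not None:
        interleave(lock)                  # test hook imitating a writer
    if lock.word == seen:                 # diamond 220
        return "committed", value
    lock.note_abort()                     # blocks 230-250
    return "aborted", None


def writer_bump(lock):
    """Imitate a writer changing the lockword during an optimistic read."""
    lock.word += VERSION_STEP
```

With a threshold of 2, the third aborted read inflates the lockword, and later reads take the read lock instead of validating.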
[0037] While described with this particular implementation in the
embodiment of FIG. 3, it is to be understood that the scope of the
present invention is not limited in this regard and different
manners of inflating and/or deflating between different concurrency
modes may be realized. Furthermore, while FIG. 3 particularly
describes inflation of a lockword from an optimistic concurrency
mode to a pessimistic concurrency mode, it is further to be
understood that the vice versa operation may also be performed.
That is, if it is determined that the pessimistic concurrency mode
is negatively affecting performance, the lockword may be adaptively
reconfigured to operate in the optimistic concurrency mode, e.g.,
to improve cache operation.
[0038] Referring now to FIGS. 4A-4K, shown are various bit patterns
to represent different modes of a lockword in accordance with an
embodiment of the present invention. Note that in the described
figures, a value of x is used to denote a bit value that can be
either zero or one. FIG. 4A shows a bit pattern of an initial state
of a lockword. As shown in FIG. 4A, the control portion of the
lockword corresponds to a value of 0x8, as there are no
writers and no readers. Note that the lower 4 bits (for
notification, upgrades, or inflation) may be used only when readers
have acquired the lock. When a writer has the lock, none of this
functionality is used. Thus the bit pattern is invariant, making
the implementation efficient.
[0039] The algorithm for a reader acquiring a read lock may be as
follows in Table 1, in one embodiment:
TABLE 1
1. temp = value of lockword
2. If lower 4 bits of temp are non-zero and upgrade bit is not set
   a. If CAS(&lockword, temp, temp + 0xF) succeeds,
      i. Reader lock acquired.
      ii. Add it to set of read locks acquired and return.
   b. Else goto step 1 (another got in ahead).
3. Else, goto step 1 (a writer has acquired or an upgraded reader is waiting).
In one embodiment, the algorithm for a reader releasing the lock
may be as shown in Table 2:
TABLE 2
1. If notification is required (after a writer has subsequently acquired and released the lock), then set the notify bit, and add yourself to the corresponding wait variable.
2. Atomically decrement the value of lockword by 0xF.
3. Delete lock from read set.
Note that a reader increments the value of lockword by 0xF on
acquire and decrements by the same on release. This ensures that
the lower bits are unperturbed by the read lock operation. For
example, if the notification bit was set, it does not get erased.
Also, when a reader has the lock, the R indicator remains set.
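The reader acquire and release steps of Tables 1 and 2 can be sketched as follows, with a mutex-backed class standing in for hardware atomics. The tables state the step as 0xF; this sketch assumes a step of 0x10 instead, since a step that is a multiple of 16 is what leaves the four control bits untouched, which is the property the text relies on. The retry bound `max_attempts` is also an addition so that the sketch always terminates, and all names are hypothetical.

```python
import threading

U_BIT = 0x2
CONTROL_MASK = 0xF
READER_UNIT = 0x10  # assumed step; see the note above the code


class AtomicWord:
    """32-bit word with mutex-backed CAS and fetch-add (stand-in for hardware atomics)."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new & 0xFFFFFFFF
            return True

    def fetch_add(self, delta):
        with self._guard:
            old = self._value
            self._value = (old + delta) & 0xFFFFFFFF
            return old


def acquire_read(word, read_set, max_attempts=1000):
    """Table 1, with a retry bound added so the sketch always terminates."""
    for _ in range(max_attempts):
        temp = word.load()                                        # step 1
        if (temp & CONTROL_MASK) != 0 and (temp & U_BIT) == 0:    # step 2
            if word.compare_and_swap(temp, temp + READER_UNIT):   # step 2.a
                read_set.add(word)
                return True
        # steps 2.b and 3: reload the lockword and try again
    return False


def release_read(word, read_set):
    """Table 2, steps 2 and 3 (notification is left out of this sketch)."""
    word.fetch_add(-READER_UNIT)
    read_set.discard(word)
```

Starting from 0x8, two acquisitions give 0x28 and the control portion still reads 0x8, so the R indicator stays set while readers hold the lock.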
[0040] To acquire a lock on the initial state of a lockword, a
writer may clear the reader element to indicate a write lock, as
shown in FIG. 4B. In one embodiment, a compare-and-swap (CAS)
instruction may be used. Such an instruction may be in the form of
a tuple (addr, oldValue, newValue) of an atomic compare-exchange
(cmpxchg) operation, where the value at the location addr is
compared to oldValue and if the two values are the same, then
newValue is stored in the location. When the writer releases the
lock, it may set the lockword to an initial value that signifies no
current readers or writers, as shown in FIG. 4C.
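The compare-and-swap behavior described above can be illustrated with a small sketch in which several threads race to claim a free lockword. A mutex emulates the indivisibility of the hardware operation, and the names `Cell`, `cas` and `race_for_write_lock` are hypothetical.

```python
import threading


class Cell:
    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()  # makes compare-and-store one indivisible step


def cas(cell, old_value, new_value):
    """Store new_value only if the cell still holds old_value; report success."""
    with cell._guard:
        if cell.value != old_value:
            return False
        cell.value = new_value
        return True


def race_for_write_lock(thread_count):
    """Have several threads try to claim a free lockword; return (winners, final value)."""
    lockword = Cell(0x8)
    winners = []

    def attempt(tid):
        # Only the thread that still observes the initial value succeeds.
        if cas(lockword, 0x8, tid << 4):
            winners.append(tid)

    threads = [threading.Thread(target=attempt, args=(tid,))
               for tid in range(1, thread_count + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners, lockword.value
```

However the threads are scheduled, exactly one attempt succeeds, because every later attempt compares against a value that is no longer 0x8.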
[0041] In one embodiment, the algorithm for a writer acquiring the
lock is as set forth in Table 3:
TABLE 3
1. If the lock is part of read set /* need to upgrade */
   a. Atomically try to set the upgrade bit
   b. If succeeds
      i. Release own read lock on the lockword.
      ii. Wait for all other current readers to release the read lock.
      iii. Acquire the write lock when there are no readers anymore (note that future readers and writers will not try to acquire the lock anymore since the upgrade bit has been set).
         1. Set lockword to (tid << 4).
      iv. If notify bit was set, then add lock to notify set so that sleeping readers can be notified on release, else add lock to write set.
      v. Return.
   c. Otherwise (upgrade attempt has failed)
      i. Abort transaction (this would imply that all locks are released and any other cleanup is performed).
2. temp = value of lockword.
3. If the upper bits (bit 4 - bit 31) of temp are zero (this means there are no readers or writers)
   a. If the upgrade bit is not set
      i. If CAS(&lockword, temp, tid << 4) succeeds
         1. write lock was acquired
         2. If notify bit was set (which is true if the notify bit in temp was set), then add lock to notify set, otherwise add lock to write set.
         3. Return.
      ii. Else goto step 2 /* someone got in ahead */
   b. Else, goto step 2 /* an upgraded reader is waiting */
4. Else, goto step 2 /* a writer or readers have the lock */
Note that when the lock is acquired, a thread identifier (TID) is
shifted into the lockword. This preserves the invariant that when a
writer has the lock, the lower four bits are always zero.
[0042] When the writer releases the lock, the bit pattern shown in
FIG. 4C is present, with the control portion in its initial state.
In one embodiment, the algorithm for a writer releasing the lock is
shown in Table 4:
TABLE 4
1. If this lock was part of the notify set, then wake up readers
   sleeping on the corresponding wait variable.
2. Set lockword to 0x8.
The write lock release sets the lockword to the initial value. If
some readers had asked for notification, then the writer wakes them
up at the corresponding wait variable.
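The non-upgrade path of Table 3 (steps 2 through 4) and the release of Table 4 may be sketched as below. This is a single-attempt illustration under assumptions: the function and macro names are hypothetical, the notify bit is assumed to be bit 0, the thread identifier is assumed to be nonzero and to fit in 28 bits, and a caller would retry on a false return where the table says "goto step 2". Waking sleeping readers and the notify/write set bookkeeping are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_MASK     0xFu  /* low four bits: control portion */
#define NOTIFY_BIT       0x1u  /* assumed position of the notify bit */
#define UPGRADE_BIT      0x2u  /* bit 1, per the text */
#define INITIAL_LOCKWORD 0x8u  /* no readers, no writer (Table 4) */

/* One pass over steps 2-4 of Table 3. Returns true if the write lock
 * was acquired; *notify then reports whether the notify bit was set. */
static bool writer_try_acquire(_Atomic uint32_t *lockword, uint32_t tid,
                               bool *notify)
{
    assert(tid != 0 && tid < (1u << 28));
    uint32_t temp = atomic_load(lockword);
    if ((temp & ~CONTROL_MASK) != 0)
        return false;              /* a writer or readers have the lock */
    if (temp & UPGRADE_BIT)
        return false;              /* an upgraded reader is waiting */
    if (!atomic_compare_exchange_strong(lockword, &temp, tid << 4))
        return false;              /* someone got in ahead */
    *notify = (temp & NOTIFY_BIT) != 0;
    return true;
}

/* Table 4, step 2: restore the initial value. */
static void writer_release(_Atomic uint32_t *lockword)
{
    atomic_store(lockword, INITIAL_LOCKWORD);
}
```

Because the stored value is tid << 4, the lower four bits are zero while the writer holds the lock, which matches the invariant noted above.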
[0043] In optimistic concurrency, a lock is in one of two phases: it
is either owned by a writer or it contains a version number. When a
writer releases the lock, it increments the version number. Thus,
the version number is a monotonically increasing function, and is
guaranteed to change if a writer has acquired the lock. A reader
never acquires the lock. During a read, a reader tests whether the
lock is free, and if so remembers the version number of the lock.
At commit, it tests the version number again and if the version
numbers match, then no writer has acquired the lock in between.
This may provide better cache effects than a reader-writer lock
mechanism, as the optimistic versioning approach does not cause a
store on a read operation.
[0044] In one embodiment, optimistic concurrency may have the R
indicator set to one if the lockword contains a version number and
zero if the lockword is owned by a writer. To perform a write lock
acquire, a thread remembers the old version number, and as before
inserts its TID (by left shifting by 4 bits). This preserves the
invariant that the lower four bits on a write lock acquire are
zero. On a lock release, the writer increments the old version
number by 0x10. This ensures that the lower 4 bits remain
unperturbed; in particular, the R indicator remains set, which
gives a valid version number.
[0045] Embodiments of the present invention may thus provide for
adaptivity between optimistic and pessimistic forms. The adaptive
reader-writer lock structure may be arranged as follows in one
embodiment:
typedef struct {
    volatile uint32 lockword;
    volatile uint32 readerLockMutex;
    volatile uint32 numReaderLockers;
} AdaptiveReaderWriterLock;
[0046] Every lockword may have an associated MUTEX, but the MUTEX
is used only when inflation is in effect. Every lockword may also
have an associated field that counts the number of readers that
have acquired the read lock explicitly. Again, it is used only when
inflation is in effect. Thus, given a lockword, the associated
MUTEX and the count field can both be obtained, since they are
arranged sequentially in memory. Implementations can choose to
associate the lockword with the MUTEX and count fields in different
ways.
[0047] To perform versioning with a reader, the algorithm of Table
5 may be used:
TABLE 5
1. temp = value of lockword
2. if lower 4 bits of temp are non-zero
   a. temp1 = temp with inflation bit masked off
   b. remember temp1 as the version number and return
3. Else, goto step 1.
Note that for obtaining the proper version number, the inflation
indicator may be masked. A validation algorithm for the reader may
be implemented as shown in Table 6, in one embodiment:
TABLE 6
1. temp = value of lockword
2. temp1 = temp with inflation bit masked off
3. if temp1 = remembered version number
   a. return True.
4. Else, return False.
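The versioning read of Table 5 and the validation of Table 6 may be sketched as follows. The names are hypothetical, and the read is written as a single attempt that returns false where Table 5 loops back to step 1, so that a caller can decide how to retry. The inflation bit value 0x4 is taken from the CAS in Table 7.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_MASK  0xFu  /* low four bits: control portion */
#define INFLATION_BIT 0x4u  /* bit 2, as set by the CAS in Table 7 */

/* Table 5, one attempt: remember the version unless a writer owns
 * the lock (low four bits all zero). */
static bool try_read_version(_Atomic uint32_t *lockword, uint32_t *version)
{
    uint32_t temp = atomic_load(lockword);
    if ((temp & CONTROL_MASK) == 0)
        return false;                 /* writer owns the lock; retry */
    *version = temp & ~INFLATION_BIT; /* mask off the inflation bit */
    return true;
}

/* Table 6: the read is valid if the masked lockword is unchanged. */
static bool validate_version(_Atomic uint32_t *lockword, uint32_t remembered)
{
    uint32_t temp1 = atomic_load(lockword) & ~INFLATION_BIT;
    return temp1 == remembered;
}
```

Masking the inflation bit on both sides means that a concurrent reader inflating the lock does not, by itself, invalidate a versioned read; only a writer's release, which changes the version, does.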
[0048] Suppose a reader wants to perform read locking and not use
versioning. Then the algorithm of Table 7 may be used, in one
embodiment:
TABLE 7
1. temp = value of lockword
2. If lower 4 bits of temp are non-zero and upgrade bit is not set
   /* this means that a writer does not own the lock */
   a. Acquire the mutex (readerLockMutex) corresponding to lockword.
   b. Increment the number of readers field (numReaderLockers)
   c. If inflation bit of temp is already set
      i. Release readerLockMutex
      ii. Add lock to read set
      iii. Return (read lock was acquired).
   d. Else (inflation bit of temp was not set)
      i. If CAS(&lockword, temp, temp | 0x4) succeeds /* this sets
         the inflation bit */
         1. Release readerLockMutex
         2. Add lock to read set
         3. Return (read lock was acquired).
      ii. Else
         1. Decrement numReaderLockers
         2. Release readerLockMutex
         3. Goto step 1.
3. Else, goto step 1 /* retry */
[0049] The read lock release in an adaptive scheme may work as
shown in the algorithm of Table 8:
TABLE 8
1. Acquire the mutex (readerLockMutex) corresponding to lockword.
2. Decrement the number of readers (numReaderLockers).
3. If (numReaderLockers == 0)
   a. Reset the inflation bit in the lockword.
4. Release readerLockMutex.
5. Delete lock from read set.
6. Return.
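The explicit read lock acquire of Table 7 and release of Table 8 may be sketched as below. Several details are assumptions of the sketch rather than part of the described embodiment: the mutex is modeled as a simple spin lock on a 32-bit word, all names are hypothetical, the acquire is a single attempt that returns false where Table 7 says "goto step 1", and the read-set bookkeeping is omitted. The sketch mirrors the steps as written and is not hardened beyond them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_MASK  0xFu
#define UPGRADE_BIT   0x2u
#define INFLATION_BIT 0x4u

typedef struct {
    _Atomic uint32_t lockword;
    _Atomic uint32_t reader_lock_mutex; /* 0 = free, 1 = held */
    uint32_t num_reader_lockers;        /* guarded by reader_lock_mutex */
} AdaptiveLock;

static void mutex_acquire(_Atomic uint32_t *m)
{
    uint32_t expected = 0;
    while (!atomic_compare_exchange_weak(m, &expected, 1u))
        expected = 0;                   /* spin until the word is free */
}

static void mutex_release(_Atomic uint32_t *m)
{
    atomic_store(m, 0u);
}

/* Table 7, one attempt. Returns true if the read lock was acquired. */
static bool read_lock_try_acquire(AdaptiveLock *l)
{
    uint32_t temp = atomic_load(&l->lockword);
    if ((temp & CONTROL_MASK) == 0 || (temp & UPGRADE_BIT))
        return false;                   /* writer owns it, or upgrade pending */
    mutex_acquire(&l->reader_lock_mutex);
    l->num_reader_lockers++;
    bool acquired = true;
    if (!(temp & INFLATION_BIT)) {      /* first explicit reader inflates */
        acquired = atomic_compare_exchange_strong(&l->lockword, &temp,
                                                  temp | INFLATION_BIT);
        if (!acquired)
            l->num_reader_lockers--;    /* undo the increment; caller retries */
    }
    mutex_release(&l->reader_lock_mutex);
    return acquired;
}

/* Table 8: the last explicit reader deflates the lockword. */
static void read_lock_release(AdaptiveLock *l)
{
    mutex_acquire(&l->reader_lock_mutex);
    assert(l->num_reader_lockers > 0);
    if (--l->num_reader_lockers == 0)
        atomic_fetch_and(&l->lockword, ~INFLATION_BIT);
    mutex_release(&l->reader_lock_mutex);
}
```

Starting from a lockword of 0x28, the first acquire inflates it to 0x2C, a second acquire only raises the reader count, and the lockword returns to 0x28 once both readers have released.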
[0050] Next, if one of the readers desires to upgrade to a writer
status, the bit pattern of FIG. 4H may be present, in which the
number of reader locks is decremented to one, and the upgrade
indicator (i.e., bit 1) is set to a value of 1. As shown in FIG.
4D, a reader desiring to be upgraded may set an upgrade indicator
(i.e., bit 1) to a value of one. When the reader is upgraded, the
bit pattern shown in FIG. 4E exists, indicating an acquired write
lock. To acquire a write lock, the algorithm of Table 9 may be
used:
TABLE 9
1. If the lock is part of read set /* need to upgrade */
   a. Atomically try to set the upgrade bit
   b. If succeeds
      i. Acquire readerLockMutex corresponding to lockword.
      ii. Decrement the number of reader lockers (numReaderLockers).
      iii. Release readerLockMutex.
      iv. Delete lock from my read set.
      v. Wait till numReaderLockers == 0.
      vi. Reset the inflation bit in the lockword.
      vii. temp = current value of lockword
      viii. temp1 = temp with N, U, and I bits masked off.
      ix. Remember temp1 as the old version number for the lockword.
      x. Set lockword to (tid << 4).
      xi. If notify bit was set, then add to notify set, else add to
          write set.
      xii. Return.
   c. Else (upgrade attempt has failed)
      i. Abort transaction.
2. temp = value of lockword.
3. If the lower bits (bit 0 - bit 3) of temp are non-zero, and the
   inflation bit is not set /* this means no one has taken a reader
   lock or writer lock */
   a. If the upgrade bit is not set
      i. If CAS(&lockword, temp, tid << 4) succeeds
         1. write lock was acquired
         2. If notify bit was set, then add to notify set, otherwise
            to write set
         3. Return.
      ii. Else, Goto step 2. /* someone got in ahead */
   b. Else, Goto step 2. /* an upgraded reader is waiting */
4. Else, Goto step 2. /* a writer or readers have the lock */
Note that this algorithm preserves the invariant that a write lock
acquire sets the lower 4 bits to zero. When the lockword is
inflated to a pessimistic mode of operation, the bit pattern shown
in FIG. 4G is present in which the inflation indicator (i.e., bit
2) is set to a value of one. In the bit pattern of FIG. 4G, two
explicit read lock acquires are present, as the indicator portion
has a value of 2.
[0051] FIG. 4I shows a bit pattern prior to the upgrade in which
all readers have released the read lock and the inflation indicator
is now deflated. FIG. 4J shows the bit pattern after the reader
upgrades to a writer status. The algorithm for a writer releasing
the lock may be as shown in Table 10:
TABLE 10
1. If this variable was part of the notify set, then wake up readers
   sleeping on the corresponding wait variable.
2. Set lockword to (oldVersionNumber + 0x10).
The write lock release increments the version number by 0x10, which
means that the lower bits remain unperturbed, including the R
indicator, which preserves the invariant that it is set for a valid
version number.
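A write lock acquire along the non-upgrade path of Table 9 (steps 2 through 4), followed by the release of Table 10, may be sketched as follows. The names, the WriteTicket record and the notify bit position (bit 0) are assumptions made for illustration. The version step is assumed to be 0x10 so that the low four bits, including the R indicator, survive the increment. The acquire is a single attempt; a caller would retry on false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_MASK  0xFu
#define NOTIFY_BIT    0x1u  /* assumed position */
#define UPGRADE_BIT   0x2u
#define INFLATION_BIT 0x4u
#define VERSION_STEP  0x10u /* assumed: one version = one unit in bit 4 */

/* Hypothetical record of what the writer must remember until release. */
typedef struct {
    uint32_t old_version; /* lockword value with N, U and I bits masked off */
    bool notify;          /* true if readers asked to be woken on release */
} WriteTicket;

static bool versioned_write_try_acquire(_Atomic uint32_t *lockword,
                                        uint32_t tid, WriteTicket *ticket)
{
    assert(tid != 0 && tid < (1u << 28));
    uint32_t temp = atomic_load(lockword);
    if ((temp & CONTROL_MASK) == 0)
        return false;     /* a writer has the lock */
    if (temp & (INFLATION_BIT | UPGRADE_BIT))
        return false;     /* explicit readers hold it, or upgrade pending */
    if (!atomic_compare_exchange_strong(lockword, &temp, tid << 4))
        return false;     /* someone got in ahead */
    ticket->old_version = temp & ~(NOTIFY_BIT | UPGRADE_BIT | INFLATION_BIT);
    ticket->notify = (temp & NOTIFY_BIT) != 0;
    return true;
}

/* Table 10, step 2: publish the next version number. */
static void versioned_write_release(_Atomic uint32_t *lockword,
                                    const WriteTicket *ticket)
{
    atomic_store(lockword, ticket->old_version + VERSION_STEP);
}
```

For a lockword of 0x28, an acquire by thread 7 stores 0x70, and the release publishes 0x38: the R indicator is still set and the version has advanced by one.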
[0052] Finally, FIG. 4K shows the lockword when the write lock has
been released (i.e., the control portion is back to its initial
state) and the indicator portion is incremented with the next
version number. Of course, in other embodiments, different bit
patterns to represent various states of the lockword and associated
readers and writers may be realized.
[0053] Referring now to FIG. 5, shown is a block diagram of an
implementation of a lock environment in accordance with an
embodiment of the present invention. As shown in FIG. 5,
environment 300 may include a lockword 310 having a first portion
330 and a second portion 320. Lockword 310 may be in accordance
with an embodiment of the present invention and may provide for
control of various modes of operation and features via second
portion 320. A count of reader locks or an identifier for a writer
may be stored in first portion 330.
[0054] As shown in FIG. 5, lockword 310 may be associated with a
shared memory 340, which may be a memory location that is protected
by lockword 310. Additional memory locations may be present in
environment 300, including, for example, a wait variable 345 which
may be used to store addresses or other identifiers for threads
desiring a notification when a writer releases a lock on lockword
310.
[0055] For purposes of illustration, FIG. 5 shows a plurality of
threads, namely a first thread 365₁ (i.e., thread T1), a
second thread 365₂ (i.e., thread T2), and a third thread
365₃ (i.e., thread T3) (collectively, threads 365).
Each of threads 365 may be in communication with lockword 310 to
read the values present and write values thereto, e.g., to acquire
a lock or to signal a desired feature or mode of operation for
lockword 310. While not shown specifically for ease of illustration
of FIG. 5, it is to be understood that each of threads 365 may
further be in communication with shared memory 340 and wait
variable 345.
[0056] Furthermore, as shown in FIG. 5, each of threads 365 may be
coupled to a lock manager 350, which may further be coupled to
lockword 310. Lock manager 350 may be adapted to perform management
or control activities with respect to lockword 310. In various
embodiments, lock manager 350 may be implemented in software,
hardware, firmware and/or a combination thereof. In one example
embodiment, lock manager 350 may be implemented as a state machine,
although the scope of the present invention is not so limited.
[0057] Lock manager 350 may be used to control a concurrency mode
of operation for lockword 310. As one example, upon initial
configuration lockword 310 may be set for an optimistic concurrency
mode to avoid the expense of acquiring locks and cache effects
associated therewith. However, by operating in an optimistic mode,
one or more threads 365 may have to abort a transaction when a
value of lockword 310 changes from the time that data in shared
memory 340 is accessed and when an instruction related to the data
later commits. Upon such aborts, a counter 352 within lock manager
350 may be incremented. Lock manager 350 may further include an
inflation logic 354. Inflation logic 354 may be adapted to compare
the value in counter 352 to a threshold. This threshold may
correspond to a threshold number of transaction aborts. If more
than this threshold number of transaction aborts occur, lock
manager 350 may cause lockword 310 to be inflated to a pessimistic
mode of operation. As described above, such mode of operation may
be implemented by setting an inflation indicator within second
portion 320, although the scope of the present invention is not so
limited. While shown with this particular implementation in the
embodiment of FIG. 5, it is to be understood that the scope of the
present invention is not so limited.
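The behavior of counter 352 and inflation logic 354 may be sketched as a small state machine. The type and function names are hypothetical, and the sketch only decides which concurrency mode applies; how that decision is then reflected in lockword 310 (e.g., by setting the inflation indicator) is left out. The counter is a plain integer here, so a multithreaded lock manager would need to protect or atomically update it.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_OPTIMISTIC, MODE_PESSIMISTIC } ConcurrencyMode;

/* Hypothetical software model of lock manager 350. */
typedef struct {
    uint32_t abort_count;     /* counter 352 */
    uint32_t abort_threshold; /* threshold used by inflation logic 354 */
    ConcurrencyMode mode;     /* current concurrency mode */
} LockManager;

/* Record one aborted transaction and return the mode to use next.
 * The mode changes only once the count exceeds the threshold. */
static ConcurrencyMode lock_manager_on_abort(LockManager *m)
{
    m->abort_count++;
    if (m->abort_count > m->abort_threshold)
        m->mode = MODE_PESSIMISTIC;
    return m->mode;
}
```

With a threshold of 2, the first two aborts leave the manager in the optimistic mode and the third switches it to the pessimistic mode.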
[0058] As described above, reader-writer locks in accordance with
an embodiment of the present invention may be used in connection
with an STM. In such embodiments, transactions may be performed by
threads in different concurrency modes, based upon a particular
system operation. When operating in an optimistic concurrency mode,
a thread may need to abort a transaction if a value of an accessed
data associated with a lockword changes during use of the data. In
a pessimistic concurrency mode, reader concurrency may be
guaranteed at the expense of lower performance.
[0059] Different system architectures may implement an STM for use
with reader-writer locks. Referring now to FIG. 6, shown is a block
diagram of a system in accordance with an embodiment of the present
invention. As shown in FIG. 6, system 400 may be a computer system
including a plurality of processors 405a and 405b. Processors 405a
and 405b may be single core or multicore processors. In other
embodiments, processors 405a and 405b may be cores of a single
package multiprocessor, although the scope of the present invention
is not limited in this regard.
[0060] Still referring to FIG. 6, processors 405a and 405b are
coupled via a bus 410 to a system memory 415, which may be a
dynamic random access memory (DRAM) in some embodiments, and a
storage 420, which may be a disk drive or other mass storage
device. Further coupled to bus 410 may be a plurality of peripheral
devices 445a and 445b, which may be network devices, multimedia
devices and the like.
[0061] As further shown in FIG. 6, a transactional memory system
430 may be included in system 400. Transactional memory system 430
may implement an STM such that a thread executing on one of
processors 405a and 405b may execute various operations of a
transaction atomically. If the data used in the transaction remains
concurrent from the time of access until the time of transaction
commitment, the transaction commits normally. If instead, one or
more accessed data values change prior to transaction commitment,
the transaction may abort. Various shared memory locations within
transactional memory system 430 may be protected by a lockword such
as the reader-writer locks described herein. While shown with this
particular implementation in the embodiment of FIG. 6, the scope of
the present invention is not so limited.
[0062] Embodiments may be implemented in code and may be stored on
a storage medium having stored thereon instructions which can be
used to program a system to perform the instructions. The storage
medium may include, but is not limited to, any type of disk
including floppy disks, optical disks, compact disk read-only
memories (CD-ROMs), compact disk rewritables (CD-RWs), and
magneto-optical disks, semiconductor devices such as read-only
memories (ROMs), random access memories (RAMs) such as dynamic
random access memories (DRAMs), static random access memories
(SRAMs), erasable programmable read-only memories (EPROMs), flash
memories, electrically erasable programmable read-only memories
(EEPROMs), magnetic or optical cards, or any other type of media
suitable for storing or transmitting electronic instructions.
[0063] While the present invention has been described with respect
to a limited number of embodiments, those skilled in the art will
appreciate numerous modifications and variations therefrom. It is
intended that the appended claims cover all such modifications and
variations as fall within the true spirit and scope of the present
invention.
* * * * *