Spinlock

<h2 id="example-implementation">Example implementation</h2>
The following example uses x86 assembly language to implement a spinlock. It will work on any <a href="/facts/Intel/SMF0gJJX">Intel</a> <a href="/facts/80386/ag5eSgWv">80386</a>-compatible processor.

; Intel syntax

locked:                      ; The lock variable. 1 = locked, 0 = unlocked.
     dd      0

spin_lock:
     mov     eax, 1          ; Set the EAX register to 1.
     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
                             ; This will always store 1 to the lock, leaving
                             ;  the previous value in the EAX register.
     test    eax, eax        ; Test EAX with itself. Among other things, this will
                             ;  set the processor's Zero Flag if EAX is 0.
                             ; If EAX is 0, then the lock was unlocked and
                             ;  we just locked it.
                             ; Otherwise, EAX is 1 and we didn't acquire the lock.
     jnz     spin_lock       ; Jump back to the MOV instruction if the Zero Flag is
                             ;  not set; the lock was previously locked, and so
                             ; we need to spin until it becomes unlocked.
     ret                     ; The lock has been acquired, return to the calling
                             ;  function.

spin_unlock:
     xor     eax, eax        ; Set the EAX register to 0.
     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
     ret                     ; The lock has been released.
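
For comparison, the same test-and-set lock can be sketched in C11. This is a minimal illustration, not a production lock: atomic_exchange plays the role of XCHG, and the worker threads, the counter, and the run_demo function are hypothetical names added only to exercise the lock. It assumes a POSIX threads environment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Lock word, as in the assembly: 1 = locked, 0 = unlocked. */
static atomic_int locked = 0;

void spin_lock(void) {
    /* atomic_exchange stores 1 and returns the previous value in one
       atomic step, like XCHG. A previous value of 1 means another
       thread holds the lock, so try again. */
    while (atomic_exchange(&locked, 1) != 0) {
        /* spin */
    }
}

void spin_unlock(void) {
    atomic_exchange(&locked, 0);    /* mirrors the XCHG-based release */
}

/* Hypothetical demo: worker threads increment a plain counter under the lock. */
enum { THREADS = 4, ITERATIONS = 50000 };
static long counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        spin_lock();
        counter++;                  /* protected by the spinlock */
        spin_unlock();
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long run_demo(void) {
    pthread_t threads[THREADS];
    counter = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);            /* thread creation is assumed to succeed */
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

If the lock provides mutual exclusion, run_demo() returns THREADS * ITERATIONS, because no increment is lost.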

<h2 id="significant-optimizations">Significant optimizations</h2>
The simple implementation above works on all CPUs using the x86 architecture. However, a number of performance optimizations are possible:
On later implementations of the x86 architecture, spin_unlock can safely use an unlocked MOV instead of the slower locked XCHG. This is safe because of subtle <a href="/facts/Memory_ordering/SSUt66mb">memory ordering</a> rules, even though MOV is not a full <a href="/facts/Memory_barrier/am65f6bs">memory barrier</a>. However, some processors (some <a href="/facts/Cyrix/zPI6NhTm">Cyrix</a> processors, some revisions of the <a href="/facts/Intel/SMF0gJJX">Intel</a> <a href="/facts/Pentium_Pro/mTAN64fo">Pentium Pro</a> (due to bugs), and earlier <a href="/facts/Pentium_(brand)/czINY469">Pentium</a> and <a href="/facts/I486/2zfEAXAw">i486</a> <a href="/facts/Symmetric_multiprocessing/T9CahUsN">SMP</a> systems) behave incorrectly with an unlocked MOV, and data protected by the lock could be corrupted. On most non-x86 architectures, explicit memory barrier or atomic instructions (as in the example) must be used. On some systems, such as <a href="/facts/IA-64/m6egpGno">IA-64</a>, there are special "unlock" instructions which provide the needed memory ordering.
To reduce inter-CPU <a href="/facts/Bus_(computing)/R45TKwS7">bus traffic</a>, code trying to acquire a lock should loop reading without trying to write anything until it reads a changed value. Because of <a href="/facts/MESI/BUCBWvtS">MESI</a> caching protocols, this causes the cache line for the lock to become "Shared"; there is then remarkably no bus traffic while a CPU waits for the lock. This optimization is effective on all CPU architectures that have a cache per CPU, because MESI is so widespread. On Hyper-Threading CPUs, pausing with rep nop (the encoding of the PAUSE instruction) gives additional performance by hinting to the core that it can work on the other thread while the current thread spins waiting for the lock.<a class="footnote-ref" id="fnref:2" href="#fn:2">2</a>
<a href="/facts/Transactional_Synchronization_Extensions/j1CYdvFP">Transactional Synchronization Extensions</a> (TSX) and other hardware <a href="/facts/Transactional_memory/N7CcDl7w">transactional memory</a> instruction sets serve to replace locks in most cases. Although locks are still required as a fallback, these extensions have the potential to greatly improve performance by having the processor handle entire blocks of atomic operations. This feature is built into some mutex implementations, for example in <a href="/facts/Glibc/AL6IYDkA">glibc</a>. The Hardware Lock Elision (HLE) in x86 is a weakened but backwards-compatible version of TSX, and it can be used here for locking without losing any compatibility. In this particular case, the processor can choose to not lock until two threads actually conflict with each other.<a class="footnote-ref" id="fnref:3" href="#fn:3">3</a>
A simpler version of the test can use the cmpxchg instruction on x86, or the __sync_bool_compare_and_swap built into many Unix compilers.
With the optimizations applied, a sample would look like:

; In C: while (!__sync_bool_compare_and_swap(&locked, 0, 1)) while (locked) __builtin_ia32_pause();
spin_lock:
     mov     ecx, 1             ; Set the ECX register to 1.
retry:
     xor     eax, eax           ; Zero out EAX, because cmpxchg compares against EAX.
     XACQUIRE lock cmpxchg [locked], ecx
                                ; Atomically decide: if locked is zero, write ECX to it.
                                ; XACQUIRE hints to the processor that we are acquiring a lock.
     je      done               ; If we locked it (old value equal to EAX: 0), return.
spin_wait:
     mov     eax, [locked]      ; Read locked into EAX.
     test    eax, eax           ; Perform the zero-test as before.
     jz      retry              ; If it's zero, we can retry.
     rep nop                    ; Tell the CPU that we are waiting in a spinloop, so it can
                                ;  work on the other thread now. Also written as "pause".
     jmp     spin_wait          ; Keep check-pausing.
done:
     ret                        ; All done.

spin_unlock:
     XRELEASE mov dword [locked], 0 ; Assuming the memory ordering rules apply, release the
                                ;  lock variable with a "lock release" hint.
     ret                        ; The lock has been released.
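
A rough C11 counterpart of the optimized routine is sketched below. It is an illustration under stated assumptions rather than a translation: the ttas_lock type, the function names, and the cpu_relax macro are hypothetical, the XACQUIRE/XRELEASE elision hints are omitted, and the release uses a plain store with release ordering in place of the unlocked MOV.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#define cpu_relax() __builtin_ia32_pause()  /* PAUSE (rep nop) on x86, GCC/Clang */
#else
#define cpu_relax() ((void)0)               /* assumption: no spin hint elsewhere */
#endif

/* Hypothetical lock type: 1 = locked, 0 = unlocked. */
typedef struct {
    atomic_int locked;
} ttas_lock;

/* One compare-and-swap attempt, like "lock cmpxchg [locked], ecx". */
bool ttas_try_lock(ttas_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

void ttas_lock_acquire(ttas_lock *l) {
    while (!ttas_try_lock(l)) {
        /* Wait with reads only, so the cache line can stay Shared,
           and hint to the core that this is a spin loop. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

void ttas_lock_release(ttas_lock *l) {
    /* Debug check: releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    /* A store with release ordering instead of an atomic exchange. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A caller would bracket its critical section with ttas_lock_acquire and ttas_lock_release on a zero-initialized ttas_lock. Acquire and release orderings are the weakest that still keep the protected accesses inside the critical section.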

On any multi-processor system that uses the <a href="/facts/MESI_protocol/BUCBWvtS">MESI cache coherence protocol</a>, such a test-and-test-and-set lock (TTAS) performs much better than the simple test-and-set lock (TAS) approach.<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a>
With large numbers of processors, adding a random <a href="/facts/Exponential_backoff/pk2cAbqm">exponential backoff</a> delay before re-checking the lock performs even better than TTAS.<a class="footnote-ref" id="fnref:5" href="#fn:5">5</a><a class="footnote-ref" id="fnref:6" href="#fn:6">6</a>
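
One way to combine TTAS with a random exponential backoff is sketched below in C11. The delay bounds, the xorshift generator, and all names are assumptions chosen for illustration and are not tuned; the demo assumes POSIX threads. After a failed test-and-set, a thread waits a random number of pause iterations inside a window that doubles up to a cap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#define cpu_relax() __builtin_ia32_pause()  /* PAUSE hint on x86, GCC/Clang */
#else
#define cpu_relax() ((void)0)               /* assumption: no spin hint elsewhere */
#endif

/* Illustrative, untuned bounds on the backoff window (in pause iterations). */
enum { MIN_DELAY = 4, MAX_DELAY = 1024, THREADS = 4, ITERATIONS = 20000 };

static atomic_int locked = 0;   /* 1 = locked, 0 = unlocked */
static long counter = 0;        /* demo data protected by the lock */

/* Small xorshift generator; each thread keeps its own nonzero state. */
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

void backoff_lock(uint32_t *rng) {
    uint32_t window = MIN_DELAY;
    for (;;) {
        /* Test: spin on plain reads while the lock looks taken. */
        while (atomic_load_explicit(&locked, memory_order_relaxed) != 0)
            cpu_relax();
        /* Test-and-set: try to take the lock. */
        if (atomic_exchange_explicit(&locked, 1, memory_order_acquire) == 0)
            return;
        /* Lost the race, so contention is likely: wait a random time
           within the window, then double the window up to MAX_DELAY. */
        uint32_t delay = next_random(rng) % window;
        for (uint32_t i = 0; i < delay; i++)
            cpu_relax();
        if (window < MAX_DELAY)
            window *= 2;
    }
}

void backoff_unlock(void) {
    atomic_store_explicit(&locked, 0, memory_order_release);
}

static void *worker(void *arg) {
    uint32_t rng = (uint32_t)(uintptr_t)arg;    /* per-thread nonzero seed */
    for (int i = 0; i < ITERATIONS; i++) {
        backoff_lock(&rng);
        counter++;
        backoff_unlock();
    }
    return NULL;
}

/* Hypothetical demo: returns the final counter value. */
long backoff_demo(void) {
    pthread_t threads[THREADS];
    counter = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker,
                                (void *)(uintptr_t)(i + 1));
        assert(rc == 0);        /* thread creation is assumed to succeed */
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

The randomization keeps waiting threads from retrying in lockstep, and the growing window reduces pressure on the lock's cache line as contention rises. Suitable bounds depend on the machine and workload.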
A few multi-core processors have a "power-conscious spin-lock" instruction that puts a processor to sleep, then wakes it up on the next cycle after the lock is freed. A spinlock using such instructions is more efficient and uses less energy than spinlocks with or without a back-off loop.<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a>

<h2 id="alternatives">Alternatives</h2>
The primary disadvantage of a spinlock is that, while <a href="/facts/Wait_(operating_system)/pDCYnnyj">waiting</a> to acquire a lock, it wastes time that might be productively spent elsewhere. There are two ways to avoid this:

<ol><li>Do not acquire the lock. In many situations it is possible to design data structures that <a href="/facts/Non-blocking_synchronization/KhLDma0v">do not require locking</a>, e.g. by using per-thread or per-CPU data and disabling <a href="/facts/Interrupt/Mm6C4rpc">interrupts</a>.</li>
<li><a href="/facts/Context_switch/N0oJQ6CC">Switch</a> to a different thread while waiting. This typically involves attaching the current thread to a queue of threads waiting for the lock, followed by switching to another thread that is ready to do some useful work. This scheme also has the advantage that it guarantees that <a href="/facts/Resource_starvation/E9UjtIWS">resource starvation</a> does not occur as long as all threads eventually relinquish locks they acquire and scheduling decisions can be made about which thread should progress first. Spinlocks that never entail switching, usable by <a href="/facts/Real-time_operating_system/IvUCFwCL">real-time operating systems</a>, are sometimes called raw spinlocks.<a class="footnote-ref" id="fnref:8" href="#fn:8">8</a></li></ol>
Most operating systems (including <a href="/facts/Solaris_(operating_system)/KPfT98EJ">Solaris</a>, <a href="/facts/Mac_OS_X/Y93vVpTm">Mac OS X</a> and <a href="/facts/FreeBSD/RKsFTm7F">FreeBSD</a>) use a hybrid approach called "adaptive <a href="/facts/Mutual_exclusion/1dwvCxDW">mutex</a>". The idea is to use a spinlock when trying to access a resource locked by a currently running thread, but to sleep if the <a href="/facts/Thread_(computing)/hh8WHyPd">thread</a> holding the lock is not currently running. (The latter is always the case on single-processor systems.)<a class="footnote-ref" id="fnref:9" href="#fn:9">9</a>
<a href="/facts/OpenBSD/xmPS9tek">OpenBSD</a> attempted to replace spinlocks with <a href="/facts/Ticket_lock/0tVHsodQ">ticket locks</a>, which enforce <a href="/facts/FIFO_(computing_and_electronics)/RctvBNgE">first-in-first-out</a> behaviour. However, this resulted in more CPU usage in the kernel, and larger applications, such as <a href="/facts/Firefox/xez2zv8H">Firefox</a>, became much slower.<a class="footnote-ref" id="fnref:10" href="#fn:10">10</a><a class="footnote-ref" id="fnref:11" href="#fn:11">11</a>
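
The first-in-first-out behaviour of a ticket lock can be illustrated with a short C11 sketch. This is a generic textbook form, not the OpenBSD code; the type and function names are hypothetical. Each thread takes the next ticket number and spins until that number is being served.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ticket lock: both counters start at zero. */
typedef struct {
    atomic_uint next_ticket;    /* next ticket number to hand out */
    atomic_uint now_serving;    /* ticket currently allowed to hold the lock */
} ticket_lock;

/* Takes a ticket, waits for its turn, and returns the ticket number. */
unsigned ticket_lock_acquire(ticket_lock *l) {
    unsigned mine = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                              memory_order_relaxed);
    /* Threads are admitted strictly in ticket order (FIFO). */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != mine) {
        /* spin */
    }
    return mine;
}

void ticket_lock_release(ticket_lock *l) {
    /* Only the current holder writes now_serving, so a plain
       load followed by a store is enough here. */
    unsigned current = atomic_load_explicit(&l->now_serving,
                                            memory_order_relaxed);
    /* Debug check: some ticket must be outstanding, i.e. the lock is held. */
    assert(current != atomic_load_explicit(&l->next_ticket,
                                           memory_order_relaxed));
    atomic_store_explicit(&l->now_serving, current + 1, memory_order_release);
}
```

Because every waiter must be served in order, a waiter that has been descheduled delays all threads queued behind it, which is one plausible reason such locks can behave poorly when there are more runnable threads than processors.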

<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Synchronization_(computer_science)/n7ID76uM">Synchronization</a></li>
<li><a href="/facts/Busy_spin/WUdbMj6z">Busy spin</a></li>
<li><a href="/facts/Deadlock_(computer_science)/YDpdfnPP">Deadlock (computer science)</a></li>
<li><a href="/facts/Seqlock/UvnoJrhR">Seqlock</a></li>
<li><a href="/facts/Ticket_lock/0tVHsodQ">Ticket lock</a></li></ul>

<h2 id="external-links">External links</h2>
<ul><li><a href="http://www.opengroup.org/onlinepubs/009695399/functions/pthread_spin_lock.html">pthread_spin_lock documentation</a> from The Open Group Base Specifications Issue 6, IEEE Std 1003.1, 2004 Edition</li>
<li><a href="https://github.com/concurrencykit/ck/blob/master/include/ck_spinlock.h">Variety of spinlock Implementations</a> from Concurrency Kit</li>
<li>Article "<a href="https://web.archive.org/web/20041211235628/http://www.codeproject.com/threads/spinlocks.asp">User-Level Spin Locks - Threads, Processes & IPC</a>" by Gert Boddaert</li>
<li>Article <a href="http://tech693.blogspot.com/2018/08/java-spin-lock-implementation.html">Spin Lock Example in Java</a></li>
<li>Paper "<a href="http://www.cs.washington.edu/homes/tom/pubs/spinlock.html">The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors</a>" by <a href="/facts/Thomas_E._Anderson/sVFPdle9">Thomas E. Anderson</a></li>
<li>Paper "<a href="http://portal.acm.org/citation.cfm?id=103727.103729">Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors</a>" by John M. Mellor-Crummey and <a href="/facts/Michael_L._Scott/85MujmUF">Michael L. Scott</a>. This paper received the <a href="http://www.podc.org/dijkstra/2006.html">2006 Dijkstra Prize in Distributed Computing</a>.</li>
<li><a href="http://msdn.microsoft.com/en-us/magazine/cc163726.aspx">Spin-Wait Lock</a> by Jeffrey Richter</li>
<li><a href="http://austria.sourceforge.net/dox/html/classSpinLock.html">Austria C++ SpinLock Class Reference</a></li>
<li><a href="http://msdn2.microsoft.com/en-us/library/ms684122(VS.85).aspx">Interlocked Variable Access (Windows)</a></li>
<li><a href="http://pages.cs.wisc.edu/~remzi/OSTEP/threads-locks.pdf">Operating Systems: Three Easy Pieces (Chapter: Locks)</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Silberschatz, Abraham; Galvin, Peter B. (1994). Operating System Concepts (Fourth ed.). Addison-Wesley. pp. 176–179. ISBN 0-201-59292-4. <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">"gcc - x86 spinlock using cmpxchg". Stack Overflow. <a href="https://stackoverflow.com/a/6935581" target="_blank">https://stackoverflow.com/a/6935581</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">"New Technologies in the Arm Architecture" (PDF). Archived (PDF) from the original on 2019-04-02. Retrieved 2019-09-26. <a href="https://static.sched.com/hosted_files/bkk19/3c/BKK19-202_New-Technologies-in-Arm-Architecture.pdf" target="_blank">https://static.sched.com/hosted_files/bkk19/3c/BKK19-202_New-Technologies-in-Arm-Architecture.pdf</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Maurice Herlihy and Nir Shavit.
"The Art of Multiprocessor Programming".
"Spin Locks and Contention". <a href="http://cs.brown.edu/courses/cs176/lectures/chapter_07.pdf" target="_blank">http://cs.brown.edu/courses/cs176/lectures/chapter_07.pdf</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Maurice Herlihy and Nir Shavit.
"The Art of Multiprocessor Programming".
"Spin Locks and Contention". <a href="http://cs.brown.edu/courses/cs176/lectures/chapter_07.pdf" target="_blank">http://cs.brown.edu/courses/cs176/lectures/chapter_07.pdf</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">"Boost.Fiber Tuning: Exponential back-off". <a href="https://www.boost.org/doc/libs/1_78_0/libs/fiber/doc/html/fiber/tuning.html" target="_blank">https://www.boost.org/doc/libs/1_78_0/libs/fiber/doc/html/fiber/tuning.html</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">John Goodacre and Andrew N. Sloss.
"Parallelism and the ARM Instruction Set Architecture".
p. 47. <a href="https://www.ics.uci.edu/~eli/courses/cs244-w12/arm.pdf" target="_blank">https://www.ics.uci.edu/~eli/courses/cs244-w12/arm.pdf</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Jonathan Corbet (9 December 2009). "Spinlock naming resolved". LWN.net. Archived from the original on 7 May 2013. Retrieved 14 May 2013. <a href="https://lwn.net/Articles/365863/" target="_blank">https://lwn.net/Articles/365863/</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Silberschatz, Abraham; Galvin, Peter B. (1994). Operating System Concepts (Fourth ed.). Addison-Wesley. p. 198. ISBN 0-201-59292-4. <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Ted Unangst (2013-06-01). "src/lib/librthread/rthread.c - Revision 1.71". Archived from the original on 2021-02-27. Retrieved 2022-01-25. <a href="http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/librthread/rthread.c?rev=1.71&content-type=text/x-cvsweb-markup" target="_blank">http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/librthread/rthread.c?rev=1.71&content-type=text/x-cvsweb-markup</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Ted Unangst (2016-05-06). "tedu comment on Locking in WebKit - Lobsters". <a href="https://lobste.rs/c/6cybxn" target="_blank">https://lobste.rs/c/6cybxn</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
</ol>
