In systems as Multiprocessor system, multi-core and NUMA system, where a dedicated cache for each processor, core or node is used, a consistency problem may occur when a same data is stored in more than one cache. This problem arises when a data is modified in one cache. This problem can be solved in two ways:
Note: Coherency generally applies only to data (as operands) and not to instructions (see Self-Modifying Code).
The schemes can be classified based on:
Three approaches are adopted to maintain the coherency of data.
Protocol used in bus-based systems like a SMP systems
Systems operating under a single OS (Operating System) with two or more homogeneous processors and with a centralized shared Main Memory
Each processor has its own cache that acts as a bridge between processor and Main Memory. The connection is made using a System Bus or a Crossbar ("xbar") or a mix of two previously approach, bus for address and crossbar for Data (Data crossbar).123
The bottleneck of these systems is the traffic and the Memory bandwidth. Bandwidth can be increasing by using large data bus path, data crossbar, memory interleaving (multi-bank parallel access) and out of order data transaction. The traffic can be reduced by using a cache that acts as a "filter" versus the shared memory, that is the cache is an essential element for shared-memory in SMP systems.
In multiprocessor systems with separate caches that share a common memory, a same datum can be stored in more than one cache. A data consistency problem may occur when data is modified in one cache only. The protocols to maintain the coherency for multiple processors are called cache-coherency protocols.
Usually in SMP the coherency is based on the "Bus watching" or "Snoopy" (after the Peanuts' character Snoopy ) approach. In a snooping system, all the caches monitor (or snoop) the bus transactions to intercept the data and determine if they have a copy on its cache.
Various cache-coherency protocols are used to maintain data coherency between caches.4
These protocols are generally classified based only on the cache states (from 3 to 5 and 7 or more) and the transactions between them, but this could create some confusion.
This definition is incomplete because it lacks important and essential information as the actions that these produce. These actions can be invoked by the processor or the bus controller (e.g. intervention, invalidation, broadcasting, etc.). The type of actions are implementation dependent. The states and transaction rules do not capture everything about a protocol. For instance protocol MESI with shared-intervention on unmodified data and MESI without intervention are different (see below). At the same time, some protocols with different states can be practically the same. For instance, the 4-state MESI Illinois and 5-state MERSI (R-MESI) IBM / MESIF-Intel protocols are only different implementations of the same functionality (see below).
The most common protocols are the 4-state MESI and the 5-state MOESI, each letter standing for one of the possible states of the cache. Other protocols use some proper subset of these but with different implementations along with their different but equivalent terminology. The terms MESI, MOESI or any subset of them generally refer to a class of protocols instead of a specific one.
The states MESI and MOESI are often and more commonly called by different names.
Special states:
– Note: The main terminologies are SD-D-R-V-I and MOESI and so they will be used both.
(*) Special state – Asking for a reservation for load and store doubleword (for 64-bit implementations).
The main operations are:
Write Through
Write-Back
Write Allocate
Write-no-Allocate
There are three characteristics of cached data:
(*) – Implementation depending.
Note: Not to confuse the more restrictive "owner" definition in MOESI protocol with this more general definition.
The cache operations are:
States MESI = D-R-V-I
States MEOSI = D-R-SD-V-I = T-MESI IBM56
States MESI = D-R-V-I57
States D-R-V-I (MESI) 585960
(Bull-Honeywell Italia)
States D-SD-R-V-I (MOESI) Patented protocol (F. Zulian)61
States D-V-I (MSI)62
States D-SD-V-I (MOSI)63
States D-VE-S (MES)64
States D-SD-VE-SC (MOES)65
Note – the state SC, despite the term "clean", can be "clean" or "dirty" as the S state of the other protocols. SC and S are equivalents
States MERSI or R-MESI States MESIF Patented protocols – IBM (1997)66 – Intel (2002)67
MESI and MOESI are the most popular protocols
It is common opinion that MOESI is an extension of MESI protocol and therefore it is more sophisticate and more performant. This is true only if compared with standard MESI, that is MESI with "not sharing intervention". MESI with "sharing intervention", as MESI Illinois like or the equivalent 5-state protocols MERSI / MESIF, are much more performant than the MOESI protocol.
In MOESI, cache-to-cache operations is made only on modified data. Instead in MESI Illinois type and MERSI / MESIF protocols, the cache-to-cache operations are always performed both with clean that with modified data. In case of modified data, the intervention is made by the "owner" M, but the ownership is not loosed because it is migrated in another cache (R/F cache in MERSI / MESIF or a selected cache as Illinois type). The only difference is that the MM must be updated. But also in MOESI this transaction should be done later in case of replacement, if no other modification occurs meanwhile. However this it is a smaller limit compared to the memory transactions due to the not-intervention, as in case of clean data for MOESI protocol. (see e.g. "Performance evaluation between MOESI (Shanghai) and MESIF Nehalem-EP"68)
The most advance systems use only R-MESI / MESIF protocol or the more complete RT-MESI, HRT-ST-MESI and POWER4 IBM protocols that are an enhanced merging of MESI and MOESI protocols
Note: Cache-to-cache is an efficient approach in multiprocessor/multicore systems direct connected between them, but less in Remote cache as in NUMA systems where a standard MESI is preferable. Example in POWER4 IBM protocol "shared intervention" is made only "local" and not between remote module.
States RT-MESI IBM patented protocol6970
Processor operations
It is an improvement of RT-MESI protocol71 and it is a subset of HRT-ST-MESI protocol72
An additional improvement can be obtained using more than a ST state, ST1, ST2, … STn.
IBM patented full HRT-ST-MESI protocol7374
- I state = Invalid Tag (*) – Invalid Data - H state = Valid Tag – Invalid Data
- I state is set at the cache initialization and its state changes only after a processor Read or Write miss. After it will not return more in this state.
- H has the same functionality of I state but in addition with the ability to capture any bus transaction that match the Tag of the directory and to update the data cache.
- After the first utilization I is replaced by H in its functions
(*) – Note: The Tag for definition is always valid, but until the first updating of the cache line it is considered invalid in order to avoid to update the cache also when this line has been not still required and used.
States M-T-Me-S-I -Mu-SL = RT-MESI+Mu75
Under some conditions the most efficient and complete protocol turns out to be the HRT-ST-MESI protocol.
US patent 5701413, "Multi-processor system with shared memory" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US5701413 ↩
EP patent 0923032A1, "Method for transferring data in a multiprocessor computer system with crossbar interconnecting unit" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=EP0923032A1 ↩
"Specification and Verification of the PowerScale Bus Arbitration Protocol: An Industrial Experiment with LOTOS, Chap. 2, Pag. 4" (PDF). ftp://ftp.inrialpes.fr/pub/vasy/publications/cadp/Chehaibar-Garavel-et-al-96.pdf ↩
, Archibald, James; Baer, Jean-Loup (1986). "Cache coherence protocols: evaluation using a multiprocessor simulation model" (PDF). ACM Transactions on Computer Systems. 4 (4). Association for Computing Machinery (ACM): 273–298. doi:10.1145/6513.6514. ISSN 0734-2071. S2CID 713808. http://ctho.org/toread/forclass/18-742/3/p273-archibald.pdf ↩
""MPC7400 RISC Microprocessor User's Manual"" (PDF). http://pccomponents.com/datasheets/MOT-MPC7400.PDF ↩
US Patent 5996049, "Cache-coherency protocol with recently read state for data and instructions" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US5996049 ↩
""An Introduction to the Intel® QuickPath Interconnect"" (PDF). http://www.intel.ie/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf ↩
US Patent 6922756, "Forward state for use in cache coherency in a multiprocessor system" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US6922756 ↩
"POWER4 System Microarchitecture" (PDF). cc.gatech.edu. 2008-10-08. Archived from the original (PDF) on 2013-11-07. https://web.archive.org/web/20131107140531/http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/TDF02.pdf ↩
"IBM PowerPC 476FP L2 Cache Core Databook" https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/8D5342097498C81A852575C50078D867/$file/L2CacheController_v1.5_ext_Pub.pdf ↩
US Patent 6275908, "Cache Coherency Protocol Including an HR State" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US6275908 ↩
US Patent 6334172, "Cache Coherency Protocol with Tagged State for Modified Values" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US6334172 ↩
""MPC750UM/D 12/2001 Rev. 1 MPC750 RISC Microprocessor Family User's Manual"" (PDF). http://www.freescale.com/files/32bit/doc/ref_manual/MPC750UM.pdf ↩
Shanley, T. (1998). Pentium Pro and Pentium II System Architecture. Mindshare PC System Architecture Series. Addison-Wesley. p. 160. ISBN 978-0-201-30973-7. 978-0-201-30973-7 ↩
AMD64 Architecture Programmer's Manual. Vol. 2: System Programming. AMD. May 2013 – via Internet Archive. https://archive.org/details/24593APMV21 ↩
Sweazey, P.; Smith, A. J. (1986). "A class of compatible cache consistency protocols and their support by the IEEE futurebus" (PDF). ACM SIGARCH Computer Architecture News. 14 (2). Association for Computing Machinery (ACM): 414–423. doi:10.1145/17356.17404. ISSN 0163-5964. S2CID 9713683. http://pdf.aminer.org/000/419/524/a_class_of_compatible_cache_consistency_protocols_and_their_support.pdf ↩
Papamarcos, Mark S.; Patel, Janak H. (1984). "A low-overhead coherence solution for multiprocessors with private cache memories". Proceedings of the 11th annual international symposium on Computer architecture - ISCA '84. New York, New York, USA: ACM Press. pp. 348–354. doi:10.1145/800015.808204. ISBN 0818605383. 0818605383 ↩
Goodman, James R. (1983). "Using cache memory to reduce processor-memory traffic" (PDF). Proceedings of the 10th annual international symposium on Computer architecture - ISCA '83. New York, New York, USA: ACM Press. pp. 124–131. doi:10.1145/800046.801647. ISBN 0897911016. 0897911016 ↩
Hwang, K. (2011). Advanced Computer Architecture, 2E. McGraw-Hill computer science series. McGraw Hill Education. p. 301. ISBN 978-0-07-070210-3. 978-0-07-070210-3 ↩
EP patent 0396940B1, Ferruccio Zulian, "Cache memory and related consistency protocol" https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=EP0396940B1 ↩
Hackenberg, Daniel; Molka, Daniel; Nagel, Wolfgang E. (2009-12-12). "Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems" (PDF). Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. New York, NY, USA: ACM. pp. 413–422. doi:10.1145/1669112.1669165. ISBN 9781605587981. 9781605587981 ↩
Rolf, Trent (2009), Cache Organization and Memory Management of the Intel Nehalem Computer Architecture (PDF) http://gec.di.uminho.pt/Discip/MInf/cpd1011/PAC/material/nehalemPaper.pdf ↩
David Kanter (2007-08-28), "The Common System Interface: Intel's Future Interconnect", Real World Tech: 5, retrieved 2012-08-12 http://www.realworldtech.com/common-system-interface/5/ ↩
"Optimizing the MESI Cache Coherence Protocol for Multithreaded Applications on Small Symmetric Multiprocessor Systems". Neal Tibrewala's Resume. 2003-12-12. Archived from the original on 2016-10-22. http://tibrewala.net/papers/mesi98/ ↩