The idea of directory-based cache coherence dates back several decades. C.K. Tang [3] proposed a directory-based approach to cache coherence in the mid-1970s. Researchers at Stanford University later built on the idea in DASH (Directory Architecture for SHared memory), described in a 1990 paper [4] that covered the system together with the difficulties and improvements associated with such designs. Besides this approach, several attempts were made to build scalable systems. For instance, the BBN Butterfly [5], introduced in 1985, and the IBM RP3 [6], introduced in 1987, are examples of scalable multiprocessor systems. However, both designs have drawbacks: the BBN Butterfly has no caches, and the IBM RP3 does not provide hardware cache coherence. These omissions limit the performance of both designs, especially when they employ high-performance processors. [7]
The limitations of these competing designs made directory-based systems such as DASH an attractive choice for cache coherence and for other systems that need to scale across cache-based nodes. In 1984, James Archibald [8] and Jean-Loup Baer of the University of Washington published a paper [9] proposing a variation of the "global directory" approach that is more economical in its use of hardware, as well as more expandable and modular.
In 1992, Daniel Lenoski of Stanford University and his co-authors published a paper [10] describing advances in cache coherence protocols for directory-based systems. In a 1997 paper [11] with James Laudon, he described the design of the SGI Origin 2000, a family of server computers employing directory-based cache coherence. The subsequent Origin 3000 [12] was introduced in July 2000.
Unlike snoopy coherence protocols, a directory-based approach keeps the information about which caches have a copy of a block in a structure called the directory. Participating caches do not broadcast requests to all other caches to locate cached copies of a block. Instead, a cache queries the directory to find out which caches hold copies of the block and sends messages only to those particular processors. In well-optimized applications, most shared data is read-only, and little data is both frequently read and frequently written. In such applications, a directory approach can save substantial traffic compared to a broadcast (snoopy) approach.
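A rough way to see the traffic saving is to count the invalidation messages generated by a single write. The sketch below is a simplified model, assuming a 64-processor system and a block that two caches hold in read-shared form; the function names are hypothetical, and only invalidations sent to other caches are counted (not replies, acknowledgements, or the request to the directory itself).

```python
def broadcast_invalidations(num_processors: int, writer: int) -> int:
    """A broadcast (snoopy-style) invalidation reaches every cache but the writer's."""
    return num_processors - 1


def directory_invalidations(sharers: set[int], writer: int) -> int:
    """A directory sends invalidations only to the other caches that hold the block."""
    return len(sharers - {writer})


if __name__ == "__main__":
    n = 64
    sharers = {3, 17}  # hypothetical: two caches hold read-shared copies
    writer = 3
    print("broadcast:", broadcast_invalidations(n, writer))   # prints "broadcast: 63"
    print("directory:", directory_invalidations(sharers, writer))  # prints "directory: 1"
```

The gap grows with the processor count while the directory cost tracks only the number of actual sharers, which stays small when most shared data is read-only.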
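As shown in the data flow diagram, the actors involved in a distributed shared memory system implementing a directory-based coherence protocol are the requestor node, which issues a read or write request for a memory block; the directory node, which maintains the state of each cache block and receives the requestor's requests; and the owner node, which holds the most recent copy of the cache block.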
The requestor and owner nodes maintain their state transitions in the same way as a snoopy coherence protocol such as the MESI protocol. However, unlike a bus-based implementation, where nodes communicate over a common bus, a directory-based implementation uses a message-passing model to exchange the information required to maintain cache coherence.
The directory node acts as a serialization point: all communication is directed through this node so that requests are ordered and correctness is maintained.
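The serializing role of the directory can be pictured as a single queue: each request message is handled to completion before the next one starts, so every processor observes the same order of requests to a block. This is a minimal sketch; the `Message` and `DirectoryNode` classes and the request names are hypothetical, and a real directory would also send replies and acknowledgements and track transient states.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    kind: str    # request type, e.g. "Read" or "ReadX" (illustrative names)
    block: int   # block address
    sender: int  # requesting processor


class DirectoryNode:
    """Hypothetical directory that handles queued requests one at a time."""

    def __init__(self) -> None:
        self.inbox: deque[Message] = deque()
        self.order: list[Message] = []  # the single order the directory imposes

    def receive(self, msg: Message) -> None:
        # Messages may arrive from many processors; they wait in one queue.
        self.inbox.append(msg)

    def process_all(self) -> None:
        while self.inbox:
            msg = self.inbox.popleft()
            # A real directory would update the block's state and send replies
            # or invalidations here, finishing before the next request starts.
            self.order.append(msg)
```

If processors 1 and 2 both send a `ReadX` for block `0x40`, the directory handles them in arrival order, so one write is ordered strictly before the other instead of the two racing.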
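A directory node keeps track of the overall state of a cache block across the caches of all processors. A block can be in one of three states: Uncached (U), where no processor has the block cached and memory is up to date; Shared (S), where one or more processors hold clean copies and memory is up to date; and Exclusive/Modified (EM), where exactly one processor (the owner) has the block cached and memory may be out of date. The directory cannot distinguish between the exclusive and modified states, because a processor can move from exclusive to modified without notifying the directory.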
The directory's state transition finite-state machine is shown in image 1, and its transitions are explained in the accompanying table.
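The directory transitions can be sketched as a small state machine. This is a simplified sketch, assuming the three directory states are Uncached (U), Shared (S), and Exclusive/Modified (EM), as in a common textbook formulation, and that a read to an uncached block is granted exclusively, as in a MESI-style protocol. The request names `Read`, `ReadX` (read-exclusive, analogous to a bus `BusRdX`), `Upgr`, and `WB` (write-back) and the class names are assumptions for illustration; transient states and races are ignored.

```python
from enum import Enum


class DirState(Enum):
    U = "Uncached"             # no cache holds the block; memory is up to date
    S = "Shared"               # one or more caches hold clean copies
    EM = "Exclusive/Modified"  # exactly one owner; memory may be stale


class DirectoryEntry:
    """Hypothetical directory entry for one block.

    handle() applies one incoming request from processor p, updates the
    state and sharer set, and returns the actions the directory would take.
    """

    def __init__(self) -> None:
        self.state = DirState.U
        self.sharers: set[int] = set()

    def handle(self, req: str, p: int) -> list[str]:
        actions: list[str] = []
        if req == "Read":
            if self.state is DirState.U:
                # No other copies exist, so the requestor gets the block exclusively.
                actions.append(f"reply data (exclusive) to P{p}")
                self.state, self.sharers = DirState.EM, {p}
            elif self.state is DirState.S:
                actions.append(f"reply data to P{p}")
                self.sharers.add(p)
            else:
                # EM: the owner may hold a dirty copy, so it must supply the data.
                (owner,) = self.sharers
                actions.append(f"intervention to P{owner}; owner flushes, memory updated")
                actions.append(f"reply data to P{p}")
                self.state = DirState.S
                self.sharers.add(p)
        elif req in ("ReadX", "Upgr"):
            # Write request: invalidate every other copy (an EM owner would also
            # flush its data), then make p the sole owner.
            # Upgr is assumed to come from a current sharer that already has the data.
            for q in sorted(self.sharers - {p}):
                actions.append(f"invalidate P{q}")
            if req == "ReadX":
                actions.append(f"reply data to P{p}")
            self.state, self.sharers = DirState.EM, {p}
        elif req == "WB":
            # The owner writes the block back and evicts its copy.
            actions.append("update memory")
            self.state, self.sharers = DirState.U, set()
        else:
            raise ValueError(f"unknown request {req!r}")
        return actions
```

For example, a `Read` by P1 moves a fresh entry from U to EM; a later `Read` by P2 triggers an intervention to P1 and moves the entry to S with sharers {1, 2}; an `Upgr` by P2 then invalidates P1 and returns the entry to EM with P2 as owner.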
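In addition to the block state, a directory must track which processors hold the block when it is in the shared state. This is required for sending invalidation and intervention requests to the individual processor caches that hold the block. Popular implementation approaches include a full bit vector, which keeps one presence bit per processor so that storage overhead grows with the number of processors, and limited pointers, which record the identities of only a small number of sharers to reduce storage overhead.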
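The two sharer-tracking schemes trade precision for storage. The sketch below is hypothetical: the class names are illustrative, and the limited-pointer overflow policy (fall back to treating every processor as a possible sharer, which forces a broadcast) is an assumption, one of several policies a design might choose.

```python
class FullBitVector:
    """One presence bit per processor: exact, but storage grows with N."""

    def __init__(self, num_processors: int) -> None:
        self.num_processors = num_processors
        self.bits = 0  # bit i set means processor i holds a copy

    def add(self, p: int) -> None:
        self.bits |= 1 << p

    def sharers(self) -> list[int]:
        return [p for p in range(self.num_processors) if (self.bits >> p) & 1]

    def storage_bits(self) -> int:
        return self.num_processors


class LimitedPointers:
    """Up to k processor IDs; on overflow, assume broadcast (policy assumed)."""

    def __init__(self, num_processors: int, k: int) -> None:
        self.num_processors = num_processors
        self.k = k
        self.ptrs: list[int] = []
        self.overflow = False

    def add(self, p: int) -> None:
        if self.overflow or p in self.ptrs:
            return
        if len(self.ptrs) < self.k:
            self.ptrs.append(p)
        else:
            self.overflow = True  # too many sharers to track precisely

    def sharers(self) -> list[int]:
        # After overflow, every processor must be treated as a possible sharer.
        if self.overflow:
            return list(range(self.num_processors))
        return sorted(self.ptrs)

    def storage_bits(self) -> int:
        # k pointers of ceil(log2 N) bits each, plus one overflow bit.
        return self.k * max(1, (self.num_processors - 1).bit_length()) + 1
```

With 64 processors, a full bit vector costs 64 bits per block, while two 6-bit pointers plus an overflow bit cost 13 bits; the price is that a third sharer forces invalidations to every processor.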
The protocol described above is a basic implementation. Race conditions can occur because the directory can be out of sync with the caches and because messages between processors can overlap. More complex implementations, such as the Scalable Coherent Interface (SCI), use additional states.
The DASH [13] cache coherence protocol is another protocol that uses a directory-based coherence scheme. DASH uses a clustered approach: processors inside a cluster are kept coherent using a bus-based snooping scheme, while coherence between clusters is maintained by a directory. Although different protocols track cache blocks in different ways, the concept of the directory remains the same.
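The benefit of the clustered approach can be sketched by counting messages for one write. This is a minimal model, assuming processors are numbered contiguously by cluster and that the directory records sharing clusters rather than individual processors; the function names are hypothetical, and the sketch does not reproduce DASH's actual message formats.

```python
def cluster_of(p: int, procs_per_cluster: int) -> int:
    """Processors are assumed to be numbered contiguously within each cluster."""
    return p // procs_per_cluster


def clustered_write(sharers: set[int], writer: int, procs_per_cluster: int) -> tuple[int, int]:
    """Return (network invalidations, clusters whose bus is snooped) for one write.

    One network message per remote sharing cluster suffices, because the
    snooping bus inside each cluster invalidates that cluster's local copies.
    """
    writer_cluster = cluster_of(writer, procs_per_cluster)
    sharing_clusters = {cluster_of(q, procs_per_cluster) for q in sharers if q != writer}
    remote_clusters = sharing_clusters - {writer_cluster}
    snooped_clusters = sharing_clusters | {writer_cluster}
    return len(remote_clusters), len(snooped_clusters)
```

With four processors per cluster, sharers {1, 2, 5, 6}, and P1 writing, `clustered_write` returns `(1, 2)`: a single network invalidation reaches the cluster holding P5 and P6, and P2 is invalidated by its own cluster's bus. A flat per-processor directory would instead send three invalidations, to P2, P5, and P6.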
Solihin, Yan (2009). Fundamentals of Parallel Computer Architecture. pp. 319–360.
Tang, C.K. "Cache system design in the tightly coupled multiprocessor system". AFIPS '76 Proceedings of the June 7–10, 1976, National Computer Conference and Exposition.
"The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor" (PDF). Computer Systems Laboratory. http://people.eecs.berkeley.edu/~kubitron/cs258/handouts/papers/p148-lenoski.pdf
Schmidt, G.E. "The Butterfly Parallel Processor". In Proc. of ICS.
"The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture". In Proceedings of the 1985 International Conference on Parallel Processing.
"Design of Scalable Shared-Memory Multiprocessors: The DASH Approach". Computer Systems Laboratory, Stanford University.
"James Archibald". ece.byu.edu. Archived from the original on 2017-08-02. Retrieved 2016-11-15. https://web.archive.org/web/20170802134914/http://ece.byu.edu/faculty/jka
"An economical solution to the cache coherence problem". ISCA '84 Proceedings of the 11th Annual International Symposium on Computer Architecture.
Lenoski, Daniel; Laudon, James; Gharachorloo, Kourosh; Weber, Wolf-Dietrich; Gupta, Anoop; Hennessy, John; Horowitz, Mark; Lam, Monica S. (1992-03-01). "The Stanford Dash Multiprocessor". Computer. 25 (3): 63–79. doi:10.1109/2.121510. ISSN 0018-9162. S2CID 9731523.
Laudon, James; Lenoski, Daniel (1997-01-01). "The SGI Origin". Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA '97). New York, NY, USA: ACM. pp. 241–251. doi:10.1145/264107.264206. ISBN 978-0897919012. S2CID 692050.
Silicon Graphics International Corp. "Support Home Page". support1-sgi.custhelp.com. Archived from the original on 2018-04-13. Retrieved 2016-11-16. https://web.archive.org/web/20180413084346/https://support1-sgi.custhelp.com/