Although sometimes called a "bus", QPI is a point-to-point interconnect. It was designed to compete with HyperTransport, which Advanced Micro Devices (AMD) had used since around 2003.[6][7] QPI was developed at Intel's Massachusetts Microprocessor Design Center (MMDC) by members of what had been the Alpha Development Group, which Intel had acquired from Compaq and HP and which had originated at Digital Equipment Corporation (DEC).[8] Its development had been reported as early as 2004.[9]
Intel first delivered it for desktop processors in November 2008 on the Intel Core i7-9xx and X58 chipset. It was released in Xeon processors code-named Nehalem in March 2009 and in Itanium processors code-named Tukwila in February 2010.[10]
It was supplanted by the Intel Ultra Path Interconnect starting in 2017 on the Xeon Skylake-SP platforms.[11]
QPI is an element of a system architecture that Intel calls the QuickPath architecture, which implements what Intel calls QuickPath technology.[12] In its simplest form on a single-processor motherboard, a single QPI connects the processor to the IO hub (e.g., an Intel Core i7 to an X58). In more complex instances of the architecture, separate QPI link pairs connect one or more processors and one or more IO hubs or routing hubs in a network on the motherboard, allowing every component to reach the others via the network. As with HyperTransport, the QuickPath architecture assumes that the processors have integrated memory controllers, and it enables a non-uniform memory access (NUMA) architecture.
Each QPI comprises two 20-lane point-to-point data links, one in each direction (full duplex), with a separate clock pair in each direction, for a total of 42 signals. Each signal is a differential pair, so the total number of pins is 84. The 20 data lanes are divided into four "quadrants" of 5 lanes each. The basic unit of transfer is the 80-bit flit, which has 8 bits for error detection, 8 bits for the "link-layer header", and 64 bits for data. One 80-bit flit is transferred in two clock cycles (four 20-bit transfers, two per clock cycle). QPI bandwidths are advertised by counting the transfer of 64 bits (8 bytes) of data every two clock cycles in each direction.[13]
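The signal, pin and flit figures in this paragraph follow from a few multiplications. The Python sketch below reproduces them; the constant and function names are illustrative only and do not come from any Intel specification.

```python
# Figures quoted in the paragraph above; all names here are hypothetical.
DATA_LANES = 20           # data lanes per direction
CLOCK_LANES = 1           # forwarded clock per direction
DIRECTIONS = 2            # one link in each direction (full duplex)
WIRES_PER_SIGNAL = 2      # every signal is a differential pair
TRANSFERS_PER_CLOCK = 2   # bits move on both clock edges

# Flit fields as described in the text (sizes in bits).
FLIT_FIELDS = {"error_detection": 8, "link_layer_header": 8, "data": 64}


def link_geometry():
    """Derive signal, pin and flit-timing figures for one QPI."""
    flit_bits = sum(FLIT_FIELDS.values())
    signals = (DATA_LANES + CLOCK_LANES) * DIRECTIONS
    transfers_per_flit = flit_bits // DATA_LANES
    return {
        "flit_bits": flit_bits,
        "signals": signals,
        "pins": signals * WIRES_PER_SIGNAL,
        "lanes_per_quadrant": DATA_LANES // 4,
        "transfers_per_flit": transfers_per_flit,
        "clocks_per_flit": transfers_per_flit // TRANSFERS_PER_CLOCK,
    }


if __name__ == "__main__":
    for name, value in link_geometry().items():
        print(f"{name}: {value}")
```

Keeping the flit fields in a dictionary makes it easy to check that the three field widths add up to the 80-bit flit size before deriving anything else from it.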
Although the initial implementations use single four-quadrant links, the QPI specification permits other implementations. Each quadrant can be used independently. On high-reliability servers, a QPI link can operate in a degraded mode. If one or more of the 20+1 signals fail, the interface will operate using the remaining 10+1 or even 5+1 signals, even reassigning the clock to a data signal if the clock fails.[14] The initial Nehalem implementation used a full four-quadrant interface to achieve 25.6 GB/s (6.4 GT/s × 2 bytes of payload per transfer × 2 directions), which is exactly double the theoretical bandwidth of Intel's 1600 MHz FSB used in the X48 chipset.
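As a rough illustration of what degraded mode costs, the sketch below estimates the one-direction payload rate at each link width. It assumes that the 80-bit flit format and the transfer rate stay unchanged when lanes are dropped; the text above does not state this, so the output is an estimate under that assumption rather than a specified figure. The function names are hypothetical.

```python
FLIT_BITS = 80      # flit size from the text
PAYLOAD_BITS = 64   # data bits per flit from the text


def transfers_per_flit(active_lanes):
    """Number of transfers needed to move one flit over `active_lanes` lanes."""
    if active_lanes <= 0 or FLIT_BITS % active_lanes != 0:
        raise ValueError("lane count must evenly divide the flit size")
    return FLIT_BITS // active_lanes


def payload_rate_gb_s(transfer_rate_gt_s, active_lanes):
    """Estimated payload rate in GB/s for ONE direction.

    Assumption (not from the source): flit format and transfer rate are
    the same in degraded mode as in full-width mode.
    """
    gflits_per_second = transfer_rate_gt_s / transfers_per_flit(active_lanes)
    return gflits_per_second * PAYLOAD_BITS / 8


if __name__ == "__main__":
    for lanes in (20, 10, 5):
        rate = payload_rate_gb_s(6.4, lanes)
        print(f"{lanes:2d} lanes: {transfers_per_flit(lanes):2d} transfers/flit, "
              f"{rate:.1f} GB/s per direction")
```

Under these assumptions, halving the number of active lanes doubles the number of transfers per flit and so halves the payload rate.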
Although some high-end Core i7 processors expose QPI, other "mainstream" Nehalem desktop and mobile processors intended for single-socket boards (e.g. LGA 1156 Core i3, Core i5, and other Core i7 processors from the Lynnfield/Clarksfield and successor families) do not expose QPI externally, because these processors are not intended to participate in multi-socket systems.
However, QPI is used internally on these chips to communicate with the "uncore", the part of the chip containing the memory controllers, CPU-side PCI Express and the GPU, if present. The uncore may or may not be on the same die as the CPU core; for instance, it is on a separate die in the Westmere-based Clarkdale/Arrandale.[15][16][17][18]: 3
In post-2009 single-socket chips starting with Lynnfield, Clarksfield, Clarkdale and Arrandale, the traditional northbridge functions are integrated into these processors, which therefore communicate externally via the slower DMI and PCI Express interfaces.
Thus, there is no need to incur the expense of exposing the (former) front-side bus interface via the processor socket.[19]
Although the core–uncore QPI link is not present in desktop and mobile Sandy Bridge processors (as it was on Clarkdale, for example), the internal ring interconnect between on-die cores is also based on the principles behind QPI, at least as far as cache coherency is concerned.[20]: 10
Being a synchronous circuit, QPI operates at a clock rate of 2.4 GHz, 2.93 GHz, 3.2 GHz, 3.6 GHz, 4.0 GHz or 4.8 GHz (the 3.6 GHz and 4.0 GHz frequencies were introduced with the Sandy Bridge-E/EP platform and 4.8 GHz with the Haswell-E/EP platform). The clock rate for a particular link depends on the capabilities of the components at each end of the link and the signal characteristics of the signal path on the printed circuit board. The non-extreme Core i7 9xx processors are restricted to a 2.4 GHz frequency at stock reference clocks.
Bit transfers occur on both the rising and the falling edges of the clock, so the transfer rate is double the clock rate.
Intel describes the data throughput (in GB/s) by counting only the 64-bit data payload in each 80-bit flit. Intel then doubles the result, because the unidirectional send and receive links can be active simultaneously. Thus, Intel describes a 20-lane QPI link pair (send and receive) with a 3.2 GHz clock as having a data rate of 25.6 GB/s. A clock rate of 2.4 GHz yields a data rate of 19.2 GB/s. More generally, by this definition a two-link 20-lane QPI transfers eight bytes per clock cycle, four in each direction.
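The rate is computed as follows:
3.2 GHz (clock rate)
× 2 transfers per clock cycle (double data rate)
× 16 data bits per 20-bit transfer (64 payload bits in each 80-bit flit)
× 2 directions (send and receive operating simultaneously)
÷ 8 bits per byte
= 25.6 GB/s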
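The advertised-rate arithmetic described above can be sketched in Python and applied to each clock rate listed earlier. The function name and parameters are illustrative assumptions. The 3.2 GHz and 2.4 GHz results can be checked against the 25.6 GB/s and 19.2 GB/s figures quoted in the text.

```python
TRANSFERS_PER_CLOCK = 2        # data moves on rising and falling clock edges
DATA_BITS_PER_TRANSFER = 16    # 64 payload bits of each 80-bit flit, over 4 transfers
BITS_PER_BYTE = 8


def advertised_rate_gb_s(clock_ghz, directions=2):
    """Advertised QPI data rate in GB/s, counting payload bits only.

    `directions=2` follows Intel's convention of adding the send and
    receive links together; pass 1 for a single direction.
    """
    gbits_per_second = (clock_ghz * TRANSFERS_PER_CLOCK
                        * DATA_BITS_PER_TRANSFER * directions)
    return gbits_per_second / BITS_PER_BYTE


if __name__ == "__main__":
    for clock in (2.4, 2.93, 3.2, 3.6, 4.0, 4.8):
        print(f"{clock} GHz -> {advertised_rate_gb_s(clock):.2f} GB/s")
```

With both directions counted, the result is always eight times the clock rate in GHz, which matches the statement that the link pair moves eight bytes per clock cycle.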
QPI is specified as a five-layer architecture, with separate physical, link, routing, transport, and protocol layers.[21] In devices intended only for point-to-point QPI use with no forwarding, such as the Core i7-9xx and Xeon DP processors, the transport layer is not present and the routing layer is minimal.
"An Introduction to the Intel QuickPath Interconnect" (PDF). Intel Corporation. January 30, 2009. Retrieved June 14, 2011. http://www.intel.com/technology/quickpath/introduction.pdf
DailyTech report. Archived 2013-10-17 at the Wayback Machine. Retrieved August 21, 2007. http://www.dailytech.com/SMT+Multilevel+Cache+Confirmed+for+Nehalem/article8082.htm
Eva Glass (May 16, 2007). "Intel CSI name revealed: Slow, slow, quick quick slow". The Inquirer. Archived from the original on June 10, 2012. Retrieved September 13, 2013. https://web.archive.org/web/20120610002636/http://www.theinquirer.net/inquirer/news/1016558/intel-csi-revealed
David Kanter (2011-07-20). "Intel's Quick Path Evolved". Realworldtech.com. Retrieved 2014-01-21. http://www.realworldtech.com/qpi-evolved/
SoftPedia: Intel Plans to Replace Xeon with Its New Skylake-Based “Purley” Super Platform. http://news.softpedia.com/news/Intel-Plans-to-Replace-Xeon-With-Its-New-Skylake-based-Purley-Super-Platform-484828.shtml
Gabriel Torres (August 25, 2008). "Everything You Need to Know About The QuickPath Interconnect (QPI)". Hardware Secrets. Retrieved January 23, 2017. http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-quickpath-interconnect-qpi/
Charlie Demerjian (December 13, 2005). "Intel gets knickers in a twist over Tanglewood". The Inquirer. Archived from the original on September 3, 2010. Retrieved September 13, 2013. https://web.archive.org/web/20100903014920/http://www.theinquirer.net/inquirer/news/1010791/intel-knickers-twist-tanglewood
David Kanter (August 28, 2007). "The Common System Interface: Intel's Future Interconnect". Real World Tech. Retrieved August 14, 2014. http://www.realworldtech.com/common-system-interface/
Eva Glass (December 12, 2004). "Intel's Whitefield takes four core IA-32 shape". The Inquirer. Archived from the original on May 24, 2009. Retrieved September 13, 2013. https://web.archive.org/web/20090524173105/http://www.theinquirer.net/inquirer/news/1028779/intels--whitefield-takes-four-core-ia-32--shape
David Kanter (May 5, 2006). "Intel's Tukwila Confirmed to be Quad Core". Real World Tech. Archived from the original on May 10, 2012. Retrieved September 13, 2013. https://web.archive.org/web/20120510074305/http://realworldtech.com/page.cfm?NewsID=361
"Intel® Xeon® Processor Scalable Family Technical Overview". https://www.intel.com/content/www/us/en/developer/articles/technical/xeon-processor-scalable-family-technical-overview.html
"Intel Demonstrates Industry's First 32nm Chip and Next-Generation Nehalem Microprocessor Architecture". Archived from the original on 2008-01-02. Retrieved 2007-12-31. https://web.archive.org/web/20080102101316/http://www.intel.com/pressroom/archive/releases/20070918corp_a.htm?iid=tech_arch_32nm+body_pressrelease
Chris Angelini (2009-09-07). "QPI, Integrated Memory, PCI Express, And LGA 1156 - Intel Core i5 And Core i7: Intel's Mainstream Magnum Opus". Tomshardware.com. Retrieved 2014-01-21. http://www.tomshardware.com/reviews/intel-core-i5,2410-3.html
Richard Swinburne (2010-01-25). "Feature - Intel GMA HD Graphics Performance". bit-tech.net. Retrieved 2014-01-21. http://www.bit-tech.net/hardware/graphics/2010/01/25/intel-gma-hd-graphics-performance/1
"Intel Clarkdale 32nm CPU-and-GPU chip benchmarked (again) - CPU - Feature". HEXUS.net. 2009-09-25. Retrieved 2014-01-21. http://hexus.net/tech/features/cpu/20419-intel-clarkdale-32nm-cpu-and-gpu-chip-benchmarked-again/
Oded Lempel (2013-07-28). "2nd Generation Intel Core Processor Family: Intel Core i7, i5 and i3" (PDF). hotchips.org. Archived from the original (PDF) on 2020-07-29. Retrieved 2014-01-21. https://web.archive.org/web/20200729000210/http://www.hotchips.org/wp-content/uploads/hc_archives/hc23/HC23.19.9-Desktop-CPUs/HC23.19.911-Sandy-Bridge-Lempel-Intel-Rev%207.pdf
Lily Looi, Stephan Jourdan. "Transitioning the Intel® Next Generation Microarchitectures (Nehalem and Westmere) into the Mainstream". Hot Chips 21, August 24, 2009. Archived 2020-08-02 at the Wayback Machine. http://www.hotchips.org/wp-content/uploads/hc_archives/hc21/2_mon/HC21.24.400.ClientProcessors-Epub/HC21.24.442.Looi-Intel_NhmClient_Hotchips2009b.pdf