SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The AMD64 extensions from AMD added a further eight registers, XMM8 through XMM15, and this extension is duplicated in the Intel 64 architecture. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode. There is also a new 32-bit control/status register, MXCSR.
SSE used only a single data type for XMM registers: four 32-bit single-precision floating-point numbers.
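SSE2 would later expand the usage of the XMM registers to include two 64-bit double-precision floating-point numbers, two 64-bit integers, four 32-bit integers, eight 16-bit short integers, or sixteen 8-bit bytes or characters.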
Because these 128-bit registers are additional machine state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, the extended pair of instructions that can save and restore all x87 and SSE register state at once. This support was quickly added to all major IA-32 operating systems.
The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the floating-point unit (FPU).[4] While a compiled application can interleave FPU and SSE instructions side by side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating-point operations to be mixed without the performance hit from explicit MMX/floating-point mode switching.
SSE introduced both scalar and packed floating-point instructions.
Vector addition, which is used very often in computer graphics applications, demonstrates the advantage of using SSE. Adding two single-precision, four-component vectors using scalar x86 code requires four floating-point addition instructions.
Each of the four component additions corresponds to one x86 FADD instruction in the object code. On the other hand, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions.
Utilities such as the Intel Processor Identification Utility can be used to determine which, if any, versions of SSE are supported on a system.
"Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture". Intel. April 2022. pp. 5-16–5-19. Archived from the original on April 25, 2022. Retrieved May 16, 2022. https://cdrdv2.intel.com/v1/dl/getContent/671200 ↩
Diefendorff, Keith (March 8, 1999). "Pentium III = Pentium II + SSE: Internet SSE Architecture Boosts Multimedia Performance" (PDF). Microprocessor Report. 13 (3). Archived (PDF) from the original on April 17, 2018. Retrieved September 1, 2017. http://docencia.ac.upc.edu/ETSETB/SEGPAR/microprocessors/pentium3%20(mpr).pdf ↩
"AMD Extensions to the 3DNow and MMX Instruction Sets Manual" (PDF). Advanced Micro Devices, Inc. March 2000. Archived from the original (PDF) on May 17, 2008. Retrieved April 18, 2024. https://web.archive.org/web/20080517014932/http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22466.pdf ↩
Vance, Ashlee (August 3, 2007). "AMD plots single thread boost with x86 extensions". The Register. Archived from the original on April 27, 2011. Retrieved August 24, 2017. ↩
"AMD64 Technology: 128-Bit SSE5 Instruction Set" (PDF). AMD. August 2007. Archived (PDF) from the original on August 25, 2017. Retrieved August 24, 2017. http://developer.amd.com/wordpress/media/2012/10/AMD64_128_Bit_SSE5_Instrs.pdf ↩
"AMD64 Technology AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions" (PDF). AMD. November 2009. Archived (PDF) from the original on January 31, 2017. Retrieved August 24, 2017. https://support.amd.com/TechDocs/43479.pdf ↩
Girkar, Milind (October 1, 2013). "Intel® Advanced Vector Extensions (Intel® AVX)". Intel. Archived from the original on August 25, 2017. Retrieved August 24, 2017. https://software.intel.com/en-us/isa-extensions/intel-avx ↩
"Download the Intel® Processor Identification Utility". Intel. July 24, 2017. Archived from the original on August 25, 2017. Retrieved August 24, 2017. https://www.intel.com/content/www/us/en/support/processors/000005651.html ↩