Intel SSE4 consists of 54 instructions. A subset consisting of 47 instructions, referred to as SSE4.1 in some Intel documentation, is available in Penryn. Additionally, SSE4.2, a second subset consisting of the seven remaining instructions, is first available in Nehalem-based Core i7. Intel credits feedback from developers as playing an important role in the development of the instruction set.
Starting with Barcelona-based processors, AMD introduced the SSE4a instruction set, which has four SSE4 instructions and four new SSE instructions. These instructions are not found in Intel processors supporting SSE4.1, and AMD processors only started supporting Intel's SSE4.1 and SSE4.2 (the full SSE4 instruction set) with the Bulldozer-based FX processors. SSE4a also introduced the misaligned SSE feature, which made unaligned load instructions as fast as their aligned counterparts when used on aligned addresses. It also allowed disabling the alignment check on non-load SSE operations accessing memory.[6] Intel later introduced similar speed improvements to unaligned SSE in its Nehalem processors, but did not introduce misaligned access by non-load SSE instructions until AVX.[7]
What is now known as SSSE3 (Supplemental Streaming SIMD Extensions 3), introduced in the Intel Core 2 processor line, was referred to as SSE4 by some media until Intel came up with the SSSE3 moniker. The instructions were internally dubbed Merom New Instructions, and Intel originally did not plan to assign a special name to them, which was criticized by some journalists.[8] Intel eventually cleared up the confusion and reserved the SSE4 name for its next instruction set extension.[9]
Intel uses the marketing term HD Boost to refer to SSE4.[10]
Unlike all previous iterations of SSE, SSE4 contains instructions that execute operations which are not specific to multimedia applications. It features a number of instructions whose action is determined by a constant field and a set of instructions that take XMM0 as an implicit third operand.
Several of these instructions are enabled by the single-cycle shuffle engine in Penryn. (Shuffle operations reorder bytes within a register.)
The SSE4.1 instructions were introduced with the Penryn microarchitecture, the 45 nm shrink of Intel's Core microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE41[Bit 19] flag.
For the PTEST instruction, the result is equivalent to setting the Z flag if none of the bits masked by SRC are set in the destination operand, and the C flag if all of the bits masked by SRC are set.
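The flag behaviour described above is that of the SSE4.1 PTEST instruction. As a rough illustration, it can be modelled in portable C. The `vec128` and `ptest_flags` types and the `ptest_emulate` function are hypothetical names introduced for this sketch; real code would normally use the compiler's intrinsics rather than an emulation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit XMM value, split into two halves. */
typedef struct { uint64_t lo, hi; } vec128;

/* Hypothetical container for the two flags of interest. */
typedef struct { bool zf, cf; } ptest_flags;

/* Models the flag results of "PTEST dest, src" (a sketch, not the
 * instruction itself):
 *   ZF is set when (dest AND src) is all zeros, i.e. none of the bits
 *      selected by src are set in dest;
 *   CF is set when ((NOT dest) AND src) is all zeros, i.e. all of the
 *      bits selected by src are set in dest. */
static ptest_flags ptest_emulate(vec128 dest, vec128 src) {
    ptest_flags f;
    f.zf = ((dest.lo & src.lo) | (dest.hi & src.hi)) == 0;
    f.cf = ((~dest.lo & src.lo) | (~dest.hi & src.hi)) == 0;
    return f;
}
```

With `dest.lo = 0x0F`, a mask of `0xF0` selects no set bits (ZF set, CF clear), while a mask of `0x03` selects only set bits (ZF clear, CF set).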
SSE4.2 added STTNI (String and Text New Instructions),[12] several new instructions that perform character searches and comparisons on two operands of 16 bytes at a time. These were designed (among other things) to speed up the parsing of XML documents.[13] It also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols. These instructions were first implemented in the Nehalem-based Intel Core i7 product line, and complete the SSE4 instruction set. AMD first added support with the Bulldozer microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE42[Bit 20] flag.
Windows 11 24H2 requires the CPU to support SSE4.2; otherwise the Windows kernel will not boot.[14]
POPCNT and LZCNT operate on integer registers rather than SSE registers, because they are not SIMD instructions, although they appeared at the same time. Although AMD introduced them together with the SSE4a instruction set, they are counted as separate extensions, each with its own dedicated CPUID bit to indicate support. Intel implements POPCNT beginning with the Nehalem microarchitecture and LZCNT beginning with the Haswell microarchitecture. AMD implements both, beginning with the Barcelona microarchitecture.
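For illustration, the checksum that the CRC32 instruction accelerates can be written as a bit-by-bit software routine. The sketch below assumes the CRC-32C (Castagnoli) polynomial discussed in the cited programming reference and the common convention of an all-ones initial value and a final inversion. The function names are hypothetical, and this portable fallback is far slower than the hardware instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC-32C (Castagnoli) polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* One accumulation step over a single byte. This mirrors, in software,
 * the kind of update a byte-wide hardware CRC step performs. */
static uint32_t crc32c_step_u8(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
        /* mask is all ones when the low bit is set, otherwise zero */
        uint32_t mask = 0u - (crc & 1u);
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
    }
    return crc;
}

/* Full checksum over a buffer, using the conventional initial value
 * and final inversion (assumptions of this sketch). */
static uint32_t crc32c(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc = crc32c_step_u8(crc, p[i]);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for CRC-32C is `0xE3069283` for the nine ASCII bytes `"123456789"`, which gives a quick way to validate any implementation against this one.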
AMD calls this pair of instructions Advanced Bit Manipulation (ABM).
LZCNT shares its opcode with the BSR (bit scan reverse) instruction and is distinguished from it only by a prefix. As a result, LZCNT executed on some CPUs that do not support it, such as Intel CPUs prior to Haswell, may silently perform the BSR operation instead of raising an invalid-opcode exception. This matters because LZCNT and BSR return different result values.
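The difference between the two results can be seen in a small portable model for 32-bit operands. The function names below are hypothetical and the loops are written for clarity rather than speed. LZCNT returns the number of leading zero bits, while BSR returns the index of the highest set bit, so for any non-zero input the two values add up to 31.

```c
#include <stdint.h>

/* Models LZCNT on a 32-bit operand: the number of zero bits above the
 * highest set bit. For an input of zero the result is 32. */
static unsigned lzcnt32_emulate(uint32_t x) {
    unsigned count = 0;
    for (uint32_t probe = 0x80000000u;
         probe != 0 && (x & probe) == 0;
         probe >>= 1) {
        ++count;
    }
    return count;
}

/* Models BSR on a 32-bit operand: the bit index of the highest set bit.
 * On real hardware the destination is undefined for a zero input; this
 * sketch returns 0 in that case, so callers must check for zero first. */
static unsigned bsr32_emulate(uint32_t x) {
    unsigned index = 0;
    while (x >>= 1) {
        ++index;
    }
    return index;
}
```

For example, for the input `0x00010000` the LZCNT model yields 15 and the BSR model yields 16, so code that expects one and silently gets the other will compute wrong results.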
Trailing zeros can be counted using the BSF (bit scan forward) or TZCNT instructions.
Windows 11 24H2 requires the CPU to support POPCNT; otherwise the Windows kernel will not boot.[17]
The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture. These instructions are not available in Intel processors. Support is indicated via the CPUID.80000001H:ECX.SSE4A[Bit 6] flag.[20]
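The CPUID flags mentioned in this article can be decoded with a few bit tests. The sketch below takes already-fetched ECX register values as plain integers so that it stays portable. How those values are obtained is platform-specific (GCC and Clang, for example, provide `__get_cpuid` in `<cpuid.h>`). The struct and function names are hypothetical. The POPCNT and LZCNT bit positions are not given in the text above and are taken from the vendors' CPUID documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the feature flags discussed in this article. */
typedef struct {
    bool sse41;   /* CPUID.01H:ECX bit 19 */
    bool sse42;   /* CPUID.01H:ECX bit 20 */
    bool popcnt;  /* CPUID.01H:ECX bit 23 (from vendor documentation) */
    bool lzcnt;   /* CPUID.80000001H:ECX bit 5 (from vendor documentation) */
    bool sse4a;   /* CPUID.80000001H:ECX bit 6 */
} sse4_features;

/* Decodes the feature bits from the ECX values returned by CPUID leaf
 * 01H and extended leaf 80000001H. The caller is responsible for
 * executing CPUID and for confirming that the extended leaf exists. */
static sse4_features decode_sse4_features(uint32_t leaf1_ecx,
                                          uint32_t ext1_ecx) {
    sse4_features f;
    f.sse41  = ((leaf1_ecx >> 19) & 1u) != 0;
    f.sse42  = ((leaf1_ecx >> 20) & 1u) != 0;
    f.popcnt = ((leaf1_ecx >> 23) & 1u) != 0;
    f.lzcnt  = ((ext1_ecx  >> 5)  & 1u) != 0;
    f.sse4a  = ((ext1_ecx  >> 6)  & 1u) != 0;
    return f;
}
```

Keeping each flag separate reflects the point made above that POPCNT and LZCNT are independent extensions: a processor may report one without the other, and neither implies SSE4a.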
See also: X86-64 § Microarchitecture levels
Intel Streaming SIMD Extensions 4 (SSE4) Instruction Set Innovation Archived May 30, 2009, at the Wayback Machine, Intel. http://www.intel.com/technology/architecture-silicon/sse4-instructions/index.htm
Tuning for Intel SSE4 for the 45nm Next Generation Intel Core Microarchitecture Archived March 8, 2021, at the Wayback Machine, Intel. https://intel.wingateweb.com/published/BMAS005/BMAS005_100Eng.pdf
"Intel SSE4 Programming Reference" (PDF). Archived (PDF) from the original on February 15, 2020. Retrieved December 26, 2014. https://software.intel.com/sites/default/files/m/8/b/8/D9156103.pdf
"'Barcelona' Processor Feature: SSE Misaligned Access". AMD. Archived from the original on August 9, 2016. Retrieved March 3, 2015. https://web.archive.org/web/20160809220231/http://developer.amd.com/community/blog/2008/04/14/barcelona-processor-feature-sse-misaligned-access/
"Inside Intel Nehalem Microarchitecture". Archived from the original on April 2, 2015. Retrieved March 3, 2015. http://www.hardwaresecrets.com/article/Inside-Intel-Nehalem-Microarchitecture/535/7
My Experience With "Conroe" Archived October 15, 2013, at the Wayback Machine, DailyTech. http://www.dailytech.com/article.aspx?newsid=3348
Extending the World's Most Popular Processor Architecture Archived November 24, 2011, at the Wayback Machine, Intel. ftp://download.intel.com/technology/architecture/new-instructions-paper.pdf
"Intel - Data Center Solutions, IOT, and PC Innovation". Intel. Archived from the original on February 7, 2013. Retrieved September 17, 2009. http://www.intel.com/technology/product/demos/hdb/demo.htm
Motion Estimation with Intel Streaming SIMD Extensions 4 (Intel SSE4) Archived June 16, 2018, at the Wayback Machine, Intel. https://software.intel.com/en-us/articles/motion-estimation-with-intel-streaming-simd-extensions-4-intel-sse4
"Schema Validation with Intel Streaming SIMD Extensions 4 (Intel SSE4)". Archived from the original on June 17, 2018. Retrieved February 6, 2012. http://software.intel.com/en-us/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4/
"XML Parsing Accelerator with Intel Streaming SIMD Extensions 4 (Intel SSE4)". Archived from the original on June 17, 2018. Retrieved February 6, 2012. http://software.intel.com/en-us/articles/xml-parsing-accelerator-with-intel-streaming-simd-extensions-4-intel-sse4/
Klotz, Aaron (April 24, 2024). "Microsoft blocks some PCs from Windows 11 24H2 — CPU must support SSE4.2 or the OS will not boot". Tom's Hardware. Retrieved April 29, 2024. https://www.tomshardware.com/software/windows/microsoft-updates-windows-11-24h2-requirements-cpu-must-support-sse42-or-the-os-will-not-boot
Intel SSE4 Programming Reference Archived February 15, 2020, at the Wayback Machine p. 61. See also RFC 3385 Archived June 19, 2008, at the Wayback Machine for discussion of the CRC32C polynomial. https://software.intel.com/sites/default/files/m/8/b/8/D9156103.pdf
Fast, Parallelized CRC Computation Using the Nehalem CRC32 Instruction, Dr. Dobbs, April 12, 2011. http://drdobbs.com/cpp/229401411
Sen, Sayan (March 17, 2024). "Microsoft fixes a misfired PopCnt block but Windows 11 24H2 requirements may be here to stay". Neowin. Retrieved March 17, 2024. https://www.neowin.net/news/microsoft-fixes-a-misfired-popcnt-block-but-windows-11-24h2-requirements-may-be-here-to-stay/
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N–Z Archived March 8, 2011, at the Wayback Machine. http://www.intel.com/products/processor/manuals/
"AMD CPUID Specification" (PDF). Archived (PDF) from the original on November 1, 2013. Retrieved October 30, 2013. http://developer.amd.com/wordpress/media/2012/10/254811.pdf
Rahul Chaturvedi (September 17, 2007). "'Barcelona' Processor Feature: SSE4a Instruction Set". Archived from the original on October 25, 2013. https://archive.today/20131025122939/http://developer.amd.com/community/blog/barcelona-processor-feature-sse4a-instruction-set/
Rahul Chaturvedi (October 2, 2007). "'Barcelona' Processor Feature: SSE4a, part 2". Archived from the original on October 25, 2013. https://archive.today/20131025123127/http://developer.amd.com/community/blog/barcelona-processor-feature-sse4a-part-2/
"AMD FX-Series FX-6300 - FD6300WMW6KHK / FD6300WMHKBOX". Archived from the original on August 17, 2017. Retrieved October 9, 2015. http://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-6300.html