SX-Aurora TSUBASA is a successor to the NEC SX series of vector computers, on which the Earth Simulator supercomputer is based, and to their operating system, SUPER-UX. Its hardware consists of x86 Linux hosts with vector engines (VEs) connected via PCI Express (PCIe).[6]
Each VE combines eight cores and six HBM2 memory modules on a silicon interposer, which gives it a high memory bandwidth of 0.75–1.2 TB/s, and is built in the form factor of a PCIe card.[7] Operating system functionality for the VE is offloaded to the vector host (VH) and handled mainly by the user space daemons that make up VEOS.[8]
Depending on the clock frequency (1.4 or 1.6 GHz), each VE CPU has eight cores and a peak performance of 2.15 or 2.45 TFLOPS in double precision. The processor is the world's first implementation of six HBM2 modules on a silicon interposer, with a total of 24 or 48 GB of high-bandwidth memory. It is packaged as a standard full-length, full-height, double-width PCIe card hosted by an x86_64 server, the Vector Host (VH). A single server can host up to eight VEs, and clusters of VHs can scale to an arbitrary number of nodes.[9][10][11]
Version 2 Vector Engine[12]
Version 1 Vector Engine
Version 1.0 of the Vector Engine was produced in TSMC's 16 nm FinFET process and released in three SKUs (later revisions append an "E" to the model name):[13]
Each of the eight SX-Aurora cores has 64 logical vector registers.[14] Each register holds 256 elements of 64 bits and is processed by a combination of pipelining and 32-fold parallel SIMD units. The registers feed three fused multiply-add (FMA) floating-point units that can run in parallel, two arithmetic logic units (ALUs) for fixed-point operations, and a divide and square-root pipeline.[15] Considering only the FMA units, each of which performs two floating-point operations (a multiply and an add) on each of its 32 SIMD lanes, a vector core is capable of 3 × 32 × 2 = 192 double-precision operations per cycle.[16] In "packed" vector operations, where two single-precision values occupy one double-precision slot of a vector register, the vector unit delivers twice as many operations per clock cycle as in double precision.
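The 2.15 and 2.45 TFLOPS peak figures follow from this per-core rate. The Python sketch below reproduces the arithmetic under the assumptions stated in the text: eight cores per VE, three FMA units per core, 32 SIMD lanes, and two floating-point operations per FMA. The constant and function names are illustrative only and are not part of any NEC tool.

```python
# Peak floating-point throughput of one SX-Aurora vector engine (VE),
# derived only from the figures quoted in the text. Names are hypothetical.
CORES_PER_VE = 8
FMA_UNITS_PER_CORE = 3
SIMD_LANES = 32
FLOPS_PER_FMA = 2  # one multiply plus one add


def dp_flops_per_cycle_per_core() -> int:
    """Double-precision operations per cycle for one core (FMA units only;
    the ALUs and the divide/sqrt pipeline are ignored)."""
    return FMA_UNITS_PER_CORE * SIMD_LANES * FLOPS_PER_FMA


def peak_tflops(clock_ghz: float, packed_single: bool = False) -> float:
    """Peak TFLOPS of one VE at the given clock.

    packed_single=True models packed single precision, where two
    32-bit values share one 64-bit register slot (twice the DP rate)."""
    per_cycle = CORES_PER_VE * dp_flops_per_cycle_per_core()
    if packed_single:
        per_cycle *= 2
    # GHz * flops/cycle = GFLOPS; divide by 1000 for TFLOPS
    return per_cycle * clock_ghz / 1000.0


if __name__ == "__main__":
    for ghz in (1.4, 1.6):
        print(f"{ghz} GHz: {peak_tflops(ghz):.4f} TFLOPS DP, "
              f"{peak_tflops(ghz, packed_single=True):.4f} TFLOPS SP (packed)")
```

At 1.4 GHz the sketch gives 2.1504 TFLOPS and at 1.6 GHz 2.4576 TFLOPS, which agree with the quoted 2.15 and 2.45 TFLOPS when truncated to two decimal places.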
A Scalar Processing Unit (SPU) handles non-vector instructions on each of the cores.
The memory of the SX-Aurora TSUBASA processor consists of six HBM2 (second-generation High Bandwidth Memory) modules integrated in the same package as the CPU using Chip-on-Wafer-on-Substrate (CoWoS) technology. Depending on the processor model, the HBM2 modules are 4-die or 8-die 3D stacks with a capacity of 4 or 8 GB each, so the SX-Aurora CPUs have either 24 GB or 48 GB of HBM2 memory. The models with the larger HBM2 modules have a memory bandwidth of 1.2 TB/s.[17]
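As a quick cross-check of these capacities, the sketch below multiplies the module count by the dies per module. It assumes 1 GB per die, a value inferred from the text (4 GB in four dies, 8 GB in eight dies) rather than taken from an NEC specification; the function names are hypothetical.

```python
# HBM2 capacity of one SX-Aurora VE from the figures in the text.
HBM2_MODULES = 6
# Assumption inferred from the text: 4 GB / 4 dies = 8 GB / 8 dies = 1 GB per die.
GB_PER_DIE = 1


def module_capacity_gb(dies_per_module: int) -> int:
    """Capacity of one HBM2 stack; the text describes only 4- and 8-die stacks."""
    if dies_per_module not in (4, 8):
        raise ValueError("only 4-die and 8-die HBM2 modules are described")
    return dies_per_module * GB_PER_DIE


def ve_capacity_gb(dies_per_module: int) -> int:
    """Total HBM2 capacity of one VE (six modules per package)."""
    return HBM2_MODULES * module_capacity_gb(dies_per_module)


if __name__ == "__main__":
    for dies in (4, 8):
        print(f"{dies}-die modules: {module_capacity_gb(dies)} GB each, "
              f"{ve_capacity_gb(dies)} GB per VE")
```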
The cores of a vector engine share 16 MB of last-level cache (LLC), a write-back cache directly connected to the vector registers and to the L2 cache of the SPU. The LLC line size is 128 bytes. The priority of data retention in the LLC can to some extent be controlled in software, allowing the programmer to specify which variables or arrays should be retained in the cache, a feature comparable to the Advanced Data Buffer (ADB) of the NEC SX-ACE.
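The LLC figures lend themselves to simple cache-line arithmetic. The Python sketch below counts the 128-byte lines in the LLC and checks whether a contiguous array of 64-bit floats could fit on paper. It assumes "16 MB" means 16 × 2^20 bytes and deliberately ignores associativity, the replacement policy and data belonging to other cores, so the result is an upper bound rather than a prediction; all names are hypothetical.

```python
# Cache-line arithmetic for the shared last-level cache (LLC).
# Assumption: "16 MB" is read as 16 * 2**20 bytes.
LLC_BYTES = 16 * 2**20
LINE_BYTES = 128
FLOAT64_BYTES = 8


def llc_lines() -> int:
    """Total number of 128-byte lines in the LLC."""
    return LLC_BYTES // LINE_BYTES


def lines_touched(start_addr: int, nbytes: int) -> int:
    """Number of 128-byte lines covered by a contiguous byte range."""
    if nbytes <= 0:
        return 0
    first = start_addr // LINE_BYTES
    last = (start_addr + nbytes - 1) // LINE_BYTES
    return last - first + 1


def fits_in_llc(n_doubles: int, start_addr: int = 0) -> bool:
    """Rough capacity check for a float64 array; ignores associativity,
    replacement policy and other cores' data."""
    return lines_touched(start_addr, n_doubles * FLOAT64_BYTES) <= llc_lines()


if __name__ == "__main__":
    print("LLC lines:", llc_lines())
    print("2 Mi doubles fit:", fits_in_llc(2 * 2**20))
```

For example, 2,097,152 doubles aligned to a line boundary occupy exactly 16 MiB, i.e. all 131,072 lines, so one more element (or a misaligned start address) already exceeds the nominal capacity.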
NEC currently sells the SX-Aurora TSUBASA vector engine integrated into four platforms:[18][19]
Within a VH node, VEs can communicate with each other over PCIe. Large parallel systems built with SX-Aurora use InfiniBand in a PeerDirect configuration as the interconnect.
NEC previously also sold the SX-Aurora TSUBASA vector engine integrated into five platforms:
All models are air-cooled only, except the A500 series, which also uses water cooling.
The operating system of the vector engine (VE) is called VEOS and has been offloaded entirely to the host system, the vector host (VH).[21] VEOS consists of kernel modules and user space daemons that:
VEOS supports multitasking on the VE, and almost all Linux system calls are supported in the VE libc.[23] Offloading operating system services to the VH shifts OS jitter away from the VE at the cost of increased latency.[24] All VE operating system packages are licensed under the GNU General Public License and have been published at github.com/veos-sxarr-nec.
A Software Development Kit is available from NEC for developers and customers. It contains proprietary products and must be purchased from NEC. The SDK contains:
NEC MPI is also a proprietary implementation and conforms to the MPI-3.1 standard.[28]
Hybrid programs that use the VE as an accelerator for selected host kernel functions can be written with the VE Offloading (VEO) C API.[29] VE offloading is to some extent comparable to OpenCL and CUDA, but provides a simpler API, allows the kernels to be written in ordinary C, C++ or Fortran, and lets them use almost any system call on the VE. Python bindings to VEO are available at github.com/SX-Aurora/py-veo.
1 NEC Numerical Library Collection is a collection of mathematical libraries that supports the development of numerical simulation programs.
"NEC SX-Aurora TSUBASA - Vector Engine". www.nec.com. Retrieved 2018-03-20. https://www.nec.com/en/global/solutions/hpc/sx/vector_engine.html
Morgan, Timothy Prickett (2017-10-27). "Can Vector Supercomputing Be Revived?". The Next Platform. https://www.nextplatform.com/2017/10/26/can-vector-supercomputing-revived/
"NEC releases new high-end HPC product line, SX-Aurora TSUBASA". NEC. Retrieved 2018-03-21. https://www.nec.com/en/press/201710/global_20171025_01.html
Imai, Teruyuki (2019). Gerofi, Balazs; Ishikawa, Yutaka; Riesen, Rolf; Wisniewski, Robert W. (eds.). "NEC Earth Simulator and the SX-Aurora TSUBASA". Operating Systems for Supercomputers and High Performance Computing. High-Performance Computing Series, vol. 1. Singapore: Springer. pp. 139–160. doi:10.1007/978-981-13-6624-6_9. ISBN 978-981-13-6624-6. S2CID 204811906.
Morgan, Timothy Prickett (2017-11-22). "A Deep Dive Into NEC's Aurora Vector Engine". The Next Platform. Retrieved 2020-07-02. https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/
Focht, Erich. "First steps with the SX-Aurora TSUBASA vector engine". sx-aurora.github.io. Retrieved 2020-07-02. https://sx-aurora.github.io/posts/VE-first-steps/
"SX-Aurora TSUBASA Brochure" (PDF). NEC. https://www.nec.com/en/global/solutions/hpc/sx/docs/SX-Aurora_e.pdf
"NEC Vector Engine Models". www.nec.com. Retrieved 2020-09-15. https://www.nec.com/en/global/solutions/hpc/sx/vector_engine.html
"SX-Aurora TSUBASA" (PDF). NEC Corporation. February 2020. https://www.nec.com/en/global/solutions/hpc/sx/docs/SX-Aurora_eng_202002.pdf
"NEC SX-Aurora TSUBASA Architecture". www.nec.com. Retrieved 2018-03-20. https://www.nec.com/en/global/solutions/hpc/sx/architecture.html
"SX-Aurora - Microarchitectures - NEC - WikiChip". en.wikichip.org. Retrieved 2020-07-02. https://en.wikichip.org/wiki/nec/microarchitectures/sx-aurora
"NEC SX-Aurora TSUBASA". www.nec.com. https://www.nec.com/en/global/solutions/hpc/sx/index.html?
"NEC SX-Aurora TSUBASA A500-64". www.nec.com. https://www.nec.com/en/global/solutions/hpc/sx/A500-64.html
"NEC SX Aurora TSUBASA — VSC documentation 1.0 documentation". vlaams-supercomputing-centrum-vscdocumentation.readthedocs-hosted.com. Retrieved 2020-07-02. https://vlaams-supercomputing-centrum-vscdocumentation.readthedocs-hosted.com/en/latest/antwerp/uantwerp_SX_Aurara_TSUBASA.html
"A Look at NEC's Latest Vector Processor, the SX-Aurora". WikiChip Fuse. 2018-12-09. Retrieved 2020-08-27. https://fuse.wikichip.org/news/1833/a-look-at-necs-latest-vector-processor-the-sx-aurora/
"NEC SX Aurora TSUBASA — VSC documentation 1.0 documentation". vlaams-supercomputing-centrum-vscdocumentation.readthedocs-hosted.com. Retrieved 2020-08-27. https://vlaams-supercomputing-centrum-vscdocumentation.readthedocs-hosted.com/en/latest/antwerp/uantwerp_SX_Aurara_TSUBASA.html
"NEC SX-Aurora TSUBASA Documentation". www.hpc.nec. https://www.hpc.nec/documentation
"NEC SX-Aurora TSUBASA Vector System". Rechenzentrum der CAU. Retrieved 2020-08-27. https://www.rz.uni-kiel.de/en/our-portfolio/hiperf/nec-sx-aurora-tsubasa-vector-system
"NEC MPI User's Guide". www.hpc.nec. https://www.hpc.nec/documents/mpi/NEC_MPI_User_Guide_en/chap1.html
"SX-Aurora/veoffload". GitHub. Retrieved 2018-03-21. https://github.com/SX-Aurora/veoffload