CuPy is an open-source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. CuPy shares the same API set as NumPy and SciPy, allowing it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the Nvidia CUDA GPU platform and, starting in v9.0, the AMD ROCm GPU platform.
CuPy was initially developed as a backend of the Chainer deep learning framework, and was later established as an independent project in 2017.
CuPy is one of the array libraries in the NumPy ecosystem and is widely adopted for GPU computing with Python, especially in high-performance computing environments such as Summit, Perlmutter, EULER, and ABCI.
CuPy is a NumFOCUS sponsored project.
Features
CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs.[12][13]
NumPy-compatible APIs
The same set of APIs defined in the NumPy package (numpy.*) is available under the cupy.* package.
- Multi-dimensional array (cupy.ndarray) for boolean, integer, float, and complex data types
- Module-level functions
- Linear algebra functions
- Fast Fourier transform
- Random number generator
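The drop-in relationship listed above can be sketched as follows. This is a minimal illustration, not an official recipe: the `xp` alias and the helper `pick_backend` are hypothetical names, and the fallback to NumPy when no usable GPU is found is an assumption made so the sketch runs on any machine.

```python
import numpy as np


def pick_backend():
    """Return cupy when it imports and a GPU is usable, otherwise numpy."""
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is found
        return cp
    except Exception:
        return np


xp = pick_backend()  # hypothetical alias for whichever array module is in use

# Each call below exists under both numpy.* and cupy.*.
a = xp.arange(6, dtype=xp.float64).reshape(2, 3)
gram = a @ a.T                     # linear algebra: 2x2 Gram matrix
norm = xp.linalg.norm(a)           # Frobenius norm of a
spectrum = xp.fft.fft(xp.ones(4))  # fast Fourier transform
sample = xp.random.rand(3)         # uniform random numbers in [0, 1)

# cupy.asnumpy copies device data to the host; numpy arrays pass through as-is.
to_host = getattr(xp, "asnumpy", np.asarray)
print(to_host(gram))
```

Only the import decides where the computation runs; the array code itself is unchanged, which is the sense in which the two packages share an API.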
SciPy-compatible APIs
The same set of APIs defined in the SciPy package (scipy.*) is available under the cupyx.scipy.* package.
- Sparse matrices (cupyx.scipy.sparse.*_matrix) in CSR, COO, CSC, and DIA formats
- Discrete Fourier transform
- Advanced linear algebra
- Multidimensional image processing
- Sparse linear algebra
- Special functions
- Signal processing
- Statistical functions
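As a small sketch of the SciPy-compatible side, the snippet below builds a CSR sparse matrix and multiplies it by a vector. The fallback to `scipy.sparse` when CuPy or a GPU is unavailable is an assumption added so the example runs anywhere; the variable names are illustrative only.

```python
import numpy as np

try:
    import cupy as xp
    import cupyx.scipy.sparse as sparse
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is found
except Exception:
    xp = np
    import scipy.sparse as sparse  # CPU fallback with the same interface

dense = xp.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 5.0, 6.0]])

csr = sparse.csr_matrix(dense)  # compressed sparse row storage
csc = csr.tocsc()               # convert to compressed sparse column
vector = xp.ones(3)
product = csr.dot(vector)       # sparse matrix-vector product

# cupy.asnumpy copies device data to the host; numpy arrays pass through as-is.
to_host = getattr(xp, "asnumpy", np.asarray)
print(csr.nnz, to_host(product))
```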
User-defined GPU kernels
- Kernel templates for element-wise and reduction operations
- Raw kernel (CUDA C/C++)
- Just-in-time transpiler (JIT)
- Kernel fusion
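Two of the mechanisms listed above, an element-wise kernel template and kernel fusion, can be sketched as below. The kernel name `squared_diff` and the host-side NumPy reference are illustrative choices; the GPU branch runs only when CuPy and a device are available, and the CPU branch is an assumption added so the sketch still executes elsewhere.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is found
    HAVE_GPU = True
except Exception:
    HAVE_GPU = False

x_host = np.arange(5, dtype=np.float32)
y_host = np.full(5, 2, dtype=np.float32)
expected = (x_host - y_host) ** 2  # NumPy reference result

if HAVE_GPU:
    # Element-wise kernel template: the body is applied to every element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',   # input parameters
        'float32 z',              # output parameter
        'z = (x - y) * (x - y)',  # per-element body
        'squared_diff')           # kernel name

    # Kernel fusion: the decorated function is compiled into a single kernel.
    @cp.fuse()
    def squared_diff_fused(x, y):
        return (x - y) * (x - y)

    x_dev, y_dev = cp.asarray(x_host), cp.asarray(y_host)
    result = cp.asnumpy(squared_diff(x_dev, y_dev))
    fused_result = cp.asnumpy(squared_diff_fused(x_dev, y_dev))
else:
    # No GPU: fall back to the host reference so the script still runs.
    result = fused_result = expected

print(result)
```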
Distributed computing
- Distributed communication package (cupyx.distributed), providing collective and peer-to-peer primitives
Low-level CUDA features
- Stream and event
- Memory pool
- Profiler
- Host API binding
- CUDA Python support[14]
Interoperability
- DLPack[15]
- CUDA Array Interface[16]
- NEP 13 (__array_ufunc__)[17]
- NEP 18 (__array_function__)[18][19]
- Array API Standard[20][21]
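The protocols listed above can be exercised with a short sketch. It assumes a CuPy version recent enough to provide `cupy.from_dlpack`, and it does nothing beyond printing an empty report when no GPU is available; the `report` dictionary and its keys are hypothetical names used only for illustration.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is found
    HAVE_GPU = True
except Exception:
    cp = None
    HAVE_GPU = False

report = {}
if HAVE_GPU:
    x = cp.arange(4, dtype=cp.float32)

    # NEP 13: a NumPy ufunc applied to a CuPy array is dispatched to CuPy.
    doubled = np.multiply(x, 2)
    # NEP 18: a NumPy function applied to a CuPy array is dispatched to CuPy.
    total = np.sum(x)
    # DLPack: import an object exposing __dlpack__ without copying its data.
    y = cp.from_dlpack(x)
    # CUDA Array Interface: a dict describing the underlying device buffer.
    cai = x.__cuda_array_interface__

    report = {
        "ufunc_stays_on_gpu": isinstance(doubled, cp.ndarray),
        "function_stays_on_gpu": isinstance(total, cp.ndarray),
        "dlpack_shares_memory": y.data.ptr == x.data.ptr,
        "cai_shape": cai["shape"],
    }

print(report)
```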
Examples
Array creation
>>> import cupy as cp
>>> x = cp.array([1, 2, 3])
>>> x
array([1, 2, 3])
>>> y = cp.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Basic operations
>>> import cupy as cp
>>> x = cp.arange(12).reshape(3, 4).astype(cp.float32)
>>> x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)
>>> x.sum(axis=1)
array([ 6., 22., 38.], dtype=float32)
Raw CUDA C/C++ kernel
>>> import cupy as cp
>>> kern = cp.RawKernel(r'''
... extern "C" __global__
... void multiply_elemwise(const float* in1, const float* in2, float* out) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     out[tid] = in1[tid] * in2[tid];
... }
... ''', 'multiply_elemwise')
>>> in1 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> in2 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> out = cp.zeros((4, 4), dtype=cp.float32)
>>> kern((4,), (4,), (in1, in2, out))  # grid, block and arguments
>>> out
array([[  0.,   1.,   4.,   9.],
       [ 16.,  25.,  36.,  49.],
       [ 64.,  81., 100., 121.],
       [144., 169., 196., 225.]], dtype=float32)
Applications
- spaCy[22][23]
- XGBoost[24]
- turboSETI (Berkeley SETI)[25]
- NVIDIA RAPIDS[26][27][28][29]
- einops[30][31]
- scikit-learn[32]
- MONAI
- Chainer[33]
References
1. Okuta, Ryosuke; Unno, Yuya; Nishino, Daisuke; Hido, Shohei; Loomis, Crissman (2017). CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations (PDF). Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS). http://learningsys.org/nips17/assets/papers/paper_16.pdf
2. "CuPy 9.0 Brings AMD GPU Support To This Numpy-Compatible Library - Phoronix". Phoronix. 29 April 2021. Retrieved 21 June 2022. https://www.phoronix.com/scan.php?page=news_item&px=CuPy-9.0-Released
3. "AMD Leads High Performance Computing Towards Exascale and Beyond". 28 June 2021. Retrieved 21 June 2022. Most recently, CuPy, an open-source array library with Python, has expanded its traditional GPU support with the introduction of version 9.0 that now offers support for the ROCm stack for GPU-accelerated computing. https://ir.amd.com/news-events/press-releases/detail/1012/amd-leads-high-performance-computing-towards-exascale-and
4. "Preferred Networks released Version 2 of Chainer, an Open Source framework for Deep Learning - Preferred Networks, Inc". 2 June 2017. Retrieved 18 June 2022. https://www.preferred.jp/en/news/pr20170602/
5. "NumPy". numpy.org. Retrieved 21 June 2022. https://numpy.org/
6. Gorelick, Micha; Ozsvald, Ian (April 2020). High Performance Python: Practical Performant Programming for Humans (2nd ed.). O'Reilly Media, Inc. p. 190. ISBN 9781492055020.
7. Oak Ridge Leadership Computing Facility. "Installing CuPy". OLCF User Documentation. Retrieved 21 June 2022.
8. National Energy Research Scientific Computing Center. "Using Python on Perlmutter". NERSC Documentation. Retrieved 21 June 2022.
9. ETH Zurich. "CuPy". ScientificComputing. Retrieved 21 June 2022.
10. National Institute of Advanced Industrial Science and Technology. "Chainer". ABCI 2.0 User Guide. Retrieved 21 June 2022.
11. "Sponsored Projects - NumFOCUS". Retrieved 8 September 2024. https://numfocus.org/sponsored-projects
12. "Overview". CuPy documentation. Retrieved 18 June 2022. https://docs.cupy.dev/en/latest/overview.html
13. "Comparison Table". CuPy documentation. Retrieved 18 June 2022. https://docs.cupy.dev/en/latest/reference/comparison.html
14. "CUDA Python | NVIDIA Developer". Retrieved 21 June 2022. https://developer.nvidia.com/cuda-python
15. "Welcome to DLPack's documentation!". DLPack 0.6.0 documentation. Retrieved 21 June 2022. https://dmlc.github.io/dlpack/latest/
16. "CUDA Array Interface (Version 3)". Numba 0.55.2+0.g2298ad618.dirty-py3.7-linux-x86_64.egg documentation. Retrieved 21 June 2022. https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
17. "NEP 13 — A mechanism for overriding Ufuncs — NumPy Enhancement Proposals". numpy.org. Retrieved 21 June 2022. https://numpy.org/neps/nep-0013-ufunc-overrides.html
18. "NEP 18 — A dispatch mechanism for NumPy's high level array functions — NumPy Enhancement Proposals". numpy.org. Retrieved 21 June 2022. https://numpy.org/neps/nep-0018-array-function-protocol.html
19. Charles R Harris; K. Jarrod Millman; Stéfan J. van der Walt; et al. (16 September 2020). "Array programming with NumPy" (PDF). Nature. 585 (7825): 357–362. arXiv:2006.10256. doi:10.1038/S41586-020-2649-2. ISSN 1476-4687. PMC 7759461. PMID 32939066. Wikidata Q99413970. https://www.nature.com/articles/s41586-020-2649-2.pdf
20. "2021 report - Python Data APIs Consortium" (PDF). Retrieved 21 June 2022. https://data-apis.org/files/2021_annual_report_DataAPIs_Consortium.pdf
21. "Purpose and scope". Python array API standard 2021.12 documentation. Retrieved 21 June 2022. https://data-apis.org/array-api/latest/purpose_and_scope.html
22. "Install spaCy". spaCy Usage Documentation. Retrieved 21 June 2022. https://spacy.io/usage#gpu
23. Patel, Ankur A.; Arasanipalai, Ajay Uppili (May 2021). Applied Natural Language Processing in the Enterprise (1st ed.). O'Reilly Media, Inc. p. 68. ISBN 9781492062578.
24. "Python Package Introduction". xgboost 1.6.1 documentation. Retrieved 21 June 2022. https://xgboost.readthedocs.io/en/stable/python/python_intro.html#data-interface
25. "UCBerkeleySETI/turbo_seti: turboSETI -- python based SETI search algorithm". GitHub. Retrieved 21 June 2022. https://github.com/UCBerkeleySETI/turbo_seti#turbo_seti
26. "Open GPU Data Science | RAPIDS". Retrieved 21 June 2022. https://rapids.ai/
27. "API Docs". RAPIDS Docs. Retrieved 21 June 2022. https://docs.rapids.ai/api
28. "Efficient Data Sharing between CuPy and RAPIDS". Retrieved 21 June 2022. https://medium.com/rapids-ai/using-rapids-memory-manager-with-cupy-8d08fe8f58fa
29. "10 Minutes to cuDF and CuPy". Retrieved 21 June 2022. https://medium.com/rapids-ai/10-minutes-to-cudf-and-cupy-e131cac0439b
30. Rogozhnikov, Alex (2022). Einops: Clear and Reliable Tensor Manipulations with Einstein-like Notation. International Conference on Learning Representations. https://openreview.net/forum?id=oapKSVM2bcj
31. "arogozhnikov/einops: Deep learning operations reinvented (for pytorch, tensorflow, jax and others)". GitHub. Retrieved 21 June 2022. https://github.com/arogozhnikov/einops
32. "Array API support (experimental) — scikit-learn documentation". Retrieved 8 September 2024. https://scikit-learn.org/stable/modules/array_api.html
33. Tokui, Seiya; Okuta, Ryosuke; Akiba, Takuya; Niitani, Yusuke; Ogawa, Toru; Saito, Shunta; Suzuki, Shuji; Uenishi, Kota; Vogel, Brian; Vincent, Hiroyuki Yamazaki (2019). Chainer: A Deep Learning Framework for Accelerating the Research Cycle. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. doi:10.1145/3292500.3330756. https://dl.acm.org/doi/10.1145/3292500.3330756