Permute instructions occur in scalar processors, vector processing engines, and GPUs. In vector instruction sets they are typically named "Register Gather/Scatter" operations, as in the RISC-V vector extension,[2] and take vectors as input for both the source elements and the array of indices that selects among them, and output another vector.
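The register gather described above can be sketched in a few lines. This is a minimal model, not any ISA's definition: the function name `register_gather` is hypothetical, and returning 0 for an out-of-range index is an assumption modelled on how RISC-V `vrgather.vv` treats indices at or beyond the vector length; other instruction sets may behave differently.

```python
def register_gather(src, idx):
    """Model of a vector register gather: out[i] = src[idx[i]].

    Assumption: an out-of-range index yields 0 (as RISC-V vrgather.vv
    does for indices >= VLMAX); other ISAs may wrap or behave differently.
    """
    n = len(src)
    if len(idx) != n:
        raise ValueError("index vector must have the same length as the source")
    return [src[i] if 0 <= i < n else 0 for i in idx]


src = [10, 20, 30, 40]
idx = [3, 3, 0, 7]  # index 3 is used twice; index 7 is out of range
print(register_gather(src, idx))  # prints [40, 40, 10, 0]
```

Both inputs are ordinary vectors, so the index vector can itself be computed by earlier vector instructions.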
In scalar instruction sets, a scalar register is broken down into smaller sections (unpacked, SIMD style), and the fragments are then used as the array sources. The small partial results are then concatenated (packed) back into a scalar register as the result.
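The unpack, permute, and pack sequence can be illustrated as follows. This is a sketch under stated assumptions: a 64-bit register is treated as eight bytes, each byte of a second register supplies an index, and only the low three bits of each index are used. The helper names (`unpack_bytes`, `pack_bytes`, `scalar_byte_permute`) are hypothetical and do not correspond to a specific instruction.

```python
MASK64 = (1 << 64) - 1


def unpack_bytes(reg):
    """Split a 64-bit value into 8 bytes, least significant byte first."""
    return [(reg >> (8 * i)) & 0xFF for i in range(8)]


def pack_bytes(parts):
    """Concatenate 8 bytes (least significant first) into one 64-bit value."""
    out = 0
    for i, b in enumerate(parts):
        out |= (b & 0xFF) << (8 * i)
    return out


def scalar_byte_permute(data_reg, index_reg):
    """Hypothetical scalar permute: byte i of the result is the byte of
    data_reg selected by byte i of index_reg (low 3 bits only)."""
    data = unpack_bytes(data_reg & MASK64)
    idx = unpack_bytes(index_reg & MASK64)
    return pack_bytes([data[i & 7] for i in idx])


# Reverse the byte order: result byte 0 takes data byte 7, and so on.
print(hex(scalar_byte_permute(0x0807060504030201, 0x0001020304050607)))
# prints 0x102030405060708
```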
Some ISAs, particularly for cryptographic applications, even have bit-level permute operations, such as bdep (bit deposit) in RISC-V bitmanip;[3] in the Power ISA the bit-level permute instruction is bpermd, which has been included for several decades and is still present in the Power ISA v3.0 B specification.[4]
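To make the bit-level case concrete, here is a minimal sketch of bit deposit as the operation is commonly defined: the low-order bits of the source are placed, in order, at the positions of the set bits in a mask. The function name `bit_deposit` and the `width` parameter are illustrative, and it is an assumption that bdep in the draft RISC-V bitmanip extension follows this common definition.

```python
def bit_deposit(src, mask, width=64):
    """Scatter the low bits of src to the positions of the set bits in mask.

    Bits of src are consumed from least significant upward; result bits
    where mask is 0 stay 0.
    """
    result = 0
    k = 0  # index of the next src bit to consume
    for pos in range(width):
        if (mask >> pos) & 1:
            if (src >> k) & 1:
                result |= 1 << pos
            k += 1
    return result


# The mask has set bits at positions 1, 3, 4; src bits 1, 0, 1 land there.
print(bin(bit_deposit(0b101, 0b11010)))  # prints 0b10010
```

Because the mask is a full register, a single operand can describe where every bit goes, at the cost of preserving the relative order of the deposited bits.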
Some non-vector ISAs also include partial reordering instructions, because a single source input register sometimes has insufficient space to specify the permutation source array in full (particularly if the operation involves bit-level permutation). Examples include VSHUFF32x4 from AVX-512.
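The space argument can be illustrated with a small model. A full permutation of 16 elements needs 16 four-bit indices (64 bits), whereas reordering four whole lanes of four elements needs only four two-bit selectors, which fit in an 8-bit immediate. The sketch below is a deliberately simplified, single-source model with a hypothetical name (`lane_shuffle`); it is not the exact semantics of VSHUFF32x4, which draws from two source registers.

```python
def lane_shuffle(src, imm8, lane_size=4):
    """Reorder four whole lanes of `lane_size` elements.

    Bits [2d+1:2d] of imm8 select which source lane is copied to
    destination lane d. Simplified, single-source illustration only.
    """
    if len(src) != 4 * lane_size:
        raise ValueError("source must hold exactly four lanes")
    lanes = [src[i * lane_size:(i + 1) * lane_size] for i in range(4)]
    out = []
    for d in range(4):
        sel = (imm8 >> (2 * d)) & 0b11
        out.extend(lanes[sel])
    return out


src = list(range(16))
print(lane_shuffle(src, 0b00011011))
# prints [12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3]
```

Elements inside a lane keep their order, which is what makes the reordering partial: a second instruction is needed to permute within lanes.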
Permute operations in different forms are surprisingly common, occurring in AltiVec, Power ISA, PowerPC G4, AVX-512, SVE2,[5] vector processors, and GPUs. They are sufficiently important that LLVM added the shufflevector[6] instruction and GCC added the __builtin_shuffle built-in function.[7] GCC's built-in matches the functionality of OpenCL's shuffle functions.[8] Note that, mathematically, none of these is strictly a permutation, because the same source element may be selected more than once and duplicates can therefore occur in the output.
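The point about duplicates can be shown with a small model of a two-operand shuffle. The sketch assumes the numbering GCC documents for __builtin_shuffle with two vector operands: elements of the first vector are numbered from 0, those of the second from N, and mask values are taken modulo 2N. The names `shuffle2` and `is_permutation` are hypothetical helpers for illustration.

```python
def shuffle2(a, b, mask):
    """Model of a two-operand shuffle: out[i] = (a ++ b)[mask[i] mod 2N]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both inputs must have the same length")
    pool = list(a) + list(b)
    return [pool[m % (2 * n)] for m in mask]


def is_permutation(mask, n):
    """True only if mask uses each index 0..n-1 exactly once."""
    return sorted(mask) == list(range(n))


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(shuffle2(a, b, [0, 4, 1, 5]))     # prints [1, 5, 2, 6]
print(shuffle2(a, b, [0, 0, 0, 0]))     # prints [1, 1, 1, 1]
print(is_permutation([3, 2, 1, 0], 4))  # prints True
print(is_permutation([0, 0, 1, 2], 4))  # prints False
```

The all-zero mask broadcasts one element to every output position, which no true permutation can do.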
1. Intel® 64 and IA-32 architectures software developer's manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4 (PDF). Intel. June 2021. p. 5-356 Vol. 2C. https://software.intel.com/content/dam/develop/external/us/en/documents-tps/325462-sdm-vol-1-2abcd-3abcd.pdf
2. "RISC-V "V" Vector Extension – 16.4. Vector Register Gather Instructions". GitHub – riscv/riscv-v-spec. Retrieved 2021-07-10. https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions
3. "riscv/riscv-bitmanip". GitHub. Retrieved 2021-07-10. https://github.com/riscv/riscv-bitmanip
4. "Power ISA Version 3.0 B". Power.org. 2017-03-27. Retrieved 2019-08-11. https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
5. "SVE2 Extension summary". ARM HPC. p. 32. https://indico.ph.ed.ac.uk/event/69/contributions/892/attachments/734/910/Arm_Exascale.pdf
6. "LLVM 13 documentation: shufflevector". LLVM Language Reference Manual. Retrieved 2021-07-10. https://llvm.org/docs/LangRef.html#shufflevector-instruction
7. "Vector Extensions (Using the GNU Compiler Collection (GCC))". GCC, the GNU Compiler Collection - GNU Project - Free Software Foundation (FSF). Retrieved 2021-07-10. https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
8. "OpenCL Specification: shuffle, shuffle2". The Khronos Group Inc. Retrieved 2021-07-10. https://www.khronos.org/registry/OpenCL/sdk/2.1/docs/man/xhtml/shuffle.html