SIMD stands for Single Instruction, Multiple Data. It is a class of computers capable of performing the same operation on several data elements at once. For example, assuming one operation per clock cycle, a 1 GHz SIMD computer can perform the addition (or subtraction, multiplication...) of four pairs of numbers in one nanosecond, whereas a non-SIMD computer needs 4 nanoseconds, one for each of the four additions.
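The four-pairs example above can be sketched as follows. This is a minimal illustration, not NSIMD code: the function names `add4_scalar` and `add4_simd` are hypothetical, the SIMD path uses the standard x86 SSE intrinsics, and on targets without SSE it simply falls back to the scalar loop.

```cpp
// Detect SSE support (assumption: x86 target; other targets use the fallback).
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define EXAMPLE_HAVE_SSE 1
#endif

// Scalar version: four separate additions, one per pair of numbers.
inline void add4_scalar(const float* a, const float* b, float* out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
}

// SIMD version: the four additions are issued as a single instruction.
inline void add4_simd(const float* a, const float* b, float* out) {
#ifdef EXAMPLE_HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             // load 4 floats from a
    __m128 vb = _mm_loadu_ps(b);             // load 4 floats from b
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // one add for all 4 pairs
#else
    add4_scalar(a, b, out);                  // no SSE: plain scalar loop
#endif
}
```

Both functions produce the same four sums; only the number of add instructions differs.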
NSIMD is a vectorization library that abstracts SIMD programming. It was designed to exploit the maximum power of processors at a low development cost.
To achieve maximum performance, NSIMD mainly relies on the compiler's inlining optimization pass. Therefore, using NSIMD with any mainstream compiler such as GCC, Clang, MSVC, XL C/C++, ICC and others will give you a zero-cost SIMD abstraction library.
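To illustrate how inlining can make such an abstraction free of overhead, here is a minimal sketch. It is not NSIMD's actual implementation or API: the `Pack4f` type and the `load`, `store` and `add_arrays` functions are hypothetical names chosen for this example. Each wrapper is a small inline function around one SSE intrinsic (or a scalar fallback), so an optimizing compiler is expected to reduce the wrapper calls to the underlying instructions.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical wrapper around a register of 4 floats (not an NSIMD type).
struct Pack4f {
#ifdef SKETCH_HAVE_SSE
    __m128 v;
#else
    float v[4];
#endif
};

// Load 4 consecutive floats from memory.
inline Pack4f load(const float* p) {
    Pack4f r;
#ifdef SKETCH_HAVE_SSE
    r.v = _mm_loadu_ps(p);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = p[i];
#endif
    return r;
}

// Store 4 floats back to memory.
inline void store(float* p, Pack4f x) {
#ifdef SKETCH_HAVE_SSE
    _mm_storeu_ps(p, x.v);
#else
    for (int i = 0; i < 4; ++i) p[i] = x.v[i];
#endif
}

// Element-wise addition; once inlined, this is a single SSE add.
inline Pack4f operator+(Pack4f a, Pack4f b) {
    Pack4f r;
#ifdef SKETCH_HAVE_SSE
    r.v = _mm_add_ps(a.v, b.v);
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// User code is written once against the wrapper, whatever the target.
inline void add_arrays(const float* a, const float* b, float* out,
                       std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        store(out + i, load(a + i) + load(b + i));  // 4 elements per step
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];                       // scalar tail
    }
}
```

The design point is that `add_arrays` never mentions an intrinsic: porting to another instruction set would only mean changing the bodies of the small inline wrappers.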
Most of the library is open source on GitHub and can be downloaded and tested at will thanks to its MIT license.
A small part of it is a proprietary binary, priced at 49.90 €/user, which can be purchased at store.agenium-scale.com. It contains, among others:
- trigonometric functions
- inverse trigonometric functions
- hyperbolic functions
- inverse hyperbolic functions
- exponentials
- logarithms
We have integrated NSIMD into GROMACS to demonstrate its potential. GROMACS is a versatile package for molecular dynamics, i.e. for simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is heavily used in the HPC community to benchmark supercomputers and has become a reference in this area.
As GROMACS is already fully optimized software, our goal is to obtain similar running times, and we do! This also proves the claims of NSIMD, namely low development cost for high performance and portability. We have replaced nearly 11000 lines of GROMACS code with 4700 lines of NSIMD code.
We work for the French Army and use NSIMD as the base library for our neural network inference engine. Its C++ API allows us to write all layer kernels once and get better performance than Caffe on Intel workstations and Arm mobile devices (such as smartphones). We speed up neural networks using quantization and fixed-point arithmetic, both of which are supported by NSIMD.
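As a rough illustration of what quantization means in this context, the sketch below shows a generic symmetric 8-bit scheme. It is an assumption made for this example, not the scheme used by the inference engine and not an NSIMD API: the helper names `quantize`, `dot_q8` and `dequantize` are hypothetical. Floats are mapped to small integers, the dot product is accumulated in integer arithmetic, and the result is rescaled to a float at the end.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Map a float to a signed 8-bit integer: q = round(x / scale), clamped
// to [-127, 127] (hypothetical helper, generic symmetric quantization).
inline std::int8_t quantize(float x, float scale) {
    long q = std::lround(x / scale);
    q = std::max(-127L, std::min(127L, q));
    return static_cast<std::int8_t>(q);
}

// Integer dot product with a 32-bit accumulator to avoid overflow.
inline std::int32_t dot_q8(const std::int8_t* a, const std::int8_t* b,
                           std::size_t n) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return acc;
}

// Convert the integer accumulator back to a float using both scales.
inline float dequantize(std::int32_t acc, float scale_a, float scale_b) {
    return static_cast<float>(acc) * scale_a * scale_b;
}
```

Because 8-bit values are four times narrower than 32-bit floats, a SIMD register of a given width holds four times as many of them, which is one reason such schemes pair well with vectorization.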
Open Sourced version
Part of the library is open source on GitHub (