Chapter 8.4 : Vectorizing a reduction with intrinsic functions
In this section, we will use :
- The header file immintrin.h
- Intrinsic function : _mm256_load_ps
- Intrinsic function : _mm256_store_ps
- Intrinsic function : _mm256_storeu_ps (variant of _mm256_store_ps that stores values to an unaligned address)
- Intrinsic function : _mm256_add_ps
- Intrinsic function : _mm256_broadcast_ss (to duplicate a float 8 times in a vector register)
- Compiler options to enable specific optimisations : -O3 -march=native -mtune=native -mavx2
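
To show how these pieces fit together, here is a minimal sketch of a sum reduction over floats. It is an illustration under stated assumptions, not necessarily the implementation developed in this chapter. The function name reduction_sum and the macro SKETCH_TARGET_AVX are hypothetical. The input pointer is assumed to be 32-byte aligned, as _mm256_load_ps requires. The target attribute is a GCC/Clang extension that is only there so the sketch also compiles when the options above are not passed.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// GCC/Clang only: allows AVX code in this function even without -mavx2 or
// -march=native. With the options listed above, the attribute is redundant.
#if defined(__GNUC__)
#define SKETCH_TARGET_AVX __attribute__((target("avx")))
#else
#define SKETCH_TARGET_AVX
#endif

// Hypothetical helper (name chosen for this sketch): sums n floats.
// Assumption: data is aligned on 32 bytes, as _mm256_load_ps requires.
SKETCH_TARGET_AVX
float reduction_sum(const float* data, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(data) % 32 == 0);

    // Duplicate 0.0f in the 8 lanes : one partial sum per lane.
    const float zero = 0.0f;
    __m256 acc = _mm256_broadcast_ss(&zero);

    // Main loop : 8 floats (32 bytes) per iteration, so every load stays aligned.
    const std::size_t nbVec = n / 8;
    for (std::size_t i = 0; i < nbVec; ++i) {
        __m256 v = _mm256_load_ps(data + 8 * i);
        acc = _mm256_add_ps(acc, v);
    }

    // The local buffer has no alignment guarantee, hence the unaligned store.
    float partial[8];
    _mm256_storeu_ps(partial, acc);

    // Horizontal step : add the 8 partial sums, then the scalar remainder.
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k) { sum += partial[k]; }
    for (std::size_t i = 8 * nbVec; i < n; ++i) { sum += data[i]; }
    return sum;
}
```

If partial were declared with alignas(32), _mm256_store_ps could be used in place of _mm256_storeu_ps. Note that the vectorized version adds the elements in a different order than a plain scalar loop, so the floating-point result can differ slightly in the last digits.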