3.6.3.2 : The hadamard_product function

We do not need to specify the alignment or to use __restrict__ because, with intrinsics, we force the compiler to do what we want:
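void hadamard_product(float* tabResult, const float* tabX, const float* tabY, long unsigned int nbElement){
	// Number of floats in one vector register (8 for a 32-byte AVX register)
	long unsigned int vecSize(VECTOR_ALIGNEMENT/sizeof(float));
	// Number of full vectors; nbElement is expected to be a multiple of vecSize,
	// otherwise the remaining elements are not processed by this loop
	long unsigned int nbVec(nbElement/vecSize);
	for(long unsigned int i(0lu); i < nbVec; ++i){
		// Aligned loads: the addresses must be aligned on 32 bytes
		__m256 vecX = _mm256_load_ps(tabX + i*vecSize);
		__m256 vecY = _mm256_load_ps(tabY + i*vecSize);
		// Element-wise product of the two vectors
		__m256 vecRes = _mm256_mul_ps(vecX, vecY);
		// Aligned store of the result
		_mm256_store_ps(tabResult + i*vecSize, vecRes);
	}
}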


Remember: if you do NOT provide aligned data to this kernel, you will get a segmentation fault, because _mm256_load_ps and _mm256_store_ps require addresses aligned on 32 bytes.
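
To make the alignment requirement concrete, here is a minimal sketch of a check that could be run before calling the kernel. The names kAssumedAlignment, isAligned and canCallAlignedKernel are hypothetical helpers introduced for this illustration and are not part of the original code; the value 32 is an assumption that matches the 256-bit AVX registers, and VECTOR_ALIGNEMENT is expected to hold the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 32 bytes, the alignment required by the aligned 256-bit AVX load/store
// intrinsics. The tutorial's VECTOR_ALIGNEMENT is expected to hold the same value.
constexpr std::size_t kAssumedAlignment = 32;

// Hypothetical helper: true when the address of ptr is a multiple of alignment
inline bool isAligned(const void* ptr, std::size_t alignment){
	assert(alignment != 0);
	return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical guard: true when the three tables can safely be given to a kernel
// that uses aligned loads and stores
inline bool canCallAlignedKernel(const float* tabResult, const float* tabX, const float* tabY){
	return isAligned(tabResult, kAssumedAlignment)
		&& isAligned(tabX, kAssumedAlignment)
		&& isAligned(tabY, kAssumedAlignment);
}
```

For example, a table declared with alignas(32) float tab[16]; passes the check, whereas tab + 1 does not, because it is shifted by only 4 bytes. Calling such a guard in a debug build turns a segmentation fault into an explicit error.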