Previous The CMakeLists.txt file |
Parent Let's swap the loops over j and k |
Outline | Next The performances |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
$ make -- Configuring done -- Generating done -- Build files have been written to: ExampleOptimisation/build [ 2%] Built target hadamard_product_O2 [ 4%] Built target hadamard_product_O1 [ 6%] Built target hadamard_product_vectorize [ 8%] Built target hadamard_product_O0 [ 8%] Built target hadamard_product_O3 [ 10%] Built target hadamard_product_Ofast [ 13%] Built target hadamard_product_intrinsics [ 15%] Built target asterics_hpc [ 17%] Built target saxpy_O2 [ 17%] Built target saxpy_O0 [ 19%] Built target saxpy_O3 [ 21%] Built target saxpy_O1 [ 23%] Built target saxpy_Ofast [ 26%] Built target saxpy_vectorize [ 26%] Built target saxpy_intrinsics [ 28%] Built target reduction_real_O2 [ 30%] Built target reduction_real_intrinsics_interleave8_O3 [ 34%] Built target reduction_real_O1 [ 36%] Built target reduction_real_Ofast [ 39%] Built target reduction_O0 [ 41%] Built target reduction_O1 [ 41%] Built target reduction_O2 [ 43%] Built target reduction_O3 [ 45%] Built target reduction_real_intrinsics_interleave4_O3 [ 47%] Built target reduction_real_vectorize_Ofast [ 52%] Built target reduction_real_intrinsics_interleave2_O3 [ 54%] Built target reduction_real_intrinsics_O3 [ 56%] Built target reduction_real_O3 [ 58%] Built target reduction_real_O0 [ 60%] Built target reduction_real_vectorize_O3 [ 63%] Built target barycentre_intrinsics [ 65%] Built target barycentre_base_O2 [ 67%] Built target barycentre_base_O1 [ 69%] Built target barycentre_base_O0 [ 71%] Built target barycentre_vectorizeSplit_O3 [ 76%] Built target barycentre_base_Ofast [ 78%] Built target barycentre_base_O3 [ 80%] Built target barycentre_vectorize_O3 [ 84%] Built target sgemm_base_O1 [ 86%] Built target sgemm_base_Ofast [ 89%] Built target sgemm_base_O3 [ 91%] Built target sgemm_base_O0 Scanning dependencies of target sgemm_swap_Ofast [ 93%] Building CXX object 6-Sgemm/CMakeFiles/sgemm_swap_Ofast.dir/sgemm_swap.cpp.o [ 93%] Building CXX object 6-Sgemm/CMakeFiles/sgemm_swap_Ofast.dir/main_sgemm_swap.cpp.o [ 95%] Linking CXX executable sgemm_swap_Ofast [ 95%] Built target sgemm_swap_Ofast Scanning dependencies of target sgemm_swap_O3 [ 97%] Building CXX object 6-Sgemm/CMakeFiles/sgemm_swap_O3.dir/sgemm_swap.cpp.o [ 97%] Building CXX object 6-Sgemm/CMakeFiles/sgemm_swap_O3.dir/main_sgemm_swap.cpp.o [ 97%] Linking CXX executable sgemm_swap_O3 [ 97%] Built target sgemm_swap_O3 [100%] Built target sgemm_base_O2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 |
$ make plot_all
[ 1%] Built target asterics_hpc
[ 2%] Built target sgemm_swap_O3
[ 3%] Built target sgemm_base_Ofast
[ 4%] Built target sgemm_base_O3
[ 7%] Built target sgemm_swap_Ofast
Scanning dependencies of target plot_sgemmSwap
[ 7%] Run sgemm_swap_Ofast program
SGEMM Swap
evaluateSgemm : nbElement = 10, cyclePerElement = 15.69 cy/el, elapsedTime = 1569 cy
evaluateSgemm : nbElement = 20, cyclePerElement = 16.23 cy/el, elapsedTime = 6492 cy
evaluateSgemm : nbElement = 30, cyclePerElement = 23.3 cy/el, elapsedTime = 20970 cy
evaluateSgemm : nbElement = 50, cyclePerElement = 30.3024 cy/el, elapsedTime = 75756 cy
evaluateSgemm : nbElement = 80, cyclePerElement = 37.2161 cy/el, elapsedTime = 238183 cy
evaluateSgemm : nbElement = 100, cyclePerElement = 45.8609 cy/el, elapsedTime = 458609 cy
[ 8%] Run sgemm_swap_O3 program
SGEMM Swap
evaluateSgemm : nbElement = 10, cyclePerElement = 14.94 cy/el, elapsedTime = 1494 cy
evaluateSgemm : nbElement = 20, cyclePerElement = 16.95 cy/el, elapsedTime = 6780 cy
evaluateSgemm : nbElement = 30, cyclePerElement = 22.7011 cy/el, elapsedTime = 20431 cy
evaluateSgemm : nbElement = 50, cyclePerElement = 28.504 cy/el, elapsedTime = 71260 cy
evaluateSgemm : nbElement = 80, cyclePerElement = 36.1014 cy/el, elapsedTime = 231049 cy
evaluateSgemm : nbElement = 100, cyclePerElement = 45.3551 cy/el, elapsedTime = 453551 cy
[ 8%] Call gnuplot sgemmSwap
[ 9%] Built target plot_sgemmSwap
[ 10%] Built target hadamard_product_intrinsics
[ 12%] Built target hadamard_product_vectorize
[ 12%] Built target hadamard_product_O3
[ 14%] Built target plot_hadamardIntrinsics
[ 15%] Built target hadamard_product_Ofast
[ 17%] Built target hadamard_product_O2
[ 18%] Built target hadamard_product_O1
[ 19%] Built target hadamard_product_O0
[ 21%] Built target plot_hadamardBase
[ 23%] Built target plot_hadamardVectorize
[ 23%] Built target saxpy_intrinsics
[ 24%] Built target saxpy_O3
[ 25%] Built target saxpy_vectorize
[ 28%] Built target plot_saxpyIntrinsics
[ 29%] Built target plot_saxpyVectorize
[ 30%] Built target saxpy_Ofast
[ 31%] Built target saxpy_O2
[ 31%] Built target saxpy_O0
[ 32%] Built target saxpy_O1
[ 35%] Built target plot_saxpyBase
[ 36%] Built target reduction_real_intrinsics_O3
[ 37%] Built target reduction_real_intrinsics_interleave8_O3
[ 39%] Built target reduction_real_Ofast
[ 40%] Built target reduction_real_intrinsics_interleave4_O3
[ 41%] Built target reduction_real_vectorize_Ofast
[ 43%] Built target reduction_real_intrinsics_interleave2_O3
[ 47%] Built target plot_reductionIntrinsicsInterleave8
[ 48%] Built target reduction_real_vectorize_O3
[ 50%] Built target reduction_real_O3
[ 52%] Built target plot_reductionVectorize
[ 53%] Built target reduction_O3
[ 54%] Built target reduction_O0
[ 56%] Built target reduction_O1
[ 56%] Built target reduction_O2
[ 58%] Built target plot_reductionBase
[ 59%] Built target reduction_real_O0
[ 60%] Built target reduction_real_O2
[ 63%] Built target reduction_real_O1
[ 67%] Built target plot_reductionReal
[ 69%] Built target plot_reductionIntrinsicsInterleave2
[ 71%] Built target plot_reductionIntrinsicsInterleave4
[ 74%] Built target plot_reductionIntrinsics
[ 75%] Built target barycentre_vectorize_O3
[ 76%] Built target barycentre_intrinsics
[ 78%] Built target barycentre_vectorizeSplit_O3
[ 79%] Built target barycentre_base_O3
[ 81%] Built target plot_barycentreIntrinsics
[ 82%] Built target barycentre_base_O2
[ 84%] Built target barycentre_base_O1
[ 85%] Built target barycentre_base_O0
[ 87%] Built target barycentre_base_Ofast
[ 90%] Built target plot_barycentreBase
[ 92%] Built target plot_barycentreVectorize
[ 93%] Built target sgemm_base_O2
[ 96%] Built target sgemm_base_O1
[ 97%] Built target sgemm_base_O0
[100%] Built target plot_sgemmBase
[100%] Built target plot_all
|
Previous The CMakeLists.txt file |
Parent Let's swap the loops over j and k |
Outline | Next The performances |