SIMD with OpenMP 4.0
OpenMP 4.0 introduces #pragma omp simd and #pragma omp declare simd, which
instruct the compiler to generate vectorized code. This makes it possible
to combine vectorization (SIMD parallelism within a core) with parallel execution
across cores in a controlled manner.
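As a minimal sketch of how the two directives fit together (this is not the mandelbrot example; the function names scale_add, saxpy_simd and saxpy_parallel_simd are made up for illustration), the following C code marks a small helper with declare simd so it can be called from vectorized loops, then uses it once in a SIMD-only loop and once in a loop that is both split across threads and vectorized:

```c
#include <stddef.h>

/* Hypothetical helper. "declare simd" asks the compiler to also generate
   a vector variant of this function that SIMD loops can call. */
#pragma omp declare simd
static inline float scale_add(float a, float x, float y)
{
    return a * x + y;
}

/* y[i] = a * x[i] + y[i], vectorized within a single core.
   The pragma asserts that iterations are independent, so x and y
   must not overlap. */
void saxpy_simd(int n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = scale_add(a, x[i], y[i]);
}

/* Same computation: iterations are distributed over threads (cores),
   and each thread's chunk is vectorized. */
void saxpy_parallel_simd(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = scale_add(a, x[i], y[i]);
}
```

With gcc this would be compiled with something like gcc -O2 -fopenmp; the option -fopenmp-simd enables only the simd directives without threading. Without either option the pragmas are ignored and the code still runs correctly, just without the requested vectorization hints.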
OpenMP 4.0 support requires at least GCC 4.9. Even GCC 5.2 does not generate optimal code for our mandelbrot example; I expect this to improve in future versions.
The Intel compiler, version 16, does a nice job on the same example.