
PUBLIC MARKS with tags SSE2 & optimization

27 February 2008 12:45

Simple SSE optimized sin, cos, log and exp

by ogrisel
I chose to write them in pure SSE1+MMX so that they run on your grandmother's Pentium III, and also on my brave Athlon XP, since those beasts are not SSE2 aware. Intel AMath showed me that the performance gain from using SSE2 for this purpose was not large enough (about 10%) to justify providing an SSE2 version (but it can be done very quickly). The functions use only the _mm_ intrinsics; there is no inline assembly in the code. Advantages: the code is easier to debug, works out of the box on 64-bit setups, and lets the compiler choose what is kept in a register and what is stored in memory. Drawback: some versions of gcc 3.x are badly broken with certain intrinsic functions (_mm_movehl_ps, _mm_cmpeq_ps, etc.), MinGW's gcc for example, and beware that the brokenness depends on the optimization level. A workaround is provided (inline asm replacements for the braindead intrinsics); it is not nice, but it is robust, and broken compilers are detected by the validation program below.
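To give a feel for the "intrinsics only" style described above, here is a minimal sketch in C. It is not taken from the library: the helper names (poly3_ps, check_movehl_ps, check_cmpeq_ps, validate_intrinsics) are hypothetical, and the polynomial is a generic Horner evaluation rather than the actual sin/cos/log/exp approximation. The two check functions only illustrate the kind of sanity test a validation program could run on the intrinsics named above; they assume an x86 target with SSE enabled.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE1 intrinsics: __m128 and the _mm_*_ps family */

/* Hypothetical helper: evaluate c0 + c1*x + c2*x^2 + c3*x^3 on four
   floats at once with Horner's scheme, using SSE1 intrinsics only. */
static __m128 poly3_ps(__m128 x, float c0, float c1, float c2, float c3)
{
    __m128 y = _mm_set1_ps(c3);
    y = _mm_add_ps(_mm_mul_ps(y, x), _mm_set1_ps(c2));
    y = _mm_add_ps(_mm_mul_ps(y, x), _mm_set1_ps(c1));
    y = _mm_add_ps(_mm_mul_ps(y, x), _mm_set1_ps(c0));
    return y;
}

/* Hypothetical check: _mm_movehl_ps(a, b) must return
   (b[2], b[3], a[2], a[3]). Returns 1 when the compiler gets it right. */
static int check_movehl_ps(void)
{
    float out[4];
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(5.0f, 6.0f, 7.0f, 8.0f);
    _mm_storeu_ps(out, _mm_movehl_ps(a, b));
    return out[0] == 7.0f && out[1] == 8.0f &&
           out[2] == 3.0f && out[3] == 4.0f;
}

/* Hypothetical check: _mm_cmpeq_ps must set an all-ones mask in the
   lanes that compare equal. Lanes 0 and 2 match here, so the sign-bit
   mask from _mm_movemask_ps is binary 0101 = 5. */
static int check_cmpeq_ps(void)
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(1.0f, 0.0f, 3.0f, 0.0f);
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 5;
}

/* Abort early if either intrinsic misbehaves. */
static void validate_intrinsics(void)
{
    assert(check_movehl_ps());
    assert(check_cmpeq_ps());
}
```

Since the brokenness is said to depend on the optimization level, checks like these would be worth running once per set of compiler flags used for the real build (for example at both -O0 and -O2).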

