<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel rdf:about="http://blogmarks.net/api/marks/tag/trancendental functions">
<title>Public marks with tag &quot;trancendental functions&quot;</title>
<description>Public marks with tag &quot;trancendental functions&quot;</description>
<link>http://blogmarks.net/marks/tag/trancendental functions</link>
<items><rdf:Seq><rdf:li resource="http://blogmarks.net/api/user/ogrisel/mark/1057713990"/>
</rdf:Seq></items>
</channel>
<item rdf:about="http://blogmarks.net/api/user/ogrisel/mark/1057713990">
<title>Simple SSE optimized sin, cos, log and exp</title>
<link>http://gruntthepeon.free.fr/ssemath/</link>
<description>I chose to write them in pure SSE1+MMX so that they run on your grandmother's Pentium III, and also on my brave Athlon XP, since those beasts are not SSE2 aware. Intel AMath showed me that the performance gain from using SSE2 for that purpose was not large enough (10%) to justify providing an SSE2 version (but it can be done very quickly).

The functions use only the _mm_ intrinsics; there is no inline assembly in the code. Advantages: easier to debug, works out of the box on 64-bit setups, and lets the compiler choose what should be stored in a register and what is stored in memory. Drawback: some versions of gcc 3.x are badly broken with certain intrinsic functions (_mm_movehl_ps, _mm_cmpeq_ps, etc.), MinGW's gcc for example. Beware that the brokenness depends on the optimization level. A workaround is provided (inline asm replacements for the braindead intrinsics); it is not nice but it is robust, and broken compilers are detected by the validation program below.</description>
<dc:date>2008-02-27T12:53:10Z</dc:date>
<dc:author>ogrisel</dc:author>
<dc:subject>optimization, SIMD, trancendental functions, exp, log, sin, cos, tan, SSE2, open source</dc:subject>
<content:encoded><![CDATA[<div class="mark">
<a href="http://gruntthepeon.free.fr/ssemath/"><img border="0" src="http://blogmarks.net/screenshots/2008/02/27/c513566d9cae6d163c7bbf37c47f2df9.jpg" alt="" /></a>
<div class="xfolkentry">
<h4><a class="taggedlink" href="http://gruntthepeon.free.fr/ssemath/">Simple SSE optimized sin, cos, log and exp</a></h4>
 
by <a href="http://blogmarks.net/user/ogrisel">ogrisel</a> 
<p class="description">I chose to write them in pure SSE1+MMX so that they run on your grandmother's Pentium III, and also on my brave Athlon XP, since those beasts are not SSE2 aware. Intel AMath showed me that the performance gain from using SSE2 for that purpose was not large enough (10%) to justify providing an SSE2 version (but it can be done very quickly).

The functions use only the _mm_ intrinsics; there is no inline assembly in the code. Advantages: easier to debug, works out of the box on 64-bit setups, and lets the compiler choose what should be stored in a register and what is stored in memory. Drawback: some versions of gcc 3.x are badly broken with certain intrinsic functions (_mm_movehl_ps, _mm_cmpeq_ps, etc.), MinGW's gcc for example. Beware that the brokenness depends on the optimization level. A workaround is provided (inline asm replacements for the braindead intrinsics); it is not nice but it is robust, and broken compilers are detected by the validation program below.</p>
<p class="tags">
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/optimization">optimization</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/SIMD">SIMD</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/trancendental%2Bfunctions">trancendental functions</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/exp">exp</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/log">log</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/sin">sin</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/cos">cos</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/tan">tan</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/SSE2">SSE2</a>
<a rel="tag" class="tag public_tag" href="http://blogmarks.net/marks/tag/open%2Bsource">open source</a>
</p>
<div class="action-bar">
<a href="http://blogmarks.net/my/marks,new?id=1057713990">Copy</a> | 
<a href="http://blogmarks.net/link/2684788">React (0)</a></div>
</div>
</div>
]]></content:encoded>
</item> </rdf:RDF>