<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for hilbert-space</title>
	<atom:link href="http://hilbert-space.de/?feed=comments-rss2" rel="self" type="application/rss+xml" />
	<link>http://hilbert-space.de</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Wed, 16 May 2012 18:04:27 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>Comment on Compiling CMEM for the Beagleboard&#8230; by terpaccount</title>
		<link>http://hilbert-space.de/?p=14&#038;cpage=1#comment-1149</link>
		<dc:creator>terpaccount</dc:creator>
		<pubDate>Wed, 16 May 2012 18:04:27 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=14#comment-1149</guid>
		<description>2012 and this is still useful.  Very nice, thanks.</description>
		<content:encoded><![CDATA[<p>2012 and this is still useful.  Very nice, thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by Mac OSX and Assembly Programming &#124; Alauda Projects</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-1141</link>
		<dc:creator>Mac OSX and Assembly Programming &#124; Alauda Projects</dc:creator>
		<pubDate>Mon, 30 Apr 2012 09:21:16 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-1141</guid>
		<description>[...] An example using NEON for optimization at Hilbert-Space.de, [...]</description>
		<content:encoded><![CDATA[<p>[...] An example using NEON for optimization at Hilbert-Space.de, [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on DSP default cache-sizes not optimal? by Abhijeet</title>
		<link>http://hilbert-space.de/?p=77&#038;cpage=1#comment-1065</link>
		<dc:creator>Abhijeet</dc:creator>
		<pubDate>Fri, 09 Dec 2011 13:43:05 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=77#comment-1065</guid>
		<description>Hi, 

Could you please let me know which modifications one can make to the cache configuration on the BeagleBoard? I am interested in cache optimization specifically for set-top boxes, but I do not have enough information to go ahead. I already have a BeagleBoard.

Any inputs or pointers will be welcome. 

Thanks, 
Abhijeet</description>
		<content:encoded><![CDATA[<p>Hi, </p>
<p>Could you please let me know which modifications one can make to the cache configuration on the BeagleBoard? I am interested in cache optimization specifically for set-top boxes, but I do not have enough information to go ahead. I already have a BeagleBoard.</p>
<p>Any inputs or pointers will be welcome. </p>
<p>Thanks,<br />
Abhijeet</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by sophana</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-1050</link>
		<dc:creator>sophana</dc:creator>
		<pubDate>Wed, 05 Oct 2011 09:04:42 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-1050</guid>
		<description>As said earlier, the bug was filed in the GCC Bugzilla.
I tested 4.6.1: the generated assembly was much better but still suboptimal.
Latest 4.6 snapshot: same result.

But the latest 4.7 snapshot generates optimal code.

Tested with Android on a Samsung Galaxy S2: the C NEON code is 4 times faster than the C reference.</description>
		<content:encoded><![CDATA[<p>As said earlier, the bug was filed in the GCC Bugzilla.<br />
I tested 4.6.1: the generated assembly was much better but still suboptimal.<br />
Latest 4.6 snapshot: same result.</p>
<p>But the latest 4.7 snapshot generates optimal code.</p>
<p>Tested with Android on a Samsung Galaxy S2: the C NEON code is 4 times faster than the C reference.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by Tom</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-1042</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Fri, 26 Aug 2011 15:44:12 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-1042</guid>
		<description>Hi all!

I have a question on the timings discussed here:
I took all implementations from here and from &quot;pulsar&quot; (incl. the &quot;16 pixels at a time&quot; version)
and compared their runtimes on a 600 MHz ARM with NEON and 256 KB of L2 cache.
When I made sure that none of the image data had been touched beforehand (i.e. it was not cached), I did get the same ordering of execution times
for the different implementations; however, the gains were significantly lower than the values discussed here.
For instance, the &quot;pulsar 16 pixel&quot; implementation (compare comment 24 on this page) was only about 15% faster than the &quot;fastest&quot; implementation on this page
(whereas Etienne claims it to be twice as fast).

Using a small image size and repeating the calculation (so the L2 cache should work perfectly),
the differences in execution time are closer to the expectations mentioned here.

Am I doing anything wrong?</description>
		<content:encoded><![CDATA[<p>Hi all!</p>
<p>I have a question on the timings discussed here:<br />
I took all implementations from here and from &#8220;pulsar&#8221; (incl. the &#8220;16 pixels at a time&#8221; version)<br />
and compared their runtimes on a 600 MHz ARM with NEON and 256 KB of L2 cache.<br />
When I made sure that none of the image data had been touched beforehand (i.e. it was not cached), I did get the same ordering of execution times<br />
for the different implementations; however, the gains were significantly lower than the values discussed here.<br />
For instance, the &#8220;pulsar 16 pixel&#8221; implementation (compare comment 24 on this page) was only about 15% faster than the &#8220;fastest&#8221; implementation on this page<br />
(whereas Etienne claims it to be twice as fast).</p>
<p>Using a small image size and repeating the calculation (so the L2 cache should work perfectly),<br />
the differences in execution time are closer to the expectations mentioned here.</p>
<p>Am I doing anything wrong?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by Ankit</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-989</link>
		<dc:creator>Ankit</dc:creator>
		<pubDate>Fri, 01 Jul 2011 09:50:30 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-989</guid>
		<description>Hi everybody,

I am new to assembly language and to trying to beat the optimizations of professional NEON compilers. Here are my questions:
1) Suppose I have a C file (e.g. fft.c). How can I view its NEON-optimized assembly output using the RealView Development Suite? A command-line invocation would also work, but for Windows only.
2) Are there any general guidelines for converting C code into its NEON assembly counterpart?

Pardon my unprofessional way of asking questions, but I would be very grateful if anyone could answer my queries ASAP.

Thank You</description>
		<content:encoded><![CDATA[<p>Hi everybody,</p>
<p>I am new to assembly language and to trying to beat the optimizations of professional NEON compilers. Here are my questions:<br />
1) Suppose I have a C file (e.g. fft.c). How can I view its NEON-optimized assembly output using the RealView Development Suite? A command-line invocation would also work, but for Windows only.<br />
2) Are there any general guidelines for converting C code into its NEON assembly counterpart?</p>
<p>Pardon my unprofessional way of asking questions, but I would be very grateful if anyone could answer my queries ASAP.</p>
<p>Thank You</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by srinivas</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-980</link>
		<dc:creator>srinivas</dc:creator>
		<pubDate>Mon, 06 Jun 2011 11:32:52 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-980</guid>
		<description>Thanks Nils :)</description>
		<content:encoded><![CDATA[<p>Thanks Nils <img src='http://hilbert-space.de/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by Nils</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-978</link>
		<dc:creator>Nils</dc:creator>
		<pubDate>Sun, 05 Jun 2011 09:19:07 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-978</guid>
		<description>The vld and vst instructions increment the pointers as a side-effect. That&#039;s what the &#039;!&#039; char does in the instruction.</description>
		<content:encoded><![CDATA[<p>The vld and vst instructions increment the pointers as a side-effect. That&#8217;s what the &#8216;!&#8217; char does in the instruction.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by srinivas</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-976</link>
		<dc:creator>srinivas</dc:creator>
		<pubDate>Sat, 04 Jun 2011 19:23:06 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-976</guid>
		<description>Hi,
I&#039;m new to assembly programming, and I tried to understand the code by converting each intrinsic into ARM assembly. I cannot find any assembly instructions corresponding to &quot;src += 8*3; dest += 8;&quot; in rgb_to_gray.s (although the assembly file works as expected when executed).

Can someone kindly shed some light on this?</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I&#8217;m new to assembly programming, and I tried to understand the code by converting each intrinsic into ARM assembly. I cannot find any assembly instructions corresponding to &#8220;src += 8*3; dest += 8;&#8221; in rgb_to_gray.s (although the assembly file works as expected when executed).</p>
<p>Can someone kindly shed some light on this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ARM NEON Optimization. An Example by Michael</title>
		<link>http://hilbert-space.de/?p=22&#038;cpage=1#comment-973</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 25 May 2011 09:41:37 +0000</pubDate>
		<guid isPermaLink="false">http://hilbert-space.de/?p=22#comment-973</guid>
		<description>Hi, mike
I can&#039;t enable NEON in VS2008 with Compact 7. Do you know how to do this?
Also, I can&#039;t compile your sample NEON code successfully in VS2008 with Compact 7.
Do you know how to enable NEON support there?</description>
		<content:encoded><![CDATA[<p>Hi, mike<br />
I can&#8217;t enable NEON in VS2008 with Compact 7. Do you know how to do this?<br />
Also, I can&#8217;t compile your sample NEON code successfully in VS2008 with Compact 7.<br />
Do you know how to enable NEON support there?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

