<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>webr3.org &#187; optimization</title>
	<atom:link href="http://webr3.org/blog/category/optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://webr3.org/blog</link>
	<description>brain&#039;s on fire!</description>
	<lastBuildDate>Tue, 19 Jul 2011 15:38:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A lighter way to configure Apache for FOAF+SSL</title>
		<link>http://webr3.org/blog/optimization/a-lighter-way-to-configure-apache-for-foafssl/</link>
		<comments>http://webr3.org/blog/optimization/a-lighter-way-to-configure-apache-for-foafssl/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 17:31:00 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[Apache Corporation]]></category>
		<category><![CDATA[Computer networking]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[Cryptographic protocols]]></category>
		<category><![CDATA[Electronic commerce]]></category>
		<category><![CDATA[FOAF]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[Internet protocols]]></category>
		<category><![CDATA[Secure communication]]></category>
		<category><![CDATA[SSL]]></category>
		<category><![CDATA[Technology/Internet]]></category>
		<category><![CDATA[Transport Layer Security]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=303</guid>
		<description><![CDATA[Just a snippet post to say that I've found a lighter (and imho preferable) way to configure Apache to accept client side SSL certificates (with regards to FOAF+SSL).
The Standard Way
This approach exports all of the SSL data, including both client and server certificates, which (as the mod_ssl notes point out) carries a performance penalty.

   SSLVerifyClient optional_no_ca
   [...]]]></description>
			<content:encoded><![CDATA[<p>Just a snippet post to say that I've found a lighter (and imho preferable) way to configure Apache to accept client side SSL certificates (with regards to FOAF+SSL).</p>
<p><strong>The Standard Way</strong><br />
This way essentially exports all SSL data, certs, client and server side if you read the notes has performance penalty.<br />
<code><br />
   SSLVerifyClient optional_no_ca<br />
   SSLVerifyDepth 1<br />
   SSLOptions +StdEnvVars<br />
   SSLOptions +ExportCertData<br />
</code></p>
<p><strong>The Lighter Way</strong><br />
This approach simply passes SSL_CLIENT_CERT into the REMOTE_USER environment variable and skips everything else, which you don't need for FOAF+SSL.<br />
<code><br />
   SSLVerifyClient optional_no_ca<br />
   SSLVerifyDepth 1<br />
   SSLUserName SSL_CLIENT_CERT<br />
</code></p>
<p>Tested and works very nicely (again, imho).</p>
<p>Note: enabling SSLOptions +FakeBasicAuth overrides this, and the user name becomes the Subject of the client-side certificate instead.</p>
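<p>The application on the other side then has to pull the certificate back out of REMOTE_USER. Below is a minimal TypeScript sketch. It assumes the variable holds a standard PEM-armoured certificate, and the helper name is hypothetical rather than part of any FOAF+SSL library:</p>

```typescript
// Hypothetical helper: given the value Apache placed in REMOTE_USER via
// "SSLUserName SSL_CLIENT_CERT" (assumed to be a PEM-encoded X.509 certificate),
// return the base64 body without the armour lines and line breaks,
// or null if the value contains no PEM certificate.
function pemCertificateBody(remoteUser: string): string | null {
  const match = /-----BEGIN CERTIFICATE-----([\s\S]+?)-----END CERTIFICATE-----/.exec(remoteUser);
  if (match === null) {
    return null;
  }
  // Joining the base64 lines gives one string that can be decoded to DER and
  // handed to an X.509 parser to read the public key and subjectAltName URI.
  return match[1].replace(/\s+/g, "");
}
```

<p>Stripping to a single base64 string keeps the step independent of whichever X.509 parser you choose to use afterwards.</p>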
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/optimization/a-lighter-way-to-configure-apache-for-foafssl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flat Packing, the ultimate code optimization?</title>
		<link>http://webr3.org/blog/optimization/flat-packing-the-ultimate-code-optimization/</link>
		<comments>http://webr3.org/blog/optimization/flat-packing-the-ultimate-code-optimization/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 00:44:50 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[optimization]]></category>
		<category><![CDATA[ActionScript]]></category>
		<category><![CDATA[Class]]></category>
		<category><![CDATA[Compiler optimizations]]></category>
		<category><![CDATA[Curly bracket programming languages]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[Inline expansion]]></category>
		<category><![CDATA[Java programming language]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=147</guid>
		<description><![CDATA[I've been thinking about something recently, I guess this one's a big thing and hopefully somebody will go and implement this very soon.
A lot of work is always done on the code optimization side of things, and recently this has hit the ActionScript world; most notably (imho) with haxe and TAAS - both of these [...]]]></description>
			<content:encoded><![CDATA[<p>I've been thinking about something recently, I guess this one's a big thing and hopefully somebody will go and implement this very soon.</p>
<p>A lot of work is always done on the code optimization side of things, and recently this has hit the ActionScript world; most notably (imho) with haxe and TAAS - both of these things have a lot of optimizations improving the compile process, which means our code ultimately runs faster :) inlining, code reduction, dead code elimination, flow optimization etc.</p>
<p>Ultimately, the fastest compiled script will always be a flat set of vm operations with none of that expensive method calling and instantiation, and lets remember that OO is pretty much a syntax we use to develop, haxe makes it clear with its common syntax which compiles to multiple vm targets.</p>
<p>So, here's the background question - why do our actionscript classes compile to compiled classes?, all the code from those nested called methods could be factored right the way back in to one big method, and a single class, then heavily optimized - imagine the speed gains.</p>
<p>Thinking further, implementing doesn't appear too hard - i mean all your going to need to do is essentially copy the code from one method and place it in the calling method at the correct position(s); any duplicate code introduced would be offset easily by the amount of code cut out by removing all those un-needed classes and methods - further the unused classes and methods would be lost; and the final chunks of code could be optimized to hell and back.</p>
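<p>The core transformation is copying a callee's body into its caller, and it can be shown by hand. The TypeScript example below is illustrative only and is not a flat-packing implementation; the class and function names are made up. A real tool would perform this rewrite on compiled code, across whole class hierarchies:</p>

```typescript
// "Before": typical OO code, where the hot loop pays for a method call per element.
class Vec2 {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }
  lengthSquared(): number {
    return this.x * this.x + this.y * this.y;
  }
}

function totalLengthSquared(points: Vec2[]): number {
  let total = 0;
  for (const p of points) {
    total += p.lengthSquared(); // one call per point
  }
  return total;
}

// "After": the body of lengthSquared() copied into the caller at the call site,
// the kind of rewrite a flat-packing compiler would perform automatically.
function totalLengthSquaredFlat(points: Vec2[]): number {
  let total = 0;
  for (const p of points) {
    total += p.x * p.x + p.y * p.y; // inlined body, no call
  }
  return total;
}
```

<p>Both functions compute the same result. The second has no call overhead and gives an optimizer one larger block to work on, which is the effect flat packing aims for at whole-program scale.</p>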
<p>The only problem I can see is that we sometimes need compiled classes to distribute or pull in remotely. That's why I'd propose a simple "Service" interface (or suchlike): any class marked as a service class (or gateway?) would keep its class structure and public methods. After all, everything behind the public methods is of no concern to any other part of the system.</p>
<p>Consider a Flex application, for example, or that class which uses circa 5% of all the large libs included in the final compiled script. All the unused code would go, not to mention the additional optimizations that could be made on the big blocks of code in each method!</p>
<p>So that's it, Flat Packing - the ultimate optimization?</p>
<p>ps: I'm quite sure this applies to most OO languages, not just ActionScript (the haXe guys will agree, I'm sure, or at least I hope).</p>
<p>Regards!</p>
<p><img src="http://webr3.org/blog/wp-content/uploads/2009/09/flatpack.jpg" alt="flatpack" title="flatpack" width="600" height="250" class="alignnone size-full wp-image-148" /></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/optimization/flat-packing-the-ultimate-code-optimization/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Optimized Flash Player 10 Z-Sorting Class</title>
		<link>http://webr3.org/blog/flash-10/optimized-flash-player-10-z-sorting-class/</link>
		<comments>http://webr3.org/blog/flash-10/optimized-flash-player-10-z-sorting-class/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 21:54:57 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[Ralph Hauwert]]></category>
		<category><![CDATA[Vector]]></category>
		<category><![CDATA[z-sort]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=138</guid>
		<description><![CDATA[Following my earlier post on using vectors to Z-sort instead of arrays, I've updated Ralph Hauwerts SimpleZSorter util class released back in late 2008 on Lee Brimelows flash blog.
The change isn't a massive one, but does speed things up - simply it replaces the old:
important: turns out this runs slower if the z-val is outside [...]]]></description>
			<content:encoded><![CDATA[<p>Following my <a href="http://webr3.org/blog/flash-10/fast-z-sorting-in-flash-10/">earlier post on using vectors to Z-sort</a> instead of arrays, I've updated <a href="http://www.unitzeroone.com/blog/" target="_blank">Ralph Hauwert</a>s <a href="http://code.google.com/p/leebrimelow/source/browse/#svn/trunk/as3/com/theflashblog/util3d">SimpleZSorter util class</a> released back in late 2008 on <a href="http://theflashblog.com/?p=470" target="_blank">Lee Brimelows flash blog</a>.</p>
<p>The change isn't a massive one, <del datetime="2009-09-14T09:15:20+00:00">but does speed things up</del> - simply it replaces the old:</p>
<p><strong>important: turns it this runs slower if the z-val is outside of the range 0-1; a LOT slower, so don't use!</strong></p>
<pre class="brush: as3; title: ;">
sortArray.sortOn(&quot;screenZ&quot;, Array.NUMERIC | Array.DESCENDING);
</pre>
<p>with the new:</p>
<pre class="brush: as3; title: ;">
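// Descending by screenZ, matching Array.DESCENDING above. Returning Number (not int) keeps fractional differences that an int return type would truncate to 0.
sortVector.sort( function compare( x : SimpleZSortVO , y : SimpleZSortVO ) : Number { return y.screenZ - x.screenZ; } );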
</pre>
<p>And here's <a href="http://webr3.org/experiments/flash-10/fast-z-sort/SimpleZSorter.zip">the source all nicely zipped up</a>. You should be able to simply drop it into your FP10 SWFs as a replacement for the older version.</p>
<p>Thanks to Ralph for, one, open-sourcing it and, two, giving me the heads-up that it's okay to "upgrade". I'm sure it can be made faster still with a bit of thought.</p>
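<p>Two details of the comparator are easy to get wrong. First, the original sortOn call sorts descending (farthest first), whereas a comparator that returns x - y sorts ascending. Second, a comparator declared to return int truncates any difference smaller than 1 to 0, so depths in the 0-1 range all compare as equal. The TypeScript sketch below shows both points; the interface and function names are illustrative and are not taken from SimpleZSorter:</p>

```typescript
interface DepthItem {
  screenZ: number;
}

// Farthest first, like sortOn("screenZ", Array.NUMERIC | Array.DESCENDING):
// b - a is negative when a is deeper, so larger screenZ values come first.
function sortFarthestFirst(items: DepthItem[]): DepthItem[] {
  return items.sort((a, b) => b.screenZ - a.screenZ);
}

// What an int-returning comparator effectively computes: "| 0" applies the
// same 32-bit integer conversion, so any pair closer than 1.0 compares as equal.
function intTruncatedCompare(a: DepthItem, b: DepthItem): number {
  return (b.screenZ - a.screenZ) | 0;
}
```

<p>With depths such as 0.2, 0.9 and 0.5, the first function yields 0.9, 0.5, 0.2. The truncating comparator reports every one of those pairs as equal, so it gives the sort nothing to work with.</p>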
<p><img class="alignnone size-full wp-image-140" title="z-sort2" src="http://webr3.org/blog/wp-content/uploads/2009/09/z-sort2.jpg" alt="z-sort2" width="600" height="250" /></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/flash-10/optimized-flash-player-10-z-sorting-class/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>PixelBender is more useful than I assumed!</title>
		<link>http://webr3.org/blog/general/pixelbender-is-more-useful-than-i-assumed/</link>
		<comments>http://webr3.org/blog/general/pixelbender-is-more-useful-than-i-assumed/#comments</comments>
		<pubDate>Sat, 18 Jul 2009 11:18:25 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[PixelBender]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[3D computer graphics]]></category>
		<category><![CDATA[Abstract algebra]]></category>
		<category><![CDATA[ActionScript]]></category>
		<category><![CDATA[Computer graphics]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[image processing algorithms]]></category>
		<category><![CDATA[Linear algebra]]></category>
		<category><![CDATA[Perlin Noise]]></category>
		<category><![CDATA[Pixel Bender]]></category>
		<category><![CDATA[Shader]]></category>
		<category><![CDATA[Technology/Internet]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=94</guid>
		<description><![CDATA[All too often it's easy to overlook a technology, or indeed its uses, by mentally boxing it into the role it's advertised to fill.
PixelBender is one of these technologies, as advertised "You can use the Pixel Bender kernel language to implement image processing algorithms (filters or effects) in a hardware-independent manner"; however this [...]]]></description>
			<content:encoded><![CDATA[<p>All too often it's too easy to overlook a technology, or indeed its uses, by mentaly boxing it in to the role it's advertised to fill.</p>
<p>PixelBender is one of these technologies, as advertised "You can use the Pixel Bender kernel language to implement image processing algorithms (filters or effects) in a hardware-independent manner"; however this isn't the whole truth of the matter.</p>
<p>PixelBender is a highly under-rated and under-used technology imho. Simply put PixelBender is actually a very fast way of converting or manipulating large sets of numerical data, whether it's image based or not. In this post I'm only going to skim the surface and give some pointers.</p>
<p>The standard input types cover Vector, ByteArray and BitmapData, likewise with output data, additionally many types are supported for parameters, everything from ints up to 4x4 Matrix' of floats.</p>
<p>Some obvious and simple uses include:</p>
<ul>
<li>3D Data Manipulation (Float3 input and output - rgb or xyz)</li>
<li>Raw Sound Data Manipulation (from Sound.extract, you could even build an FFT implementation in PixelBender)</li>
<li>Visual Effects Calculations</li>
<li>Batch Processing and Data Transformation.</li>
</ul>
<p>Starting with Batch Processing and Data Transformation, here is the simplest PixelBender usage example you could ever get.</p>
<h3>Problem</h3>
<p>Turn a BitmapData into a Vector of RGB float values, where each value is a float between 0 and 1 and each set of 3 values makes up one pixel.</p>
<p>Doing this with pure AS3 or haXe (or anything else) is quite CPU-intensive. That is where PixelBender comes into the equation.</p>
<p><strong>Solution</strong><br />
PixelBender Code:</p>
<pre>&lt;languageVersion : 1.0;&gt;
kernel Identity
&lt;
    namespace : "org.webr3.pb";
    vendor : "WebR3";
    version : 1;
    description : "Converts BitmapData to Vector Number (rgb)";
&gt;
{
    input image3 src;
    output pixel3 dst;

    void evaluatePixel()
    {
        // copy the source pixel's r,g,b (floats in 0..1) straight to the output
        dst = sampleNearest( src, outCoord() );
    }
}</pre>
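<p>To see what the kernel saves you, here is roughly the same per-pixel work in plain code, written as a TypeScript sketch. It assumes the pixels arrive as 32-bit ARGB values (the layout BitmapData.getPixel32 returns) and that sampleNearest yields channels normalized to 0..1. Alpha is ignored, because the kernel's input is image3:</p>

```typescript
// CPU reference for the Identity kernel: each 32-bit ARGB pixel becomes three
// floats (r, g, b), each channel divided by 255 so it lies in 0..1.
function bitmapToRgbFloats(argbPixels: Uint32Array): Float32Array {
  const out = new Float32Array(argbPixels.length * 3);
  for (let i = 0; i < argbPixels.length; i++) {
    const p = argbPixels[i];
    out[i * 3] = ((p >>> 16) & 0xff) / 255;    // red
    out[i * 3 + 1] = ((p >>> 8) & 0xff) / 255; // green
    out[i * 3 + 2] = (p & 0xff) / 255;         // blue
  }
  return out;
}
```

<p>This loop runs once per pixel on the main thread. The kernel does the same arithmetic, but PixelBender executes it outside the AS3 interpreter.</p>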
<p>And to use this we can create a simple wrapper class in AS3:</p>
<pre>package test
{
    import __AS3__.vec.Vector;

    import flash.display.BitmapData;
    import flash.display.Shader;
    import flash.display.ShaderJob;
    import flash.utils.ByteArray;

    public class ShaderVector
    {
        // the compiled Identity kernel, embedded as raw bytes
        [Embed("/rgb.pbj", mimeType="application/octet-stream")]
        private var shaderClass : Class;

        private var shader : Shader;

        public function ShaderVector()
        {
            shader = new Shader(new shaderClass() as ByteArray);
        }

        public function convert( img : BitmapData ) : Vector.&lt;Number&gt;
        {
            // 3 floats (r,g,b) per pixel
            var output : Vector.&lt;Number&gt; = new Vector.&lt;Number&gt;( (img.width*img.height) * 3 , false );
            shader.data.src.input = img;
            var job : ShaderJob = new ShaderJob( shader , output , img.width , img.height );
            job.start( true ); // true = run synchronously, so output is filled when this returns
            output.fixed = true;
            return output;
        }
    }
}</pre>
<p>Now we've got a kind of worker/utility class that we can farm all the hard work out to. Usage is simple:</p>
<pre>var worker : ShaderVector = new ShaderVector();
var rgb : Vector.&lt;Number&gt; = worker.convert( someBitmapData );</pre>
<p>That's it: in one line we've solved the problem.</p>
<p>PixelBender is a lot easier than you may think. The code above is a small model you can easily adapt; just embed different PixelBender files.</p>
<p>Here's another simple kernel file. This one takes a BitmapData of Perlin noise and converts it into a Vector of 3D positions for us to use.</p>
<p>It takes the bitmap data, converts each pixel's r,g,b values into floats from -0.5 to +0.5, scales them up, and finally returns a vector of what are essentially 3D Perlin noise particles (like the ones in my <a href="http://webr3.org/blog/haxe/3d-perlin-particle-light-cloud-and-source-haxe-flash-10/">earlier experiments</a>).</p>
<pre>&lt;languageVersion : 1.0;&gt;
kernel Scaler
&lt;
    namespace : "org.webr3.pb";
    vendor : "WebR3";
    version : 1;
    description : "For Perlin Noise, Converts BitmapData to 3D x,y,z positions";
&gt;
{
    input image3 src;
    output pixel3 dst;

    parameter float scaleX
    &lt;
        defaultValue : float(1.0);
        minValue     : float(0.1);
        maxValue     : float(100000.0);
    &gt;;

    parameter float scaleY
    &lt;
        defaultValue : float(1.0);
        minValue     : float(0.1);
        maxValue     : float(100000.0);
    &gt;;

    parameter float scaleZ
    &lt;
        defaultValue : float(1.0);
        minValue     : float(0.000001);
        maxValue     : float(100000.0);
    &gt;;

    void evaluatePixel()
    {
        dst = sampleNearest( src, outCoord() ); // r,g,b in 0..1
        dst -= 0.5;                             // centre on zero: -0.5..+0.5
        dst.r *= scaleX;                        // red becomes x
        dst.g *= scaleY;                        // green becomes y
        dst.b *= scaleZ;                        // blue becomes z
    }
}</pre>
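<p>Per pixel, the Scaler kernel is just a subtraction and three multiplications. The TypeScript sketch below does the same arithmetic. It assumes the input is a flat array of r,g,b floats in 0..1, like the Identity kernel's output, and the function name is made up:</p>

```typescript
// Mirrors the Scaler kernel's evaluatePixel on already-normalized r,g,b triples:
// subtract 0.5 from every channel (dst -= 0.5), then scale r, g, b as x, y, z.
function scaleToPositions(rgb: Float32Array, scaleX: number, scaleY: number, scaleZ: number): Float32Array {
  const out = new Float32Array(rgb.length);
  for (let i = 0; i + 2 < rgb.length; i += 3) {
    out[i] = (rgb[i] - 0.5) * scaleX;         // x from red
    out[i + 1] = (rgb[i + 1] - 0.5) * scaleY; // y from green
    out[i + 2] = (rgb[i + 2] - 0.5) * scaleZ; // z from blue
  }
  return out;
}
```

<p>With every scale set to 200, as in the wrapper class, each coordinate lands in the range -100 to +100.</p>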
<p>And here is the modified ShaderVector. The embedded .pbj now needs to be the compiled Scaler kernel, and only 3 new lines are needed to set up the scale values:</p>
<pre>package test
{
    import __AS3__.vec.Vector;

    import flash.display.BitmapData;
    import flash.display.Shader;
    import flash.display.ShaderJob;
    import flash.utils.ByteArray;

    public class ShaderVector
    {
        // this must now be the compiled Scaler kernel, which defines scaleX/Y/Z
        [Embed("/rgb.pbj", mimeType="application/octet-stream")]
        private var shaderClass : Class;

        private var shader : Shader;

        public function ShaderVector()
        {
            shader = new Shader(new shaderClass() as ByteArray);
            // ShaderParameter values are always set as Arrays
            shader.data.scaleX.value = [200];
            shader.data.scaleY.value = [200];
            shader.data.scaleZ.value = [200];
        }

        public function convert( img : BitmapData ) : Vector.&lt;Number&gt;
        {
            // 3 floats (x,y,z) per pixel
            var output : Vector.&lt;Number&gt; = new Vector.&lt;Number&gt;( (img.width*img.height) * 3 , false );
            shader.data.src.input = img;
            var job : ShaderJob = new ShaderJob( shader , output , img.width , img.height );
            job.start( true ); // run synchronously
            output.fixed = true;
            return output;
        }
    }
}</pre>
<p>If you're thinking this is complex, it honestly isn't. Try coding this in pure AS3 and you'll end up with a lot of complex, slow code. Been there, done that!</p>
<p>I'm going to leave the examples there for now. If you're looking for a 3D example, then <a href="http://bit.ly/JnQzA" target="_blank">check the PixelBender files in Ralph's 3D Particle example</a>.</p>
<h3>Conclusion</h3>
<p>AS3 together with PixelBender, haXe and Alchemy is fast becoming an immense technology for us developers. Have a play around; you may be surprised by the results you can achieve with them.</p>
<p>Coming very soon, I'll be posting some examples that show what can be done when you use PixelBender, Fast Memory, Vectors and haXe / AS3 together in unconventional ways. Until then, I hope this is enough to get some more devs playing with PixelBender.</p>
<p>peace ::</p>
<p><img class="alignnone size-full wp-image-96" title="pb" src="http://webr3.org/blog/wp-content/uploads/2009/07/pb.jpg" alt="pb" width="600" height="250" /></p>
<p>image is a still from <a href="http://www.apple.com/trailers/wb/iamlegendawakening/" target="_self">I am Legend : Awakening</a> a very well made 5 minute short animation.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/general/pixelbender-is-more-useful-than-i-assumed/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Speeding up Apache2 and MySQL5 the easy way.</title>
		<link>http://webr3.org/blog/optimization/speeding-up-apache2-and-mysql5-the-easy-way/</link>
		<comments>http://webr3.org/blog/optimization/speeding-up-apache2-and-mysql5-the-easy-way/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 22:28:14 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[backend db server]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[HTTP persistent connection]]></category>
		<category><![CDATA[Hypertext Transfer Protocol]]></category>
		<category><![CDATA[Keepalive]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[ram]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Technology/Internet]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=85</guid>
		<description><![CDATA[One of the sites I manage is what I'd call a high-traffic website. Traffic sits at around 900 requests a second all day, climbs to around 3500 a second in rush hour (when the day's new stories are published each morning), and reaches nearer 6000 a second at the very peak.
Skippable Details
All this traffic is round [...]]]></description>
			<content:encoded><![CDATA[<p>One of the sites I manage is what I'd would call a high traffic website, traffic sits around 900 requests a second all day and peaks around 3500 a second in rush hour (when the new stories for the day are published each morning). Nearer 6000 per second at peak.</p>
<h3>Skippable Details</h3>
<p>All this traffic is round robin'd to 2 apache2 servers (each is 2x 2.86 quad xeons w/ 4GB ram) and one backend db server of similar spec.</p>
<p>Over the years familiar problems rear there heads, the mysql "Too Many Connections" error, Apache running like a dog under stress etc.</p>
<p>This weekend I got the chance to rebuild the servers from scratch, and a chance to optimize everything - works very very well now!</p>
<h3>Easy Optimization - Apache2</h3>
<p>KeepAliveTimeout: Number of seconds to wait for the next request from the same client on the same connection.</p>
<p>Translate this as "number of seconds to lock up a worker thread just in case another request comes in from the same client on the same connection". In short, a lot of browsers will request the page, keep the connection alive, and use it to request images, CSS etc.</p>
<p>The default is 15 seconds. In my experience about 40-50% of connections will use KeepAlive. Suppose you have enough RAM to support 500 requests before going into swap, and you get 500 requests a second. By the time 3 seconds have passed, you've got a load of workers holding RAM while doing nothing. From that point on, basically all the hard work is getting done in swap rather than RAM.</p>
<p>Thus, the simple fix is to knock KeepAliveTimeout right down. Some people recommend 2 seconds; my personal preference is 4 seconds, which works a charm for me :)</p>
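<p>A crude model shows why the timeout has such a large effect. Assume (this is a simplification, not how Apache actually schedules workers) that every connection ending in keep-alive parks one worker for the full KeepAliveTimeout. By Little's law, the number of parked workers is then the rate of such connections multiplied by the time each one is held. A TypeScript sketch:</p>

```typescript
// Rough upper-bound model (an assumption): each connection that ends in
// keep-alive holds one idle worker for the whole KeepAliveTimeout, so
// parked workers = (new keep-alive connections per second) x (seconds held).
function parkedKeepAliveWorkers(newConnectionsPerSecond: number, keepAliveFraction: number, keepAliveTimeoutSeconds: number): number {
  return newConnectionsPerSecond * keepAliveFraction * keepAliveTimeoutSeconds;
}
```

<p>The count is linear in the timeout. Dropping KeepAliveTimeout from the default 15 seconds to 4 therefore cuts the parked workers by roughly 73% in this model, whatever your traffic numbers are.</p>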
<p>To analyse this yourself, just check the server-status page provided by Apache. It's normally only viewable from localhost; an easy way around this is &lt;?php echo file_get_contents('http://localhost/server-status'); ?&gt;</p>
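<p>mod_status also has a machine-readable mode. Requesting server-status?auto returns plain "Key: value" lines, including a Scoreboard line with one character per worker slot, where K marks a worker held in keep-alive. The TypeScript sketch below tallies those characters. The function name is hypothetical, and fetching the page is left out:</p>

```typescript
// Count worker states in the "Scoreboard:" line of server-status?auto output.
// Each character is one worker slot: "_" waiting, "W" sending reply,
// "K" keep-alive, "." open slot, and so on.
function scoreboardCounts(statusText: string): { [state: string]: number } {
  const counts: { [state: string]: number } = {};
  const prefix = "Scoreboard:";
  for (const line of statusText.split("\n")) {
    if (line.indexOf(prefix) !== 0) {
      continue; // not the scoreboard line
    }
    const board = line.slice(prefix.length).trim();
    for (let i = 0; i < board.length; i++) {
      const state = board.charAt(i);
      counts[state] = (counts[state] || 0) + 1;
    }
  }
  return counts;
}
```

<p>Comparing the K count before and after changing KeepAliveTimeout gives you a concrete number to watch alongside CPU and memory usage.</p>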
<h3>Easy Optimization - MySQL Server</h3>
<p>Other than the normal cnf values you should be optimizing [ key_buffer, sort_buffer_size, max_allowed_packet, thread_stack, thread_cache_size, max_connections, table_cache, query_cache_limit, query_cache_size ], there is one more that is very important, namely wait_timeout.</p>
<p>wait_timeout - The number of seconds the server waits for activity on a noninteractive connection before closing it.</p>
<p>By default this value is very high (28800 seconds, i.e. 8 hours), but often you get connections that just aren't closed properly, especially in the PHP/Apache world. If you ever hit that "Too many connections" error, or see a lot of connections open for prolonged periods of time, then this could (and should) be addressed.</p>
<p>Often you will see connections sitting idle for 10-15 seconds for no good reason. Best to address it, I'd say, and so we lower wait_timeout to 5 seconds.</p>
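<p>Here is a rough way to see why wait_timeout matters for "Too many connections". Suppose connections leak at a steady rate and each one lingers until wait_timeout closes it. The number of lingering connections then settles at about the leak rate multiplied by wait_timeout. The TypeScript sketch below uses that simplified assumption; it ignores healthy connections, and the names and figures are illustrative:</p>

```typescript
// Simplified model (an assumption): leaked connections arrive at a steady rate
// and each stays open until MySQL's wait_timeout closes it. Returns roughly how
// many seconds until max_connections is used up, or Infinity if the steady-state
// number of lingering connections stays below the limit.
function secondsUntilTooManyConnections(leakedPerSecond: number, waitTimeoutSeconds: number, maxConnections: number): number {
  const steadyState = leakedPerSecond * waitTimeoutSeconds;
  if (steadyState < maxConnections) {
    return Infinity; // the timeout reaps connections faster than the limit is reached
  }
  return maxConnections / leakedPerSecond;
}
```

<p>Take a hypothetical 2 leaked connections a second with max_connections at 500. The 8-hour default runs out of connections in about 250 seconds, while a 5-second wait_timeout settles at around 10 lingering connections.</p>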
<p>Connections are cleared faster, more resources are free for other connections, and all round it's good. Remember, one of MySQL's strongest features is its ability to open and close connections ultra-fast.</p>
<h3>Summary</h3>
<p>Honestly, just check your Apache server status. See how many connections are open and how many of them are flagged K for keep-alive, and take a note of processor and memory usage. Then drop KeepAliveTimeout to 2, 4, 6 or 8 and check it all again. You may be very surprised; it sure beats weeks of manual code optimization to make those scripts run faster.</p>
<p>Regards</p>
<p><img class="alignnone size-full wp-image-86" title="top" src="http://webr3.org/blog/wp-content/uploads/2009/07/top.jpg" alt="top" width="600" height="250" /></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/optimization/speeding-up-apache2-and-mysql5-the-easy-way/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Something about practical usage of flash.Memory in HaXe</title>
		<link>http://webr3.org/blog/haxe/something-about-practical-usage-of-flash-memory-in-haxe/</link>
		<comments>http://webr3.org/blog/haxe/something-about-practical-usage-of-flash-memory-in-haxe/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 17:17:18 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[3D computer graphics]]></category>
		<category><![CDATA[animation]]></category>
		<category><![CDATA[Arrays]]></category>
		<category><![CDATA[ByteArray]]></category>
		<category><![CDATA[Class]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Fast Memory]]></category>
		<category><![CDATA[flash player]]></category>
		<category><![CDATA[Integer]]></category>
		<category><![CDATA[Lookup table]]></category>
		<category><![CDATA[ram]]></category>
		<category><![CDATA[Technology/Internet]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=79</guid>
		<description><![CDATA[It's dawned on me that I've completely neglected to mention why the haXe implementation of flash.Memory (flash player 10's new opcodes / fastmemory support) is so good.
Other than the really obvious bit that its direct access to a block of ram/fast memory which works extremely quickly, there is one other small but vital detail that's [...]]]></description>
			<content:encoded><![CDATA[<p>It's dawned on me that I've completely negated to mention why the haXe implementation of flash.Memory (flash player 10's new opcodes / fastmemory support) is so good.</p>
<p>Other than the really obvious bit that its direct access to a block of ram/fast memory which works extremely quickly, there is one other small but vital detail that's haXe specific and makes a vast difference between alchemy and haxe.</p>
<h2>The Big Difference</h2>
<p>You can think of flash.Memory as a static ByteArray, thus it is accessible from <em>both</em> static class and instance methods.</p>
<p>Now, another feature which offers huge speed increases in haXe is inlining.</p>
<p>Spelling it out: flash.Memory access is all static, so the methods that use it can be inlined without needing any instance to work on. So stick all your data (or big chunks of it) in flash.Memory, and then inline most of your heavy number-crunching class.</p>
<p><strong>All inlined, all using fast memory = very very quick swfs!</strong></p>
<h2>Ideas and Tricks</h2>
<p>First of all, get the idea that you can only have one chunk or type of data in fast memory at a time out of your head. You can partition it up and load everything in there.</p>
<p>A common setup I use is to have the first 1024 bytes as fast storage for float, int, byte and uint variables, and the next xxx-thousand bytes as raw bitmap data. After that comes all the other data I may want, like a block of 500k x,y,z floats for 3D data, followed by a few lookup charts.</p>
<p>The first 1024 bytes are fast storage for float, int, byte and uint variables <em>because</em> you can then access the most commonly used vars using small int offsets (0-255). Since it's a small int you're using, it avoids the stacks completely: the bytecode contains only the opcode to get the value from fast memory, followed by the offset embedded in the bytecode itself. For anything 256 and over, it's going to hit the int stack to get the offset first (which is barely noticeable, but you know, every fraction counts).</p>
<p>The next xxx-thousand bytes hold raw bitmap data <em>because </em>you can then work on your bitmap data ultra fast and write it back to the BitmapData instance using setPixels. That's very fast indeed, even faster than a fixed-size Vector.</p>
<p>Then all the other data. This part is partially obvious; what may not be is the use of lookup charts, but I'll cover that in a moment.</p>
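<p>That partitioning is just offset arithmetic decided up front. In the TypeScript sketch below, the region order follows the setup above, but the sizes are assumptions you would pick for your own data: 4-byte ARGB pixels, 4-byte floats for x,y,z, and a 256-entry float lookup chart. The function and field names are made up:</p>

```typescript
interface MemoryLayout {
  scalars: number; // byte offset of the hot scalar variables
  bitmap: number;  // byte offset of the raw ARGB pixels
  xyz: number;     // byte offset of the x,y,z float block
  lookup: number;  // byte offset of the lookup chart
  total: number;   // total bytes needed
}

// Compute where each region starts, laying them out back to back.
function planMemoryLayout(bitmapWidth: number, bitmapHeight: number, pointCount: number): MemoryLayout {
  const scalars = 0;                                   // 1024 bytes for frequently used vars
  const bitmap = scalars + 1024;                       // 4 bytes per ARGB pixel
  const xyz = bitmap + bitmapWidth * bitmapHeight * 4;
  const lookup = xyz + pointCount * 3 * 4;             // 3 floats x 4 bytes per point
  const total = lookup + 256 * 4;                      // one float per byte value 0..255
  return { scalars, bitmap, xyz, lookup, total };
}
```

<p>The total is the size of the ByteArray you would allocate and hand to fast memory. Every later read or write then adds its index to one of these base offsets.</p>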
<h3>Accessing</h3>
<p>Remember that flash.Memory differs from a ByteArray in that you have to work out the byte offset of each value you want to grab yourself, based on the size of each type:</p>
<ul>
<li>Byte = 1 Byte</li>
<li>Int = 4 Bytes</li>
<li>Float = 4 Bytes</li>
<li>Double = 8 Bytes</li>
</ul>
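<p>In practice, every access computes base + index x element size. JavaScript's DataView also takes absolute byte addresses, so it makes a convenient stand-in for a TypeScript sketch. This emulates only the addressing, not the Flash opcodes. Little-endian byte order is an assumption, and the helper names are made up:</p>

```typescript
const FLOAT_BYTES = 4;
const POINT_STRIDE = 3 * FLOAT_BYTES; // x, y, z floats per point

// Write point number `index` into an x,y,z float block that starts at byte address `base`.
function writePoint(mem: DataView, base: number, index: number, x: number, y: number, z: number): void {
  const addr = base + index * POINT_STRIDE;
  mem.setFloat32(addr, x, true); // true = little-endian
  mem.setFloat32(addr + FLOAT_BYTES, y, true);
  mem.setFloat32(addr + 2 * FLOAT_BYTES, z, true);
}

// Read point number `index` back as [x, y, z].
function readPoint(mem: DataView, base: number, index: number): number[] {
  const addr = base + index * POINT_STRIDE;
  return [
    mem.getFloat32(addr, true),
    mem.getFloat32(addr + FLOAT_BYTES, true),
    mem.getFloat32(addr + 2 * FLOAT_BYTES, true),
  ];
}
```

<p>Getting the stride wrong does not raise an error; it silently reads neighbouring values. That is why it pays to keep the element sizes in named constants.</p>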
<p>To use flash.Memory with functions that require a ByteArray [ like BitmapData.setPixels() or Sound.extract() ], you can access it directly via flash.system.ApplicationDomain.currentDomain.domainMemory, for example:</p>
<pre>// set position to start of the data we need
flash.system.ApplicationDomain.currentDomain.domainMemory.position = SOME_OFFSET;
// use the fast memory
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );</pre>
<p><strong>note:</strong> don't be fooled, though. You can't simply call the ByteArray methods on currentDomain.domainMemory and expect the speed-up. Although <em>it is the data</em> that is in fast memory, those methods don't use the opcodes, and the opcodes are what speed it up!</p>
<p><strong>Lookup Charts</strong></p>
<p>Often you'll find yourself calling the same code over and over and over, perhaps without realising just how much. Say we make an animation that uses Perlin noise to generate 2D or 3D data, like this example. It reads about 40k uint values from Perlin noise, converts each pixel into individual r,g,b values, then turns each one into a float from -1 to 1 using a simple calculation. Thing is, that means the same calculation runs 3x40000 times per frame at 50 frames a second: 6 million times a second!</p>
<p>In this scenario there are only 256 possible inputs to the calculation and 256 results. So rather than working it out 6 million times, we can load it all into a lookup chart, stick the lookup chart in fast memory, and then look values up rather than calculating them. Often you can really speed things up by doing this. <a href="http://webr3.org/experiments/perlin-particles/light-cloud/PerlinParticleEffects.hx">Practical example with comments here</a>.</p>
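<p>The lookup chart idea is sketched below in TypeScript. The post doesn't spell out its "simple calculation", so the mapping v / 127.5 - 1 (which sends 0..255 onto -1..1) is an assumption. A Float32Array stands in for the chart in fast memory:</p>

```typescript
// Precompute all 256 possible results once (assumed mapping: 0..255 onto -1..1).
function buildChannelLookup(): Float32Array {
  const table = new Float32Array(256);
  for (let v = 0; v < 256; v++) {
    table[v] = v / 127.5 - 1;
  }
  return table;
}

// Per frame: split each ARGB pixel into r,g,b bytes and look the results up
// instead of recomputing the same 256 possible values millions of times.
function channelsToFloats(pixels: Uint32Array, table: Float32Array): Float32Array {
  const out = new Float32Array(pixels.length * 3);
  for (let i = 0; i < pixels.length; i++) {
    const p = pixels[i];
    out[i * 3] = table[(p >>> 16) & 0xff];
    out[i * 3 + 1] = table[(p >>> 8) & 0xff];
    out[i * 3 + 2] = table[p & 0xff];
  }
  return out;
}
```

<p>The table is built once. After that, each of the 6 million per-second conversions becomes an array read, which pays off whenever the calculation costs more than the lookup.</p>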
<p>Hope this is of use to somebody and clears things up a bit; I use the methods often so you'll see them in the source of my experiments.</p>
<p><img class="alignnone size-full wp-image-80" title="nopic" src="http://webr3.org/blog/wp-content/uploads/2009/07/nopic.jpg" alt="nopic" width="600" height="250" /></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/something-about-practical-usage-of-flash-memory-in-haxe/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Haxe Optimization with Inline (Elliots Questions)</title>
		<link>http://webr3.org/blog/haxe/haxe-optimization-with-inline-elliots-questions/</link>
		<comments>http://webr3.org/blog/haxe/haxe-optimization-with-inline-elliots-questions/#comments</comments>
		<pubDate>Wed, 01 Jul 2009 01:54:31 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[Elliot Rock]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=76</guid>
		<description><![CDATA[
Elliot Rock asked me 3 fantastic questions about my previous post, it's good information so best to share for all I think!
So the inline methods and variables is the main optimisation in the final version?
it's the only optimisation made, I simply added the "inline" keyword to 2 methods and all static vars.
Can you explain where [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-77" title="inliner" src="http://webr3.org/blog/wp-content/uploads/2009/06/inliner.jpg" alt="inliner" width="600" height="250" /></p>
<p><a href="http://www.elliotrock.com/">Elliot Rock</a> asked me 3 fantastic questions about my <a href="http://webr3.org/blog/haxe/massive-amount-of-3d-particles-take-two/">previous post</a>. It's good information, so I think it's best shared with everyone!</p>
<h2>So the inline methods and variables is the main optimisation in the final version?</h2>
<p><strong>it's the only optimisation made</strong>, I simply added the "inline" keyword to 2 methods and all static vars.</p>
<h2>Can you explain where and why the HaXe methods greatly improve performance please :)</h2>
<p>haxe offers 3 main forms of performance optimisation (we're being swf specific here);</p>
<ol>
<li>haXe produces bytecode (opcodes) that is better optimized than the output of Adobe's compilers (mxmlc). This gives small but significant speed increases, which all add up. See the links at the foot of the page for more information.</li>
<li>haXe supports inlining, which I'll explain further in a moment; this often gives a fantastic increase in speed.</li>
<li>haXe provides access to the new opcodes for fast memory access; the only other place to get them is through Alchemy.</li>
</ol>
<p>I must stress that these are three fairly minor benefits of haXe compared with the hundreds of others, but they are very good ones nonetheless!</p>
<h2>I know inlines are great, but I haven't found an explanation of why :)</h2>
<p>The simple answer is that inlining produces fewer, better-optimized opcodes and removes function calls, along with the stack work that goes with them.</p>
<p>I'm going to jump straight into the full detail here! From the examples in question, here is a snippet of code from the non-optimised version:</p>
<h3>Non-Optimised</h3>
<pre>private static var WIDTH : Int = 550;
// ...
po = pointToOffset( Std.int(realVector[a*3]) , Std.int(realVector[(a*3)+1]) , WIDTH );
// ...
private static function pointToOffset( vx : Int , vy : Int , vw : Int ) : UInt {
    return vx + ( vy * vw );
}</pre>
<p>Taking only the single functional line (the middle one), here are the opcodes produced for it:</p>
<pre>OLabel
OGetLex(Idx(62))
OFindProp(Idx(83))
OGetProp(Idx(83))
OReg(6)
OSmallInt(3)
OOp(OpIMul)
OGetProp(Idx(17))
OToNumber
OToInt
OFindProp(Idx(83))
OGetProp(Idx(83))
OReg(6)
OSmallInt(3)
OOp(OpIMul)
OSmallInt(1)
OOp(OpIAdd)
OGetProp(Idx(17))
OToNumber
OToInt
OGetLex(Idx(62))         // get name Identical
OGetProp(Idx(47))        // get property WIDTH
OCallProperty(Idx(67),3) // call pointToOffset with 3 args (pushed by the preceding uncommented opcodes)
OToUInt                  // coerce the result to UInt
OToInt                   // then convert it to Int
OSetReg(5)               // store it in local variable po</pre>
<p>And here are the pointToOffset opcodes:</p>
<pre>OReg(1)
OReg(2)
OReg(3)
OOp(OpIMul)  // multiply arguments 2 and 3 (vy * vw)
OOp(OpIAdd)  // add argument 1 (vx) to that result
ORet         // return result</pre>
<p>So it's going to get the static var WIDTH from the class, pass it and the other two arguments from our vector to pointToOffset, run all the pointToOffset code, get the result, convert it and store it in variable po. *Phew*, quite a lot really.</p>
<h3>Optimised</h3>
<p>Now let's look at the optimised version. Remember, the only change here is the addition of the "inline" keyword:</p>
<pre>private static inline var WIDTH : Int = 550;
// ...
po = pointToOffset( Std.int(realVector[a*3]) , Std.int(realVector[(a*3)+1]) , WIDTH );
// ...
static inline function pointToOffset( vx : Int , vy : Int , vw : Int ) : UInt {
    return vx + ( vy * vw );
}</pre>
<p>And the corresponding opcodes (snipped, as the rest is the same):</p>
<pre>OToNumber
OToInt
OIntRef(Idx(4)) // push the constant 550 (the inlined WIDTH) from the int constant pool
OOp(OpIMul)     // multiply the second vector value (vy) by 550
OOp(OpIAdd)     // add the first vector value (vx)
OToInt          // convert to Int
OSetReg(5)      // store in po</pre>
<p><strong>The static var WIDTH is replaced by its constant value and the body of pointToOffset is placed inline, so neither is looked up nor called at runtime.</strong></p>
<p>Back to an English explanation: we've removed a property lookup and a function call (with its argument passing, return and UInt conversion) and replaced them with three simple, light operations, giving us a big speed increase.</p>
<p>All credit to <a href="http://ncannasse.fr/">Nicolas</a>, the creator of <a href="http://haxe.org/">haXe</a>, for this; it's amazing what he's done.</p>
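<p>To make the opcode comparison concrete at the source level, here is a small, target-independent sketch. InlineDemo, withInline and handExpanded are made-up names for illustration, and Int is used in place of UInt to keep it portable. The inlined call in withInline compiles down to roughly the same thing as the hand-written expression in handExpanded: a constant, a multiply and an add.</p>
<pre>// Sketch only: names are hypothetical; Int replaces UInt for portability.
class InlineDemo {
    // Inline var: every use of WIDTH is replaced by the constant 550.
    static inline var WIDTH : Int = 550;

    // Inline function: each call site is replaced by the body's expression.
    static inline function pointToOffset( vx : Int , vy : Int , vw : Int ) : Int {
        return vx + ( vy * vw );
    }

    // Written with the inline call, as in the post.
    public static function withInline( x : Int , y : Int ) : Int {
        return pointToOffset( x , y , WIDTH );
    }

    // Roughly what the compiler produces for withInline:
    // no property lookup, no call, just a constant, a multiply and an add.
    public static function handExpanded( x : Int , y : Int ) : Int {
        return x + ( y * 550 );
    }

    static function main() {
        trace( withInline( 3 , 2 ) );
        trace( handExpanded( 3 , 2 ) );
    }
}</pre>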
<h3>Reading</h3>
<ul>
<li><a href="http://ncannasse.fr/blog/flash_9_optimizations">http://ncannasse.fr/blog/flash_9_optimizations</a></li>
<li><a href="http://ncannasse.fr/blog/haxe_swc">http://ncannasse.fr/blog/haxe_swc</a> (specifically "More Optimizations")</li>
<li><a href="http://haxe.org/ref/inline">http://haxe.org/ref/inline</a></li>
</ul>
<p>Hope that answers the questions, and thanks, Elliot.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/haxe-optimization-with-inline-elliots-questions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Massive amount of 3D Particles Take Two</title>
		<link>http://webr3.org/blog/haxe/massive-amount-of-3d-particles-take-two/</link>
		<comments>http://webr3.org/blog/haxe/massive-amount-of-3d-particles-take-two/#comments</comments>
		<pubDate>Tue, 30 Jun 2009 15:45:16 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=71</guid>
		<description><![CDATA[
Ralph Hauwert created a great demo of the power of Flash 10 / PixelBender / Alchemy  that you're probably familiar with, then Joa Ebert made a fantastic Flash 9 pure AS3 version that's very impressive, then I gave it a go with haXe and flash.Memory which turned out rather well too.
It's been bugging me [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-72" title="take2" src="http://webr3.org/blog/wp-content/uploads/2009/06/take2.jpg" alt="take2" width="600" height="250" /></p>
<p>Ralph Hauwert created a great <a href="http://www.unitzeroone.com/blog/2009/03/18/flash-10-massive-amounts-of-3d-particles-with-alchemy-source-included/" target="_blank">demo of the power of Flash 10 / PixelBender / Alchemy</a> that you're probably familiar with. Then Joa Ebert made a fantastic <a href="http://blog.joa-ebert.com/2009/04/03/massive-amounts-of-3d-particles-without-alchemy-and-pixelbender/">Flash 9 pure AS3 version</a> that's very impressive, and then I gave it <a href="http://webr3.org/blog/haxe/flash-10-massive-amounts-of-3d-particles-with-haxe/">a go with haXe and flash.Memory</a>, which turned out rather well too.</p>
<p>It's been bugging me for some time, though, that the code just wasn't real-world enough, so I thought I'd give it a go using AS3 and Flash 10, using only native methods and none of the fancy 3D calculations that we commoners don't quite get. In short, I've stripped it down and turned it into a simple class which uses Flash 10 Vectors and Matrix3D.</p>
<p>I've taken a different approach with these demos: rather than running everything in realtime, you simply click, it runs the enterFrame code once and displays the time in ms that it took. There is, however, a <a href="http://webr3.org/experiments/particle-pusher-take2/realtime/">final realtime version</a> using this code at the end.</p>
<p>Four examples are provided: pure AS3, the same code converted to haXe, the haXe version with the "inline" keyword added, and finally the whole thing working in realtime.</p>
<h2>AS3 Flash 10 Version with Vector / Matrix3D</h2>
<p>Here's the main source used:</p>
<pre class="brush: as3; title: ;">private function enterFrameHandler( event : Event ): void {
 var d : Date = new Date();
 var t : Number = d.getTime();

 updateMatrix( mouseX , mouseY );
 _matrix.transformVectors( particleVector , realVector );

 var pxs : Vector.&lt;uint&gt; = new Vector.&lt;uint&gt;( PIXELS , true );
 var po : int = 0;
 var a : int = 0;

 while( a &lt; PARTICLES ) {
   po = pointToOffset( int(realVector[a*3]) , int(realVector[(a*3)+1]) , WIDTH );
   if( 0 &lt; po &amp;&amp; po &lt; PIXELS) {
     pxs[po] = increaseColor( pxs[po] );
   }
   a++;
 }
 // update display
 this.bitmapData.lock();
 this.bitmapData.setVector( this.bitmapData.rect , pxs );
 this.bitmapData.unlock( this.bitmapData.rect );

 d = new Date();
 textField.text = &quot;single frame time: &quot; + ( d.getTime() - t );
}

private function updateMatrix( mx : Number , my : Number ) : void {
 tx = tx + ((mx - tx)/10);
 ty = ty + ((my - ty)/10);
 _matrix.identity();
 _matrix.appendRotation( tx , Vector3D.Y_AXIS );
 _matrix.appendRotation( ty , Vector3D.X_AXIS );
 _matrix.appendTranslation( CX, CY, 10 );
}

private static function increaseColor( c : uint ) : uint {
 return c &lt; MAXCOLOR ? c + SHADE : 0xFFFFFF;
}

private static function pointToOffset( vx : int , vy : int , vw : int ) : uint {
 return vx + ( vy * vw );
}
</pre>
<p><a href="http://webr3.org/experiments/particle-pusher-take2/pure-as3/" target="_blank">And here's the AS3 result (with full source)</a></p>
<h2>The same code in haXe</h2>
<p>Next up, I spent literally two minutes converting this to haXe syntax. I changed no functionality, just the slight syntax differences between the languages.</p>
<pre class="brush: plain; title: ;">private function enterFrameHandler( event : Event ) : Void {
 var d : Date = Date.now();
 var t : Float = d.getTime();

 updateMatrix( mouseX , mouseY );
 _matrix.transformVectors( particleVector , realVector );

 var pxs : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( PIXELS , true );
 var po : Int = 0;
 var a : Int = 0;

 while( a &lt; PARTICLES ) {
   po = pointToOffset( Std.int(realVector[a*3]) , Std.int(realVector[(a*3)+1]) , WIDTH );
   if( 0 &lt; po &amp;&amp; po &lt; PIXELS) {
     pxs[po] = increaseColor( pxs[po] );
   }
   a++;
 }
 // update display
 this.bitmapData.lock();
 this.bitmapData.setVector( this.bitmapData.rect , pxs );
 this.bitmapData.unlock( this.bitmapData.rect );

 d = Date.now();
 textField.text = &quot;single frame time: &quot; + ( d.getTime() - t );
}

private function updateMatrix( mx : Float , my : Float ) : Void {
 tx = tx + ((mx - tx)/10);
 ty = ty + ((my - ty)/10);
 _matrix.identity();
 _matrix.appendRotation( tx , Vector3D.Y_AXIS );
 _matrix.appendRotation( ty , Vector3D.X_AXIS );
 _matrix.appendTranslation( CX, CY, 10 );
}

private static function increaseColor( c : UInt ) : UInt {
 return c &lt; MAXCOLOR ? c + SHADE : 0xFFFFFF;
}

private static function pointToOffset( vx : Int , vy : Int , vw : Int ) : UInt {
 return vx + ( vy * vw );
}</pre>
<p><a href="http://webr3.org/experiments/particle-pusher-take2/identical/" target="_blank">And here's the identical haXe result (with full source)</a></p>
<h2>Optimized haXe Version</h2>
<p>Next up, I spent another minute optimizing (literally, one minute). haXe supports inlining of variables and methods, so I added the inline keyword to the static vars and to two of the static methods (increaseColor and pointToOffset):</p>
<pre class="brush: plain; title: ;">private function enterFrameHandler( event : Event ) : Void {
 var d : Date = Date.now();
 var t : Float = d.getTime();

 updateMatrix( mouseX , mouseY );
 _matrix.transformVectors( particleVector , realVector );

 var pxs : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( PIXELS , true );
 var po : Int = 0;
 var a : Int = 0;

 while( a &lt; PARTICLES ) {
   po = pointToOffset( Std.int(realVector[a*3]) , Std.int(realVector[(a*3)+1]) , WIDTH );
   if( 0 &lt; po &amp;&amp; po &lt; PIXELS) {
     pxs[po] = increaseColor( pxs[po] );
   }
   a++;
 }
 // update display
 this.bitmapData.lock();
 this.bitmapData.setVector( this.bitmapData.rect , pxs );
 this.bitmapData.unlock( this.bitmapData.rect );

 d = Date.now();
 textField.text = &quot;single frame time: &quot; + ( d.getTime() - t );
}

private function updateMatrix( mx : Float , my : Float ) : Void {
  tx = tx + ((mx - tx)/10);
  ty = ty + ((my - ty)/10);
  _matrix.identity();
  _matrix.appendRotation( tx , Vector3D.Y_AXIS );
  _matrix.appendRotation( ty , Vector3D.X_AXIS );
  _matrix.appendTranslation( CX, CY, 10 );
}

static inline function increaseColor( c : UInt ) : UInt {
   return c &lt; MAXCOLOR ? c + SHADE : 0xFFFFFF;
}

static inline function pointToOffset( vx : Int , vy : Int , vw : Int ) : UInt {
 return vx + ( vy * vw );
}
</pre>
<p><a href="http://webr3.org/experiments/particle-pusher-take2/optimized/" target="_blank">And here's the inlined haXe result (with full source)</a></p>
<h2>Summary</h2>
<p>In under five minutes and with tiny code changes, we've gained a speed increase of well over 100%. We're using nothing fancy, and the speed rivals all the heavily optimized versions that use every trick in the book.</p>
<p>The final result (simply with the event listener changed to Event.ENTER_FRAME and fps added) <a href="http://webr3.org/experiments/particle-pusher-take2/realtime/">is available here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/massive-amount-of-3d-particles-take-two/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>BitmapData, Vectors, ByteArrays and Optimization</title>
		<link>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/</link>
		<comments>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 10:13:31 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[ActionScript]]></category>
		<category><![CDATA[BitmapData]]></category>
		<category><![CDATA[ByteArray]]></category>
		<category><![CDATA[Flash Optimization]]></category>
		<category><![CDATA[Vector]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=53</guid>
		<description><![CDATA[
In this post I'm looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.
The Simple Stuff
A BitmapData 5px by 5px [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-58" title="opt" src="http://webr3.org/blog/wp-content/uploads/2009/06/opt.jpg" alt="opt" width="600" height="250" /></p>
<p>In this post I'm looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.</p>
<h3>The Simple Stuff</h3>
<p>A 5px by 5px BitmapData comprises 25 pixels. The Flash API is a bit misleading about the point coordinate system it uses, so here's the layout of the data it contains:</p>
<pre>       X
   0,-,-,-,-
   -,1,-,-,-
 Y -,-,2,-,-
   -,-,-,3,-
   -,-,-,-,4</pre>
<p>The upper-left corner is point(0,0) and the lower-right corner is point(4,4) [not 1,1 to 5,5, as the manual suggests].</p>
<p>To access the data I'm going to be using three specific methods:</p>
<ul>
<li> BitmapData.getPixel( x, y );</li>
<li> BitmapData.getPixels( rect );</li>
<li> BitmapData.getVector( rect );</li>
</ul>
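<p>The data returned by getPixels and getVector is in array form (ByteArray and Vector&lt;UInt&gt; respectively), and both are 0-indexed. The Vector has 25 entries, one UInt per pixel, whereas the ByteArray holds 100 bytes, because each pixel takes 4 bytes (alpha, red, green, blue).</p>
<p>Pixel values run from left to right, one row after another (row 0 first, then row 1, and so on), so the 5px example above maps to:</p>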
<pre>index[0] = point(0,0)
index[1] = -
index[2] = -
index[3] = -
index[4] = -
index[5] = -
index[6] = point(1,1)
index[7] = -
index[8] = -
index[9] = -
index[10] = -
index[11] = -
index[12] = point(2,2)
index[13] = -
index[14] = -
index[15] = -
index[16] = -
index[17] = -
index[18] = point(3,3)
index[19] = -
index[20] = -
index[21] = -
index[22] = -
index[23] = -
index[24] = point(4,4)</pre>
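<p>The getPixels ByteArray stores these same 25 pixels as 4 bytes each, written big-endian so the alpha byte comes first. Here is a small sketch of that layout. It uses haxe.io.BytesOutput as a stand-in for flash.utils.ByteArray, which is an assumption made so it runs on any target; the class name and colour value are made up.</p>
<pre>import haxe.io.Bytes;
import haxe.io.BytesOutput;

// Sketch: mimic the getPixels layout, one 32-bit ARGB value per pixel,
// written big-endian (alpha byte first). This is not the Flash API itself.
class PixelBytes {
    public static function fill( count : Int , argb : Int ) : Bytes {
        var out = new BytesOutput();
        out.bigEndian = true;        // most significant byte (alpha) first
        for ( i in 0...count ) {
            out.writeInt32( argb );  // 4 bytes per pixel
        }
        return out.getBytes();
    }

    static function main() {
        // The 25 pixels of the 5x5 example, all one made-up colour.
        var bytes = fill( 25 , 0x7F112233 );
        trace( bytes.length );   // 100 bytes, not 25 entries
        trace( bytes.get( 0 ) ); // alpha of pixel 0
        trace( bytes.get( 4 ) ); // alpha of pixel 1 starts at byte 4
    }
}</pre>
<p>So index[n] in the Vector corresponds to bytes n*4 to n*4+3 in the ByteArray.</p>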
<h3>Converting between X,Y and Offsets</h3>
<p>To work with the data in its raw format, we're going to need some helper functions to convert between point(x,y) and offset.</p>
<p><strong>X,Y to Offset</strong></p>
<pre>function pointToOffset( x : Int , y : Int , width : Int ) : Int {
    return (y * width) + x;
}</pre>
<p>Thus for point(3,3) in our 5px-wide example:</p>
<pre>    (3 x 5) + 3 = 18</pre>
<p><strong>Offset to X,Y</strong></p>
<pre>import flash.geom.Point; // at the top of the file
function offsetToPoint( offset : Int , width : Int ) : Point {
    return new Point( offset%width , Std.int(offset/width) );
}
// or
bitmapData.getPixel( offset%width , Std.int(offset/width) );</pre>
<p>Thus for offset 18:</p>
<pre>    x: 18%5 = 3
    y: Std.int(18/5) = 3</pre>
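<p>The two helpers can be checked against each other with a quick round trip over the 5x5 example. This sketch uses a hypothetical XY typedef in place of flash.geom.Point so that it runs on any haXe target. Every offset should map to a point and back to itself.</p>
<pre>// Sketch: XY is a hypothetical stand-in for flash.geom.Point.
typedef XY = { x : Int , y : Int };

class Offsets {
    // Row-major: skip y whole rows of width pixels, then x pixels along.
    public static inline function pointToOffset( x : Int , y : Int , width : Int ) : Int {
        return ( y * width ) + x;
    }

    // % gives the column; integer division (Std.int) gives the row.
    public static inline function offsetToPoint( offset : Int , width : Int ) : XY {
        return { x : offset % width , y : Std.int( offset / width ) };
    }

    static function main() {
        var width = 5;
        // Round-trip every pixel of the 5x5 example.
        for ( offset in 0...25 ) {
            var p = offsetToPoint( offset , width );
            if ( pointToOffset( p.x , p.y , width ) != offset ) throw "mismatch at " + offset;
        }
        var q = offsetToPoint( 18 , width );
        trace( pointToOffset( 3 , 3 , width ) + " " + q.x + "," + q.y );
    }
}</pre>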
<h2>Speed Tests</h2>
<p>For each of the three methods [ getPixel(x,y); getPixels(rect); getVector(rect); ] I've done a lot of testing to see which is fastest. All tests use a non-transparent BitmapData of 1,920,000 pixels (1600x1200). The task in each test is to get all the pixels from the BitmapData, load them into an Array/ByteArray/Vector, add 1 to the color of each pixel and then write the new data back to the BitmapData.</p>
<h2>Starting with the Flash 9 methods available</h2>
<h3>Test One, getPixel -&gt; Array -&gt; setPixel</h3>
<p>This method incurs two full loops because there is no setPixels(array) method. For reference, a bare loop of 1,920,000 iterations takes 8ms.</p>
<pre>var o : Array&lt;UInt&gt; = new Array&lt;UInt&gt;();
for( i in 0...1920000 ) {
    o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
    bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}</pre>
<p>results [all times in ms]</p>
<pre>555, 569, 574, 573, 575, 570, 570, 570, 568, 576, 575, 573, 569, 563, 570</pre>
<h3>Test Two, getPixels -&gt; ByteArray -&gt; setPixels</h3>
<p>This method needs two ByteArrays to avoid using temporary vars and resetting positions multiple times; simply calling getPixels() on its own takes 90ms!</p>
<pre>var b : ByteArray = bitmapData.getPixels( bitmapData.rect );
var n : ByteArray = new ByteArray();
n.endian = b.endian; // match the byte order of the getPixels data (big-endian)
b.position = 0; // rewind to the start before reading, otherwise we get an error
for( i in 0...1920000 ) {
    n.writeUnsignedInt( b.readUnsignedInt() + 1 );
}
n.position = 0; // rewind so setPixels reads from the start, otherwise we get an error
bitmapData.setPixels( bitmapData.rect , n );</pre>
<p>results [all times in ms]</p>
<pre>597, 590, 591, 598, 591, 591, 595, 593, 595, 591, 591, 594, 591, 594, 588</pre>
<p>Rather disappointing results really; we'd get a lovely 2fps at that rate (all tests were run on a quad-core machine, by the way).</p>
<h2>On to the methods available in Flash 10, which let us use Vectors</h2>
<h3>Test Three, getPixel -&gt; Vector -&gt; setPixel</h3>
<p>This test is included purely to show the speed difference between Vectors and Arrays (compare with Test One).</p>
<pre>var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( 1920000 , true );
for( i in 0...1920000 ) {
    o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
    bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}</pre>
<p>results [all times in ms]</p>
<pre>439, 443, 447, 444, 447, 445, 448, 447, 446, 447, 447, 445, 447, 449, 448</pre>
<h3>Test Four, getVector -&gt; Vector -&gt; setVector (unfixed vector)</h3>
<p>Initializing an unfixed vector is virtually instant, far too quick for me to measure (or to make a difference).</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;();
for( i in 0...1920000 ) {
    o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );</pre>
<p>results [all times in ms]</p>
<pre>96, 66, 80, 71, 75, 65, 74, 72, 76, 68, 77, 73, 72, 77, 68, 70, 72, 72, 81, 68</pre>
<h3>Test Five, getVector -&gt; Vector -&gt; setVector (fixed size vector)</h3>
<p>Fixed-size vectors take time to create; in this instance, new Vector&lt;UInt&gt;(1920000, true) takes circa 5ms.</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( 1920000 , true );
for( i in 0...1920000 ) {
    o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );</pre>
<p>results [all times in ms]</p>
<pre>53, 41, 41, 46, 45, 47, 41, 40, 45, 45, 46, 41, 40, 46, 45, 47, 41, 50, 42, 46</pre>
<h3>Test Six, getVector -&gt; setVector</h3>
<p>This time we cut out the second vector, as it isn't really needed.</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
    v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );</pre>
<p>results [all times in ms]</p>
<pre>44, 36, 38, 36, 38, 36, 36, 38, 36, 36, 37, 39, 35, 38, 38, 36, 39, 36, 38, 38</pre>
<p>Some fantastic speed gains! Since we're down to smaller figures, we need to start counting the cost of each line:</p>
<pre>bitmapData.getVector( bitmapData.rect );</pre>
<p>10, 8, 9, 9, 8, 10, 8, 9, 8, 11, 8, 10, 8, 10, 8, 9, 8, 9, 8 [ms]</p>
<pre>for( i in 0...1920000 ) {
    v[i] = v[i] + 1;
}</pre>
<p>25, 25, 25, 27, 27, 26, 25, 26, 25, 25, 26, 25, 27, 25, 25, 26, 25, 26, 25, 25 [ms]</p>
<p>However, the bare loop alone takes 8ms, so this part costs about 17.5ms on average (25.55 - 8).</p>
<pre>bitmapData.setVector( bitmapData.rect , v );</pre>
<p>11, 5, 6, 4, 4, 6, 4, 6, 4, 4, 5, 5, 6, 4, 6, 4, 5, 5, 4, 5</p>
<p>And finally, to really count the cost, we need to see what the uint + 1 takes by itself:</p>
<pre>for( i in 0...1920000 ) {
    0xFFFFF9 + 1;
}</pre>
<p>9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9</p>
<p>So the true cost of the single-vector method in Test Six is 37 - 9 = 28ms for getVector, updating the vector and setVector.</p>
<p>The true cost of using a vector inside the loop is 26 - 9 = 17ms (we'll need this later).</p>
<h2>Adding haXe in to the equation.</h2>
<p>Finally, as you probably know, haXe provides access to fast memory. Fast memory is essentially a ByteArray set as flash.system.ApplicationDomain.currentDomain.domainMemory and accessed through the new opcodes.</p>
<p>A little trick we can use with haXe is to write data from this fast memory straight to a BitmapData via setPixels(), since the fast memory is itself a ByteArray.</p>
<p>First off we need to set up the fast memory storage somewhere in our app:</p>
<pre>import flash.utils.ByteArray; // at the top of the file
import flash.utils.Endian;
var storage : ByteArray = new ByteArray();
storage.endian = Endian.LITTLE_ENDIAN; // the fast memory opcodes read/write little-endian
storage.length = BYTES;                // WIDTH * HEIGHT * 4
flash.Memory.select( storage );</pre>
<p>To calculate the value of BYTES we simply do WIDTH*HEIGHT*4 (because each pixel is a UInt, which takes up 4 bytes of memory).</p>
<p>We know from earlier that getVector only takes circa 9ms, whereas getPixels takes a shocking 90ms, so here's what I'm going to do:</p>
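<p>Here is a rough, target-independent sketch of that sizing and addressing. It uses haxe.io.Bytes as a stand-in for the domain-memory ByteArray, which is an assumption made so it can run outside the Flash player; MemoryLayout is a made-up name. Pixel i starts at byte offset i*4, the same addressing as the flash.Memory.setI32( i*4 , ... ) calls in the tests. Bytes.setInt32 writes little-endian, like the fast memory opcodes.</p>
<pre>import haxe.io.Bytes;

// Sketch: haxe.io.Bytes stands in for the domain-memory ByteArray (assumption).
class MemoryLayout {
    public static inline var WIDTH : Int = 1600;
    public static inline var HEIGHT : Int = 1200;
    // Each pixel is one 32-bit value, i.e. 4 bytes: 7,680,000 bytes in total.
    public static var BYTES : Int = WIDTH * HEIGHT * 4;

    // Pixel i starts at byte i * 4, as with flash.Memory.setI32( i*4 , ... ).
    public static inline function byteOffset( pixel : Int ) : Int {
        return pixel * 4;
    }

    static function main() {
        var mem = Bytes.alloc( BYTES );
        // setInt32 is little-endian: the lowest byte lands at the lowest address.
        mem.setInt32( byteOffset( 1 ) , 0x00FF8040 );
        trace( mem.get( 4 ) );
        trace( mem.getInt32( byteOffset( 1 ) ) );
    }
}</pre>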
<h3>Test Seven : getVector -&gt; flash.Memory -&gt; setPixels</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
    flash.Memory.setI32( i*4 , v[i] + 1 );
}
// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>results [all times in ms]<br />
34, 31, 32, 34, 32, 32, 32, 33, 32, 33, 31, 33, 32, 31, 33, 32, 33, 32, 33, 31</p>
<p>We've shaved off 5ms! But hold on, there's more: we need to analyse this.</p>
<p>As we know from earlier, bitmapData.getVector( bitmapData.rect ); takes 9ms.</p>
<p>The loop itself (including the + 1) takes 9ms.</p>
<pre>flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>All of these together take 5ms,</p>
<p>which means that the true cost of using flash.Memory is 32 - 9 - 9 - 5 = 9ms.</p>
<p>If you remember from Test Six, the vector cost was 17ms, so we've gained some and lost some, but overall we came out on top, and certainly a tonne better than Flash 9!</p>
<p>Now, these tests were all with a big chunk of 1600x1200 pixels; in a real-world app we'd have fewer, so let's see how things look at 800x450px (360,000 pixels).</p>
<h3>Vector Only:</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
    v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );</pre>
<p>results [all times in ms]<br />
13, 6, 6, 6, 6, 7, 6, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7  = 137 / 20 : av 6.85ms = 146fps</p>
<h3>Vector + Memory</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
    flash.Memory.setI32( i*4 , v[i] + 1 );
}

// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>results [all times in ms]<br />
7, 6, 6, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5 = 109 / 20 : av 5.45ms = 183fps</p>
<h2>Summary</h2>
<p>So there are your choices for an 800x450px SWF that updates raw bitmap data: 3fps for Flash 9 vs <strong>146fps</strong> for pure AS3 vs <strong>183fps</strong> for haXe (and probably Alchemy).</p>
<p><strong>imho:</strong> Vectors are a fantastic choice. My personal preference goes to fast memory, but that's because I can avoid the stack by using it, and I know that, whilst it's a little more coding, you can't get faster.</p>
<p><strong>overall:</strong> I'm just grateful we've got options to deliver this kind of performance. It really opens the doors for some stunning work and libraries, not least the upcoming version of Papervision and, no doubt, some libs for haXe.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

