<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>webr3.org &#187; Flash Optimization</title>
	<atom:link href="http://webr3.org/blog/tag/flash-optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://webr3.org/blog</link>
	<description>brain&#039;s on fire!</description>
	<lastBuildDate>Tue, 19 Jul 2011 15:38:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>BitmapData, Vectors, ByteArrays and Optimization</title>
		<link>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/</link>
		<comments>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 10:13:31 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[ActionScript]]></category>
		<category><![CDATA[BitmapData]]></category>
		<category><![CDATA[ByteArray]]></category>
		<category><![CDATA[Flash Optimization]]></category>
		<category><![CDATA[Vector]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=53</guid>
		<description><![CDATA[
In this post I'm looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.
The Simple Stuff
A BitmapData 5px by 5px [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-58" title="opt" src="http://webr3.org/blog/wp-content/uploads/2009/06/opt.jpg" alt="opt" width="600" height="250" /></p>
<p>In this post I'm looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.</p>
<h3>The Simple Stuff</h3>
<p>A BitmapData of 5px by 5px comprises 25 pixels. The Flash API documentation is a bit misleading about the coordinate system used,<br />
so here's the layout of the data it contains:</p>
<pre>       X
   0,-,-,-,-
   -,1,-,-,-
 Y -,-,2,-,-
   -,-,-,3,-
   -,-,-,-,4</pre>
<p>The upper left corner is point(0,0) and the lower right is point(4,4) [not 1,1 to 5,5 as the manual suggests].</p>
<p>To access the data I'm going to be using three specific methods:</p>
<ul>
<li> BitmapData.getPixel( x, y );</li>
<li> BitmapData.getPixels( rect );</li>
<li> BitmapData.getVector( rect );</li>
</ul>
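<p>The data returned by getPixels and getVector is in array format (ByteArray and Vector&lt;UInt&gt; respectively). Both start at 0: the Vector has 25 entries, and the ByteArray holds the same 25 values as 32-bit unsigned integers (100 bytes).</p>
<p>Pixel values run from left to right, one row after another from top to bottom, so the 5px example above converts to</p>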
<pre>index[0] = point(0,0)
index[1] = -
index[2] = -
index[3] = -
index[4] = -
index[5] = -
index[6] = point(1,1)
index[7] = -
index[8] = -
index[9] = -
index[10] = -
index[11] = -
index[12] = point(2,2)
index[13] = -
index[14] = -
index[15] = -
index[16] = -
index[17] = -
index[18] = point(3,3)
index[19] = -
index[20] = -
index[21] = -
index[22] = -
index[23] = -
index[24] = point(4,4)</pre>
<h3>Converting between X,Y and Offsets</h3>
<p>to work with the data in its raw format we're going to need some helper functions/code to convert between point(x,y) and offset.</p>
<p><strong>X,Y to Offset</strong></p>
<pre>function pointToOffset( x : Int , y : Int , width : Int ) : Int {
    return (y * width) + x;
}</pre>
<p>thus for point(3,3):</p>
<pre>    (3 x 5) + 3 = 18</pre>
<p><strong>Offset to X,Y</strong></p>
<pre>function offsetToPoint( offset : Int , width : Int ) : Point {
    return new Point( offset%width , Std.int(offset/width) );
}
// or
bitmapData.getPixel( offset%width , Std.int(offset/width) );</pre>
<p>thus for offset 18 :</p>
<pre>    x: 18%5 = 3
    y: Std.int(18/5) = 3</pre>
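<p>As a quick sanity check, the two conversions can be run against each other for every offset in the 5px example. This is only a sketch: the class name OffsetDemo and the helpers offsetToX / offsetToY are hypothetical, and plain Ints are used in place of flash.geom.Point so that it runs on any haXe target.</p>

```haxe
class OffsetDemo {
    // Same formula as pointToOffset above.
    public static function pointToOffset(x:Int, y:Int, width:Int):Int {
        return (y * width) + x;
    }

    // Hypothetical split of offsetToPoint into two Int helpers,
    // so the sketch does not depend on flash.geom.Point.
    public static function offsetToX(offset:Int, width:Int):Int {
        return offset % width;
    }

    public static function offsetToY(offset:Int, width:Int):Int {
        return Std.int(offset / width);
    }

    public static function main() {
        var width = 5;
        var height = 5;
        // Every offset must map to a point that maps back to the same offset.
        for (offset in 0...(width * height)) {
            var x = offsetToX(offset, width);
            var y = offsetToY(offset, width);
            if (pointToOffset(x, y, width) != offset) {
                throw "round trip failed at offset " + offset;
            }
        }
        trace(pointToOffset(3, 3, width)); // 18, as in the worked example
    }
}
```

<p>If the loop finishes without throwing, the helpers agree for all 25 offsets.</p>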
<h2>Speed Tests</h2>
<p>For each of the three methods [ getPixel(x,y); getPixels(rect); getVector(rect); ] I've done a lot of testing to see which is fastest. All tests use a non-transparent BitmapData of 1,920,000 pixels (1600x1200). The task in each test is to get all pixels from the BitmapData, load them into an Array/ByteArray/Vector, add 1 to the color of each pixel and then write the new data back to the BitmapData.</p>
<h2>Starting with the methods available in Flash 9</h2>
<h3>Test One, getPixel -&gt; Array -&gt; setPixel</h3>
<p>This method incurs two full loops because there is no setPixels(array) method. For reference, a bare loop of 1,920,000 iterations takes 8ms.</p>
<pre>var o : Array&lt;UInt&gt; = new Array&lt;UInt&gt;();
for( i in 0...1920000 ) {
    o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
    bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}</pre>
<p>results [all times in ms]</p>
<pre>555, 569, 574, 573, 575, 570, 570, 570, 568, 576, 575, 573, 569, 563, 570</pre>
<h3>Test Two, getPixels -&gt; ByteArray -&gt; setPixels</h3>
<p>This method needs two ByteArrays to avoid using temporary variables and setting positions multiple times. Simply calling getPixels() alone takes 90ms!</p>
<pre>var b : ByteArray = bitmapData.getPixels( bitmapData.rect );
var n : ByteArray = new ByteArray();
n.endian = b.endian; // match byte order so values are written the same way they are read (a new ByteArray defaults to big endian)
b.position = 0; // otherwise we get an error
for( i in 0...1920000 ) {
    n.writeUnsignedInt( b.readUnsignedInt() + 1 );
}
n.position = 0; // otherwise we get an error
bitmapData.setPixels( bitmapData.rect , n );</pre>
<p>results [all times in ms]</p>
<pre>597, 590, 591, 598, 591, 591, 595, 593, 595, 591, 591, 594, 591, 594, 588</pre>
<p>Rather disappointing results really; we'd get a lovely 2fps at that rate (all tests were run on a quad core, btw).</p>
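<p>The n.endian = b.endian line in test two matters because the same 32-bit value is laid out differently depending on byte order. The sketch below is an illustration only: it uses haxe.io.BytesOutput as a stand-in for flash.utils.ByteArray, and the class name EndianDemo and the sample value 0x11223344 are made up for the example.</p>

```haxe
import haxe.io.BytesOutput;

class EndianDemo {
    // Writes one 32-bit value and returns the first byte that lands in the buffer.
    public static function firstByte(value:Int, bigEndian:Bool):Int {
        var out = new BytesOutput();
        out.bigEndian = bigEndian;
        out.writeInt32(value);
        return out.getBytes().get(0);
    }

    public static function main() {
        var pixel = 0x11223344; // hypothetical sample value
        trace(StringTools.hex(firstByte(pixel, true), 2));  // 11 (most significant byte first)
        trace(StringTools.hex(firstByte(pixel, false), 2)); // 44 (least significant byte first)
    }
}
```

<p>If the reading and writing sides disagree on byte order, the bytes of every pixel end up reversed, which is why the second ByteArray copies the setting from the first.</p>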
<h2>On to the methods available in Flash 10, which lets us use Vectors</h2>
<h3>Test Three, getPixel -&gt; Vector -&gt; setPixel</h3>
<p>This test is included purely to show the speed difference between Vectors and Arrays (test one)</p>
<pre>var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( 1920000 , true );
for( i in 0...1920000 ) {
    o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
    bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}</pre>
<p>results [all times in ms]</p>
<pre>439, 443, 447, 444, 447, 445, 448, 447, 446, 447, 447, 445, 447, 449, 448</pre>
<h3>Test Four, getVector -&gt; Vector -&gt; setVector (unfixed vector)</h3>
<p>unfixed vector initialization is virtually instant, way too quick for me to measure anyways (or to make a difference)</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;();
for( i in 0...1920000 ) {
    o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );</pre>
<p>results [all times in ms]</p>
<pre>96, 66, 80, 71, 75, 65, 74, 72, 76, 68, 77, 73, 72, 77, 68, 70, 72, 72, 81, 68</pre>
<h3>Test Five, getVector -&gt; Vector -&gt; setVector (fixed size vector)</h3>
<p>fixed size vectors take time, in this instance Vector&lt;UInt&gt;(1920000,true) takes circa 5ms</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( 1920000 , true );
for( i in 0...1920000 ) {
    o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );</pre>
<p>results [all times in ms]</p>
<pre>53, 41, 41, 46, 45, 47, 41, 40, 45, 45, 46, 41, 40, 46, 45, 47, 41, 50, 42, 46</pre>
<h3>Test Six, getVector -&gt; setVector</h3>
<p>this time we cut out the second vector, as essentially it's not needed</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
    v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );</pre>
<p>results [all times in ms]</p>
<pre>44, 36, 38, 36, 38, 36, 36, 38, 36, 36, 37, 39, 35, 38, 38, 36, 39, 36, 38, 38</pre>
<p>Some fantastic speed gains, and since we're getting to smaller figures we need to start counting the cost of each line:</p>
<pre>bitmapData.getVector( bitmapData.rect );</pre>
<p>10, 8, 9, 9, 8, 10, 8, 9, 8, 11, 8, 10, 8, 10, 8, 9, 8, 9, 8 [ms]</p>
<pre>for( i in 0...1920000 ) {
    v[i] = v[i] + 1;
}</pre>
<p>25, 25, 25, 27, 27, 26, 25, 26, 25, 25, 26, 25, 27, 25, 25, 26, 25, 26, 25, 25</p>
<p>however the bare loop alone takes 8ms, so we're looking at an average of about 18ms for this part</p>
<pre>bitmapData.setVector( bitmapData.rect , v );</pre>
<p>11, 5, 6, 4, 4, 6, 4, 6, 4, 4, 5, 5, 6, 4, 6, 4, 5, 5, 4, 5</p>
<p>and finally, to really count the cost, we need to see what the uint+1 would take by itself</p>
<pre>for( i in 0...1920000 ) {
    0xFFFFF9 + 1;
}</pre>
<p>9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9</p>
<p>so the true cost of using this single vector method in test six is 37-9 = 28ms for getVector, update vector, setVector</p>
<p>the true cost of using a vector inside the loop is 26-9 = 17ms (we need this later)</p>
<h2>Adding haXe into the equation</h2>
<p>Finally, as you probably know, haXe provides access to fast memory. Fast memory is essentially a ByteArray set as flash.system.ApplicationDomain.currentDomain.domainMemory and accessed through the new Flash 10 memory opcodes.</p>
<p>A little trick we can use with haXe is to write data from this fast memory to a BitmapData via setPixels(), since it is a ByteArray.</p>
<p>First off we need to set up the fast memory storage somewhere in our app:</p>
<pre>var storage : ByteArray = new ByteArray();
storage.endian = Endian.LITTLE_ENDIAN;
storage.length = BYTES;
flash.Memory.select( storage );</pre>
<p>to calculate the value of BYTES we simply do WIDTH*HEIGHT*4 (because a UInt takes up 4 bytes of memory)</p>
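<p>To make the sizing concrete, here is a small sketch of the arithmetic for the two bitmap sizes used in this post, plus the byte address of a given pixel (the i*4 used with flash.Memory). The class and function names are hypothetical; only the WIDTH*HEIGHT*4 formula comes from the text above.</p>

```haxe
class FastMemorySizeDemo {
    // Each pixel is stored as one 32-bit unsigned integer, i.e. 4 bytes.
    public static inline var BYTES_PER_PIXEL = 4;

    // Size in bytes of the storage needed for a width x height bitmap.
    public static function bufferSize(width:Int, height:Int):Int {
        return width * height * BYTES_PER_PIXEL;
    }

    // Byte address of the pixel at a given offset inside the storage.
    public static function byteOffset(pixelIndex:Int):Int {
        return pixelIndex * BYTES_PER_PIXEL;
    }

    public static function main() {
        trace(bufferSize(1600, 1200)); // 7680000
        trace(bufferSize(800, 450));   // 1440000
        trace(byteOffset(18));         // 72
    }
}
```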
<p>we know from earlier that getVector only takes circa 9ms, whereas getPixels takes a shocking 90ms; so what I'm going to do is:</p>
<h3>Test Seven, getVector -&gt; flash.Memory -&gt; setPixels</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
    flash.Memory.setI32( i*4 , v[i] + 1 );
}
// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>results [all times in ms]<br />
34, 31, 32, 34, 32, 32, 32, 33, 32, 33, 31, 33, 32, 31, 33, 32, 33, 32, 33, 31</p>
<p>We've shaved off 5ms! But hold on, there's more: we need to analyse this.</p>
<p>bitmapData.getVector( bitmapData.rect ); takes 9ms, as we know from earlier</p>
<p>the bare loop with the +1 takes 9ms</p>
<pre>flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>together take 5ms</p>
<p>which means that the true cost of using flash.Memory is 32-9-9-5 = 9ms</p>
<p>If you remember from test six, the vector cost was 17ms, so we've gained some and lost some, but overall we came out on top, and it's certainly a tonne better than Flash 9!</p>
<p>Now, these tests were all with a big chunk of 1600x1200 pixels; in a real-world app we'd have less, so let's see with 800x450px (360,000 pixels).</p>
<h3>Vector Only:</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
    v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );</pre>
<p>results [all times in ms]<br />
13, 6, 6, 6, 6, 7, 6, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7  = 137 / 20 : av 6.85ms = 146fps</p>
<h3>Vector + Memory</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
    flash.Memory.setI32( i*4 , v[i] + 1 );
}

// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>results [all times in ms]<br />
7, 6, 6, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5 = 109 / 20 : av 5.45ms = 183fps</p>
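<p>The fps figures above appear to be simply 1000 divided by the average time in ms, rounded. A minimal sketch of that arithmetic, assuming this is how they were derived (the class and function names are hypothetical, the timings are the ones listed above):</p>

```haxe
class FrameRateDemo {
    // Average time per run in milliseconds.
    public static function averageMs(totalMs:Int, runs:Int):Float {
        return totalMs / runs;
    }

    // Updates per second if each update takes the given number of milliseconds.
    public static function framesPerSecond(ms:Float):Int {
        return Math.round(1000 / ms);
    }

    public static function main() {
        // Timings from the 800x450 vector-only run.
        var vectorOnly = [13, 6, 6, 6, 6, 7, 6, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7];
        var total = 0;
        for (t in vectorOnly) total += t;
        var avg = averageMs(total, vectorOnly.length);
        trace(avg);                  // 6.85
        trace(framesPerSecond(avg)); // 146
    }
}
```

<p>Note that this only counts the pixel update itself, so it is an upper bound rather than the frame rate of a complete app.</p>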
<h2>Summary</h2>
<p>So, there are your choices for an 800x450px swf that updates raw bitmap data: 3fps for Flash 9 vs <strong>146fps</strong> for pure as3 vs <strong>183fps</strong> for haXe (and probably Alchemy).</p>
<p><strong>imho:</strong> vectors are a fantastic choice. My personal preference goes to using fast memory, but that's because I can avoid the stack using it and know that, whilst it's a little more coding, you can't get faster.</p>
<p><strong>overall:</strong> I'm just grateful we've got options to be able to deliver this kind of performance. It really opens the doors for some stunning work and libraries, not least the upcoming version of Papervision and no doubt some libs for haXe.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

