<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>webr3.org &#187; ByteArray</title>
	<atom:link href="http://webr3.org/blog/tag/bytearray/feed/" rel="self" type="application/rss+xml" />
	<link>http://webr3.org/blog</link>
	<description>brain&#039;s on fire!</description>
	<lastBuildDate>Mon, 30 Aug 2010 00:11:38 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Something about practical usage of flash.Memory in HaXe</title>
		<link>http://webr3.org/blog/haxe/something-about-practical-usage-of-flash-memory-in-haxe/</link>
		<comments>http://webr3.org/blog/haxe/something-about-practical-usage-of-flash-memory-in-haxe/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 17:17:18 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[3D computer graphics]]></category>
		<category><![CDATA[animation]]></category>
		<category><![CDATA[Arrays]]></category>
		<category><![CDATA[ByteArray]]></category>
		<category><![CDATA[Class]]></category>
		<category><![CDATA[Data types]]></category>
		<category><![CDATA[Fast Memory]]></category>
		<category><![CDATA[flash player]]></category>
		<category><![CDATA[Integer]]></category>
		<category><![CDATA[Lookup table]]></category>
		<category><![CDATA[ram]]></category>
		<category><![CDATA[Technology/Internet]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=79</guid>
		<description><![CDATA[It's dawned on me that I've completely neglected to mention why the haXe implementation of flash.Memory (flash player 10's new opcodes / fastmemory support) is so good.
Other than the really obvious bit that it's direct access to a block of ram/fast memory which works extremely quickly, there is one other small but vital detail that's [...]]]></description>
			<content:encoded><![CDATA[<p>It's dawned on me that I've completely neglected to mention why the haXe implementation of flash.Memory (flash player 10's new opcodes / fastmemory support) is so good.</p>
<p>Other than the really obvious bit that it's direct access to a block of ram/fast memory which works extremely quickly, there is one other small but vital detail that's haXe specific and makes a vast difference between Alchemy and haXe.</p>
<h2>The Big Difference</h2>
<p>You can think of flash.Memory as a static ByteArray, thus it is accessible from <em>both</em> static class and instance methods.</p>
<p>Now, another feature which offers huge speed increases in haXe is inlining.</p>
<p>Spelling it out - you can only inline static private methods, and flash.Memory is all static access. So stick all your data (or big chunks of it) in flash.Memory and then inline most of your heavy number crunching class.</p>
<p><strong>All inlined, all using fast memory = very very quick swfs!</strong></p>
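<p>A minimal sketch of that combination, assuming a Flash 10 target. The class name FastMath, the SCALE_OFFSET slot and the 1024 byte storage size are made up for illustration; only flash.Memory, ByteArray and Endian come from the player / haXe API.</p>
<pre>import flash.Memory;
import flash.utils.ByteArray;
import flash.utils.Endian;

class FastMath {
    // hypothetical layout: a single Float kept at byte offset 0
    static inline var SCALE_OFFSET : Int = 0;

    // static + inline (private by default): the body is pasted in at every call site,
    // so each call becomes one fast memory read and one multiply
    static inline function scaled( value : Float ) : Float {
        return value * Memory.getFloat( SCALE_OFFSET );
    }

    public static function main() {
        var storage = new ByteArray();
        storage.endian = Endian.LITTLE_ENDIAN;
        storage.length = 1024; // assumed size, just enough for this sketch
        Memory.select( storage );

        Memory.setFloat( SCALE_OFFSET , 0.5 );
        trace( scaled( 8.0 ) );
    }
}</pre>
<p>Because the data lives behind static calls, the helper needs no instance state at all, which is what lets it stay static and inlined.</p>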
<h2>Ideas and Tricks</h2>
<p>First of all, get rid of the thought that you can only have one chunk or type of data in fast memory at a time; you can partition it up and load everything in there.</p>
<p>A common setup I use is to have the first 1024 bytes as fast storage for float, int, byte, uint variables, the next xxx-thousand bytes as raw bitmap data, then all the other data I may want - like a block of 500k x,y,z floats for 3d data, followed by a few lookup charts.</p>
<p>the first 1024 bytes as fast storage for float, int, byte, uint variables - <em>because</em> you can then access the most commonly used vars using small int offsets (0-255). Since it's a small int you're using, it avoids all stacks completely: the bytecode only contains the opcode to get the value from fast memory, followed by the offset, right there in the bytecode. For anything 256 and over it's going to hit the int stack to get the offset first (which is barely noticeable, but you know every fraction counts).</p>
<p>the next xxx-thousand bytes as raw bitmap data - <em>because</em> you can then work on your bitmap data ultra fast and write it back to the BitmapData instance using setPixels; very fast indeed, faster than a fixed size vector even.</p>
<p>then all the other data - this is partially obvious, what may not be is the use of lookup charts, but I'll cover that in a moment.</p>
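<p>The layout described above can be written down as a handful of byte offsets. This is only a sketch: the Layout class and every size in it are assumptions (an 800x450 bitmap, 500,000 x,y,z points stored as 4 byte floats, one 256 entry lookup chart), so swap in your own numbers.</p>
<pre>import flash.Memory;
import flash.utils.ByteArray;
import flash.utils.Endian;

// hypothetical partition table, all values are byte offsets or byte counts
class Layout {
    public static inline var VARS_START   : Int = 0;       // 1024 bytes of scratch variables
    public static inline var BITMAP_START : Int = 1024;
    public static inline var BITMAP_BYTES : Int = 1440000; // 800 * 450 pixels * 4 bytes
    public static inline var POINTS_START : Int = 1441024; // 1024 + 1440000
    public static inline var POINTS_BYTES : Int = 6000000; // 500000 points * 3 floats * 4 bytes
    public static inline var LOOKUP_START : Int = 7441024; // 1441024 + 6000000
    public static inline var LOOKUP_BYTES : Int = 1024;    // 256 floats * 4 bytes
    public static inline var TOTAL_BYTES  : Int = 7442048; // 7441024 + 1024
}

class Main {
    public static function main() {
        var storage = new ByteArray();
        storage.endian = Endian.LITTLE_ENDIAN;
        storage.length = Layout.TOTAL_BYTES;
        Memory.select( storage );

        // one write per partition, each addressed from its own start offset
        Memory.setI32( Layout.VARS_START + 4 , 42 );    // second int variable
        Memory.setI32( Layout.BITMAP_START , 0xFF0000 ); // first pixel
        Memory.setFloat( Layout.POINTS_START , 1.5 );    // x of the first point
        Memory.setFloat( Layout.LOOKUP_START , -1.0 );   // first lookup entry

        trace( Memory.getI32( Layout.VARS_START + 4 ) );
    }
}</pre>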
<h3>Accessing..</h3>
<p>Remember flash.Memory differs from a byte array in that there is no position to read from: you have to work out the byte offset for each value you want to grab, using these sizes:</p>
<ul>
<li>Byte = 1 Byte</li>
<li>Int = 4 Bytes</li>
<li>Float = 4 Bytes</li>
<li>Double = 8 Bytes</li>
</ul>
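<p>In practice that means multiplying the element index by the size of the type. A small sketch, assuming a Flash 10 target; the Strides class and its four 16 element partitions are hypothetical.</p>
<pre>import flash.Memory;
import flash.utils.ByteArray;
import flash.utils.Endian;

class Strides {
    // hypothetical partitions, each holding 16 values
    static inline var BYTES_START   : Int = 0;   // 16 * 1 byte
    static inline var INTS_START    : Int = 16;  // 16 * 4 bytes
    static inline var FLOATS_START  : Int = 80;  // 16 * 4 bytes
    static inline var DOUBLES_START : Int = 144; // 16 * 8 bytes

    public static function main() {
        var storage = new ByteArray();
        storage.endian = Endian.LITTLE_ENDIAN;
        storage.length = 1024;
        Memory.select( storage );

        var i = 3; // element index, not a byte address

        // address = start of partition + index * size of one element
        Memory.setByte( BYTES_START + i , 200 );
        Memory.setI32( INTS_START + i * 4 , 123456 );
        Memory.setFloat( FLOATS_START + i * 4 , 0.25 );
        Memory.setDouble( DOUBLES_START + i * 8 , 0.1 );

        trace( Memory.getByte( BYTES_START + i ) );
        trace( Memory.getI32( INTS_START + i * 4 ) );
        trace( Memory.getFloat( FLOATS_START + i * 4 ) );
        trace( Memory.getDouble( DOUBLES_START + i * 8 ) );
    }
}</pre>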
<p>To use flash.Memory with functions which require a ByteArray [ like BitmapData.setPixels() or Sound.extract() ] you can access it directly from flash.system.ApplicationDomain.currentDomain.domainMemory, for example:</p>
<pre>// set position to start of the data we need
flash.system.ApplicationDomain.currentDomain.domainMemory.position = SOME_OFFSET;
// use the fast memory
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );</pre>
<p><strong>note:</strong> don't be fooled though: calling the ByteArray's own methods on domainMemory gives you none of the speed-up. Although <em>it is the data</em> that is in fast memory, those calls do not use the opcodes, which are what speed it up!</p>
<h3>Lookup Charts</h3>
<p>Often you'll find yourself calling the same code over and over and over, perhaps without realising just how much. Consider an animation which uses perlin noise to generate 2d or 3d data - like this example. It reads about 40k uint values from perlin noise, splits each pixel in to individual r,g,b values, then turns each of those in to a float from -1 to 1 using a simple calculation. Thing is.. that means the same calculation runs 3x40000 times per frame at 50 frames a second.. 6 million times a second!</p>
<p>In this scenario, there are only 256 possible inputs to the calculation and 256 results - so rather than working it out 6 million times we can load it all in to a lookup chart, stick the lookup chart in fast memory and then lookup values rather than calculating - often you can really speed things up by doing this. <a href="http://webr3.org/experiments/perlin-particles/light-cloud/PerlinParticleEffects.hx">Practical example with comments here</a>.</p>
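<p>A rough sketch of such a lookup chart, assuming a Flash 10 target. The Lookup class, the TABLE offset and the formula c / 127.5 - 1 are my assumptions for illustration; the post only says the calculation maps a channel value on to -1 to 1.</p>
<pre>import flash.Memory;
import flash.utils.ByteArray;
import flash.utils.Endian;

class Lookup {
    // hypothetical offset: the chart sits just after a 1024 byte variable area
    static inline var TABLE : Int = 1024;

    // run the calculation once for each of the 256 possible inputs
    static function build() {
        for( c in 0...256 ) {
            // assumed mapping of 0..255 on to -1..1, stored as a 4 byte float
            Memory.setFloat( TABLE + c * 4 , c / 127.5 - 1 );
        }
    }

    // per channel work is now a single fast memory read
    static inline function channelToFloat( c : Int ) : Float {
        return Memory.getFloat( TABLE + c * 4 );
    }

    public static function main() {
        var storage = new ByteArray();
        storage.endian = Endian.LITTLE_ENDIAN;
        storage.length = TABLE + 256 * 4;
        Memory.select( storage );
        build();

        var pixel : Int = 0xFF8000;
        // split the pixel in to its r,g,b channels and look each one up
        var r = channelToFloat( ( pixel &gt;&gt; 16 ) &amp; 0xFF );
        var g = channelToFloat( ( pixel &gt;&gt; 8 ) &amp; 0xFF );
        var b = channelToFloat( pixel &amp; 0xFF );
        trace( r + ", " + g + ", " + b );
    }
}</pre>
<p>The chart costs 1024 bytes and 256 calculations up front, against millions of calculations a second without it.</p>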
<p>Hope this is of use to somebody and clears things up a bit; I use the methods often so you'll see them in the source of my experiments.</p>
<p><img class="alignnone size-full wp-image-80" title="nopic" src="http://webr3.org/blog/wp-content/uploads/2009/07/nopic.jpg" alt="nopic" width="600" height="250" /></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/something-about-practical-usage-of-flash-memory-in-haxe/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>BitmapData, Vectors, ByteArrays and Optimization</title>
		<link>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/</link>
		<comments>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 10:13:31 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[Flash 10]]></category>
		<category><![CDATA[haXe]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[ActionScript]]></category>
		<category><![CDATA[BitmapData]]></category>
		<category><![CDATA[ByteArray]]></category>
		<category><![CDATA[Flash Optimization]]></category>
		<category><![CDATA[Vector]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=53</guid>
		<description><![CDATA[
In this post I'm looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.
The Simple Stuff
A BitmapData 5px by 5px [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-58" title="opt" src="http://webr3.org/blog/wp-content/uploads/2009/06/opt.jpg" alt="opt" width="600" height="250" /></p>
<p>In this post I'm looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.</p>
<h3>The Simple Stuff</h3>
<p>A BitmapData 5px by 5px comprises 25 pixels; the flash API is a bit misleading as to the point coordinate system used,<br />
so here's the layout of the data contained:</p>
<pre>       X
   0,-,-,-,-
   -,1,-,-,-
 Y -,-,2,-,-
   -,-,-,3,-
   -,-,-,-,4</pre>
<p>the upper left corner is point(0,0) - the lower right is point(4,4); [not 1,1 to 5,5 as the manual suggests]</p>
<p>to access the data I'm going to be using three specific methods:</p>
<ul>
<li> BitmapData.getPixel( x, y );</li>
<li> BitmapData.getPixels( rect );</li>
<li> BitmapData.getVector( rect );</li>
</ul>
<p>the data returned by getPixels and getVector is in array format (ByteArray and Vector&lt;UInt&gt; respectively); both are 0 indexed with 25 entries (in the ByteArray each entry is a 4 byte unsigned int, so it holds 100 bytes).</p>
<p>Pixel values run from left to right, one row after another, so the 5px example above converts to</p>
<pre>index[0] = point(0,0)
index[1] = -
index[2] = -
index[3] = -
index[4] = -
index[5] = -
index[6] = point(1,1)
index[7] = -
index[8] = -
index[9] = -
index[10] = -
index[11] = -
index[12] = point(2,2)
index[13] = -
index[14] = -
index[15] = -
index[16] = -
index[17] = -
index[18] = point(3,3)
index[19] = -
index[20] = -
index[21] = -
index[22] = -
index[23] = -
index[24] = point(4,4)</pre>
<h3>Converting between X,Y and Offsets</h3>
<p>to work with the data in its raw format we're going to need some helper functions/code to convert between point(x,y) and offset.</p>
<p><strong>X,Y to Offset</strong></p>
<pre>function pointToOffset( x : Int , y : Int , width : Int ) : Int {
    return (y * width) + x;
}</pre>
<p>thus for point(3,3):</p>
<pre>    (3 x 5) + 3 = 18</pre>
<p><strong>Offset to X,Y</strong></p>
<pre>function offsetToPoint( offset : Int , width : Int ) : Point {
    return new Point( offset%width , Std.int(offset/width) );
}
// or
bitmapData.getPixel( offset%width , Std.int(offset/width) );</pre>
<p>thus for offset 18:</p>
<pre>    x: 18%5 = 3
    y: Std.int(18/5) = Std.int(3.6) = 3</pre>
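<p>To check the two conversions against each other, here is a small round trip over the diagonal of the 5px example. It is plain haXe with no flash classes, so it should run on any target; the Main wrapper is only there to make it self-contained.</p>
<pre>class Main {
    static inline function pointToOffset( x : Int , y : Int , width : Int ) : Int {
        return ( y * width ) + x;
    }

    static function main() {
        var width = 5;
        for( i in 0...5 ) {
            var offset = pointToOffset( i , i , width );
            // and back again: the column is the remainder, the row is the whole part
            var x = offset % width;
            var y = Std.int( offset / width );
            trace( "point(" + i + "," + i + ") = offset " + offset + " = point(" + x + "," + y + ")" );
        }
    }
}</pre>
<p>The offsets traced are 0, 6, 12, 18 and 24, matching the index table further up.</p>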
<h2>Speed Tests</h2>
<p>For each of the three methods [ getPixel(x,y); getPixels(rect); getVector(rect); ] I've done a lot of testing to see which is fastest. All tests use a non transparent bitmapData of 1,920,000 pixels (1600x1200). The task of each test is to get all pixels from the bitmapData, load them in to an array/bytearray/vector, add 1 to the color of each pixel and then write the new data back to the BitmapData.</p>
<h2>Starting with the methods available in Flash 9</h2>
<h3>Test One, getPixel -&gt; Array -&gt; setPixel</h3>
<p>This method incurs two full loops because there is no setPixels(array) method; for reference, a bare loop of 1920000 iterations takes 8ms.</p>
<pre>var o : Array&lt;UInt&gt; = new Array&lt;UInt&gt;();
for( i in 0...1920000 ) {
    o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
    bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}</pre>
<p>results [all times in ms]</p>
<pre>555, 569, 574, 573, 575, 570, 570, 570, 568, 576, 575, 573, 569, 563, 570</pre>
<h3>Test Two, getPixels -&gt; ByteArray -&gt; setPixels</h3>
<p>This method needs two ByteArrays to avoid using temporary vars and setting positions multiple times; simply calling getPixels() alone takes 90ms!</p>
<pre>var b : ByteArray = bitmapData.getPixels( bitmapData.rect );
var n : ByteArray = new ByteArray();
n.endian = b.endian; // use the same byte order for reading and writing so the values round-trip unchanged
b.position = 0; // rewind before reading, otherwise we get an error
for( i in 0...1920000 ) {
    n.writeUnsignedInt( b.readUnsignedInt() + 1 );
}
n.position = 0; // rewind before handing it to setPixels, otherwise we get an error
bitmapData.setPixels( bitmapData.rect , n );</pre>
<p>results [all times in ms]</p>
<pre>597, 590, 591, 598, 591, 591, 595, 593, 595, 591, 591, 594, 591, 594, 588</pre>
<p>Rather disappointing results really, we'd get a lovely 2fps at that rate (all tests ran on a quad core btw).</p>
<h2>On to the methods available in flash 10, which let us use Vectors</h2>
<h3>Test Three, getPixel -&gt; Vector -&gt; setPixel</h3>
<p>This test is included purely to show the speed difference between Vectors and Arrays (test one)</p>
<pre>var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( 1920000 , true );
for( i in 0...1920000 ) {
    o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
    bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}</pre>
<p>results [all times in ms]</p>
<pre>439, 443, 447, 444, 447, 445, 448, 447, 446, 447, 447, 445, 447, 449, 448</pre>
<h3>Test Four, getVector -&gt; Vector -&gt; setVector (unfixed vector)</h3>
<p>unfixed vector initialization is virtually instant, way too quick for me to measure anyways (or to make a difference)</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;();
for( i in 0...1920000 ) {
    o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );</pre>
<p>results [all times in ms]</p>
<pre>96, 66, 80, 71, 75, 65, 74, 72, 76, 68, 77, 73, 72, 77, 68, 70, 72, 72, 81, 68</pre>
<h3>Test Five, getVector -&gt; Vector -&gt; setVector (fixed size vector)</h3>
<p>fixed size vectors take time, in this instance Vector&lt;UInt&gt;(1920000,true) takes circa 5ms</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
var o : Vector&lt;UInt&gt; = new Vector&lt;UInt&gt;( 1920000 , true );
for( i in 0...1920000 ) {
    o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );</pre>
<p>results [all times in ms]</p>
<pre>53, 41, 41, 46, 45, 47, 41, 40, 45, 45, 46, 41, 40, 46, 45, 47, 41, 50, 42, 46</pre>
<h3>Test Six, getVector -&gt; setVector</h3>
<p>this time we cut out the second vector, as essentially it's not needed</p>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
    v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );</pre>
<p>results [all times in ms]</p>
<pre>44, 36, 38, 36, 38, 36, 36, 38, 36, 36, 37, 39, 35, 38, 38, 36, 39, 36, 38, 38</pre>
<p>Some fantastic speed gains, and since we're getting to smaller figures we need to start counting the cost of each line:</p>
<pre>bitmapData.getVector( bitmapData.rect );</pre>
<p>10, 8, 9, 9, 8, 10, 8, 9, 8, 11, 8, 10, 8, 10, 8, 9, 8, 9, 8 [ms]</p>
<pre>for( i in 0...1920000 ) {
    v[i] = v[i] + 1;
}</pre>
<p>25, 25, 25, 27, 27, 26, 25, 26, 25, 25, 26, 25, 27, 25, 25, 26, 25, 26, 25, 25</p>
<p>however the loop alone takes 8ms, so we're looking at an average of about 18ms (25.5 - 8) for this part</p>
<pre>bitmapData.setVector( bitmapData.rect , v );</pre>
<p>11, 5, 6, 4, 4, 6, 4, 6, 4, 4, 5, 5, 6, 4, 6, 4, 5, 5, 4, 5</p>
<p>and finally, to really count the cost, we need to see what the uint+1 would take by itself</p>
<pre>for( i in 0...1920000 ) {
    0xFFFFF9 + 1;
}</pre>
<p>9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9</p>
<p>so the true cost of using this single vector method in test six is 37-9 = 28ms for getVector, update vector, setVector</p>
<p>the true cost of using a vector inside the loop is 26-9 = 17ms (we need this later)</p>
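<p>For anyone wanting to repeat these measurements, here is one way the timings could be taken. It is a sketch under assumptions: the Timing class, the use of flash.Lib.getTimer() and the 20 runs are my choices for illustration, not necessarily the harness behind the figures above.</p>
<pre>import flash.Lib;
import flash.Vector;
import flash.display.BitmapData;

class Timing {
    public static function main() {
        // same size as the tests: 1600x1200, non transparent
        var bitmapData = new BitmapData( 1600 , 1200 , false , 0x000000 );
        var total = 1600 * 1200;

        for( run in 0...20 ) {
            var start = Lib.getTimer(); // milliseconds since the player started

            // test six: getVector, update in place, setVector
            var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
            for( i in 0...total ) {
                v[i] = v[i] + 1;
            }
            bitmapData.setVector( bitmapData.rect , v );

            trace( ( Lib.getTimer() - start ) + "ms" );
        }
    }
}</pre>
<p>To cost a single line, move the two getTimer() calls so that they wrap just that line, then subtract the bare loop time as done above.</p>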
<h2>Adding haXe in to the equation.</h2>
<p>Finally, as you probably know haXe provides access to fast memory; fast memory is essentially a ByteArray set in flash.system.ApplicationDomain.currentDomain.domainMemory and accessed through the new opcodes.</p>
<p>A little trick we can use with haxe is to write info from this fast memory to BitmapData via setPixels() since it is a ByteArray.</p>
<p>First off we need to set up the fast memory storage somewhere in our app:</p>
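<pre>// imports needed at the top of the file:
// import flash.utils.ByteArray;
// import flash.utils.Endian;
var storage : ByteArray = new ByteArray();
storage.endian = Endian.LITTLE_ENDIAN; // the fast memory opcodes read and write little endian
storage.length = BYTES;                // allocate the whole block up front
flash.Memory.select( storage );        // make it the domain memory used by flash.Memory</pre>
<p>to calculate the value of BYTES we simply do WIDTH*HEIGHT*4 (because a UInt takes up 4 Bytes of memory); for the 1600x1200 test image that is 7,680,000 bytes.</p>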
<p>we know from earlier that getVector only takes circa 9ms, whereas getPixels takes a shocking 90ms; so what I'm going to do is:</p>
<h3>Test Seven, getVector -&gt; flash.Memory -&gt; setPixels</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
    flash.Memory.setI32( i*4 , v[i] + 1 );
}
// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>results [all times in ms]</p>
<pre>34, 31, 32, 34, 32, 32, 32, 33, 32, 33, 31, 33, 32, 31, 33, 32, 33, 32, 33, 31</pre>
<p>We've shaved off 5ms! but hold on.. there's more - we need to analyse this</p>
<p>bitmapData.getVector( bitmapData.rect ); as we know from earlier takes 9ms</p>
<p>the loop takes 9ms</p>
<pre>flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>together all takes 5ms</p>
<p>which means that the true cost of using flash.Memory is 32-9-9-5 = 9ms</p>
<p>if you remember from test six, the vector cost was 17ms so we've gained some and lost some, but overall came out on top, and certainly a tonne better than flash 9!</p>
<p>now, these tests were all with a big chunk of 1600x1200 pixels, in a real world app we'd have less.. so let's see with 800x450px (360,000 pixels)</p>
<h3>Vector Only:</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
    v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );</pre>
<p>results [all times in ms]</p>
<pre>13, 6, 6, 6, 6, 7, 6, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7</pre>
<p>137 / 20 = av 6.85ms = 146fps</p>
<h3>Vector + Memory</h3>
<pre>var v : Vector&lt;UInt&gt; = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
    flash.Memory.setI32( i*4 , v[i] + 1 );
}

// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );</pre>
<p>results [all times in ms]</p>
<pre>7, 6, 6, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5</pre>
<p>109 / 20 = av 5.45ms = 183fps</p>
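<p>The averages and frame rates come straight from the two result lists: total time divided by the number of runs, then 1000ms divided by that average. A plain haXe sketch of the arithmetic (the Main wrapper and the average helper are only there to make it self-contained):</p>
<pre>class Main {
    // mean of a list of millisecond timings
    static function average( times : Array&lt;Int&gt; ) : Float {
        var sum = 0;
        for( t in times ) sum += t;
        return sum / times.length;
    }

    static function main() {
        var vectorOnly = [ 13, 6, 6, 6, 6, 7, 6, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7 ];
        var withMemory = [ 7, 6, 6, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5 ];

        for( times in [ vectorOnly , withMemory ] ) {
            var av = average( times );
            // frames per second = 1000ms / average ms per frame, rounded to the nearest frame
            trace( av + "ms per frame, " + Math.round( 1000 / av ) + "fps" );
        }
    }
}</pre>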
<h2>Summary</h2>
<p>so, there's your choices: an 800x450px swf updating raw bitmap data at 3fps with the flash 9 methods vs <strong>146fps</strong> for pure as3 with vectors vs <strong>183fps</strong> for haXe (and probably Alchemy).</p>
<p><strong>imho:</strong> vectors are a fantastic choice; my personal preference goes to using fast memory, but that's because I can avoid the stack using it and know that whilst it's a little more coding, you can't get faster.</p>
<p><strong>overall:</strong> I'm just grateful we've got options to be able to deliver this kind of performance, it really opens the doors for some stunning work and libraries, not least the upcoming version of Papervision and no doubt some libs for haXe.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/haxe/bitmapdata-vectors-bytearrays-and-optimization/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
