BitmapData, Vectors, ByteArrays and Optimization

In this post I’m looking at the layout of BitmapData and the fastest way to access / manipulate the raw data. The syntax used is haXe, however the information is good for as3/flash10 right up until the last section which deals with using fast memory and is haXe/Alchemy specific.
The Simple Stuff
A BitmapData 5px by 5px will comprise of 25 pixels; the flash API is a bit misleading as to the point coordinate system used,
as such here’s the layout of the data contained:
X 0,-,-,-,- -,1,-,-,- Y -,-,2,-,- -,-,-,3,- -,-,-,-,4
the upper left corner is point(0,0) – the lower right is point(4,4); [not 1,1 to 5,5 as the manual suggests]
to access the data I’m going to be using three specific methods:
- BitmapData.getPixel( x, y );
- BitmapData.getPixels( rect );
- BitmapData.getVector( rect );
the data returned by getPixels and getVector is in array format (ByteArray and Vector<UInt> respectively), both are 0 indexed with 25 entries.
Pixel values run from left to right, so the 5px example above converts to
index[0] = point(0,0) index[1] = - index[2] = - index[3] = - index[4] = - index[5] = - index[6] = point(1,1) index[7] = - index[8] = - index[9] = - index[10] = - index[11] = - index[12] = point(2,2) index[13] = - index[14] = - index[15] = - index[16] = - index[17] = - index[18] = point(3,3) index[19] = - index[20] = - index[21] = - index[22] = - index[23] = - index[24] = point(4,4)
Converting between X,Y and Offsets
to work with the data in its raw format we’re going to need some helper functions/code to convert between point(x,y) and offset.
X,Y to Offset
function pointToOffset( x : Int , y : Int , width : Int ) : Int {
return (y * width) + x;
}
thus for point(3,3):
(3 x 5) + 3 = 18
Offset to X,Y
function offsetToPoint( offset : Int , width : Int ) : Point {
return new Point( offset%width , Std.int(offset/width) );
}
// or
bitmapData.getPixel( offset%width , Std.int(offset/width) );
thus for offset 18 :
x: 18%5 = 3
y: 18/5 = 3
Speed Tests
For each of the three methods [ getPixel(x,y); getPixels(rect); getVector(rect); ] I’ve done a lot of testing to see which is fastest. All tests use a non transparent bitmapData of 1,920,000 pixels (1600×1200) the task of the test itself is to get all pixels from the bitmapData, load them in to an array/bytearray/vector, add 1 to the color for each pixel and then write the new data back to the BitmapData.
Starting with the Flash 9 methods available:
Test One, getPixel -> array -> setPixel
This method incurs two full loops because there is no setPixels(array) method, the time of a bare loop of 1920000 is 8ms.
var o : Array<UInt> = new Array<UInt>();
for( i in 0...1920000 ) {
o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}
results [all times in ms]
555, 569, 574, 573, 575, 570, 570, 570, 568, 576, 575, 573, 569, 563, 570
Test Two – getPixels -> ByteArray -> setPixels
This method needs two ByteArrays to avoid using temporary var’s and setting positions multiple times, the time of simply calling getPixels() alone is 90ms!
var b : ByteArray = bitmapData.getPixels( bitmapData.rect );
var n : ByteArray = new ByteArray();
n.endian = b.endian; // getPixels often returns big endian values and byte arrays are little endian by default
b.position = 0; // other wise we get an error
for( i in 0...1920000 ) {
n.writeUnsignedInt( b.readUnsignedInt() + 1 );
}
n.position = 0; // otherwise we get an error
bitmapData.setPixels( bitmapData.rect , n );
results [all times in ms]
597, 590, 591, 598, 591, 591, 595, 593, 595, 591, 591, 594, 591, 594, 588
Rather disapointing results really, we’d get a lovely 2fps at that rate (all tests ran on a quad core btw).
On to the methods available in flash 10, which let’s us use Vectors
Test Three, getPixel -> Vector -> setVector
This test is included purely to show the speed difference between Vectors and Arrays (test one)
var o : Vector<UInt> = new Vector<UInt>( 1920000 , true );
for( i in 0...1920000 ) {
o[i] = bitmapData.getPixel( i%WIDTH , Std.int(i/WIDTH) ) + 1;
}
for( i in 0...1920000 ) {
bitmapData.setPixel( i%WIDTH , Std.int(i/WIDTH) , o[i] );
}
results [all times in ms]
439, 443, 447, 444, 447, 445, 448, 447, 446, 447, 447, 445, 447, 449, 448
Test Four, getVector -> Vector -> setVector (unfixed vector)
unfixed vector initialization is virtually instant, way too quick for me to measure anyways (or to make a difference)
var v : Vector<UInt> = bitmapData.getVector( bitmapData.rect );
var o : Vector<UInt> = new Vector<UInt>();
for( i in 0...1920000 ) {
o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );
results [all times in ms]
96, 66, 80, 71, 75, 65, 74, 72, 76, 68, 77, 73, 72, 77, 68, 70, 72, 72, 81, 68
Test Five, getVector -> Vector -> setVector (fixed size vector)
fixed size vectors take time, in this instance Vector<UInt>(1920000,true) takes circa 5ms
var v : Vector<UInt> = bitmapData.getVector( bitmapData.rect );
var o : Vector<UInt> = new Vector<UInt>( 1920000 , true );
for( i in 0...1920000 ) {
o[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , o );
results [all times in ms]
53, 41, 41, 46, 45, 47, 41, 40, 45, 45, 46, 41, 40, 46, 45, 47, 41, 50, 42, 46
Test Six, getVector -> setVector
this time we cut out the second vector, as essentially it’s not needed
var v : Vector<UInt> = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );
results [all times in ms]
44, 36, 38, 36, 38, 36, 36, 38, 36, 36, 37, 39, 35, 38, 38, 36, 39, 36, 38, 38
Some fantastic speed gains, and since we’re getting to smaller figures we need to start counting the cost of each line:
bitmapData.getVector( bitmapData.rect );
10, 8, 9, 9, 8, 10, 8, 9, 8, 11, 8, 10, 8, 10, 8, 9, 8, 9, 8 [ms]
for( i in 0...1920000 ) {
v[i] = v[i] + 1;
}
25, 25, 25, 27, 27, 26, 25, 26, 25, 25, 26, 25, 27, 25, 25, 26, 25, 26, 25, 25
however the loop alone takes 8ms so we’re looking at av 19ms for this part
bitmapData.setVector( bitmapData.rect , v );
11, 5, 6, 4, 4, 6, 4, 6, 4, 4, 5, 5, 6, 4, 6, 4, 5, 5, 4, 5
and finally to be really count the cost we need to see what the uint+1 would take by itself
for( i in 0...1920000 ) {
0xFFFFF9 + 1;
}
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 9, 9
so the true cost of using this single vector method in test six is 37-9 = 28ms for getVector, update vector, setVector
the true cost of using a vector inside the loop is 26-9 = 17ms (we need this later)
Adding haXe in to the equation.
Finally, as you probably know haXe provides access to fast memory; fast memory is essentially a ByteArray set in flash.system.ApplicationDomain.currentDomain.domainMemory and accessed through the new opcodes.
A little trick we can use with haxe is to write info from this fast memory to BitmapData via setPixels() since it is a ByteArray.
First off we need to set up the fast memory storage somewhere in our app:
var storage : ByteArray = new ByteArray(); storage.endian = Endian.LITTLE_ENDIAN; storage.length = BYTES; flash.Memory.select( storage );
to calulate the value of BYTES we simply do WIDTH*HEIGHT*4 (because a UInt takes up 4 Bytes of memory)
we know from earlier that getVector only takes circa 9ms, whereas getPixels takes a shocking 90ms; so what I’m going to do is:
Test Seven : getVector -> flash.Memory -> setPixels
var v : Vector<UInt> = bitmapData.getVector( bitmapData.rect );
for( i in 0...1920000 ) {
flash.Memory.setI32( i*4 , v[i] + 1 );
}
// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );
results [all times in ms]
34, 31, 32, 34, 32, 32, 32, 33, 32, 33, 31, 33, 32, 31, 33, 32, 33, 32, 33, 31,
We’ve shaved off 5ms! but hold on.. there’s more – we need to analyse this
bitmapData.getVector( bitmapData.rect ); as we know from earlier takes 9ms
the loop takes 9ms
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0; bitmapData.lock(); bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory ); bitmapData.unlock( bitmapData.rect );
together all takes 5ms
which means that the true cost of using flash.Memory is 32-9-9-5 = 9ms
if you remember from test six, the vector cost was 17ms so we’ve gained some and lost some, but overall came out on top, and certainly a tonne better than flash 9!
now, these tests where all with a big chunk of 1600×1200 pixels, in a real world app we’d have less.. so lets see with 800×450px (360,000 pixels)
Vector Only:
var v : Vector<UInt> = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
v[i] = v[i] + 1;
}
bitmapData.setVector( bitmapData.rect , v );
results [all times in ms]
13, 6, 6, 6, 6, 7, 6, 7, 6, 6, 7, 6, 6, 7, 7, 7, 7, 7, 7, 7 = 137 / 20 : av 6.85ms = 146fps
Vector + Memory
var v : Vector<UInt> = bitmapData.getVector( bitmapData.rect );
for( i in 0...360000 ) {
flash.Memory.setI32( i*4 , v[i] + 1 );
}
// ensure position of fast memory is 0
flash.system.ApplicationDomain.currentDomain.domainMemory.position = 0;
bitmapData.lock();
bitmapData.setPixels( bitmapData.rect , flash.system.ApplicationDomain.currentDomain.domainMemory );
bitmapData.unlock( bitmapData.rect );
results [all times in ms]
7, 6, 6, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 6, 5, 5, 5, 5, 5 = 109 / 20 : av 5.45ms = 183fps
Summary
so, there’s your choices 800×450px swf using methods to update raw bitmap data at 3fps for flash 9 vs 146fps for pure as3 vs 183 fps for haXe (and probably alchemy).
imho: vectors are a fantastic choice, my personal preference goes to using fastmemory, but thats because i can avoid the stack using it and know that whilst its a little more coding, you can’t get faster.
overall: I’m just grateful we’ve got options to be able to deliver this kind of performance, really opens the doors for some stunning work and libraries not least the upcoming version of papervision and no doublt some libs for haXe.

Huge says:
2009/06/29 at 20:06
Hi – interesting stuff.
To be fair to the bitmap versions, you should eliminate the “%” operation by using two loops, or making the width a power of 2 and use the “&” operator. However, I’m sure that this will not close the gap to the memory version!
Dr. Green says:
2009/07/11 at 07:23
Hi!
This is a fantastic article. I did not know, haxe’s optimization, but if You use:
… flash.Memory.setI32( i*4 , v[i] + 1 ); …
can it be faster with bit-operations? Like i<<2 ? Or haxe optimize this?
Thanks for Your answer.
nathan says:
2009/07/11 at 14:36
completely unsure to be honest, it should be faster marginally, I’ll do some tests and figure out.
interestingly if you use the new AS3V tool produced by Joa this is one of the optimizations it suggests