dEngine: An iOS 3D renderer source code

April 28th, 2011


Source Code Released


I've decided to release the source code of the OpenGL ES 1.1/2.0 renderers I wrote in the summer of 2009, nicknamed "dEngine". It was the first renderer to feature shadow mapping and bump mapping on iPhone at the time. Note that shadow mapping was achieved by packing the depth information into a color texture, but GL_OES_depth_texture is available now, so you should be able to gain some speed.

The overall code quality is far from exceptional (support for only two object types, WTF?!), but I consider it a good tutorial for OpenGL ES 2.0: you can read about bump mapping and shadow mapping with a fun example from a Doom 3 level.

The OpenGL ES 2.0 renderer features uber-shaders: depending on the material currently rendered, a shader is recompiled on the fly in order to avoid branching.

Enjoy:











A few videos



The following videos show characters from Doom 3 that I used to test the engine: the HellKnight is 2,200 polygons, and the rest of the visible room is 1,000. The materials all feature a diffuse map, a normal map and a specular map (up to 512x512). The shadow is generated via a shadow map: because rendering to a depth texture (GL_OES_depth_texture) is not supported on iPhone, depth values are packed into an R8G8B8A8 color texture twice the size of the screen.


iPhone 3GS programmable pipeline, running at 27 fps.
iPhone 2G/3G fixed pipeline, running at 45 fps.

Polymorphism


The rendering path is abstracted via a C struct containing function pointers, à la Quake 2.

	typedef struct renderer_t
	{
		uchar type;
		void (*Set3D)(void);
		void (*StartRendition)(void);
		void (*StopRendition)(void);
		void (*SetTexture)(unsigned int);
		void (*RenderEntities)(void);
		void (*UpLoadTextureToGpu)(texture_t* texture);
		void (*Set2D)(void);
		//...
	} renderer_t;

				
	//	renderer_fixed.h
	void initFixedRenderer(renderer_t* renderer);

	//   renderer_progr.h
	void initProgrRenderer(renderer_t* renderer);

				

The "implementation" of every function is hidden in the .c of each renderer, initFixedRenderer and initProgrRenderer only expose the function address via the pointer.

A few optimizations...


Texture compression is a big win: a 32 bits per texel RGBA texture is a pig with no real reason to exist when working with a small display. OpenGL ES 1.1 and 2.0 do not require the GPU to support any texture compression, but the good guys at Imagination Technologies provided support for PVRTC, which brings consumption down to as low as 2 bits per pixel, with alpha support!
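As an illustration, uploading a PVRTC texture boils down to a single glCompressedTexImage2D call with the GL_IMG_texture_compression_pvrtc formats. The helper below is only a sketch: it assumes the .pvr payload has already been loaded, and the size computation follows the minimum block size of the 2 bpp mode.

	#include <OpenGLES/ES2/gl.h>
	#include <OpenGLES/ES2/glext.h>

	static GLuint uploadPVRTC2bpp(const void* data, int width, int height)
	{
		// 2 bits per texel, with the 16x8 minimum block size of the 2 bpp mode.
		GLsizei size = ((width  < 16 ? 16 : width) *
		                (height <  8 ?  8 : height) * 2) / 8;

		GLuint textureId;
		glGenTextures(1, &textureId);
		glBindTexture(GL_TEXTURE_2D, textureId);
		glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_PVRTC_2BPPV1_IMG,
		                       width, height, 0, size, data);
		return textureId;
	}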

Vertex metadata can be slimmed down as well:

A "regular" vertex is:

	Vertex Elementary unit:

	position   	3 floats
	normal		3 floats
	tangent		3 floats
	textureCoo	2 floats
	-------------------
	            44 bytes

By packing the components into "shorts" instead of "floats" via normalization, you end up with:

	Vertex Elementary unit:

	position   	3 floats
	normal		3 shorts
	tangent		3 shorts
	textureCoo	2 shorts
	-------------------
	            28 bytes

It's almost as if we "compress" the data on the CPU and send it to the GPU, where it is "decompressed". Abusing normalization divides bandwidth consumption by almost 50% and helps to slightly improve performance.
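A sketch of what the packed layout could look like on the OpenGL ES 2.0 side; the struct and the ATTRIB_* attribute locations are illustrative, not dEngine's own names. Passing GL_TRUE as the "normalized" flag is what makes the GPU expand the shorts back to floats for free.

	#include <stddef.h>    // offsetof

	typedef struct vertex_t
	{
		float position[3];      // 12 bytes
		short normal[3];        //  6 bytes
		short tangent[3];       //  6 bytes
		short textureCoo[2];    //  4 bytes
	} vertex_t;                 // 28 bytes total

	static void setupVertexAttributes(void)
	{
		glVertexAttribPointer(ATTRIB_POSITION, 3, GL_FLOAT, GL_FALSE, sizeof(vertex_t), (void*)offsetof(vertex_t, position));
		glVertexAttribPointer(ATTRIB_NORMAL,   3, GL_SHORT, GL_TRUE,  sizeof(vertex_t), (void*)offsetof(vertex_t, normal));
		glVertexAttribPointer(ATTRIB_TANGENT,  3, GL_SHORT, GL_TRUE,  sizeof(vertex_t), (void*)offsetof(vertex_t, tangent));
		glVertexAttribPointer(ATTRIB_TEXCOORD, 2, GL_SHORT, GL_TRUE,  sizeof(vertex_t), (void*)offsetof(vertex_t, textureCoo));
	}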



Compiler tuning is also important. Xcode is set up by default to generate ARM binaries using the Thumb instruction set, whose instructions are 16 bits wide instead of 32. This reduces the size of the binary (and the cost for Apple), but it's bad for 3D since Thumb instructions have to be translated to 32 bits. Uncheck this option for an instant performance gain.

Framebuffer refresh can also be improved a lot with the 3.1 firmware. This is an issue I mentioned in my article about Wolfenstein for iPhone: NSTimer is an abomination, and I was thrilled to find that we can now use CADisplayLink to perform vsync and get an adaptive framerate (although I'm experiencing some nasty touchesMoved behaviour on non-2G v3.X devices; if you have any info about this, email me!).
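For reference, a minimal CADisplayLink setup looks like the sketch below; the render: selector is illustrative and would be whatever method draws one frame.

	// In the view's start-up code, for instance
	// (requires #import <QuartzCore/QuartzCore.h>):
	CADisplayLink* displayLink = [CADisplayLink displayLinkWithTarget:self
	                                                         selector:@selector(render:)];
	[displayLink addToRunLoop:[NSRunLoop currentRunLoop]
	                  forMode:NSDefaultRunLoopMode];

	// render: is then invoked once per display refresh, in sync with the screen:
	// - (void)render:(CADisplayLink*)link { /* draw one frame */ }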

Reducing the framebuffer colorspace is another way to improve performance by reducing the amount of data written. Moving from a 24-bit colorspace to 16 bits provides a good improvement:

    
    CAEAGLLayer *eaglLayer = (CAEAGLLayer *)self.layer;

    eaglLayer.opaque = YES;
    eaglLayer.drawableProperties = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithBool:YES], kEAGLDrawablePropertyRetainedBacking,

        //kEAGLColorFormatRGBA8,       // 32 bits per pixel
        kEAGLColorFormatRGB565,        // 16 bits per pixel, FTW
        kEAGLDrawablePropertyColorFormat,
        nil];

Stating the obvious here, but reducing texture and blending mode switches is very important (forget about good performance if you do more than 60 texture changes). The material approach of the engine comes in very handy in this regard.
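One simple way to keep switches down is to sort the draw list by texture once per frame and bind only when the texture actually changes; the entity_t fields and the global renderer used below are illustrative, not dEngine's exact data structures.

	#include <stdlib.h>    // qsort

	static int compareByTexture(const void* a, const void* b)
	{
		const entity_t* ea = (const entity_t*)a;
		const entity_t* eb = (const entity_t*)b;
		return (int)ea->textureId - (int)eb->textureId;
	}

	static void renderSortedByTexture(entity_t* entities, int numEntities)
	{
		qsort(entities, numEntities, sizeof(entity_t), compareByTexture);

		unsigned int bound = 0;
		for (int i = 0; i < numEntities; i++)
		{
			if (entities[i].textureId != bound)
			{
				renderer.SetTexture(entities[i].textureId);   // bind only on change
				bound = entities[i].textureId;
			}
			// draw entities[i] ...
		}
	}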

Reducing blending of your polygons is PARAMOUNT: PowerVR performs TBDR (tile-based deferred rendering), which means each pixel is shaded only once thanks to hidden surface removal, and blending defeats the purpose. My take is that a blended polygon is rendered regardless of the culling outcome, and it destroys performance.

And last but not least, optimize the vertex indices so that GPU fetches hit the cache as much as possible.

Uber shader


Depending on the material properties used in a scene, the shader is re-compiled at runtime and then cached. This approach allows reducing branching operations in the shader. I was very pleased with the result: as long as I stay below 10-15 shader switches per frame, there is no significant performance drop.

    // snippet of the fragment shader

    #ifdef BUMP_MAPPING
        bump		=  texture2D(s_bumpMap, v_texcoord).rgb * 2.0 - 1.0;
        lamberFactor  =  max(0.0,dot(lightVec, bump) );
        specularFactor = max(0.0,pow(dot(halfVec,bump),materialShininess)) ;
    #else
        lamberFactor  =  max(0.0,dot(lightVec, v_normal) );
        specularFactor = max(0.0,pow(dot(halfVec,v_normal),materialShininess)) ;
    #endif
	
    #ifdef SPEC_MAPPING
        vec3 matTextColor = texture2D(s_specularMap, v_texcoord).rgb; 
    #else
        vec3 matTextColor = matColorSpecular;
    #endif
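On the C side, such an uber-shader can be driven by prepending the #defines to the shader source right before compilation and caching the compiled program per combination of flags. The sketch below illustrates the idea; the function name and flag parameters are not the engine's exact API.

	#include <OpenGLES/ES2/gl.h>

	// Compile one variant of the uber fragment shader.
	static GLuint compileFragmentShader(const char* source, int bumpMapping, int specMapping)
	{
		const char* sources[3];
		sources[0] = bumpMapping ? "#define BUMP_MAPPING\n" : "";
		sources[1] = specMapping ? "#define SPEC_MAPPING\n" : "";
		sources[2] = source;

		GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
		glShaderSource(shader, 3, sources, NULL);   // the defines are prepended here
		glCompileShader(shader);
		return shader;   // the caller caches it, keyed by the (bump, spec) flags
	}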

The now obsolete depth packing into a color buffer.


I love shadow effects; I think the realism and ambiance you get totally justify the cycles and bandwidth cost. It doesn't come for free in OpenGL, and it's quite ugly to do with the fixed pipeline, but I was thrilled to get it working with mobile shaders. Unfortunately, as of today, iPhones don't support GL_OES_depth_texture, which means you cannot render directly to a depth texture. The workaround is to pack a 32-bit floating point value into the four 8-bit channels of a color (RGBA) texture:

	// This is the shadowmap generator shader

	const vec4 packFactors = vec4( 256.0 * 256.0 * 256.0,256.0 * 256.0,256.0,1.0);
	const vec4 bitMask     = vec4(0.0,1.0/256.0,1.0/256.0,1.0/256.0);

	void main(void)
	{
		float normalizedDistance  = position.z / position.w;
		normalizedDistance = (normalizedDistance + 1.0) / 2.0;

		vec4 packedValue = vec4(fract(packFactors*normalizedDistance));
		packedValue -= packedValue.xxyz * bitMask;

		gl_FragColor  = packedValue;
	}

			

This method of packing a float into bytes is pretty clever (not mine) because it accounts for the internal accuracy of any GPU (via the subtraction line) and hence can be used on any kind of GPU (PowerVR, ATI, NVidia). Gratz to whoever came up with this.
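On the lookup side, the depth can presumably be recovered with a dot product against the reciprocal factors. The small CPU-side sketch below (not engine code) walks through the same arithmetic, without the 8-bit quantization the GPU adds when writing the texture, to show how the terms telescope back to the original value.

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
		float d = 0.7351942f;                            /* normalized depth in [0,1) */

		/* pack: fract() at each scale (mirrors the fragment shader above)... */
		float x = fmodf(d * 256.0f * 256.0f * 256.0f, 1.0f);
		float y = fmodf(d * 256.0f * 256.0f, 1.0f);
		float z = fmodf(d * 256.0f, 1.0f);
		float w = d;                                     /* fract(d) == d here */

		/* ...then remove the part already stored in the higher-order channel */
		float r = x;
		float g = y - x / 256.0f;
		float b = z - y / 256.0f;
		float a = w - z / 256.0f;

		/* unpack: dot product with the reciprocal factors; the terms telescope */
		float unpacked = r / (256.0f * 256.0f * 256.0f)
		               + g / (256.0f * 256.0f)
		               + b / 256.0f
		               + a;

		printf("%f -> %f\n", d, unpacked);
		return 0;
	}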

 

Comments (5)


#1 - Ellis - 04/29/2011 - 02:36
Thanks for sharing your work,

When you're talking about the Ubershader, it is not a real Ubershader because you don't use "if" instructions. So you need one glUseProgram per material. We can't batch many draw calls with your code; we have to sort by material and do one draw call per material.

Ellis,
#2 - Fabien Sanglard - 04/29/2011 - 09:19
Uber shader pretty much refers to a single file that can generate shaders for many different effects... whether it uses defines, static branching, dynamic branching or proprietary pre-processing the method remains relatively similar.

Now, depending on the drivers, you can get some weird behaviour if you use uniforms for branching (like shader recompilation). By using #defines and caching I was sure to avoid this.
#3 - Tim Omernick - 04/29/2011 - 09:21
That's pretty cool :) I should say that ngmoco's Eliminate (which I wrote) was I believe the first renderer to do bump mapping on iOS -- I added it to the q3 engine around June(?) that year. Though, it didn't do shadows, so nice job!

-t
#4 - Florian - 04/30/2011 - 05:00
Depth buffers are mostly just 16 bit, so you should be fine packing depth into 2 bytes. That gives you the possibility of packing the normal into the other 2 bytes.

That would be handy, since then you can do fully deferred rendering. For instance deferred lighting, which scales better than uniform lights, deferred decals, SSAO, etc.

Otherwise you'd need 3 passes to capture the information, and packing a normaldepth map you just need 2 (albedo and normaldepth).
#5 - Daniele - 04/30/2011 - 16:39
Hi Fabien,

your posts are always great, and this contribution is valuable. Thank you!

Daniele.

 
