dEngine: An iOS 3D renderer source code
April 28th, 2011 - Source Code Released
I've decided to release the source code of the OpenGL ES 1.1/2.0 renderers I wrote in the summer of 2009, nicknamed "dEngine". It was the first renderer to feature shadow mapping and bump mapping on iPhone at the time. Note that shadow mapping was achieved by packing the depth information into a color texture, but you now have access to GL_OES_depth_texture, so you should be able to gain some speed.
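For reference, with that extension available, rendering straight to a depth texture looks roughly like this (a minimal sketch, not dEngine code; the function name is illustrative):

#include <OpenGLES/ES2/gl.h>
#include <OpenGLES/ES2/glext.h>

// Sketch: create a depth texture and attach it to an FBO so the shadow
// pass can render depth directly (requires GL_OES_depth_texture).
GLuint CreateShadowDepthFBO(GLsizei w, GLsizei h, GLuint* depthTex)
{
    glGenTextures(1, depthTex);
    glBindTexture(GL_TEXTURE_2D, *depthTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    // GL_DEPTH_COMPONENT as a texture format is exactly what the extension adds.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, w, h, 0,
                 GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);

    GLuint fbo;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_2D, *depthTex, 0);
    // Some drivers also want a color attachment; check
    // glCheckFramebufferStatus(GL_FRAMEBUFFER) before using the FBO.
    return fbo;
}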
The overall code quality is far from exceptional (support for only two object types, WTF?!), but I consider it a good tutorial for OpenGL ES 2.0: you can read about bump mapping and shadow mapping with a fun example built from a Doom 3 level.
The OpenGL ES 2.0 renderer features uber-shaders: depending on the material currently rendered, a shader is recompiled on the fly in order to avoid branching.
Enjoy:
A few videos
The following videos show characters from Doom 3 that I used to test the engine; the HellKnight is 2,200 polys and the rest of the visible room is 1,000. The materials all feature a diffuse map, a normal map and a specular map (up to 512x512). The shadow is generated via a shadowmap; because render-to-depth-texture is not supported on iPhone (no GL_OES_depth_texture), depth values are packed into an R8G8B8A8 color texture twice the size of the screen.
iPhone 3GS, programmable pipeline, running at 27 fps.
iPhone 2G/3G, fixed pipeline, running at 45 fps.
Polymorphism
The rendering path is abstracted via a C struct containing function pointers, a la Quake 2.
typedef struct renderer_t {
    uchar type;
    void (*Set3D)(void);
    void (*StartRendition)(void);
    void (*StopRendition)(void);
    void (*SetTexture)(unsigned int);
    void (*RenderEntities)(void);
    void (*UpLoadTextureToGpu)(texture_t* texture);
    void (*Set2D)(void);
    //...
} renderer_t;

// renderer_fixed.h
void initFixedRenderer(renderer_t* renderer);

// renderer_progr.h
void initProgrRenderer(renderer_t* renderer);
The "implementation" of every function is hidden in the .c file of each renderer; initFixedRenderer and initProgrRenderer only expose the function addresses via the pointers.
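A sketch of what the wiring looks like on the fixed-pipeline side (the static function bodies and constants here are placeholders, not dEngine's actual code):

#include <OpenGLES/ES1/gl.h>

// The concrete implementations stay private (static) inside
// renderer_fixed.c; only their addresses escape through the table.
static void Set3DFixed(void)
{
    // set up the fixed-pipeline projection/modelview matrices here
}

static void SetTextureFixed(unsigned int textureId)
{
    glBindTexture(GL_TEXTURE_2D, textureId);
}

void initFixedRenderer(renderer_t* renderer)
{
    renderer->type       = 0;  // e.g. a RENDERER_FIXED constant
    renderer->Set3D      = Set3DFixed;
    renderer->SetTexture = SetTextureFixed;
    // ... remaining pointers wired the same way
}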
A few optimizations...
Texture compression is a big win, as a 32-bit-per-texel RGBA texture is a pig with no real reason to exist when working with a small display. OpenGL ES 1.1 and 2.0 do not require the GPU to support any texture compression, but the good guys at Imagination Technologies provide PVRTC, which brings consumption down to as low as 2 bits per pixel, with alpha support!
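Uploading PVRTC data goes through glCompressedTexImage2D; a sketch for the 2-bpp format (the function name is illustrative, the size formula is per the IMG extension spec):

#include <OpenGLES/ES2/gl.h>
#include <OpenGLES/ES2/glext.h>

// Upload one mip level of a 2-bpp PVRTC texture. PVRTC on iPhone requires
// square power-of-two textures; blocks have a minimum footprint of 16x8.
void UploadPVRTC2bpp(GLsizei width, GLsizei height, const void* data)
{
    GLsizei dataSize = ((width < 16 ? 16 : width) *
                        (height < 8 ? 8 : height) * 2) / 8;
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA_PVRTC_2BPPV1_IMG,
                           width, height, 0, dataSize, data);
}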
Per-vertex data can be slimmed down as well:
A "regular" vertex is:
Vertex elementary unit:
    position     3 floats
    normal       3 floats
    tangent      3 floats
    textureCoo   2 floats
    -------------------
    44 bytes
By packing the components in "shorts" instead of "floats" via normalization, you end up with:
Vertex elementary unit:
    position     3 floats
    normal       3 shorts
    tangent      3 shorts
    textureCoo   2 shorts
    -------------------
    28 bytes
It's almost as if we "compress" the data on the CPU and send it to the GPU, where it is "decompressed". Packing via normalization cuts the per-vertex payload from 44 to 28 bytes and helps slightly improve performance.
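A sketch of the packed layout in C, with the GL_TRUE normalization flag doing the "decompression" on the GPU (struct, field and attribute-slot names are illustrative, not dEngine's actual ones):

#include <stddef.h>
#include <OpenGLES/ES2/gl.h>

// Illustrative attribute slots; the engine's real bindings may differ.
enum { ATTRIB_POSITION, ATTRIB_NORMAL, ATTRIB_TANGENT, ATTRIB_TEXCOORD };

// 28-byte vertex: position stays float, the rest are normalized shorts.
typedef struct {
    float position[3];     // 12 bytes
    short normal[3];       //  6 bytes; [-32768,32767] maps back to [-1,1]
    short tangent[3];      //  6 bytes
    short textureCoo[2];   //  4 bytes
} packedVertex_t;

void SetupPackedVertexFormat(void)
{
    GLsizei stride = sizeof(packedVertex_t);
    // GL_TRUE = "normalized": the GPU converts the shorts back to floats.
    glVertexAttribPointer(ATTRIB_POSITION, 3, GL_FLOAT, GL_FALSE, stride,
                          (void*)offsetof(packedVertex_t, position));
    glVertexAttribPointer(ATTRIB_NORMAL,   3, GL_SHORT, GL_TRUE,  stride,
                          (void*)offsetof(packedVertex_t, normal));
    glVertexAttribPointer(ATTRIB_TANGENT,  3, GL_SHORT, GL_TRUE,  stride,
                          (void*)offsetof(packedVertex_t, tangent));
    glVertexAttribPointer(ATTRIB_TEXCOORD, 2, GL_SHORT, GL_TRUE,  stride,
                          (void*)offsetof(packedVertex_t, textureCoo));
}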
Compiler tuning is also important. By default, Xcode is set up to generate ARM binaries using the Thumb instruction set, whose instructions are 16 bits instead of 32. This reduces the size of the binary, but it's bad for 3D: on these ARMv6 CPUs, Thumb code cannot use the floating-point unit directly. Uncheck this option for an instant performance gain.
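For reference, the build setting behind that checkbox can also be flipped in an .xcconfig file (setting name per the Xcode of that era):

// "Compile for Thumb" off: emit full 32-bit ARM code
GCC_THUMB_SUPPORT = NO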
Framebuffer refresh can also be improved a lot with the 3.1 firmware. This is an issue I mentioned in my article about Wolfenstein for iPhone: NSTimer is an abomination, and I was thrilled to find we can now use CADisplayLink to perform vsync and get an adaptive framerate (although I'm experiencing some nasty touchesMoved issues on non-2G v3.x devices; if you have any info about this, email me!).
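Setting it up takes a few lines (a sketch; drawFrame stands in for whatever your render entry point is):

// Requires QuartzCore (#import <QuartzCore/QuartzCore.h>).
// Fire the render loop in sync with the display instead of via NSTimer.
CADisplayLink *displayLink =
    [CADisplayLink displayLinkWithTarget:self selector:@selector(drawFrame)];
displayLink.frameInterval = 1;  // 1 = fire on every vsync
[displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];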
Reducing the framebuffer color depth is another way to improve performance, by shrinking the amount of data written. Moving from a 24-bit color format to 16 bits provides a good improvement:
CAEAGLLayer *eaglLayer = (CAEAGLLayer *)self.layer;
eaglLayer.opaque = YES;
eaglLayer.drawableProperties = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithBool:YES], kEAGLDrawablePropertyRetainedBacking, //FTW
    //kEAGLColorFormatRGBA8,
    kEAGLColorFormatRGB565, kEAGLDrawablePropertyColorFormat,
    nil];
Stating the obvious here, but reducing texture and blending-mode switches is very important (forget about good performance if you make more than 60 texture changes per frame). The material approach of the engine comes in very handy in this regard, for instance by sorting the draw list, as sketched below.
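A minimal sketch of sorting draw calls by texture so glBindTexture runs once per group of entities sharing a material (the entity_t/material_t fields here are illustrative, not dEngine's actual structs):

#include <stdlib.h>

// Illustrative types; the engine's real entity/material structs differ.
typedef struct { unsigned int textureId; } material_t;
typedef struct { material_t* material; }   entity_t;

static int CompareByTexture(const void* a, const void* b)
{
    unsigned int ta = ((const entity_t*)a)->material->textureId;
    unsigned int tb = ((const entity_t*)b)->material->textureId;
    return (ta > tb) - (ta < tb);
}

// Once per frame, before drawing:
// qsort(entities, numEntities, sizeof(entity_t), CompareByTexture);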
Reducing blending of your polygons is PARAMOUNT: PowerVR performs TBDR (tile-based deferred rendering), which means each pixel is shaded only once thanks to hidden surface removal; blending defeats the purpose. My take is that a blended polygon is rendered regardless of the culling outcome, and it destroys performance.
And last but not least, optimize the vertex indices so GPU fetches hit the cache as much as possible.
Uber shader
Depending on the material properties used in a scene, the shader is recompiled at runtime and then cached. This approach reduces branching operations in the shader. I was very pleased with the result: as long as I stay below 10-15 shader switches per frame, there is no significant performance drop.
// Snippet of the fragment shader
#ifdef BUMP_MAPPING
    bump = texture2D(s_bumpMap, v_texcoord).rgb * 2.0 - 1.0;
    lambertFactor  = max(0.0, dot(lightVec, bump));
    specularFactor = max(0.0, pow(dot(halfVec, bump), materialShininess));
#else
    lambertFactor  = max(0.0, dot(lightVec, v_normal));
    specularFactor = max(0.0, pow(dot(halfVec, v_normal), materialShininess));
#endif

#ifdef SPEC_MAPPING
    vec3 matTextColor = texture2D(s_specularMap, v_texcoord).rgb;
#else
    vec3 matTextColor = matColorSpecular;
#endif
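On the CPU side, the idea is to hash the material's feature bits, look up a cached program, and only compile (with a matching #define preamble) on a miss. A sketch under those assumptions (the feature constants and the LinkWithVertexShader helper are hypothetical):

#include <string.h>
#include <OpenGLES/ES2/gl.h>

#define FEATURE_BUMP 0x1   // illustrative feature bits
#define FEATURE_SPEC 0x2
#define MAX_CACHED_PROGRAMS 32

typedef struct {
    unsigned int features;   // bitmask of the material's features
    GLuint       program;
} cachedProgram_t;

static cachedProgram_t cache[MAX_CACHED_PROGRAMS];
static int numCached = 0;

// Hypothetical helper: compiles the vertex shader and links the program.
extern GLuint LinkWithVertexShader(GLuint fragmentShader);

GLuint GetProgramForMaterial(unsigned int features, const char* fragmentSource)
{
    // Cache hit: a program with the same #define set was already built.
    for (int i = 0; i < numCached; i++)
        if (cache[i].features == features)
            return cache[i].program;

    // Cache miss: prepend the #defines that enable the needed code paths.
    char preamble[256] = "";
    if (features & FEATURE_BUMP) strcat(preamble, "#define BUMP_MAPPING\n");
    if (features & FEATURE_SPEC) strcat(preamble, "#define SPEC_MAPPING\n");

    // glShaderSource accepts several strings: preamble first, then the body.
    const char* sources[2] = { preamble, fragmentSource };
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 2, sources, NULL);
    glCompileShader(shader);

    GLuint program = LinkWithVertexShader(shader);
    cache[numCached].features = features;
    cache[numCached].program  = program;
    return cache[numCached++].program;
}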
The now obsolete depth packing into a color buffer.
I love shadow effects; I think the realism and ambiance you get totally justify the cycle and bandwidth cost. It doesn't come for free in OpenGL, and it's quite ugly to do with the fixed pipeline, but I was thrilled to get it working with mobile shaders. Unfortunately, as of today, iPhones don't support GL_OES_depth_texture, which means you cannot render directly to a depth texture. The workaround is to pack the 32-bit floating-point depth value into a 4-byte (RGBA) color texture:
// This is the shadowmap generator shader
const vec4 packFactors = vec4(256.0 * 256.0 * 256.0, 256.0 * 256.0, 256.0, 1.0);
const vec4 bitMask     = vec4(0.0, 1.0/256.0, 1.0/256.0, 1.0/256.0);

void main(void)
{
    // Clip-space z/w is in [-1,1]; remap to [0,1] before packing.
    float normalizedDistance = position.z / position.w;
    normalizedDistance = (normalizedDistance + 1.0) / 2.0;

    // Spread the value across the four bytes...
    vec4 packedValue = vec4(fract(packFactors * normalizedDistance));
    // ...and subtract the bits already stored in the higher-order bytes.
    packedValue -= packedValue.xxyz * bitMask;

    gl_FragColor = packedValue;
}
This method of packing a float into bytes is pretty clever (not mine) because it accounts for the internal accuracy of any GPU (via the subtraction line) and hence can be used on any kind of GPU (PowerVR, ATI, NVidia). Congrats to whoever came up with this.
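For completeness, the matching unpack in the shader that performs the shadow test boils down to a single dot product (a sketch, not verbatim dEngine code; the sampler, varyings and bias are illustrative):

// Rebuild the normalized depth from the four bytes of a shadowmap texel.
const vec4 unpackFactors = vec4(1.0 / (256.0 * 256.0 * 256.0),
                                1.0 / (256.0 * 256.0),
                                1.0 / 256.0,
                                1.0);

float storedDepth  = dot(texture2D(s_shadowMap, shadowCoord.xy), unpackFactors);
float shadowFactor = (fragmentDepth > storedDepth + bias) ? 0.5 : 1.0;  // darken if occluded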
Comments (5)
When you're talking about the ubershader: it is not a real ubershader, because you don't use "if" instructions. So you need one glUseProgram per material. We can't batch many draw calls with your code; we have to sort by material and do one draw call per material.
Ellis,
Now, depending on the drivers, you can get some weird behaviour if you use uniforms for branching (like shader recompilation). By using #defines and caching, I was sure to avoid this.
-t
That would be handy, since then you could do fully deferred rendering: for instance deferred lighting (which scales better than uniform lights), deferred decals, SSAO, etc. Otherwise you'd need 3 passes to capture the information, whereas by packing a normal+depth map you just need 2 (albedo and normal+depth).
Your posts are always great, and this contribution is valuable. Thank you!
Daniele.