From my point of view, the most important aspect, and the main point of interest so far, in a classic deferred renderer is the G-Buffer.
As they say, the devil is in the details - while it is quite a simple and elegant idea (storing material properties per pixel instead of computing the lit result in a single forward pass), it can get quite involved as soon as you enter the real, performance-optimized world.
In general, as far as raw power is concerned, the new generation of ultra-fast uber-GPUs with gazillions of stream processors blows our uber-fast CPUs into the dust.
So chances are that your bottleneck will be memory access and not GPU arithmetic (of course, this is by no means an axiom - just a trend).
Unfortunately, deferred rendering is very memory intensive. If you go for the completely "naive" implementation of using the "obvious" 128-bit wide (i.e. float4) buffers for your render targets, you are going to be rather unhappy with your performance.
Let us see why:
The "obvious" classic data that would need to be stored for the lighting to work, would be:
1) Position (typically, a float[3])
2) Surface normal at that point (typically, a "float[3]")
3) Diffuse material color (read as a "float[3]")
4) Specular material color (read as a "float[3]")
5) Specular gloss or shininess factor (either a Blinn exponent or a Gaussian term; typically, a "float")
With these properties, one can execute classic Lambert diffuse shading, plus Phong, Blinn-Phong or Gaussian specular shading, with as much detail as needed - normal mapping, parallax mapping etc. will already have been executed and taken into account by the time the lighting pass is run.
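Just to make the roles of those properties concrete, here is a minimal, hedged HLSL sketch of what a per-light evaluation could look like in the lighting pass (the function and parameter names are mine, purely illustrative, and everything is assumed to be in the same space):

    // Minimal per-light shading sketch using the stored G-Buffer properties.
    // Assumes position, normal, light and eye are all in the same (e.g. view) space.
    float3 ShadePixel(float3 position, float3 N, float3 diffuseColor,
                      float3 specularColor, float shininess,
                      float3 lightPos, float3 lightColor, float3 eyePos)
    {
        float3 L = normalize(lightPos - position);          // direction to the light
        float3 V = normalize(eyePos - position);            // direction to the camera
        float3 H = normalize(L + V);                        // Blinn-Phong half vector

        float NdotL = saturate(dot(N, L));                  // Lambert diffuse term
        float spec  = pow(saturate(dot(N, H)), shininess);  // Blinn-Phong specular term

        return lightColor * (diffuseColor * NdotL + specularColor * spec);
    }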
The simplest, naive, please-don't-do-that-or-if-you-do-don't-quote-me solution is to simply close your eyes, pretend that everything is a float, which gives you the following buffers to fill:
0: Position: float, float, float, "free"
1: Normal: float, float, float, "free"
2: Diffuse: float, float, float, "free"
3: Specular+Gloss: float, float, float, float
4 targets * 128 bits = 512 bits per pixel. Yikes.
Of course, that is the first thing I did in order to make sure I can show something on screen. Think of it as a joke implementation.
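For reference, the write side of that joke layout would look something like this - a hedged HLSL sketch of a pixel shader output struct filling four float4 (i.e. R32G32B32A32_FLOAT) render targets; the struct and field names are mine:

    // "Joke" G-Buffer layout: everything stored as full 32-bit floats.
    struct GBufferOutputNaive
    {
        float4 position : SV_Target0;  // xyz = position, w unused
        float4 normal   : SV_Target1;  // xyz = surface normal, w unused
        float4 diffuse  : SV_Target2;  // rgb = diffuse color, a unused
        float4 specular : SV_Target3;  // rgb = specular color, a = gloss
    };

    GBufferOutputNaive FillGBufferNaive(float3 position, float3 normal,
                                        float3 diffuse, float3 specular, float gloss)
    {
        GBufferOutputNaive o;
        o.position = float4(position, 0.0f);
        o.normal   = float4(normal, 0.0f);
        o.diffuse  = float4(diffuse, 0.0f);
        o.specular = float4(specular, gloss);
        return o;
    }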
Diffuse and specular are normally [0,1] values, and normals are [-1,1] values. They really have no business being encoded in 32 bits per channel - it's wasteful.
I have read several ways to approach that problem. One of the best reads that I found on the subject is in GPU Gems 3.
The formats that "looked" most natural to me and that I am experimenting with is with storing all diffuse and specular in R8G8B8A8_UNORM, and normals as R8G8B8A8_SNORM. This way, they are "naturally" encoded in the correct range (for example, R8G8B8A8_UNORM is practically the way textures are stored in files - a 0-255 number per channel, that is normally "read" as a float in [0,1]). The precision of normals will probably not be enough, and is a bit wasteful, but we will address that later.
If you use a Gaussian specular model (as I am, currently), you are in luck and the specular exponent is also in [0,1]. If not, take care to somehow transform your exponent into something that can be clamped to [0,1]. For a Phong exponent, since I don't consider specular exponent precision to be of paramount importance, I just divide it by 256 and clamp it to [0,1]. This gives me a range of 0-255, which is quite enough even for very shiny surfaces.
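As a small, hedged sketch of that exponent trick (function names are mine):

    // Squeeze a Phong specular exponent into [0,1] so it fits an 8-bit UNORM channel.
    float EncodeGloss(float phongExponent)
    {
        return saturate(phongExponent / 256.0f);   // exponents 0..256 -> 0..1, clamped
    }

    // Recover an approximate exponent in the lighting pass.
    float DecodeGloss(float encodedGloss)
    {
        return encodedGloss * 256.0f;
    }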
Which leaves us with:
0: Position: float, float, float, "free"
1: Normal: byte, byte, byte, "free"
2: Diffuse: byte, byte, byte, "free"
3: Specular+Gloss: byte, byte, byte, byte
Now we are in much, much better shape: 4*32 + 12*8 = 224 bits per pixel (the "free" channels still take up space). Less than half, and that "should" not hurt precision.
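Here is roughly what the write pass could look like with these formats - again a hedged HLSL sketch, with the position target still R32G32B32A32_FLOAT, normals going to an R8G8B8A8_SNORM target and the rest to R8G8B8A8_UNORM targets; the hardware does the float-to-8-bit conversion on write, and the names are mine:

    // Packed layout: same shader-side float4 outputs, smaller target formats.
    struct GBufferOutputPacked
    {
        float4 position : SV_Target0;  // R32G32B32A32_FLOAT: xyz = position, w unused
        float4 normal   : SV_Target1;  // R8G8B8A8_SNORM: xyz = normal in [-1,1], w unused
        float4 diffuse  : SV_Target2;  // R8G8B8A8_UNORM: rgb = diffuse in [0,1], a unused
        float4 specular : SV_Target3;  // R8G8B8A8_UNORM: rgb = specular, a = encoded gloss
    };

    GBufferOutputPacked FillGBufferPacked(float3 position, float3 normal,
                                          float3 diffuse, float3 specular, float gloss01)
    {
        GBufferOutputPacked o;
        o.position = float4(position, 0.0f);
        o.normal   = float4(normalize(normal), 0.0f);
        o.diffuse  = float4(saturate(diffuse), 0.0f);
        o.specular = float4(saturate(specular), saturate(gloss01));
        return o;
    }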
What about position then? It's now more than half our buffer size. We already know the screen position, shouldn't that help us somehow?
Let's pause this question for now, and assume you don't want to bother messing with position at all.
Let's also assume we have brand new shiny GPUs that allow Multiple Render Targets, which we really want to take advantage of to render all that in one pass.
DirectX will tell you (and with good reason - it's a hardware restriction) to use render targets that all have the same bit width. It actually worked with mixed widths for me, but don't do it anyway.
Which means that either you "somehow" transform all the rest of the buffers and stuff them into a single 128-bit render target (to match the 128-bit position), or you "need" to do something about the position whether you wanted to optimize it or not.
Now, there are two "classic" ways to get around this.
One: somehow "coerce" the position into a 32-bit format. Typically this means one of the "exotic" formats, like R11G11B10, giving 11 bits of accuracy for x and y and 10 for z, or R10G10B10A2, giving you 2 spare bits for something else.
TBH I have not explored this in much detail, because it looks like a bad compromise - you give up even more accuracy to store something that you can retrieve with pretty much full accuracy in another way.
Two: use depth.
You forget about all the other components of position, and just store pixel depth in a nice, full-precision R32_FLOAT buffer all on its own (or 24 bits if you are using a depth+stencil render target).
Then, back where you will be retrieving it, use the information at hand (specifically, you need a ray pointing from the camera through the pixel, built from the pixel coordinates and the projection data) to "unproject" the pixel back to a position.
I will not be explaining it here, but here are some resources which helped me decide my solution:
http://mynameismjp.wordpress.com/2009/03/10/reconstructing-position-from-depth/
http://oddeffects.blogspot.gr/2011/01/deferred-rendering-reconstructing.html
Keep in mind, "Depth" here is an umbrella term. You can store z, z/w, z/FarClip or anything else that the technique you use needs to reconstruct Position. That's all there is to it - depth has now officially been delegated to a reconstruction tool.
In any case, our buffers now look like:
0: Depth: float OR 0: Depth+stencil: 24-bit depth + 8-bit stencil
1: Normal: byte, byte, byte, "free"
2: Diffuse: byte, byte, byte, "free"
3: Specular + Gloss: byte, byte, byte, byte
Which is 32*4 = 128 bits per pixel. Which is rather acceptable.
If you want to use a simpler specular model (for example, assuming that specular color is the same as diffuse, or that specular is always white), you can even skip the specular completely, and use diffuse to reconstruct it using something like:
0: Depth: float
1: Normal+specular_power: byte, byte, byte, byte
2: Diffuse+gloss: byte, byte, byte, byte
Which is 32*3 = 96 bits per pixel. Nifty, assuming the simplified lighting model is ok.
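To make the packing concrete, here is a hedged sketch of writing that three-target layout (names and exact channel assignments are mine; stuffing the specular power into the w of an SNORM target wastes its negative half, which may or may not bother you):

    // 96-bit layout: depth on its own, everything else packed into two RGBA8 targets.
    struct GBufferOutputSlim
    {
        float  depth        : SV_Target0;  // R32_FLOAT: normalized linear depth
        float4 normalSpec   : SV_Target1;  // R8G8B8A8_SNORM: xyz = normal, w = specular power
        float4 diffuseGloss : SV_Target2;  // R8G8B8A8_UNORM: rgb = diffuse, a = encoded gloss
    };

    GBufferOutputSlim FillGBufferSlim(float normalizedDepth, float3 normal,
                                      float specularPower01, float3 diffuse, float gloss01)
    {
        GBufferOutputSlim o;
        o.depth        = normalizedDepth;
        o.normalSpec   = float4(normalize(normal), specularPower01); // w only uses [0,1] of the SNORM range
        o.diffuseGloss = float4(saturate(diffuse), saturate(gloss01));
        return o;
    }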
And that simplified model probably will be ok: many (not all) materials have specular reflections of the same color as their diffuse, or they have white specular reflections. For example, metals usually reflect with their diffuse color, and shiny plastics usually reflect white.
Basically then, all you need is:
1) An assumption, whether you want white or diffuse-colored specular reflections
2) Specular power (intensity), a factor by which either the diffuse color or (1,1,1) will be multiplied to give you the specular color
3) Gloss (typically called shininess), which you already have.
I will not be using this - I prefer the full-specular model, but there you have it if you need it.
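For completeness, the reconstruction in the lighting pass could look something like this hedged sketch (whiteSpecular is the assumption from point 1 above, a compile-time or per-material choice rather than something stored in the G-Buffer):

    // Rebuild an approximate specular color from the diffuse color and the stored
    // specular power, under the "white or diffuse-colored specular" assumption.
    float3 ReconstructSpecularColor(float3 diffuse, float specularPower, bool whiteSpecular)
    {
        float3 base = whiteSpecular ? float3(1.0f, 1.0f, 1.0f) : diffuse;
        return base * specularPower;
    }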