Sunday, April 21, 2013

Deferred shading - some theory


All right, this post will be a little incomplete due to an extreme lack of time over the past few days.

A few words of theory, and a promise to update the post with some screenshots later in the week.



Deferred rendering is a way to reduce the algorithmic complexity of using multiple lights in a scene.

There, I said it. I bet you haven't heard it put in these words before.

Let me explain.

Very crudely, counting only the number of pixel shader invocations, forward rendering basically boils down to the following (this can be one pass or multiple passes, but the result is similar):

For each object
   For each light
       For each rasterized pixel of the object
           Render the pixel color resulting from the object's attributes, accumulating the results of each light into the render target.

Now, as most people familiar with this kind of thing will note, this simplified example implies an algorithmic complexity of - at least - O(m*n*o), where m is the number of objects, n is the number of lights, and o is the screen area covered by the object.
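
To make the cost concrete, here is a minimal CPU-side sketch of that loop in C++. Everything in it (the Object/Light/Color structs, the shade() stand-in for the pixel shader, the pixel counts) is invented purely for illustration; a real forward renderer does all of this on the GPU.

#include <cstdio>
#include <vector>

// Hypothetical, heavily simplified types - just enough to count shader runs.
struct Light  { float r, g, b; };
struct Object { int pixelsCovered; };          // "o" for this object
struct Color  { float r = 0, g = 0, b = 0; };

// Stand-in for the pixel shader: one light's contribution to one pixel.
Color shade(const Object&, const Light& l) { return { l.r, l.g, l.b }; }

int main() {
    std::vector<Object> objects = { {5000}, {12000} };                        // m = 2
    std::vector<Light>  lights  = { {1, 1, 1}, {0.5f, 0, 0}, {0, 0.3f, 0} };  // n = 3
    std::vector<Color>  frame(1280 * 720);                                    // the render target

    long shaderRuns = 0;
    for (const Object& obj : objects)                             // m objects
        for (const Light& light : lights)                         // n lights
            for (int p = 0; p < obj.pixelsCovered; ++p) {         // o covered pixels
                Color c = shade(obj, light);
                frame[p].r += c.r;
                frame[p].g += c.g;
                frame[p].b += c.b;
                ++shaderRuns;
            }

    std::printf("pixel shader runs: %ld (roughly m*n*o)\n", shaderRuns);
}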

Of course, this is a very simplified picture that leaves half of the variables out, and those could break the example - real-time rendering is not only about algorithmic complexity, because you have a very specific budget for each frame (16.6 ms to reach 60 fps, to be exact, and that does not account for CPU/GPU stalls, game logic, or much of anything else of importance). This means that an algorithm with constant complexity that costs 1000 ms may be scientifically interesting, but it is not useful as an actual implementation for real-time rendering, because it would give you 1 fps.

In any case, the complexity described above is very real and very much in need of optimization, and deferred rendering tries to remedy it with a very smart trick:

Instead of iterating over objects and lights and accumulating the color directly into the render target, it renders whatever attributes of each object are needed, per pixel, into textures, and then does another per-pixel pass with the following general structure:

Pass 1:
Set a custom set of render targets (the G-Buffer)
For each object
   For each rasterized pixel of the object
      Write the object's needed attributes into the custom render targets (no lighting is computed here)
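
To make that a bit more concrete, here is a rough CPU-side sketch of Pass 1 in C++, using a plain array in place of the render targets. The GBufferTexel layout and the flattened "object as a span of pixels" are assumptions made purely for illustration; a real G-Buffer packs its attributes much more carefully, and the rasterizer decides which pixels an object covers.

#include <vector>

// Hypothetical G-Buffer texel: one entry per screen pixel.
struct GBufferTexel {
    float albedo[3] = {0, 0, 0};   // surface color
    float normal[3] = {0, 0, 1};   // surface normal
    float depth     = 1.0f;        // depth, for position reconstruction later
};

// Extremely simplified "object": a flat span of covered pixels plus its attributes.
struct Object {
    int firstPixel, pixelCount;
    float albedo[3];
    float normal[3];
    float depth;
};

// Pass 1: write each object's attributes into the G-Buffer - no lighting at all.
void geometryPass(const std::vector<Object>& objects,
                  std::vector<GBufferTexel>& gbuffer) {
    for (const Object& obj : objects)
        for (int p = obj.firstPixel; p < obj.firstPixel + obj.pixelCount; ++p) {
            GBufferTexel& t = gbuffer[p];
            if (obj.depth > t.depth) continue;      // crude depth test: keep the closest surface
            for (int i = 0; i < 3; ++i) {
                t.albedo[i] = obj.albedo[i];
                t.normal[i] = obj.normal[i];
            }
            t.depth = obj.depth;
        }
}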

Pass 2: (Wait, this is not final)
Set the G-Buffer targets as textures the pixel shader can read from
For each light
   For each rasterized pixel (for this example, using a full-screen quad to iterate the entire screen)
        Read the required attributes for that specific pixel from the G-Buffer, use them to render the pixel color according to the light, accumulate the results into the render target
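
And a matching sketch of this naive Pass 2, reusing the made-up GBufferTexel and Color types from the sketches above. The DirectionalLight struct and the N.L diffuse term are placeholders for whatever lighting model you actually use.

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical directional light, chosen because it really does affect the whole screen.
struct DirectionalLight { float dir[3]; float color[3]; };

// Pass 2, naive version: every light touches every pixel of the screen.
void lightingPassFullScreen(const std::vector<DirectionalLight>& lights,
                            const std::vector<GBufferTexel>& gbuffer,
                            std::vector<Color>& frame) {
    for (const DirectionalLight& light : lights)              // n lights
        for (std::size_t p = 0; p < gbuffer.size(); ++p) {    // the entire screen
            const GBufferTexel& t = gbuffer[p];
            float ndotl = std::max(0.0f, t.normal[0] * light.dir[0]
                                       + t.normal[1] * light.dir[1]
                                       + t.normal[2] * light.dir[2]);
            frame[p].r += t.albedo[0] * light.color[0] * ndotl;
            frame[p].g += t.albedo[1] * light.color[1] * ndotl;
            frame[p].b += t.albedo[2] * light.color[2] * ndotl;
        }
}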

Now, a trained eye will already notice that the complexity has become O(m*o + n*p), where m is the number of objects, n is the number of lights, o is the screen area covered by the object, and p is the area of the entire screen. It might be faster than forward rendering in specific situations, but since it does incur some expensive "constants" (for example, a lot of texture fetches in the pixel shader), this is not our target - we are looking for a clear winner here. The above algorithm would be suited to a lot of directional lights (which affect the entire screen), but as far as I know that is a rare case, and not the one real-time apps and games would optimize for.

In fact, what we can do is consider each light as being bounded by a volume of effect (for example, a cone for a spotlight, or a sphere for a point light), size that volume so that it actually matches the area the light can affect (using a formula that takes attenuation and intensity into account, as in the sketch below), and avoid all these full-screen passes.
This way, you only pay for what you use: small light, small cost; big light, big cost.
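
For example, for a point light with the common 1 / (constant + linear*d + quadratic*d^2) attenuation, one way to pick the sphere radius is to solve for the distance at which the light's contribution drops below some small cutoff. This is only a sketch: the attenuation model, the 1/256 cutoff, and the assumption of a non-zero quadratic term are all choices made for illustration.

#include <cmath>

// Distance at which intensity / (constant + linear*d + quadratic*d*d) falls below cutoff.
// We solve quadratic*d^2 + linear*d + (constant - intensity/cutoff) = 0 for the positive root.
float pointLightRadius(float intensity,        // e.g. the max channel of the light's color
                       float constant, float linear, float quadratic,
                       float cutoff = 1.0f / 256.0f) {
    float c = constant - intensity / cutoff;
    float disc = linear * linear - 4.0f * quadratic * c;
    if (quadratic <= 0.0f || disc < 0.0f)
        return 0.0f;                           // this sketch assumes a quadratic falloff term
    return (-linear + std::sqrt(disc)) / (2.0f * quadratic);
}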

So, pass 2 becomes:

For each light
   For each rasterized pixel of the light's bounding volume
        Read the required attributes for that specific pixel from the G-Buffer, use them to render the pixel color according to the light, accumulate the results into the render target
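
A CPU-side sketch of this version, again reusing the made-up types from the sketches above. Here each light simply carries the range of pixels its rasterized bounding volume covers - on the GPU, the rasterizer determines that for you when you draw the cone or sphere - and the shading term is still the placeholder N.L from before (a real point light would reconstruct the pixel's position from the stored depth).

// Hypothetical light whose bounding volume has already been rasterized:
// it can only touch the pixels in [firstPixel, firstPixel + pixelCount).
struct BoundedLight {
    int firstPixel, pixelCount;                // "q" for this light
    float dir[3];
    float color[3];
};

// Pass 2, bounded version: each light only pays for the pixels it can actually reach.
void lightingPassBounded(const std::vector<BoundedLight>& lights,
                         const std::vector<GBufferTexel>& gbuffer,
                         std::vector<Color>& frame) {
    for (const BoundedLight& light : lights)                        // n lights
        for (int p = light.firstPixel;
             p < light.firstPixel + light.pixelCount; ++p) {        // q pixels, not the whole screen
            const GBufferTexel& t = gbuffer[p];
            float ndotl = std::max(0.0f, t.normal[0] * light.dir[0]
                                       + t.normal[1] * light.dir[1]
                                       + t.normal[2] * light.dir[2]);
            frame[p].r += t.albedo[0] * light.color[0] * ndotl;
            frame[p].g += t.albedo[1] * light.color[1] * ndotl;
            frame[p].b += t.albedo[2] * light.color[2] * ndotl;
        }
}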

So this becomes O(m*o + n*q), where q is the rasterized screen area of the light's bounding volume.

And now we can imagine that if we are careful, keep our G-Buffer conservative, don't use a lot of very big lights that inflate n*q, and actually need lots of small lights, we will be faster than the forward renderer.

By far.

I will try to put an example here later.
