Optimizing Rendering for a Big Procedural City

Summary

This is a long post where I describe the procedural generation algorithm we used for generating our city, then diagnose its performance problems and explain my shader-based rendering optimization.

Introduction

I have been working with a friend on a Unity game that takes place in a big procedurally generated city. This is what our city currently looks like:

Jumping around the buildings in our city.

In this post, I want to document some of the optimization work I've done on this project. Our city started off very small, but we ran into performance issues as we started to scale it up. I spent about two weeks working out which optimizations I could make, and it took a lot of fiddling to get there. In the end, the biggest change I made involved adjusting the way we render buildings so that all of the distant buildings can be rendered mostly on the GPU. I would like to elaborate a bit more on this, but first, I need to explain how our city was generated.

How We Are Procedurally Generating a City

The road structure of our city is generated with a recursive division algorithm, inspired by maze generation algorithms of the same name. We start with a grid of road and non-road tiles, with solid lines of road running horizontally and vertically across a grid:

Top down view of a grid of road tiles.
A grid of road and non-road tiles. The red lines drawn in the top left mark the size and position of the grid cells.

This big grid is then divided into smaller regions recursively, blocking off roads at each division to create organic-looking road structures. In more technical terms, at each recursive step:

  • a random line of road tiles across the grid is chosen as the dividing line
  • a random road tile on the dividing line is converted into a non-road tile
  • these steps are applied recursively to the regions of the grid on the two sides of the dividing line (unless they are too small to be divisible)
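
The recursive step can be sketched in Python (a simplified, illustrative version; the actual implementation is in Unity/C#, and the `spacing` of the initial road lines is an assumption):

```python
import random

def divide(grid, x0, y0, x1, y1, spacing=4):
    """Recursively subdivide the region [x0,x1] x [y0,y1] (inclusive).

    `grid` is a 2D list of booleans, True = road. The initial road
    lines are assumed to run every `spacing` tiles.
    """
    # Road lines strictly inside the region that can serve as dividers.
    rows = [y for y in range(y0 + 1, y1) if y % spacing == 0]
    cols = [x for x in range(x0 + 1, x1) if x % spacing == 0]
    if not rows and not cols:
        return  # region too small to divide further
    if rows and (not cols or random.random() < 0.5):
        y = random.choice(rows)
        # Convert one random road tile on the dividing line to non-road.
        grid[y][random.randint(x0, x1)] = False
        divide(grid, x0, y0, x1, y - 1, spacing)   # region above the line
        divide(grid, x0, y + 1, x1, y1, spacing)   # region below the line
    else:
        x = random.choice(cols)
        grid[random.randint(y0, y1)][x] = False
        divide(grid, x0, y0, x - 1, y1, spacing)   # region left of the line
        divide(grid, x + 1, y0, x1, y1, spacing)   # region right of the line
```

Each call consumes its dividing line, so the recursion bottoms out once a region has no interior road lines left.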

The recursive division produces a network of roads with a mix of long, straight paths and short, twisty paths:

Top down view of a grid of tiles with 3 recursive layers of division lines drawn on top. Each division line blocks off a road tile at a random point along the line.
Example road structure resulting from recursive division. The red/green/blue lines drawn indicate the division lines that were chosen during the 1st/2nd/3rd recursive steps.

From here, buildings are placed on non-road tiles: going along the grid horizontally and vertically, one building is placed on every other tile. Then, these buildings are stretched/scaled where possible to fill up any extra non-road tiles that were created during the recursive division process.

Top down view of a grid of tiles where buildings placed on non-road tiles expand to fill up nearby space.
Example of a road network and how the buildings would fit onto it. The red areas indicate the space the buildings take up when initially placed, and the green areas indicate the extra space they take up after being stretched to fill up the extra room.
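
The placement-and-stretch pass might look roughly like this (an illustrative Python sketch using a greedy right/down stretch; the real code and stretching rules may differ):

```python
def place_buildings(road):
    """Place one building on every other non-road tile, then stretch
    each building to absorb adjacent unclaimed non-road tiles.

    `road` is a 2D bool grid, True = road. Returns a dict mapping a
    building's anchor tile to its [width, depth] footprint in tiles.
    """
    h, w = len(road), len(road[0])
    # Tiles already taken: roads count as taken from the start.
    claimed = [[road[y][x] for x in range(w)] for y in range(h)]
    buildings = {}
    # One building on every other tile, in both axes, skipping roads.
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            if road[y][x]:
                continue
            claimed[y][x] = True
            buildings[(x, y)] = [1, 1]
    # Greedily stretch right, then down, into unclaimed non-road tiles.
    for (x, y), size in buildings.items():
        while x + size[0] < w and not claimed[y][x + size[0]]:
            claimed[y][x + size[0]] = True
            size[0] += 1
        while y + size[1] < h and all(
            not claimed[y + size[1]][x + dx] for dx in range(size[0])
        ):
            for dx in range(size[0]):
                claimed[y + size[1]][x + dx] = True
            size[1] += 1
    return buildings
```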

Building Models and Variations

Each building uses a random model, selected from one of the three building models we've created. Each building also has a random orientation, color, and height. To reach the desired height, multiple instances of the base model will be stacked on top of each other. Those instances will also be stretched vertically as needed. The resulting buildings look like this:

Screenshot of the game with a high-angle view of the many variations of the buildings.
A 3D view of the buildings with varying base model, color, orientation, and height.
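
The stacking logic boils down to picking a stack count and a vertical stretch (a minimal sketch; `BASE_HEIGHT` is an assumed parameter, while the cap of 32 stacks comes up again later in the post):

```python
import math

BASE_HEIGHT = 10.0  # height of one base model instance (assumed units)
MAX_STACKS = 32     # cap on instances stacked per building

def stack_layout(target_height):
    """Return how many copies of the base model to stack and the
    vertical stretch applied to each, to hit `target_height` exactly."""
    stacks = min(MAX_STACKS, max(1, math.ceil(target_height / BASE_HEIGHT)))
    scale = target_height / (stacks * BASE_HEIGHT)
    return stacks, scale
```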

To add more visually interesting landmarks, we added groups of taller buildings. We also mega-sized certain buildings; these megabuildings take up a 5x5 set of tiles.

Screenshot of the game from a high altitude. Areas of taller and shorter buildings can be seen. There is a noticeable void behind the buildings.
A broader 3D view of the buildings. This screenshot contains several areas of taller buildings and a few megabuildings.

We still had other features left to add, like big billboards, highways, and other cyberpunk elements. But first, we needed to make sure our city could scale up to tremendous sizes while maintaining good performance. We wanted a BIG city, after all!

Performance Problems in Our Procedural City

All of the screenshots above were taken with a render distance of 80 tiles. That translates to about 1200 buildings within render distance, which is far too small for our fantasy of a BIG city. Here is what the view looks like with a render distance of 360 tiles (~25k buildings within render distance):

Screenshot of the game from a very high altitude. The void from before is filled with smaller buildings.
A view of the city from one of the highest points reachable in the game (render distance = 360 tiles).

The view looks much more "rich". There is now a sea of smaller, distant buildings filling in the empty space in the background. No more void; there are only buildings until you stare far into the distance. Ideally, we would turn up the render distance even further, but we ran into clear performance problems at this point.

Screenshot of Unity Profiler. The graph shows 44ms taken for rendering and 60k batches, with the most time spent in the RenderLoop.DrawSRPBatcher and CullScriptable functions.
Performance graph of the game in Unity's Profiler. Rendering takes over 40ms. It only drops below 16ms when the player is looking directly downwards (middle part of the graph). (Render distance = 360 tiles)

Using the profiler, I noticed that rendering performance seemed to scale directly with the number of batches/triangles/vertices, and that the most time-consuming functions reported by the profiler were related to mesh batching and culling. From this, I concluded that the insane number of meshes in the scene was the problem.

A few things to note:
  • I am running on a Ryzen 7 3700X and Radeon RX 6600
  • we are using Unity 2022.3 LTS
  • we are using Unity's Universal Render Pipeline with the SRP Batcher enabled
  • each building mesh is controlled by an LOD Group component, which switches between the detailed building model and a simplified building model based on distance/visibility

Screenshot of 3 different buildings and their respective low detailed models rendered in game.
Comparison of the buildings' detailed models (left) and simplified models (right). The detailed models have many bumps and crevices along the surfaces of the sides of the buildings, which are all flattened in the simplified versions.

To confirm the problem, I tried adjusting our building generation code to reduce the number of meshes in the scene. One big contributor to the high mesh count was having thousands of buildings in our scene, each of which could be made up of up to 32 instances of the base building mesh, stacked on top of each other to reach the desired height. To reduce the mesh count per building, I created meshes that contain multiple stacks of the base building models, so that each building is made up of only a single mesh instead of up to 32. With this optimization, our performance improved drastically:

Screenshot of Unity Profiler. The graph shows 16ms taken for rendering and 28k batches, with the most time still spent in the RenderLoop.DrawSRPBatcher and CullScriptable functions.
With the optimization, the rendering time was reduced by 27ms (62%) and the batch count by 32k (53%). (Render distance = 360 tiles)

The vertex and triangle counts became much higher. This was because I didn't adjust the LOD thresholds, so a lot of the buildings were being shown in much higher detail than what was needed. This would get fixed later, but seeing it not cause any problems on my computer confirmed that the number of meshes in the scene was the real issue.
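
Conceptually, the stack-merging is just concatenating offset copies of the base mesh's vertex and index buffers (in Unity this can be done with `Mesh.CombineMeshes`; below is a plain-Python stand-in for illustration):

```python
def combine_stacked(base_vertices, base_triangles, stacks, stack_height):
    """Merge `stacks` copies of a base mesh, each offset vertically,
    into a single vertex/index buffer."""
    vertices, triangles = [], []
    for i in range(stacks):
        offset = i * stack_height
        base_index = len(vertices)
        # Copy the base mesh, shifted up by one stack per iteration.
        vertices += [(x, y + offset, z) for (x, y, z) in base_vertices]
        # Re-point the copy's triangle indices at the combined buffer.
        triangles += [base_index + t for t in base_triangles]
    return vertices, triangles
```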

Optimizing Our Procedural City

The plan for optimizing the number of meshes is as follows:

  • Our buildings are very rectangular and box-like, so when they are far away, they can be replaced with imposters. In our case, each building can be replaced with a simple box with each side textured to look like the actual building.
    • The imposter would just be a textured box, which has fewer vertices and triangles than both the fully detailed and less detailed LOD versions of our building models.
    • More importantly, the imposter version of every building can be rendered from the exact same box mesh. It's just the texturing on each side of the box that would be different.

Given all that, what I thought to do was:

Screenshot of a 10x10 grid of tall, textureless boxes and a 10x10 grid of buildings for comparison.
The plan is to replace 10x10 chunks of distant buildings with a single imposter mesh containing 100 tall boxes (left) and write a shader to render it like real buildings (right).

If it works, then we can easily reduce the number of batches contributed by faraway buildings by a factor of 100 (or however many buildings the single mesh replaces).
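
Deciding when a chunk swaps to its imposter can then be a simple distance check per 10x10 chunk (an illustrative sketch; the threshold value is an assumption):

```python
IMPOSTER_DISTANCE = 120.0  # tiles; hypothetical switch-over threshold

def use_imposter(chunk_center, camera_pos):
    """Whether a chunk is far enough away to swap its individual
    buildings for the single imposter mesh (horizontal check only)."""
    dx = chunk_center[0] - camera_pos[0]
    dz = chunk_center[1] - camera_pos[1]
    return dx * dx + dz * dz > IMPOSTER_DISTANCE ** 2
```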

In order for the shader to render the boxes like the buildings they are replacing, it needs to know each building's base model, color, size, orientation, etc. We don't have this information in the shader; most of this data is generated on the CPU at startup, deterministically from the building's position. It would technically be possible to reproduce this data inside the shader, or to have the CPU pass it all to the GPU at runtime, but doing so came with performance complications.

Instead, I figured we could calculate all of that data beforehand and encode it into textures that the shader can read from. Each pixel in the texture would store the data required for one tile in-game. In order to store all the data required to reproduce a 2048x2048-tile city, we needed five 2048x2048 RGBA textures. The resulting textures looked like this:

Collage of 4 textures, each one having seemingly randomly colored pixels spread in a grid pattern.
These are 4 of the textures that contain the data needed to recreate the buildings. Each pixel holds data for 1 tile. These are each 125x125 segments of the full textures, enlarged 4x to see the pixels better.

The textures above contain data relating to:

  • top left: structure type (red), building orientation (green), building horizontal scale (blue), building position offset (alpha).
  • top right: building base model ID (red), building # of stacks (green), building height (blue), other building rotation data (alpha).
  • bottom left: building window color (RGB, alpha = intensity).
  • bottom right: building accent color (RGB, alpha = intensity).
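
As an example of the encoding, the second texture's layout could be packed and unpacked like this (illustrative Python; the channel assignments follow the list above, but value ranges such as `MAX_HEIGHT` are assumptions):

```python
MAX_HEIGHT = 320.0  # hypothetical tallest building, in world units

def encode_tile(model_id, stacks, height, rotation_bits):
    """Pack one tile's building data into an RGBA pixel (one byte per
    channel), mirroring the layout of the top-right data texture."""
    return (
        model_id,                                # R: base model ID
        stacks,                                  # G: number of stacks
        int(round(height / MAX_HEIGHT * 255)),   # B: height, quantized
        rotation_bits,                           # A: extra rotation data
    )

def decode_tile(pixel):
    """Invert encode_tile; height comes back with quantization error."""
    r, g, b, a = pixel
    return r, g, b / 255.0 * MAX_HEIGHT, a
```

The round trip is lossy only in the quantized height channel, which is accurate to within one step of the 8-bit range.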

The main downside of this approach is that we would no longer be able to generate an infinitely sized city; our city could only be as big as the textures. However, 2048 by 2048 tiles feels big enough. If we wanted to restore infinite procedural generation in the future, we could probably work around this limitation (maybe by randomly mixing together segments from these textures instead of reading them as is).

The Shader

With that said, we can utilize the mesh of boxes and the data textures with a shader like this:

A large screenshot of the building imposter shader in shader graph, with multiple numbered sections drawn.
The building imposter shader in Unity's Shader Graph.

The shader is divided into these sections:

  1. Finding, for each vertex, the building it belongs to and the position of that building.
  2. Converting the building's position into the corresponding UV coordinates of its pixel on the data textures.
  3. Reading the data textures for the pixel containing the building's data. There are 5 textures being read here (4 of which were shown above).
  • Sections 4-8 decode the information that was encoded in the data textures, like the building's type, orientation, base model ID, number of stacks, and colors.
  9. For megabuildings, moving the vertices to enlarge the box to the size of a megabuilding.
  10. For normal buildings, moving the vertices to account for any horizontal stretching to fill empty non-road space.
  11. Moving the vertices vertically according to the desired height of the building.
  12. Passing many of the decoded values to the fragment shader as interpolators.
  13. Calculating the UV coordinates of the fragment, accounting for the building's orientation and stack count.
  14. Sampling the imposter building textures (shown below) and mixing in the building's colors.
  15. This section just adds some fancy textures for the windows.

Imposter Building Textures:
Two textures, each split into a 2x2 grid of building textures. Looks like a mask for the accents and window areas.
Window mask textures (left) and accent/emission textures (right) for all the imposter buildings. Both are 2 by 2 texture arrays, indexable with the building ID.
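
Two of the coordinate computations above can be sketched outside the shader: the lookup of a building's pixel in the 2048x2048 data textures, and the stack-aware vertical UV used to tile the facade texture (illustrative Python mirroring the shader math):

```python
CITY_TILES = 2048  # city and data textures are 2048 tiles/pixels square

def data_texture_uv(tile_x, tile_y):
    """UV of the data-texture pixel holding a tile's building data;
    sampling the pixel center avoids filtering into neighboring tiles."""
    return (tile_x + 0.5) / CITY_TILES, (tile_y + 0.5) / CITY_TILES

def stacked_v(height_frac, stacks):
    """Vertical facade coordinate: the imposter texture repeats once
    per stacked copy of the base model."""
    return (height_frac * stacks) % 1.0
```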

Here is a comparison of the fully detailed models, their less detailed LOD versions, and the imposter versions rendered with the shader above:

Screenshot of a 10x10 grid of buildings rendered with high detailed models, low detailed models, and the imposter method. For visual comparison.
Comparison of the same chunk of buildings rendered with full detailed models (left), lower detailed models (center), and the imposter shader (right). The lower detailed models and the imposters look practically the same from this distance.

With this, I was able to greatly reduce the mesh count of our city by replacing distant chunks of buildings with imposter versions. Here is the performance after making this optimization:

Screenshot of Unity Profiler. The graph shows 4ms taken for rendering and 2k batches.
With the imposters, the rendering time was reduced to less than 5ms (an 11ms or 69% improvement) and the batch count to under 3000 (a 25k or 89% improvement). (Render distance = 360 tiles)

Here is the new look of the city with the render distance now turned up to 810 tiles (still with good performance!), and a screenshot of what it was like before, for comparison:

Screenshot of the city from a high altitude. Buildings can be seen far into the distance. Screenshot of the city from a high altitude. Only foreground buildings can be seen.
Screenshots of the city from a high altitude. Top: render distance = 810 tiles. Bottom: render distance = 80 tiles. After optimizations, there is only a minor performance difference between the two.

The batch count being in the thousands still felt uncomfortable, but it was no longer a bottleneck and so was left as a problem for future me.

Other Optimizations

There were a few other optimizations I did that improved the performance of our game (though not as much as the building imposters did). These were:

  • flattening the hierarchy of the scene, which reduced the cost of propagating transform changes when I moved the chunks and buildings around (for object pooling/recycling).
  • making similar imposters for the roads. Each road tile was also an individual mesh; many of them were replaced with a single plane textured to replicate the same look. This loses detail on the road tiles when viewed up close, so I only did it for the distant roads. The really distant roads were not rendered at all since they were obstructed by all the buildings.

There were also a few other optimizations that I considered or tried, but did not end up using. These were:

  • using fog to avoid rendering distant buildings — while we still have fog to fade out distant buildings, we wanted players to feel like they can see clear and far. Adding heavy fog to avoid rendering distant buildings was not the artistic direction we wanted to go in.
  • using the skybox or large background billboards to fake distant buildings — while having a skybox that mimics distant lights can help with our aesthetic, I found that players like to use distant buildings as landmarks to orient themselves as they ran around the city. Thus, we needed those distant buildings to be real or players would get confused.
  • occlusion culling: initially, occlusion culling seemed like the ideal solution; many buildings were hidden behind other buildings and could theoretically be occlusion culled. However, the worst performance occurs when the player gets a high-angle view of the city, where a tremendous number of buildings are visible and occlusion culling is the least effective. This makes occlusion culling seem not worth its cost, although I will probably consider it again when more optimization is needed.

Future Work

There is plenty of room for improvement. In particular, I am interested in trying out True Imposters, which might allow for buildings with more complex shapes than the mostly rectangular ones we have now. Another thing that we would want to add later on is more building decorations and accessories, like signs, pipes, wires, antennas, etc. I suspect most of these details will be too small to be worth rendering on distant buildings, but in the case that they aren't, we'll have to figure out something for them.

That is all for now. We still have a game to make inside of our city.

Jumping around some more in our city (render distance = 810 tiles).