Since late October, I have been reworking the internals of my Lump editor, a 3D tile modelling tool I will use to construct geometry for my games, along with some additional options. I initially built it as a collection of game objects in Unity with some simple heuristics and a post-processing simplification routine, but that approach has serious scaling issues: too many draw calls and too many limitations.
So, in essence, it needed to handle large geometric sets and be able to split them up, so that at any time I only work with a strict subset of the geometry and data. At the top sits an octree that manages the 3D tiles, and each octree node has a mesh manager that splits its mesh into pieces that respect Unity's vertex limit per mesh. Adding geometry to a node is very cheap; removing an item requires more processing of the internal caches and data structures, which is fine for now.
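Roughly, the per-node splitting works like the sketch below. This is only an illustration of the idea; the names (MeshChunk, MeshManager, Append) and the 65,535-vertex cap for 16-bit index buffers are assumptions, not the editor's actual code.

```csharp
using System.Collections.Generic;
using System.Numerics;

class MeshChunk
{
    public readonly List<Vector3> Vertices = new List<Vector3>();
    public readonly List<Vector3> Normals  = new List<Vector3>();
    public readonly List<int>     Indices  = new List<int>();
}

class MeshManager
{
    const int MaxVerticesPerChunk = 65535;   // assumed limit for a 16-bit index buffer
    readonly List<MeshChunk> chunks = new List<MeshChunk>();

    // Appends one batch of geometry, opening a new chunk when the current
    // one would exceed the vertex limit.
    public void Append(Vector3[] vertices, Vector3[] normals, int[] indices)
    {
        MeshChunk chunk = chunks.Count > 0 ? chunks[chunks.Count - 1] : null;
        if (chunk == null || chunk.Vertices.Count + vertices.Length > MaxVerticesPerChunk)
        {
            chunk = new MeshChunk();
            chunks.Add(chunk);
        }

        int baseIndex = chunk.Vertices.Count;
        chunk.Vertices.AddRange(vertices);
        chunk.Normals.AddRange(normals);
        foreach (int i in indices)
            chunk.Indices.Add(baseIndex + i);   // re-base indices into this chunk
    }
}
```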
The basic building items needed to be more memory efficient. They are not just cubes, but shapes made from a fixed set of vertices, and there is a lot you can pre-compute when working within fixed restrictions. Every 3D tile occupies a space of 1 m^3, can only rotate in 90 degree increments, and can invert its scale on each axis. A 3D tile can also be made from sub-items, with a resolution of 2^3, 4^3, 8^3 or 16^3, which makes it a bit of a challenge to keep efficient.
To start off with, every vertex and normal resides in a shared list, like most modern geometry setups, so indices only take a few bytes. Every block has a set of these polygons, and every block can be placed as a sub-block in another shape and rotated in 48 different ways (6 sides with 4 increments of 90 degrees, plus all their inverted counterparts). All 48 variations are pre-computed and stored in a lookup table. And since every matrix these blocks will ever deal with is one of those 48 variations, we can also pre-compute every matrix multiplication that can arise from the hierarchy chain: another 48×48 table that returns a value from 0-47. All matrix multiplication for these blocks has been cut down to a single lookup into a 2D array.
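As a sketch of the idea, the 48 variations can be modelled as 3×3 signed permutation matrices (6 axis permutations × 8 sign flips), and the composition table generated from them. The enumeration order here is arbitrary and only illustrative; it is not the editor's actual ordering.

```csharp
using System;
using System.Collections.Generic;

static class OrientationTable
{
    public static readonly int[][,] Matrices = BuildMatrices();   // 48 matrices with entries -1, 0, 1
    public static readonly byte[,] Compose   = BuildCompose();    // Compose[a, b] = index of M_a * M_b

    static int[][,] BuildMatrices()
    {
        int[][] perms = { new[]{0,1,2}, new[]{0,2,1}, new[]{1,0,2},
                          new[]{1,2,0}, new[]{2,0,1}, new[]{2,1,0} };
        var list = new List<int[,]>();
        foreach (var p in perms)
            for (int signs = 0; signs < 8; signs++)
            {
                var m = new int[3, 3];
                for (int row = 0; row < 3; row++)
                    m[row, p[row]] = ((signs >> row) & 1) == 0 ? 1 : -1;
                list.Add(m);
            }
        return list.ToArray();
    }

    static byte[,] BuildCompose()
    {
        var table = new byte[48, 48];
        for (int a = 0; a < 48; a++)
            for (int b = 0; b < 48; b++)
                table[a, b] = (byte)FindIndex(Multiply(Matrices[a], Matrices[b]));
        return table;
    }

    static int[,] Multiply(int[,] x, int[,] y)
    {
        var r = new int[3, 3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i, j] += x[i, k] * y[k, j];
        return r;
    }

    static int FindIndex(int[,] m)
    {
        for (int i = 0; i < 48; i++)
        {
            bool same = true;
            for (int r = 0; r < 3 && same; r++)
                for (int c = 0; c < 3 && same; c++)
                    same = Matrices[i][r, c] == m[r, c];
            if (same) return i;
        }
        throw new InvalidOperationException("product is not one of the 48 orientations");
    }
}
```

With a table like this, a child's cached world orientation is just `Compose[parentWorld, childLocal]`.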
Each block only takes up about 7 bytes of memory: the local matrix is a byte, the cached world matrix is a byte, its internal position is two bytes, plus a couple more indices. These 7 bytes could be packed even tighter, but that is not a priority right now.
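As an illustration, a 7-byte record could be split up like this; the exact fields beyond the two matrix bytes and the position index are my guess.

```csharp
using System.Runtime.InteropServices;

// A minimal sketch of a 7-byte block record, not the editor's actual layout.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Block
{
    public byte   LocalOrientation;   // index into the 48-entry orientation table
    public byte   WorldOrientation;   // cached result of Compose[parent, local]
    public ushort PositionIndex;      // index into the N^3 position array (two bytes)
    public byte   ShapeIndex;         // assumed: which pre-computed shape this block uses
    public ushort MaterialIndex;      // assumed: one of the "couple more indices"
}
```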
Each sub-block does not contain a local position like we are used to seeing. Since sub-blocks follow a strict rule, we can ensure a fixed order within an N^3 block, so all we need is an index into a fixed array. For example, a block made of 16^3 sub-blocks gives you at most 4096 sub-blocks, which two bytes easily cover. From the internal rules of how these blocks are placed, those indices convert to local positions, all pre-computed as well. Not only that, every matrix multiplication these positions might go through is pre-computed too. So there are no matrix multiplications at all, just lookup tables that take a matrix index and an array index and hand you a new index within the array. The world matrix index also governs which pre-computed shape you are supposed to use.
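A sketch of what that index rule and the pre-computed remapping could look like, assuming an x-then-y-then-z ordering (the editor's real ordering may differ):

```csharp
static class SubBlockIndexing
{
    // index -> (x, y, z) for a block of resolution n (2, 4, 8 or 16)
    public static (int x, int y, int z) ToLocal(int index, int n)
        => (index % n, (index / n) % n, index / (n * n));

    public static int ToIndex(int x, int y, int z, int n)
        => x + y * n + z * n * n;

    // remap[orientation][index] = the index of the cell that the orientation moves cell 'index' to
    public static int[][] BuildRemapTable(int n, int[][,] orientationMatrices)
    {
        var remap = new int[orientationMatrices.Length][];
        for (int o = 0; o < orientationMatrices.Length; o++)
        {
            var m = orientationMatrices[o];
            remap[o] = new int[n * n * n];
            for (int i = 0; i < n * n * n; i++)
            {
                var (x, y, z) = ToLocal(i, n);
                // rotate around the block centre: shift to centred coordinates,
                // apply the signed permutation, shift back
                int cx = 2 * x - (n - 1), cy = 2 * y - (n - 1), cz = 2 * z - (n - 1);
                int rx = m[0, 0] * cx + m[0, 1] * cy + m[0, 2] * cz;
                int ry = m[1, 0] * cx + m[1, 1] * cy + m[1, 2] * cz;
                int rz = m[2, 0] * cx + m[2, 1] * cy + m[2, 2] * cz;
                remap[o][i] = ToIndex((rx + n - 1) / 2, (ry + n - 1) / 2, (rz + n - 1) / 2, n);
            }
        }
        return remap;
    }
}
```

Once the remap tables are baked, "transforming a position" at runtime is just `remap[orientation][index]`.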
The polygons on these shapes that might lie on the border, and be covered and hidden by a neighboring block, are also placed into a lookup array: the index of a normal maps into the internal order of that particular shape to find a polygon with that normal or its inverse. And since it is all based on indices and lookups, the final floating point values are derived from fixed numbers, which should eliminate the floating point accumulation errors we commonly see. You only need those numbers to generate the points that go into the final geometry.
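A rough sketch of that lookup, assuming the axis-aligned normals are indexed in +/- pairs and each shape stores one border polygon per direction; the names here are illustrative, not the actual data layout.

```csharp
using System.Collections.Generic;

class ShapePolygons
{
    const int NormalCount = 6;                 // assumed: +X, -X, +Y, -Y, +Z, -Z
    readonly int[] polygonByNormal = new int[NormalCount];

    // polygonNormalIndices[p] = the normal index of border polygon p within this shape
    public ShapePolygons(IReadOnlyList<int> polygonNormalIndices)
    {
        for (int i = 0; i < NormalCount; i++) polygonByNormal[i] = -1;
        for (int p = 0; p < polygonNormalIndices.Count; p++)
            polygonByNormal[polygonNormalIndices[p]] = p;
    }

    // Returns the polygon facing the given normal, or the one facing its
    // inverse, or -1 when the shape has no border polygon in that direction.
    public int FindByNormalOrInverse(int normalIndex)
    {
        int direct = polygonByNormal[normalIndex];
        if (direct >= 0) return direct;
        return polygonByNormal[normalIndex ^ 1];   // assumes +/- pairs sit at adjacent indices
    }
}
```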
These collections of blocks are placed within a shape_description which other instanced shapes will point to so that you are not duplicating a lot of data.
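In rough code, that split could look like the sketch below, reusing the Block record from the earlier sketch; the field names are assumptions.

```csharp
// One shared description holds the block data; each placed shape only stores
// a reference plus its own placement. A sketch, not the editor's actual types.
class shape_description
{
    public int Resolution;        // 2, 4, 8 or 16
    public Block[] Blocks;        // the sub-block records described above
}

class ShapeInstance
{
    public shape_description Description;   // shared, never copied per instance
    public int TilePositionIndex;           // where this instance sits in its octree node
    public byte Orientation;                // one of the 48 pre-computed variations
}
```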
All these crazy maps make it so that when we are dealing with large arrays of blocks and their neighbors, we don't need to do any searches of any kind; we just need keys and indices to get all the data we need. What calls for such a pre-computation setup is the fact that we might be placing a lot of these high resolution 3D tiles into the scene, next to anywhere from 0 to 18 neighboring blocks. That is the number of neighbors that might affect which polygons need to be hidden and which polygons might be merged with their neighbors in a simplification step.
When you place a shape down in the scene, it collects its neighbors and copies their blocks, with their indices, into a linear array covering only the border blocks, using each block's position index as the index into that array. All border blocks sit at the front and have an index below a certain threshold; the block array within a shape is one-dimensional and sorted by the index it would have in our special 3D array lookup. These arrays, together with the index of a border block, hand us a list of ids for all the possible neighbors in the adjacent shapes. No searching for neighbor blocks, just a lookup into a table that hands you indices into your neighbors. This also supports varying resolutions in the culling step that hides faces: a face of a 2^3 shape might cover 64 blocks of a 16^3 shape. The merging step is restricted to neighbors of the same resolution.
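As an illustration of the cross-resolution case, here is what one entry of such a table would contain, computed directly rather than looked up; the face orientation and index ordering are assumptions.

```csharp
static class BorderLookup
{
    // For a border block at face coordinates (u, v) on the +X face of an
    // n^3 shape, return the indices of the -X border blocks it covers in a
    // neighboring m^3 shape (m >= n). A 2^3 face cell against a 16^3 neighbor
    // covers (16/2)^2 = 64 cells.
    public static int[] CoveredNeighborIndices(int u, int v, int n, int m)
    {
        int scale = m / n;                      // neighbor cells per local cell, per axis
        var result = new int[scale * scale];
        int k = 0;
        for (int dv = 0; dv < scale; dv++)
            for (int du = 0; du < scale; du++)
            {
                int ny = u * scale + du;        // map (u, v) onto the neighbor's face grid
                int nz = v * scale + dv;
                int nx = 0;                     // the neighbor's -X border layer
                result[k++] = nx + ny * m + nz * m * m;
            }
        return result;
    }
}
```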
We find our polygons from a lookup table and make simple checks to see whether we should exclude them from the final geometry: does the polygon point to the same material, and does it have a smaller or equal area than the one it is facing?
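That check is small enough to sketch directly, assuming each border polygon carries a material index and a pre-computed area; the exact fields are my guess.

```csharp
struct BorderPolygon
{
    public int   MaterialIndex;
    public float Area;
}

static class FaceCulling
{
    // A polygon can be dropped from the final geometry when the polygon it
    // faces uses the same material and fully covers it (equal or larger area).
    public static bool ShouldHide(in BorderPolygon mine, in BorderPolygon facing)
        => mine.MaterialIndex == facing.MaterialIndex && mine.Area <= facing.Area;
}
```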
At the end of this, there is a simplification pass with some simple traversal of neighbors and rubber band algorithms to create convex polygons out of neighboring shapes. The mesh manager within the octree consumes the resulting geometry and updates what is rendered. The simplification step is reserved for the smallest resolution in real time, but covers the full range on export.
Is this in any way confusing?
I need to finish the last step of this implementation and make sure a number of sanity tests pass before I wrap it up.