I spent a bit of my vacation time wrapping up some personal-project tasks that have been simmering for the past 11 months: all the pie-in-the-sky ideas I could come up with to make the library run faster and use less memory.
At the end of it, I have an allocation library that contains type-less memory pools, one per pooled thread, plus custom vector, hashmap, and buffer containers that make all the memory management a lot more efficient and easier to use. There is a lot of dynamic stack allocation in the iterative functions, and no recursion to be found anywhere in the library.
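To illustrate the idea of a type-less, per-thread pool, here is a minimal bump-allocator sketch: raw bytes are carved out of large blocks, and the whole pool is reset in one shot instead of freeing objects individually. All names here are my own illustration, not the library's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Type-less bump pool: hands out raw, aligned bytes from large blocks.
class BumpPool {
public:
    explicit BumpPool(std::size_t block_size = 1 << 16)
        : block_size_(block_size) {}

    // Type-less allocation: no constructors, no per-object bookkeeping.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t p = (offset_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || p + size > blocks_.back().size()) {
            blocks_.emplace_back(std::max(block_size_, size + align));
            p = 0;  // a fresh block's storage starts suitably aligned
        }
        void* out = blocks_.back().data() + p;
        offset_ = p + size;
        return out;
    }

    // O(1) bulk reset between rebuilds: keep one block, forget the offset.
    void reset() {
        if (blocks_.size() > 1) blocks_.resize(1);
        offset_ = 0;
    }

private:
    std::size_t block_size_;
    std::size_t offset_ = 0;
    std::vector<std::vector<std::byte>> blocks_;
};

// One pool per pooled thread, as the post describes, needs no locking at all.
thread_local BumpPool tls_pool;
```

The custom vector and hashmap containers would then draw their storage from such a pool instead of the global allocator.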
I have replaced almost all of the 3rd-party libraries I was using with more efficient solutions better suited to this particular library.
I added custom splitting containers that make it easy to construct rule-sets for building the trees. These range from simple, naive, and random splitters down to my own custom method, which uses Naylor's good-tree heuristics paper as its premise; it is very fast to build with and produces the best resulting trees of all the splitters I have tried. Rebuilding the trees is therefore quite efficient and the resulting trees are very good, so I can get away with quite a few operations between the rebuilds that clean up memory and improve the tree.
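A minimal sketch of the kind of scoring a splitter in this family can use, in the spirit of Naylor's heuristics: classify every polygon against a candidate plane, then penalize both the polygons the plane would cut and the imbalance between its two sides. The names and weights below are illustrative assumptions, not the library's actual rule-set.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Side classification of a polygon against a candidate splitting plane.
enum class Side { Front, Back, Spanning, Coplanar };

// Score a candidate plane from the classifications of all polygons: lower is
// better. Spanning polygons are penalized because each one gets cut in two;
// imbalance is penalized because it deepens the tree.
double split_score(const std::vector<Side>& sides,
                   double split_weight = 8.0,
                   double balance_weight = 1.0) {
    std::size_t front = 0, back = 0, spanning = 0;
    for (Side s : sides) {
        if (s == Side::Front) ++front;
        else if (s == Side::Back) ++back;
        else if (s == Side::Spanning) ++spanning;
    }
    double imbalance = std::fabs(double(front) - double(back));
    return split_weight * double(spanning) + balance_weight * imbalance;
}
```

A naive splitter would score only a random handful of candidates; a more careful one scores them all and keeps the minimum.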
And for the splitting operations during building, all the geometry is placed into a BVH that can easily be partitioned alongside the BSP tree as it splits, which makes the building even more efficient.
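The payoff of the BVH is that whole nodes can be routed to one side of a chosen split plane by testing just their bounding box; only straddling nodes need to be opened up. A sketch of that box-versus-plane test, simplified to an axis-aligned plane (the layout is my own, not the library's):

```cpp
#include <array>
#include <cassert>

// Axis-aligned bounding box of a BVH node.
struct Aabb {
    std::array<double, 3> min, max;
};

enum class PlaneSide { Front, Back, Straddle };

// Classify a box against the plane x[axis] = offset (front = greater side).
// Front/Back means the whole BVH subtree can be handed to that BSP side
// without touching its triangles; only Straddle nodes must be refined.
PlaneSide classify(const Aabb& box, int axis, double offset) {
    if (box.min[axis] >= offset) return PlaneSide::Front;
    if (box.max[axis] <= offset) return PlaneSide::Back;
    return PlaneSide::Straddle;
}
```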
I added all the tree-reduction heuristics to remove dead branches after merging, and altered the re-construction to use the actual tree as an adjacency graph via some minor additions to the sub-hyperplanes and leaves of the trees. So, ensuring that the input produces a good search structure became a lot more elegant and efficient.
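One of the simplest dead-branch reductions can be sketched like this: if both children of an internal node are leaves with the same solid/empty label, the split contributes nothing and the node collapses into a single leaf. The node layout is hypothetical, and the sketch uses recursion for brevity even though the library itself avoids it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical BSP node: internal nodes split, leaves are solid or empty.
struct Node {
    bool is_leaf = false;
    bool solid = false;  // meaningful only for leaves
    std::unique_ptr<Node> front, back;
};

// Bottom-up reduction: collapse any split whose two sides agree.
std::unique_ptr<Node> reduce(std::unique_ptr<Node> n) {
    if (!n || n->is_leaf) return n;
    n->front = reduce(std::move(n->front));
    n->back  = reduce(std::move(n->back));
    if (n->front && n->back && n->front->is_leaf && n->back->is_leaf &&
        n->front->solid == n->back->solid) {
        auto leaf = std::make_unique<Node>();
        leaf->is_leaf = true;
        leaf->solid = n->front->solid;
        return leaf;  // the dead split disappears
    }
    return n;
}
```

After a CSG merge, many splits end up with identical material on both sides, so a pass like this can shrink the tree considerably.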
I did all the dirty memory magic to pool and reduce allocations, using mimalloc as the baseline allocator for constructing the pools.
My small set of brutal tests now runs around 1000x faster than when I first got all of this working. One particular test, which sculpts the Stanford bunny out of a single cube with 10365 dodecahedron subtractions, runs in ~12 seconds on my 8-year-old PC, around 860 subtractions per second on average. On more recent machines it runs in under 10 seconds, averaging over 1000 per second.
That particular test, without my memory pools, consumes about 200 MB for its tree and support structures, but memory use grows by at least 2x when the pools are in use.
This crazy performance run started about 20 months ago with a scribble on my chalkboard wall, a couple dozen dreamy tasks, and they have now all been crossed out. Yay.