phototiler

A photorealistic map renderer and 3D map model exporter.

Improving performance in phototiler v2.2.0

Phototiler v2.2.0 comes with terrain support and performance improvements to task scheduling. With the introduction of terrain, a new concept emerged: dependencies between tasks in the tile loader. Some tasks now depend on data from other data sources before their geometry can start to be built.

In phototiler, terrain support is implemented using layer geometry warping: the layer geometry is morphed to fit the underlying terrain geometry. This differs from the terrain draping typically used in mapping software, but can give better results overall, as draping can lead to artifacts due to the finite resolution of the texture that the layers are rendered into.

To fit the geometry, the tile layer geometry is subdivided into a grid that matches the underlying terrain geometry. The polygon of each feature is then clipped against each of these terrain patches, and height queries are performed for each point of the tessellated and subdivided polygon.
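To illustrate the height-query step, here is a minimal sketch that samples a heightmap at an arbitrary point inside a tile. The `HeightMap` struct, its row-major layout, the normalized `(u, v)` coordinates and the choice of bilinear interpolation are all assumptions made for the example, not a description of phototiler's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical heightmap: a row-major grid of elevations covering one tile.
// Assumes width >= 1, height >= 1 and samples.size() == width * height.
struct HeightMap {
    std::size_t width;
    std::size_t height;
    std::vector<float> samples;

    float at(std::size_t x, std::size_t y) const { return samples[y * width + x]; }
};

// Returns the elevation at normalized tile coordinates (u, v), each in [0, 1].
// Coordinates outside that range are clamped to the tile border.
float queryHeight(const HeightMap& map, float u, float v) {
    u = std::min(std::max(u, 0.0f), 1.0f);
    v = std::min(std::max(v, 0.0f), 1.0f);

    // Position in sample space, split into an integer cell and a fractional part.
    const float fx = u * static_cast<float>(map.width - 1);
    const float fy = v * static_cast<float>(map.height - 1);
    const std::size_t x0 = static_cast<std::size_t>(fx);
    const std::size_t y0 = static_cast<std::size_t>(fy);
    const std::size_t x1 = std::min(x0 + 1, map.width - 1);
    const std::size_t y1 = std::min(y0 + 1, map.height - 1);
    const float tx = fx - static_cast<float>(x0);
    const float ty = fy - static_cast<float>(y0);

    // Blend along x on both rows, then along y.
    const float top = map.at(x0, y0) * (1.0f - tx) + map.at(x1, y0) * tx;
    const float bottom = map.at(x0, y1) * (1.0f - tx) + map.at(x1, y1) * tx;
    return top * (1.0f - ty) + bottom * ty;
}
```

Each vertex of a clipped and tessellated polygon would go through a query like this one, which is why the heightmap has to be available before the layer geometry can be built.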

For these reasons, vector data geometry processing becomes dependent on the DEM data (heightmaps), as we now need to be able to perform height queries at arbitrary locations over a tile. If we categorize the different sets of tasks phototiler has to dispatch when a user requests a new scene to be built, we get the following:

And the implicit dependencies that emerge from these tasks:

Previously, phototiler had a 1:1 mapping between vector data and tile mesh. Now, multiple data sources can contribute to the generation of a single tile mesh, with possible dependencies between the data sources themselves.

To profile C++ code in phototiler, I use a minimal profiler called minitrace. It is a simple library that exports a JSON trace file, which can then be loaded in a flame graph visualization tool like Perfetto. It is a very lightweight way to find the biggest offenders and to visualize the execution timeline of the render loop and all threads. The only requirement is to manually annotate the parts of the code you want to profile with macros.
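To give a feel for what macro-based annotation looks like, here is a small self-contained sketch of a scoped timer that collects events and serializes them in the trace-event JSON format read by Chrome-style trace viewers. It is a hypothetical stand-in written for this example: `TraceLog`, `ScopedTrace` and `TRACE_SCOPE` are made-up names and are not minitrace's API.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracing library. This is NOT minitrace's API.
// Not thread-safe: a real profiler would guard the log and record thread ids.
struct TraceEvent {
    std::string name;
    std::int64_t startUs;
    std::int64_t durationUs;
};

class TraceLog {
public:
    void record(TraceEvent event) { events_.push_back(std::move(event)); }
    std::size_t size() const { return events_.size(); }

    // Serializes the events as trace-event JSON "complete" events (ph = "X").
    // Names are written verbatim, so they must not contain quotes or backslashes.
    std::string toJson() const {
        std::ostringstream out;
        out << "{\"traceEvents\":[";
        for (std::size_t i = 0; i < events_.size(); ++i) {
            const TraceEvent& e = events_[i];
            if (i > 0) out << ",";
            out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\",\"pid\":1,\"tid\":1"
                << ",\"ts\":" << e.startUs << ",\"dur\":" << e.durationUs << "}";
        }
        out << "]}";
        return out.str();
    }

private:
    std::vector<TraceEvent> events_;
};

// Records how long the enclosing scope took when it goes out of scope.
class ScopedTrace {
    using Clock = std::chrono::steady_clock;

public:
    ScopedTrace(TraceLog& log, std::string name)
        : log_(log), name_(std::move(name)), start_(Clock::now()) {}

    ~ScopedTrace() {
        const Clock::time_point end = Clock::now();
        log_.record({name_, toMicros(start_.time_since_epoch()), toMicros(end - start_)});
    }

    ScopedTrace(const ScopedTrace&) = delete;
    ScopedTrace& operator=(const ScopedTrace&) = delete;

private:
    static std::int64_t toMicros(Clock::duration d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    }

    TraceLog& log_;
    std::string name_;
    Clock::time_point start_;
};

// Two-level concatenation so that __LINE__ expands before token pasting.
#define TRACE_CONCAT_INNER(a, b) a##b
#define TRACE_CONCAT(a, b) TRACE_CONCAT_INNER(a, b)
#define TRACE_SCOPE(log, name) ScopedTrace TRACE_CONCAT(traceScope_, __LINE__)(log, name)
```

Placing `TRACE_SCOPE(log, "build_bvh");` at the top of a function or block is enough to get one timed event per call, and the string returned by `toJson()` can be written to a file and opened in a trace viewer.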

In tile loading, the biggest offenders turned out to be two things: data fetch latency (using curl on Linux and macOS and WinHTTP on Windows) and BVH construction. There is little I can do about data download latency, as the server endpoints belong to third parties. However, I noticed that the order of tasks wasn't optimal: some tasks could be triggered much earlier, which can be solved with better task ordering and a job system.

A simple approach to task scheduling is to declare the dependencies of each task and let the job system find an optimal order of execution. To do so, the job system maintains a graph that describes the dependencies and tracks which tasks have already been executed. At each frame, the tasks that are done are removed from the graph, and the graph nodes that have no other nodes pointing to them, meaning no unfinished dependencies, are eligible to be scheduled next for execution.
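The idea can be sketched in a few lines. The `TaskGraph` class below is a hypothetical, single-threaded illustration of dependency-driven scheduling with a per-frame update. Its names and structure are assumptions made for this example and do not reflect phototiler's actual job system.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task graph, for illustration only.
class TaskGraph {
public:
    using TaskId = std::size_t;

    // Registers a task that may only run once all `dependencies` are done.
    // Dependencies must be ids returned by earlier add() calls, which also
    // guarantees that the graph has no cycles.
    TaskId add(std::string name, std::vector<TaskId> dependencies = {}) {
        const TaskId id = nodes_.size();
        nodes_.push_back({std::move(name), std::move(dependencies), State::Pending});
        return id;
    }

    // Called once per frame: returns every pending task whose dependencies
    // are all done, and marks those tasks as running.
    std::vector<TaskId> collectReady() {
        std::vector<TaskId> ready;
        for (TaskId id = 0; id < nodes_.size(); ++id) {
            Node& node = nodes_[id];
            if (node.state != State::Pending) continue;
            bool blocked = false;
            for (TaskId dep : node.dependencies) {
                if (nodes_[dep].state != State::Done) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                node.state = State::Running;
                ready.push_back(id);
            }
        }
        return ready;
    }

    // Called when a worker reports that a task has completed.
    void markDone(TaskId id) { nodes_[id].state = State::Done; }

    const std::string& name(TaskId id) const { return nodes_[id].name; }

    bool finished() const {
        for (const Node& node : nodes_) {
            if (node.state != State::Done) return false;
        }
        return true;
    }

private:
    enum class State { Pending, Running, Done };
    struct Node {
        std::string name;
        std::vector<TaskId> dependencies;
        State state;
    };
    std::vector<Node> nodes_;
};
```

With this sketch, a geometry task registered as `graph.add("build_geometry", {fetchDem, fetchVector})` is only returned by `collectReady()` after both downloads have been passed to `markDone()`, while the two downloads themselves are returned on the very first frame because they depend on nothing.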

[Figure: C++ task job system dependency flow]
[Figure: C++ task job system dependency data flow]

On a simple scene, this approach reduced the total time spent on the scene generation task (downloading, parsing and building the geometry) by a factor of 3.

Data download latency is still the slowest part, but at least these tasks are now all dispatched in an optimal order and no longer hold up other processing that could have happened earlier in the timeline.

Of course, I could have looked at improving the mesh generation algorithms, but this would not have improved the perceived speed of scene creation for the user, at least not at this stage of optimizing scene processing. Now that all tasks are executed as early as they can be, the next biggest offenders can be looked at.

One potential idea is to improve the speed at which the next task gets picked. Currently, the job system is updated every frame (at 16 ms intervals). If a task finishes before the next frame tick and threads aren't busy, those threads may sit idle until the next frame, when the job system queues a task for them, which isn't great.

Improving that would be another good step towards packing the scene generation timeline more tightly, before looking at optimizing the mesh generation and potentially parallelizing BVH generation!
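One possible direction for picking the next task sooner, sketched here purely as an assumption and not as a description of what phototiler does or will do, is to unblock dependents at the moment a task completes instead of waiting for the next frame tick. The hypothetical `CompletionDrivenQueue` below is single-threaded for clarity.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical completion-driven scheduler, for illustration only.
class CompletionDrivenQueue {
public:
    using TaskId = std::size_t;

    // Adds a task. Dependencies must be ids returned by earlier add() calls.
    // Dependencies that are already complete do not block the new task.
    TaskId add(const std::vector<TaskId>& dependencies = {}) {
        const TaskId id = remaining_.size();
        std::size_t unmet = 0;
        dependents_.emplace_back();
        done_.push_back(false);
        for (TaskId dep : dependencies) {
            if (!done_[dep]) {
                dependents_[dep].push_back(id);
                ++unmet;
            }
        }
        remaining_.push_back(unmet);
        if (unmet == 0) ready_.push_back(id);
        return id;
    }

    // Called the moment a task finishes: dependents whose last unmet
    // dependency this was become ready right away.
    void complete(TaskId id) {
        done_[id] = true;
        for (TaskId next : dependents_[id]) {
            if (--remaining_[next] == 0) ready_.push_back(next);
        }
    }

    // Pops the next ready task, if any. Returns false when nothing is ready.
    bool tryPop(TaskId& out) {
        if (ready_.empty()) return false;
        out = ready_.front();
        ready_.pop_front();
        return true;
    }

private:
    std::vector<std::size_t> remaining_;           // unmet dependency count per task
    std::vector<std::vector<TaskId>> dependents_;  // tasks waiting on each task
    std::vector<bool> done_;                       // completion flag per task
    std::deque<TaskId> ready_;                     // tasks that can run now
};
```

In a multithreaded job system, `complete()` and `tryPop()` would need to be protected by a mutex, and idle workers would typically wait on a condition variable that `complete()` signals, so that a free thread can pick up newly ready work without waiting for a frame boundary.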

– halfmaps