Sunday 21 August 2011

The importance of data and visuals separation

At the start of the project, some four months ago, the idea was only for procedural terrain, without editing. At that point, what we now call 'visual blocks' was the only data structure around.

Visual blocks are simply a collection of cubes with density values on each corner. These cubes naturally share corners, at most 8 cubes per corner. Visual blocks were generated on the fly,
extracting a surface from the density function using an implementation of the Marching Cubes Algorithm (now patent-free). This happened without storing any data to disk, as the terrain was not editable, thus the density function described it perfectly at all times.

As the other guy on the project joined, we decided to go for editable terrain. This meant that we had to store terrain data on the disk, since when edited the density function would no longer be an accurate representation. In the previous post, I mentioned how to store these discretely.

Now, the other problem when storing data is somewhat tricky. Neighboring visual blocks share
densities along the border. These densities can be considered as part of both visual blocks. So the question was, do we store this data as part of both visual blocks, or just one?

Storing it in just one visual block would require loading the whole next one to obtain this data, but even worse, would lead to recursive calls! The newly loaded visual block would require the next one, and so on and so on. There are ways of avoiding this, but not entirely too pretty.


Storing the densities as part of both visual blocks seems wasteful. If we consider that a visual block is in fact cubical, and all its sides border another, this would mean large amounts of densities to be duplicated.

And perhaps, most importantly, storing the data on per-visual block basis, ties the underlying data structure to the basic unit of display. If we wanted larger visual blocks (say for LoD), we couldn't do that.

The solution was to completely detach the underlying data from the visuals. In something called the Terrain Data Manager (which to some extent is described in the project wiki here), we manage both writing / reading form disk and the data currently in memory. The TDM stores datablocks - which are completely unrelated and independent from visual blocks. Datablocks are non-overlapping but adjacent, and are the basic atomic unit of data that can get loaded or generated.
The picture on the right illustrates how visual blocks can overlap several parts of datablocks.
Since the datablock was the smallest unit of data we would load, it made sense for it's size to be small, but due to costly time of reading/writing to disk, this size could not be too small. The current implementation holds datablock size at 20 voxels along each axis (or in other words, 20 m^3, since voxels are stored 1/m), while the visual blocks size is still being experimented upon. The separation of these two ideas gives large amounts of freedom

A visual block can be of any size and range, while the underlying data management remains untouched. The visual blocks don't even have to be of size that's multiple - or anywhere close to that of the datablock. This allows tweaking of both visual blocks size and datablock size independently, to optimize the time required for building and generating. It also eliminates the problem of storing any duplicates or causing recursive calls.

Further, the datablocks are the only data that actually gets generated - and stored. Visual blocks no longer concern themselves with where data comes from - the disk or is generated - they only request data from the TDM. The TDM hides all internals, and only returns certain things, like density at a specific point, or voxel-type (as in dirt, sand, etc) at a point.

The introduction of the TDM was one of the major steps in making the editable terrain behave correctly, and it allowed us a lot of freedoms in experimenting with visuals.

No comments:

Post a Comment