2012. szeptember 25., kedd

GameCracker - 8. Category Heads

When we consider the execution time of the program, the step that can be expected to take the most time is checking to see if a state is already in the graph. That's why I introduced the concept of categories. As a reminder, a category is a hash that limits which states have a possibility to be equal to the new one. For each category, I have a linked list of the nodes which contain states in this category; these category chains are the nodes which I need to check during expansion. I also described that each node contains a link to the next node in the same category chain, so these links are neatly contained in the main data structure. However, I need to know which node is the first in a particular chain. So for the category chains to work, I need an additional data structure that tells me the ID of the first node in a category chain (the category head) given the category. Since both the category itself and the node ID are long values, this data structure is essentially a long-long map.

If the category does a nice job limiting the number of nodes that need inspecting, that means that there are a lot of categories. The time required to look up a category head can become an issue as the number of category chains grows.

With that in mind, I chose a file-based hash table to store the category heads. If I remember correctly, I took a hash map implementation (not the chained java.util.HashMap, but one using open addressing) and adapted it a bit. Here's how it works.

Open addressed hash maps store their elements in arrays; by a hashing function, each key is assigned a position in the array. If the position is already occupied, a free position is found via some probing method. As new elements are added to the map, the array fills up, and collisions become more common; there are more and more elements that require more and more steps to find. This ruins the performance of the data structure. To remedy this, the entire hash map is recreated using a larger array.

This recreation is something that I couldn't fit in my file-based transactional data storage system. Even if I could somehow manage to make it work, it would probably take a lot of time to do it in the file system. So I decided for a compromise. Instead of reallocating the entire array, I append a new array at the end of the file. So my category head store comprises of a sequence of these arrays, which I call blocks. The last block is the one in which new elements are added, until it fills up and a new block is appended. The price of this strategy is that when searching for a key, every single block must be examined to see if it contains the key. So there's the additional price, an overhead besides the probing (caused by collisions).

This is the data structure that I use now. I'd really love to find a better one though, one with better performance and more pleasant growing abilities. Recently I've been wondering how databases would do this. If I created a table with two 'long' columns, the category (which would be the primary key) and the head ID, how would it perform compared to my hash map? Does it have some tricks that I could learn? One of these days I'll take a closer look.

Nincsenek megjegyzések:

Megjegyzés küldése