January 8, 2013 - Justin's Linklog

HAT-trie: A Cache-conscious Trie-based Data Structure for Strings [PDF]

‘Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other highperformance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most e?cient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.’ (via Tony Finch)

(tags: via:fanf data-structures tries cache-aware trees)
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [PDF]

‘Main memory capacities have grown up to a point where most databases ?t into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not ef?cient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for ef?cient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very ef?cient insertions and deletions as well. At the same time, ART is very space ef?cient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and ef?cient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and pre?x lookup.’ (via Tony Finch)

(tags: via:fanf data-structures trees indexing cache-aware tries)
Ef?cient In-Memory Indexing with Generalized Pre?x Trees [PDF]

‘Ef?cient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used—with minor changes—as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix-trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures in terms of high memory consumption (created and accessed nodes) could be improved. In this paper, we argue for reconsidering pre?x trees as in-memory index structures and we present the generalized trie, which is a pre?x tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data. In conclusion, the generalized trie is applicable as general-purpose in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data management systems that require balanced read/write performance.’ (via Tony Finch)

(tags: via:fanf prefix-trees tries data-structures)
A Non-Blocking HashTable by Dr. Cliff Click : programming

Proggit discovers the NonBlockingHashMap. This comment from Boundary’s cscotta is particularly interesting: “The code is intricate and curiously-formatted, but NBHM is quite excellent. The majority of our analytics platform is backed by NBHMs updated rapidly in parallel. Cliff’s a great, friendly, approachable guy; if you have any specific questions about the approaches or implementation, he may be happy to answer.”

(tags: data-structures algorithms non-blocking concurrency threading multicore cliff-click azul maps java boundary)

Archives

Links for 2013-01-08