This is the long version of XPCOM hashtable guide. The information you're looking for is probably there.
What Is a Hashtable?
A hashtable is a data construct that stores a set of items. Each item has a key that identifies the item. Items are found, added, and removed from the hashtable by using the key. Hashtables may seem like arrays, but there are important differences:
Array | Hashtable | |
---|---|---|
Keys: | integer: arrays are always keyed on integers, and must be contiguous. | any type: almost any datatype can be used as key, including strings, integers, XPCOM interface pointers, IIDs, and almost anything else. Keys can be disjunct (i.e. you can store entries with keys 1, 5, and 3000). |
Lookup Time: | O(1): lookup time is a simple constant | O(1): lookup time is mostly-constant, but the constant time can be larger than an array lookup |
Sorting: | sorted: stored sorted; iterated over in a sorted fashion. | unsorted: stored unsorted; cannot be iterated over in a sorted manner. |
Inserting/Removing: | O(n): adding and removing items from a large array can be time-consuming | O(1): adding and removing items from hashtables is a quick operation |
Wasted space: | none: Arrays are packed structures, so there is no wasted space. | some: hashtables are not packed structures; depending on the implementation, there may be significant wasted memory. |
In their implementation, hashtables take the key and apply a mathematical hash function to randomize the key and then use the hash to find the location in the hashtable. Good hashtable implementations will automatically resize the hashtable in memory if extra space is needed, or if too much space has been allocated.
When Should I Use a Hashtable?
Hashtables are useful for
- sets of data that need swift random access;
- with non-integral keys or non-contiguous integral keys;
- or where items will be frequently added or removed.
Hashtables should not be used for
- Sets that need to be sorted;
- Very small datasets (less than 12-16 items);
- Data that does not need random access.
In these situations, an array, a linked-list, or various tree data structures are more efficient.
Mozilla's Hashtable Implementations
Mozilla has several hashtable implementations, which have been tested and, tuned, and hide the inner complexities of hashtable implementations:
PLDHash
- low-level C API; stores keys and data in one large memory structure; uses the heap efficiently; client must declare an "entry class" and may not hold onto entry pointers.PLHashTable
- low-level C API; entry class pointers are constant; more efficient for large entry structures; often wastes memory making many small heap allocations.nsTHashtable
- low-level C++ wrapper aroundPLDHash
; generates callback functions and handles most casting automagically. Client writes their own entry class which can include complex key and data types.nsDataHashtable/nsInterfaceHashtable/nsClassHashtable
- high-level C++ wrappers aroundPLDHash
; simplifies the common usage pattern mapping a simple keytype to a simple datatype; client does not need to declare or manage an entry class;nsDataHashtable
datatype is a scalar such asPRUint32
;nsInterfaceHashtable
datatype is an interface;nsClassHashtable
datatype is a class pointer owned by the hashtable.
Which Hashtable Should I Use?
Key Type: | ||||||
---|---|---|---|---|---|---|
integer | String/CString | nsID | nsISupports* | Complex | ||
Data Type: | None (Hash Set) | nsInt32HashSet |
ns(C)StringHashSet |
nsTHashtable<...> |
||
Simple (PRUint32) | nsDataHashtable |
nsTHashtable<...> |
||||
<nsUint32HashKey, |
<ns(C)StringHashKey, |
<nsIDHashKey, |
<nsISupportsHashKey, |
|||
Interface (nsISupports) | nsInterfaceHashtable |
|||||
<nsUint32HashKey, |
<ns(C)StringHashKey, |
<nsIDHashKey, |
<nsISupportsHashKey, |
|||
Class (nsString*) | nsClassHashtable |
|||||
<nsUint32HashKey, |
<ns(C)StringHashKey, |
<nsIDHashKey, |
<nsISupportsHashKey, |
|||
Complex (structures, etc.) |
nsTHashtable<...> |
PLDHash
The PLDHash
implementation is a fairly low-level implementation, written in C. It is extremely flexible, but requires some time to understand and use. A basic guide is included here, but you should read most of xpcom/glue/pldhash.h
if you intend to use PLDHash
. The C++ wrappers for PLDHash
(see below) are often much easier and safer to use in C++ code, as many potential casting errors are easily avoided.
You must declare an entry struct type, deriving from PLDHashEntryHdr. This entry struct should contain whatever data you wish to store in the hashtable (any pointer or fixed-length data type). Note: because of the double-hashing implementation, entries may move in memory when the hashtable is altered. If you need entry pointers to remain constant, you may want to consider using PLHashTable
instead.
You must also initialize a PLDHashTableOps
structure. This serves similarly to a vtable in C++, with pointers to appropriate user-defined functions that initialize, compare, and match entries. Because PLDHash
does not know what datatype your key is, all functions that work with keys are declared using const void*
, and your client code must cast these pointers to the appropriate type.
PLDHashTables can be allocated on the stack or the heap:
- When allocated on the stack, or as a C++ class member, the table must be initialized using
PL_DHashTableInit
, and finalized usingPL_DHashTableFinish
; - When allocated on the heap, use
PL_NewDHashTable
andPL_DHashTableDestroy
to allocate and delete the table.
PLHashTable
PLHashTable
is a part of NSPR. The header file can be found at
. In general, nsprpub/lib/ds/plhash.h
PLDHash
is a better solution than PLHashTable
, because PLHashTable
makes many heap allocations.
There are two situations where PLHashTable
may be preferable to PLDHash
:
- You need entry-pointers to remain constant.
- The entries stored in the table are very large (larger than 12 words).
PLDHash
does not handle large entry structures efficiently.
nsTHashtable
nsTHashtable
is a C++ template that wraps PLDHash
. It hides many of the complexities of PLDHash
(callback functions, the ops structure, etc). You should read xpcom/glue/nsTHashtable.h
.
To use nsTHashtable
, you must declare an entry-class in a pre-defined format. This entry class contains the key and the data that you are hashing (just like PLDHash
, above). It also declares functions that manipulate the key. In most cases, the functions of this entry class can be entirely inline. For examples of entry classes, see the declarations at xpcom/glue/nsHashKeys.h
.
The template parameter is the entry class. You must use the Init()
function to initalize the table properly. At this point, use the functions PutEntry/GetEntry/RemoveEntry
to alter the hashtable. The Iterator
class will do iteration, but beware that the iteration will occur in a seemingly-random order (no sorting).
nsTHashtable
s can be allocated on the stack, as class members, or on the heap.- Entry pointers can and do change when items are added to or removed from the hashtable. Do not keep long-lasting pointers to entries.
- because of this,
nsTHashtable
is not inherently thread-safe. If you use a hashtable in a multi-thread environment, you must provide locking as appropriate.
Before using nsTHashtable
, see if nsBaseHashtable
and relatives will work for you. They are much easier to use, because you do not have to declare an entry class. If you are hashing a simple key type to a simple data type, they are generally a better choice.
nsBaseHashtable and friends: nsDataHashtable, nsInterfaceHashtable, and nsClassHashtable
These C++ templates provide a high-level interface for using hashtables that hides most of the complexities of PLDHash
. They provide the following features:
- hashtable operations can be completed without using an entry class, making code easier to read;
- optional thread-safety: the hashtable can manage a read-write lock around the table;
- predefined key classes provide automatic cleanup of strings/interfaces
nsInterfaceHashtable
andnsClassHashtable
automatically release/delete objects to avoid leaks.
nsBaseHashtable
is not used directly; choose one of the three derivative classes based on the data type you want to store. The KeyClass
is taken from nsHashKeys.h
and is the same for all three classes:
nsDataHashtable<KeyClass, DataType>
-DataType
is a simple type such asPRUint32
orPRBool
.nsInterfaceHashtable<KeyClass, Interface>
-Interface
is an XPCOM interface such asnsISupports
ornsIDOMNode
nsClassHashtable<KeyClass, T>
-T
is any C++ class. The hashtable stores a pointer to the object, and deletes that object when the entry is removed.
The important files to read are xpcom/glue/nsBaseHashtable.h
and xpcom/glue/nsHashKeys.h
. These classes can be used on the stack, as a class member, or on the heap. Initialize using the Init()
function; you can specify whether you need thread-safety at this time. Use the Put()
, Get()
, and Remove()
methods to alter the table.
Using nsTHashtable as a hash-set
A hash set only tracks the existence of keys: it does not associate data with the keys. This can be done using nsTHashtable<nsSomeHashKey>
. The appropriate entries are GetEntry and PutEntry.
Future Plans
nsISimpleEnumerator support
The (obsolete) nsHashtable
has a wrapper that exposes an nsISimpleEnumerator
on its items. I will add this support to the various nsBaseHashtable
classes as well, as needed.
Hash Functions
All of the above hashtables need a Hash Function. This function converts the key into a semi-unique integer. The mozilla codebase already contains hash functions for most key types, including narrow and wide strings, pointers, and most binary data:
void* |
cast using NS_PTR_TO_INT32 |
char* string |
nsCRT::HashCode() |
PRUnichar* string |
|
nsAString |
HashString() |
nsACString |
|
nsID& |
nsIDHashKey::HashKey() |
Writing a good hash function is well beyond the scope of this document, and has been discussed extensively in computer-science circles for many years. There are many different types of hash functions. Mozilla has tuned a good general-purpose hash algorithm for strings and nsID
.
Original Document Information
- Author(s): Benjamin Smedberg <[email protected]>