Data Structures

glib implements many common data structures, so you don't have to reinvent the wheel every time you want a linked list. This section covers glib's implementation of linked lists, sorted binary trees, N-ary trees, and hash tables.

Lists

glib provides generic single and doubly linked lists, GSList and GList, respectively. These are implemented as lists of gpointer; you can use them to hold integers with the GINT_TO_POINTER and GPOINTER_TO_INT macros. GSList and GList have identical API's, except that there is a g_list_previous() function and no g_slist_previous(). This section will discuss GSList but everything also applies to the doubly linked list.

In the glib implementation, the empty list is simply a NULL pointer. It's always safe to pass NULL to list functions since it's a valid list of length 0. Code to create a list and add one element might look like this:


GSList* list = NULL;
gchar* element = g_strdup("a string");
list = g_slist_append(list, element);

glib lists have a noticeable Lisp influence; the empty list is a special "nil" value for that reason. g_slist_prepend() works much like cons---it's a constant-time operation that adds a new cell to the front of the list.

Notice that you must replace the list passed to list-modifying functions with their return value, in case the head of the list changes. glib will handle memory issues, deallocating and allocating list links as needed.

For example, the following code would remove the above-added element and empty the list:


list = g_slist_remove(list, element);

list is now NULL. You still have to free element yourself, of course. To clear an entire list, use g_slist_free(), which removes all the links in one fell swoop. g_slist_free() has no return value because it would always be NULL, and you can simply assign that value to your list if you like. Obviously, g_slist_free() frees only the list cells; it has no way of knowing what to do with the list contents.

To access a list element, you refer to the GSList struct directly:


gchar* my_data = list->data;

To iterate over the list, you might write code like this:


GSList* tmp = list;
while (tmp != NULL)
  {
    printf("List data: %p\n", tmp->data);
    tmp = g_slist_next(tmp);
  }

Figure 13 shows the basic functions for changing GSList contents. For all of these, you must assign the return value to your list pointer in case the head of the list changes. Note that glib does not store a pointer to the tail of the list, so prepending is a constant-time operation, while append, insert, and remove are proportional to the list's size.

In particular, this means that constructing a list using g_slist_append() is a terrible idea; use g_slist_prepend() and then call g_slist_reverse() if you need items in a particular order. If you anticipate frequently appending to a list, you can also keep a pointer to the last element. The following code can be used to perform efficient appends:


void
efficient_append(GSList** list, GSList** list_end, gpointer data)
{
  g_return_if_fail(list != NULL);
  g_return_if_fail(list_end != NULL);

  if (*list == NULL)
    {
      g_assert(*list_end == NULL);
      
      *list = g_slist_append(*list, data);
      *list_end = *list;     
    }
  else 
    {
      *list_end = g_slist_append(*list_end, data)->next;
    }
} 

To use this function, you would store the list and its end somewhere, and pass their address to efficient_append():


  GSList* list = NULL;
  GSList* list_end = NULL;

  efficient_append(&list, &list_end, g_strdup("Foo"));
  efficient_append(&list, &list_end, g_strdup("Bar"));
  efficient_append(&list, &list_end, g_strdup("Baz"));

Of course you have to be careful not to use any list functions that might change the end of the list without updating list_end.

#include <glib.h>

GSList* g_slist_append(GSList* list, gpointer data);

GSList* g_slist_prepend(GSList* list, gpointer data);

GSList* g_slist_insert(GSList* list, gpointer data, gint position);

GSList* g_slist_remove(GSList* list, gpointer data);

Figure 13. Changing linked list contents

For accessing list elements, the functions in Figure 14 are provided. None of these change the list's structure. g_slist_foreach() applies a GFunc to each element of the list. A GFunc is defined as follows:


typedef void (*GFunc)(gpointer data, gpointer user_data);

Used in g_slist_foreach(), your GFunc will be called on each list->data in list, passing the user_data you provided to g_slist_foreach(). g_slist_foreach() is comparable to Scheme's "map" function.

For example, you might have a list of strings, and you might want to be able to create a parallel list with some transformation applied to the strings. Here is some code, using the efficient_append() function from an earlier example:


typedef struct _AppendContext AppendContext;
struct _AppendContext {
  GSList* list;
  GSList* list_end;
  const gchar* append;
};

static void 
append_foreach(gpointer data, gpointer user_data)
{
  AppendContext* ac = (AppendContext*) user_data;
  gchar* oldstring = (gchar*) data;

  efficient_append(&ac->list, &ac->list_end, 
                   g_strconcat(oldstring, ac->append, NULL));
}

GSList*
copy_with_append(GSList* list_of_strings, const gchar* append)
{
  AppendContext ac;

  ac.list = NULL;
  ac.list_end = NULL;
  ac.append = append;

  g_slist_foreach(list_of_strings, append_foreach, &ac);

  return ac.list;
}

glib and GTK+ use the "function pointer and user data" idiom heavily. If you have functional programming experience, this is much like using lambda expressions to create a closure. (A closure combines a function with an environment---a set of name-value bindings. In this case the "environment" is the user data you pass to append_foreach(), and the "closure" is the combination of the function pointer and the user data.)

#include <glib.h>

GSList* g_slist_find(GSList* list, gpointer data);

GSList* g_slist_nth(GSList* list, guint n);

gpointer g_slist_nth_data(GSList* list, guint n);

GSList* g_slist_last(GSList* list);

gint g_slist_index(GSList* list, gpointer data);

void g_slist_foreach(GSList* list, GFunc func, gpointer user_data);

Figure 14. Accessing data in a linked list

There are some handy list-manipulation routines, listed in Figure 15. With the exception of g_slist_copy(), all of these affect the lists in-place. Which means you must assign the return value and forget about the passed-in pointer, just as you do when adding or removing list elements. g_slist_copy() returns a newly-allocated list, so you can continue to use both lists and must free both lists eventually.

#include <glib.h>

guint g_slist_length(GSList* list);

GSList* g_slist_concat(GSList* list1, GSList* list2);

GSList* g_slist_reverse(GSList* list);

GSList* g_slist_copy(GSList* list);

Figure 15. Manipulating a linked list

Finally, there are some provisions for sorted lists, shown in Figure 16. To use these, you must write a GCompareFunc, which is just like the comparison function in the standard C qsort(). Using glib types, this becomes:


typedef gint (*GCompareFunc) (gconstpointer a, gconstpointer b);

If a < b, the function should return a negative value; if a > b a positive value; if a == b it should return 0.

Once you have a comparison function, you can insert an element into an already-sorted list, or sort an entire list. Lists are sorted in ascending order. You can even recycle your GCompareFunc to find list elements, using g_slist_find_custom(). (A word of caution: GCompareFunc is used inconsistently in glib; sometimes it glib expects an equality predicate instead of a qsort()-style function. However, the usage is consistent within the list API.)

Be careful with sorted lists; misusing them can rapidly become very inefficient. For example, g_slist_insert_sorted() is an O(n) operation, but if you use it in a loop to insert multiple elements the loop runs in exponential time. It's better to simply prepend all your elements, then call g_slist_sort().

#include <glib.h>

GSList* g_slist_insert_sorted(GSList* list, gpointer data, GCompareFunc func);

GSList* g_slist_sort(GSList* list, GCompareFunc func);

GSList* g_slist_find_custom(GSList* list, gpointer data, GCompareFunc func);

Figure 16. Sorted lists

Trees

There are two different kinds of tree in glib; GTree is your basic balanced binary tree, useful to store key-value pairs sorted by key; GNode stores arbitrary tree-structured data, such as a parse tree or taxonomy.

GTree

To create and destroy a GTree, use the constructor-destructor pair displayed in Figure 17. GCompareFunc is the same qsort()-style comparison function described for GSList; in this case it's used to compare keys in the tree.

#include <glib.h>

GTree* g_tree_new(GCompareFunc key_compare_func);

void g_tree_destroy(GTree* tree);

Figure 17. Creating and destroying balanced binary trees

Functions for manipulating the contents of the tree are shown in Figure 18. All very straightforward; g_tree_insert() overwrites any existing value, so be careful if the existing value is your only pointer to a chunk of allocated memory. If g_tree_lookup() fails to find the key, it returns NULL, otherwise it returns the associated value. Both keys and values have type gpointer, but the GPOINTER_TO_INT() and GPOINTER_TO_UINT() macros allow you to use integers instead.

#include <glib.h>

void g_tree_insert(GTree* tree, gpointer key, gpointer value);

void g_tree_remove(GTree* tree, gpointer key);

gpointer g_tree_lookup(GTree* tree, gpointer key);

Figure 18. Manipulating GTree contents

There are two functions which give you an idea how large the tree is, shown in Figure 19.

#include <glib.h>

gint g_tree_nnodes(GTree* tree);

gint g_tree_height(GTree* tree);

Figure 19. Determining the size of a GTree

Using g_tree_traverse() (Figure 20) you can walk the entire tree. To use it, you provide a GTraverseFunc, which is passed each key-value pair and a data argument you give to g_tree_traverse(). Traversal continues as long as the GTraverseFunc returns FALSE; if it ever returns TRUE then traversal stops. You can use this to search the tree by value. Here is the definition of GTraverseFunc:


typedef gint (*GTraverseFunc)(gpointer key, gpointer value, gpointer data);

GTraverseType is an enumeration; there are four possible values. Here are their meanings with respect to GTree.

  • G_IN_ORDER first recurses the left child of the node (the "lower" key according to your GCompareFunc), then calls the traversal function on the key-value pair of the current node, then recurses the right child. This traversal is in order from lowest to highest, according to your GCompareFunc.

  • G_PRE_ORDER calls the traversal function on the key-value pair of the current node, then recurses the left child, then recurses the right child.

  • G_POST_ORDER recurses the left child, then recurses the right child, and finally calls the traversal function on the current node's key-value pair.

  • G_LEVEL_ORDER is only meaningful for GNode, it is not allowed with GTree.

#include <glib.h>

void g_tree_traverse(GTree* tree, GTraverseFunc traverse_func, GTraverseType traverse_type, gpointer data);

Figure 20. Traversing GTree

GNode

A GNode is an N-way tree, implemented as a doubly linked list with parent and child lists. Thus, most list operations have analogues in the GNode API. You can also walk the tree in various ways. Here's the declaration for a node:


typedef struct _GNode GNode;

struct _GNode
{
  gpointer data;
  GNode   *next;
  GNode   *prev;
  GNode   *parent;
  GNode   *children;
};

There are macros to access GNode members, shown in Figure 21. As with GList, the data member is intended to be used directly. These macros return the next, prev, and children members respectively; they also check whether their argument is NULL before dereferencing it, and return NULL if it is.

#include <glib.h>

g_node_prev_sibling(node);

g_node_next_sibling(node);

g_node_first_child(node);

Figure 21. Accessing GNode members

To create a node, the usual _new() function is provided (Figure 22). g_node_new() creates a childless and parentless node containing data. Typically g_node_new() is used only to create the root node; convenience macros are provided which automatically create new nodes as needed.

#include <glib.h>

GNode* g_node_new(gpointer data);

Figure 22. Creating a GNode

To build a tree the fundamental operations shown in Figure 23 are used. Each operation returns the just-added node, for convenience when writing loops or recursing the tree. Unlike GList, it is safe to ignore the return value.

#include <glib.h>

GNode* g_node_insert(GNode* parent, gint position, GNode* node);

GNode* g_node_insert_before(GNode* parent, GNode* sibling, GNode* node);

GNode* g_node_prepend(GNode* parent, GNode* node);

Figure 23. Building a GNode tree

The convenience macros shown in Figure 24 are implemented in terms of the fundamental operations. g_node_append() is analagous to g_node_prepend(); the rest take a data argument, automatically allocate a node for it, and call the corresponding basic operation.

#include <glib.h>

g_node_append(parent, node);

g_node_insert_data(parent, position, data);

g_node_insert_data_before(parent, sibling, data);

g_node_prepend_data(parent, data);

g_node_append_data(parent, data);

Figure 24. Building a GNode

To remove a node from the tree, there are two functions shown in Figure 25. g_node_destroy() removes the node from a tree, destroying it and all its children. g_node_unlink() removes a node and makes it into a root node; i.e., it converts a subtree into an independent tree.

#include <glib.h>

void g_node_destroy(GNode* root);

void g_node_unlink(GNode* node);

Figure 25. Destroying a GNode

There are two macros for detecting the top and bottom of a GNode tree, shown in Figure 26. A root node is defined as a node with no parent or siblings. A leaf node has no children.

#include <glib.h>

G_NODE_IS_ROOT(node);

G_NODE_IS_LEAF(node);

Figure 26. Predicates for GNode

You can ask glib to report useful information about a GNode, including the number of nodes it contains, its root node, its depth, and the node containing a particular data pointer. These functions are shown in Figure 27.

GTraverseType was introduced earlier, with respect to GTree; here are the possible values for GNode:

  • G_IN_ORDER first recurses the leftmost child of the node, then visits the node itself, then recurses the rest of the node's children. This isn't very useful; mostly it is intended for use with GTree.

  • G_PRE_ORDER visits the current node, then recurses each child in turn.

  • G_POST_ORDER recurses each child in order, then visits the current node.

  • G_LEVEL_ORDER first visits the node itself; then each of the node's children; then the children of the children; then the children of the children of the children; and so on. That is, it visits each node of depth 0, then each node of depth 1, then each node of depth 2, etc.

GNode's tree-traversal functions have a GTraverseFlags argument. This is a bitfield used to change the nature of the traversal. Currently there are only three flags---you can visit only leaf nodes, only non-leaf nodes, or all nodes:

  • G_TRAVERSE_LEAFS means to traverse only leaf nodes.

  • G_TRAVERSE_NON_LEAFS means to traverse only non-leaf nodes.

  • G_TRAVERSE_ALL is simply a shortcut for (G_TRAVERSE_LEAFS | G_TRAVERSE_NON_LEAFS).

#include <glib.h>

guint g_node_n_nodes(GNode* root, GTraverseFlags flags);

GNode* g_node_get_root(GNode* node);

gboolean g_node_is_ancestor(GNode* node, GNode* descendant);

guint g_node_depth(GNode* node);

GNode* g_node_find(GNode* root, GTraverseType order, GTraverseFlags flags, gpointer data);

Figure 27. GNode Properties

The remaining GNode functions are straightforward; most of them are simply operations on the node's list of children. Figure 28 lists them. There are two function typedefs unique to GNode:


typedef gboolean (*GNodeTraverseFunc) (GNode* node, gpointer data);
typedef void (*GNodeForeachFunc) (GNode* node, gpointer data);

These are called with a pointer to the node being visited, and the user data you provide. A GNodeTraverseFunc can return TRUE to stop whatever traversal is in progress; thus you can use GNodeTraverseFunc in combination with g_node_traverse() to search the tree by value.

#include <glib.h>

void g_node_traverse(GNode* root, GTraverseType order, GTraverseFlags flags, gint max_depth, GNodeTraverseFunc func, gpointer data);

guint g_node_max_height(GNode* root);

void g_node_children_foreach(GNode* node, GTraverseFlags flags, GNodeForeachFunc func, gpointer data);

void g_node_reverse_children(GNode* node);

guint g_node_n_children(GNode* node);

GNode* g_node_nth_child(GNode* node, guint n);

GNode* g_node_last_child(GNode* node);

GNode* g_node_find_child(GNode* node, GTraverseFlags flags, gpointer data);

gint g_node_child_position(GNode* node, GNode* child);

gint g_node_child_index(GNode* node, gpointer data);

GNode* g_node_first_sibling(GNode* node);

GNode* g_node_last_sibling(GNode* node);

Figure 28. Accessing a GNode

Hash Tables

GHashTable is a simple hash table implementation, providing an associative array with constant-time lookups. To use the hash table, you must provide a GHashFunc, which should return a positive integer when passed a hash key:


typedef guint (*GHashFunc) (gconstpointer key);

Each returned guint (modulus the size of the table) corresponds to a "slot" or "bucket" in the hash; GHashTable handles collisions by storing a linked list of key-value pairs in each slot. Thus, the guint values returned by your GHashFunc must be fairly evenly distributed over the set of possible guint values, or the hash table will degenerate into a linked list. Your GHashFunc must also be fast, since it is used for every lookup.

In addition to GHashFunc, a GCompareFunc is required to test keys for equality. Somewhat unpleasantly, GHashTable does not use GCompareFunc in the same way GSList and GTree do, although the function signature is the same. Here GCompareFunc is expected to be an equality operator, returning TRUE if its arguments are equal. It should not be a qsort()-style comparison function. The key comparison function is used to find the correct key-value pair when hash collisions result in more than one pair in the same hash slot.

To create and destroy a GHashTable, use the constructor and destructor listed in Figure 29. Remember that glib has no way of knowing how to destroy the data contained in your hash table; it only destroys the table itself.

#include <glib.h>

GHashTable* g_hash_table_new(GHashFunc hash_func, GCompareFunc key_compare_func);

void g_hash_table_destroy(GHashTable* hash_table);

Figure 29. GHashTable

Ready-to-use hash and comparison functions are provided for the most common keys: integers, pointers, and strings. These are listed in Figure 30. The functions for integers accept a pointer to a gint, rather than the gint itself. If you pass NULL as the hash function argument to g_hash_table_new(), g_direct_hash() is used by default. If you pass NULL as the key equality function, then simple pointer comparison is used (equivalent to g_direct_equal(), but without a function call).

#include <glib.h>

guint g_int_hash(gconstpointer v);

gint g_int_equal(gconstpointer v1, gconstpointer v2);

guint g_direct_hash(gconstpointer v);

gint g_direct_equal(gconstpointer v1, gconstpointer v2);

guint g_str_hash(gconstpointer v);

gint g_str_equal(gconstpointer v1, gconstpointer v2);

Figure 30. Pre-written hashes/comparisons

Manipulating the hash is simple. The routines are summarized in Figure 31. Insertions do not copy the key or value; these are entered into the table exactly as you provide them, overwriting any pre-existing key-value pair with the same key ("same" is defined by your hash and equality functions, remember). If this is a problem, you must do a lookup or remove before you insert. Be especially careful if you dynamically allocate keys or values.

The simple g_hash_table_lookup() returns the value it finds associated with key, or NULL if there is no value. Sometimes this won't do. For example, NULL may be a valid value in itself. If you're using strings as keys, especially dynamically allocated strings, knowing that a key is in the table might not be enough; you might want to retrieve the exact gchar* the hash table is using to represent key "foo". A second lookup function is provided for cases like these. g_hash_table_lookup_extended() returns TRUE if the lookup succeeded; if it returns TRUE, it places the key and value it found in the locations it's given.

#include <glib.h>

void g_hash_table_insert(GHashTable* hash_table, gpointer key, gpointer value);

void g_hash_table_remove(GHashTable * hash_table, gconstpointer key);

gpointer g_hash_table_lookup(GHashTable * hash_table, gconstpointer key);

gboolean g_hash_table_lookup_extended(GHashTable* hash_table, gconstpointer lookup_key, gpointer* orig_key, gpointer* value);

Figure 31. Manipulating GHashTable

GHashTable keeps an internal array whose size is a prime number. It also keeps a count of the number of key-value pairs stored in the table. If the average number of pairs per available slot drops below 0.3 (or so), the array is made smaller; if it goes above 3, the array is made larger to reduce collisions. Resizing happens automatically whenever you insert or remove pairs from the table. This ensures the hash table's memory use is optimal. Unfortunately, it is inefficient to rebuild the hash table over and over if you're doing a large number of insertions or removals. To solve the problem, the hash table can be frozen, meaning that resizing is temporarily suppressed. When you're done adding and removing items, you simply thaw the table, resulting in a single optimal-size calculation. (Be careful though; a frozen table can end up with many hash collisions if you add large quantities of data. This should be fine as long as you thaw before you do any lookups.) The functions are in Figure 32.

#include <glib.h>

void g_hash_table_freeze(GHashTable* hash_table);

void g_hash_table_thaw(GHashTable* hash_table);

Figure 32. Freezing and thawing GHashTable