3

I’m trying to design a generic linked list in C where each node can store arbitrary data. Right now I have the following definitions:

typedef struct {
    int clientID;
    char name[256];
} Client;

typedef struct Node_ {
    Client client;
    struct Node_ *next;
} Node;

However, data inside the node (Client, in this case) is going to change, while the structure of the linked list and basic operations (insertion, deletion, etc...) are not. Right now, I am rewriting the basic API for each new type but that is proving to be really inefficient and time consuming.

Question:

What is the best practice or alternatives for generalizing a data container in linked lists in C, if that is really feasible ?

5
  • You can look at "chained linked list", ie every node is a linked list of its own. Or just a pointer to an array allocated elsewhere. There kind of data types with individual allocation are however horribly inefficient on a modern computer (year 1990 or later) so you probably shouldn't be using them. Commented Sep 22 at 14:52
  • 1
    Do you need to have different types of data on the same list, or are they on separate lists? If different types of data are on different lists, and you are just trying to avoid duplicating list manipulation functions for each type of data, there are various ways to avoid that duplication. Commented Sep 22 at 15:17
  • 1
    Put the next node pointer at the start of each node data structure so that it’s always in the same location. Your list manipulation functions can then be generic / independent of data type. Commented Sep 22 at 16:14
  • Thank you @IanAbbott. Exactly, list manipulation functions behave in the same way, the only difference is the type of each node in the linked list. Commented Sep 22 at 16:29
  • Thank you @tinman. How would you go about that ? Do you have a small piece I can experiment with ? Commented Sep 22 at 16:29

2 Answers 2

4

First, you'll need an enum to tag what kind of data you have, for example:

enum datatype { CLIENT, SERVER };

Then your list nodes would contain that and either a void * for the actual data:

typedef struct Node_ {
    enum datatype type;
    void *data;
    struct Node_ *next;
} Node;

Or you would create a union of all of the types and use that in the list nodes:

union data {
    Client client;
    Server server;
};

typedef struct Node_ {
    enum datatype type;
    union data data;
    struct Node_ *next;
} Node;
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you so much +1. That is really helpful. Perhaps, I missed to clarify, but basic linked list manipulation is going to be used across other problems, perhaps involving an Employee structure, House structure, or even a single int. The ideal would be reusing that same API for each different type. The only thing that changes is the data contained in the node. Do you think that is feasible? I have a bit of functional programming background, and I am used to polymorphic functions.
The above does that. You would either add each of those types to the common union type, or you would point the void * to the data in question.
It would be better to have next as the first struct member which would allow for some common list functions via casting.
1

Here's a proof of concept I've been working on that leans heavily on void * and callbacks. It's not perfeshunal kwality by any stretch, but it should be illustrative.

First, the node type itself:

enum linkType { LINK_PREV, 
                LINK_NEXT, 
                LINK_LEFT, 
                LINK_RIGHT, 
                LINK_PARENT, 
                NUM_LINKS };
struct node {
  void *key;
  void *data;
  struct node *link[NUM_LINKS];
};

The intent is that this node type can be used for any type of container - list, tree, queue, stack, etc., so instead of distinct members for next, previous, left, right, etc., I just create a bunch of enumeration constants to identify the link type and array of links indexed by those enumerations. If I need to add more link types, I just update the enumeration and rebuild. Even through "prev/next" and "left/right" are semantically similar, I leave them distinct. As I play with this I may change that.

This node type is not directly exposed to any client code; I keep the implementation hidden behind the following interface:

#ifndef NODE_H
#define NODE_H

#include <stdio.h>

typedef struct node Node;

Node *nodeCreate( const void *key, 
                  const void *data, 
                  void *(*kcopy)(const void *), 
                  void *(*dcopy)(const void *) );

void nodeDestroy( Node *n, 
                  void (*kfree)(void *), 
                  void (*dfree)(void *) );

/**
 * Instead of exposing the node definition directly, I only expose functions
 * to operate on nodes.
 *
 * As I play with more container types I will likely add more traversal functions,
 * but this is as far as I've gotten.
 */
Node *nodeNext( const Node *n );
Node *nodePrev( const Node *n );
Node *nodeLeft( const Node *n );
Node *nodeRight( const Node *n );
Node *nodeParent( const Node *n );
Node *nodeInsertBefore( Node *toInsert, Node *current );
Node *nodeInsertAfter( Node *toInsert, Node *current );
Node *nodeInsertLeft( Node *toInsert, Node *parent );
Node *nodeInsertRight( Node *toInsert, Node *parent );

Node *nodeRemoveBefore( Node *current );
Node *nodeRemoveAfter( Node *current );
Node *nodeRemoveLeft( Node *current );
Node *nodeRemoveRight( Node *current );

const void *nodeKey( const Node *n );
void *nodeData( const Node *n );

void nodeDump( const Node *n, FILE *stream );

#endif

This code relies on you supplying the following callbacks:

  • kcopy - allocates memory for and assigns contents of your key

  • dcopy - allocates memory for and assigns contents of your data

  • kfree - deallocates memory for the key

  • dfree - deallocates memory for your data.

The node creation function is pretty simple:

Node *nodeCreate( const void *key, const void *data, void *(*kcopy)(const void *), void *(*dcopy)(const void *) )
{
  assert( key && "nodeCreate: key parameter cannot be NULL" );
  assert( kcopy && "nodeCreate: kcopy parameter cannot be NULL" );

  Node *n = malloc( sizeof *n );
  if ( n )
  {
    n->key = kcopy( key );
    if ( data && dcopy )
      n->data = dcopy( data );

    memset( n->link, 0, sizeof *n->link * NUM_LINKS );
  }
  return n;
}

The resources allocated for the node, node key, and node data are owned by the container.

I've similarly built a list type; in addition to tracking the list head and tail nodes, it stores all the callbacks necessary to create, free, and compare nodes:

struct list {
  Node *head, *tail;
  size_t count;
  void *(*kcpy)(const void *);
  void *(*dcpy)(const void *);
  void (*kfree)(void *);
  void (*dfree)(void *);
  int  (*kcmp)(const void *, const void *);
};

In addition to the key and data allocation and deallocation functions, we need a callback to compare key values; by convention, it will return an integer value < 0 if the first argument is "less than" (ordered before) the second argument, > 0 if the first argument is "greater than" (ordered after) the second argument, or = 0 if the arguments are "equal" (have the same ordering).

Again, like the node type, I do not directly expose this list type, I hide it behind the following interface:

#ifndef LIST_H
#define LIST_H

#include <stdbool.h>
#include <stdio.h>
#include "node.h"

typedef struct list List;

List *listCreate( void *(*kcopy)(const void *), 
                  void *(*dcopy)(const void *), 
                  void (*kfree)(void *), 
                  void (*dfree)(void *), 
                  int (*kcmp)(const void *, const void *) );

void listClear( List *l );
void listDestroy( List *l );

size_t listSize( List *l );

Node *listInsert( List *l, const void *key, const void *data );
Node *listInsertAt( List *l, const void *key, const void *data, size_t pos );
Node *listPush( List *l, const void *key, const void *data );
Node *listAppend( List *l, const void *key, const void *data );
bool listFindPos( List *l, const void *key, size_t *pos );
bool listContains( List *l, const void *key );
Node *listFind( List *l, const void *key );
Node *listFindAt( List *l, size_t pos );
Node *listRemove( List *l, const void *key );
Node *listRemoveAt( List *l, size_t pos );
Node *listPop( List *l );
Node *listPopTail( List *l );
void listDump( List *l, FILE *stream );

#endif

The listCreate function allocates memory for the list object and stores all the associated callbacks:

List *listCreate( void *(*kcpy)(const void *), 
                  void *(*dcpy)(const void *), 
                  void (*kfree)(void *), 
                  void (*dfree)(void *), 
                  int (*kcmp)(const void *, const void * ) )
{
  assert( kcpy && "listCreate: kcpy cannot be NULL" );
  assert( kfree && "listCreate: kfree cannot be NULL" );
  assert( kcmp && "listCreate: kcmp cannot be NULL" );

  List *l = malloc( sizeof *l );
  if ( l )
  {
    l->count = 0;
    l->kcpy = kcpy;
    l->dcpy = dcpy;
    l->kfree = kfree;
    l->dfree = dfree;
    l->kcmp = kcmp;

    int i = 0;

    /**
     * head and tail are dummy nodes that don't actually store
     * any data, they're just there to mark the beginning
     * and ending of the list.  They store an int key by default,
     * regardless of what actual key type is going to be.
     */
    l->head = nodeCreate( &i, NULL, createInt, NULL );
    l->tail = nodeCreate( &i, NULL, createInt, NULL );
    nodeInsertBefore( l->head, l->tail );
  }
  return l;
}

The various insertion, push, and append routines create the node using the specified key and data and the stored callbacks:

/**
 * Inserts new item in key order
 */
Node *listInsert( List *l, const void *key, const void *data )
{
  assert( l && "listInsert: l cannot be NULL" );
  assert( key && "listInsert: key cannot be NULL" );
  Node *n = nodeCreate( key, data, l->kcpy, l->dcpy );
  if ( n )
  {
    Node *cur;
    for ( cur = nodeNext( l->head ); cur != l->tail && l->kcmp( nodeKey( cur ), key ) < 0; cur = nodeNext( cur ) )
      ;
    if ( !nodeInsertBefore( n, cur ) )
    {
      nodeDestroy( n, l->kfree, l->dfree );
      n = NULL;
    }
    else
      l->count++;
  } 
  return n;
}

/**
 * Inserts new item at specified location; fails if location is greater
 * than list size.
 */
Node *listInsertAt( List *l, const void *key, const void *data, size_t pos )
{
  assert( l && "listInsertAt: l cannot be NULL" );
  assert( key && "listInsertAt: key cannot be NULL" );

  if ( pos > l->count )
    return NULL;

  /**
   * Uses the callbacks stored with the list object to create copies of the key
   * and data.
   */
  Node *n = nodeCreate( key, data, l->kcpy, l->dcpy );
  if ( n )
  {
    Node *cur = l->head;
    for ( size_t i = 0; i < pos; i++ )
      cur = nodeNext( cur );

    if ( !nodeInsertBefore( n, cur ) )
    {
      nodeDestroy( n, l->kfree, l->dfree );
      n = NULL;
    }
    else
      l->count++;
  }
  return n;
}

Small proof of concept - I create several lists:

  • one that only stores integers; the key is the data, so the data callbacks are left NULL;

  • one that stores strings keyed by integers;

  • one that stores integers keyed by strings;

So I need to create the callbacks:

void *copyInt( const void *data )
{
  const int *ldata = data;
  int *p = malloc( sizeof *p );
  if ( p )
    *p = *ldata;
  return p;
}

int cmpInt( const void *l, const void *r )
{
  const int *ll = l;
  const int *lr = r;

  if ( *ll < *lr )
    return -1;
  else if ( *ll > *lr )
    return 1;
  return 0;
}

void *copyStr( const void *s )
{
  char *ls = malloc( strlen( (const char *) s ) + 1 );
  if ( ls )
    strcpy( ls, (const char *) s );
  return ls;
}

int cmpStr( const void *l, const void *r )
{
  return strcmp( (const char *) l, (const char *) r );
}

Then create my various lists:

  List *l = listCreate( copyInt, NULL, free, NULL, cmpInt );
  List *li = listCreate( copyInt, copyStr, free, free, cmpInt );
  List *ls = listCreate( copyStr, copyInt, free, free, cmpStr );

I generate some random values and strings, load them into the lists in key order:

    for ( size_t i = 0; i < count; i++ )
    {
      int val = rand() % 1000;
      char buf[16];

      
      Node *n = listInsert( l, &val, NULL ); 
      Node *ni = listInsert( li, &val, randomStr( buf, sizeof buf ) );
      Node *ns = listInsert( ls, buf, &val );

      if ( !n || !ni || !ns )
      {
        fputs( "    Error inserting value\n", stderr );
        break;
      }
    }

Then print the elements out, popping them off the lists as I go:

    Node *n = NULL;
    
    puts( "    l list contents: " );
    while ( (n = listPop(l)) )
    {
      printf( "\t    value: %5d\n", *(int *) nodeKey( n ) );
      nodeDestroy( n, free, NULL );
    }
    puts( "    li list contents: " );
    while( (n = listPop(li)) )
    {
      printf( "\t    key: %5d; value \"%s\"\n", *(int *) nodeKey( n ), (char *) nodeData( n ) );
      nodeDestroy( n, free, free );
    };

    puts( "    ls list contents: " );
    while ( (n = listPop(ls)) )
    {
      printf( "\t    key: \"%s\"; value: %d\n", (char *) nodeKey( n ), *(int *) nodeData( n ) );
      nodeDestroy( n, free, free );
    }

Output:

$ ./listtest
     l list size: 10
    li list size: 10
    ls list size: 10
    l list contents: 
            value:    13
            value:    54
            value:    60
            value:   211
            value:   245
            value:   303
            value:   383
            value:   478
            value:   607
            value:   731
    li list contents: 
            key:    13; value "g agadabedfeee "
            key:    54; value "gfe bgb c e efe"
            key:    60; value "bbdcddffbbfecf "
            key:   211; value "daddgbfgc ecgab"
            key:   245; value "bafbaddebac ffe"
            key:   303; value "eag  b cbec cag"
            key:   383; value "gcbagffbgabf gd"
            key:   478; value "baaff d bceeafd"
            key:   607; value "gdcgfafbace d e"
            key:   731; value " ea   facfefeec"
    ls list contents: 
            key: " ea   facfefeec"; value: 731
            key: "baaff d bceeafd"; value: 478
            key: "bafbaddebac ffe"; value: 245
            key: "bbdcddffbbfecf "; value: 60
            key: "daddgbfgc ecgab"; value: 211
            key: "eag  b cbec cag"; value: 303
            key: "g agadabedfeee "; value: 13
            key: "gcbagffbgabf gd"; value: 383
            key: "gdcgfafbace d e"; value: 607
            key: "gfe bgb c e efe"; value: 54

So, in theory, this code could handle any arbitrary key and data types; you just need to supply the appropriate callbacks.

One drawback (of many) with this approach is you can't pass literal values to the various insertion or search functions; you must always create some kind of object and pass its address. You'd probably want to create a type-aware front end for your specific use case and have it call this code:

Node *myListInsert( List *l, int key, char *data )
{
  return listInsert( l, &key, data );
}
...
n = myListInsert( mylist, 42, "spam spam spam spam spam" );

With all the callbacks and the unused link types in every node this isn't the most efficient approach, but if you wanted efficient you wouldn't be creating a generic container type in the first place.

1 Comment

John, I appreciate your extremely though answer a lot. I'll take a look when I get to it. Thank you so much

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.