1

This question is about the best practices to handle this pointer problem I've dug myself into.

I have an array of structures that is dynamically generated in a function that reads a csv.

int init_from_csv(instance **instances,char *path) {
    ... open file, get line count
   *instances = (instance*) malloc( (size_t) sizeof(instance) * line_count );
    ... parse and set values of all instances
    return count_of_valid_instances_read;
}
// in main()
instance *instances;
int ins_len = init_from_csv(&instances, "some/path/file.csv");

Now, I have to perform functions on this raw data, split it, and perform the same functions again on the splits. This data set can be fairly large so I do not want to duplicate the instances, I just want an array of pointers to structs that are in the split.

instance **split = (instance**) malloc (sizeof(instance*) * split_len_max);
int split_function(instance *instances, ins_len, instances **split){
  int i, c;
  c = 0;
  for (i = 0; i < ins_len; i++) {
    if (some_criteria_is_true) {
      split[c++] = &instances[i];
  }
  return c;
}

Now my question what would be the best practice or most readable way to perform a function on both the array of structs and the array of pointers? For a simple example count_data().

int count_data (intances **ins, ins_len, float crit) {
  int i,c;
  c = 0;
  for (i = 0; i < ins_len; i++) {
    if ins[i]->data > crit) {
      ++c;
    }
  }
  return c;
}

// code smell-o-vision going off by now
int c1 = count_data (split, ins_len, 0.05);  // works
int c2 = count_data (&instances, ins_len, 0.05); // obviously seg faults

I could make my init_from_csv malloc an array of pointers to instances, and then malloc my array of instances. I want to learn how a seasoned c programmer would handle this sort of thing though before I start changing a bunch of code.

1
  • You can't do that; you're passing an array of pointers in one instance, and an array of actual structs in the other, which means the indexing will be messed up in one of the cases. It's probably better to have 3 functions: one which performs the operation on an individual entry, and 2 helper functions to deal with each type of array. Commented Aug 2, 2012 at 4:00

1 Answer 1

2

This might seem a bit grungey, but if you really want to pass that instances** pointer around and want it to work for both the main data set and the splits, you really need to make an array of pointers for the main data set too. Here's one way you could do it...

size_t i, mem_reqd;
instance **list_seg, *data_seg;

/* Allocate list and data segments in one large block */
mem_reqd = (sizeof(instance*) + sizeof(instance)) * line_count;
list_seg = (instance**) malloc( mem_reqd );
data_seg = (instance*) &list_seg[line_count];

/* Index into the data segment */
for( i = 0; i < line_count; i++ ) {
    list_seg[i] = &data_seg[i];
}

*instances = list_seg;

Now you can always operate on an array of instance* pointers, whether it's your main list or a split. I know you didn't want to use extra memory, but if your instance struct is not trivially small, then allocating an extra pointer for each instance to prevent confusing code duplication is a good idea.

When you're done with your main instance list, you can do this:

void free_instances( instance** instances )
{
    free( instances );
}

I would be tempted to implement this as a struct:

struct instance_list {
    instance ** data;
    size_t length;
    int owner;
};

That way, you can return this from your functions in a nicer way:

instance_list* alloc_list( size_t length, int owner )
{
    size_t i, mem_reqd;
    instance_list *list;
    instance *data_seg;

    /* Allocate list and data segments in one large block */
    mem_reqd = sizeof(instance_list) + sizeof(instance*) * length;
    if( owner ) mem_reqd += sizeof(instance) * length;
    list = (instance_list*) malloc( mem_reqd );
    list->data = (instance**) &list[1];
    list->length = length;
    list->owner = owner;

    /* Index the list */
    if( owner ) {
        data_seg = (instance*) &list->data[line_count];
        for( i = 0; i < line_count; i++ ) {
            list->data[i] = &data_seg[i];
        }
    }

    return list;
}

void free_list( instance_list * list )
{
    free(list);
}

void erase_list( instance_list * list )
{
    if( list->owner ) return;
    memset((void*)list->data, 0, sizeof(instance*) * list->length);
}

Now, your function that loads from CSV doesn't have to focus on the details of creating this monster, so it can simply do the task it's supposed to do. You can now return lists from other functions, whether they contain the data or simply point into other lists.

instance_list* load_from_csv( char *path )
{
    /* get line count... */
    instance_list *list = alloc_list( line_count, 1 );
    /* parse csv ... */
    return list;
}

etc... Well, you get the idea. No guarantees this code will compile or work, but it should be close. I think it's important, whenever you're doing something with arrays that's even slightly more complicated than just a simple array, it's useful to make that tiny extra effort to encapsulate it. This is the major data structure you'll be working with for your analysis or whatever, so it makes sense to give it a little bit of stature in that it has its own data type.

I dunno, was that overkill? =)

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for your very thorough answer!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.