I'm trying to write a simple split function in C, where you supply a string and a char to split on, and it returns a list of split strings:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char ** split(char * tosplit, char delim){
        int amount = 0;
        for (int i=0; i<strlen(tosplit); i++) {
                if (tosplit[i] == delim) {
                        amount++;
                }
        }
        char ** split_ar = malloc(0);
        int counter = 0;
        char *token = strtok(tosplit, &delim);
        while (token){
                split_ar[counter] = malloc(0);
                split_ar[counter] = token;
                token = strtok(NULL, &delim);
                counter++;
        }
        split_ar[counter] = 0;
        return split_ar;
}


int main(int argc, char *argv[]){
        /* Program name + string + delimiter = 3 arguments; argv[2] is read below. */
        if (argc == 3){
                char *tosplit = argv[1];
                char delim = *argv[2];
                char ** split_ar = split(tosplit, delim);
                while (*split_ar){
                        printf("%s\n", *split_ar);
                        split_ar++;
                }
        } else {
                puts("Please enter words and a delimiter.");
        }
}

I use malloc twice: once to allocate space for the pointers to strings, and once to allocate space for the actual strings themselves. The strange thing is: during testing I found that the code still worked when I malloc'ed no space at all.

When I removed the malloc lines I got segfaults or malloc assertion errors, so the lines do seem to be necessary, even though they don't seem to do anything. Can someone please explain why?

I expect it has something to do with strtok; the string being tokenized is initialized outside the function scope, and strtok returns pointers to the original string, so maybe malloc isn't even necessary. I have looked at many old SO threads but could find nothing similar enough to answer my question.

  • It's implementation-defined whether calling malloc(0) returns a null pointer, or a valid pointer to 0 bytes of memory. So you need to either (a) take care not to try to allocate 0 bytes of memory, or (b) if you do, don't print an error message if malloc(0) returns NULL. Commented Mar 10, 2023 at 17:40
  • Maybe because malloc(0) returns a pointer to a memory zone of length 0, and when you dereference this pointer you get undefined behaviour which appears to work. Commented Mar 10, 2023 at 17:42
  • In answer to your question: uninitialized pointers are different than null pointers, and are different from properly-allocated, valid pointers. See also this answer. Commented Mar 10, 2023 at 17:42
  • You can't call char ** split_ar = malloc(0);, and then start filling in split_ar[counter]. For simplicity, try calling split_ar = malloc(50 * sizeof(char *)), where 50 is a guess of how many strings you might need. (That's not a good long-term solution, but it's a start.) Commented Mar 10, 2023 at 17:44
  • If you say split_ar[counter] = malloc(…);, immediately followed by split_ar[counter] = token;, you're throwing away (failing to use) the memory you just allocated, and instead filling in split_ar with a pointer value — of dubious longevity — from token. Commented Mar 10, 2023 at 17:47

1 Answer

Why does malloc(0) in C not produce an error ... ?

why the code works despite the malloc(0).

Calling malloc(0) is OK. Using that pointer later as in split_ar[counter] = malloc(0); is undefined behavior (UB) as even split_ar[0] attempts to access outside the allocated memory.
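
As a small illustration of what is and is not allowed with a zero-size allocation, here is a sketch. The helper name malloc_zero_is_nonnull is made up for this example. Whichever result the implementation gives, the pointer may be passed to free() but must not be dereferenced or indexed.

```c
#include <stdlib.h>

/* Hypothetical helper (not from the question): reports which of the two
   permitted results this implementation's malloc(0) produces.
   Returns 1 for a non-null pointer, 0 for a null pointer. */
int malloc_zero_is_nonnull(void) {
    void *p = malloc(0);      /* implementation-defined: NULL or a unique pointer */
    int nonnull = (p != NULL);
    /* Reading or writing through p here would be undefined behavior:
       the allocation has room for zero bytes either way. */
    free(p);                  /* valid in both cases; free(NULL) does nothing */
    return nonnull;
}
```

On many desktop implementations this tends to return 1, which is why writing through the pointer can appear to work: the stores land in heap memory the program does not own.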

When code incurs undefined behavior, nothing requires it to produce an error. There is no defined behavior in undefined behavior: it might "work", it might not. It is UB.

C does not necessarily add safeguards against weak programming.

If you need a language to add extra checks for such mistakes, C is not the best answer.


Instead, allocate the correct amount. In OP's case I think it is, at most, amount + 2. (Consider the case when tosplit does not contain any delimiters.)

char **split_ar = malloc(sizeof split_ar[0] * (amount + 2));
if (split_ar == NULL) {
  Handle_OutOfMemory();
}

Further

Code is only attempting to copy the pointer and not the string.

// Worthless code
//split_ar[counter] = malloc(0);
//split_ar[counter] = token;

Instead, allocate for the string and copy the string. Research strdup().

// Sample code using the very common strdup().
split_ar[counter] = strdup(token);
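
strdup() comes from POSIX and was only added to the C standard in C23, so it may be missing on some toolchains. A minimal stand-in might look like the sketch below. The name dup_string is an assumption made for this example, not a library function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): returns a malloc'ed copy of src,
   or NULL if the allocation fails. The caller must free() the result. */
char *dup_string(const char *src) {
    assert(src != NULL);
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, src, size);     /* copies the characters and the '\0' */
    }
    return copy;
}
```

The copy owns its own storage, so it stays valid even after the original buffer is modified or goes away, unlike the pointer returned by strtok().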

Advanced

  1. Use strspn() and strcspn() to walk down a string and parse it. This has the nice benefit of operating on a const string and readily knowing the size of the token, which is useful when allocating.

  2. Use the same technique twice to pre-calculate token count as well as parsing. This avoids differences that exist in OP's 2 methods.
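
The two points above could be combined roughly as follows. This is only a sketch under some assumptions: the names split_copy and split_free are invented here, empty tokens are skipped (matching what strtok does), and the delimiter is passed as a string rather than a single char.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees a NULL-terminated array made by split_copy(). */
void split_free(char **parts) {
    if (parts == NULL) return;
    for (size_t i = 0; parts[i] != NULL; i++) free(parts[i]);
    free(parts);
}

/* Hypothetical helper: splits s on any character in delims, skipping empty
   tokens. Returns a NULL-terminated array of malloc'ed copies, or NULL if
   an allocation fails. The input string is never modified. */
char **split_copy(const char *s, const char *delims) {
    assert(s != NULL && delims != NULL);

    /* Pass 1: count tokens with the same walk used for parsing. */
    size_t count = 0;
    const char *p = s;
    for (;;) {
        p += strspn(p, delims);            /* skip leading delimiters */
        size_t len = strcspn(p, delims);   /* length of the next token */
        if (len == 0) break;               /* reached end of string */
        count++;
        p += len;
    }

    char **parts = malloc((count + 1) * sizeof parts[0]);  /* +1 for NULL */
    if (parts == NULL) return NULL;

    /* Pass 2: same walk, now copying each token. */
    p = s;
    for (size_t n = 0; n < count; n++) {
        p += strspn(p, delims);
        size_t len = strcspn(p, delims);
        parts[n] = malloc(len + 1);
        if (parts[n] == NULL) {            /* parts[n] == NULL ends the array */
            split_free(parts);
            return NULL;
        }
        memcpy(parts[n], p, len);
        parts[n][len] = '\0';
        p += len;
    }
    parts[count] = NULL;
    return parts;
}
```

Because both passes use the identical strspn()/strcspn() walk, the count and the parse cannot disagree. A caller would loop over the result until it hits the NULL entry and then hand the array to split_free().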
