Reading a file of text into array with malloc in C

Question

I could use a set of eyes (or more) on this code. I'm trying to read a set number of bytes from a filestream (f1) into an array/buffer (the file is a text file, the array is of char type). If I read "buff_size - 1" bytes, I want to "realloc" the array and then continue to read, starting where I left off. Basically I'm trying to dynamically expand the buffer for a file of unknown size. What I'm wondering:

Am I implementing this wrong?
How would I check failure conditions on something like "realloc" with the code the way it is?
I'm getting a lot of warnings when I compile about "implicit declaration of built-in function realloc..." (I'm seeing that warning for my use of read, malloc, strlen, etc. as well).
When "read()" gets called a second time (and third, fourth, etc.), does it read from the beginning of the stream each time? That could be my issue, as I only seem to return the first "buff_size" chars.

Here's the snippet:

//read_buffer is of size buff_size
n_read = read(f1, read_buffer, buff_size - 1);
read_count = n_read;
int new_size = buff_size;
while (read_count == (buff_size - 1))
{

        new_size *= 2;
        read_buffer = realloc(read_buffer, new_size);
        n_read = read(f1, read_buffer[read_count], buff_size - 1);
        read_count += n_read;
}

As I am learning how to do this type of dynamic read, I'm wondering if someone could state a few brief facts about best practices for this sort of thing. I'm assuming this comes up a TON in the professional world (reading files of unknown size)? Thanks for your time. ALSO: As you find good ways of doing things (i.e. a technique for this type of problem), do you find yourselves memorizing how you did it, or saving it to reference in the future (i.e. is a solution fairly static)?

You are calling read() twice. Don't repeat yourself.

– wildplasser, Commented May 2, 2012 at 19:00
You are calling read() twice. Don't repeat yourself

– Robert Martin, Commented May 3, 2012 at 22:51

Jerry Coffin · Accepted Answer · 2012-05-02 19:00:48Z

6

If you're going to expand the buffer to hold the entire file anyway, it's probably easiest to seek to the end, get the current offset, then seek back to the beginning and read it all in one swoop:

size = lseek(f1, 0, SEEK_END); // get offset at end of file
lseek(f1, 0, SEEK_SET);        // seek back to beginning
buffer = malloc(size+1);       // allocate enough memory.
read(f1, buffer, size);        // read in the file
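
A fuller version of the same idea might look like the sketch below. It is only an illustration: the helper name `read_whole_file` and the `len_out` parameter are made up here, and it assumes `fd` refers to a seekable regular file. It checks the return values of `lseek`, `malloc`, and `read`, and loops because `read` is allowed to return fewer bytes than requested.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), for obtaining the descriptor */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* lseek(), read() */

/* Hypothetical helper: read the whole of a seekable file into a
 * NUL-terminated heap buffer. Returns NULL on failure; the caller
 * frees the result. */
char *read_whole_file(int fd, size_t *len_out)
{
    assert(len_out != NULL);

    off_t end = lseek(fd, 0, SEEK_END);   /* offset at end of file */
    if (end < 0 || lseek(fd, 0, SEEK_SET) < 0)
        return NULL;                      /* not seekable (pipe, socket, ...) */

    size_t size = (size_t)end;
    char *buffer = malloc(size + 1);      /* +1 for the terminating NUL */
    if (buffer == NULL)
        return NULL;

    size_t done = 0;
    while (done < size) {
        /* read() may return fewer bytes than requested, so loop. */
        ssize_t n = read(fd, buffer + done, size - done);
        if (n < 0) {
            free(buffer);
            return NULL;
        }
        if (n == 0)
            break;                        /* file shrank after we measured it */
        done += (size_t)n;
    }

    buffer[done] = '\0';
    *len_out = done;
    return buffer;
}
```

Note that the second argument to `read` is a pointer into the buffer (`buffer + done`), not an element of it (`buffer[done]`).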

Alternatively, on any reasonably modern POSIX-like system, consider using mmap.


Robert Martin · Accepted Answer · 2012-05-02 19:00:18Z

Here's a cool trick: use mmap instead (man mmap).

In a nutshell, say you have your file descriptor f1, on a file of nb bytes. You simply call

char *map = mmap(NULL, nb, PROT_READ, MAP_PRIVATE, f1, 0);
if (map == MAP_FAILED) {
    return -1; // handle failure
}

Done.

You can read from the file as if it was already in memory, and the OS will read pages into memory as necessary. When you're done, you can simply call

munmap(map, nb);

and the mapping goes away.
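
Put together, the calls above could be wrapped roughly as follows. This is a sketch under assumptions, not a definitive implementation: `map_whole_file` and `count_lines` are hypothetical names, the size is taken from `fstat` rather than passed in, and an empty file is treated as a failure because a zero-length mapping is an error.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), for obtaining the descriptor */
#include <sys/mman.h>   /* mmap(), munmap() */
#include <sys/stat.h>   /* fstat() */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map a whole regular file read-only.
 * Returns NULL on failure or if the file is empty. The caller
 * releases the mapping with munmap((void *)map, *len_out). */
const char *map_whole_file(int fd, size_t *len_out)
{
    assert(len_out != NULL);

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0)
        return NULL;

    size_t nb = (size_t)st.st_size;
    void *map = mmap(NULL, nb, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    *len_out = nb;
    return map;
}

/* Example use of the mapping: count newline characters. */
size_t count_lines(const char *map, size_t nb)
{
    size_t lines = 0;
    for (size_t i = 0; i < nb; i++)
        if (map[i] == '\n')
            lines++;
    return lines;
}
```

The mapped bytes are not NUL-terminated, so pass the length around explicitly instead of calling strlen on the mapping.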

edit: I just re-read your post and saw you don't know the file size. Why?

You can use lseek to seek to the end of the file and learn its current length.

If instead it's because someone else is writing to the file while you're reading, you can read from your current mapping until it runs out, then call lseek again to get the new length, and use mremap (which is Linux-specific) to increase the size. Or you could simply munmap what you have and mmap again with a new "offset" (the argument I set to 0, which is how many bytes of the file to skip; it must be a multiple of the page size).
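
One portable way to approximate the "map again after the file grows" idea is to create a fresh mapping of the current length and drop the old one. The sketch below is an assumption-laden illustration: `remap_to_current_size` is a hypothetical name, it uses `fstat` instead of `lseek` to learn the length, and it maps with MAP_SHARED (not the MAP_PRIVATE used above), since with a private mapping it is unspecified whether later changes to the file become visible.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), for obtaining the descriptor */
#include <sys/mman.h>   /* mmap(), munmap() */
#include <sys/stat.h>   /* fstat() */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace an existing read-only mapping of fd with a
 * fresh one covering the file's current length. Pass *map == NULL and
 * *len == 0 on the first call. Returns 0 on success and -1 on failure,
 * in which case the old mapping (if any) is left untouched. */
int remap_to_current_size(int fd, void **map, size_t *len)
{
    assert(map != NULL && len != NULL);

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0)
        return -1;

    size_t new_len = (size_t)st.st_size;
    if (*map != NULL && new_len == *len)
        return 0;                       /* length unchanged, keep the mapping */

    void *fresh = mmap(NULL, new_len, PROT_READ, MAP_SHARED, fd, 0);
    if (fresh == MAP_FAILED)
        return -1;

    if (*map != NULL)
        munmap(*map, *len);             /* drop the old, shorter mapping */

    *map = fresh;
    *len = new_len;
    return 0;
}
```

The new mapping may live at a different address, so any pointers into the old mapping have to be recomputed as offsets from the new base.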

wildplasser · Accepted Answer · 2012-05-02 20:37:36Z

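#include <errno.h>  /* for errno, EINTR */
#include <stdlib.h> /* for realloc(), free() */
#include <string.h> /* for memcpy() */
#include <unistd.h> /* for read() */

/* Read everything from fd into a heap buffer (the function name is only
 * for illustration). Stores the byte count in *len. Returns NULL on error;
 * an empty input also yields NULL, with *len set to 0. */
char *read_all(int fd, size_t *len)
{
        char buff[512]; /* anything goes */
        size_t done = 0, size = 0;
        char *result = NULL;

        *len = 0;
        while (1) {
                ssize_t n_read = read(fd, buff, sizeof buff);
                if (n_read < 0 && errno == EINTR)
                        continue; /* interrupted by a signal: retry */
                /* for network connections, (n_read == -1 && errno == EAGAIN)
                 * should be handled specially (wait, then continue) here. */
                if (n_read < 0) {
                        free(result);
                        return NULL;
                }
                if (n_read == 0)
                        break; /* end of file */
                if (done + (size_t)n_read > size) {
                        size_t new_size = size ? 2 * size : (size_t)n_read;
                        /* after a short first read, one doubling may not be enough */
                        while (new_size < done + (size_t)n_read)
                                new_size *= 2;
                        char *tmp = realloc(result, new_size);
                        if (tmp == NULL) { /* result is still valid here */
                                free(result);
                                return NULL;
                        }
                        result = tmp;
                        size = new_size;
                }
                memcpy(result + done, buff, (size_t)n_read);
                done += (size_t)n_read;
        }
        /* ... and maybe shave down result a bit here (realloc to done bytes) ... */
        *len = done;
        return result;
}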

Note: this is more or less the vanilla way. Another way would be to malloc a real big array first, and realloc to the right size later. That will reduce the number of reallocs, and it might be more gentle for the malloc arena, wrt fragmentation. YMMV.
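
The "start big, trim later" variant could be sketched like this. The names `read_all_grow` and `initial_size` are invented for the example, and it differs from the code above in one respect that should be stated plainly: it reads straight into the unused tail of the heap buffer instead of going through a fixed buffer and memcpy. The realloc result goes into a temporary pointer first, so the original block is not leaked if realloc fails.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), for obtaining the descriptor */
#include <stdlib.h>     /* malloc(), realloc(), free() */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read() */

/* Hypothetical helper: read fd to end-of-file into a NUL-terminated heap
 * buffer, starting at initial_size bytes and trimming afterwards.
 * Returns NULL on failure; the caller frees the result. */
char *read_all_grow(int fd, size_t initial_size, size_t *len_out)
{
    assert(len_out != NULL && initial_size > 1);

    size_t size = initial_size, done = 0;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (done + 1 >= size) {              /* keep one byte for the NUL */
            char *tmp = realloc(buf, size * 2);
            if (tmp == NULL) {               /* buf is still valid: release it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            size *= 2;
        }
        /* Read into the unused tail; the kernel keeps track of the file
         * offset, so each call continues where the previous one stopped. */
        ssize_t n = read(fd, buf + done, size - done - 1);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;                           /* end of file */
        done += (size_t)n;
    }

    buf[done] = '\0';
    char *trimmed = realloc(buf, done + 1);  /* shrink to the size actually used */
    if (trimmed != NULL)
        buf = trimmed;
    *len_out = done;
    return buf;
}
```

A large `initial_size` keeps the number of reallocs low, and the final realloc returns the unused slack.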
