I could use a set of eyes (or more) on this code. I'm trying to read a set number of bytes from a file (f1) into a buffer (the file is a text file; the buffer is a char array). If a read returns "buff_size - 1" bytes, I want to "realloc" the buffer and then continue reading, starting where I left off. Basically, I'm trying to dynamically expand the buffer for a file of unknown size. What I'm wondering:

  1. Am I implementing this wrong?
  2. How would I check failure conditions on something like "realloc" with the code the way it is?
  3. I'm getting a lot of warnings when I compile about "implicit declaration of built-in function realloc..." (I'm seeing that warning for my use of read, malloc, strlen, etc. as well).
  4. When "read()" gets called a second time (and third, fourth, etc.), does it read from the beginning of the stream each time? That could be my issue, as I only seem to get back the first "buff_size" chars.

Here's the snippet:

//read_buffer is of size buff_size
n_read = read(f1, read_buffer, buff_size - 1);
read_count = n_read;
int new_size = buff_size;
while (read_count == (buff_size - 1))
{

        new_size *= 2;
        read_buffer = realloc(read_buffer, new_size);
        n_read = read(f1, read_buffer[read_count], buff_size - 1);
        read_count += n_read;
}

As I am learning how to do this type of dynamic read, I'm wondering if someone could state a few brief facts about best practices for this sort of thing. I'm assuming this comes up a TON in the professional world (reading files of unknown size)? Thanks for your time. ALSO: As you find good ways of doing things (i.e. a technique for this type of problem), do you find yourselves memorizing how you did it, or saving it to reference in the future (i.e. is a solution fairly static)?

  • You are calling read() twice. Don't repeat yourself. Commented May 2, 2012 at 19:00
  • 4
    You are calling read() twice. Don't repeat yourself Commented May 3, 2012 at 22:51
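
One way to act on the comments above is to fold both read() calls into a single loop. The following is only a minimal sketch under assumptions: the helper name read_all, the starting capacity of 4096, and the choice to NUL-terminate the result are illustrative and not taken from the question. It also shows a realloc failure check, and the headers that should silence the "implicit declaration" warnings.

```c
#include <errno.h>   /* errno, EINTR */
#include <stdlib.h>  /* malloc, realloc, free */
#include <unistd.h>  /* read */

/* Hypothetical helper: reads everything from fd into a malloc'd,
 * NUL-terminated buffer. Returns NULL on error; the caller frees it. */
char *read_all(int fd, size_t *out_len)
{
    size_t cap = 4096;               /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Grow when only the byte reserved for '\0' is left. */
        if (len + 1 >= cap) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {       /* buf is still valid on failure */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        /* read() continues from the current file offset, so each call
         * picks up where the previous one stopped. */
        ssize_t n = read(fd, buf + len, cap - len - 1);
        if (n == 0)                  /* end of file */
            break;
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal; retry */
            free(buf);
            return NULL;
        }
        len += (size_t)n;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Note that the destination is buf + len (a pointer), not buf[len] (a char), and that the result of realloc goes into a temporary so the original pointer is not lost if the call fails.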

3 Answers

If you're going to expand the buffer for the entire file anyway, it's probably easiest to seek to the end, get the current offset, then seek back to the beginning and read it in one swoop:

off_t size = lseek(f1, 0, SEEK_END);  // get offset at end of file
lseek(f1, 0, SEEK_SET);               // seek back to beginning
char *buffer = malloc(size + 1);      // allocate enough memory, plus room for a '\0'
read(f1, buffer, size);               // read in the file (may return fewer bytes than asked)
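
For completeness, here is a sketch of the same idea with the return values checked. The function name slurp_seekable and the NUL terminator are assumptions added for illustration; the approach only works on seekable descriptors, so it returns NULL for pipes and sockets.

```c
#include <stdlib.h>     /* malloc, free */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* lseek, read */

/* Hypothetical helper: reads a seekable file in one allocation.
 * Returns a NUL-terminated buffer the caller must free, or NULL on error. */
char *slurp_seekable(int fd, size_t *out_len)
{
    off_t end = lseek(fd, 0, SEEK_END);       /* offset at end of file */
    if (end == (off_t)-1 || lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return NULL;

    size_t size = (size_t)end;
    char *buffer = malloc(size + 1);          /* +1 for the terminator */
    if (buffer == NULL)
        return NULL;

    size_t done = 0;
    while (done < size) {
        /* read() may return fewer bytes than requested, so loop. */
        ssize_t n = read(fd, buffer + done, size - done);
        if (n < 0) {
            free(buffer);
            return NULL;
        }
        if (n == 0)                           /* file is shorter than expected */
            break;
        done += (size_t)n;
    }
    buffer[done] = '\0';
    if (out_len != NULL)
        *out_len = done;
    return buffer;
}
```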

Alternatively, on any reasonably modern POSIX-like system, consider using mmap.

Here's a cool trick: use mmap instead (man mmap).

In a nutshell, say you have your file descriptor f1, on a file of nb bytes. You simply call

char *map = mmap(NULL, nb, PROT_READ, MAP_PRIVATE, f1, 0);
if (map == MAP_FAILED) {
    return -1; // handle failure
}

Done.

You can read from the file as if it were already in memory, and the OS will read pages into memory as necessary. When you're done, you can simply call

munmap(map, nb);

and the mapping goes away.
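
Putting the pieces together, a rough sketch might look like this. The function name count_lines_mmap and the use of fstat to obtain nb are assumptions for the example, not part of the answer above; counting newlines just stands in for whatever you actually do with the data.

```c
#include <sys/mman.h>   /* mmap, munmap, MAP_FAILED */
#include <sys/stat.h>   /* fstat, struct stat */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: counts newline characters in the file behind fd
 * by mapping it read-only. Returns -1 on error. */
long count_lines_mmap(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;                     /* a zero-length mapping is not allowed */

    size_t nb = (size_t)st.st_size;
    char *map = mmap(NULL, nb, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < nb; i++)   /* the file reads like a plain array */
        if (map[i] == '\n')
            lines++;

    munmap(map, nb);
    return lines;
}
```

Keep in mind that the mapped bytes are not guaranteed to be NUL-terminated, so functions like strlen should not be pointed at the mapping directly.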

edit: I just re-read your post and saw you don't know the file size. Why?

You can use lseek to seek to the end of the file and learn its current length.

If instead it's because someone else is writing to the file while you're reading, you can read from your current mapping until it runs out, then call lseek again to get the new length and use mremap (which is Linux-specific) to increase the size. Or you could simply munmap what you have and mmap again with a new "offset" (the argument I set to 0, which is how many bytes of the file to skip; it must be a multiple of the page size).
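
As a portable variation on the second option, here is a sketch that drops the old mapping and maps the whole file again at its current length. It deliberately differs from the text above in two ways: it uses fstat instead of lseek to get the length, and it maps from offset 0 again rather than from a new offset. The name remap_to_current_size is made up for the example.

```c
#include <stddef.h>
#include <sys/mman.h>   /* mmap, munmap, MAP_FAILED */
#include <sys/stat.h>   /* fstat, struct stat */
#include <sys/types.h>

/* Hypothetical helper: unmaps old_map (if any) and maps the file again at
 * its current length. On success stores the new length in *len and returns
 * the new mapping. Returns MAP_FAILED on error; if fstat fails or the file
 * is empty, the old mapping is left untouched. */
void *remap_to_current_size(int fd, void *old_map, size_t *len)
{
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0)
        return MAP_FAILED;

    if (old_map != NULL && old_map != MAP_FAILED)
        munmap(old_map, *len);        /* pointers into old_map are now invalid */

    *len = (size_t)st.st_size;
    return mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
}
```

Pass NULL as old_map on the first call. Because the new mapping may land at a different address, keep offsets rather than pointers if you need to remember your position across a remap.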

#include <errno.h>  /* for errno, EAGAIN */
#include <stdlib.h> /* for realloc() */
#include <string.h> /* for memcpy() */
#include <unistd.h> /* for read() */

char buff[512]; /* anything goes */
size_t done, size;
char *result = NULL;
int fd;

done = size = 0;
while (1) {
        ssize_t n_read;
        n_read = read(fd, buff, sizeof buff);
        if (n_read <= 0) {
            /* for network connections, (n_read == -1 && errno == EAGAIN)
             * should be handled special (by a continue) here. */
            break;
            }
        if (done + n_read > size) {
            /* keep doubling until the new data fits; one doubling
             * is not always enough (e.g. after a short first read) */
            size_t new_size = size ? 2*size : (size_t)n_read;
            while (new_size < done + n_read) new_size *= 2;
            char *tmp = realloc(result, new_size);
            if (tmp == NULL) {
                /* result is still valid here: free it, or keep what was read */
                break;
                }
            result = tmp;
            size = new_size;
            }
        memcpy(result + done, buff, n_read);
        done += n_read;
        }
/* ... and maybe shave down result a bit here ... */

Note: this is more or less the vanilla way. Another way would be to malloc a really big array first and realloc it down to the right size later. That reduces the number of reallocs, and it might be gentler on the malloc arena with respect to fragmentation. YMMV.
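
A sketch of that second approach, under assumptions: the name read_then_shrink and the caller-supplied initial_cap parameter are invented for the example, and the buffer is returned without a NUL terminator. It reads straight into the big allocation (no intermediate memcpy), doubles only if the guess turns out too small, and trims once at the end.

```c
#include <stdlib.h>  /* malloc, realloc, free */
#include <unistd.h>  /* read */

/* Hypothetical helper: allocate initial_cap bytes up front, read into the
 * buffer directly, then shrink to the number of bytes actually read.
 * Returns NULL on error; the result is NOT NUL-terminated. */
char *read_then_shrink(int fd, size_t initial_cap, size_t *out_len)
{
    size_t cap = initial_cap ? initial_cap : 1;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {                 /* the guess was too small */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0)                       /* end of file */
            break;
        len += (size_t)n;
    }

    /* Shrink to fit; ask for at least 1 byte to avoid realloc(ptr, 0). */
    char *fit = realloc(buf, len ? len : 1);
    if (fit != NULL)
        buf = fit;                        /* on failure the larger block is still usable */
    *out_len = len;
    return buf;
}
```

A call such as read_then_shrink(fd, 1 << 20, &len) would start with 1 MiB; whether that is a sensible guess depends on the files you expect.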
