It's a text file (not binary) and not in a fixed format, right? Otherwise it would be easy to calculate the size of the array from the file size (buffer_size = file_size / record_size, where buffer_size is in words, i.e. the size of an int, and the other sizes are in bytes).
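For that fixed-format case, a minimal sketch of the calculation, assuming every record occupies exactly record_size bytes and using fseek()/ftell() to get the file size:

    #include <stdio.h>

    /* assumption: every record occupies exactly record_size bytes */
    long fixed_buffer_size(FILE *f, long record_size)
    {
        fseek(f, 0, SEEK_END);      /* jump to the end of the file        */
        long file_size = ftell(f);  /* the offset there is the byte size  */
        rewind(f);                  /* go back so reading starts at 0     */
        return file_size / record_size; /* number of records == number of ints */
    }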
This is what I would do (but I'm a bit of a nutter when it comes to applied statistics).
1) What is the maximum number of characters (a.k.a. bytes) a number (a.k.a. record) will occupy in the file? Don't forget to include the end-of-line characters (CR, LF) and other blank glyphs (spaces, tabs, etc.). If you can already estimate the average size of a record, even better: use that instead of the maximum size.
initial_buffer_size = file_size / max_record_size + 1 (/ is integer division)
2) Allocate that buffer and read your integers into it until it is full. If the whole file has been read, you are finished; otherwise resize or reallocate the buffer to meet your new estimated needs:
resize_size = prev_buffer_size + bytes_not_read / (bytes_already_read / number_of_records_already_read) + 1
3) Read into that buffer (from where the previous read ended) until it is full, or until all of the file has been read.
4) If not finished, repeat from step 2) with the new prev_buffer_size. (A sketch of the whole loop follows below.)
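Here is a minimal sketch in C of steps 1) through 4), assuming the records are whitespace-separated integers read with fscanf(). The file name "numbers.txt" and the MAX_RECORD_SIZE value are hypothetical; you would plug in your own estimate from step 1):

    #include <stdio.h>
    #include <stdlib.h>

    /* assumption: worst case "-2147483648" plus a newline = 12 bytes */
    #define MAX_RECORD_SIZE 12

    int main(void)
    {
        FILE *f = fopen("numbers.txt", "r");    /* hypothetical input file */
        if (!f) { perror("fopen"); return 1; }

        fseek(f, 0, SEEK_END);
        long file_size = ftell(f);              /* total bytes in the file */
        rewind(f);

        /* step 1: initial estimate from the maximum record size */
        size_t buffer_size = (size_t)(file_size / MAX_RECORD_SIZE) + 1;
        int *buffer = malloc(buffer_size * sizeof *buffer);
        if (!buffer) { perror("malloc"); return 1; }

        size_t count = 0;
        while (fscanf(f, "%d", &buffer[count]) == 1) {
            count++;
            if (count == buffer_size) {
                /* steps 2)-4): re-estimate from the average record size so far */
                long bytes_read = ftell(f);
                long bytes_left = file_size - bytes_read;
                buffer_size += (size_t)(bytes_left / (bytes_read / (long)count)) + 1;
                int *tmp = realloc(buffer, buffer_size * sizeof *buffer);
                if (!tmp) { free(buffer); perror("realloc"); return 1; }
                buffer = tmp;
            }
        }

        printf("read %zu integers\n", count);
        free(buffer);
        fclose(f);
        return 0;
    }

Note that realloc() may move the block, so its return value is assigned through a temporary; otherwise a failed realloc() would leak the old buffer.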
This will work best if the numbers (records) are totally randomly distributed from a byte-size point of view. If not, and you have a clue what kind of distribution they have, you can adjust the algorithm accordingly.
I think you need to take your time and read a little about dynamic allocation. There are a bunch of tutorials out there. Don't be afraid of malloc().
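To show there's nothing scary about it, a minimal self-contained example of the basic allocate/use/free pattern (the element count n is just a placeholder):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t n = 100;                           /* hypothetical element count   */
        int *buffer = malloc(n * sizeof *buffer); /* allocate n ints on the heap  */
        if (buffer == NULL) {                     /* malloc can fail: always check */
            perror("malloc");
            return 1;
        }
        buffer[0] = 42;                           /* use it like any array        */
        printf("%d\n", buffer[0]);
        free(buffer);                             /* release it when done         */
        return 0;
    }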