5

I'm running into a strange situation when passing a pointer to a structure that contains a very large array, a float array around 34MB in size, defined inside the struct definition. In a nutshell, the pseudo-code looks like this:

typedef struct config_t {
  ...
  float values[64000][64];
} CONFIG;


int32_t Create_Structures(CONFIG **the_config)
{
  CONFIG  *local_config;
  int32_t number_nodes;

  number_nodes = Find_Nodes();

  local_config = (CONFIG *)calloc(number_nodes,sizeof(CONFIG));
  *the_config = local_config;
  return(number_nodes);
}


int32_t Read_Config_File(CONFIG *the_config)
{
    /* do init work here */
    return(SUCCESS);
}


int main(int argc, char *argv[])
{
    CONFIG *the_config;
    int32_t number_nodes,rc;

    number_nodes = Create_Structures(&the_config);

    rc = Read_Config_File(the_config);
    ...
    exit(0);
}

The code compiles fine, but when I run it, I get a SIGSEGV at the opening { of Read_Config_File().

(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
763 {
(gdb) bt
#0  0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
#1  0x00000000004068d2 in main (argc=1, argv=0x7fffffffe448) at ../src/main.c:148

I do this sort of thing all the time with smaller arrays. And strangely, 0x7fffffffe448 - 0x7ffffdf45428 = 0x20B9020, or about the 34MB of my float array.

Valgrind will give me similar output:

==10894== Warning: client switching stacks?  SP change: 0x7ff000290 --> 0x7fcf47398
==10894==          to suppress, use: --max-stackframe=34311928 or greater
==10894== Invalid write of size 8
==10894==    at 0x407D0A: Read_Config_File (config_parsing.c:763)
==10894==    by 0x4068D1: main (main.c:148)
==10894==  Address 0x7fcf47398 is on thread 1's stack

The error messages all point to me clobbering the stack pointer, but a) I've never run across one that crashes on entry to the function, and b) I'm passing pointers around, not the actual array.

Can someone help me out with this? I'm on a 64-bit CentOS box running kernel 2.6.18 and gcc 4.1.2.

Thanks!

Matt

5 Comments

  • Posting pseudo-code will only get you pseudo-answers. The devil is in the details, and they'll probably all matter. Commented Jan 19, 2012 at 23:45
  • Could we possibly see the source of Read_Config_File? That's where the problem seems to be, in the block you elided. Commented Jan 19, 2012 at 23:46
  • You are not testing the return value of calloc(), which might fail. Commented Jan 19, 2012 at 23:47
  • @tranzmatt could you paste the source of Read_Config_File as @Borealid said and also the code between Create_Structures and Read_Config_File calls if any? Commented Jan 20, 2012 at 0:03
  • Read_Config_File could be writing off the end of the array. Commented Jan 20, 2012 at 2:05

2 Answers

1

You've blown the stack by allocating one of these huge config_t structs on it. The two stack addresses in evidence in the gdb output, 0x7fffffffe448 and 0x7ffffdf45428, strongly suggest this.

$ gdb
GNU gdb 6.3.50-20050815 ...blahblahblah...
(gdb) p 0x7fffffffe448 - 0x7ffffdf45428  
$1 = 34312224

There's your ~34MB constant that matches the size of the config_t struct. Systems don't give you that much stack space by default, so either move the object off the stack or increase your stack space.

2 Comments

Is there a "lint"-like tool that can tell me how much stack a function is going to gobble up? In hindsight, it's now pretty obvious what the malfunction was, but I'd love to know up front if such issues exist.
gcc has -fstack-limit-* for run-time checks, but I know of nothing that warns about excessively large stack allocations at compile-time.
1

The short answer is that there must be a config_t declared as a local variable somewhere, which would put it on the stack. Probably a typo: missing * after a CONFIG declaration somewhere.

1 Comment

That was the problem. I'd forgotten I'd kept a temporary copy of the config_t structure in the function before jamming the array inside. I moved the array elsewhere and now it doesn't seg fault. Thanks.
