
I'm learning C.

I'm building a P2P content distribution system, and part of that system reads from the network. To receive the data I declare a buffer on the stack:

    char raw_buf[4096];

Today I found that, despite launching several identical VMs on a cloud platform, some of the instances crashed (segfaulted) when executing that line. The issue was isolated to those specific instances and to that line: the array size was too big for them, so reducing the size was a workaround.

I know I could malloc the buffer memory but I'm trying to avoid using the heap as I want the data transfer to be as fast as possible. This might be a misconception.

There are two modes of operation with the networking stack, the first handles very small command messages and responses and the second involves handling bulk data transfer. I'm having the problem with the smaller command messages. Each message might be in the order of a few kilobytes - hence allocating it on the stack. The messages are very quickly discarded after they have been processed.

Is there a kernel setting that would let me put more data on the stack, and is this a bad idea? Finding the optimal size will ultimately be a matter of profiling the code. I'm currently limited to 16 KiB on a few of the VMs and less than that on the others. I didn't spend any time trying to find the limit on the failing VMs.

  • A 4 KiB buffer is itself not big for a normal Linux system, which has a default stack size of 8 MiB. Unless you fill up your stack some other way, for example via very deep recursion, this array by itself will not be the problem. That it crashes on that line is an indication of a problem somewhere else. Commented May 26 at 13:49
  • If the thread stack is only 4 KiB, then a buffer of the same size placed on the stack will definitely overflow. That it only seems to crash on some systems is because you have undefined behavior, which means your whole program can be considered flawed. Commented May 26 at 14:32
  • "Is there a kernel setting that would enable me to assign more data into the stack and is this a bad idea?" I can't follow: you've said the crashing thread was given only a 4 KiB stack, so how would a kernel setting help when the stack size is yours to decide? Commented May 26 at 14:36
  • 4 KiB is extremely small for a thread stack. The way to allow for more data is to assign a larger stack; no kernel parameter is needed. Indeed, it's unclear to me why you are choosing your own stack size at all. That's hamstringing you, and probably not achieving what you think it is. I'd recommend taking the default stack size until and unless you discover an actual problem that is solved by choosing a smaller one. See also unix.stackexchange.com/a/280865/289373. Commented May 26 at 15:01
  • UB doesn't have to manifest itself consistently or in easily explicable ways; appearing to work is an option too. Allocating only 4 KiB for the stack is incredibly stingy. I'd probably start at 64 KiB and go up. You could arrange to check and record your stack usage, and refine your stack allocation from that data. If you must use such a tiny stack, you must use heap allocation. Prove that it is too slow (measurements again) before panicking about it. Commented May 26 at 15:03

1 Answer


I know I could malloc the buffer memory but I'm trying to avoid using the heap as I want the data transfer to be as fast as possible. This might be a misconception.

We can use various "pooling" techniques to get the benefits of heap allocation with the speed of a stack allocation:

  1. https://en.wikipedia.org/wiki/Slab_allocation
  2. https://en.wikipedia.org/wiki/Memory_pool
  3. https://en.wikipedia.org/wiki/Object_pool_pattern

I've used this technique in another answer: cs50 pset5 Speller optimisation

With some tweaking, after an initial allocation, virtually all buffer requests can be handled from the buffer pool, rarely (if ever) going back to malloc for a fresh allocation.

Below is a slight generalization that should be very close to what you need. It compiles but has not been tested.


FILE: bufnew.h

// bufnew.h -- bufnew control

#ifndef BUFNEW_H_
#define BUFNEW_H_

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <stdatomic.h>

// cached buffer
struct buffer {
    unsigned int buf_opt;
    struct buffer *buf_next;
    char buf_data[4096];
};

// equates to buf_opt
#define OPT_ALLOC       (1u << 0)       // 1=buffer allocated from heap
#define OPT_LOCK        (1u << 1)       // 1=allocator lock

// cached buffer allocation control
struct bufpool {
    unsigned int pool_opt;
    struct buffer *pool_head;
    pthread_mutex_t pool_lock;
};


void
pool_init(struct bufpool *pool,unsigned int opt);

struct buffer *
pool_get(struct buffer **hptr);

void
pool_put(struct bufpool *pool,struct buffer *buf);

struct buffer *
poolq_get(struct bufpool *pool);

void
poolq_final(struct bufpool *pool);

struct buffer *
poolx_get(struct bufpool *pool);

void
poolx_final(struct bufpool *pool);

#endif

FILE: pool.c

// pool.c -- common functions for poolq/poolx

#include "bufnew.h"

// pool_init -- initialize a pool
void
pool_init(struct bufpool *pool,unsigned int opt)
{

    pool->pool_head = NULL;
    pool->pool_opt = opt;
    pthread_mutex_init(&pool->pool_lock,NULL);
}

// pool_get -- get the first available cached buffer
struct buffer *
pool_get(struct buffer **hptr)
{
    struct buffer *head;

    while (1) {
        head = atomic_load(hptr);

        // no more cached buffers
        if (head == NULL)
            break;

        if (atomic_compare_exchange_strong(hptr,&head,head->buf_next))
            break;
    }

    return head;
}

// pool_put -- put back buffer into free pool
void
pool_put(struct bufpool *pool,struct buffer *buf)
{
    struct buffer **hptr = &pool->pool_head;
    struct buffer *head = atomic_load(hptr);

    while (1) {
        buf->buf_next = head;
        if (atomic_compare_exchange_strong(hptr,&head,buf))
            break;
    }
}

FILE: poolq.c

// poolq.c -- quick/simple pool allocator

#include "bufnew.h"

// poolq_get -- get a buffer (add more buffers if necessary)
struct buffer *
poolq_get(struct bufpool *pool)
{
    struct buffer **hptr = &pool->pool_head;
    struct buffer *newbuf;

    do {
        // try for uncontended grab
        newbuf = pool_get(hptr);
        if (newbuf != NULL)
            break;
        
        // get new buffer
        newbuf = malloc(sizeof(*newbuf));
        if (newbuf == NULL) {
            perror("poolq_get/malloc");
            exit(1);
        }

        newbuf->buf_next = NULL;
        newbuf->buf_opt = OPT_ALLOC;
    } while (0);

    return newbuf;
}

// poolq_final -- release all buffers to heap
void
poolq_final(struct bufpool *pool)
{

    // NOTE: this assumes that all buffers in this pool have been put back into
    // the list (via pool_put)

#if 0
    pthread_mutex_lock(&pool->pool_lock);
#endif

    struct buffer *cur = pool->pool_head;
    struct buffer *next;

    // free up all chunks to heap
    for (;  cur != NULL;  cur = next) {
        next = cur->buf_next;
        free(cur);
    }

    pool->pool_head = NULL;

#if 0
    pthread_mutex_unlock(&pool->pool_lock);
#endif
}

FILE: poolx.c

// poolx.c -- full/smart pool allocator

#include "bufnew.h"

#define CHUNK       100

// poolx_get -- get a buffer (add more buffers if necessary)
struct buffer *
poolx_get(struct bufpool *pool)
{
    struct buffer **hptr = &pool->pool_head;
    struct buffer *newbuf;

    while (1) {
        // try for uncontended grab
        newbuf = pool_get(hptr);
        if (newbuf != NULL)
            break;
        
        // we're out of free buffers -- prepare for new heap allocation
        if (pool->pool_opt & OPT_LOCK)
            pthread_mutex_lock(&pool->pool_lock);

        // attempt to use the heap
        do {
            struct buffer *head = atomic_load(hptr);

            // somebody put back a buffer between our pool_get call and our
            // lock call:
            // (1) either a simple pool_put intervened
            // (2) another thread got the lock first
            // NOTE: this is an "optimization" and not strictly necessary but
            // it prevents [harmless] "excessive" allocation by racing threads
            if (pool->pool_opt & OPT_LOCK) {
                if (head != NULL)
                    break;
            }

            // to prevent always going to the heap, allocate several contiguous
            // buffers at once
            newbuf = malloc(sizeof(*newbuf) * CHUNK);
            if (newbuf == NULL) {
                perror("malloc");
                exit(1);
            }

            // the last buffer
            struct buffer *tail = &newbuf[CHUNK - 1];

            unsigned int opt = OPT_ALLOC;
            for (struct buffer *cur = newbuf;  cur <= tail;  ++cur) {
                cur->buf_opt = opt;
                opt &= ~OPT_ALLOC;
                cur->buf_next = cur + 1;
            }

            // insert our chain of buffers at the front
            while (1) {
                // FIXME/CAE -- does this need atomic_store so it's visible to
                // other threads before the exchange?
#if 1
                tail->buf_next = head;
#else
                atomic_store(&tail->buf_next,head);
#endif

                if (atomic_compare_exchange_strong(hptr,&head,newbuf))
                    break;
            }
        } while (0);

        if (pool->pool_opt & OPT_LOCK)
            pthread_mutex_unlock(&pool->pool_lock);
    }

    return newbuf;
}

// poolx_final -- release all buffers to heap
void
poolx_final(struct bufpool *pool)
{

    // NOTE: this assumes that all buffers in this pool have been put back into
    // the list (via pool_put)

#if 0
    pthread_mutex_lock(&pool->pool_lock);
#endif

    struct buffer *cur = pool->pool_head;
    struct buffer *next;
    struct buffer *prev = NULL;
    struct buffer *head = NULL;

    // remove all buffers from list that are _not_ a heap allocation head
    for (;  cur != NULL;  cur = next) {
        next = cur->buf_next;

        // is this buffer the first one in a chunk received from the heap?
        if (cur->buf_opt & OPT_ALLOC) {
            if (head == NULL)
                head = cur;
            prev = cur;
            continue;
        }

        if (prev != NULL)
            prev->buf_next = next;
    }

    // free up all chunks to heap
    for (cur = head;  cur != NULL;  cur = next) {
        next = cur->buf_next;
        free(cur);
    }

    pool->pool_head = NULL;

#if 0
    pthread_mutex_unlock(&pool->pool_lock);
#endif
}

In the code above, I've used cpp conditionals to denote old vs. new code:

#if 0
// old code
#else
// new code
#endif

#if 1
// new code
#endif

Note: this can be cleaned up by running the file through unifdef -k
