10

I've been reading into the new concurrent collections and esspecially the ConcurrentBag got my attention. Since the ConcurrentBag internally holds a local set on each individual thread using it to keep track of the items, this means that when the thread itself gets out of scope, it will still be referenced in memory by the ConcurrentBag. This in turn means both memory claimed by the thread, as well as native resources? (excuse me for not knowing the exact inner workings of the .NET thread object)

I can assume a usecase where you have 1 global ConcurrentBack for a multithreaded webservice where you have alot of clients adding tasks. These tasks are added by threads on the threadpool. Now the threadpool is a very efficient way to manage threads but it does remove and create Threads based on the amount of work. Therefore, such a webservice can at times find itself in trouble since the underlying bag is still referencing to many should be-destroyed threads.

I created a quick app to test this behavior:

    static ConcurrentBag<int> bag = new ConcurrentBag<int>();
    static void FillBag() { for (int i = 0; i < 100; i++) { bag.Add(i); } }
    static void PrintState() { Console.WriteLine("Bag size is: {0}", bag.Count); }
    static void Main(string[] args)
    {
        var remote = new Thread(x =>
        {
            FillBag();
            PrintState();
        });
        // empty bag
        PrintState();
        // first 100 items are added on main thread
        FillBag();
        PrintState();
        // second 100 items are added on remote thread
        remote.Start();
        remote.Join();
        // since the remote thread is gone out of scope, what happened to its local storage which is part of the bag?
        PrintState();
        // now force a cleanup
        WeakReference weakRemoteReference = new WeakReference(remote); 
        remote = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Now check if the thread still exists
        if (weakRemoteReference.IsAlive)
            Console.WriteLine("Remote thread still exists");
        PrintState();
        Console.ReadLine();

And the output confirms my story:

Bag size is: 0
Bag size is: 100
Bag size is: 200
Bag size is: 200
Remote thread still exists
Bag size is: 200

Is this behavior to be expected, did i make a mistake in my test or can this be considered a design flaw?

1
  • Note that i name the remote thread is gone out of scope, ofcourse its still in scope and should say: "Since the remote thread is finished" Commented Mar 18, 2011 at 14:11

2 Answers 2

9

The ConcurrentBag does indeed keep things in thread local storage, and if you abandon threads it can cause a memory leak. However, the implementation is able to "steal" items from one thread's list to give to another thread. You can see this in action if you write the following:

ConcurrentBag<int> MyBag = new ConcurrentBag<int>();

void DoIt()
{
    for (int i = 0; i < 10; ++i)
    {
        MyBag.Add(i);
    }

    ThreadPool.QueueUserWorkItem(EmptyBag);

    Console.Write("Press Enter:");
    Console.ReadLine();

    Console.WriteLine("{0} items in bag", MyBag.Count);
}

void EmptyBag(object state)
{
    int take;
    while (MyBag.TryTake(out take))
    {
        Console.WriteLine(take);
    }
    Console.WriteLine("Bag is empty");
}

If you run that program and wait until the "Bag is empty" message before you hit Enter, you'll see that the bag is indeed emptied.

So, as long as there's one thread reading from the bag, it will be emptied eventually. Even if all the items were added by other threads.

So, yes, there's a possible memory leak. In practice, though, if multiple threads are accessing the bag, it's likely not a concern.

Sign up to request clarification or add additional context in comments.

2 Comments

I'm guessing two threads are here, and the QueueUserWorkItem spawns a new thread. Which thread is the one that is leaking (source, or the stolen copy)? How can I interrupt this process and shutdown cleanly? I'm using a cancelation token in a task...
@makerofthings7: In this example code, the main thread owns the bag, and the thread spawned by QueueUserWorkItem empties the bag. Nothing's leaking in this example. In general, there will be a leak if you add things to the bag and don't take them out. You might consider posting a question if you want more information.
0

It'll "leak" a limited memory overhead per live thread which puts items into it, in addition to the memory needed for stored items, of course. But that memory overhead can be reclaimed by the GC after the thread terminates. So as long as the current number of live threads is limited, and you are not producing items faster than you can consume them, the bag's memory use will be limited.

Here's a test to demonstrate it:

using System.Collections.Concurrent;

var bag = new ConcurrentBag<int>();
var threadCount = 0;

for (var i = 0; i < Environment.ProcessorCount; ++i)
    UseBagAndStartNewThread();

void UseBagAndStartNewThread()
{
    var incremented = Interlocked.Increment(ref threadCount);
    bag.Add(incremented);
    _ = bag.TryTake(out _);

    var nextThread = new Thread(UseBagAndStartNewThread)
    {
        IsBackground = false,
        Name = incremented.ToString()
    };
    nextThread.Start();
}

The test starts as many threads as you have processors. Each thread will put an item into the bag and then take an item out. It will then start a new thread before finishing execution. So you'll always have about one thread per processor, but a constant stream of new, short-lived threads. It keeps running until you kill the process. If the bag leaked, memory use would keep increasing.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.