
Why is there a deadlock if sending the Terminate message and calling thread.join() happen in one loop?

This is from The Rust Programming Language book, section 20.3, Graceful Shutdown and Cleanup: https://doc.rust-lang.org/book/ch20-03-graceful-shutdown-and-cleanup.html

Here is the output of the code below, which can deadlock:

    Running target\debug\main.exe
    Worker 0 got a job; running.
    Shutting down.
    Sending terminate message to all workers.
    Shutting down all workers.
    Shutting down worker 0
    Worker 1 got a job; running.
    Worker 1 was told to terminate.
    error: process didn't exit successfully: target\debug\main.exe (exit code: 0xc000013a, STATUS_CONTROL_C_EXIT) ^C

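        // Single loop: send one Terminate, then immediately join this worker.
        for worker in &mut self.workers {
            self.sender.send(Message::Terminate).unwrap();

            println!("Shutting down worker {}", worker.id);
            if let Some(thread) = worker.thread.take() {
                // Blocks the main thread until this particular worker's thread exits.
                thread.join().unwrap();
            }
        }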

Could you help me understand why this logic can cause a deadlock?

""" To better understand why we need two separate loops, imagine a scenario with two workers. If we used a single loop to iterate through each worker, on the first iteration a terminate message would be sent down the channel and join called on the first worker’s thread. If that first worker was busy processing a request at that moment, the second worker would pick up the terminate message from the channel and shut down. We would be left waiting on the first worker to shut down, but it never would because the second thread picked up the terminate message. Deadlock! """
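For comparison, here is a minimal, self-contained sketch of the two-loop shutdown the quoted passage describes. It assumes a pool shaped like the book's version at this point in the chapter (a shared `Arc<Mutex<Receiver<Message>>>` and a `Message` enum with `NewJob` and `Terminate`); it is an illustration under those assumptions, not the book's exact listing. All terminate messages are queued before any join, so it no longer matters which worker picks up which message.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

enum Message {
    NewJob(Job),
    Terminate,
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Message>>>) -> Worker {
        let thread = thread::spawn(move || loop {
            // The lock guard is a temporary dropped at the end of this statement,
            // so other workers can receive while this one runs a job.
            let message = receiver.lock().unwrap().recv().unwrap();
            match message {
                Message::NewJob(job) => job(),
                Message::Terminate => break,
            }
        });
        Worker { id, thread: Some(thread) }
    }
}

pub struct ThreadPool {
    workers: Vec<Worker>,
    sender: mpsc::Sender<Message>,
}

impl ThreadPool {
    pub fn new(size: usize) -> ThreadPool {
        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<Worker> = (0..size)
            .map(|id| Worker::new(id, Arc::clone(&receiver)))
            .collect();
        ThreadPool { workers, sender }
    }

    pub fn execute<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.sender.send(Message::NewJob(Box::new(f))).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // Loop 1: queue one Terminate per worker before joining anyone.
        for _ in &self.workers {
            self.sender.send(Message::Terminate).unwrap();
        }
        // Loop 2: every Terminate is already in the channel, so each worker
        // eventually receives one, no matter which worker was busy.
        for worker in &mut self.workers {
            println!("Shutting down worker {}", worker.id);
            if let Some(thread) = worker.thread.take() {
                thread.join().unwrap();
            }
        }
    }
}

fn main() {
    let counter = Arc::new(Mutex::new(0));
    {
        let pool = ThreadPool::new(2);
        for _ in 0..4 {
            let counter = Arc::clone(&counter);
            pool.execute(move || *counter.lock().unwrap() += 1);
        }
    } // `pool` is dropped here; both loops run and drop() returns.
    assert_eq!(*counter.lock().unwrap(), 4);
    println!("all workers shut down");
}
```

Because the channel is FIFO, the four jobs are received before either terminate message, so every job runs before the workers exit.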

My thinking was that after the first worker finishes its task, it will receive the next terminate message and then break out of its loop.

Does thread.join() prevent the first worker from receiving new messages from the channel? It seems not.

Here is my understanding of the logic and steps:

  1. On the first iteration, a terminate message is sent down the channel.
  2. The first worker is busy processing a request at that moment, so the second worker gets the terminate message and exits its loop. The main thread calls thread.join() on the first worker's thread. The first worker's thread handle is moved out of the ThreadPool by worker.thread.take(), which leaves worker.thread as Option::None.
  3. Now the first worker no longer has a thread in ThreadPool::drop.
  4. On the second iteration, ThreadPool::drop() sends another terminate message down the channel.
  5. No thread is left to process that message, because the second worker has already exited its loop. Then, maybe, the second worker's thread is joined to main().
  6. In the end, the moved first worker thread is stuck in an infinite loop, and main() waits forever for that thread to end.

But I have another thought: even though the moved first worker thread is no longer in the ThreadPool, the thread still holds the receiver, so it should be able to receive the terminate message and then break out of its loop.

I'm still confused. ^_^

1 Answer


The problem is that the second terminate message might never get sent. When we call thread.join().unwrap(), the main thread blocks until that thread finishes. So if the first thread never terminates (because the second worker got the termination message), we never get past thread.join().unwrap() in the first iteration of the loop, and the second terminate message is never sent.

Think about this possible sequence of events.

  1. (thread 1) Worker 1 starts a job.
  2. (thread 2) Worker 2 waits for a message (none yet).
  3. (main thread) A termination message is sent.
  4. (thread 2) Worker 2 receives a message (the termination message).
  5. (thread 2) Thread 2 ends.
  6. (thread 1) Worker 1's job ends.
  7. (thread 1) Worker 1 waits for a message (none; the only one was taken by worker 2).
  8. (main thread) Thread 1 is joined (the main thread is now just waiting).
  9. (thread 1) Worker 1 is still waiting for a message (none).
  10. (thread 1) Worker 1 is still waiting for a message (none).
  11. ... (deadlock)

Worker 1 will never get a message because the only messages are sent by the main thread, and the main thread is waiting for thread 1 to finish. That is the definition of deadlock: thread 1 won't finish until the main thread sends a termination message, and the main thread won't send a termination message until thread 1 finishes.
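The core of the sequence above is that one terminate message stops exactly one worker, and not necessarily the one about to be joined. The sketch below is a hypothetical reduction (the function `one_terminate_two_workers` and the `done` channel are illustrative additions, not from the book): two idle workers share one receiver, the main thread sends a single `Terminate`, and the other worker is then provably still blocked in `recv()`. Instead of joining it and hanging, the sketch checks `JoinHandle::is_finished()` and then sends the second message that the single-loop version never gets to send.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

enum Message {
    Terminate,
}

/// Hypothetical helper: returns (id of the worker that stopped,
/// whether the other worker was still running afterwards).
fn one_terminate_two_workers() -> (usize, bool) {
    let (sender, receiver) = mpsc::channel::<Message>();
    let receiver = Arc::new(Mutex::new(receiver));
    // Each worker reports its id here when it exits.
    let (done_tx, done_rx) = mpsc::channel::<usize>();

    let handles: Vec<thread::JoinHandle<()>> = (0..2)
        .map(|id| {
            let receiver = Arc::clone(&receiver);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // Blocks until a message arrives; `sender` stays alive in the
                // main thread, so recv() cannot return Err here.
                let message = receiver.lock().unwrap().recv().unwrap();
                match message {
                    Message::Terminate => {
                        done_tx.send(id).unwrap();
                        break;
                    }
                }
            })
        })
        .collect();

    // Only ONE terminate message, like the first iteration of the single loop.
    sender.send(Message::Terminate).unwrap();
    let stopped = done_rx.recv().unwrap();
    let other = 1 - stopped;

    // Nothing else has been sent, so the other worker cannot have finished.
    // Calling join() on it at this point would never return.
    let other_still_running = !handles[other].is_finished();

    // Clean up so the example terminates: send the second message, join both.
    sender.send(Message::Terminate).unwrap();
    for handle in handles {
        handle.join().unwrap();
    }
    (stopped, other_still_running)
}

fn main() {
    let (stopped, other_still_running) = one_terminate_two_workers();
    println!("worker {} stopped; other still running: {}", stopped, other_still_running);
}
```

Which worker stops first varies from run to run, but the other one is always still waiting, which is exactly the position thread 1 is in at step 8.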

This doesn't have anything to do with whether the thread is in the threadpool or not. Yes, thread 1 is no longer in the threadpool after worker.thread.take(), but the thread still exists and the main thread is still waiting for it.
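To illustrate that last point, here is a small sketch, assuming a hypothetical stripped-down `Worker` that only holds an `Option<JoinHandle>`: `take()` moves the handle out and leaves `None`, but the thread keeps running and keeps its receiver, and `join()` simply waits until that thread exits. In this sketch the message comes from a separate sender thread; in the single-loop `drop`, the only sender is the main thread, which is the thread blocked in `join()`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the book's Worker; only the Option<JoinHandle> matters.
struct Worker {
    thread: Option<thread::JoinHandle<&'static str>>,
}

fn take_then_join() -> (bool, &'static str) {
    let (sender, receiver) = mpsc::channel::<&'static str>();
    let mut worker = Worker {
        // The spawned thread owns the receiver; the struct only holds a handle.
        thread: Some(thread::spawn(move || receiver.recv().unwrap())),
    };

    // Move the handle out, as `worker.thread.take()` does in drop().
    let handle = worker.thread.take().unwrap();
    let field_is_none = worker.thread.is_none();

    // The thread is still alive and can still receive. Here a *different*
    // thread sends the message after a short delay.
    let sender_thread = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        sender.send("terminate").unwrap();
    });

    // Blocks until the worker thread receives the message and returns.
    let received = handle.join().unwrap();
    sender_thread.join().unwrap();
    (field_is_none, received)
}

fn main() {
    let (field_is_none, received) = take_then_join();
    println!("field is None: {}, thread received: {:?}", field_is_none, received);
}
```

If the `send` were instead placed on the main thread after `handle.join()`, the program would hang, which is the single-loop deadlock in miniature.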


3 Comments

Thanks for the quick and detailed answer.
Now I understand. One possible case is: after calling thread.join() on thread 1, the main thread is blocked waiting for thread 1 to finish, while thread 1 is stuck in its infinite loop and won't finish. Deadlock. Because the main thread keeps waiting, it never sends the second terminate message.
So my guess was kind of correct. 'Will thread.join() prevent first worker from accepting new message from channel?' Calling thread.join() on thread 1 indirectly blocks the main thread forever, since thread 1 never ends. ^_^
