Skip to content

Conversation

@VSadov
Copy link
Member

@VSadov VSadov commented Nov 21, 2025

When inserting workitems into threadpool queues, we must always guarantee that for every workitem there will be some worker at some point in the future that will certainly notice the presence of the workitem and executes it.

There was an attempt to relax the requirements in #100506.
Sadly, it leads to occasional deadlocks when items are present in the work queues and no workers are coming to pick them up.

The same change was made in all 3 threadpools - IO completion, Sockets and the general purpose ThreadPool. The fix is applied to all three threadpools.

We have seen reports about deadlocks when running on net9 or later releases:

The fix will need to be ported to net10 and net9. Thus this PR tries to restore just the part which changed the enqueuers/workers handshake algorithm.
More stuff was piled up into threadpool since the change, so doing the minimal fix without disturbing the rest is somewhat tricky.

Fixes: #121608 (definitely, I have tried with the repro)
Fixes: #119043 (likely, I do not have a repro to try, but symptoms seem like from the same issue)

@VSadov
Copy link
Member Author

VSadov commented Nov 21, 2025

I think this is ready for review.

@VSadov VSadov marked this pull request as ready for review November 22, 2025 02:30
Copilot AI review requested due to automatic review settings November 22, 2025 02:30
@VSadov VSadov requested a review from stephentoub November 22, 2025 02:33
Copilot finished reviewing on behalf of VSadov November 22, 2025 02:33
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses critical reliability issues in three thread pool implementations by reverting problematic changes from #100506 that led to deadlocks in .NET 9. The fix restores a simpler and safer enqueuer/worker handshake protocol that guarantees work items will always have a worker thread available to process them.

Key Changes:

  • Simplified the QueueProcessingStage enum by removing the Determining state, leaving only NotScheduled and Scheduled
  • Changed the worker thread protocol to reset the processing stage to NotScheduled before checking for work items (preventing a race condition window)
  • Removed complex retry/dequeue logic and the _nextWorkItemToProcess optimization in favor of always requesting an additional worker when processing an item

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs Simplified general-purpose ThreadPool enqueuer/worker handshake by removing Determining state, _nextWorkItemToProcess field, and complex retry logic; streamlined Dispatch() to always request a worker after dequeuing an item;
src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs Applied same handshake simplification to Unix socket async engine; removed UpdateEventQueueProcessingStage() method; simplified Execute() to use consistent pattern of resetting state before checking queue

VSadov and others added 2 commits November 21, 2025 18:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disk IO completions getting lost Dotnet process running, but no reaction is shown

1 participant