
Non-optimality in terms of architecture. Wrong tool for the job.

GPU-optimal tasks are highly parallel: vertex processing, texture processing, computing boid motion; tasks in which all kernel threads run in parallel from start to finish, doing the same work on different data.
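
For concreteness, here is a minimal sketch of that kind of work (CUDA, with names and data layout invented for illustration): a kernel that transforms vertices by a matrix. Every thread performs identical arithmetic and never branches on its data, so the whole grid runs flat out from start to finish.

    // Hypothetical branch-free kernel: one thread per vertex, identical
    // arithmetic for every thread, no data-dependent branching anywhere.
    __global__ void transformVertices(const float4* in, float4* out,
                                      const float* mvp,   // 4x4 row-major matrix
                                      int count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;            // bounds guard only, not a per-data branch

        float4 v = in[i];
        out[i] = make_float4(
            mvp[0]*v.x  + mvp[1]*v.y  + mvp[2]*v.z  + mvp[3]*v.w,
            mvp[4]*v.x  + mvp[5]*v.y  + mvp[6]*v.z  + mvp[7]*v.w,
            mvp[8]*v.x  + mvp[9]*v.y  + mvp[10]*v.z + mvp[11]*v.w,
            mvp[12]*v.x + mvp[13]*v.y + mvp[14]*v.z + mvp[15]*v.w);
    }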

Culling, OTOH, is all about questions and conditionals, e.g. "is this object in this view at this time, given these conditions?" That is a poor fit for the GPU: threads in a warp execute in lockstep, so when one thread's test fails while its neighbours' tests pass, the hardware has to run both paths one after the other, masking off the threads that didn't take each path. The more the per-object answers differ, the more of the machine sits idle.
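
Here is what a per-object culling pass looks like as a kernel, just to show where the divergence comes from. This is a hedged sketch, not anyone's production code: the struct layouts, names and the inward-facing-plane convention are all assumptions.

    // Hypothetical sphere-vs-frustum kernel. The early-out branch is the problem:
    // threads in the same warp that disagree about it get serialized.
    struct Plane  { float nx, ny, nz, d; };   // plane equation n.p + d, normals facing inward
    struct Sphere { float x, y, z, radius; };

    __global__ void cullSpheres(const Sphere* spheres,
                                const Plane*  frustum,   // 6 frustum planes
                                int*          visible,   // 1 = draw, 0 = cull
                                int           count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;

        Sphere s = spheres[i];
        int inside = 1;
        for (int p = 0; p < 6; ++p) {
            float dist = frustum[p].nx * s.x + frustum[p].ny * s.y
                       + frustum[p].nz * s.z + frustum[p].d;
            // Divergence point: some threads in a warp bail out here while their
            // neighbours keep testing planes, so the warp runs both paths serially.
            if (dist < -s.radius) { inside = 0; break; }
        }
        visible[i] = inside;
    }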

So culling has tended to be better suited to the CPU, which is built to branch without stalling the entire pipeline on a failed test (cf. branch prediction, which GPU cores lack).
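
Roughly the same test on the CPU (host-side code, again with hypothetical types) shows why the branch is cheap there: the predictor keeps the pipeline fed, one object can early-out without penalizing its neighbours, and only the survivors ever reach the GPU.

    #include <vector>

    struct Plane  { float nx, ny, nz, d; };
    struct Sphere { float x, y, z, radius; };

    // Returns indices of the spheres that survive the frustum test.
    std::vector<int> cullSpheres(const std::vector<Sphere>& spheres,
                                 const Plane frustum[6])
    {
        std::vector<int> drawList;
        drawList.reserve(spheres.size());

        for (int i = 0; i < (int)spheres.size(); ++i) {
            const Sphere& s = spheres[i];
            bool inside = true;
            for (int p = 0; p < 6; ++p) {
                float dist = frustum[p].nx * s.x + frustum[p].ny * s.y
                           + frustum[p].nz * s.z + frustum[p].d;
                if (dist < -s.radius) { inside = false; break; }  // cheap, well-predicted early-out
            }
            if (inside) drawList.push_back(i);   // only visible objects get submitted for drawing
        }
        return drawList;
    }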

Think of it like this: the GPU's processing profile is "many small things at once, without branches"; the CPU's domain is "give me any problem, large or small, conditional or not, and I will crack it quickly."

And every time the GPU pipeline stalls because you ran a poorly-suited task on it, all those thousands of threads sit twiddling their thumbs. That is like throwing away man-years of work you could have been applying to a better-suited problem, with no stalls at all.
