
Non-optimality in terms of architecture. Wrong tool for the job.

GPU-optimal tasks are highly parallel: vertex processing, texture processing, computing boid motion; tasks in which all kernel threads run in parallel from start to finish, doing the same work on different data.
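
For concreteness, here is a minimal sketch of that kind of work (CUDA, with names and data layout invented for illustration): a kernel that transforms vertices by a matrix. Every thread performs identical arithmetic and never branches on its data, so the whole grid runs flat out from start to finish.

    // Hypothetical branch-free kernel: one thread per vertex, identical
    // arithmetic for every thread, no data-dependent branching anywhere.
    __global__ void transformVertices(const float4* in, float4* out,
                                      const float* mvp,   // 4x4 row-major matrix
                                      int count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;            // bounds guard only, not a per-data branch

        float4 v = in[i];
        out[i] = make_float4(
            mvp[0]*v.x  + mvp[1]*v.y  + mvp[2]*v.z  + mvp[3]*v.w,
            mvp[4]*v.x  + mvp[5]*v.y  + mvp[6]*v.z  + mvp[7]*v.w,
            mvp[8]*v.x  + mvp[9]*v.y  + mvp[10]*v.z + mvp[11]*v.w,
            mvp[12]*v.x + mvp[13]*v.y + mvp[14]*v.z + mvp[15]*v.w);
    }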

Culling, OTOH, is all about questions and conditionals, e.g. "is this object in this view at this time, given these conditions?" That is a poor fit for the GPU: threads in a warp execute in lockstep, so when one thread's test fails while its neighbours' tests pass, the hardware has to run both paths one after the other, masking off the threads that didn't take each path. The more the per-object answers differ, the more of the machine sits idle.
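
Here is what a per-object culling pass looks like as a kernel, just to show where the divergence comes from. This is a hedged sketch, not anyone's production code: the struct layouts, names and the inward-facing-plane convention are all assumptions.

    // Hypothetical sphere-vs-frustum kernel. The early-out branch is the problem:
    // threads in the same warp that disagree about it get serialized.
    struct Plane  { float nx, ny, nz, d; };   // plane equation n.p + d, normals facing inward
    struct Sphere { float x, y, z, radius; };

    __global__ void cullSpheres(const Sphere* spheres,
                                const Plane*  frustum,   // 6 frustum planes
                                int*          visible,   // 1 = draw, 0 = cull
                                int           count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;

        Sphere s = spheres[i];
        int inside = 1;
        for (int p = 0; p < 6; ++p) {
            float dist = frustum[p].nx * s.x + frustum[p].ny * s.y
                       + frustum[p].nz * s.z + frustum[p].d;
            // Divergence point: some threads in a warp bail out here while their
            // neighbours keep testing planes, so the warp runs both paths serially.
            if (dist < -s.radius) { inside = 0; break; }
        }
        visible[i] = inside;
    }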

So culling has tended to be better suited to the CPU, which is built to branch without stalling the entire pipeline on a failed test (cf. branch prediction, which GPU cores lack).
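
Roughly the same test on the CPU (host-side code, again with hypothetical types) shows why the branch is cheap there: the predictor keeps the pipeline fed, one object can early-out without penalizing its neighbours, and only the survivors ever reach the GPU.

    #include <vector>

    struct Plane  { float nx, ny, nz, d; };
    struct Sphere { float x, y, z, radius; };

    // Returns indices of the spheres that survive the frustum test.
    std::vector<int> cullSpheres(const std::vector<Sphere>& spheres,
                                 const Plane frustum[6])
    {
        std::vector<int> drawList;
        drawList.reserve(spheres.size());

        for (int i = 0; i < (int)spheres.size(); ++i) {
            const Sphere& s = spheres[i];
            bool inside = true;
            for (int p = 0; p < 6; ++p) {
                float dist = frustum[p].nx * s.x + frustum[p].ny * s.y
                           + frustum[p].nz * s.z + frustum[p].d;
                if (dist < -s.radius) { inside = false; break; }  // cheap, well-predicted early-out
            }
            if (inside) drawList.push_back(i);   // only visible objects get submitted for drawing
        }
        return drawList;
    }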

Think of it like this: the GPU's processing profile is "many small things at once, without branches"; the CPU's domain is "give me any problem, large or small, conditional or not, and I will crack it quickly."

And every time the GPU pipeline stalls because you ran a poorly-suited task on it, all those thousands of threads sit twiddling their thumbs. That is like throwing away man-years of work you could have been applying to a better-suited problem, with no stalls at all.
