clip() and discard() won't get you any performance increase, nor are they intended to. To understand why, consider how the GPU actually executes things.
> This must mean that the GPU thread groups are by default allocated in a tiled fashion on the screen, ...
Correct. Not only is this the "default," it is mandated by the underlying hardware in the fixed-function rasterizer stage.
> ...and that all threads of a group must be done computing before the (other) resources of the thread group can be used again.
Shaders are executed in waves of 64 (AMD) or 32 (NVIDIA) threads. These threads execute in SIMD fashion, meaning that they execute in lock-step.
The wave will execute at the rate of the slowest thread. If some threads clip() or return, say, halfway through the shader, they will still stall until all threads in the wave have completed. They aren't like CPU threads. (I consider the term "thread" to be something of a misnomer in this case but, hey, it's the world we live in.)
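To make that concrete, here's a rough sketch of the kind of pixel shader we're talking about (the resource names are made up, and the loop just stands in for whatever expensive work you're actually doing):

```hlsl
// Hypothetical pixel shader (made-up resource names) that discards a checkerboard.
Texture2D    gAlbedo  : register(t0);
SamplerState gSampler : register(s0);

float4 MainPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // Discard every other pixel in a checkerboard pattern.
    clip(((uint(pos.x) + uint(pos.y)) & 1) ? -1.0 : 1.0);

    // The discarded lanes are only masked off: the wave still runs everything
    // below for the surviving lanes, so the execution slots the discarded
    // lanes occupy are not freed any earlier.
    float4 color = gAlbedo.Sample(gSampler, uv);

    [loop]                              // stand-in for the expensive part
    for (int i = 0; i < 64; ++i)
        color += 0.001 * sin(color.yzwx + i);

    return color;
}
```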
Now, with Shader Model 6 on the horizon, or with Vulkan right now, you can use wave ballot intrinsics to early-out and get some of that execution time back. However, even with those tools, the wave will still take as long as the slowest thread. So, turning off every other pixel won't get you anything (as adjacent pixels are typically assigned to the same wave).
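For example, HLSL's wave-op family includes WaveActiveAllTrue(), and a hypothetical early-out could look like this (just a sketch; the checkerboard test and the dummy loop are stand-ins):

```hlsl
// Hypothetical SM 6.0 sketch: the wave can only skip work when every lane agrees.
float4 MainPS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    bool wantsDiscard = ((uint(pos.x) + uint(pos.y)) & 1) == 1;

    // Wave-uniform early-out: only taken if ALL lanes in the wave want out.
    // With a checkerboard pattern, adjacent lanes disagree, so this branch
    // never fires, which is exactly why killing every other pixel buys nothing.
    if (WaveActiveAllTrue(wantsDiscard))
    {
        clip(-1.0);
        return 0;
    }

    // Per-lane discard: this lane is masked off, but the wave still runs the
    // expensive part below for the lanes that survive.
    if (wantsDiscard)
        clip(-1.0);

    // Stand-in for the expensive shading you were hoping to skip.
    float4 acc = 0;
    [loop]
    for (int i = 0; i < 64; ++i)
        acc += 0.01 * sin(float4(uv, 0.0, 1.0) * i);
    return acc;
}
```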
> Does anyone know of any magic that can force a fragment shader to only allocate resources for, say, half the pixels of the screen? (but keep the render target size)
You may be able to use a stencil to mask off every other pixel, but this wouldn't give you back 100% of the time you're looking to save as there's overhead for the stencil comparison. Also, that just seems like a kludge.
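If you do try the stencil route anyway, the shader side is nothing more than a cheap pre-pass that discards the pixels you want masked; the actual masking (stencil write in the pre-pass, stencil test in the main pass) lives in your depth-stencil state on the API side. A purely illustrative sketch:

```hlsl
// Hypothetical stencil-mask pre-pass, drawn as a full-screen pass with stencil
// writes enabled in the depth-stencil state (e.g. ref = 1, pass op = REPLACE).
// The main pass then tests EQUAL against ref = 1, so only half the pixels get shaded.
float4 StencilMaskPS(float4 pos : SV_Position) : SV_Target
{
    // Discard every other pixel so only a checkerboard reaches the stencil write.
    clip(((uint(pos.x) + uint(pos.y)) & 1) ? -1.0 : 1.0);
    return 0;
}
```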
The best and easiest way to shade only a fraction of the screen's pixels is to render to a reduced-resolution render target (for example 1/4 of the pixels, i.e. 1/2 on each side), then draw it scaled up onto a screen-aligned quad that covers the screen. You can play with your sampler configuration when drawing that quad to get the "grainy," pixelated image, if that's what you want.
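The upscale pass is then just a textured full-screen quad, something along these lines (made-up resource names):

```hlsl
// Hypothetical upscale pass: the scene was rendered into gLowResScene at a
// reduced resolution, and this pixel shader runs on a full-screen quad.
Texture2D    gLowResScene : register(t0);
SamplerState gPointClamp  : register(s0); // point filtering = blocky "grainy" look;
                                          // a linear sampler smooths it out instead
float4 UpscalePS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    return gLowResScene.Sample(gPointClamp, uv);
}
```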
You COULD do this in compute, but you'd be handling rasterization yourself. Then again, that may be a more appropriate place to implement raytracing. I've written raycasting code in both CUDA and HLSL compute, and it works pretty well. If you don't need the rasterizer, compute is a perfectly reasonable way to go.
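For completeness, the compute route looks roughly like this: one thread per pixel writing straight into a UAV. Everything below (names, the ray setup) is made up and heavily simplified:

```hlsl
// Hypothetical compute-shader raycasting skeleton: one thread per output pixel.
RWTexture2D<float4> gOutput : register(u0);

cbuffer CameraCB : register(b0)
{
    float4x4 gInvViewProj;   // made-up constants used to reconstruct rays
    float3   gCameraPos;
    float2   gOutputSize;
};

[numthreads(8, 8, 1)]
void RaycastCS(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= (uint)gOutputSize.x || id.y >= (uint)gOutputSize.y)
        return;

    // Build a ray through this pixel (placeholder math).
    float2 ndc = (float2(id.xy) + 0.5) / gOutputSize * 2.0 - 1.0;
    float4 farPt = mul(float4(ndc.x, -ndc.y, 1.0, 1.0), gInvViewProj);
    float3 rayDir = normalize(farPt.xyz / farPt.w - gCameraPos);

    // The actual march/trace against your scene would go here; for now just
    // visualize the ray direction so the skeleton produces something visible.
    gOutput[id.xy] = float4(rayDir * 0.5 + 0.5, 1.0);
}
```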