
I was working on the idea of drawing a cone, rotating about the y axis, from a parametric equation, using a compute shader with a function like:

void CS( uint3 nGid : SV_GroupID, uint3 nDTid : SV_DispatchThreadID, uint3 nGTid : SV_GroupThreadID )

To avoid drawing the cone multiple times, I used a test on nDTid so that only threads within a given nDTid.xy interval draw.
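The guard looked roughly like this (a sketch - the group size and the chosen interval are placeholders, not my exact values):

[numthreads(8,8,1)]
void CS( uint3 nDTid : SV_DispatchThreadID )
{
    //only threads inside a chosen nDTid.xy interval draw the cone,
    //so it is rendered once instead of once per thread
    if (nDTid.x >= 1 || nDTid.y >= 1)
        return;
    //...cone drawing code...
}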

I found this tricky, and finally I decided to use Dispatch(1,1,1) and numthreads(1,1,1), which removes the need for the nDTid interval test; the cone is then drawn only once.

What I have noticed is an erratic framerate that I guess depends on the number of pixels drawn.

This has become more of an exercise than something applicable to my regular project, but is it ever useful to limit Dispatch and numthreads to (1,1,1) as I did with compute shaders? And how can I reduce the erratic framerate?

Below is a code example and the rendered result.

Preparing the shader constants:

void UpdateShaderConstantCone()
{
    FLOAT R = 10.0f;//radius at cone base
    FLOAT H = 30.0f;//height of Cone
    XMVECTOR U = XMVectorSet(R, 0, 0, 1);//vector for plane of base cone
    XMVECTOR DirC = XMVectorSet(0, 0, H, 360);//initial direction of cone along Z axis, w = number of steps around the base circle
    XMMATRIX M = XMMatrixRotationY(RotCone);
    M.r[3] = XMVectorSet(20, 50, 0, 1);//position of the cone top = purple ball center
    gCBCone.mWorld = XMMatrixTranspose(M*gVP);//set World.View.Perspective mat
    XMStoreFloat4(&gCBCone.DirC, DirC);
    XMStoreFloat4(&gCBCone.U, U);
    FLOAT C = 0.5f * 255.0f;//half intensity per channel
    gCBCone.Color = XMFLOAT4(0 * C, C * 255 , C * 255 * 255, 1.0f);//green and blue prescaled by 255 and 255*255 to match the shader's packed-uint channels
    gpDC11->UpdateSubresource(gpCBBufferCone, 0, NULL, &gCBCone, 0, 0);
    gpDC11->CSSetConstantBuffers(dwSlotCone, 1, &gpCBBufferCone);
}

Setting up the compute shader. Instead of progressing along the cone axis and drawing a circle at each step, which would require one transform / lighting calculation per circle pixel drawn, I loop over each pixel of the base circle and draw a line from the cone's apex to that pixel. This requires only one transform / lighting calculation per line drawn. It is not really efficient, I must say, as pixels near the apex can be drawn several times. This shader does not draw the base of the cone and does no z-test. For some reason, the lighting makes the cone reddish instead of grey, which I have not solved yet.

#define SM_SCREENX 1920.0
#define SM_SCREENY 1080.0
RWTexture2D<uint> UAVDiffuse0  : register( u0 );

cbuffer cbCone: register(b6)
{
    matrix WVP;
    float4 DirC;//direction of cone; w = number of steps around the base circle
    float4 U;//basis vector for the base-circle plane; length = radius
    //Note the plane requires two vectors, UP(R,0,0) and VP(0,R,0),
    //so U.xyy is used as UP and U.yxy as VP
    float4 C;//color precalculated half
};

#define CPInc (2*3.14159265359f)//two pi: a full circle in radians
static const float3 f3_u3 = float3(255,255*255,255*255*255);
static const float3 LDir = normalize(float3(-1,-1, 1));//Light dir
static const float3 LCol = float3(0.5f,0.5f,0.5f)*f3_u3;//ambient color 0.5f,0.5f,0.5f
static const float2 Sd2a = float2(SM_SCREENX, SM_SCREENY)*0.5f;
static const float2 Sd2m = float2(SM_SCREENX, -SM_SCREENY)*0.5f;
[numthreads(1,1,1)]
void CS_PostDeferred( uint3 nGid : SV_GroupID, uint3 nDTid : SV_DispatchThreadID, uint3 nGTid : SV_GroupThreadID )
{
    float4 P = mul(float4(0,0,0,1),WVP);
    P/=P.w;
    P.xy=P.xy*Sd2m+Sd2a;
    float4 E = float4(0,0,0,1);
    float k=CPInc/DirC.w;//angle increment to draw the circle
    float Angle = 0;
    float2 sd2aP = Sd2a-P.xy;//precalculate delta for line tracing
    for ( int a = 0; a < DirC.w; a++ )//step around the base circle
    {
    float3 D = U.xyy*cos(Angle)+U.yxy*sin(Angle);//calculate circle pixel pos
    E.xyz = DirC.xyz+D;//base-circle point in local space (apex at origin)
        Angle+=k;
        E = mul(E,WVP);//convert to screen space
        E/=E.w;
        E.xy=mad(E.xy,Sd2m,sd2aP);//convert pixel position E to distance in pixel from top cone
        float2 dXY = abs(E.xy);//calculate longest screen pixel path
    int L = (dXY.x>dXY.y)?dXY.x:dXY.y;//determine number of steps
    float3 CL = C*dot(normalize(D),LDir)+LCol;//diffuse lighting plus ambient
        //we assume the normal D along the line is constant
        E.xy/=L;//convert E to pixel path step
        float2 XY = P.xy;
        for (int l=0;l<L;l++)
        {
        UAVDiffuse0[int2(XY)]= (uint(CL.r)&0xFF) | (uint(CL.g)&0xFF00) | (uint(CL.b)&0xFF0000);//pack prescaled channels into one uint
            XY+=E.xy;
        }
    }
}
    

Screenshot of the result: the cone, with some stripes of missing pixels

An alternative solution is to use a geometry shader that outputs a triangle from the apex to two successive points on the base circle.
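Sketched below (untested; it assumes the same cbCone constants as above, and a vertex shader that just passes one circle step index per input point):

struct VSOut { uint a : STEP; };//vertex shader passes the circle step index
struct GSOut { float4 pos : SV_POSITION; float3 col : COLOR0; };

[maxvertexcount(3)]
void GS_Cone( point VSOut input[1], inout TriangleStream<GSOut> stream )
{
    float k = CPInc/DirC.w;//same angle increment as the compute shader
    float a0 = input[0].a * k;
    float a1 = a0 + k;
    float3 D0 = U.xyy*cos(a0) + U.yxy*sin(a0);
    float3 D1 = U.xyy*cos(a1) + U.yxy*sin(a1);

    GSOut o;
    o.col = C.rgb;
    o.pos = mul(float4(0,0,0,1), WVP);//cone apex
    stream.Append(o);
    o.pos = mul(float4(DirC.xyz + D0, 1), WVP);//base circle point a
    stream.Append(o);
    o.pos = mul(float4(DirC.xyz + D1, 1), WVP);//base circle point a+1
    stream.Append(o);
}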


1 Answer


There may at times be reasons to use a compute shader with Dispatch(1, 1, 1) and numthreads(1,1,1), but drawing a cone is not the use case for that.

By limiting your compute shader to a single thread, you're giving up the parallel computation benefits you could be getting from the GPU. You could potentially draw every pixel of the cone in parallel, rather than looping over the cone serially one step at a time. This parallelism minimizes the total latency from when you start drawing the cone to when you're done, and scales much better, using the hardware's many cores to handle however many pixels need to be drawn.
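For instance, the line-drawing loop in the question parallelizes naturally: give each thread one angle step of the base circle, so all the lines are rasterized concurrently. A sketch reusing the question's cbCone constants, dispatched with something like Dispatch((360 + 63)/64, 1, 1) so the thread count covers all DirC.w steps:

[numthreads(64,1,1)]
void CS_ConeParallel( uint3 nDTid : SV_DispatchThreadID )
{
    if (nDTid.x >= (uint)DirC.w)
        return;//spare threads in the last group do nothing

    //one circle step per thread instead of a serial outer loop
    float Angle = nDTid.x * CPInc/DirC.w;
    float3 D = U.xyy*cos(Angle) + U.yxy*sin(Angle);
    //...then project the apex and DirC.xyz + D, and run the inner
    //pixel loop exactly as in the single-threaded version...
}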

Myself, I'd skip using a compute shader entirely. You can achieve this effect with a plain vanilla vertex-fragment shader and no loops at all (and the cone is fully solid, without the moiré stripes of skipped pixels seen in the question):

Cone-in-a-cube

We'll start by building a mesh that represents a bounding volume for your cone - something that will cover every fragment of the screen you want your cone to occupy, though it's OK if it covers a bit more. We'll draw this bounding mesh with a shader that draws the cone contained inside, and clips out any excess space outside.

For my example, I'm using a standard 1-unit-wide cube mesh, with vertices at (±0.5, ±0.5, ±0.5). To be more efficient, your bounds could be a low-poly cone, so you don't need to clip as much empty space.

We'll render this mesh flipped inside-out, so we're drawing its far faces instead of the nearer ones. This means even if the camera moves up to/inside the bounding volume, it'll still see the far face, so our shader will still run there (rather than seeing a hole in our cone where the bounding volume clips the near plane).

In the vertex shader, we'll save the position of "this" vertex and the eye vector (pointing from the vertex to the camera), both in object-local space. Those will be interpolated and passed to the fragment shader, to become the initial position and direction of a ray we'll "trace" through the bounding volume, to find where it intersects the cone.

In my case, I'll shift the z coordinate in the shader so it runs 0...1, putting the point of the cone (z=0) at one face of the bounding cube.

struct v2f {
    float4 vertex : SV_POSITION;
    float3 pos : TEXCOORD0;
    float depth: TEXCOORD1;
    float3 dir : TEXCOORD2;
};

v2f vert (appdata v) {
    v2f o;

    // Project vertex to clip space, as normal.
    o.vertex = mul(MVP, v.vertex);

    // Save depth separately to use in fragment shader.
    o.depth = o.vertex.w;
    
    // Save local position of vertex to interpolate.
    o.pos = v.vertex.xyz;
    o.pos.z += 0.5f; // HACK: remapping cube to 0...1 on the Z
    // (Because I was too lazy to make a custom model)

    // Get vector from vertex to camera, also in local space.
    o.dir = mul(WorldToObject, float4(CameraPos, 1)).xyz - v.vertex.xyz;

    return o;
}

Then in the fragment shader, we can analytically compute where this ray intersects the cone (without raymarching one step at a time), and return the "outside" face that should be seen by the camera.
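Concretely: with the apex at the origin and the cone opening along $+z$ with radius $\tfrac{1}{2}$ at $z = 1$ (matching the remapped cube above), the surface satisfies $x^2 + y^2 = \tfrac{1}{4}z^2$. Substituting the ray $\mathbf{p} + t\,\mathbf{d}$ and collecting powers of $t$ gives a quadratic $at^2 + bt + c = 0$ with

$$a = \mathbf{d}_{xy} \cdot \mathbf{d}_{xy} - \tfrac{1}{4}d_z^2, \qquad b = 2\,\mathbf{p}_{xy} \cdot \mathbf{d}_{xy} - \tfrac{1}{2}p_z d_z, \qquad c = \mathbf{p}_{xy} \cdot \mathbf{p}_{xy} - \tfrac{1}{4}p_z^2,$$

which are exactly the coefficients computed at the top of the fragment shader: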

struct fragout {
    fixed4 colour : SV_Target;
    float depth : SV_Depth;
};

float eye_to_nonlinear_depth(float depth) {
    // Using Unity convention:
    // _ZBufferParams.z = (1 - far/near)/far
    //                    (or -1 times that if using reverse Z buffer)
    // _ZBufferParams.w = 1 / near 
    //                    (or 1/far if using reverse Z buffer)

    return (1.0 - (depth * _ZBufferParams.w)) 
           / (depth * _ZBufferParams.z);
}

fragout frag (v2f i) {
    fragout o;

    // Coefficients for quadratic formula:
    float a = dot(i.dir.xy, i.dir.xy) - 0.25f * i.dir.z * i.dir.z;
    float b = 2 * dot(i.pos.xy, i.dir.xy) - 0.5f * i.pos.z * i.dir.z;
    float c = dot(i.pos.xy, i.pos.xy) - 0.25f * i.pos.z * i.pos.z;

    // Discriminant - if less than zero, we missed the cone entirely.
    float disc = b*b - 4 * a * c;
    clip(disc);
    float step = sqrt(disc);

    // Parameter value t where we cross the cone.
    float t = (-b + step)/(2*a);

    // Flip which side we're looking at if outside the bounds.
    if (t < 0 || i.pos.z + i.dir.z * t > 1) {
        t -= step/a;
    }
    
    // Reconstruct local position on cone surface.
    float3 intersect = i.pos + t * i.dir;

    // If outside of bounds, clip it out.
    clip(intersect.z * (1-intersect.z));

    // For debug purposes, display local intersection point as a colour.
    o.colour = fixed4(intersect, 1.0f);
    o.colour.rg *= 2.0f;

    // t = 0 means we're at the original depth interpolated from the vertices.
    // t = 1 means we've travelled the full eye vector and hit the camera.
    // So we can just scale the original depth by 1-t and remap it:
    o.depth = eye_to_nonlinear_depth(i.depth * (1.0f - t));

    return o;
}

Because we're using the object's (inverse) model matrix, we can rotate and scale the cone just by orienting and scaling the model:

Oriented cone

And because we calculate the eye depth of the intersection, it Z-sorts correctly with other geometry:

Cone intersecting with cube

This version draws the cone as though it's hollow, but if you want to cap it off, or draw it translucent so you see both faces, or even vary its opacity by how much of the cone's solid volume the view ray cuts through, any of those are doable too with just some tweaks to the math.
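For the volumetric variant, the quadratic above already gives you the thickness: the ray enters and exits the infinite cone at $t_\pm = \frac{-b \pm \sqrt{\Delta}}{2a}$, so after clamping that interval to the $0 \le z \le 1$ slab, the distance the ray cuts through the solid is $(t_+ - t_-)\,\lVert\mathbf{d}\rVert$, which you can feed into an opacity falloff - at least for rays with $a > 0$; the $a < 0$ case needs an extra sign check.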

Once you know what part of the cone your view ray is hitting, you can shade it any way you like - you're not limited to the debug colours I've used here.

  • Thanks Ludoprof for this reply. I have to adapt now. Can you comment on when it is interesting to use Dispatch(1,1,1)?
  • I've never needed that case myself, but I'd speculate you might do it when you need a coordination / filtering / digesting step as part of a bigger process: an earlier dispatch created a bunch of intermediate results across many threads, and you need just one extra invocation to review and summarize those results down to the narrower final output you need - like identifying the highest-priority tiles to update in a virtual texture this frame, out of many candidates that were previously evaluated in parallel.
