Try h264_nvenc, NVIDIA's hardware H.264 encoder.
Install the required drivers and codecs, if not already done.
You may need to enable the non-free repos in /etc/apt/sources.list.
The NVENC Video Encoding library provides an interface to video encoder hardware on supported NVIDIA GPUs.
This package contains the nvidia-encode runtime library.
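Before changing any encoding command, it may help to confirm that the installed ffmpeg build actually exposes the encoder. A minimal sketch, assuming a POSIX shell; the helper names list_has_nvenc and check_nvenc are made up for illustration:

```shell
# Succeed if the encoder list read from stdin mentions h264_nvenc.
list_has_nvenc() {
    grep -qw 'h264_nvenc'
}

# Report whether the local ffmpeg build exposes the NVENC H.264 encoder.
# Returns 0 if available, 1 if missing, 2 if ffmpeg is not installed.
check_nvenc() {
    if ! command -v ffmpeg >/dev/null 2>&1; then
        echo "ffmpeg not found"
        return 2
    fi
    if ffmpeg -hide_banner -encoders 2>/dev/null | list_has_nvenc; then
        echo "h264_nvenc available"
    else
        echo "h264_nvenc missing"
        return 1
    fi
}
```

Running check_nvenc only tells you the encoder was compiled in. It does not prove that the driver and the GPU can open an encode session, so a real test encode is still needed.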
Modify your ffmpeg command to use h264_nvenc, then test it:
ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc -preset p6 -tune hq -rc vbr -cq 24 -b:v 0 output.mp4
If the command fails with an error about a mismatch between hardware-accelerated decoding and scaling, check this post.
It has solutions for "Using FFmpeg with full hardware-accelerated decoding via NVDEC" and "Using software-based decode fallback", along with other relevant links.
For FFmpeg, there are two environment variables you can set to speed up initialization of the CUDA HWContext, which is useful in low-latency workloads. The values below select GPU 0 for execution:
export CUDA_VISIBLE_DEVICES=0
export CUDA_DEVICE_MAX_CONNECTIONS=2
On a mixed-GPU setup where the device listing order from nvidia-smi may be inconsistent, set the environment variable CUDA_DEVICE_ORDER to PCI_BUS_ID to enforce consistent GPU listing and provisioning by PCI bus ID.
Modify as necessary to suit your environment.
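If you would rather not export these variables for the whole shell session, they can be scoped to a single command. A minimal sketch; with_gpu is a hypothetical helper name, and the value 2 for CUDA_DEVICE_MAX_CONNECTIONS is simply the one used above:

```shell
# Run one command with CUDA pinned to a single GPU, enumerated in PCI bus order.
# Usage: with_gpu <gpu-index> <command> [args...]
with_gpu() {
    idx=$1
    shift
    env CUDA_DEVICE_ORDER=PCI_BUS_ID \
        CUDA_VISIBLE_DEVICES="$idx" \
        CUDA_DEVICE_MAX_CONNECTIONS=2 \
        "$@"
}
```

For example, with_gpu 1 ffmpeg -y -hwaccel cuda -i input.mp4 -c:v h264_nvenc output.mp4 runs the encode on the second GPU in PCI order. Keep in mind that CUDA renumbers the visible devices, so inside that process the selected GPU is device 0.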
1. Using FFmpeg with Full hardware-accelerated decoding via nvdec:...
(a). Scaling being done with the scale_npp filter, available when FFmpeg is built with the proprietary CUDA SDK (when the flags --enable-nonfree --enable-cuda-nvcc --nvccflags="-gencode arch=compute_52,code=sm_52 -O2" are passed to ./configure at build time)
2. Using software-based decode fallback:
Note that GPU allocation is done via the filter hwupload_cuda=0, which initializes a CUDA HWContext bound to GPU 0 for all scaling operations; the private option -gpu:v 0 does the same for the h264_nvenc encoder wrapper.
(a). Using the scale_npp filter:
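To make the difference between the two paths concrete, here is a rough sketch that prints (but does not run) one command per path. These are not the exact commands from the linked post: the 1280x720 target size, GPU index 0, and the helper name nvenc_scale_cmd are assumptions for illustration, and both variants need an FFmpeg build that includes scale_npp.

```shell
# Print an ffmpeg command line for the chosen decode path.
# Usage: nvenc_scale_cmd hw|sw <input> <output>
# Assumptions: 1280x720 target, GPU 0, placeholder file names.
nvenc_scale_cmd() {
    case "$1" in
        hw)
            # NVDEC decodes on the GPU and frames stay in GPU memory,
            # so scale_npp can consume them directly.
            printf 'ffmpeg -y -hwaccel cuda -hwaccel_device 0 -hwaccel_output_format cuda -i %s -vf scale_npp=w=1280:h=720 -c:v h264_nvenc -gpu:v 0 %s\n' "$2" "$3"
            ;;
        sw)
            # The CPU decodes; frames are converted to nv12 and uploaded
            # to GPU 0 with hwupload_cuda before scale_npp runs.
            printf 'ffmpeg -y -i %s -vf format=nv12,hwupload_cuda=0,scale_npp=w=1280:h=720 -c:v h264_nvenc -gpu:v 0 %s\n' "$2" "$3"
            ;;
        *)
            echo "usage: nvenc_scale_cmd hw|sw <input> <output>" >&2
            return 1
            ;;
    esac
}
```

Printing the command first makes it easy to compare the two filter chains side by side before running either one, for example with nvenc_scale_cmd sw input.mp4 output.mp4.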
Other relevant links from Convert ffmpeg encoding from libx264 to h264_nvenc:...
More sources:
I would guess that libx264 delivers better quality than h264_nvenc for the same bitrate.
h264_nvenc is probably faster and uses less power. h264_nvenc is only available on NVIDIA hardware.
nvidia-smi (NVIDIA System Management Interface)
To check GPU usage and other information, run nvidia-smi in another terminal while ffmpeg is encoding.
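Plain nvidia-smi shows overall GPU load, but the encoder has its own utilization counter. A small sketch that samples it with nvidia-smi dmon; encoder_active is a made-up helper name, and it assumes the usual dmon -s u column order (gpu, sm, mem, enc, dec):

```shell
# Succeed if any dmon sample on stdin shows encoder utilization above 0.
# Assumes `nvidia-smi dmon -s u` columns: gpu sm mem enc dec ...
# Header lines start with '#'; unsupported values are printed as '-'.
encoder_active() {
    awk '!/^#/ && $4 ~ /^[0-9]+$/ && $4 > 0 { found = 1 }
         END { exit !found }'
}

# Usage, in a second terminal while ffmpeg is encoding (takes 5 samples):
#   nvidia-smi dmon -s u -c 5 | encoder_active && echo "NVENC is in use"
```

If the enc column stays at 0 during an encode, ffmpeg is most likely using a software encoder instead of h264_nvenc.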
No one-size-fits-all solution
This issue often comes down to how your software, hardware, drivers, and system settings interact.
The mismatch between hardware acceleration and scaling depends on your specific setup, like your GPU, installed drivers, FFmpeg configuration, and even environment variables.
There’s no one-size-fits-all solution; you’ll need to experiment with different settings (e.g. switching between hardware and software decoding) to see what works on your system.
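One way to run that experiment is to try hardware decoding first and retry with software decoding if it fails. A minimal sketch under those assumptions; encode_with_fallback and the FFMPEG override variable are hypothetical names, and the rate-control flags are the ones from the command earlier in this answer:

```shell
# Try NVDEC + NVENC first; if that fails, retry with CPU decode + NVENC.
# Usage: encode_with_fallback <input> <output>
# FFMPEG can be set to point at a different ffmpeg binary.
encode_with_fallback() {
    ff=${FFMPEG:-ffmpeg}
    if "$ff" -y -hwaccel cuda -hwaccel_output_format cuda -i "$1" \
            -c:v h264_nvenc -preset p6 -tune hq -rc vbr -cq 24 -b:v 0 "$2"; then
        return 0
    fi
    echo "hardware decode failed, retrying with software decode" >&2
    "$ff" -y -i "$1" \
        -c:v h264_nvenc -preset p6 -tune hq -rc vbr -cq 24 -b:v 0 "$2"
}
```

The fallback still encodes on the GPU; only the decode step moves to the CPU, which is often enough to get past a format mismatch between the decoder and the filters.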