I'm trying to move some computer vision tasks to tensorflow. The most intensive ops are convolutions, such as GaussianBlur. The timings I get with timeit suggest that the tensorflow (GPU) equivalent is more than 10x slower than OpenCV.
- stdout reports "WARNING:tensorflow:AutoGraph could not transform ... and will run it as-is..." for both functions. Does this mean that no graph is created, and that this is why performance suffers?
- Can I use timeit to check performance of a tf script?
- I assumed that by creating the tf variables outside the function, the function starts from variables that are already on the GPU. Is this true? (A quick .device check is shown after the code below.)
- Is it possible to do the 2D convolution layer by layer without splitting the image into layers like I do in tf_gauss_blur_stacked()? (See the sketch right after this list.)
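For the last question, tf.nn.depthwise_conv2d might do this in a single call by applying the same 2D kernel to every channel. A minimal, unbenchmarked sketch (the name tf_gauss_blur_depthwise is mine, and it reuses kernel2D and tfA from the code further down):
@tf.function
def tf_gauss_blur_depthwise(im, kernel_dw):
    # depthwise filter shape is (H, W, in_channels, channel_multiplier)
    return tf.nn.depthwise_conv2d(im, kernel_dw, strides=[1, 1, 1, 1], padding="SAME")

# kernel2D has shape (5, 5, 1, 1); tile it over the 3 input channels -> (5, 5, 3, 1)
kernel_dw = tf.tile(kernel2D, [1, 1, 3, 1])
B_tf_dw = tf_gauss_blur_depthwise(tfA, kernel_dw)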
The code below is my current version of Gaussian blur in OpenCV and tensorflow.
import tensorflow as tf  # v2.2.0; the example requires TF 2.x
import numpy as np # v1.21.6
import cv2 # v4.6.0
from timeit import timeit
def opencv_gauss_blur(im):
    return cv2.GaussianBlur(im.copy(), (5, 5), 1.1)
@tf.function
def tf_gauss_blur_stacked(im, im_out, kernel2D):
    # convolve each channel separately with the 2D kernel and write it into the
    # preallocated output variable
    for ii in range(im.shape[-1]):
        im_slice = im[0, :, :, ii][tf.newaxis, :, :, tf.newaxis]
        im_out[0, :, :, ii].assign(
            tf.nn.conv2d(im_slice, kernel2D, strides=[1, 1, 1, 1], padding="SAME")[0, :, :, 0])
    return im_out
@tf.function
def tf_gauss_blur(im, kernel3D):
    # one conv2d call with a block-diagonal kernel that maps every channel onto itself
    return tf.nn.conv2d(im, kernel3D, strides=[1, 1, 1, 1], padding="SAME")
# input image A with dimensions (x, y, channel)
A = np.random.randint(0, 4095, (40, 50, 3)).astype(dtype=np.float32)
B = opencv_gauss_blur(A)
blur_kernel = cv2.getGaussianKernel(5, 1.1) * cv2.getGaussianKernel(5, 1.1).T
kernel2D = tf.constant(blur_kernel, dtype=tf.float32)[:, :, tf.newaxis, tf.newaxis] # shape dims: X, Y, num_input_channels, num_output_channels
kernel3D = np.zeros(shape=kernel2D.shape[:2] + (A.shape[-1], A.shape[-1]), dtype=np.float32)
for ii in range(A.shape[-1]):
    kernel3D[:, :, ii, ii] = kernel2D[:, :, 0, 0]
kernel3D = tf.constant(kernel3D, dtype=tf.float32)
tfA = tf.constant(A, dtype=tf.float32)[tf.newaxis, ] # shape dims: batch, X, Y, channel
im_out = tf.Variable(tfA)
B_tf_stack = tf_gauss_blur_stacked(tfA, im_out, kernel2D)
B_tf = tf_gauss_blur(tfA, kernel3D)
# compare away from the 2-pixel border, where OpenCV's border handling and tf's
# zero padding ("SAME") differ
print(np.abs((B_tf[0, 2:-2, 2:-2, ] - B[2:-2, 2:-2, ])).max())
print(np.abs((B_tf_stack[0, 2:-2, 2:-2, ] - B[2:-2, 2:-2, ])).max())
The max difference between OpenCV and tensorflow is < 0.001 (on a scale of 0-4095), which is sufficient agreement.
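For the third question, checking the .device attribute of the eager tensors/variables (and listing the visible GPUs) shows where they were actually placed; this is purely a diagnostic:
print(tfA.device)       # e.g. '/job:localhost/replica:0/task:0/device:GPU:0' when placed on the GPU
print(im_out.device)
print(kernel3D.device)
print(tf.config.list_physical_devices('GPU'))  # GPUs visible to tensorflow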
Timing with %timeit from the IPython console:
%timeit B = opencv_gauss_blur(A)
%timeit B_tf = tf_gauss_blur_stacked(tfA, im_out, kernel2D)
%timeit B_tf = tf_gauss_blur(tfA, kernel3D)
This gives 11, 386 and 257 µs per loop respectively (mean ± std. dev. of 7 runs, 1000 loops each). OpenCV convolves a 1D Gaussian in the X direction and then a 1D Gaussian in the Y direction, which should yield the same result as the 2D/3D tensorflow functions but needs roughly 2.5x/7.5x fewer multiply-adds.
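For reference, that separable (two 1D passes) variant can also be written in tensorflow as two depthwise convolutions. A sketch under the same setup as above (the names g1d, ky, kx and tf_gauss_blur_separable are mine; away from the border it should match the 2D version, but I haven't timed it):
g1d = cv2.getGaussianKernel(5, 1.1).astype(np.float32)            # shape (5, 1)
ky = tf.constant(np.tile(g1d[:, :, None, None], (1, 1, 3, 1)))    # (5, 1, 3, 1): vertical pass
kx = tf.constant(np.tile(g1d.T[:, :, None, None], (1, 1, 3, 1)))  # (1, 5, 3, 1): horizontal pass

@tf.function
def tf_gauss_blur_separable(im, ky, kx):
    # two 1D depthwise convolutions instead of one 5x5 convolution per channel
    tmp = tf.nn.depthwise_conv2d(im, ky, strides=[1, 1, 1, 1], padding="SAME")
    return tf.nn.depthwise_conv2d(tmp, kx, strides=[1, 1, 1, 1], padding="SAME")

B_tf_sep = tf_gauss_blur_separable(tfA, ky, kx)
One more thing I'm aware of: GPU ops can return asynchronously, so a fairer %timeit comparison probably needs the result pulled back to the host inside the timed statement, e.g. tf_gauss_blur(tfA, kernel3D).numpy().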