I'm trying to move some computer vision tasks to TensorFlow. The most expensive operations are convolutions such as GaussianBlur. The timings I get with timeit suggest that the TensorFlow GPU version is more than 10x slower than the OpenCV one.

  • The stdout reports "WARNING:tensorflow:AutoGraph could not transform ... and will run it as-is..." for both functions. Does this mean that no graph is created, and that performance suffers as a result?
  • Is timeit a valid way to measure the performance of a TensorFlow script?
  • I assumed that creating the tf variables outside the function means the function starts with data that is already on the GPU. Is this true?
  • Is it possible to run the 2D convolution channel by channel without splitting the image into slices, as I do in tf_gauss_blur_stacked()?

The code below is my current version of Gaussian blur in OpenCV and TensorFlow.

import tensorflow as tf # v2.2.0, requires tf 2 for example
import numpy as np # v1.21.6
import cv2  # v4.6.0
from timeit import timeit

def opencv_gauss_blur(im):
    return cv2.GaussianBlur(im.copy(), (5, 5), 1.1)

@tf.function
def tf_gauss_blur_stacked(im, im_out, kernel2D):
    for ii in range(im.shape[-1]):
        im_slice = im[0, :, :, ii][tf.newaxis, :, :, tf.newaxis]
        im_out[0, :, :, ii].assign(tf.nn.conv2d(im_slice, kernel2D, strides=[1, 1, 1, 1], padding="SAME")[0,:,:,0])
    return im_out

@tf.function
def tf_gauss_blur(im, kernel3D):
    return tf.nn.conv2d(im, kernel3D, strides=[1, 1, 1, 1], padding="SAME")

# input image A with dimensions (x, y, channel)
A = np.random.randint(0, 4095, (40, 50, 3)).astype(dtype=np.float32)
B = opencv_gauss_blur(A)

blur_kernel = cv2.getGaussianKernel(5, 1.1) * cv2.getGaussianKernel(5, 1.1).T
kernel2D = tf.constant(blur_kernel, dtype=tf.float32)[:, :, tf.newaxis, tf.newaxis] # shape dims:  X, Y, num_input_channels, num_output_channels

kernel3D = np.zeros(shape=kernel2D.shape[:2] + (A.shape[-1], A.shape[-1]), dtype=np.float32)
for ii in range(A.shape[-1]):
    kernel3D[:, :, ii, ii] = kernel2D[:, :, 0, 0]
kernel3D = tf.constant(kernel3D, dtype=tf.float32)
    
tfA = tf.constant(A, dtype=tf.float32)[tf.newaxis, ] # shape dims: batch, X, Y, channel
im_out = tf.Variable(tfA)
B_tf_stack = tf_gauss_blur_stacked(tfA, im_out, kernel2D)

B_tf = tf_gauss_blur(tfA, kernel3D)

print(np.abs((B_tf[0, 2:-2, 2:-2, ] - B[2:-2, 2:-2,])).max())
print(np.abs((B_tf_stack[0, 2:-2, 2:-2, ] - B[2:-2, 2:-2,])).max())

The max difference between OpenCV and TensorFlow is < 0.001 (on a scale of 0-4095), which is sufficient agreement.
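On the channel-by-channel question, one option that might avoid both the Python loop and the mostly-zero 3D kernel is tf.nn.depthwise_conv2d, which applies a separate 2D filter to each input channel. The following is only a sketch under assumptions: gaussian_kernel_1d and tf_gauss_blur_depthwise are hypothetical names, the kernel comes from the plain sampled-Gaussian formula rather than cv2.getGaussianKernel, and nothing here has been timed on the setup above.

```python
import numpy as np
import tensorflow as tf

def gaussian_kernel_1d(ksize, sigma):
    # Hypothetical helper: sampled Gaussian, normalised to sum to 1.
    x = np.arange(ksize) - (ksize - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

@tf.function
def tf_gauss_blur_depthwise(im, kernel_dw):
    # kernel_dw has shape (height, width, in_channels, channel_multiplier).
    # With channel_multiplier == 1, each channel is filtered independently.
    return tf.nn.depthwise_conv2d(im, kernel_dw, strides=[1, 1, 1, 1], padding="SAME")

channels = 3
k1 = gaussian_kernel_1d(5, 1.1)
k2 = np.outer(k1, k1).astype(np.float32)  # 5x5 kernel

# Repeat the same 5x5 kernel once per channel: shape (5, 5, 3, 1).
kernel_dw = tf.constant(np.repeat(k2[:, :, np.newaxis, np.newaxis], channels, axis=2))

# Random test image with shape (batch, X, Y, channel).
im = tf.constant(np.random.uniform(0, 4095, (1, 40, 50, channels)).astype(np.float32))
blurred = tf_gauss_blur_depthwise(im, kernel_dw)

# Reference: blur each channel separately with an ordinary conv2d.
kernel2d = tf.constant(k2[:, :, np.newaxis, np.newaxis])  # shape (5, 5, 1, 1)
reference = tf.concat(
    [tf.nn.conv2d(im[..., c:c + 1], kernel2d, strides=[1, 1, 1, 1], padding="SAME")
     for c in range(channels)],
    axis=-1,
)
max_diff = float(tf.reduce_max(tf.abs(blurred - reference)))
```

The depthwise filter holds 5 x 5 x 3 weights instead of the 5 x 5 x 3 x 3 of kernel3D, so no zero entries are multiplied. Whether that is actually faster on a 40 x 50 image would need to be measured.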

Timeit from the console:

%timeit B = opencv_gauss_blur(A)
%timeit B_tf = tf_gauss_blur_stacked(tfA, im_out, kernel2D)
%timeit B_tf = tf_gauss_blur(tfA, kernel3D)

This gives 11, 386 and 257 µs per loop respectively (mean ± std. dev. of 7 runs, 1000 loops each). OpenCV convolves with a 1D Gaussian in X and then with a 1D Gaussian in Y, which should yield the same result as the 2D/3D TensorFlow functions but with 2.5x/7.5x fewer multiplications (10 per output value instead of 25 for the 2D kernel or 75 for the 3D kernel).
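The separability argument and the operation counts can be checked with a small NumPy-only sketch. It assumes the plain sampled-Gaussian formula for the 1D kernel. gaussian_kernel_1d and conv2d_valid are hypothetical helpers written for this illustration, not OpenCV or TensorFlow functions, and only the unpadded ("valid") region is compared so that border handling does not matter.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma):
    # Hypothetical helper: sampled Gaussian, normalised to sum to 1.
    x = np.arange(ksize) - (ksize - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def conv2d_valid(img, kernel):
    # Direct 2D correlation of a single-channel image without padding.
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * img[dy:dy + out_h, dx:dx + out_w]
    return out

k1 = gaussian_kernel_1d(5, 1.1)
k2 = np.outer(k1, k1)  # 2D kernel as the outer product of the 1D kernel

rng = np.random.default_rng(0)
img = rng.uniform(0, 4095, size=(40, 50))

# One pass with the 5x5 kernel versus a 1x5 pass followed by a 5x1 pass.
one_pass = conv2d_valid(img, k2)
two_pass = conv2d_valid(conv2d_valid(img, k1[np.newaxis, :]), k1[:, np.newaxis])
max_diff = np.abs(one_pass - two_pass).max()

# Multiplications per output value for each approach.
mults_separable = 2 * k1.size   # 5 + 5
mults_2d = k2.size              # 5 * 5
mults_dense_3ch = k2.size * 3   # 5 * 5 * 3 input channels, as in kernel3D

print(max_diff, mults_2d / mults_separable, mults_dense_3ch / mults_separable)
```

The two results should agree to floating-point rounding, and the ratios reproduce the 2.5x and 7.5x figures quoted above.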

  • A Gaussian blur is a specific instance of a convolution. For one, it's separable. For another, it's symmetric. Commented Jun 19 at 13:21
  • Did you try TensorFlow's own Gaussian filter method? tensorflow.org/api_docs/python/tfm/vision/augment/… Commented Jun 19 at 14:15
  • @paisanco Thank you for the tip. I cannot install this package on my current system, but will test it on another machine. Commented Jun 19 at 20:02
  • @ChristophRackwitz Yes, this means it can be sped up by doing two orthogonal 1D convolutions (which is the OpenCV approach) instead of one 2D convolution. But that doesn't help if you need to pad the convolution kernel with 0s to make it run in a single pass. Or is there another consequence? Commented Jun 19 at 20:09
  • I see no requirement for any padding of the kernel. Commented Jun 20 at 8:04
