
I am trying to implement a convolutional layer in Python using NumPy. The input is a 4-dimensional array of shape [N, H, W, C], where:

  • N: Batch size
  • H: Height of image
  • W: Width of image
  • C: Number of channels

The convolutional filter is also a 4-dimensional array of shape [F, F, Cin, Cout], where:

  • F: Height and width of a square filter
  • Cin: Number of input channels (Cin = C)
  • Cout: Number of output channels

Assuming a stride of one along all axes, and no padding, the output should be a 4-dimensional array of shape [N, H - F + 1, W - F + 1, Cout].

My code is as follows:

import numpy as np

def conv2d(image, filter):
  # Height and width of output image
  Hout = image.shape[1] - filter.shape[0] + 1
  Wout = image.shape[2] - filter.shape[1] + 1

  output = np.zeros([image.shape[0], Hout, Wout, filter.shape[3]])

  for n in range(output.shape[0]):
    for i in range(output.shape[1]):
      for j in range(output.shape[2]):
        for cout in range(output.shape[3]):
          output[n,i,j,cout] = np.multiply(image[n, i:i+filter.shape[0], j:j+filter.shape[1], :], filter[:,:,:,cout]).sum()

  return output

This works perfectly, but it uses four nested for loops and is extremely slow. Is there a better way of implementing a convolutional layer that takes a 4-dimensional input and filter and returns a 4-dimensional output, using NumPy?

  • I'm having difficulties trying to reproduce this. Can you give a sample filter? Judging by filter.shape[3], is it 4-dimensional? Commented May 11, 2019 at 15:44
  • The filter is 4-dimensional. A sample filter could be filter = np.random.randint(0, 2, [5, 5, 3, 16]). This would be a 5 x 5 filter that operates on a three-channel input image and generates an output 'image' with 16 channels. Commented May 11, 2019 at 16:12
  • Okay, will give it a look when I have some time Commented May 11, 2019 at 17:07

1 Answer


This is a straightforward implementation of this kind of Keras-like (?) convolution. It might be hard for beginners to follow because it relies heavily on broadcasting and stride tricks.

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv2d(a, b):
    # 6-D sliding-window view of a: [N, Hout, Wout, F, F, C] (no copy, only strides change)
    a = as_strided(a, (len(a), a.shape[1]-len(b)+1, a.shape[2]-b.shape[1]+1, len(b), b.shape[1], a.shape[3]), a.strides[:3]+a.strides[1:])
    # The kernel is flipped ([::-1, ::-1]), so this is a true convolution, not a correlation
    return np.einsum('abcijk,ijkd', a, b[::-1,::-1])
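
To see what the strided view holds, here is a tiny, hand-rolled check (the array, shapes and names below are my own illustration): the 6-D view has shape [N, Hout, Wout, F, F, C], and element [n, i, j] is exactly the F x F x C patch the four-loop version reads.

import numpy as np
from numpy.lib.stride_tricks import as_strided

image = np.arange(2 * 4 * 4 * 3, dtype=float).reshape(2, 4, 4, 3)
F = 3
# Sliding-window view of shape [N, Hout, Wout, F, F, C]; no data is copied
windows = as_strided(image,
                     (2, 4 - F + 1, 4 - F + 1, F, F, 3),
                     image.strides[:3] + image.strides[1:])
# windows[n, i, j] is the patch the naive loop multiplies by the filter
assert np.array_equal(windows[0, 1, 1], image[0, 1:1+F, 1:1+F, :])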

BTW: if you are doing convolution with very large kernels, use a Fourier-based algorithm instead.
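
For completeness, here is a minimal sketch of that FFT route using scipy.signal.fftconvolve (the loop structure and the name conv2d_fft are my own, assuming SciPy is available; fftconvolve flips the kernel, so it is pre-flipped below to reproduce the cross-correlation the question's loop computes):

import numpy as np
from scipy.signal import fftconvolve

def conv2d_fft(image, filt):
    N, H, W, C = image.shape
    F, _, _, Cout = filt.shape
    out = np.zeros((N, H - F + 1, W - F + 1, Cout))
    for n in range(N):
        for cout in range(Cout):
            for c in range(C):
                # Pre-flip the kernel so fftconvolve's internal flip cancels out
                out[n, :, :, cout] += fftconvolve(image[n, :, :, c],
                                                  filt[::-1, ::-1, c, cout],
                                                  mode='valid')
    return out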

EDIT: The [::-1,::-1] should be removed if the operation does not flip the kernel first, i.e. cross-correlation, which is what TensorFlow's conv2d actually computes.

EDIT: np.tensordot(a, b, axes=3) performs much better than np.einsum("abcijk,ijkd", a, b) and is highly recommended, especially on larger inputs. So the function becomes:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv2d(a, b):
  # Output height and width for a valid convolution with stride 1
  Hout = a.shape[1] - b.shape[0] + 1
  Wout = a.shape[2] - b.shape[1] + 1

  # 6-D sliding-window view of a: [N, Hout, Wout, F, F, C]
  a = as_strided(a, (a.shape[0], Hout, Wout, b.shape[0], b.shape[1], a.shape[3]), a.strides[:3] + a.strides[1:])

  # Contract the last three axes of the view against the filter
  return np.tensordot(a, b, axes=3)
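
A quick sanity check of the tensordot version above (the sample shapes are my own; the result should match the question's four-loop implementation up to floating-point rounding, so np.allclose rather than exact equality is the right comparison):

image = np.random.rand(8, 28, 28, 3)
filt = np.random.rand(5, 5, 3, 16)

out = conv2d(image, filt)   # the tensordot version above
print(out.shape)            # (8, 24, 24, 16) = (N, H - F + 1, W - F + 1, Cout)
# e.g. np.allclose(out, conv2d_loops(image, filt)), where conv2d_loops is the
# question's four-loop version renamed to avoid the clash with conv2d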

7 Comments

Thank you for the answer. However, this does not seem to produce the correct result. For example, I compared it with tensorflow.conv2d(a, b, strides=[1,1,1,1], padding="VALID"), and the results are different. The output tensors have the same shape, but the values are different. Any idea why that could be the case?
I found the problem. It seems to be related to the definition of convolve. For [1,2,3] convolving with [1,2,3], in my understanding, it should be 1*3+2*2+3*1 (otherwise it's called correlate). But tensorflow seems to have defined convolution as 1*1+2*2+3*3. Anyway, to get the result of tensorflow, simply remove the [::-1,::-1] in the last line (see the short convolve-vs-correlate illustration after these comments).
The method seemed interesting, so I evaluated the performance of all 3 variants. Here is the result (TF using CPU only): gist.github.com/prabindh/beea050b379e6e9a52699b66d7ce227f
@Prabindh Yeah, generally numpy should not be slower than tensorflow on the same CPU under a similar algorithm implementation, and this implementation is basically equivalent to the pseudocode in the tf.nn.conv2d documentation: generate a virtual matrix in no time (except here it is a 6-tensor), and do a right multiplication (in einsum). You can fix the mismatched results by removing the [::-1,::-1].
Thank you for your contributions @ZisIsNotZis and @Prabindh. Would suggest using np.tensordot(a, b, axes=3) instead of np.einsum("abcijk,ijkd", a, b). I noticed the former works much faster than the latter, especially on increasingly larger datasets.
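
As a small aside to the convolve-vs-correlate point above, NumPy's own 1-D routines make the distinction concrete (this snippet is only an illustration):

import numpy as np

np.convolve([1, 2, 3], [1, 2, 3], mode='valid')   # array([10])  -> 1*3 + 2*2 + 3*1
np.correlate([1, 2, 3], [1, 2, 3], mode='valid')  # array([14])  -> 1*1 + 2*2 + 3*3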