
I am trying to implement a convolutional layer in Python using NumPy. The input is a 4-dimensional array of shape [N, H, W, C], where:

  • N: Batch size
  • H: Height of image
  • W: Width of image
  • C: Number of channels

The convolutional filter is also a 4-dimensional array of shape [F, F, Cin, Cout], where:

  • F: Height and width of a square filter
  • Cin: Number of input channels (Cin = C)
  • Cout: Number of output channels

Assuming a stride of one along all axes, and no padding, the output should be a 4-dimensional array of shape [N, H - F + 1, W - F + 1, Cout].

My code is as follows:

import numpy as np

def conv2d(image, filter):
  # Height and width of output image
  Hout = image.shape[1] - filter.shape[0] + 1
  Wout = image.shape[2] - filter.shape[1] + 1

  output = np.zeros([image.shape[0], Hout, Wout, filter.shape[3]])

  for n in range(output.shape[0]):
    for i in range(output.shape[1]):
      for j in range(output.shape[2]):
        for cout in range(output.shape[3]):
          output[n,i,j,cout] = np.multiply(image[n, i:i+filter.shape[0], j:j+filter.shape[1], :], filter[:,:,:,cout]).sum()

  return output

This works perfectly, but it uses four nested for loops and is extremely slow. Is there a better way of implementing a convolutional layer that takes a 4-dimensional input and filter and returns a 4-dimensional output, using NumPy?

  • I'm having difficulties trying to reproduce this. Can you give a sample filter? Judging by filter.shape[3], is it 4-dimensional? Commented May 11, 2019 at 15:44
  • The filter is 4-dimensional. A sample filter could be filter = np.random.randint(0, 2, [5, 5, 3, 16]). This would be a 5 x 5 filter that operates on a three-channel input image and generates an output 'image' with 16 channels. Commented May 11, 2019 at 16:12
  • Okay, will give it a look when I have some time Commented May 11, 2019 at 17:07

1 Answer


This is a straightforward implementation of this kind of Keras-like (?) convolution. It might be hard for beginners to follow because it relies heavily on broadcasting and stride tricks.

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv2d(a, b):
    # 6-D sliding-window view of a: [N, Hout, Wout, F, F, C] (no copy, only strides change)
    a = as_strided(a, (len(a), a.shape[1]-len(b)+1, a.shape[2]-b.shape[1]+1, len(b), b.shape[1], a.shape[3]), a.strides[:3]+a.strides[1:])
    # The kernel is flipped ([::-1, ::-1]), so this is a true convolution, not a correlation
    return np.einsum('abcijk,ijkd', a, b[::-1,::-1])
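
To see what the strided view holds, here is a tiny, hand-rolled check (the array, shapes and names below are my own illustration): the 6-D view has shape [N, Hout, Wout, F, F, C], and element [n, i, j] is exactly the F x F x C patch the four-loop version reads.

import numpy as np
from numpy.lib.stride_tricks import as_strided

image = np.arange(2 * 4 * 4 * 3, dtype=float).reshape(2, 4, 4, 3)
F = 3
# Sliding-window view of shape [N, Hout, Wout, F, F, C]; no data is copied
windows = as_strided(image,
                     (2, 4 - F + 1, 4 - F + 1, F, F, 3),
                     image.strides[:3] + image.strides[1:])
# windows[n, i, j] is the patch the naive loop multiplies by the filter
assert np.array_equal(windows[0, 1, 1], image[0, 1:1+F, 1:1+F, :])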

BTW: if you are doing convolution with very large kernels, use a Fourier-based algorithm instead.
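
For completeness, here is a minimal sketch of that FFT route using scipy.signal.fftconvolve (the loop structure and the name conv2d_fft are my own, assuming SciPy is available; fftconvolve flips the kernel, so it is pre-flipped below to reproduce the cross-correlation the question's loop computes):

import numpy as np
from scipy.signal import fftconvolve

def conv2d_fft(image, filt):
    N, H, W, C = image.shape
    F, _, _, Cout = filt.shape
    out = np.zeros((N, H - F + 1, W - F + 1, Cout))
    for n in range(N):
        for cout in range(Cout):
            for c in range(C):
                # Pre-flip the kernel so fftconvolve's internal flip cancels out
                out[n, :, :, cout] += fftconvolve(image[n, :, :, c],
                                                  filt[::-1, ::-1, c, cout],
                                                  mode='valid')
    return out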

EDIT: The [::-1,::-1] should be removed if the operation does not flip the kernel first, i.e. cross-correlation, which is what TensorFlow's conv2d actually computes.

EDIT: np.tensordot(a, b, axes=3) performs much better than np.einsum("abcijk,ijkd", a, b) and is highly recommended, especially on larger inputs. So the function becomes:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def conv2d(a, b):
  # Output height and width for a valid convolution with stride 1
  Hout = a.shape[1] - b.shape[0] + 1
  Wout = a.shape[2] - b.shape[1] + 1

  # 6-D sliding-window view of a: [N, Hout, Wout, F, F, C]
  a = as_strided(a, (a.shape[0], Hout, Wout, b.shape[0], b.shape[1], a.shape[3]), a.strides[:3] + a.strides[1:])

  # Contract the last three axes of the view against the filter
  return np.tensordot(a, b, axes=3)
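
A quick sanity check of the tensordot version above (the sample shapes are my own; the result should match the question's four-loop implementation up to floating-point rounding, so np.allclose rather than exact equality is the right comparison):

image = np.random.rand(8, 28, 28, 3)
filt = np.random.rand(5, 5, 3, 16)

out = conv2d(image, filt)   # the tensordot version above
print(out.shape)            # (8, 24, 24, 16) = (N, H - F + 1, W - F + 1, Cout)
# e.g. np.allclose(out, conv2d_loops(image, filt)), where conv2d_loops is the
# question's four-loop version renamed to avoid the clash with conv2d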

7 Comments

Thank you for the answer. However, this does not seem to produce the correct result. For example, I compared it with tensorflow.conv2d(a, b, strides=[1,1,1,1], padding="VALID"), and the results are different. The output tensors have the same shape, but the values are different. Any idea why that could be the case?
I found the problem. It seems to be related to the definition of convolve. For [1,2,3] convolving with [1,2,3], in my understanding, it should be 1*3+2*2+3*1 (otherwise it's called correlate). But tensorflow seems to have defined convolution as 1*1+2*2+3*3. Anyway, to get the result of tensorflow, simply remove the [::-1,::-1] in the last line (see the short convolve-vs-correlate illustration after these comments).
The method seemed interesting, so I evaluated the performance of all 3 variants. Here is the result (TF using CPU only): gist.github.com/prabindh/beea050b379e6e9a52699b66d7ce227f
@Prabindh Yeah, generally numpy should not be slower than tensorflow on the same CPU under a similar algorithm implementation, and this implementation is basically equivalent to the pseudocode in the tf.nn.conv2d documentation: generate a virtual matrix in no time (except here it is a 6-tensor), and do a right multiplication (in einsum). You can fix the mismatched results by removing the [::-1,::-1].
Thank you for your contributions @ZisIsNotZis and @Prabindh. Would suggest using np.tensordot(a, b, axes=3) instead of np.einsum("abcijk,ijkd", a, b). I noticed the former works much faster than the latter, especially on increasingly larger datasets.
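
As a small aside to the convolve-vs-correlate point above, NumPy's own 1-D routines make the distinction concrete (this snippet is only an illustration):

import numpy as np

np.convolve([1, 2, 3], [1, 2, 3], mode='valid')   # array([10])  -> 1*3 + 2*2 + 3*1
np.correlate([1, 2, 3], [1, 2, 3], mode='valid')  # array([14])  -> 1*1 + 2*2 + 3*3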