Create Numpy array of images

Question

I have some (950) 150x150x3 .jpg image files that I want to read into a NumPy array.

Following is my code:

import glob

import cv2
import numpy as np

X_data = []
files = glob.glob("*.jpg")
for myFile in files:
    image = cv2.imread(myFile)  # a NumPy array, or None if the file cannot be read
    X_data.append(image)

print('X_data shape:', np.array(X_data).shape)

The output is (950, 150). Please let me know why the list is not getting converted to np.array correctly and whether there is a better way to create the array of images.

From what I have read, building up a NumPy array is more easily done by appending to a Python list and then converting the list to an array.

EDIT: Some more information (if it helps), image.shape returns (150,150,3) correctly.

What's your goal? A 4D 950x150x150x3 array? Or a list of "correct" arrays of 150x150x3, or something else?
– DomTomCat, Commented Jun 10, 2016 at 11:28
@GughanRavikumar It does not help because cv2.imread already returns a numpy array.
– Abhishek Bansal, Commented Jun 10, 2016 at 11:41
@AbhishekBansal Then try np.vstack(X_data) instead of np.array(X_data)
– SvbZ3r0, Commented Jun 10, 2016 at 11:43

DomTomCat · Accepted Answer · 2019-08-30 09:04:29Z

25

I tested your code. It works fine for me, with this output:

('X_data shape:', (4, 617, 1021, 3))

However, all of my images had exactly the same dimensions.

When I add another image with different dimensions, I get this output:

('X_data shape:', (5,))

So I'd recommend checking that all images have the same size and the same number of channels (that is, are all of them really colour images?). You should also check that either all of the images or none of them have an alpha channel (see @Gughan Ravikumar's comment).

If only the number of channels varies (i.e. some images are grey), then force loading all of them in colour format with:

image = cv2.imread(myFile, cv2.IMREAD_COLOR)
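As a rough sketch of the diagnosis above, the snippet below uses synthetic arrays in place of real images (the shapes and variable names are made up for illustration; real data would come from cv2.imread). It counts how often each shape occurs so that the outlier stands out, and it uses np.stack, which raises a ValueError on mismatched shapes instead of returning an array of unexpected shape. Recent NumPy releases also raise an error from np.array in this situation rather than silently building an object array.

```python
from collections import Counter

import numpy as np

# Synthetic stand-ins for loaded images (assumption: real data would come from cv2.imread).
images = [np.zeros((150, 150, 3), dtype=np.uint8) for _ in range(4)]
images.append(np.zeros((150, 149, 3), dtype=np.uint8))  # one image is a pixel narrower

# Count how often each shape occurs; the rare shape is the likely culprit.
shape_counts = Counter(img.shape for img in images)
expected_shape, _ = shape_counts.most_common(1)[0]
odd_indices = [i for i, img in enumerate(images) if img.shape != expected_shape]
print(shape_counts)
print("indices with unexpected shape:", odd_indices)

# np.stack refuses to combine arrays of different shapes.
try:
    np.stack(images)
    stack_failed = False
except ValueError:
    stack_failed = True

# After dropping the outlier, the result is the expected 4-D array.
clean = np.stack([img for img in images if img.shape == expected_shape])
print(clean.shape)
```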

EDIT: I used the very code from the question, only replacing the path with a directory of mine (and the pattern with "*.PNG"):

import cv2
import glob
import numpy as np

X_data = []
files = glob.glob("C:/Users/xxx/Desktop/asdf/*.PNG")
for myFile in files:
    print(myFile)
    image = cv2.imread(myFile)
    X_data.append(image)

print('X_data shape:', np.array(X_data).shape)

edited Aug 30, 2019 at 9:04

answered Jun 10, 2016 at 11:50

DomTomCat


4 Comments

Abhishek Bansal Over a year ago

All the images are 3 channel, having dimensions 150x150x3. Can there be any other error?

DomTomCat Over a year ago

You could try enforcing the same data type with image = cv2.imread(myFile, 1).astype(np.uint8), although I doubt it will help.

Håken Lid Over a year ago

You can add an assert statement in the loop that will raise AssertionError if any of the images have a different shape: assert image.shape == (150,150,3), "img %s has shape %r" % (myFile, image.shape)

Abhishek Bansal Over a year ago

Thank you. One of my images was of size (150,149,3), which apparently went unnoticed. Sorry, and thanks again.
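The shape check suggested in the comments can also report every offending file at once instead of stopping at the first one. A minimal sketch, assuming the images are already loaded as (name, array) pairs; the stack_checked helper and the file names are hypothetical:

```python
import numpy as np

def stack_checked(named_images, expected_shape=(150, 150, 3)):
    """Stack images into one array, reporting all files whose shape differs."""
    bad = [(name, img.shape) for name, img in named_images if img.shape != expected_shape]
    if bad:
        raise ValueError(f"unexpected shapes: {bad}")
    return np.stack([img for _, img in named_images])

# Synthetic stand-ins for images read from disk.
good = [(f"img_{i}.jpg", np.zeros((150, 150, 3), dtype=np.uint8)) for i in range(3)]
print(stack_checked(good).shape)

# Adding one image that is a pixel narrower triggers the error.
try:
    stack_checked(good + [("odd.jpg", np.zeros((150, 149, 3), dtype=np.uint8))])
except ValueError as err:
    message = str(err)
    print(message)
```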

Mridul Pandey · Accepted Answer · 2020-01-23 02:58:55Z

6

Appending images to a list and then converting it to a NumPy array does not work for me. I have a large dataset, and I run out of RAM every time I attempt it. Instead, I append to the NumPy array directly, but this has its own drawbacks. Appending to a list and then converting it to an array costs memory, whereas appending to a NumPy array costs time. If you are patient enough, this will take care of the RAM crashes.

import os

import numpy as np
from PIL import Image
from tqdm import tqdm

def imagetensor(imagedir):
    images = None
    for name in tqdm(os.listdir(imagedir)):
        # os.listdir returns bare file names, so join them with the directory
        image = Image.open(os.path.join(imagedir, name)).convert('HSV')
        # Scale to [0, 1] and add a leading batch axis
        image = np.expand_dims(np.array(image, dtype=float) / 255, axis=0)
        # np.append copies the whole array on every call: slow, but no list of images is kept
        images = image if images is None else np.append(images, image, axis=0)
    return images

I am looking for better implementations that can take care of both space and time. Please comment if someone has a better idea.
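One common way to address both concerns is to allocate the full array once and write each image into its slot, which avoids both the temporary list and the repeated copying done by np.append. The sketch below is an illustration under assumptions, not a drop-in replacement: load_image is a hypothetical stand-in for Image.open(...), and the number of images and their shape are assumed to be known in advance.

```python
import numpy as np

def load_image(index, shape=(150, 150, 3)):
    """Hypothetical loader: returns a synthetic uint8 image filled with `index`."""
    return np.full(shape, index % 256, dtype=np.uint8)

def load_all(n_images, shape=(150, 150, 3), dtype=np.float32):
    # Allocate the final array once; float32 uses half the memory of float64.
    images = np.empty((n_images, *shape), dtype=dtype)
    for i in range(n_images):
        # Assigning into a slot writes in place; only one temporary image exists at a time.
        images[i] = load_image(i, shape) / 255
    return images

batch = load_all(10)
print(batch.shape, batch.dtype)
```

If even the final array does not fit in memory, a disk-backed array such as np.memmap may be worth a look.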

edited Jan 23, 2020 at 2:58

answered Jan 23, 2020 at 2:44

Mridul Pandey


1 Comment

DevLoverUmar Over a year ago

Thanks! Was having same issues and in this case I can compromise on time but at least it should work!

mic · Accepted Answer · 2020-07-15 04:54:27Z

1

Here is a solution for images that have certain special Unicode characters, or if we are working with PNGs with a transparency layer, which are two cases that I had to handle with my dataset. In addition, if there are any images that aren't of the desired resolution, they will not be added to the Numpy array. This uses the Pillow package instead of cv2.

import glob
import numpy as np
from PIL import Image

resolution = 150

X_data = []
files = glob.glob(r"D:\Pictures\*.png")
for my_file in files:
    print(my_file)

    # convert('RGB') drops the transparency (alpha) channel of PNGs
    image = np.array(Image.open(my_file).convert('RGB'))
    if image.shape != (resolution, resolution, 3):
        print(f'This image is bad: {my_file} {image.shape}')
    else:
        X_data.append(image)

print('X_data shape:', np.array(X_data).shape)
# If you have 950 150x150 images, this would print 'X_data shape: (950, 150, 150, 3)'

The f-string requires Python 3.6+; on older versions, replace it with regular string formatting. The r-string works in any Python version, but you can also replace it with a regular string (with \\ instead of \ if you're using Windows).

answered Jul 15, 2020 at 4:54

mic


2 Comments

Sten Techy Over a year ago

Why do I get this as my output: X_data shape: (5,) and not X_data shape: (950, 150, 150, 3)

Sten Techy Over a year ago

The cause of the error was the image format I used: I used ".jpg" instead of ".png", and the line files = glob.glob(r"D:\Pictures\*.png") only reads PNG files. I feel I should not delete the comment, just in case another person runs into the same error.

Bob · Accepted Answer · 2017-05-23 16:39:05Z

0

Your definition for the .JPG frame that will be put into a matrix of the same size should be x, y, R, G, B, A. "A" is not used, but it does take up 8 bits at the end of each pixel.

answered May 23, 2017 at 16:39

Bob
