I'm trying to do PCA on a sparse matrix, but I am encountering an error:

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Here is my code:

import sys
import csv
from sklearn.decomposition import PCA

data_sentiment = []
y = []
data2 = []
csv.field_size_limit(sys.maxint)
with open('/Users/jasondou/Google Drive/data/competition_1/speech_vectors.csv') as infile:
    reader = csv.reader(infile, delimiter=',', quotechar='|')
    n = 0
    for row in reader:
        # sample = row.split(',')
        n += 1
        if n%1000 == 0:
            print n
        data_sentiment.append(row[:25000])

pca = PCA(n_components=3)
pca.fit(data_sentiment)
PCA(copy=True, n_components=3, whiten=False)
print(pca.explained_variance_ratio_) 
y = pca.transform(data_sentiment)

The input data is speech_vectors.csv, which is a 2740 × 50000 matrix, available here.

Here is the full error traceback:

Traceback (most recent call last):
  File "test.py", line 45, in <module>
    y = pca.transform(data_sentiment)
  File "/Users/jasondou/anaconda/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 397, in transform
    X = X - self.mean_
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

I do not quite understand what self.mean_ refers to here.

5 Comments

  • It would be useful to know on which line the error occurs. Also, your code in its current form is just nonsense, as you're passing an empty list to pca.fit. Commented Apr 15, 2015 at 20:34
  • I'm thinking this happens elsewhere (e.g. in pca.fit() or pca.transform()); I don't see any subtraction operations that might have raised this error directly in this top-level code. Commented Apr 15, 2015 at 20:36
  • I don't know what you're referring to when you say "did not quite understand what self.mean_ here". Commented Apr 15, 2015 at 20:47
  • Please update the question to include a minimal, complete example that demonstrates the problem (stackoverflow.com/help/mcve). You haven't shown in the code or stated in the question how PCA is imported. Commented Apr 15, 2015 at 20:51
  • This is still not a complete example: we don't have access to your CSV file, and we therefore can't know what data_sentiment looks like. Could you please add a few rows from data_sentiment to your question? Also, please edit your question to contain the full traceback for the error message you are seeing. Commented Apr 15, 2015 at 22:39

1 Answer

You are not parsing the CSV file correctly. Each row that your reader returns will be a list of strings, like this:

row = ['0.0', '1.0', '2.0', '3.0', '4.0']

Your data_sentiment will therefore be a list-of-lists-of-strings, for example:

data_sentiment = [row, row, row]

When you pass this directly to pca.fit(), it is internally converted to a numpy array, also containing strings:

import numpy as np

X = np.array(data_sentiment)
print(repr(X))
# array([['0.0', '1.0', '2.0', '3.0', '4.0'],
#        ['0.0', '1.0', '2.0', '3.0', '4.0'],
#        ['0.0', '1.0', '2.0', '3.0', '4.0']], 
#       dtype='|S3')

numpy has no rule for subtracting an array of strings from another array of strings:

X - X
# TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

This mistake would have been very easy to spot if you had bothered to show us some of the contents of data_sentiment in your question, as I asked you to.
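
One quick way to check for this kind of problem is to inspect the dtype of the array before fitting. The following is only a sketch, written for Python 3 with made-up sample rows (the names rows, is_text and X_float are illustrative and not from the question):

```python
import numpy as np

# Made-up rows standing in for what csv.reader returns: lists of strings.
rows = [['0.0', '1.0', '2.0'], ['3.0', '4.0', '5.0']]

X = np.array(rows)
print(X.dtype)  # a string dtype rather than a float dtype

# dtype.kind is 'U' (unicode) or 'S' (bytes) for text arrays.
is_text = X.dtype.kind in ('U', 'S')

# Converting to float gives an array that supports arithmetic.
X_float = X.astype(float)
diff = X_float - X_float.mean(axis=0)
```

If is_text is True, the array holds strings and arithmetic such as X - X will fail.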


What you need to do is convert your strings to floats, for example:

data_sentiment.append([float(s) for s in row[:25000]])
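
Put together, the reading loop might look like the sketch below. It is written for Python 3 and uses an in-memory io.StringIO object with invented values in place of the real file, and a hypothetical n_cols limit of 2 in place of 25000, so that it runs on its own:

```python
import csv
import io

import numpy as np

# Stand-in for the real CSV file (invented values).
infile = io.StringIO("0.0,1.0,2.0\n3.0,4.0,5.0\n6.0,7.0,8.0\n")

n_cols = 2  # hypothetical; the question keeps the first 25000 columns
data_sentiment = []
for row in csv.reader(infile, delimiter=','):
    # csv.reader yields strings, so convert each field to float.
    data_sentiment.append([float(s) for s in row[:n_cols]])

X = np.array(data_sentiment)
print(X.dtype, X.shape)
```

With numeric data, the array can be passed to pca.fit() and pca.transform() without triggering the subtraction error.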

A much easier way would be to use np.loadtxt to parse the CSV file:

data_sentiment = np.loadtxt('/path/to/file.csv', delimiter=',')
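
As a small illustration, np.loadtxt also accepts a file-like object, so the sketch below uses io.StringIO with invented values rather than the real path. The column slice of 2 is a stand-in for the 25000 used in the question:

```python
import io

import numpy as np

# Stand-in for the real CSV file (invented values).
infile = io.StringIO("0.0,1.0,2.0\n3.0,4.0,5.0\n6.0,7.0,8.0\n")

# loadtxt parses the fields as floats by default.
data_sentiment = np.loadtxt(infile, delimiter=',')

# Keep only the leading columns, as row[:25000] does in the question.
data_sentiment = data_sentiment[:, :2]
print(data_sentiment.dtype, data_sentiment.shape)
```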

If you have pandas installed, then pandas.read_csv will probably be faster than np.loadtxt for a large array such as this one.
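
A pandas version could look roughly like this. It assumes the file has no header row (hence header=None) and again reads invented values from an in-memory buffer instead of the real file:

```python
import io

import pandas as pd

# Stand-in for the real CSV file (invented values).
infile = io.StringIO("0.0,1.0,2.0\n3.0,4.0,5.0\n6.0,7.0,8.0\n")

# header=None tells pandas that the first line is data, not column names.
df = pd.read_csv(infile, header=None)

# Take the leading columns and convert to a plain numpy array for PCA.
data_sentiment = df.iloc[:, :2].to_numpy()
print(data_sentiment.dtype, data_sentiment.shape)
```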

2 Comments

If my answer solves your problem then you should accept it (click the tick next to my answer)
No problem, and welcome to StackOverflow! As a new user of the site, learning how to ask good questions is the most important skill for you to pick up. Please remember to include as much relevant information as you can in your question. If other users have to ask for important details in the comments then they are likely to get impatient with you, and may downvote or close your question instead of trying to answer it.
