In my code I iterate over a file and build three lists:

data, row, column

which I use to construct a sparse matrix. It represents a rating matrix in which user u has rated item i with a rating from 1 to 5. When I check the sparse matrix afterwards, some of the reported ratings are greater than 5, which should not be possible. I checked the file and there is no rating greater than 5, and I also checked the values in the data list and none is greater than 5, so the error probably occurs when building the matrix with sparse.coo_matrix().

See my code below:

from scipy import sparse
import numpy as np

row = []
column = []
data = []

with open(filename, 'r') as f:
    for line in f:
        if not line[0].isdigit():
            continue
        line = line.strip()
        elem = line.split(',')

        userid = int(elem[0].strip())
        businessid = int(elem[1].strip())
        rating = float(elem[2].strip())

        row.append(userid)
        column.append(businessid)
        data.append(rating)

#data = np.array(data)

"""checking if any rating in the file is greater than 5,
and there is not"""
for rating in data:
    if rating > 5:
        print rating

total = sparse.coo_matrix((data, (row, column)), dtype=float).tocsr()

"""Here I check whether any rating in the sparse matrix
is greater than 5, and there is!"""
row = total.nonzero()[0]
column = total.nonzero()[1]

for u in range(len(row)):
    indr = row[u]
    indc = column[u]
    if total[indr, indc] > 5:
        print '---'
        print total[indr, indc]
        print indr
        print indc

And here is the beginning of my file:

user,item,rating
480,0,5
16890,0,2
5768,0,4
319,1,1
4470,1,4
7555,1,5
8768,1,5

Do you have any idea why I'm getting this error when building the matrix?

Thanks a lot!

1 Answer

From the docs for coo_matrix.tocsr():

Duplicate entries will be summed together

(I have no idea why it does this.)
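
A minimal sketch of the effect, using made-up data rather than the file from the question: the pair (0, 1) appears twice with ratings 5 and 4, so after conversion that position holds their sum. Counting the (row, column) pairs before building the matrix is one way to confirm that the input contains duplicates.

```python
from collections import Counter

from scipy import sparse

# Hypothetical ratings; the (user, item) pair (0, 1) appears twice.
row = [0, 0, 2]
column = [1, 1, 0]
data = [5.0, 4.0, 3.0]

total = sparse.coo_matrix((data, (row, column)), dtype=float).tocsr()
print(total[0, 1])  # 5.0 and 4.0 were summed into a single entry

# List the (row, column) pairs that occur more than once in the input.
pair_counts = Counter(zip(row, column))
duplicates = [pair for pair, n in pair_counts.items() if n > 1]
print(duplicates)
```

If duplicates do show up, how to resolve them (keep the latest rating, average them, and so on) depends on the data, so that step is left out here.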

1 Comment

Sparse matrices were developed for linear algebra problems. This summation feature is very convenient when constructing finite element models. MATLAB sparse matrix does the same thing.
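
As a rough illustration of why this is handy (a toy example, not taken from any particular finite element code): each element of a 1D mesh contributes a small 2x2 block, neighbouring elements share a node, and the overlapping contributions have to be added. Appending every triplet and letting the conversion sum them does the assembly. The mesh size and the element matrix are assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse

n_elements = 3  # toy mesh: 3 elements, 4 nodes
k_local = np.array([[1.0, -1.0], [-1.0, 1.0]])  # assumed element matrix

rows, cols, vals = [], [], []
for e in range(n_elements):
    nodes = (e, e + 1)  # the two nodes of element e
    for a in range(2):
        for b in range(2):
            rows.append(nodes[a])
            cols.append(nodes[b])
            vals.append(k_local[a, b])

# Interior nodes belong to two elements; their entries are summed on conversion.
K = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 4)).tocsr()
print(K.toarray())
```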