Efficient way to fill 2d array in Python

Question

I have 3 arrays: array "words" of pairs ["id": "word"] by the length 5000000, array "ids" of unique ids by the length 13000 and array "dict" of unique words (dictionary) by the length 500000. This is my code:

matrix = sparse.lil_matrix((len(ids), len(dict)))
for i in words:
    matrix[id.index(i['id']), dict.index(i['word'])] += 1.0

But it works too slow (I haven't got a matrix after 15 hours of work). Are there any ideas to optimize my code?

Ashwini Chaudhary · Accepted Answer · 2015-05-08 09:51:39Z

2

First of all don't name your array dict, it is confusing as well as hides the built-in type dict.

The problem here is that you're doing everything in quadratic time, so convert your arrays dict and id to a dictionary first where each word or id point to its index.

matrix = sparse.lil_matrix((len(ids), len(dict)))
dict_from_dict = {word: ind for ind, word in enumerate(dict)}
dict_from_id = {id: ind for ind, id in enumerate(id)}
for i in words:
    matrix[dict_from_id[i['id']], dict_from_dict[i['word']] += 1.0

answered May 8, 2015 at 9:51

Ashwini Chaudhary

252k60 gold badges478 silver badges519 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Efficient way to fill 2d array in Python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related