0

I have this table in mysql, where I have the occurrences (CNT column) of each ITEM for each distinct ID:

ID     ITEM      CNT
---------------------
01     093        4
01     129F       2
01     AB56       0
01     BB44       0
01     XH7        0
01     TYE2       1
02     093        0
02     129F       3
02     AB56       1
02     BB44       0
02     XH7        2
02     TYE2       2
03     093        9
03     129F       2
03     AB56       0
03     BB44       1
03     XH7        4
03     TYE2       0
......

I would like to find an efficient way of importing this data from MySQL to Python so I can use them as item count vectors for a clustering procedure, in the form of a list of lists:

[[4,2,0,0,0,1],[0,3,1,0,2,2],[9,2,0,1,4,0]]

where each list represents an ID... I'm dealing with a lot of data (millions of rows) so performance is an issue.. Any help will be appreciated

1 Answer 1

1

Use itertools.groupby:

...
cursor.execute('SELECT ID, CNT FROM table_name ORDER BY ID')
item_count_vector = [
    [cnt for id_, cnt in grp]
    for key, grp in itertools.groupby(cursor.fetchall(), key=lambda row: row[0])
]

OR (in case you use DictCursor-like cursor)

item_count_vector = [
    [d['CNT'] for d in grp]
    for key, grp in itertools.groupby(cursor.fetchall(), key=lambda row: row['ID'])
]

...

>>> import itertools
>>> # Assume following rows are retrieved from DB using cursor.fetchall()
>>> rows = (
...     ('01',4),
...     ('01',2),
...     ('01',0),
...     ('01',0),
...     ('01',0),
...     ('01',1),
...     ('02',0),
...     ('02',3),
...     ('02',1),
...     ('02',0),
...     ('02',2),
...     ('02',2),
...     ('03',9),
...     ('03',2),
...     ('03',0),
...     ('03',1),
...     ('03',4),
...     ('03',0),
... )
>>> [[cnt for id_, cnt in grp] for key, grp in itertools.groupby(rows, key=lambda row: row[0])]
[[4, 2, 0, 0, 0, 1], [0, 3, 1, 0, 2, 2], [9, 2, 0, 1, 4, 0]]
Sign up to request clarification or add additional context in comments.

1 Comment

i got a keyerror:0...is it because fetch all returns it in the form of a dictionary? {'CNT': 4L, 'ID': 01L}, {'CNT': 2L, 'ID': 01L}...

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.