I'm not trying to do this smart or fast, just trying to do it at all.
I have a file that looks like this:
$ cat all_user_token_counts.csv
@5raphaels,in,15
@5raphaels,for,15
@5raphaels,unless,11
@5raphaels,you,11
I know it's Unicode (UTF-8 encoded) because I created it, like this:
debug('opening ' + ALL_USER_TOKEN_COUNTS_FILE)
file = codecs.open(ALL_USER_TOKEN_COUNTS_FILE, encoding="utf-8", mode="w")
for (user, token) in tokenizer.get_tokens_from_all_files():
    # ... count tokens ...
    file.write(unicode(user + "," + token + "," + str(count) + "\r\n"))
I want to read it into a NumPy array so it looks something like this:
array([[u'@5raphaels', u'in', 15],
[u'@5raphaels', u'for', 11],
[u'@5raphaels', u'unless', 11]],
dtype=('<U10', '<U10', int))
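The closest thing I've found is a structured array (one named field per column) rather than a plain 2-D array, since NumPy can't mix unicode and int in a single homogeneous dtype. Here's a minimal sketch using np.genfromtxt; the field names and string widths are guesses on my part:

```python
import io
import numpy as np

# A stand-in for the real file; np.genfromtxt also accepts a filename,
# e.g. np.genfromtxt("all_user_token_counts.csv", ...).
data = io.StringIO(
    u"@5raphaels,in,15\n"
    u"@5raphaels,for,15\n"
    u"@5raphaels,unless,11\n"
    u"@5raphaels,you,11\n"
)

# A structured dtype: one named (name, format) pair per column.
arr = np.genfromtxt(
    data,
    delimiter=",",
    dtype=[("user", "U15"), ("token", "U15"), ("count", "i8")],
    encoding="utf-8",
)

print(arr["token"])        # columns are addressed by field name
print(arr["count"].sum())  # the count column behaves as ordinary ints
```

Each row of `arr` is then a record like `('@5raphaels', 'in', 15)`, and `arr["count"]` gives the numeric column for arithmetic.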
As I experimented while writing this question, it occurred to me that it may not even be possible. If so, I'd love to know!
Thanks in advance!