I am trying to convert the 'feature1' arrays from the following data structure into numpy arrays so I can pass them to sklearn. However, I am going in circles: sklearn always tells me that dtype=object is unsuitable, and I am not able to convert the data to the desired float64 format.
I want to extract every 'feature1' as a list of numpy arrays of dtype=float64, instead of dtype=object, from the following structure.
vec is an object returned from an earlier computation.
>>> vec
[{'is_Primary': 1, 'feature1': [2, 2, 2, 0, 0.03333333333333333, 0], 'object_id': ObjectId('557beda51d41c8e4d1aeac25'), 'vectorized': 1},
{'is_Primary': 0, 'feature1': [2, 2, 1, 0, 0.5, 0], 'object_id': ObjectId('557beda51d41c8e4d1aeac25'), 'vectorized': 1}]
I tried the following:
>>> t = np.array(list(vec))
>>> t
array([ {'is_Primary': 0, 'feature1': [], 'object_id': ObjectId('557bcd881d41c8d9c5f5822f'), 'vectorized': 1},
{'is_Primary': 0, 'feature1': [], 'object_id': ObjectId('557bcd881d41c8d9c5f58233'), 'vectorized': 1},
{'is_Primary': 0, 'feature1': [], 'object_id': ObjectId('557bcd881d41c8d9c5f58237'), 'vectorized': 1},
...,
{'is_Primary': 0, 'feature1': [], 'object_id': ObjectId('557beda61d41c8e4d1aead1f'), 'vectorized': 1},
{'is_Primary': 1, 'feature1': [2, 2, 0, 0], 'object_id': ObjectId('557beda61d41c8e4d1aead1d'), 'vectorized': 1},
{'is_Primary': 1, 'feature1': [], 'object_id': ObjectId('557beda61d41c8e4d1aead27'), 'vectorized': 1}], dtype=object)
Also,
>>> array = np.array([x['feature1'] for x in vec])
as suggested by another user, gives a similar output:
>>> array
array([[], [], [], ..., [], [2, 2, 0, 0], []], dtype=object)
I know I can access the contents of 'feature1' using array[i], but what I want is to convert the dtype=object data to dtype=float64 and put it into a list/dict in which each row holds the 'feature1' of the corresponding entry from vec.
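To make the goal concrete, here is a minimal sketch of the result I am after: one float64 array per entry of vec, kept in a plain Python list. The sample data is a shortened, hypothetical stand-in for vec (the 'object_id' field is left out so the snippet runs without a MongoDB driver), and `features` is just an illustrative name.

```python
import numpy as np

# Hypothetical stand-in for vec; 'object_id' is omitted on purpose.
vec = [
    {'is_Primary': 1, 'feature1': [2, 2, 2, 0, 0.03333333333333333, 0], 'vectorized': 1},
    {'is_Primary': 0, 'feature1': [2, 2, 1, 0, 0.5, 0], 'vectorized': 1},
    {'is_Primary': 0, 'feature1': [], 'vectorized': 1},
]

# Convert each 'feature1' list on its own, so rows of different
# lengths never have to fit into a single rectangular array.
features = [np.asarray(d['feature1'], dtype=np.float64) for d in vec]

for f in features:
    print(f.dtype, f.shape)
```

Each element of the list has dtype float64, including the empty ones, but the list as a whole is still not a 2-D matrix.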
I also tried using a pandas dataframe, but to no avail.
>>> pandaseries = pd.Series(df['feature1']).convert_objects(convert_numeric=True)
>>> pandaseries
0 []
1 []
2 []
3 []
4 []
5 []
6 []
7 []
8 []
9 []
10 []
11 []
12 []
13 []
14 []
...
7021 []
7022 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 12, 2, 24...
7023 []
7024 []
7025 []
7026 []
7027 []
7028 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 12, 2, 24...
7029 []
7030 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 12, 2, 24...
7031 []
7032 [2, 2, 0.1, 0]
7033 []
7034 [2, 2, 0, 0]
7035 []
Name: feature1, Length: 7036, dtype: object
>>>
Again, dtype: object is returned. My guess would be to loop over each row and print a list out. But I am unable to do that. Maybe it is a newbie question. What am I doing wrong?
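One thing I suspect, but have not confirmed, is that the rows simply have different lengths (many are empty), so numpy cannot build a rectangular float64 array and falls back to dtype=object. Below is a rough sketch of padding every row to the same width with NaN. The helper name `pad_rows`, the NaN fill value, and the sample rows are all my own assumptions for illustration; whether padding is even appropriate for sklearn depends on the estimator, since many of them reject NaN.

```python
import numpy as np

# Made-up rows of unequal length, mimicking the 'feature1' lists.
rows = [[], [2, 2, 0, 0], [2, 2, 0.1, 0], [2, 2, 1, 0, 0.5, 0]]

def pad_rows(rows, fill=np.nan):
    """Return a 2-D float64 array; short rows are right-padded with `fill`."""
    width = max((len(r) for r in rows), default=0)
    out = np.full((len(rows), width), fill, dtype=np.float64)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r  # copy the values that exist, leave the rest as fill
    return out

X = pad_rows(rows)
print(X.dtype, X.shape)
```

Dropping the empty rows before padding, or filling with 0 instead of NaN, would be alternatives under the same assumptions.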
Thanks.
vec contains two dictionaries, each of which has a 'feature1' item. Which one do you want?