Converting a string into a NumPy array

Question

I have the string:

'0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

I basically want to feed a dataframe with a column of strings like the one above to a 1D CNN for binary classification, so I need to convert them to NumPy arrays before training the model.

How can I convert these strings to NumPy arrays and preserve their features, given the "-" character between some of the numbers?

How should the - be interpreted? For instance, what does 100-71 mean?
– Balaji Ambresh, Commented Jun 15, 2020 at 17:23
@Balaji Ambresh Each comma-separated value is a 15-minute interval in a person's day, running from 00:00 to 23:59; the first value covers 00:00 to 00:15, and so on. A '0' means we don't know what the person is doing in that interval. A single number, for example 71, means the person is at a place coded by that number (71 is church). Two numbers such as 100-71 mean the person is at two different places during that interval.
– ali bakhtiari, Commented Jun 15, 2020 at 17:30
@Balaji Ambresh There is no such input. All inputs have 96 comma-separated values. The output of my model will be a gender classification, 0 or 1: 0 is female, 1 is male. I intend to train a model on these strings, each of which has a gender label.
– ali bakhtiari, Commented Jun 15, 2020 at 17:44

qedk · Accepted Answer · 2020-06-15 17:29:14Z


import numpy as np

inp = "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
arr = np.array(inp.split(","))

This gives an array of strings. If you want numbers instead, pass dtype=np.uint8, but you first have to pre-process the entries that contain - in whatever way suits your data (for example with replace()).
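
One possible way to do that pre-processing is sketched below. It assumes that a hyphenated entry such as 100-71 can be read as a (first place, second place) pair, that no entry has more than two codes, and that every code fits in 0..255 so uint8 is safe. The `sample` string is a shortened stand-in for the full 96-value string, and the helper name `to_pairs` is hypothetical.

```python
import numpy as np

# Shortened stand-in for the 96-value string from the question.
sample = "0,0,73-100,100-51,51,51-100,0"

def to_pairs(s):
    """Return an (n_intervals, 2) uint8 array of (first, second) codes."""
    rows = []
    for token in s.split(","):
        parts = [int(p) for p in token.split("-")]
        if len(parts) == 1:
            # Single location: repeat it so every row has two columns.
            parts = parts * 2
        elif len(parts) != 2:
            # Assumption: an interval never lists more than two places.
            raise ValueError(f"unexpected entry: {token!r}")
        rows.append(parts)
    return np.array(rows, dtype=np.uint8)

pairs = to_pairs(sample)
print(pairs.shape)  # one row per interval, two columns
print(pairs)
```

Applied to a full string, this would give an array with one row per 15-minute interval and two columns, which could be fed to a 1D CNN as a sequence with two channels if that reading of the data is acceptable.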


Balaji Ambresh · Accepted Answer · 2020-06-15 18:01:13Z

Is this acceptable?

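I'm using negative codes to ensure that they don't collide with any of your location codes. Each distinct pair gets its own negative code, and the order within a pair is ignored, so 100-51 and 51-100 map to the same code. You get the idea: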

locations = '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

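import numpy as np

code = 0        # last negative code handed out
mappings = {}   # 'small-large' pair -> negative code
mapped_locations = []
for location in locations.split(','):
    if '-' in location:
        # Two places in one interval: normalise the order so that
        # '100-51' and '51-100' share one key.
        parts = [int(part) for part in location.split('-')]
        small, large = min(parts), max(parts)
        key = f'{small}-{large}'
        if key not in mappings:
            code -= 1
            mappings[key] = code
        mapped_locations.append(mappings[key])
    else:
        # Single location code (or 0 for unknown): keep it as is.
        mapped_locations.append(int(location))
print(np.array(mapped_locations))
print()
print(mappings)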

Output:

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0 -1 -2 51 51 -2 -3 52 52 52 52 52 52 52 -3 -4  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]

{'73-100': -1, '51-100': -2, '52-100': -3, '71-100': -4}
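
Since the question is about a whole dataframe column of such strings, the same idea could be wrapped in a function that reuses one mappings dictionary across rows, so that a given pair always gets the same negative code. This is only a sketch: the function name `encode_row`, the two shortened sample strings, and the list `rows` (standing in for the dataframe column) are assumptions, not part of the code above. It also assumes every hyphenated entry has exactly two codes.

```python
import numpy as np

def encode_row(locations, mappings):
    """Encode one comma-separated string, updating `mappings` in place."""
    encoded = []
    for location in locations.split(','):
        if '-' in location:
            # Order within a pair is ignored, as in the code above.
            small, large = sorted(int(part) for part in location.split('-'))
            key = f'{small}-{large}'
            if key not in mappings:
                # Next unused negative code: -1, -2, -3, ...
                mappings[key] = -(len(mappings) + 1)
            encoded.append(mappings[key])
        else:
            encoded.append(int(location))
    return encoded

# Hypothetical stand-in for the dataframe column (shortened strings).
rows = ['0,73-100,100-51,51', '0,51-100,52,100-73']

mappings = {}  # shared across all rows
X = np.array([encode_row(row, mappings) for row in rows])
print(X)
print(mappings)
```

With real data, every string has the same number of values, so the result is a 2D array with one row per person. The final `mappings` dictionary would need to be kept and reused when encoding any data seen later, otherwise the negative codes would not line up.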
