
I have the string:

'0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

I basically want to feed a dataframe column of strings like the one above to a 1D CNN for binary classification, so I need to convert them to NumPy arrays before training a model.

How can I convert these strings to a NumPy array and preserve their features, given the "-" character between some numbers?

9
  • 4
    How should the - be interpreted? For instance, what does 100-71 mean? Commented Jun 15, 2020 at 17:23
  • 2
    @Balaji Ambresh Each comma-separated number is a 15-minute interval in a person's day, from 00:00 to 23:59; the first number covers 00:00 to 00:15, and so on. '0' means we don't know what this person is doing in that interval. A single number, for example 71, means the person is at a place coded by that number (71 is church). Two numbers, like 100-71, mean the person is at two different places within that interval. Commented Jun 15, 2020 at 17:30
  • 2
    @skrrrt unfortunately no. Commented Jun 15, 2020 at 17:32
  • 2
    @Balaji Ambresh There is no such input. All inputs have 96 comma-separated values. The output of my model will be a gender classification, 0 or 1: 0 is female, 1 is male. I intend to train a model on these strings, each of which has a gender label. Commented Jun 15, 2020 at 17:44
  • 2
    @Balaji Ambresh I suppose a unique code. Commented Jun 15, 2020 at 17:46

2 Answers

1
import numpy as np

inp = "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
arr = np.array(inp.split(","))

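This gives an array of strings. If you want numbers instead, pass dtype=np.uint8, but you first have to pre-process the entries containing - in whatever way suits you (using replace(), etc.), because a token such as 73-100 cannot be converted to an integer directly.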
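One possible way to do that pre-processing, as a minimal sketch rather than the only option: split every token on - and keep the first and last place as two separate channels, repeating the code when the interval has a single place. The function name to_two_channels is hypothetical, and np.uint8 assumes every location code fits in 0..255 (true for the sample, not confirmed for the full data).

```python
import numpy as np

def to_two_channels(s):
    """Turn 'a,b-c,...' into an array of shape (n_intervals, 2)."""
    rows = []
    for token in s.split(","):
        # '73-100' -> [73, 100]; '51' -> [51]
        parts = [int(p) for p in token.split("-")]
        # Single-place intervals repeat the same code in both channels.
        rows.append((parts[0], parts[-1]))
    # Assumption: all codes fit in an unsigned 8-bit integer.
    return np.array(rows, dtype=np.uint8)

sample = "0,73-100,100-51,51"
arr = to_two_channels(sample)
print(arr.shape)  # (4, 2)
```

With the full 96-interval string this would give a (96, 2) array, which has the (steps, channels) layout a 1D convolution typically expects. Unlike sorting the pair, it keeps the order in which the two places appear.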


1

Is this acceptable?

I'm using negative codes to ensure that they don't collide with any of your location codes. You get the idea:

locations = '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

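import numpy as np

code = 0        # most recent negative code handed out
mappings = {}   # 'small-large' pair -> negative code
mapped_locations = []
for location in locations.split(','):
    if '-' in location:
        # Two places in one interval: order the pair so that
        # '73-100' and '100-73' share the same key.
        parts = [int(part) for part in location.split('-')]
        small, large = min(parts), max(parts)
        key = f'{small}-{large}'
        if key not in mappings:
            # First time this pair is seen: assign the next negative code.
            code -= 1
            mappings[key] = code
        mapped_locations.append(mappings[key])
    else:
        # Single place (or 0 for unknown): keep the code as-is.
        mapped_locations.append(int(location))
print(np.array(mapped_locations))
print()
print(mappings)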

Output:

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0 -1 -2 51 51 -2 -3 52 52 52 52 52 52 52 -3 -4  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]

{'73-100': -1, '51-100': -2, '52-100': -3, '71-100': -4}
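Since the question is about a whole column of such strings, the same idea can be extended so that one mapping is shared across all rows. The following is a rough sketch under assumptions: encode_rows is a hypothetical helper, it takes any iterable of strings (for a dataframe, something like the column converted to a list), and it assumes every row has the same number of intervals.

```python
import numpy as np

def encode_rows(rows):
    """Encode many location strings with one shared pair-to-code mapping."""
    mappings = {}   # 'small-large' pair -> negative code
    encoded = []
    for row in rows:
        codes = []
        for token in row.split(","):
            if "-" in token:
                parts = [int(p) for p in token.split("-")]
                key = f"{min(parts)}-{max(parts)}"
                if key not in mappings:
                    # Next unused negative code: -1, -2, ...
                    mappings[key] = -(len(mappings) + 1)
                codes.append(mappings[key])
            else:
                codes.append(int(token))
        encoded.append(codes)
    # Assumption: all rows have equal length, so this is a 2-D array.
    return np.array(encoded, dtype=np.int16), mappings

rows = ["0,73-100,51", "100-73,0,52-100"]
X, mappings = encode_rows(rows)
print(X.shape)   # (2, 3)
print(mappings)  # {'73-100': -1, '52-100': -2}
```

Keep in mind that these integers are category labels rather than magnitudes, so it may be worth passing them through an embedding or one-hot encoding before the convolution layers instead of feeding the raw values.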
