
I'm a noob in Python!

  1. I'd like to put the sequences and the anomaly labels together in one table, like this: [image: sequence and anomaly]

  2. Then sort out only the normal sequences (a sequence is normal if its value in the anomaly column is 0).

  3. Then turn the normal sequences into a NumPy array (without the anomaly column).

Each row (sequence) is one session, so in this case there are 6 independent sequences. Each element represents a specific activity.

```python
import numpy as np

sequence = np.array([[5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 300, 200, 100]])

anomaly = np.array([0, 0, 0, 0, 0, 1])
```

I have these two variables and need to sort out only the normal sequences.
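Since both inputs are already NumPy arrays, steps 2 and 3 may not need pandas at all. A minimal sketch, assuming anomaly holds exactly one label per row of sequence: comparing anomaly to 0 gives a boolean mask, and indexing the rows with it keeps only the normal sessions while preserving the 2D shape.

```python
import numpy as np

sequence = np.array([[5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 300, 200, 100]])
anomaly = np.array([0, 0, 0, 0, 0, 1])

# anomaly == 0 is a boolean array of length 6; indexing the rows with it
# keeps only the normal sessions and leaves each row intact
normal = sequence[anomaly == 0]

print(normal.shape)  # (5, 6)
```

Because the mask selects whole rows, the result is a regular integer array, so no reshaping is needed.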

Here is the code I tried:

```python
import pandas as pd

# sequence to DataFrame: one row per session, the whole row array in a single cell
# (DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first)
rows = []
for i in range(sequence.shape[0]):
    rows.append({"Sequence": sequence[i]})
empty_df = pd.DataFrame(rows, columns=["Sequence"])

# concat anomaly as a second column
anomaly_df = pd.DataFrame(anomaly)
df = pd.concat([empty_df, anomaly_df], axis=1)
df.columns = ["Sequence", "anomaly"]
df
```

I didn't want to use pd.DataFrame(sequence) directly, because it gives me this:

[image: output of pd.DataFrame(sequence)]
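That said, the wide frame from pd.DataFrame(sequence), with one column per position, may actually be the easier route for steps 2 and 3: the label can be added as one more column, used for filtering, and dropped again, after which to_numpy() returns a plain 2D integer array. A sketch; the column name anomaly is simply a choice made here.

```python
import numpy as np
import pandas as pd

sequence = np.array([[5, 1, 1, 0, 0, 0]] * 5 + [[5, 1, 1, 300, 200, 100]])
anomaly = np.array([0, 0, 0, 0, 0, 1])

# One column per position (labelled 0..5), plus the label as an extra column
wide = pd.DataFrame(sequence)
wide["anomaly"] = anomaly

# Keep the normal rows, drop the label column, and go back to NumPy
normal = wide[wide["anomaly"] == 0].drop(columns="anomaly").to_numpy()
print(normal.shape)  # (5, 6)
```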

Anyway, after making df, I tried to sort out the normal sequences:

```python
# keep only the normal sequences
normal = df[df['anomaly'] == 0]['Sequence']
# back to NumPy, Sequence column only
normal = normal.to_numpy()
normal.shape
```

But this array has a different shape from the variable sequence: sequence.shape is (6, 6), while normal.shape is (5,).

I want (5, 6). I tried reshape, but it didn't work. Can someone help me with this? If anything in my question is unclear, please leave a comment. I appreciate it.
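Why (5,)? The Sequence column holds one Python object per row, so to_numpy() returns a 1D object array with 5 elements rather than a grid of 30 integers, and reshape(5, 6) fails because 5 elements cannot fill 30 slots. The sketch below rebuilds df with sequence.tolist() (so each cell is a list rather than an array; the shapes come out the same) and uses np.stack as one way to rebuild the 2D array.

```python
import numpy as np
import pandas as pd

sequence = np.array([[5, 1, 1, 0, 0, 0]] * 5 + [[5, 1, 1, 300, 200, 100]])
anomaly = np.array([0, 0, 0, 0, 0, 1])
df = pd.DataFrame({"Sequence": sequence.tolist(), "anomaly": anomaly})

normal = df[df["anomaly"] == 0]["Sequence"].to_numpy()
print(normal.shape, normal.dtype)  # (5,) object: five list objects

try:
    normal.reshape(5, 6)  # only 5 elements, so a 5 x 6 shape is impossible
except ValueError as err:
    print("reshape failed:", err)

# np.stack turns each element into one row of a new 2D array
stacked = np.stack(normal)
print(stacked.shape)  # (5, 6)
```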

  • What do you mean by sorting? It looks like it's already sorted from lowest to highest. (Commented Nov 20, 2020 at 8:54)

2 Answers

Answer 1 (score 2)

I'm not quite sure what you need, but here is what you could do:

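```python
import pandas as pd

# sequence.tolist() turns the 6 x 6 array into a list of 6 lists,
# so each cell of the 'sequence' column holds one whole row
df = pd.DataFrame({'sequence': sequence.tolist(), 'anomaly': anomaly})
df
```

```
                  sequence  anomaly
0        [5, 1, 1, 0, 0, 0]        0
1        [5, 1, 1, 0, 0, 0]        0
2        [5, 1, 1, 0, 0, 0]        0
3        [5, 1, 1, 0, 0, 0]        0
4        [5, 1, 1, 0, 0, 0]        0
5  [5, 1, 1, 300, 200, 100]        1
```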

Answer 2 (score 1)

Convert the column into a list, then create an array from it. Try:

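```python
import numpy as np

# select the Sequence cells of the normal rows
normal = df.loc[df['anomaly'].eq(0), 'Sequence']
# a list of equal-length lists becomes a 2D array
normal = np.array(normal.tolist())
print(normal.shape)
# (5, 6)
```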

4 Comments

That worked perfectly! Thank you for your succinct and correct answer! But could you explain why your approach works? How come I got (5,) and you got (5, 6) by turning the DataFrame column into a list and then into an array?
Or you could simply tell me the steps you took!
Considering only the first row: type(np.array(normal.to_numpy())[0]) gives list, while type(np.array(normal.tolist())[0]) gives numpy.ndarray. One gives you a list object and the other gives a NumPy array.
You get (5, 6) when you have a 2D array. A 1D array whose values are lists gives you (5,). I passed a list of lists to np.array, which gave me a 2D array.
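The difference described in these comments can be checked directly. A sketch, assuming the cells of the Sequence column are Python lists (built here with sequence.tolist(), as in the first answer); it prints the number of dimensions and the type of the first element for both conversions.

```python
import numpy as np
import pandas as pd

sequence = np.array([[5, 1, 1, 0, 0, 0]] * 5 + [[5, 1, 1, 300, 200, 100]])
anomaly = np.array([0, 0, 0, 0, 0, 1])
df = pd.DataFrame({"Sequence": sequence.tolist(), "anomaly": anomaly})
normal = df.loc[df["anomaly"].eq(0), "Sequence"]

# to_numpy() keeps the column's object dtype: a 1D array holding 5 lists
as_objects = np.array(normal.to_numpy())
print(as_objects.ndim, type(as_objects[0]).__name__)  # 1 list

# tolist() gives a plain list of lists, which np.array reads as a 2D grid
as_grid = np.array(normal.tolist())
print(as_grid.ndim, type(as_grid[0]).__name__)  # 2 ndarray
```

np.array does not look inside an existing object array, so the lists stay wrapped; a fresh list of lists, on the other hand, is parsed level by level into rows and columns.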
