
I have data like below:

idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
2 0.7 0.7 0.2 vegetables
3 0.8 0.9 0.4 fruits
4 0.9 1.0 0.8 metal
idx E 
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
1 0.8 1.6 1.0 books
2 0.9 1.9 1.1 textile
idx E
0 1.000009
idx A B C D
0 0.7 2.5 2 mouse
1 0.6 2.9 3 animals
2 0.5 3.0 2 birds
3 0.9 3.3 4 flower
4 1.0 3.4 5 garden
5 1.0 3.8 1 desk
6 0.85 3.9 8 tea
7 0.2 4.2 9 bread
8 0.1 4.9 3 paper
9 0.7 7.6 6 butter
idx E
0 0.9

Wherever there is an "idx E" row, I want to remove the repeated header, repeat the last row of the block above it with a dot instead of the value in column D (and 0.0 in column C), and move E into its own column, with its value repeated for the whole corresponding block. I want to do this conditionally with Python, so that the result looks like this:

idx A B C D E
0 0.0 0.0 0.0 apple 0.000006
1 0.5 0.5 0.6 car 0.000006
2 0.7 0.7 0.2 vegetables 0.000006
3 0.8 0.9 0.4 fruits 0.000006
4 0.9 1.0 0.8 metal 0.000006
5 0.9 1.0 0.0 . 0.000006
6 1.0 1.1 0.1 computer 1.000009
7 0.8 1.6 1.0 books 1.000009
8 0.9 1.9 1.1 textile 1.000009
9 0.9 1.9 0.0 . 1.000009
10 0.7 2.5 2 mouse 0.9
11 0.6 2.9 3 animals 0.9
12 0.5 3.0 2 birds 0.9
13 0.9 3.3 4 flower 0.9
14 1.0 3.4 5 garden 0.9
15 1.0 3.8 1 desk 0.9
16 0.85 3.9 8 tea 0.9
17 0.2 4.2 9 bread 0.9
18 0.1 4.9 3 paper 0.9
19 0.7 7.6 6 butter 0.9
20 0.7 7.6 0.0 . 0.9

Is there any way to do this with a conditional loop over such a DataFrame?

  • Is Pandas required here or do you just want to produce a clean CSV file? Commented Jan 26, 2021 at 14:25
  • A clean file would also be best, since I want to export it as JSON in the end. Commented Jan 26, 2021 at 14:26
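For the JSON export mentioned in the comments, a minimal sketch, assuming the final table ends up in a pandas DataFrame (the two rows below are a hypothetical excerpt, not the full result): DataFrame.to_json with orient='records' produces one JSON object per row.

```python
import json

import pandas as pd

# Hypothetical two-row excerpt of the final table
df = pd.DataFrame({
    'A': [0.0, 0.9], 'B': [0.0, 1.0], 'C': [0.0, 0.0],
    'D': ['apple', '.'], 'E': [0.000006, 0.000006],
})

# One JSON object per row; to_json returns a string when no path is given
out = df.to_json(orient='records')
records = json.loads(out)   # parse it back to inspect the structure
print(out)
```

Passing a file path as the first argument of to_json writes the same content to disk instead of returning a string.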

2 Answers


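First remove only the repeated header rows (those whose column A holds the literal string 'A') by boolean indexing with Series.ne, and create a default index. The "idx E" rows must stay for now, because they mark where each block ends: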

# drop the repeated "idx A B C D" headers, keep the "idx E" marker rows
df = df[df['A'].ne('A')].reset_index(drop=True)
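The answer does not show how the data is loaded. A minimal sketch, assuming the raw text is whitespace-separated and read with pd.read_csv (the RAW string is a hypothetical excerpt of the question's data, inlined with io.StringIO), using the first "idx A B C D" line as the header, idx as the index, and every column kept as text. Each repeated header then appears as a row whose column A holds the string 'A', and each "idx E" line as a row whose column A holds 'E':

```python
import io

import pandas as pd

# Hypothetical sample: the first two blocks of the question's data
RAW = """idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
idx E
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
idx E
0 1.000009
"""

# Whitespace-separated; short rows ("idx E", "0 0.000006") are padded with NaN
df = pd.read_csv(io.StringIO(RAW), sep=r'\s+', index_col='idx', dtype=str)

# Drop only the repeated "idx A B C D" headers; keep the "idx E" marker rows
df = df[df['A'].ne('A')].reset_index(drop=True)
print(df)
```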

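Then build two masks: m marks the rows holding an E value (the row right after each "idx E" header) and m1 marks the "idx E" header rows themselves. Create column E by keeping only the values in the m rows with Series.where and back-filling the missing values; blank out A, B and C in both kinds of marker rows with DataFrame.mask and forward-fill them, so they repeat the last data row of the block; finally set '.' in column D and 0 in column C, and drop the header rows: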

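m = df['A'].shift().eq('E')     # rows holding an E value (right after "idx E")
m1 = df['A'].eq('E')            # the "idx E" marker rows themselves

# E column: keep only the E values, then back-fill them over their block
df['E'] = df['A'].where(m).bfill()

# blank A, B, C in the marker rows and forward-fill them from the last data row
df[['A','B', 'C']] = df[['A','B', 'C']].mask(m | m1).ffill()
df.loc[m, 'D'] = '.'            # dot in column D of the repeated row
df.loc[m, 'C'] = 0              # and 0 in column C

df = df[~m1].reset_index(drop=True)   # drop the "idx E" rows
print(df)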
       A    B    C           D         E
0    0.0  0.0  0.0       apple  0.000006
1    0.5  0.5  0.6         car  0.000006
2    0.7  0.7  0.2  vegetables  0.000006
3    0.8  0.9  0.4      fruits  0.000006
4    0.9  1.0  0.8       metal  0.000006
5    0.9  1.0    0           .  0.000006
6    1.0  1.1  0.1    computer  1.000009
7    0.8  1.6  1.0       books  1.000009
8    0.9  1.9  1.1     textile  1.000009
9    0.9  1.9    0           .  1.000009
10   0.7  2.5    2       mouse       0.9
11   0.6  2.9    3     animals       0.9
12   0.5  3.0    2       birds       0.9
13   0.9  3.3    4      flower       0.9
14   1.0  3.4    5      garden       0.9
15   1.0  3.8    1        desk       0.9
16  0.85  3.9    8         tea       0.9
17   0.2  4.2    9       bread       0.9
18   0.1  4.9    3       paper       0.9
19   0.7  7.6    6      butter       0.9
20   0.7  7.6    0           .       0.9
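Putting both steps together on a small excerpt, a self-contained sketch under the same assumptions as the answer (hypothetical RAW input, every column read as text with dtype=str). Because the columns are strings here, column C of the repeated row is set to the string '0.0' rather than the integer 0:

```python
import io

import pandas as pd

# Hypothetical sample: the first two blocks of the question's data
RAW = """idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
idx E
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
1 0.8 1.6 1.0 books
idx E
0 1.000009
"""

df = pd.read_csv(io.StringIO(RAW), sep=r'\s+', index_col='idx', dtype=str)
df = df[df['A'].ne('A')].reset_index(drop=True)   # drop repeated headers

m = df['A'].shift().eq('E')    # rows holding an E value
m1 = df['A'].eq('E')           # the "idx E" marker rows

df['E'] = df['A'].where(m).bfill()                              # spread E over its block
df[['A', 'B', 'C']] = df[['A', 'B', 'C']].mask(m | m1).ffill()  # repeat the last data row
df.loc[m, 'D'] = '.'           # dot in column D
df.loc[m, 'C'] = '0.0'         # 0.0 in column C (text, like the other cells)
df = df[~m1].reset_index(drop=True)
print(df)
```

The row-wise masking works because DataFrame.mask broadcasts a boolean Series across all selected columns, aligned on the index.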

8 Comments

Sorry, the column C data was missing: in the row where the dot goes, column C should hold 0. I have added the column C data to the input and updated the expected result as well. Could you help with this condition?
Slightly off: the dot should go in column D, and where the dot currently appears there should be a 0.
In the input data, the Nr value is not on the same line but in the cell below. I typed the data here instead of copy-pasting it, so it was wrong. Could you please readjust the loop?
@ML85 - does the 0.9 value have no index value in idx?
Two unwanted rows with NaN cells were generated, but with dropna it worked fairly well. Thank you so much, great help.

I would not use pandas here; I would fall back to the good old csv module, which IMHO is more versatile for processing a file that is not in true CSV format:

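import csv

input_csv = 'input.txt'     # placeholder: path of the raw file
output_csv = 'output.txt'   # placeholder: path of the result
delimiter = ' '             # put here the actual delimiter

with open(input_csv, newline='') as infile, open(output_csv, 'w', newline='') as outfile:
    rd = csv.reader(infile, delimiter=delimiter)
    wr = csv.writer(outfile, delimiter=delimiter)
    wr.writerow(['idx', 'A', 'B', 'C', 'D', 'E'])    # write a header
    nb = 0                                           # record number
    pool = []                                        # rows of the current block
    flag = False                                     # True when the next row holds the E value

    for row in rd:
        row = [field for field in row if field]      # drop empty fields (extra spaces)
        if not row:                                  # skip blank lines
            continue
        if flag:
            e = row[-1]           # use last value
            for r in pool:        # copy it for the whole block
                r.append(e)
            wr.writerows(pool)    # write the block
            flag = False
        elif row[0] == 'idx':
            if row[1] == 'E':
                pool.append(pool[-1][:])   # repeat the last row of the block
                pool[-1][-1] = '.'         # with a dot in column D
                pool[-1][0] = nb
                nb += 1
                flag = True
            else:
                pool = []         # a new block starts
        else:
            row[0] = nb           # renumber the rows continuously
            pool.append(row)
            nb += 1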

With your input, it gives:

idx A B C D E
0 0.0 0.0 0.0 apple 0.000006
1 0.5 0.5 0.6 car 0.000006
2 0.7 0.7 0.2 vegetables 0.000006
3 0.8 0.9 0.4 fruits 0.000006
4 0.9 1.0 0.8 metal 0.000006
5 0.9 1.0 0.8 . 0.000006
6 1.0 1.1 0.1 computer 1.000009
7 0.8 1.6 1.0 books 1.000009
8 0.9 1.9 1.1 textile 1.000009
9 0.9 1.9 1.1 . 1.000009
10 0.7 2.5 2 mouse 0.9
11 0.6 2.9 3 animals 0.9
12 0.5 3.0 2 birds 0.9
13 0.9 3.3 4 flower 0.9
14 1.0 3.4 5 garden 0.9
15 1.0 3.8 1 desk 0.9
16 0.85 3.9 8 tea 0.9
17 0.2 4.2 9 bread 0.9
18 0.1 4.9 3 paper 0.9
19 0.7 7.6 6 butter 0.9
20 0.7 7.6 6 . 0.9
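The loop above repeats the previous value of column C in the added row, while the updated question asks for 0.0 there. A hedged adaptation, wrapping the same loop in a hypothetical merge_blocks function that returns the rows instead of writing a file (so it can be tried on an in-memory excerpt) and also zeroing column C:

```python
import csv
import io


def merge_blocks(lines, delimiter=' '):
    """Hypothetical helper: the loop above, returning rows instead of writing them."""
    out = [['idx', 'A', 'B', 'C', 'D', 'E']]
    pool, flag, nb = [], False, 0
    for row in csv.reader(lines, delimiter=delimiter):
        row = [f for f in row if f]      # drop empty fields from extra spaces
        if not row:
            continue
        if flag:
            e = row[-1]                  # the E value of the block
            out.extend(r + [e] for r in pool)
            flag = False
        elif row[0] == 'idx':
            if row[1] == 'E':
                extra = pool[-1][:]      # repeat the last row of the block
                extra[0] = str(nb)
                extra[3] = '0.0'         # column C becomes 0.0, as in the expected output
                extra[4] = '.'           # column D becomes a dot
                pool.append(extra)
                nb += 1
                flag = True
            else:
                pool = []                # a new block starts
        else:
            row[0] = str(nb)             # renumber the rows continuously
            pool.append(row)
            nb += 1
    return out


# Hypothetical excerpt of the question's input
RAW = """idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
idx E
0 0.000006
"""
for r in merge_blocks(io.StringIO(RAW)):
    print(' '.join(r))
```

To write a file instead, pass the opened input file to merge_blocks and feed the result to csv.writer(...).writerows.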
