CSV conditional changes in rows and column

Question

I have data like below:

idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
2 0.7 0.7 0.2 vegetables
3 0.8 0.9 0.4 fruits
4 0.9 1.0 0.8 metal
idx E 
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
1 0.8 1.6 1.0 books
2 0.9 1.9 1.1 textile
idx E
0 1.000009
idx A B C D
0 0.7 2.5 2 mouse
1 0.6 2.9 3 animals
2 0.5 3.0 2 birds
3 0.9 3.3 4 flower
4 1.0 3.4 5 garden
5 1.0 3.8 1 desk
6 0.85 3.9 8 tea
7 0.2 4.2 9 bread
8 0.1 4.9 3 paper
9 0.7 7.6 6 butter
idx E
0 0.9

Wherever an idx E block appears, I want to remove the repeated header, repeat the last row above it with a dot in place of the value in column D (and 0.0 in column C), and move E into its own column, with its value repeated for the whole corresponding block. I want to make these changes conditionally with Python, so that the result looks like this:

idx A B C D E
0 0.0 0.0 0.0 apple 0.000006
1 0.5 0.5 0.6 car 0.000006
2 0.7 0.7 0.2 vegetables 0.000006
3 0.8 0.9 0.4 fruits 0.000006
4 0.9 1.0 0.8 metal 0.000006
5 0.9 1.0 0.0 . 0.000006
6 1.0 1.1 0.1 computer 1.000009
7 0.8 1.6 1.0 books 1.000009
8 0.9 1.9 1.1 textile 1.000009
9 0.9 1.9 0.0 . 1.000009
10 0.7 2.5 2 mouse 0.9
11 0.6 2.9 3 animals 0.9
12 0.5 3.0 2 birds 0.9
13 0.9 3.3 4 flower 0.9
14 1.0 3.4 5 garden 0.9
15 1.0 3.8 1 desk 0.9
16 0.85 3.9 8 tea 0.9
17 0.2 4.2 9 bread 0.9
18 0.1 4.9 3 paper 0.9
19 0.7 7.6 6 butter 0.9
20 0.7 7.6 0.0 . 0.9

Is it possible to do this with a conditional loop over such a dataframe?

Is Pandas required here or do you just want to produce a clean CSV file?
– Serge Ballesta, Commented Jan 26, 2021 at 14:25

jezrael · Accepted Answer · 2021-01-26 15:04:31Z

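First remove the repeated header rows (those with A in column A) by Series.isin with an inverted mask (~) in boolean indexing, and create a default index. The E header rows must stay for now, because the masks in the next step look for them:

df = df[~df['A'].isin(['A'])].reset_index(drop=True)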

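Then build two masks: m marks the rows that hold the E value (the row right after an E header) and m1 marks the E header rows themselves. Create column E from column A on the value rows by Series.where and back fill the missing values. Next, set A, B and C to missing values on both kinds of rows by DataFrame.mask and forward fill them, so they repeat the last data row. Last, set . in column D and 0 in column C on the value rows and drop the E header rows: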

m = df['A'].shift().eq('E')
m1 = df['A'].eq('E')

df['E'] = df['A'].where(m).bfill()

df[['A','B', 'C']] = df[['A','B', 'C']].mask(m | m1).ffill()
df.loc[m, 'D'] = '.'
df.loc[m, 'C'] = 0

df = df[~m1].reset_index(drop=True)
print (df)
       A    B    C           D         E
0    0.0  0.0  0.0       apple  0.000006
1    0.5  0.5  0.6         car  0.000006
2    0.7  0.7  0.2  vegetables  0.000006
3    0.8  0.9  0.4      fruits  0.000006
4    0.9  1.0  0.8       metal  0.000006
5    0.9  1.0    0           .  0.000006
6    1.0  1.1  0.1    computer  1.000009
7    0.8  1.6  1.0       books  1.000009
8    0.9  1.9  1.1     textile  1.000009
9    0.9  1.9    0           .  1.000009
10   0.7  2.5    2       mouse       0.9
11   0.6  2.9    3     animals       0.9
12   0.5  3.0    2       birds       0.9
13   0.9  3.3    4      flower       0.9
14   1.0  3.4    5      garden       0.9
15   1.0  3.8    1        desk       0.9
16  0.85  3.9    8         tea       0.9
17   0.2  4.2    9       bread       0.9
18   0.1  4.9    3       paper       0.9
19   0.7  7.6    6      butter       0.9
20   0.7  7.6    0           .       0.9
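
For reference, here is a minimal end-to-end sketch of the steps above on a shortened sample. It assumes the data is whitespace-separated and is read with idx as the index and every cell as text. The raw string, the read_csv arguments and the '0' written as a string are assumptions made for illustration, not part of the original answer. It drops only the repeated idx A B C D header rows before the masks are built.

```python
import io
import pandas as pd

# Shortened sample in the same layout as the question (assumed whitespace-separated).
raw = """idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
idx E
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
idx E
0 1.000009
"""

# Read every cell as text; the short "idx E" blocks are padded with NaN.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col=0, dtype=str)

# Drop only the repeated "idx A B C D" header rows and renumber.
df = df[~df["A"].isin(["A"])].reset_index(drop=True)

m1 = df["A"].eq("E")           # the "idx E" header rows
m = df["A"].shift().eq("E")    # the rows holding the E value

# E value taken from column A, spread upwards over its block.
df["E"] = df["A"].where(m).bfill()

# Blank A, B, C on both special rows, then repeat the last data row.
df[["A", "B", "C"]] = df[["A", "B", "C"]].mask(m | m1).ffill()
df.loc[m, "D"] = "."
df.loc[m, "C"] = "0"           # kept as text because the columns are text

df = df[~m1].reset_index(drop=True)
print(df)
```

Because everything is read as text, the numeric columns would still need a conversion (for example with pd.to_numeric) before any arithmetic.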

edited Jan 26, 2021 at 15:04

answered Jan 26, 2021 at 13:38

jezrael

8 Comments

ML85 Over a year ago

Sorry, the column C data was missing. In the row where the dot needs to be put, the cell before it should conditionally get the value 0. I have added the data in column C and its change in the result as well. Please help with this condition if you can.

ML85 Over a year ago

Slightly off: the dot should go in column D, and where the dot appears now there should be 0 instead.

ML85 Over a year ago

In the input data, the value of Nr is not adjacent but in the cell below. I typed the data here rather than copy-pasting it, so I made a mistake. Could you please readjust the loop?

jezrael Over a year ago

@ML85 - for 0.9, does the index value idx not exist?

ML85 Over a year ago

Two undesired rows with NaN cells were generated, but with dropna it worked fairly well. Thank you so much. Great help.

Serge Ballesta · Accepted Answer · 2021-01-26 15:16:06Z

I would not use pandas here, but would revert to the good old csv module, which IMHO is more versatile for processing a file that is not in a true CSV format:

import csv

# input_csv and output_csv are the paths of the source and result files
delimiter = ' '         # put here the actual delimiter
with open(input_csv, newline='') as infile, \
        open(output_csv, 'w', newline='') as outfile:
    rd = csv.reader(infile, delimiter=delimiter)
    wr = csv.writer(outfile, delimiter=delimiter)
    wr.writerow(['idx', 'A', 'B', 'C', 'D', 'E'])    # write a header
    nb = 0                                           # record number
    pool = []                                        # rows of the current block
    flag = False                                     # True when the next row holds the E value

    for row in rd:
        if flag:
            e = row[-1]           # use last value
            for r in pool:        # copy it for the whole block
                r.append(e)
            wr.writerows(pool)    # write the block
            flag = False
        elif row[0] == 'idx':
            if row[1] == 'E':
                pool.append(pool[-1][:])   # repeat the last row of the block
                pool[-1][-1] = '.'         # dot in column D
                pool[-1][0] = nb
                nb += 1
                flag = True
            else:
                pool = []                  # repeated header: start a new block
        else:
            row[0] = nb                    # renumber the record
            pool.append(row)
            nb += 1

With your input, it gives:

idx A B C D E
0 0.0 0.0 0.0 apple 0.000006
1 0.5 0.5 0.6 car 0.000006
2 0.7 0.7 0.2 vegetables 0.000006
3 0.8 0.9 0.4 fruits 0.000006
4 0.9 1.0 0.8 metal 0.000006
5 0.9 1.0 0.8 . 0.000006
6 1.0 1.1 0.1 computer 1.000009
7 0.8 1.6 1.0 books 1.000009
8 0.9 1.9 1.1 textile 1.000009
9 0.9 1.9 1.1 . 1.000009
10 0.7 2.5 2 mouse 0.9
11 0.6 2.9 3 animals 0.9
12 0.5 3.0 2 birds 0.9
13 0.9 3.3 4 flower 0.9
14 1.0 3.4 5 garden 0.9
15 1.0 3.8 1 desk 0.9
16 0.85 3.9 8 tea 0.9
17 0.2 4.2 9 bread 0.9
18 0.1 4.9 3 paper 0.9
19 0.7 7.6 6 butter 0.9
20 0.7 7.6 6 . 0.9
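
The output above keeps the value of column C in the repeated rows, while the expected result in the question shows 0.0 there. A possible variant, sketched here as a function over file-like objects so it can be tried on an in-memory sample, also overwrites column C. The helper name reshape and its c_fill parameter are hypothetical, and skipping empty fields and blank lines is an added assumption about stray spaces in the input.

```python
import csv
import io


def reshape(infile, outfile, delimiter=" ", c_fill="0.0"):
    """Hypothetical helper: flatten the blocks and append E as a column."""
    rd = csv.reader(infile, delimiter=delimiter)
    wr = csv.writer(outfile, delimiter=delimiter, lineterminator="\n")
    wr.writerow(["idx", "A", "B", "C", "D", "E"])
    nb = 0             # running record number
    pool = []          # rows of the current block
    expect_e = False   # True when the next row holds the E value
    for row in rd:
        row = [field for field in row if field]   # drop empty fields from stray spaces
        if not row:
            continue                              # skip blank lines
        if expect_e:
            for r in pool:                        # copy E to the whole block
                r.append(row[-1])
            wr.writerows(pool)
            pool = []
            expect_e = False
        elif row[0] == "idx":
            if row[1] == "E":
                if pool:
                    extra = pool[-1][:]           # repeat the last row
                    extra[0], extra[3], extra[4] = nb, c_fill, "."
                    pool.append(extra)
                    nb += 1
                expect_e = True
            else:
                pool = []                         # repeated header: new block
        else:
            row[0] = nb
            pool.append(row)
            nb += 1


sample = "idx A B C D\n0 0.9 1.0 0.8 metal\nidx E \n0 0.000006\n"
out = io.StringIO()
reshape(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

With real files, the same function can be called with two handles opened with newline='', the output one in 'w' mode.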
