Python: write CSV column with loop

Question

Using Python, I'd like to create a loop to write text in a CSV file when a row contains text.

The original CSV format is:

user_id,    text
0,  
1,  
2,  
3,  sample text
4,  sample text

I'm seeking to add another column "text_number" that will insert the string "text_x", with x representing the number of texts in the column. I'd like to iterate this and increase the string's value by +1 for each new text. The final product would look like:

user_id,    Text,   text_number
0,      
1,      
2,      
3,  sample text,    text_0
4,  sample text,    text_1

With my working code I can insert the header "text_number", but I'm having difficulty in putting together the loop for text_x.

import csv

output = list()
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    for i, row in enumerate(csv_reader):
        if i == 0:
            output = [row+["text_number"]]
            continue
        # here's where I'm stuck
            
with open("output2.csv", "w", newline="") as file:
    csv_writer = csv.writer(file, delimiter=",")
    for row in output:
        csv_writer.writerow(row)

Any thoughts?

You wrote '"text_x", with x representing the number of texts in the column'. What do you mean by "number of texts"?
– Epsi95, Commented Jul 23, 2021 at 17:23
Sorry for the lack of clarity. Basically, I want the first text (user_id 3) to get the value text_0, the next one (user_id 4) to get text_1, and so on for the other texts in the file.
– Daniel Hutchinson, Commented Jul 23, 2021 at 17:27

Epsi95 · Accepted Answer · 2021-07-23 17:53:58Z

1

See the comments in the code for an explanation of each step.

# assuming the file test.csv contains:
# user_id,text
# 0,  
# 1,  
# 2,  
# 3,sample text
# 4,sample text
# 5, 
# 6,sample text

# import the library
import pandas as pd
df = pd.read_csv('test.csv').fillna('')

# creating column text_number initializing with ''
df['text_number'] = ''

# getting the index where text is valid
index = df.loc[df['text'].str.strip().astype(bool)].index

# finally creating the column text_number with increment as 0, 1, 2 ...
df.loc[index, 'text_number'] = [f'text_{i}' for i in range(len(index))]

print(df)

# save it to disk (index=False keeps the row index out of the file)
df.to_csv('output2.csv', index=False)


#    user_id         text text_number
# 0        0                         
# 1        1                         
# 2        2                         
# 3        3  sample text      text_0
# 4        4  sample text      text_1
# 5        5                         
# 6        6  sample text      text_2
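
The same numbering can also be sketched with a boolean mask and a running count (cumsum) instead of a list comprehension. This is a different technique from the code above, shown only as an alternative. The sample data is inlined through io.StringIO as a stand-in for test.csv, and the names has_text and counter are illustrative, not from the original answer.

```python
import io

import pandas as pd

# In-memory stand-in for test.csv (assumed layout: user_id,text).
sample = (
    "user_id,text\n"
    "0,  \n"
    "1,\n"
    "2,\n"
    "3,sample text\n"
    "4,sample text\n"
    "5, \n"
    "6,sample text\n"
)

# Read the text column as strings; empty cells still arrive as NaN.
df = pd.read_csv(io.StringIO(sample), dtype={"text": str})

# True where the cell holds something other than whitespace.
has_text = df["text"].fillna("").str.strip().ne("")

# cumsum counts 1, 2, 3 ... across the rows with text; subtract 1 to start at 0.
counter = has_text.cumsum() - 1

df["text_number"] = ""
df.loc[has_text, "text_number"] = "text_" + counter[has_text].astype(str)

print(df)
```

Because the mask is computed after fillna(""), cells that pandas read as NaN are treated the same as whitespace-only cells and get no number.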

edited Jul 23, 2021 at 17:53

answered Jul 23, 2021 at 17:43


4 Comments

Daniel Hutchinson Over a year ago

Greatly appreciate the assistance here. This approach worked! One wrinkle, however. Apparently the blank spaces are read as NaN, and text numbers are created for those cells as well. Any ideas on how to ignore those?

Epsi95 Over a year ago

So you have blank cells as well? Try it now; see the change df = pd.read_csv('test.csv').fillna('') in the code above.

Daniel Hutchinson Over a year ago

Yes, it appears so. No values show up in Excel, but when I run the script they are read as NaN.

Daniel Hutchinson Over a year ago

That solved it! Many thanks for the assistance, big help!

Timus · Accepted Answer · 2021-07-23 17:41:21Z

1

You could try the following modification of your first part:

output = list()
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    output.append(next(csv_reader) + ['text_number'])
    text_no = 0
    for row in csv_reader:
        if row[1].strip():
            row.append(f'text_{text_no}')
            text_no += 1
        output.append(row)
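
For a version that can be run end to end, here is a minimal sketch of the same counter idea wrapped in a generator, so rows can be written as they are read rather than collected in a list first. The function name add_text_numbers and its parameters are hypothetical, and the sample data is inlined through io.StringIO as a stand-in for test.csv.

```python
import csv
import io


def add_text_numbers(rows, text_col=1, prefix="text_"):
    """Yield each row with an extra text_number cell appended.

    rows is any iterable of lists, such as a csv.reader. The first row is
    treated as the header. (Names here are illustrative only.)
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header + ["text_number"]

    text_no = 0
    for row in rows:
        # Guard against short rows and treat whitespace-only cells as blank.
        has_text = len(row) > text_col and row[text_col].strip()
        if has_text:
            yield row + [f"{prefix}{text_no}"]
            text_no += 1
        else:
            yield row + [""]


# In-memory stand-in for test.csv so the sketch runs without any files.
sample = "user_id,text\n0,  \n1,\n3,sample text\n4,sample text\n"
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(add_text_numbers(csv.reader(io.StringIO(sample))))
print(out.getvalue())
```

With real files, the two io.StringIO objects would be replaced by open("test.csv", newline="") and open("output2.csv", "w", newline="").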

edited Jul 23, 2021 at 17:41

answered Jul 23, 2021 at 17:31


imxitiz · Accepted Answer · 2021-07-23 18:15:08Z

1

You can try this:

import csv

output = list()
x = 0
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    for i, row in enumerate(csv_reader):
        # strip surrounding whitespace so blank cells compare equal to ""
        row[1] = row[1].strip()
        if i == 0:
            row.append("text_number")
        elif row[1] == "":
            row.append(" ")
        else:
            row.append(f"text_{x}")
            x += 1
        output.append(row)

with open("output2.csv", "w", newline="") as file:
    csv_writer = csv.writer(file, delimiter=",")
    for row in output:
        csv_writer.writerow(row)

I haven't changed anything in your code that didn't need changing. I just append a new element to each row on every iteration, then append that row to output to build the new list of rows.
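
If addressing the column by name is preferable to row[1], a variant with csv.DictReader and csv.DictWriter might look like the sketch below. It assumes the header really is "user_id,    text" with spaces after the comma, as in the question, and uses skipinitialspace=True so those spaces are dropped. The sample data is inlined through io.StringIO as a stand-in for test.csv.

```python
import csv
import io

# In-memory stand-in for test.csv, including the spaces after each comma.
sample = (
    "user_id,    text\n"
    "0,  \n"
    "1,  \n"
    "2,  \n"
    "3,  sample text\n"
    "4,  sample text\n"
)

# skipinitialspace=True ignores spaces that immediately follow a delimiter,
# so the header becomes "text" and blank cells become "".
reader = csv.DictReader(io.StringIO(sample), skipinitialspace=True)
fieldnames = list(reader.fieldnames) + ["text_number"]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()

text_no = 0
for row in reader:
    # A missing cell comes back as None, so fall back to "".
    text = (row["text"] or "").strip()
    if text:
        row["text_number"] = f"text_{text_no}"
        text_no += 1
    else:
        row["text_number"] = ""
    writer.writerow(row)

result = out.getvalue()
print(result)
```

The counter logic is the same as in the loop above; only the way cells are looked up changes.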

If you are comfortable with pandas then you can try this too:


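import pandas as pd

# fillna("") turns empty cells (read as NaN) into "" so .strip() works on them
df = pd.read_csv("test.csv").fillna("")

r = []
x = 0
# the column name keeps the leading spaces from the header "user_id,    text"
for i in range(df.shape[0]):
    if df["    text"][i].strip() == "":
        r.append(" ")
    else:
        r.append(f"text_{x}")
        x += 1

df["text_number"] = r

print(df)
"""
   user_id           text   text_number
0        0                     
1        1                     
2        2                     
3        3    sample text      text_0
4        4    sample text      text_1
"""
# to_csv is a DataFrame method; index=False keeps the row index out of the file
df.to_csv("output2.csv", index=False)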

Here we build a list of values for the text_number column and assign it to the DataFrame.

edited Jul 23, 2021 at 18:15

answered Jul 23, 2021 at 17:28


2 Comments

Daniel Hutchinson Over a year ago

Greatly appreciate the assistance here. Both approaches worked! One wrinkle, however. Apparently the blank spaces are read as NaN, and text numbers are created for those cells as well. Any ideas on how to ignore those?

imxitiz Over a year ago

I have edited the answer. I forgot to do that, my bad.
