
Using Python, I'd like to write a loop that adds a label to every row of a CSV file whose text column contains text.

The original CSV format is:

user_id,    text
0,  
1,  
2,  
3,  sample text
4,  sample text

I'm seeking to add another column, "text_number", that holds the string "text_x" for each row that contains text, where x is a counter. I'd like x to start at 0 and increase by 1 for each new text. The final product would look like:

user_id,    text,   text_number
0,      
1,      
2,      
3,  sample text,    text_0
4,  sample text,    text_1

With my current code I can add the "text_number" header, but I'm having difficulty putting together the loop for text_x.

import csv

output = list()
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    for i, row in enumerate(csv_reader):
        if i == 0:
            output = [row+["text_number"]]
            continue
        # here's where I'm stuck
            
with open("output2.csv", "w", newline="") as file:
    csv_writer = csv.writer(file, delimiter=",")
    for row in output:
        csv_writer.writerow(row)

Any thoughts?

  • Can you use pandas? Commented Jul 23, 2021 at 17:15
  • Yep, comfortable with pandas. Commented Jul 23, 2021 at 17:22
  • "text_x", with x representing the number of texts in the column: what do you mean by number of texts? Commented Jul 23, 2021 at 17:23
  • Sorry for the lack of clarity. Basically, I want the first text (user_id 3) to have the value text_0, the next one (user_id 4) text_1, and so on for the other texts in the file. Commented Jul 23, 2021 at 17:27
  • @DanielHutchinson I have added an answer; please check it! Commented Jul 23, 2021 at 17:34

3 Answers


See the comments in the code for a description.

# assuming the input file is:
# user_id,text
# 0,  
# 1,  
# 2,  
# 3,sample text
# 4,sample text
# 5, 
# 6,sample text

# import the library
import pandas as pd
df = pd.read_csv('test.csv').fillna('')

# creating column text_number initializing with ''
df['text_number'] = ''

# getting the index where text is valid
index = df.loc[df['text'].str.strip().astype(bool)].index

# finally creating the column text_number with increment as 0, 1, 2 ...
df.loc[index, 'text_number'] = [f'text_{i}' for i in range(len(index))]

print(df)

# save it to disk; index=False stops pandas from writing its row index as an extra column
df.to_csv('output2.csv', index=False)


#    user_id         text text_number
# 0        0                         
# 1        1                         
# 2        2                         
# 3        3  sample text      text_0
# 4        4  sample text      text_1
# 5        5                         
# 6        6  sample text      text_2
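A loop-free variant of the same idea, offered as a sketch: a boolean mask marks the rows with non-blank text, and its running total (`cumsum`) minus one gives each text its 0-based number. `io.StringIO` stands in for `test.csv` here, and the sample data is assumed.

```python
import io

import pandas as pd

# In-memory stand-in for test.csv (assumed sample data)
data = io.StringIO(
    "user_id,text\n0,\n1,\n2,\n3,sample text\n4,sample text\n5,\n6,sample text\n"
)

# fillna('') turns empty cells (read as NaN) into empty strings
df = pd.read_csv(data).fillna("")

# True where the text cell holds something other than whitespace
has_text = df["text"].str.strip() != ""

# Running count of texts seen so far; subtract 1 so numbering starts at 0
numbers = has_text.cumsum() - 1

df["text_number"] = ""
df.loc[has_text, "text_number"] = "text_" + numbers[has_text].astype(str)

print(df)
```

Writing the result is the same as above, e.g. `df.to_csv('output2.csv', index=False)`.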

4 Comments

Greatly appreciate the assistance here. This approach worked! One wrinkle, however. Apparently the blank spaces are read as NaN, and text numbers are created for those cells as well. Any ideas on how to ignore those?
So you have blank cells as well? Try it now and note the change in the code above: df = pd.read_csv('test.csv').fillna('')
Yes, it appears so. No value shows up in Excel, but when I run the script it is read as NaN.
That solved it! Many thanks for the assistance, big help!
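Why the blank cells were numbered: pandas reads an empty field as NaN, NaN is a float, and `bool(float('nan'))` is `True`, so `.astype(bool)` treats the missing cell as if it held text. A small check, using assumed sample data:

```python
import io

import pandas as pd

data = "user_id,text\n0,\n3,sample text\n"

raw = pd.read_csv(io.StringIO(data))
# The empty field comes back as NaN, and .str.strip() leaves NaN as NaN
print(raw["text"].isna().tolist())  # [True, False]
# NaN is truthy, so astype(bool) marks the blank row as "has text"
print(raw["text"].str.strip().astype(bool).tolist())  # [True, True]

# After fillna('') the blank cell is an empty string, which is falsy
fixed = pd.read_csv(io.StringIO(data)).fillna("")
print(fixed["text"].str.strip().astype(bool).tolist())  # [False, True]
```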

You could try the following modification of your first part:

import csv

output = list()
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    output.append(next(csv_reader) + ['text_number'])  # header row
    text_no = 0
    for row in csv_reader:
        # only rows with a non-blank second field get a text_x label
        if len(row) > 1 and row[1].strip():
            row.append(f'text_{text_no}')
            text_no += 1
        output.append(row)
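Combined with the writer from the question, the full round trip looks roughly like this. As a sketch, it reads from an in-memory copy of the sample (`io.StringIO` standing in for `test.csv`, an assumption) and, as a small addition to the answer, gives rows without text an empty text_number cell so the rows keep the same number of columns.

```python
import csv
import io

# Assumed sample input, padded the same way as in the question
source = io.StringIO(
    "user_id,    text\n0,  \n1,  \n2,  \n3,  sample text\n4,  sample text\n"
)

output = []
csv_reader = csv.reader(source)
output.append(next(csv_reader) + ["text_number"])  # header row
text_no = 0
for row in csv_reader:
    if len(row) > 1 and row[1].strip():
        row.append(f"text_{text_no}")
        text_no += 1
    else:
        row.append("")  # empty text_number cell for rows without text
    output.append(row)

# Write to another in-memory buffer; for a real file use
# open("output2.csv", "w", newline="") as in the question
result = io.StringIO()
csv_writer = csv.writer(result)
csv_writer.writerows(output)
print(result.getvalue())
```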



You can try this:

import csv

output = list()
x = 0  # counter for text_x
with open("test.csv") as file:
    csv_reader = csv.reader(file)
    for i, row in enumerate(csv_reader):
        row[1] = row[1].strip()  # drop the padding around the text
        if i == 0:
            row.append("text_number")
        elif row[1] == "":
            row.append("")
        else:
            row.append(f"text_{x}")
            x += 1
        output.append(row)

with open("output2.csv", "w", newline="") as file:
    csv_writer = csv.writer(file, delimiter=",")
    for row in output:
        csv_writer.writerow(row)

I haven't changed anything in your code that didn't need changing. In each iteration I strip the text field, append one new element to the row (the "text_number" header for the first row, then either a blank or "text_x"), and append the row to output to build the new list of rows.
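The `strip()` call is needed because the sample file pads each field with spaces after the comma. When the padding is only leading whitespace, as in the sample, `csv.reader` can drop it itself through its `skipinitialspace` option. Trailing spaces are not removed by that option, so `strip()` remains the more general fix. A quick comparison on assumed sample rows:

```python
import csv
import io

sample = "user_id,    text\n0,  \n3,  sample text\n"

# Default: the spaces after each comma are kept as part of the field
print(list(csv.reader(io.StringIO(sample))))
# [['user_id', '    text'], ['0', '  '], ['3', '  sample text']]

# skipinitialspace=True ignores whitespace right after the delimiter
print(list(csv.reader(io.StringIO(sample), skipinitialspace=True)))
# [['user_id', 'text'], ['0', ''], ['3', 'sample text']]
```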

If you are comfortable with pandas then you can try this too:


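import pandas as pd

# fillna("") turns empty cells (read as NaN) into empty strings so .strip() works
df = pd.read_csv("test.csv").fillna("")
# strip the padding around the header names, e.g. "    text" -> "text"
df.columns = df.columns.str.strip()

r = []
x = 0
for i in range(df.shape[0]):
    if df["text"][i].strip() == "":
        r.append("")
    else:
        r.append(f"text_{x}")
        x += 1

df["text_number"] = r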

print(df)
"""
   user_id           text   text_number
0        0                     
1        1                     
2        2                     
3        3    sample text      text_0
4        4    sample text      text_1
"""
df.to_csv("output2.csv", index=False)

Here we build a list with one entry per row and assign it as the text_number column.
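The same list can also be built without managing `x` by hand, by drawing numbers from `itertools.count()` only when a row has text. A sketch, assuming an unpadded `text` header and empty cells filled with `''` (the sample data below is assumed, with `io.StringIO` standing in for `test.csv`):

```python
import io
import itertools

import pandas as pd

# Assumed sample data; io.StringIO stands in for test.csv
data = io.StringIO("user_id,text\n0,\n1,\n2,\n3,sample text\n4,sample text\n")
df = pd.read_csv(data).fillna("")

counter = itertools.count()  # yields 0, 1, 2, ... on demand
# next(counter) is only evaluated for rows whose text is non-blank
df["text_number"] = [
    f"text_{next(counter)}" if text.strip() else "" for text in df["text"]
]
print(df)
```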

2 Comments

Greatly appreciate the assistance here. Both approaches worked! One wrinkle, however. Apparently the blank spaces are read as NaN, and text numbers are created for those cells as well. Any ideas on how to ignore those?
I have edited the answer. I forgot to do that, my bad.
