Using Python to replace triple double quotes with single double quote in CSV

Question

I used the pandas library to manipulate the original data down to a simple 2 column csv file and rename it to a text file. The file has triple double quotes that I need replaced with single double quotes. Every line of the file is formatted as:

"""QA12345""","""Some Other Text"""

What I need is:

"QA12345","Some Other Text"

This Python snippet wipes out the file after it finishes:

with fileinput.FileInput(input_file, inplace=True) as f:
    next(f)
    for line in f:
        line.replace("""""", '"')

It doesn't work with

line.replace('"""', '"')

either.

I've also tried adjusting the input values to be '"Some Other Text"' and variations (""" and '\"'), but nothing seems to work. I believe the triple quote is causing the issue, but I don't know what I have to do to fix it.

You aren't doing anything with the result of line.replace(...). If this is all your code, then I don't know why you expect it to do anything. Why use fileinput anyway?
– juanpa.arrivillaga, Commented Feb 6 at 22:19
If you are using it just for inplace, then I suggest not. Note that it assumes you write to sys.stdout (e.g. with print, or even directly), which is why your file is blank no matter what you do! I suggest just using open and manually creating the backup file. Frankly, the fileinput approach is very weird.
– juanpa.arrivillaga, Commented Feb 6 at 22:24
BUT if you want to keep it, you need something like for line in f: print(line.replace('"""', '"'), end='') and don't forget to print(next(f), end='') at the top. Keep in mind, again, that printing to standard output has been implicitly replaced with writing to your file. This is a very weird way to go about this, IMO.
– juanpa.arrivillaga, Commented Feb 6 at 22:30
Why aren't you using pd.read_csv()? I think it will do what you want.
– Barmar, Commented Feb 6 at 23:34
FYI: """""" is a triple double-quoted empty string. '"""' is a string with 3 double quotes.
– Mark Tolonen, Commented Feb 6 at 23:47
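
The suggestion in the comments above, plain open plus a manually created backup instead of fileinput, could be sketched roughly as follows. The function name replace_triple_quotes, the .bak suffix, and the sample file contents are illustrative assumptions, not part of the original code. Like the print-based variant, it copies the first line through unchanged.

```python
import os
import shutil
import tempfile


def replace_triple_quotes(path, backup_suffix=".bak"):
    """Rewrite `path` in place, turning each run of three double quotes into one.

    Hypothetical helper: the name and the backup suffix are assumptions.
    """
    backup = path + backup_suffix
    shutil.copy2(path, backup)  # keep a copy of the original next to it

    # newline="" passes line endings through untouched in both directions.
    with open(backup, newline="") as src, open(path, "w", newline="") as dst:
        dst.write(next(src, ""))  # first line is copied unchanged
        for line in src:
            dst.write(line.replace('"""', '"'))
    return backup


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "data.txt")
    with open(sample, "w", newline="") as f:
        f.write('id,text\n"""QA12345""","""Some Other Text"""\n')

    backup_path = replace_triple_quotes(sample)

    with open(sample, newline="") as f:
        rewritten = f.read()
    with open(backup_path, newline="") as f:
        original = f.read()

print(rewritten, end="")
```

Reading from the backup and writing to the original path means a failed run still leaves the untouched data in the .bak file.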

Schwern · Accepted Answer · 2025-02-07 00:37:24Z

That file looks to be in RFC 4180 format. "" is how you write a single " inside a double-quoted field.

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

"aaa","b""bb","ccc"

"""QA12345""","""Some Other Text""" is the two values "QA12345" and "Some Other Text". If you manually change the """ to " you're changing the data in the file.

Read it with the csv library and it will handle the escaping for you. Then, if you actually want to strip the surrounding quotes and change "QA12345" to QA12345, do so on the parsed data and write it back out.
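
As a rough illustration of why the triple quotes are just the encoded form of quoted data, the sketch below writes two values that contain literal double quotes with the csv module's default settings and reads them back. The in-memory buffers and variable names are assumptions made for the example.

```python
import csv
import io

values = ['"QA12345"', '"Some Other Text"']  # the quotes are part of the data

# Write the row with the csv module's default dialect.
buffer = io.StringIO()
csv.writer(buffer).writerow(values)
encoded = buffer.getvalue()
print(encoded, end="")

# Reading the encoded line back recovers the original values.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)
```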

import csv
import re
import sys

writer = csv.writer(sys.stdout)
for row in csv.reader(['"""QA12345""","""Some Other Text"""']):
    # ['"QA12345"', '"Some Other Text"']
    print(row)

    # Strip " off the front and back of each item in the row.
    row[:] = [ re.sub(r'^"(.*)"$', r'\1', item) for item in row]

    # ['QA12345', 'Some Other Text']
    print(row)

    # QA12345,Some Other Text
    writer.writerow(row)

It doesn't work with line.replace('"""', '"')

replace does not replace in-place (say that three times fast). It returns a new string. So it would be line = line.replace('"""', '"')
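
A minimal sketch of both points, using a sample line in the question's format (the variable names are only for illustration):

```python
line = '"""QA12345""","""Some Other Text"""\n'

# Six double quotes in a row form an empty triple-quoted literal.
empty = """"""
# '"""' is a three-character string made of double quotes.
three_quotes = '"""'

# Calling replace without using the result leaves `line` as it was.
line.replace(three_quotes, '"')
still_original = line

# Binding the returned string keeps the change.
fixed = line.replace(three_quotes, '"')
print(fixed, end="")
```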

Hai Vu · Accepted Answer · 2025-02-07 08:18:47Z

Judging by the OP's code, I guess the aims are:

Edit the file in place
Keep the first line unmodified
For the rest of the lines, replace three double quotes with a single one

My solution is almost the same, but with print():

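import fileinput

# With inplace=True, anything printed to standard output is written to the file.
with fileinput.input(input_file, inplace=True) as stream:
    print(next(stream), end="")  # keep the first line unmodified
    for line in stream:
        print(line.replace('"""', '"'), end="")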

That should give the desired result.

ResourceReaper Feb 7 at 12:35

Mr. Vu, yes, the file needs to be edited in place as this was the last piece to clean up the internal formatting. The next(f) in the original code skipped the top line from being saved essentially carving that line out and getting rid of it. This snippet works perfectly sir. I'll modify to have the header row removed. Thank you kindly
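
The variant mentioned in this comment, dropping the header row instead of keeping it, might look like the sketch below. The function name strip_header_and_fix_quotes and the sample data are assumptions for illustration; the only change from the answer's snippet is that the first line is read but not printed.

```python
import fileinput
import os
import tempfile


def strip_header_and_fix_quotes(input_file):
    """Edit input_file in place: drop the first line, fix quotes in the rest."""
    with fileinput.input(input_file, inplace=True) as stream:
        next(stream, None)  # consume the header without printing it
        for line in stream:
            print(line.replace('"""', '"'), end="")


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write('"""id""","""text"""\n"""QA12345""","""Some Other Text"""\n')

    strip_header_and_fix_quotes(path)

    with open(path) as f:
        result = f.read()
```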

MT0 · Accepted Answer · 2025-02-07 00:05:39Z

You can use:

import pandas as pd
import numpy as np

data = pd.read_csv("input_file.csv", header=None)
np.savetxt("output_file.txt", data, delimiter=",", fmt="%s")

If the input file contains:

"""QA12345""","""Some Other Text"""

then the output file will contain:

"QA12345","Some Other Text"


jackal · Accepted Answer · 2025-02-07 07:42:07Z

The built-in csv module can handle this and is (probably) more lightweight than pandas.

Which means that you could just do this:

import csv

IN = "foo_in.csv"
OUT = "foo_out.csv"

with open(IN, mode="r", newline="") as in_data, open(OUT, mode="w", newline="") as out_data:
    writer = csv.writer(out_data, quotechar=None)
    for row in csv.reader(in_data):
        writer.writerow(row)
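
To see what this does without touching real files, here is a small in-memory sketch of the same reader and writer pair. The StringIO buffers and the comma check at the end are assumptions added for illustration. The check hints at one caveat: with quoting turned off and no escapechar set, the writer cannot emit a field that contains the delimiter.

```python
import csv
import io

source = io.StringIO('"""QA12345""","""Some Other Text"""\r\n')
out = io.StringIO()

# quotechar=None with no explicit quoting makes the writer emit fields verbatim.
writer = csv.writer(out, quotechar=None)
for row in csv.reader(source):
    writer.writerow(row)

result = out.getvalue()
print(result, end="")

# Caveat: a field containing the delimiter cannot be written in this mode.
try:
    csv.writer(io.StringIO(), quotechar=None).writerow(["a,b"])
    comma_ok = True
except csv.Error:
    comma_ok = False
```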
