
I used the pandas library to reduce the original data to a simple two-column CSV file and renamed it to a text file. The file has triple double quotes that I need replaced with single double quotes. Every line of the file is formatted as:

"""QA12345""","""Some Other Text"""

What I need is:

"QA12345","Some Other Text"

This Python snippet leaves the file empty after it finishes:

with fileinput.FileInput(input_file, inplace=True) as f:
    next(f)
    for line in f:
        line.replace("""""", '"')

It doesn't work with

line.replace('"""', '"') 

either.

I've also tried adjusting the input values to '"Some Other Text"' and variations (""" and '\"'), but nothing seems to work. I believe the triple quote is causing the issue, but I don't know what I have to do.

  • You aren't doing anything with the result of line.replace(...). If this is all your code, I don't know why you expect it to do anything. Why use fileinput at all? Commented Feb 6 at 22:19
  • If you are using it just for inplace, I suggest not doing so. Note that it assumes you write to sys.stdout (e.g. with print, or even directly), which is why your file is blank no matter what you do! I suggest just using open and manually creating the backup file. Frankly, the fileinput approach is very weird. Commented Feb 6 at 22:24
  • But if you want to keep it, you need something like for line in f: print(line.replace('"""', '"'), end=''), and don't forget print(next(f), end='') at the top. Keep in mind, again, that standard output has been implicitly redirected to your file. This is a very weird way to go about it, IMO. Commented Feb 6 at 22:30
  • Why aren't you using pd.read_csv()? I think it will do what you want. Commented Feb 6 at 23:34
  • FYI: """""" is a triple-double-quoted empty string. '"""' is a string with 3 double quotes. Commented Feb 6 at 23:47
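To see the difference that last comment describes, the literals can be inspected directly. A small sketch; the variable names are illustrative only:

```python
# The literals from the question and the comments
empty = """"""          # triple-double-quoted string with nothing inside
three_quotes = '"""'    # single-quoted string holding three " characters
one_quote = '"'         # single-quoted string holding one " character

print(repr(empty), len(empty))                # '' 0
print(repr(three_quotes), len(three_quotes))  # '"""' 3

line = '"""QA12345""","""Some Other Text"""'

# Replacing the EMPTY string inserts '"' before and after every character,
# so even if its result were kept, the first attempt would not give the wanted text.
mangled = line.replace(empty, one_quote)
print(len(line), len(mangled))

# Replacing three quotes with one gives the wanted format.
fixed = line.replace(three_quotes, one_quote)
print(fixed)  # "QA12345","Some Other Text"
```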

4 Answers


That file looks to be in RFC 4180 format. "" is how you write a single " inside a double-quoted field.

  7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

"""QA12345""","""Some Other Text""" is the two values "QA12345" and "Some Other Text". If you manually change the """ to " you're changing the data in the file.
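The escaping rule can be seen at work with the standard csv module. This sketch assumes (the question doesn't say) that the values already contained quote characters when the file was written: the default writer then wraps such a field in quotes and doubles each embedded quote, which produces exactly the triple quotes in the file, and reading the line back recovers the original values.

```python
import csv
import io

# Values that themselves contain double quotes (an assumption about the source data).
row = ['"QA12345"', '"Some Other Text"']

buf = io.StringIO()
# Default dialect: quotechar='"', doublequote=True, QUOTE_MINIMAL, so any field
# containing '"' is quoted and each embedded '"' is written as '""'.
csv.writer(buf, lineterminator="\n").writerow(row)
encoded = buf.getvalue()
print(encoded, end="")  # """QA12345""","""Some Other Text"""

# Reading the line back undoes the escaping.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)  # ['"QA12345"', '"Some Other Text"']
```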

Read it with the csv library and it will handle the escaping for you. Then, if you actually want to strip the surrounding quotes and change "QA12345" to QA12345, do so on the parsed data and write it back out.

import csv
import re
import sys

writer = csv.writer(sys.stdout)
for row in csv.reader(['"""QA12345""","""Some Other Text"""']):
    # ['"QA12345"', '"Some Other Text"']
    print(row)

    # Strip " off the front and back of each item in the row.
    row[:] = [ re.sub(r'^"(.*)"$', r'\1', item) for item in row]

    # ['QA12345', 'Some Other Text']
    print(row)

    # QA12345,Some Other Text
    writer.writerow(row)

It doesn't work with line.replace('"""', '"')

replace does not replace in-place (say that three times fast). It returns a new string. So it would be line = line.replace('"""', '"')
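A short sketch of that point, using the sample line from the question:

```python
line = '"""QA12345""","""Some Other Text"""\n'

# str.replace builds and returns a NEW string; because Python strings are
# immutable, calling it without using the result leaves `line` unchanged.
line.replace('"""', '"')
print(line.startswith('"""'))  # True

# Rebinding the name keeps the replaced text.
line = line.replace('"""', '"')
print(line, end="")  # "QA12345","Some Other Text"
```

With fileinput and inplace=True, the new line still has to be printed for it to reach the file, as the comments on the question point out.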



Judging by the OP's code, I guess the aims are:

  1. Edit the file in place
  2. Keep the first line unmodified
  3. For the rest of the lines, replace 3 double quotes with a single one

My solution is almost the same, but with print():

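import fileinput

# With inplace=True, anything printed inside the block is written to input_file
with fileinput.input(input_file, inplace=True) as stream:
    print(next(stream), end="")  # write the first line back unchanged
    for line in stream:
        print(line.replace('"""', '"'), end="")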

That should give the desired result.
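The comments on the question also suggest skipping fileinput and using open with a manually created backup file. A minimal sketch of that idea, keeping the first line unchanged as in the answer above; the function name collapse_triple_quotes, the .bak suffix, and the sample file name are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def collapse_triple_quotes(path, backup_suffix=".bak"):
    """Replace each occurrence of three consecutive double quotes with one.

    Hypothetical helper: it first copies the file to path + backup_suffix,
    then rewrites the original from that copy, leaving the first (header)
    line unchanged. Returns the path of the backup file.
    """
    path = Path(path)
    backup = path.with_name(path.name + backup_suffix)
    shutil.copyfile(path, backup)  # keep a copy of the original data first

    with open(backup, newline="") as src, open(path, "w", newline="") as dst:
        dst.write(next(src, ""))  # header line, copied as-is
        for line in src:
            dst.write(line.replace('"""', '"'))
    return backup


# Usage on a throwaway file in a temporary directory (the file name is made up).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input_file.txt"
    sample.write_text('id,text\n"""QA12345""","""Some Other Text"""\n')
    backup = collapse_triple_quotes(sample)
    result = sample.read_text()
    original = backup.read_text()

print(result, end="")
```

Because the rewrite reads from the backup copy, an interruption part-way through still leaves the original data in the .bak file.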

1 Comment

Mr. Vu, yes, the file needs to be edited in place, as this was the last piece of cleaning up the internal formatting. The next(f) in the original code kept the top line from being saved, essentially carving that line out and getting rid of it. This snippet works perfectly, sir. I'll modify it to remove the header row. Thank you kindly.

You can use:

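import pandas as pd
import numpy as np

# read_csv undoes the "" escaping, so each value keeps one pair of quotes
data = pd.read_csv("input_file.csv", header=None)
# savetxt writes the values verbatim, without adding CSV quoting
np.savetxt("output_file.txt", data, delimiter=",", fmt="%s")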

Which, if the input file contains:

"""QA12345""","""Some Other Text"""

Then the output file will contain:

"QA12345","Some Other Text"


The built-in csv module can handle this and is (probably) more lightweight than pandas.

So you could just do this:

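import csv

IN = "foo_in.csv"
OUT = "foo_out.csv"

with open(IN, mode="r", newline="") as in_data, open(OUT, mode="w", newline="") as out_data:
    # quotechar=None turns quoting off, so the parsed values, which still
    # contain their own quotes, are written out verbatim
    writer = csv.writer(out_data, quotechar=None)
    # csv.reader undoes the "" escaping: """QA12345""" becomes "QA12345"
    for row in csv.reader(in_data):
        writer.writerow(row)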
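A sketch of the same approach that also spells out quoting=csv.QUOTE_NONE alongside quotechar=None, run on an in-memory string via io.StringIO (the sample data is the line from the question) so it can be tried without files:

```python
import csv
import io

raw = '"""QA12345""","""Some Other Text"""\n'

out = io.StringIO()
# QUOTE_NONE with quotechar=None: the writer never adds quotes of its own,
# so the parsed values (which still contain their '"' characters) are
# written out verbatim.
writer = csv.writer(out, quoting=csv.QUOTE_NONE, quotechar=None, lineterminator="\n")
for row in csv.reader(io.StringIO(raw)):
    writer.writerow(row)
print(out.getvalue(), end="")  # "QA12345","Some Other Text"

# Caveat: with quoting off, a value containing the delimiter cannot be
# written unless an escapechar is supplied; csv raises csv.Error instead.
raised = False
try:
    csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE, quotechar=None).writerow(['"a, b"'])
except csv.Error as exc:
    raised = True
    print("csv.Error:", exc)
```

The caveat only matters if a value can contain a comma; passing an escapechar (for example escapechar="\\") is one way to avoid the error.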
