I am trying to find a way to programmatically search for values in a specific column of a CSV file and replace them when certain conditions are met.

Essentially, I will be dealing with a lot of large files with inconsistent data for the State value (some use NY, others use New York). I need to replace most, if not all, of these values with the standard two-letter abbreviation (e.g. NY) for all states.

How would I go about changing this:

data1,data2,New York,data4
data1,data2,NY,data4
data1,data2,Ohio,data4

To this:

data1,data2,NY,data4
data1,data2,NY,data4
data1,data2,OH,data4

All without creating a new file.

  • 1. Create a mapping (a dictionary) between the verbose state name and the abbreviated one. 2. Instead of trying to avoid creating a new file, just write a new one and delete the old one using os.remove. Commented Feb 2, 2016 at 17:24
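
The mapping suggested in the comment might be sketched as follows. The names `STATE_ABBREVIATIONS`, `KNOWN_CODES`, and `normalize_state` are hypothetical, introduced only for illustration, and the table lists just the two states from the question; a real one would need an entry for every state. This sketch assumes that lookups should ignore case and surrounding whitespace, and that unrecognized values should pass through unchanged rather than raise an error.

```python
# Hypothetical, partial mapping: only the states from the question are listed.
STATE_ABBREVIATIONS = {
    "new york": "NY",
    "ohio": "OH",
}

# Values that are already abbreviations should map to themselves.
KNOWN_CODES = set(STATE_ABBREVIATIONS.values())


def normalize_state(value):
    """Return the two-letter code for a state, or the value unchanged if unknown."""
    cleaned = value.strip()
    if cleaned.upper() in KNOWN_CODES:
        return cleaned.upper()
    return STATE_ABBREVIATIONS.get(cleaned.lower(), value)
```

Keying the table on lowercase names keeps it short, since "New York", "new york", and "NEW YORK" all resolve through the same entry.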

1 Answer

You could do something like the following to convert your column 3 entries:

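import csv

# Map every spelling you expect to see to its two-letter abbreviation.
short = {'New York': 'NY', 'NY': 'NY', 'Ohio': 'OH'}
entries = []

# Python 3: open in text mode with newline='' (Python 2 used 'rb'/'wb').
with open('data.txt', newline='') as f_input:
    for cols in csv.reader(f_input):
        # Leave values that are not in the mapping unchanged.
        cols[2] = short.get(cols[2], cols[2])
        entries.append(cols)

with open('data.txt', 'w', newline='') as f_output:
    csv.writer(f_output).writerows(entries)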

This would result in data.txt looking like:

data1,data2,NY,data4
data1,data2,NY,data4
data1,data2,OH,data4

This approach reads every row into a list before rewriting the file, so it assumes that your file is small enough to fit into memory.
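
For files too large to hold in memory, one possible alternative is to stream rows through a temporary file and then swap it in for the original. This is only a sketch under a few assumptions: the function name `rewrite_state_column` and the `SHORT` table are hypothetical, the temporary file is created in the same directory as the original, output lines end with `\n`, and `os.replace` is used instead of a separate delete and rename. Note that this does briefly create a second file on disk, even though the original path is what remains afterwards.

```python
import csv
import os
import tempfile

# Partial mapping for illustration; unknown values are left unchanged.
SHORT = {"New York": "NY", "Ohio": "OH"}


def rewrite_state_column(path, column=2):
    """Stream rows through a temporary file, then swap it in for the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f_output, \
                open(path, newline="") as f_input:
            # Assumption: the output should use "\n" line endings.
            writer = csv.writer(f_output, lineterminator="\n")
            for cols in csv.reader(f_input):
                # Only one row is held in memory at a time.
                if len(cols) > column:
                    cols[column] = SHORT.get(cols[column], cols[column])
                writer.writerow(cols)
        # Replace the original only after every row has been written.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file and leave the original intact.
        os.remove(tmp_path)
        raise
```

Calling `rewrite_state_column('data.txt')` would then update the third column of data.txt. Because the original is replaced only after the last row is written, a failure partway through leaves it untouched.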
