
I'm trying to read one csv file and write specific rows of that file into another file.

The code runs fine, but the output is not formatted properly:

import pandas as pd
import sys

f = open("output.csv", 'w')
sys.stdout = f

df = pd.read_csv('original_file.csv', low_memory=False)

print df[(df.name == 'fullName')]
print df[(df.name == 'LastName')]

f.close()

In the original file there are multiple columns, all filled with strings. I want to print every row where the name column equals fullName or LastName. However, output.csv has all of the data crammed into a single column.

I'm doing all of this on Ubuntu using Vim. I don't know if that would make a difference.

How do I get the output data to write to its corresponding column in output.csv?

  • Any reason not to use the to_csv method? pandas.pydata.org/pandas-docs/stable/generated/… Commented Jul 19, 2017 at 19:16
  • print df[(df.name == 'fullName')|(df.name == 'LastName')] Commented Jul 19, 2017 at 19:16
  • @AdrienMatissart I had tried using that before, but I was not able to search for the values within the cells e.g. fullName and such. I'm sure there is a way, but I'm not familiar enough with pandas to find it. Commented Jul 19, 2017 at 19:21

2 Answers

2

This should work:

import pandas as pd

df = pd.read_csv('original_file.csv', low_memory=False)  # read the file into a dataframe
new_df = df.loc[(df.name == 'fullName') | (df.name == 'LastName')]  # select rows where name is 'fullName' or 'LastName'
new_df.to_csv("output.csv", index=False)  # write the selected rows to csv, without the index column
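For anyone who wants to try the selection without the original file, here is a minimal sketch that runs on in-memory data. The sample rows and the value and note columns are made up for illustration, and isin is an alternative spelling of the same filter rather than what the answer above uses.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for original_file.csv
csv_text = """name,value,note
fullName,Ada Lovelace,a
FirstName,Ada,b
LastName,Lovelace,c
"""

df = pd.read_csv(io.StringIO(csv_text))

# Same row selection as the two == comparisons joined by |,
# written with isin (an alternative, not the answer's own code)
selected = df[df["name"].isin(["fullName", "LastName"])]

# With no path, to_csv returns the CSV text; pass a filename to write a file
output_text = selected.to_csv(index=False)
print(output_text)
```

Each value lands in its own comma-separated field, which is what print on a dataframe does not give you.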

3 Comments

Thank you. This is a perfect solution to my question. I've been struggling with this for almost a week now.
You're welcome. Don't wait for an entire week next time to seek help :)
Haha. I've been asking all week. You're the first person that seemed to understand the question. My questions would get downvoted and marked as duplicates.
0

Note: the last line of my solution is wrong. Because | has higher precedence than ==, Python evaluates 'fullName' | df['name'] first, so the two comparisons are never combined as intended. Each comparison needs its own parentheses.

Essentially, you are printing two dataframes one after the other as plain text instead of writing CSV. Try the following:

import pandas as pd

# read file
df = pd.read_csv('original_file.csv', low_memory=False)

# write the selected rows of the dataframe to output.csv
df[df['name'] == 'fullName' | df['name'] == 'LastName' ].to_csv('output.csv')
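The precedence problem mentioned in this answer can be checked on a small example. This is only a sketch on made-up data (the value column is hypothetical); it contrasts the parenthesized mask with the unparenthesized expression, which raises an exception rather than selecting rows.

```python
import pandas as pd

# Hypothetical data; only the name column matters here
df = pd.DataFrame({"name": ["fullName", "FirstName", "LastName"],
                   "value": [1, 2, 3]})

# Each comparison is parenthesized, so | combines two boolean Series
mask = (df["name"] == "fullName") | (df["name"] == "LastName")
selected = df[mask]
print(selected)

# Without parentheses, | is evaluated before ==, so Python first attempts
# 'fullName' | df['name'] and then chains the two == comparisons
try:
    df[df["name"] == "fullName" | df["name"] == "LastName"]
    unparenthesized_ok = True
except Exception as exc:
    unparenthesized_ok = False
    print("unparenthesized form failed with", type(exc).__name__)
```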

6 Comments

df[df['name'] == 'fullName' | df['name'] == 'LastName' ] will not work as expected - you need to add parentheses. PS it wasn't me who has downvoted your answer...
@MaxU Thank you for your comment!
You may want to check this answer, which explains why it will not work as expected...
@MaxU I tested it myself and included a comment in my answer. Thank you so much for your input!
@MaxU If I amend it, it will be a duplicate of an existing correct answer. Leaving it as is with the comment included would make it be of greater educational value to anyone who sees it. :)