1

I'm pretty new to Python and am having some difficulty getting started on this. I am using Python 3.

I've googled and found quite a few Python modules that help with this, but I was hoping for a more definitive answer here. Basically, I need to read certain columns from a csv file, i.e. G, H, I, K, and M. The ones I need aren't consecutive.

I need to read those columns from the csv file and transfer them to empty columns in an existing xls file that already has data in it.

I looked into openpyxl, but it doesn't seem to work with csv/xls files, only xlsx. Can I use the xlwt module to do this?

Any guidance on which module would work best for my use case would be greatly appreciated. Meanwhile, I'm going to tinker around with xlwt/xlrd.

2
  • I used xlwt/xlrd. They seem to work fine for me. Python also has a built-in csv reader (csv.reader). Commented Jul 26, 2016 at 19:51
  • I forgot to mention: each of these columns in the csv file has about 9k entries. Commented Jul 26, 2016 at 19:58
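
To illustrate the csv reader suggestion above, here is a minimal sketch that uses only the standard library's csv module to pull non-consecutive columns by their spreadsheet letters. The helper names col_letter_to_index and read_columns are made up for this example, and the in-memory sample stands in for the real file; writing the values into the xls file would still need a separate step.

```python
import csv
import io

def col_letter_to_index(letter):
    """Convert a spreadsheet column letter such as 'G' or 'AA' to a zero-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1

def read_columns(csv_file, letters):
    """Return a dict mapping each column letter to the list of values in that column.

    csv_file is any iterable of text lines (e.g. an open file). Rows that are
    too short for a requested column contribute an empty string for that cell.
    """
    indices = [col_letter_to_index(letter) for letter in letters]
    columns = {letter: [] for letter in letters}
    for row in csv.reader(csv_file):
        for letter, idx in zip(letters, indices):
            columns[letter].append(row[idx] if idx < len(row) else '')
    return columns

# small in-memory stand-in for the real csv file (hypothetical data)
sample = io.StringIO("a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n")
picked = read_columns(sample, ['G', 'H'])
```

With a real file, the same call would look like read_columns(open('c:/test/test.csv', newline=''), ['G', 'H', 'I', 'K', 'M']). Note that the header row, if there is one, is returned as the first value of each column.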

2 Answers

2

I recommend using pandas. It has convenient functions to read and write csv and xls files.

import pandas as pd
from openpyxl import load_workbook

#read the csv file
df_1 = pd.read_csv('c:/test/test.csv')

#let's say df_1 has columns colA and colB
print(df_1)

#read the xls(x) file
df_2=pd.read_excel('c:/test/test.xlsx')
#let's say df_2 has columns aa and bb

#now add a column from df_1 to df_2
df_2['colA']=df_1['colA']

#save the combined output
#the with block saves and closes the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('c:/test/combined.xlsx') as writer:
    df_2.to_excel(writer)

#alternatively, if you want to add just one column to an existing xlsx file:

#i.e. get colA from df_1 into a new dataframe
df_3=pd.DataFrame(df_1['colA'])


column_to_write=16 #this would go to column Q (zero based index)
writeRowIndex=False #don't write the row index
sheetName='Sheet1' #which sheet to write on

#open the existing file in append mode using the openpyxl engine
#if_sheet_exists='overlay' (pandas 1.4 or later) writes into the existing sheet instead of replacing it
#older pandas versions needed a workaround: load the file with load_workbook and assign it to writer.book and writer.sheets
with pd.ExcelWriter('c:/test/combined.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    #now write the single column df_3 to the file
    df_3.to_excel(writer, sheet_name=sheetName, columns=['colA'], startcol=column_to_write, index=writeRowIndex)
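
To pick only specific columns such as G, H, I, K and M from the csv, one option is the usecols parameter of pd.read_csv, which accepts zero-based column positions. The following is only a sketch: the in-memory csv text, the column names c0 to c12, and the df_existing frame are made up for illustration and stand in for the real files.

```python
import io
import pandas as pd

# zero-based positions of spreadsheet columns G, H, I, K and M
wanted = [6, 7, 8, 10, 12]

# hypothetical stand-in for the real csv: 13 columns named c0..c12, two data rows
header = ",".join(f"c{i}" for i in range(13))
row_1 = ",".join(str(i) for i in range(13))
row_2 = ",".join(str(i * 10) for i in range(13))
csv_text = "\n".join([header, row_1, row_2]) + "\n"

# read only the wanted columns; the others are skipped while parsing
df_csv = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# hypothetical stand-in for the data already in the Excel file
df_existing = pd.DataFrame({'aa': [1, 2], 'bb': [3, 4]})

# place the csv columns to the right of the existing ones, matching rows by index
combined = pd.concat([df_existing, df_csv], axis=1)
```

With real files, io.StringIO(csv_text) would be replaced by the csv path, and df_existing would come from pd.read_excel. A few thousand rows per column is a small amount of data for pandas, so the number of entries mentioned in the question should not be an issue.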

15 Comments

Wow! I'll definitely give this a try! So there are ways in the pandas module to pick specific columns in the csv file and store them in specific columns in the xls file? Edit: also, these files have a large amount of data in them, like 10k entries for each column.
I would recommend reading both the csv and the xls files, combining the columns that you want, and saving everything to a new file. Is that a possibility?
Yeah, that would work as well. I will have to look into how to combine the columns. Will the number of entries cause an issue?
I understand everything but df_2.to_excel(writer, 'Scores 1'). What is this meant to do?
That line saves the content of df_2 (which is now extended with the new column) to the writer object that links to your output xlsx file. I deleted 'Scores 1'; that was a leftover from another example. That string specifies which sheet to save in. Please accept the answer if it was helpful.
1

You could try XlsxWriter, which is a fully featured Python module for writing files in the Excel 2007+ XLSX format. https://pypi.python.org/pypi/XlsxWriter
