
I have a master csv file in the form

col1, col2, col3, col4...
a,    x,    y,    z
a,    x,    y,    z
b,    x,    y,    z
b,    x,    y,    z
..    ..    ..    ..

and I want to read this file in, then create a new Excel file with all rows where col1 == a and another file with all rows where col1 == b. So OutputFilea will look like:

col1, col2, col3, col4...
a,    x,    y,    z
a,    x,    y,    z

and OutputFileb will look like:

col1, col2, col3, col4...
b,    x,    y,    z
b,    x,    y,    z

My question is: should I use csv.reader() line by line with conditionals to decide which file each row gets appended to, or should I accumulate the rows in strings and write each file at the end? Or is there a module that optimizes a process like this?
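For reference, here is a minimal sketch of the first approach I have in mind (csv.reader() line by line with conditionals), assuming plain .csv output (which Excel can open) and that the only values in col1 are a and b:

import csv

with open('master.csv', newline='') as master, \
     open('OutputFilea.csv', 'w', newline='') as out_a, \
     open('OutputFileb.csv', 'w', newline='') as out_b:
    reader = csv.reader(master, skipinitialspace=True)
    writer_a = csv.writer(out_a)
    writer_b = csv.writer(out_b)
    header = next(reader)        # copy the header row into both files
    writer_a.writerow(header)
    writer_b.writerow(header)
    for row in reader:
        if row[0] == 'a':
            writer_a.writerow(row)
        elif row[0] == 'b':
            writer_b.writerow(row)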

  • What are your criteria for which approach is best? It sounds like all of them are reasonable approaches, making this a matter purely of opinion. Commented Jul 11, 2017 at 18:30
  • That, and the fact that you haven't actually attempted to implement any of the approaches enough to run into any concrete problems... Commented Jul 11, 2017 at 18:31
  • @MadPhysicist I will be implementing this on a large data set and do not know if these methods will be too slow or memory inefficient when that time comes. Commented Jul 11, 2017 at 18:33
  • The implementations are nearly trivial. You can try them all out before the time comes with very little effort. If you have enormous data sets, it should be apparent that holding everything in memory and writing out at the end is not a good option. Commented Jul 11, 2017 at 18:34
  • I will write up an answer with some optimizations for you. Commented Jul 11, 2017 at 18:35

2 Answers


Since you are going to be working with large data sets, it is probably best not to hold too much in memory at once. You can maintain a dictionary of open files keyed by the line prefix, and make sure the files are closed properly using a contextlib.ExitStack. This lets you open new output files lazily as you process the input file:

from contextlib import ExitStack

output_files = {}
with open('master.csv', 'r') as master, ExitStack() as output_stack:
    for line in master:
        # The first comma-separated field decides which output file gets the row.
        prefix = line.split(',', 1)[0]
        if prefix not in output_files:
            # First time this prefix appears: open its output file lazily and
            # register it with the ExitStack so it is closed automatically.
            output_name = 'output' + prefix + '.csv'
            output = output_stack.enter_context(open(output_name, 'w'))
            output_files[prefix] = output
        else:
            output = output_files[prefix]
        # 'line' already ends with a newline, so suppress print's own.
        print(line, end='', file=output)

Given that you want to copy the lines as-is into the output files, I have chosen not to use the csv module. If you want to apply more complex processing, you should of course consider using it.
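In case you do need the csv module (for example to transform fields on the way through), the same lazy-open pattern translates directly; a minimal sketch, assuming the same file names as above:

import csv
from contextlib import ExitStack

writers = {}
with open('master.csv', newline='') as master, ExitStack() as output_stack:
    for row in csv.reader(master, skipinitialspace=True):
        prefix = row[0]
        if prefix not in writers:
            output = output_stack.enter_context(
                open('output' + prefix + '.csv', 'w', newline=''))
            writers[prefix] = csv.writer(output)
        writers[prefix].writerow(row)  # any per-row processing goes here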



I would suggest trying pandas for this kind of task. It has a dedicated function for writing to Excel. Imagine I read your .csv file into a pandas DataFrame df:

In [4]: df = pd.read_csv('yourfile.csv')

In [5]: df
Out[5]: 
  col1   col2   col3   col4
0    a      x      y      z
1    a      x      y      z
2    b      x      y      z
3    b      x      y      z

Then I can select only the rows I want and save them to Excel:

In [6]: dfa = df[df['col1']=='a']

In [7]: dfa
Out[7]: 
  col1   col2   col3   col4
0    a      x      y      z
1    a      x      y      z

In [8]: dfa.to_excel('OutputFilea.xls')

The same goes for the second filter:

In [9]: dfb = df[df['col1']=='b']

In [10]: dfb.to_excel('OutputFileb.xls')
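
If col1 can take more values than just a and b, you can also loop over a groupby instead of filtering each value by hand; a minimal sketch with the same df (the file name pattern is just illustrative):

In [11]: for value, group in df.groupby('col1'):
    ...:     group.to_excel('OutputFile{}.xls'.format(value))
    ...: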

Hope that helps.

