python script for manipulating excel sheet

Question

I am trying to write a Python script to manipulate an Excel spreadsheet.

Suppose I have the following sample data:

Gene        chrom    strand  TSS        TES         Name
NM_145215   chr5     +       135485168  135488045   Abhd11
NM_1190437  chr5     +       135485021  135488045   Abhd11
NM_1205181  chr14    +       54873803   54888844    Abhd4
NM_134076   chr14    +       54878906   54888844    Abhd4
NM_9594     chr2     +       31615464   31659747    Abl1
NM_1112703  chr2     +       31544075   31659747    Abl1
NM_207624   chr11    +       105829258  105851278   Abl1
NM_9598     chr11    +       105836521  105851278   Ace2
NM_1130513  chrX     +       160577273  160626350   Ace2
NM_27286    chrX     +       160578411  160626350   Ace2

For rows that share the same name (column 6), I want to retrieve the whole row with the smallest TSS. For example, for the first two rows (name Abhd11), I want to save the second row in my result, since its TSS 135485021 < 135485168. The same goes for every other set of rows with the same name.

Any ideas and comments are appreciated.

Is this really an Excel file or a delimited text file? If Excel, is it XLS or XLSX?
– Mark, Commented Aug 3, 2012 at 17:47
Have you tried anything so far? Regardless of the input format, I would read the file, create a dictionary with the Name as the key, and just keep the row with the minimum value in it. You also have not specified what you want the output to look like.
– ernie, Commented Aug 3, 2012 at 17:52
There are three ways to do this: 1. Require the Excel file to be saved as .csv instead of .xls(x), then you can use the built-in csv module in Python. 2. Require Excel to be present on the machine, then use PyCOM (Windows)/appscript (Mac) to make it do the work for you. 3. Require nothing, and write your own Python code to parse .xls(x) files (possibly not parsing the entire format, or using and wrapping code from LibreOffice or other projects), which is going to be a ton of work. So, will either 1 or 2 work for you?
– abarnert, Commented Aug 3, 2012 at 18:07

jmetz · Accepted Answer · 2012-08-03 18:15:07Z

4

Input

If possible, I would save the Excel file as a CSV file and then load it into Python using the csv module.

Alternatively, you could use the xlrd module for reading Excel files, though I haven't used it and don't know much about it.

openpyxl is an additional option for parsing the Excel file (cheers Just another dunce).
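
For the CSV route, a minimal sketch of the loading step might look like this. The in-memory sample stands in for the exported file, and the file name mentioned in the comment is only an assumption; the result is the list of lists that the Manipulation section works on.

```python
import csv
import io

# Hypothetical stand-in for the exported file. With a real file you would use
# something like open("genes.csv", newline="") instead (file name assumed).
sample = io.StringIO(
    "Gene,chrom,strand,TSS,TES,Name\n"
    "NM_145215,chr5,+,135485168,135488045,Abhd11\n"
    "NM_1190437,chr5,+,135485021,135488045,Abhd11\n"
)

reader = csv.reader(sample)
header = next(reader)              # first line holds the column names
linesreadfromfile = list(reader)   # remaining rows, each a list of strings

print(header)
print(linesreadfromfile[0])
```

Note that csv.reader returns every field as a string, so numeric columns such as TSS need converting before they are compared.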

Manipulation

ernie's idea seems workable, and I would implement it as follows. Assume that linesreadfromfile is a list of lists as read using csv.reader, with the header row already removed, i.e. each element is the list of values from one delimited row of the file:

finaldict = {}
for row in linesreadfromfile:
    name = row[5]
    # csv.reader yields strings, so convert TSS to int before comparing
    if name not in finaldict or int(row[3]) < int(finaldict[name][3]):
        finaldict[name] = row
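
To see the approach end to end, here is a self-contained sketch run on the sample data from the question. The function name keep_min_tss and its parameters are made up for illustration. The TSS values are converted with int() because rows read with csv.reader contain strings, and comparing them as strings would rank "105829258" below "31544075" and so pick the wrong Abl1 row.

```python
def keep_min_tss(rows, name_col=5, tss_col=3):
    """Return {name: row}, keeping for each name the row with the smallest TSS."""
    best = {}
    for row in rows:
        name = row[name_col]
        # Compare numerically; the fields are strings when read via csv.reader
        if name not in best or int(row[tss_col]) < int(best[name][tss_col]):
            best[name] = row
    return best


# Sample rows from the question, as csv.reader would deliver them (all strings)
rows = [
    ["NM_145215", "chr5", "+", "135485168", "135488045", "Abhd11"],
    ["NM_1190437", "chr5", "+", "135485021", "135488045", "Abhd11"],
    ["NM_1205181", "chr14", "+", "54873803", "54888844", "Abhd4"],
    ["NM_134076", "chr14", "+", "54878906", "54888844", "Abhd4"],
    ["NM_9594", "chr2", "+", "31615464", "31659747", "Abl1"],
    ["NM_1112703", "chr2", "+", "31544075", "31659747", "Abl1"],
    ["NM_207624", "chr11", "+", "105829258", "105851278", "Abl1"],
    ["NM_9598", "chr11", "+", "105836521", "105851278", "Ace2"],
    ["NM_1130513", "chrX", "+", "160577273", "160626350", "Ace2"],
    ["NM_27286", "chrX", "+", "160578411", "160626350", "Ace2"],
]

result = keep_min_tss(rows)
for name, row in result.items():
    print(name, row[0], row[3])
```

The values of the resulting dictionary are whole rows, so they can be passed straight to csv.writer if the output should be another delimited file.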

edited Aug 3, 2012 at 18:15

answered Aug 3, 2012 at 18:07

jmetz


2 Comments

abarnert Over a year ago

This is almost certainly the best answer, unless for some reason the user can't require the files to be in .csv format (or needs information that can't be preserved in that format).

Valdogg21 Over a year ago

+1 for mentioning the xlrd module. I've used it to read through Excel files and it's pretty easy to work with. Highly recommended.

Valdogg21 · Accepted Answer · 2012-08-03 18:56:36Z

2

I agree with mutzmatron and would recommend the xlrd module. Here's a simple example:

import xlrd

# Create your file handle
file_handle = xlrd.open_workbook(file_name)

# Use the first page in the spreadsheet (0-based indexes)
sheet = file_handle.sheet_by_index(0)

# Create dictionary for storing values
abc = {}

# Loop through every row, starting at 1 to skip the header row
for i in range(1, sheet.nrows):
  line = sheet.row_values(i)

  # Get your 'Name' and 'TSS' columns
  name = line[5]
  tss = line[3]

  # Add this 'Name' to your dictionary if it's new, or keep the min value
  if name not in abc:
    abc[name] = tss
  else:
    abc[name] = min(abc[name], tss)

Obviously, change what you save (full row, certain values, etc.) based on your spec.

--- EDIT ---

  # If this 'Name' is new, save this line
  if name not in abc:
    abc[name] = {'tss': tss, 'line': line}

  # Else, if this 'Name' is not new and the TSS is less, keep this new line
  # (and update the stored TSS so later rows compare against the new minimum)
  elif tss < abc[name]['tss']:
    abc[name] = {'tss': tss, 'line': line}
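
The same selection can also be written with min() and a key function, which returns the whole row and not just the TSS value. This is only a sketch: the rows below are hypothetical stand-ins for the lists returned by sheet.row_values(i), with TSS and TES as floats because xlrd typically returns numeric cells that way.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the lists returned by sheet.row_values(i)
rows = [
    ["NM_145215", "chr5", "+", 135485168.0, 135488045.0, "Abhd11"],
    ["NM_1190437", "chr5", "+", 135485021.0, 135488045.0, "Abhd11"],
    ["NM_9594", "chr2", "+", 31615464.0, 31659747.0, "Abl1"],
    ["NM_1112703", "chr2", "+", 31544075.0, 31659747.0, "Abl1"],
    ["NM_207624", "chr11", "+", 105829258.0, 105851278.0, "Abl1"],
]

# Collect all rows that share a 'Name' (column index 5)
groups = defaultdict(list)
for row in rows:
    groups[row[5]].append(row)

# For each name, min() with a key returns the full row with the smallest TSS
best = {name: min(group, key=lambda r: r[3]) for name, group in groups.items()}

for name, row in best.items():
    print(name, row)
```

This holds every row in memory before choosing, whereas the running-minimum dictionary above keeps only one row per name, which matters only for very large sheets.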

edited Aug 3, 2012 at 18:56

answered Aug 3, 2012 at 18:18

Valdogg21


2 Comments

jmetz Over a year ago

I was going to go with something like this too (i.e. avoid the if abc[name] < ... that I use), but as the OP wants to keep the whole line I thought it better not to just use the min function. (Also, it's min, not max, that the OP asked for.)

Valdogg21 Over a year ago

Edited to reflect correct specs from OP (sorry, was in a rush before)

thyme · Accepted Answer · 2012-08-03 18:11:39Z

0

You can use IronSpread which gives you a python console and a way to script actions like this in python. It also supports UDFs that you can use as normal excel functions, which is nice.

answered Aug 3, 2012 at 18:11

thyme


1 Comment

jmetz Over a year ago

Assuming the OP is on windows and has excel... or IronSpread is supported by wine (and the OP has excel) - they could have been given excel files to work with without having excel.

joXn · Accepted Answer · 2012-08-03 18:17:18Z

0

You can use Pyvot, available from the Python Tools for Visual Studio team. It provides a comprehensive API for working with Excel spreadsheets from CPython.

You can get the code from PyPI (http://pypi.python.org/pypi/Pyvot) and the documentation from the Pytools site (http://pytools.codeplex.com/wikipage?title=Pyvot).

answered Aug 3, 2012 at 18:17

joXn

