How can I write output from a for loop in python into a csv-formatted file?

Question

The following Python script checks whether certain words are found in each file in a list of files.

experiment=open('potentiation.txt')
lines=experiment.read().splitlines()
receptors=['crystal_1.txt', 'modeller_1.txt', 'moe_1.txt',
           'nci5_modeller0000_1.txt', 'nci5_modeller0001_1.txt',
           'nci5_modeller0002_1.txt', 'nci5_modeller0003_1.txt',
           'nci5_modeller0004_1.txt', 'nci5_modeller0005_1.txt',
           'nci5_modeller0006_1.txt', 'nci5_modeller0007_1.txt',
           'nci5_modeller0008_1.txt', 'nci5_modeller0009_1.txt',
           'nci5_modeller0010_1.txt', 'nci5_modeller0011_1.txt',
           'nci5_moe0000_1.txt', 'nci5_moe0001_1.txt', 'nci5_moe0002_1.txt',
           'nci5_moe0003_1.txt', 'nci5_moe0004_1.txt', 'nci5_moe0005_1.txt',
           'nci5_moe0006_1.txt', 'nci5_moe0007_1.txt', 'nci5_moe0008_1.txt',
           'nci5_moe0009_1.txt', 'nci5_moe0010_1.txt', 'nci5_moe0011_1.txt',
           'nci5_moe0012_1.txt', 'nci5_moe0013_1.txt', 'nci5_moe0014_1.txt']

for ligand in lines:
    for protein in receptors:
        file1=open(protein,"r")
        read1=file1.read()
        find_hit=read1.find(ligand)
        if find_hit == -1:
            print ligand,protein,"Not Found"
        else:
            print ligand,protein, "Found"

An example of the output of this code is below:

345647 nci5_moe0012_1.txt Not Found
345647 nci5_moe0013_1.txt Not Found
345647 nci5_moe0014_1.txt Found

My question is: how can I take this output and format it into a CSV file that looks like the example below?

Ligand  nci5_moe0012_1  nci5_moe0013_1  nci5_moe0014_1
345647  Not Found       Not Found       Found

martineau · Accepted Answer · 2015-07-27 00:47:53Z

3

I think something like this would do it (assuming your output file is tab-delimited):

import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# Python 2 idiom: the csv module expects the output file in binary mode ('wb')
with open('potentiation.txt', 'rt') as experiment, \
     open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    for ligand in (line.rstrip() for line in experiment):
        row = [ligand]
        for protein in receptors:
            with open(protein+'.txt', "rt") as file1:
                # substring search over the whole file's contents
                found = 'Found' if ligand in file1.read() else 'Not Found'
                row.append(found)
        csv_writer.writerow(row)

print('output.csv file written')

Update

As I said in a comment, this could be done a lot faster by reading each protein file only once. To do that and still format the output the way you want, the result of checking each ligand against each file needs to be stored in a data structure that is built up incrementally as the files are read, and then written out all at once after every file has been processed. A simple list of lists is adequate for this purpose and is used in the implementation below.

The trade-off is using more memory versus rereading the protein files over and over. Since disk I/O is often one of the slowest operations on a computer, the potentially large performance gain is probably worth the slight increase in code complexity.

Here's the code showing this alternative version:

import csv

receptors = ['crystal_1', 'modeller_1', 'moe_1',
             'nci5_modeller0000_1', 'nci5_modeller0001_1',
             'nci5_modeller0002_1', 'nci5_modeller0003_1',
             'nci5_modeller0004_1', 'nci5_modeller0005_1',
             'nci5_modeller0006_1', 'nci5_modeller0007_1',
             'nci5_modeller0008_1', 'nci5_modeller0009_1',
             'nci5_modeller0010_1', 'nci5_modeller0011_1',
             'nci5_moe0000_1', 'nci5_moe0001_1', 'nci5_moe0002_1',
             'nci5_moe0003_1', 'nci5_moe0004_1', 'nci5_moe0005_1',
             'nci5_moe0006_1', 'nci5_moe0007_1', 'nci5_moe0008_1',
             'nci5_moe0009_1', 'nci5_moe0010_1', 'nci5_moe0011_1',
             'nci5_moe0012_1', 'nci5_moe0013_1', 'nci5_moe0014_1']

# initialize list of lists holding each ligand and its presence in each receptor
with open('potentiation.txt') as experiment:
    ligands = [[ligand] for ligand in (line.rstrip() for line in experiment)]

for protein in receptors:
    with open(protein + '.txt') as protein_file:
        protein_file_data = protein_file.read()
        for row in ligands:
            # determine if this ligand (row[0]) appears in protein data
            row.append('Found' if row[0] in protein_file_data else 'Not Found')

# Python 2 idiom: the csv module expects the output file in binary mode ('wb')
with open('output.csv', 'wb') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    csv_writer.writerow(['Ligand'] + receptors)  # header row
    csv_writer.writerows(ligands)

print('output.csv file written')
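
For readers on Python 3, the same single-pass idea might be sketched as below. This is an adaptation under stated assumptions, not part of the original answer: the function names `build_rows` and `write_table` are made up for illustration, the output file is opened in text mode with `newline=''` instead of `'wb'`, and `lineterminator='\n'` is set so that no carriage returns are written.

```python
import csv


def build_rows(ligands, receptor_texts):
    """Return one row per ligand: [ligand, status for each receptor text].

    ligands: list of ligand strings (assumed already stripped of whitespace).
    receptor_texts: list of strings, each the full contents of one receptor file.
    """
    rows = [[ligand] for ligand in ligands]
    for text in receptor_texts:
        for row in rows:
            # row[0] is the ligand; append its status for this receptor
            row.append('Found' if row[0] in text else 'Not Found')
    return rows


def write_table(path, receptor_names, rows):
    """Write a tab-delimited table with a header row (Python 3 file handling)."""
    # newline='' lets the csv module control line endings itself;
    # lineterminator='\n' replaces the csv default of '\r\n'.
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
        writer.writerow(['Ligand'] + list(receptor_names))
        writer.writerows(rows)
```

In practice, `receptor_texts` would come from reading each `protein + '.txt'` file once, as the loop above does, and `ligands` from the stripped lines of `potentiation.txt`.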

edited Jul 27, 2015 at 0:47

answered Jul 25, 2015 at 20:45

martineau


9 Comments

Adam Green Over a year ago

Thanks! When I use this code, I get the following error message: csv_writer([ligand, protein, "Found" if found else "Not Found"]) TypeError: '_csv.writer' object is not callable. Any suggestions?

Adam Green Over a year ago

Thanks this works! One more question. What does ^M mean? It appears in the output csv after each protein_file? Is there a way to get rid of it?

martineau Over a year ago

That's a carriage return character. My last update may get rid of it. If it doesn't, it may be because you're using Python 3 but didn't specify that in your question (and should let me know).

martineau Over a year ago

Adam: After rereading your question I realized my answer only converted the loop output into csv format, but did not arrange it the way you wanted. My latest update should correct that.

Adam Green Over a year ago

Thanks for catching that. There is actually one more problem with the script. The script is used to find whether a certain ligand is found or not found within various protein files. However, the output of the script is currently showing "Not Found" for all ligands for each protein file. This is not correct as there should be some that were "Found" and some "Not Found". I think a simple conditional expression should work. How can it best be introduced into the script?
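
The conditional expression asked about here could look like the sketch below (Python 3 syntax). The helper name `status` and the sample strings are hypothetical. The whitespace handling is only a guess at why every lookup might fail, since the input files are not shown: `rstrip()` removes trailing whitespace only, so a leading space or tab in a line of potentiation.txt would still prevent a match.

```python
def status(ligand, text):
    """Return 'Found' if the cleaned ligand ID occurs in text, else 'Not Found'."""
    ligand = ligand.strip()  # drop spaces, tabs, '\r' and '\n' on both sides
    if not ligand:
        # an empty string is a substring of every string, so report it as missing
        return 'Not Found'
    return 'Found' if ligand in text else 'Not Found'


protein_text = "123456 345647 777777"   # made-up file contents
print(repr(" 345647\r\n"))              # repr() makes hidden characters visible
print(status(" 345647\r\n", protein_text))
print(status("000000", protein_text))
```

Printing `repr(ligand)` inside the real loop is a quick way to check whether the lines read from the file contain characters that are not visible in a normal `print`.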


Clodion · Accepted Answer · 2015-07-25 19:25:57Z

0

You can save your results in lists (one list for the ligand, one for the proteins), putting the "Protein" label and the ligand value at index 0 of the appropriate list. After that, it's easy to save them to a text file.
To save, open a file for writing and convert each list to a string:

my_string = " ".join(map(str, lst))

and then write my_string to the file (do this for each list).
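
A minimal sketch of this join-and-write approach, in Python 3 syntax. The list contents are copied from the example in the question, while the helper `to_line` and the file name `joined_output.txt` are made up for illustration:

```python
header = ["Ligand", "nci5_moe0012_1", "nci5_moe0013_1", "nci5_moe0014_1"]
row = ["345647", "Not Found", "Not Found", "Found"]


def to_line(values, sep=","):
    """Join a list into one delimited line, converting each item to str."""
    return sep.join(map(str, values))


with open("joined_output.txt", "w") as out:   # hypothetical output file name
    for lst in (header, row):
        out.write(to_line(lst) + "\n")        # one list per line
```

Note that a plain `join` does not quote values that themselves contain the separator; the `csv` module handles that case.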

edited Jul 25, 2015 at 19:25

answered Jul 25, 2015 at 19:02

Clodion


5 Comments

Clodion Over a year ago

Or you can use dictionary (keys are ligands and values are tuple (file, Found/Not Found).

Adam Green Over a year ago

Thanks for the response. I am pretty new to python. Could you explain more about how I can write two different lists to a single text file and include the output data (Found or Not Found)?

Clodion Over a year ago

Is it clearer now? And you can use a "," in the join method (to be closer to csv).

Adam Green Over a year ago

Okay, so one more question: how can I save both lists as one text file?

Clodion Over a year ago

Here, this is not lists but strings!!
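
The dictionary idea mentioned in the comments above could be sketched roughly as follows (Python 3). The function name `dict_to_csv` and the sample data are hypothetical, and the sketch assumes every ligand was checked against the same files in the same order:

```python
import csv
import io

# hypothetical results: ligand -> list of (file, status) tuples
results = {
    "345647": [("nci5_moe0012_1.txt", "Not Found"),
               ("nci5_moe0013_1.txt", "Not Found"),
               ("nci5_moe0014_1.txt", "Found")],
}


def dict_to_csv(results):
    """Render the dictionary as CSV text, one row per ligand."""
    if not results:
        return ""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # take the column names from the first ligand's list of (file, status) pairs
    first = next(iter(results.values()))
    writer.writerow(["Ligand"] + [name for name, _ in first])
    for ligand, hits in results.items():
        writer.writerow([ligand] + [status for _, status in hits])
    return buffer.getvalue()


print(dict_to_csv(results))
```

The returned string can be written to a file with a single `write` call.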
