4
\$\begingroup\$

I have written a function that gets a function as input (acquired from numpy.poly1d). The function has access to the GUI variables self.batch and self.inputFile, these variables specify if the function is called in batch mode or not and what the current inputFile (for transformation is).

My main concerns are:

  • Would you understand the function based on my documentation?
  • Is vectorizing only the values to be transformed correct?
  • Is the function in general correct (meaning pythonic?)

The function:

def transformFile(self, f):
    """ This function gets a single function as input, reads the
    raw data file and transforms the read m/z values in the raw data
    file with the given function. The transformed m/z values are 
    stored in a calibrated_<inputfile>.xy file (if program is running
    in batchProcess mode) or asks the user for a filepath to save
    the calibrated file (if program is running in single spectrum mode).

    INPUT: a numpy polynomial (poly1d) function
    OUTPUT: a calibrated data file
    """
    # Prepare variables required for transformation
    outputBatch = []
    mzList = []
    intList = []
    with open (self.inputFile,'r') as fr:
        if self.batch == False:
            output = tkFileDialog.asksaveasfilename()
            if output:
                pass
            else:
                tkMessageBox.showinfo("File Error","No output file selected")
                return
        else: 
            parts = os.path.split(str(self.inputFile))
            output = "calibrated_"+str(parts[-1])
        if self.log == True:
            with open('MassyTools.log','a') as fw:
                fw.write(str(datetime.now())+"\tWriting output file: "+output+"\n")
        for line in fr:
            line = line.rstrip('\n')
            values = line.split()
            mzList.append(float(values[0]))
            intList.append(int(values[1]))
        # Transform python list into numpy array
        mzArray = numpy.array(mzList)
        newArray = f(mzArray)
        # Prepare the output as a list
        for index, i in enumerate(newArray):
            outputBatch.append(str(i)+" "+str(intList[index])+"\n")
        # Join the output list elements
        joinedOutput = ''.join(outputBatch)
        with open(output,'w') as fw:
            fw.write(joinedOutput)
        if self.log == True:
            with open('MassyTools.log','a') as fw:
                fw.write(str(datetime.now())+"\tFinished writing output file: "+output+"\n")    
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

My attempt to save this from obscurity involves the following assumptions:

  • Convert from Python 2 to Python 3
  • Extract class member dependencies, show them as function parameters, and delete self
  • Preserve this as one function, even though it probably shouldn't be one

Individually,

Type-hint f as a poly1d - for now. In another iteration it should be replaced because it's deprecated.

Move variable creation (outputBatch, etc.) closer to where it's actually used.

Don't keep opening and closing MassyTools.log; instead use the built-in logger.

Use pathlib.Path for inputFile.

Replace the use of lowerCamelCase with snake_case.

Replace == False and == True with direct boolean expressions (not self.batch, self.log).

Instead of writing if output: pass, just write if not output.

tkMessageBox and tkFileDialog no longer exist and have been renamed. You can import the new names and use them, or alias them to the old names.

If writing two cases for the value of a boolean predicate like self.batch, and if there are no other overriding factors, you should write the true case first and the false case second, and delete the not.

There is almost certainly a Numpy built-in that can replace your parse-and-append() loop - such as loadtxt - but without seeing the file content it's impossible to say for sure. The same applies to the output loop.

You need more internal whitespace separating "paragraphs" of code. But more importantly, this one function mixes up several different responsibilities that should not exist together, including GUI interaction, data processing and I/O. It should be split to multiple functions.

Covering some of the above,

import logging
from pathlib import Path

import numpy as np
import tkinter.filedialog as tkFileDialog
import tkinter.messagebox as tkMessageBox


logging.basicConfig(filename='MassyTools.log', datefmt='%c')
logger = logging.getLogger('transform')


def transform_file(
    f: np.poly1d,
    input_file: Path,
    batch: bool,
) -> None:
    """
    This function gets a single function as input, reads the
    raw data file and transforms the read m/z values in the raw data
    file with the given function. The transformed m/z values are
    stored in a calibrated_<inputfile>.xy file (if program is running
    in batchProcess mode) or asks the user for a filepath to save
    the calibrated file (if program is running in single spectrum mode).

    INPUT: a numpy polynomial (poly1d) function
    OUTPUT: a calibrated data file
    """

    # Load input
    mz_list = []
    int_list = []
    with input_file.open() as fr:
        for line in fr:
            mz_value, int_value = line.rstrip('\n').split()
            mz_list.append(float(mz_value))
            int_list.append(int(int_value))

    mz_array = np.array(mz_list)
    new_array = f(mz_array)

    # Choose output filename
    if batch:
        output = "calibrated_" + input_file.name
    else:
        output = tkFileDialog.asksaveasfilename()
        if not output:
            tkMessageBox.showinfo(title='File Error', message='No output file selected')
            return
    logger.info(f'Writing output file: %s', output)

    # Write output
    with open(output, 'w') as fw:
        for i_array, i_int in zip(new_array, int_list):
            fw.write(f'{i_array} {i_int}\n')

    logger.info('Finished writing output file: %s', output)
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.