I found a Python script called transpose_file.py which can transpose space-delimited files. It looks like so:

 import fileinput

 m = []
 for line in fileinput.input():
     m.append(line.strip().split(" "))
 for row in zip(*m):
     print(" ".join(row))

I want to make sure I understand what each line does, as I am very new to Python.

1) First, we import a module called fileinput which allows you to read files and parse through them? Not sure why using a simple with open(sys.argv[1],'r') as f etc would not work

2) Make an empty list called m

3) For each line in your input file, strip any space, tab or newline at the end of the line, and make space the delimiter (i.e. your input file is delimited)

4) For each row ... not sure what the rest means. What does zip(*m) mean? Once this is done, we print a space and we join the row? I just don't see how this results in a transposition.

Any explanation would be deeply appreciated.

2 Answers

1
  1. fileinput does what open(sys.argv[1],'r') would do, and more: fileinput.input() iterates over the lines of every file named on the command line (sys.argv[1:]), and falls back to standard input if no files are given. See the Python documentation for details.

  2. Your understanding of 2 and 3 is broadly correct

  3. For each line, the line is stripped of leading and trailing whitespace and then split on spaces. This results in a grid (a list of lists) in which each inner list holds the space-delimited fields of one line.

  4. zip(*m) is effectively Python's transposition idiom. For example:

    In [1]: data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    
    In [2]: data
    Out[2]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    
    In [3]: transp = list(zip(*data))
    
    In [4]: transp
    Out[4]: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
    

    In Python 3, zip returns a lazy iterator rather than a list, so you have to wrap it in list() to see all the results at once. zip is perhaps more commonly used to "zip" together two lists, so you can iterate over them together:

    In [1]: list(zip(["one", "three", "five"], ["two", "four", "six"]))
    Out[1]: [('one', 'two'), ('three', 'four'), ('five', 'six')]
    

    This is also well documented.

    The * operator unpacks the grid so that each sublist is passed to zip as a separate argument.

    " ".join joins together each string in an iterable with a space - eg

    In [1]: " ".join(["foo", "bar", "baz"])
    Out[1]: 'foo bar baz'
    

    This just puts the space delimiters back into your newly transposed series of strings. It is, again, documented.
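
Putting the steps above together, here is a rough Python 3 sketch of what the script does. It works on an in-memory list of lines instead of a real file; the sample data and the names `lines` and `transposed` are made up for illustration.

```python
# Hypothetical sample input: two lines, as they would be read from a file.
lines = ["a b c\n", "d e f\n"]

# Steps 2 and 3: build the grid, one list of fields per line.
m = [line.strip().split(" ") for line in lines]

# Step 4: zip(*m) pairs up the first fields, then the second fields, and so on;
# " ".join puts the space delimiters back.
transposed = [" ".join(row) for row in zip(*m)]

print("\n".join(transposed))
# a d
# b e
# c f
```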

1

Your analysis is basically correct.

Note that

line.strip().split(" ")

is a little fragile. It strips all leading & trailing whitespace from the line, and then splits the line into a list of strings, using a single space as the delimiter. This may not do what you want if the line contains runs of more than one space, or if it contains tabs.
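
To see the difference, here is a small sketch comparing it with a plain `split()` (no argument), which splits on any run of whitespace. The sample line is made up.

```python
# Hypothetical line with a double space, a tab and trailing whitespace.
line = "a  b\tc \n"

fragile = line.strip().split(" ")
robust = line.split()

print(fragile)  # ['a', '', 'b\tc']
print(robust)   # ['a', 'b', 'c']
```

Whether `split()` is the right replacement depends on your data: it also treats tabs as delimiters and never produces empty fields.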


The zip function iterates over its arguments in parallel, building tuples from the corresponding items in each arg. So first it generates a tuple of all the first items, then all the second items, etc.

For example:

for t in zip([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    print(t)
print()

output

(1, 4, 7)
(2, 5, 8)
(3, 6, 9)

As you can see, this results in a transposition.

We can use the * "splat" operator to pass a list of sequences to zip. It unpacks the list so that zip sees each of those sequences as a separate arg.

lst = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

for t in zip(*lst):
    print(t)

This gives the same output as before.

The "splat" operator isn't just a special feature of zip: you can use it on any function that takes multiple arguments. There's also the "double-splat" operator **, which unpacks dictionaries into keyword=value pairs.
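
A quick sketch of both operators on an ordinary function. `describe` and its parameters are hypothetical, just something to call.

```python
def describe(name, width, height):
    """Toy function taking three arguments."""
    return "{} is {}x{}".format(name, width, height)

args = ["box", 3, 4]
kwargs = {"name": "box", "width": 3, "height": 4}

# * unpacks the list into positional arguments.
from_list = describe(*args)
# ** unpacks the dict into keyword arguments.
from_dict = describe(**kwargs)

print(from_list)  # box is 3x4
print(from_dict)  # box is 3x4
```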

If the sequences differ in length then zip stops when there are no more items left in the shortest sequence. However, there's a related function in the standard itertools module: itertools.zip_longest, which takes an optional fillvalue. It keeps going until the longest sequence is exhausted, using fillvalue to fill the gaps. The default fillvalue is None.
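
Here is a short sketch of the difference on ragged rows, assuming Python 3 (in Python 2 the function is called `itertools.izip_longest`). The sample rows and the "-" fill value are made up.

```python
from itertools import zip_longest

# Hypothetical ragged grid: the second row is one item short.
rows = [["a", "b", "c"], ["d", "e"]]

# zip stops at the shortest row, so "c" is silently dropped.
short = list(zip(*rows))
# zip_longest keeps going and pads the gap with fillvalue.
padded = list(zip_longest(*rows, fillvalue="-"))

print(short)   # [('a', 'd'), ('b', 'e')]
print(padded)  # [('a', 'd'), ('b', 'e'), ('c', '-')]
```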


In regard to fileinput, some people just find it convenient; I prefer with open(...) myself.
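
For comparison, a minimal sketch of the same transposition using with open. The function name `transpose_file` is my own, and I've swapped in `split()` so that runs of spaces and tabs are handled. The temporary file is only there so the example runs on its own; in the real script you would pass `sys.argv[1]` instead.

```python
import os
import tempfile

def transpose_file(path):
    """Return the transposed rows of a whitespace-delimited text file."""
    with open(path, "r") as f:
        m = [line.split() for line in f]
    return [" ".join(row) for row in zip(*m)]

# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 5 6\n")
    sample_path = tmp.name

result = transpose_file(sample_path)
os.remove(sample_path)

print("\n".join(result))
# 1 4
# 2 5
# 3 6
```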
