13

A Python program I'm writing is to read a set number of lines from the top of a file, and the program needs to preserve this header for future use. Currently, I'm doing something similar to the following:

header = ''
header_len = 4
for i in range(header_len):  # range(1, header_len) would read only header_len - 1 lines
    header += file_handle.readline()

Pylint complains that I'm not using the variable i. What would be a more pythonic way to do this?

Edit: The purpose of the program is to intelligently split the original file into smaller files, each of which contains the original header and a subset of the data. So, I need to read and preserve just the header before reading the rest of the file.
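A minimal sketch of the overall task described in the edit, assuming the header is a fixed `header_len` lines and each output file should get `chunk_len` data lines. `split_with_header` and `chunk_len` are hypothetical names, and `io.StringIO` stands in for a real file handle (an open text file iterates over lines the same way).

```python
import io
from itertools import islice

def split_with_header(lines, header_len, chunk_len):
    """Hypothetical helper: yield header + chunk for each block of data lines."""
    it = iter(lines)
    header = list(islice(it, header_len))   # consume exactly the header lines
    while True:
        chunk = list(islice(it, chunk_len))  # next block of data lines
        if not chunk:                        # input exhausted
            break
        yield header + chunk

src = io.StringIO("h1\nh2\nd1\nd2\nd3\n")
parts = list(split_with_header(src, header_len=2, chunk_len=2))
# Each part could then be written out, e.g. with a hypothetical name:
#   with open(f"part{n}.txt", "w") as out: out.writelines(part)
```

Because `islice` stops after the requested count, the header is read once and the remaining lines are never loaded into memory all at once.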


9 Answers

13
f = open('fname')
header = [next(f) for _ in range(header_len)]

Since you're going to write header back to the new files, you don't need to do anything with it. To write it back to the new file:

open('new', 'w').writelines(header + list_of_lines)

If you know how many lines each new file should contain (chunk_len), list_of_lines would become:

list_of_lines = [next(f) for _ in range(chunk_len)]
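A sketch of the same approach with every file opened in a `with` block so it is closed automatically. The temporary directory, the file contents, and the sizes `header_len = 2`, `chunk_len = 3` are assumptions made only so the example runs on its own.

```python
import os
import tempfile

header_len, chunk_len = 2, 3  # hypothetical sizes for the demo

# Create a small stand-in for 'fname' in a temporary directory.
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "fname")
with open(src_path, "w") as src:
    src.write("h1\nh2\na\nb\nc\nd\n")

with open(src_path) as f:
    header = [next(f) for _ in range(header_len)]
    list_of_lines = [next(f) for _ in range(chunk_len)]

new_path = os.path.join(tmpdir, "new")
with open(new_path, "w") as out:  # closed automatically when the block exits
    out.writelines(header + list_of_lines)

with open(new_path) as check:
    written = check.read()
```

One caveat: if fewer lines remain than requested, `next(f)` raises `StopIteration`, whereas `itertools.islice` simply returns the lines that are left.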

2 Comments

Straightforward, easily understandable, and eliminates the pylint complaint. Thus it's the best answer, IMO.
Don't you want to close the new file? ;)
12

I'm not sure what the Pylint rules are, but you could use the '_' throwaway variable name.

header = ''
header_len = 4
for _ in range(header_len):  # _ marks the loop variable as intentionally unused
    header += file_handle.readline()
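A loop over `range(1, header_len)` runs only `header_len - 1` times, which is easy to miss when copying this pattern. The sketch below compares it with `range(header_len)`; `io.StringIO` is an assumed stand-in for the real file handle.

```python
import io

header_len = 4
text = "l1\nl2\nl3\nl4\nl5\n"

f = io.StringIO(text)
too_short = ''
for _ in range(1, header_len):  # yields 1, 2, 3: three iterations
    too_short += f.readline()

f = io.StringIO(text)
header = ''
for _ in range(header_len):     # yields 0, 1, 2, 3: four iterations
    header += f.readline()
```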

4 Comments

You don't need to use the for loop. I recommend a list comprehension (see my post below). Good call on the throwaway variable, though.
@Roger Pate: can you explain?
@unknown, there's nothing wrong with using for loops. For loops are an integral part of Python and a basic concept of programming. If somebody tells you not to use them, tell them to take a hike.
You learn something new every day - I didn't know about the _ variable. Thanks! +1
10
import itertools

header_lines = list(itertools.islice(file_handle, header_len))
# or
header = "".join(itertools.islice(file_handle, header_len))

Note that with the first, the newline characters will still be present; to strip them:

header_lines = list(n.rstrip("\n")
                    for n in itertools.islice(file_handle, header_len))
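For the file-splitting use case it matters that `islice` consumes exactly `header_len` lines and no more, so iterating the same handle afterwards picks up at the first data line. A small demonstration, with `io.StringIO` assumed as a stand-in for the open file:

```python
import io
import itertools

file_handle = io.StringIO("h1\nh2\nrow1\nrow2\n")  # stands in for an open file
header_len = 2

header = "".join(itertools.islice(file_handle, header_len))
# islice stopped after header_len lines, so iteration resumes at the data
rest = list(file_handle)
```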

5 Comments

If you strip the lines it will be difficult to recall the structure of the original header. I recommend you keep them.
No, it won't. In that example they are stored in a list rather than one long string. Which he should use depends on what he's doing with the data later.
The OP writes in his script 'header += ...' so I think he meant a single string, but you are right: it depends.
Arrieta: that's why I used separate header and header_lines variables.
Anurag: your own answer doesn't even use "for line in f", nor do any of the answers I currently see iterate the file directly; if anything, itertools is the only solution here that uses the file as an iterator and is thus the closest answer to "for line in f".
4

My best answer is as follows:

file test.dat:

This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9

Python script:

f = open('test.dat')
nlines = 4
header = "".join(f.readline() for _ in range(nlines))

Output:

>>> header
'This is line 1\nThis is line 2\nThis is line 3\nThis is line 4\n'

Notice that you don't need to import any modules; also, you could use any dummy variable in place of _ (it works with i, or j, or ni, or whatever), but I recommend you don't, to avoid confusion. You could strip the newline characters (though I don't recommend it, because keeping them lets you distinguish among lines) or do anything else you can do with strings in Python.

Notice that I did not provide a mode for opening the file, so it defaults to read-only text mode ('r'); this is not Pythonic, since in Python "explicit is better than implicit". Finally, nice people close their files; in this case it happens automatically (because the script ends), but it is best practice to close them explicitly with f.close() or a with statement.

Happy Pythoning.

Edit: As pointed out by Roger Pate, the square brackets are unnecessary: passing a generator expression to "".join works just as well and shortens the line by two characters. The original script has been edited to reflect this.
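Following the advice about explicit modes and closing files, here is a sketch of the same script with an explicit `'r'` mode and a `with` block. Recreating `test.dat` in a temporary directory is an assumption made only so the example is self-contained.

```python
import os
import tempfile

# Recreate the nine-line test.dat shown above in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(path, "w") as out:
    out.writelines(f"This is line {n}\n" for n in range(1, 10))

nlines = 4
with open(path, "r") as f:  # explicit read mode; f is closed when the block exits
    header = "".join(f.readline() for _ in range(nlines))
```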

4 Comments

When you don't actually need a list and any iterable will work, such as the parameter to "".join here, then a generator expression is better, easier (by two keystrokes ;), and more clear than a list comprehension: "".join(..) instead of "".join([..]). They are related, and a LC is actually a special case of a genexp (in my view at least), where [..] is just convenience for list(..). python.org/dev/peps/pep-0289
Yes, I did read it. I still want you to close it for the benefit of others who only want to see the code and don't want to read.
@Arrieta: Did NASA approve your use of their logo? ;-p
Actually in join you have to use a list comprehension and not an iterator for performance ;)
1

Maybe this:

header_len = 4
header = open("file.txt").readlines()[:header_len]

But since it reads the whole file into memory, it will be troublesome for long files.
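If the file does fit in memory, the list returned by `readlines()` makes the rest of the split simple slicing. A sketch, assuming hypothetical sizes `header_len = 2` and `chunk_len = 2` and using `io.StringIO` in place of `open("file.txt")`:

```python
import io

header_len, chunk_len = 2, 2              # hypothetical sizes
f = io.StringIO("h1\nh2\na\nb\nc\n")      # stands in for open("file.txt")

lines = f.readlines()                     # whole file in memory
header, body = lines[:header_len], lines[header_len:]
# Each chunk is the header followed by the next chunk_len data lines.
chunks = [header + body[i:i + chunk_len] for i in range(0, len(body), chunk_len)]
```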

5 Comments

.readlines() reads the entire file, though. If you have a large file and don't want to read the whole thing into memory, this could be a bad idea.
yeah, I have added that while you were writing this, ;)
@david : guido please make it lazy lazy very lazy...stackoverflow.com/questions/519633/…
There's no need, now that we have itertools.islice.
+1 for simplicity and OP can use the rest of the list items easily to split into smaller files. readlines() does read the entire file, but I am not going to -1 you for that, since we don't know if OP's files are that big in the GB range, so it might still be ok for OP to use this method.
1

I do not see anything wrong with your solution; maybe just replace i with _. I also do not like invoking itertools everywhere when a simpler solution will work; it is like people using jQuery for trivial JavaScript tasks. Anyway, just to take revenge on itertools, here is my solution.

As you want to read the whole file line by line anyway, why not just read the header first and then do whatever you want with the rest:

header = ''
header_len = 4

for i, line in enumerate(file_handle):
    if i < header_len:
        header += line
    else:
        # output chunks to separate files
        pass

print(header)
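One way the "output chunks to separate files" branch could be filled in is to derive a chunk number from the line index. This is a sketch under assumptions: `chunk_len` is hypothetical, and a `defaultdict` of lists stands in for the separate output files.

```python
import io
from collections import defaultdict

header_len, chunk_len = 2, 2           # hypothetical sizes for the demo
file_handle = io.StringIO("h1\nh2\na\nb\nc\n")

header = ''
chunks = defaultdict(list)             # chunk number -> data lines (stand-in for files)
for i, line in enumerate(file_handle):
    if i < header_len:
        header += line
    else:
        # the data line's position (i - header_len) decides its chunk
        chunks[(i - header_len) // chunk_len].append(line)

outputs = [header + ''.join(chunks[k]) for k in sorted(chunks)]
```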

Comments

0

What about:

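header = []
for i, line in enumerate(file_handle):
    if i <= 3:  # the first four lines form the header
        header.append(line)  # append the whole line; += would add it character by character
        continue
    # process the rest of the file here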
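A pitfall to watch for when accumulating lines into a list: `list += str` extends the list with the string's individual characters, while `append` adds the line as a single element. A minimal demonstration:

```python
chars = []
chars += "ab\n"        # list += str extends with each character

lines = []
lines.append("ab\n")   # append adds the whole line as one element
```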

Comments

0

One problem with using _ as a dummy variable is that it only solves the problem on one level; consider something like the following.

def f(n, m):
    """A function to run g() n times and run h() m times per g."""
    for _ in range(n):
        g()
        for _ in range(m):
            h()
    return 0

This function works fine, but the _ in the inner loop (over range(m)) rebinds the outer _, so the two may conflict. In any case, PyCharm complains about this kind of syntax.

So I would argue that _ is not as "throwaway" as was suggested before.
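For what it is worth, rebinding `_` in the inner loop does not change how many times either loop runs, because each `for` statement draws its next value from its own `range` iterator; the concern is readability and linter warnings rather than behavior. In this sketch `g` and `h` are hypothetical stand-ins that just record their calls.

```python
calls = []

def g():
    calls.append("g")  # hypothetical stand-in: record the call

def h():
    calls.append("h")  # hypothetical stand-in: record the call

def f(n, m):
    """A function to run g() n times and run h() m times per g."""
    for _ in range(n):
        g()
        for _ in range(m):  # rebinds _, but the outer loop keeps its own iterator
            h()
    return 0

result = f(3, 2)
```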

Perhaps you might like to just create a function to do it!

def run(f, n, *args):
    """Runs f with the arguments from the args tuple n times."""
    for _ in range(n):
        f(*args)

e.g. you could use it like this:

>>> def ft(x, L):
...     L.append(x)
...

>>> a = 7
>>> nums = [4, 1]
>>> run(ft, 10, a, nums)
>>> nums
[4, 1, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
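Applied to the original question, `run` can read the header without a named loop variable at the call site. This sketch repeats `run` so it is self-contained; `read_into` is a hypothetical helper and `io.StringIO` stands in for the file handle.

```python
import io

def run(f, n, *args):
    """Runs f with the arguments from the args tuple n times."""
    for _ in range(n):
        f(*args)

def read_into(fh, lines):
    """Hypothetical helper: append the next line of fh to lines."""
    lines.append(fh.readline())

fh = io.StringIO("h1\nh2\nh3\nh4\nbody\n")
header_lines = []
run(read_into, 4, fh, header_lines)
header = "".join(header_lines)
```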

Comments

-1
s = ""
f = open("file")
for n, line in enumerate(f):
    if n <= 3:
        s = s + line
    else:
        pass  # do something here to process the rest of the lines
print(s)
f.close()

7 Comments

He seems to want the result in a single string (notice he writes header += ...)
I think this implementation is overly complicated for such a simple task; it reads like C on Python - take advantage of the "Batteries Included" philosophy and use the existing methods on the objects.
overly complicated?? what criteria do you use to judge?? number of characters of code? number of lines of code?? Batteries included?? What kind of batteries are you talking about that i am not using? you can test my code versus your code with millions of lines, and they both perform on par. So what's the deal?
The "Batteries Included" is a motto of the Python Language (cf. website) "Fans of Python use the phrase "batteries included" to describe the standard library". What I mean is that your style is not taking advantage of the Standard Library and, by doing so, you are reinventing the wheel. This is not in line with Python's philosophy. By reinventing the wheel you condemn others to understand your logic (which could be difficult in some cases): by using the Standard Library you can express your ideas at a higher level of abstraction and don't distract your code logic with wheel reinventions.
No need in going around downvoting - this is a place to learn and you cannot get offended by people commenting on your code. If you cannot stand the heat, keep out of the kitchen.