How to implement `tail -1` in Python

Question

I have some very big files (more than 100 million lines),
and I need to read their last line.
As a Linux user, in a shell script I use `tail` for that.

Is there a way to quickly read the last line of a file in Python?
Perhaps using `seek`, but I'm not familiar with it.

The best I have come up with is this:

from subprocess import run as srun

file = "/my_file"
proc = srun(['/usr/bin/tail', '-1', file], capture_output=True)
last_line = proc.stdout
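If you keep the subprocess approach, note that `proc.stdout` above is `bytes`, not `str`. A slightly tidier sketch asks `subprocess.run` to decode the output (`text=True`) and to raise on failure (`check=True`), and uses the POSIX spelling `tail -n 1` (the `-1` form is the older syntax). The function name `tail_last_line` is hypothetical, and the sketch assumes a `tail` executable is on `PATH`.

```python
import subprocess


def tail_last_line(path: str) -> str:
    """Return the last line of `path` by running the external `tail` command.

    Assumes a POSIX `tail` on PATH; `-n 1` is the portable spelling of `-1`.
    """
    proc = subprocess.run(
        ["tail", "-n", "1", path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode stdout to str instead of returning bytes
        check=True,           # raise CalledProcessError if tail fails
    )
    return proc.stdout.rstrip("\n")  # drop the trailing newline, if any
```

This still pays the cost of starting a process on every call, so it mainly helps when you are calling it rarely.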

All the other Pythonic approaches I tried are slower than calling the external /usr/bin/tail.

I also read these threads:
How to implement a pythonic equivalent of tail -F?
Head and tail in one line
but they don't meet my needs, because I want fast execution without excessive memory use.

Edit: I tried what I understood from the comments and …

I get strange behavior:

>>> with open("./Python/nombres_premiers", "r") as f:
...     a = f.seek(0,2)
...     l = ""
...     for i in range(a-2,0,-1):
...        f.seek(i)
...        l = f.readline() + l
...        if l[0]=="\n":
...           break
... 
1023648626
1023648625
1023648624
1023648623
1023648622
1023648621
1023648620
1023648619
1023648618
1023648617
1023648616
>>> l
'\n2001098251\n001098251\n01098251\n1098251\n098251\n98251\n8251\n251\n51\n1\n'
>>> with open("./Python/nombres_premiers", "r") as f:
...     a = f.seek(0,2)
...     l = ""
...     for i in range(a-2,0,-1):
...        f.seek(i)
...        l = f.readline()
...        if l[0]=="\n":
...           break
... 
1023648626
1023648625
1023648624
1023648623
1023648622
1023648621
1023648620
1023648619
1023648618
1023648617
1023648616
>>> l
'\n'

How do I get l = '2001098251'?

Charles Duffy Dec 6, 2024 at 23:35

seek() is your friend -- that's the same facility that tail itself uses.

Charles Duffy Dec 6, 2024 at 23:36

When you say "How to implement a pythonic equivalent of tail -F" doesn't solve your problem, you're wrong -- some of the answers there do use seek() with the correct arguments to skip directly to the end, and so are just as efficient as tail itself. Just ignore any answer that doesn't use seek() with os.SEEK_END.

Tawal Dec 7, 2024 at 0:02

Well, I can reach the end of the file with f.seek(0, 2), which returns an integer (the offset of the very end of the file). How do I get the last line? I don't know its length.

Yuri Ginsburg Dec 7, 2024 at 0:18

@Tawal You should seek(-2, os.SEEK_END) and then do something like while f.read(1) != b'\n': f.seek(-2, os.SEEK_CUR) to get to the beginning of the last line.

Charles Duffy Dec 7, 2024 at 0:20

The way tail does it is to rewind a bit from the end (1-4 KB typically) and read a line at a time from there. If you want to get fancy, you can rewind further until you find at least one newline between your location and the end of the file.
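The byte-by-byte backward scan from Yuri Ginsburg's comment can be sketched as a complete function. It opens the file in binary mode because Python's text mode rejects nonzero end-relative and current-relative seeks (it raises `io.UnsupportedOperation`). The function name `last_line_bytewise` and the UTF-8 default are assumptions for illustration.

```python
import os


def last_line_bytewise(path: str, encoding: str = "utf-8") -> str:
    """Return the last line of `path`, scanning backwards one byte at a time."""
    with open(path, "rb") as f:  # binary mode: relative seeks are allowed
        try:
            # Start just before the final byte, so a trailing b"\n" is skipped.
            f.seek(-2, os.SEEK_END)
            # Step back until the byte just read is a newline; the position
            # is then the first byte of the last line.
            while f.read(1) != b"\n":
                f.seek(-2, os.SEEK_CUR)
        except OSError:
            # Seeking before byte 0 fails: the file has at most one line.
            f.seek(0)
        return f.readline().decode(encoding).rstrip("\n")
```

Only the bytes of the last line are read, so the cost depends on the length of that line rather than the size of the file. The price is one `seek` and one `read` call per byte, which adds up for long lines.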

Charles Duffy · Accepted Answer · 2024-12-07 01:40:45Z

2

tail doesn't support arbitrarily long lines -- it takes the last chunk of the file and iterates from there. Doing the same thing yourself could look like:

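import sys

def last_line(f, bufsize=4096):
    end_off = f.seek(0, 2)                # offset of the end of the file
    f.seek(max(end_off - bufsize, 0), 0)  # rewind at most bufsize from the end
    lastline = None
    while (line := f.readline()):
        if line[-1] == '\n':
            lastline = line
        else:
            break # last line is not yet completely written; ignore it
    return lastline[:-1] if lastline is not None else None

# errors='replace' avoids a UnicodeDecodeError if the seek lands mid-character
with open(sys.argv[1], 'r', errors='replace') as f:
    print(last_line(f))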

Note that if you want to continue to read new content as the file is edited over time, you should use inotify to watch for changes. https://stackoverflow.com/a/78969468/14122 demonstrates this.

edited Dec 7, 2024 at 1:40

answered Dec 7, 2024 at 1:14

Charles Duffy

5 Comments

user2357112 Dec 7, 2024 at 1:18

Seeking to arbitrary offsets is undefined behavior in text mode, though. (There's no way to implement sensible, efficient arbitrary-offset seek for arbitrary character encodings.)

Charles Duffy Dec 7, 2024 at 1:20

Truth, that. Should probably be ignoring encoding failures, at least on the very first read.

Tawal Dec 7, 2024 at 1:36

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 6, in last_line
TypeError: 'NoneType' object is not subscriptable

Charles Duffy Dec 7, 2024 at 1:41

@Tawal, I just added a guard so that if we find no valid lines inside the last 4 KB we return None instead of failing with that error. If you choose, you can of course return the incomplete content instead, or you could turn the buffer size up and allow the last line to be more than 4 KB. However, this certainly shouldn't be something that can happen with the file you showed, where the lines are all quite short. I'd need a reproducer (including the data file, or code that creates a data file, with which the problem takes place) to speak further.

Charles Duffy Dec 7, 2024 at 1:44

(If your real file weren't line-oriented at all but instead were NUL-delimited, of course, that's an easy way to get into this state -- you'd need to replace readline() appropriately; it also could probably happen with the prior code revision and a completely empty file -- but neither of those corner cases fit the scenario in the question).
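Two points raised in these comments, arbitrary seeks in text mode and last lines longer than the 4 KB window, can both be handled by reading fixed-size chunks backwards in binary mode and widening the window until it holds a newline before the final line (the "rewind further" idea from the question's comments). This is a sketch, not the answer's code: the name `last_line_chunked`, the 4096-byte default and UTF-8 decoding are assumptions, and unlike `last_line` it returns a final line even when that line has no trailing newline.

```python
import os


def last_line_chunked(path: str, bufsize: int = 4096, encoding: str = "utf-8") -> str:
    """Return the last line of `path`, reading backwards in `bufsize` chunks."""
    with open(path, "rb") as f:  # binary mode, so arbitrary offsets are valid
        pos = f.seek(0, os.SEEK_END)
        data = b""
        while pos > 0:
            # Step back one chunk (or to the start of the file) and prepend it.
            step = min(bufsize, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            # Ignore a single trailing newline; any earlier newline marks the
            # start of the last line, so the window is now wide enough.
            if b"\n" in data[:-1]:
                break
        body = data[:-1] if data.endswith(b"\n") else data
        start = body.rfind(b"\n") + 1  # 0 when the file has a single line
        # A b"\n" byte never occurs inside a multibyte UTF-8 character,
        # so slicing just after it yields decodable text.
        return body[start:].decode(encoding)
```

Prepending each chunk copies the buffered data, which is cheap for ordinary line lengths; for extremely long lines, collecting chunks in a list and joining them once would avoid the repeated copying.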

Tawal · Accepted Answer · 2024-12-08 20:07:27Z

0

Using seek(), read() and readline(), I can quickly retrieve the last line of a text file:

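with open("My_File", "r") as f:
    n = f.seek(0, 2)                # offset of the end of the file
    # Arbitrary seeks in text mode are only reliable for single-byte text such as ASCII.
    for i in range(n - 2, 0, -1):   # walk backwards, skipping the final "\n"
        f.seek(i)
        if f.read(1) == "\n":       # newline ending the second-to-last line
            s = f.readline().replace("\n", "")
            break
    else:                           # no newline found: treat the file as one line
        f.seek(0)
        s = f.readline().replace("\n", "")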

Edit: changed range(n-2, 1, -1) to range(n-2, 0, -1) in case the file has only 1 line.
Edit 2: replaced s = f.readline()[:-1] with s = f.readline().replace("\n", "") in case the last line has no trailing line feed character.

edited Dec 8, 2024 at 20:07

answered Dec 8, 2024 at 19:42

Tawal
