Get last n lines of a file, similar to tail
Introduction
Getting the last n lines of a file is the programmatic version of the Unix tail command. The best implementation depends on file size. For small and medium files, reading line by line into a bounded buffer is simple and reliable. For very large files, reading backward from the end can be more efficient because it avoids scanning the whole file.
Simple and Practical: Use a Bounded Buffer
In Python, the easiest good solution is collections.deque with maxlen.
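A minimal sketch of that approach follows. The helper name tail_lines, the default of 10 lines, and the UTF-8 encoding are assumptions made for illustration.

```python
from collections import deque


def tail_lines(path, n=10, encoding="utf-8"):
    """Return the last n lines of a text file as a list of strings."""
    if n <= 0:
        # deque rejects a negative maxlen, so handle non-positive n up front.
        return []
    with open(path, "r", encoding=encoding, errors="replace") as f:
        # Iterating the file yields one line at a time; the deque silently
        # drops the oldest line whenever it already holds n entries.
        return list(deque(f, maxlen=n))
```

Each returned line keeps its trailing newline if it had one, so an unterminated final line comes back as-is.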
Why this works well:
- memory stays bounded at n lines
- the code is short and readable
- it handles ordinary text files cleanly
For many real applications, this is already good enough.
Why Not Read the Entire File
A naive solution that calls readlines() and slices off the last n entries works, but it loads the whole file into memory. That is fine for small files and a poor choice for large logs.
The deque solution still reads the whole file sequentially, but it does not retain all lines in memory.
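For contrast, the read-everything approach might look like the sketch below. The function name tail_naive and the UTF-8 encoding are illustrative assumptions.

```python
def tail_naive(path, n=10):
    """Return the last n lines by reading the entire file into memory."""
    if n <= 0:
        # Guard needed: lines[-0:] would return every line, not none.
        return []
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()  # the whole file is held in memory here
    return lines[-n:]
```

Memory use grows with the size of the file rather than with n, which is the cost the deque version avoids.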
Backward Reading for Huge Files
If the file is very large and you want something closer to how tail behaves internally, read from the end in chunks.
This avoids scanning from the start when only the tail is needed, though it is more complex and needs care with encodings and incomplete chunk boundaries.
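One way to sketch chunked backward reading is shown below. It is not how any particular tail implementation works. The name tail_reverse and the 8192-byte chunk size are assumptions, and the code assumes lines end in "\n" and that the encoding is UTF-8 or another ASCII-compatible encoding, so a newline byte never appears inside a multi-byte character.

```python
import os


def tail_reverse(path, n=10, chunk_size=8192, encoding="utf-8"):
    """Return the last n lines, reading fixed-size chunks from the end."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Keep reading until the buffer holds more than n newlines (so the
        # first of the n wanted lines is complete) or the file start is hit.
        while pos > 0 and data.count(b"\n") <= n:
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            data = f.read(read_size) + data
    # Any partial line at the front of the buffer falls outside the last n
    # entries, so only whole lines are decoded.
    lines = data.splitlines(keepends=True)[-n:]
    return [line.decode(encoding, errors="replace") for line in lines]
```

Decoding happens only after the bytes are split into whole lines, which sidesteps the problem of a chunk boundary landing in the middle of a multi-byte character. Re-counting newlines on every pass keeps the sketch short; a tuned version would count only the newly read chunk.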
Use the Real tail Command When Appropriate
If you are writing a script for a Unix-like environment and portability is not a concern, calling the real tail command may be simpler than reimplementing it.
This is often perfectly reasonable in operations tooling, but it is less portable than a pure language-level implementation.
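A small sketch of shelling out from Python is shown below. It assumes a tail binary is on PATH, and the wrapper name tail_with_command is hypothetical. The -n option is the standard way to request a line count.

```python
import subprocess


def tail_with_command(path, n=10):
    """Return the last n lines of a file by invoking the system tail."""
    result = subprocess.run(
        ["tail", "-n", str(n), "--", str(path)],  # "--" guards odd filenames
        capture_output=True,
        text=True,    # decode stdout using the locale's default encoding
        check=True,   # raise CalledProcessError if tail exits non-zero
    )
    return result.stdout.splitlines(keepends=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual paths.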
Follow Mode Is a Different Problem
Many people associate tail with tail -f, which keeps watching the file for new lines. That is not the same as "get the last n lines once."
For one-time tail behavior:
- read the last n lines and stop
For follow mode:
- keep the file open
- seek to the end
- poll or watch for new content
Those are related tasks, but they need different control flow.
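The follow-mode steps could be sketched as a polling generator like the one below. The name follow and the half-second poll interval are assumptions. The sketch does not handle log rotation or truncation, and it may yield a partial line if the writer has not finished writing it.

```python
import os
import time


def follow(path, poll_interval=0.5, encoding="utf-8"):
    """Yield lines appended to a file after this generator starts running."""
    with open(path, "r", encoding=encoding, errors="replace") as f:
        f.seek(0, os.SEEK_END)  # skip everything already in the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                # Nothing new yet: wait briefly, then try again.
                time.sleep(poll_interval)
```

Unlike the one-shot functions, this loop never returns on its own; the caller decides when to stop iterating.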
Common Pitfalls
The most common mistake is using readlines() on very large files and unexpectedly consuming a lot of memory.
Another issue is forgetting about encodings when reading backward in binary mode. Chunk-based reverse reads are efficient, but text decoding becomes more delicate.
Some developers also assume all files end with a trailing newline. That is common, but not guaranteed, so your implementation should tolerate the final line being unterminated.
Finally, if you are on Windows or in a restricted environment, relying on the external tail command may not be portable.
Summary
- For many use cases, deque(f, maxlen=n) is the simplest good tail implementation.
- Avoid loading the entire file into memory unless the file is known to be small.
- For huge files, reading backward from the end can be more efficient.
- Calling the real tail command is fine when platform dependence is acceptable.
- tail -f style follow behavior is a different feature from just reading the last n lines once.

