Asked  6 Months ago    Answers:  5   Viewed   54 times

I'm writing a log file viewer for a web application and for that I want to paginate through the lines of the log file. The items in the file are line based with the newest item at the bottom.

So I need a tail() method that can read n lines from the bottom and supports an offset. This is what I came up with:

def tail(f, n, offset=0):
    """Reads n lines from f with an offset of offset lines."""
    avg_line_length = 74
    to_read = n + offset
    while 1:
        try:
            f.seek(-int(avg_line_length * to_read), 2)
        except IOError:
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None]
        avg_line_length *= 1.3

Is this a reasonable approach? What is the recommended way to tail log files with offsets?

 Answers

29

The code I ended up using. I think this is the best so far:

def tail(f, n, offset=None):
    """Reads n lines from f with an offset of offset lines.  The return
    value is a tuple in the form ``(lines, has_more)`` where `has_more` is
    an indicator that is `True` if there are more lines in the file.
    """
    # on Python 3, open f in binary mode so end-relative seeks work
    avg_line_length = 74
    to_read = n + (offset or 0)

    while True:
        try:
            # seek() needs an integer; the estimate becomes a float below
            f.seek(-int(avg_line_length * to_read), 2)
        except IOError:
            # woops.  apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return (lines[-to_read:offset and -offset or None],
                    len(lines) > to_read or pos > 0)
        avg_line_length *= 1.3
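
For comparison, here is a minimal sketch of a variant that reads fixed-size blocks backwards from the end instead of guessing an average line length. It is not the code from the answer above: the name tail_blocks and the block_size parameter are made up for illustration, and it assumes f is opened in binary mode, so the lines come back as bytes.

```python
import io
import os


def tail_blocks(f, n, offset=0, block_size=4096):
    """Return n lines that end `offset` lines before the end of f.

    Hypothetical helper: f is assumed to be a seekable binary file.
    """
    to_read = n + offset
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    data = b""
    # Walk backwards one block at a time. One newline more than to_read is
    # required so that the first line we keep is guaranteed to be complete.
    while pos > 0 and data.count(b"\n") <= to_read:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
    lines = data.splitlines()
    end = len(lines) - offset
    return lines[max(end - n, 0):max(end, 0)]


# Small in-memory demonstration (io.BytesIO stands in for a real log file).
demo = io.BytesIO(b"l1\nl2\nl3\nl4\nl5\n")
last_two = tail_blocks(demo, 2, block_size=4)
previous_page = tail_blocks(demo, 2, offset=2, block_size=4)
```

For pagination, page k of size n would be tail_blocks(f, n, offset=k * n); an empty result means there are no older lines left.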
Tuesday, June 1, 2021
 
Bono
answered 6 Months ago
44
  • Make sure the file exists: use os.listdir() to see the list of files in the current working directory
  • Make sure you're in the directory you think you're in with os.getcwd() (if you launch your code from an IDE, you may well be in a different directory)
  • You can then either:
    • Call os.chdir(dir), dir being the folder where the file is located, then open the file with just its name like you were doing.
    • Specify an absolute path to the file in your open call.
  • Remember to use a raw string if your path uses backslashes, like so: dir = r'C:\Python32'
    • If you don't use a raw string, you have to escape every backslash: 'C:\\User\\Bob\\...'
    • Forward-slashes also work on Windows 'C:/Python32' and do not need to be escaped.

Let me clarify how Python finds files:

  • An absolute path is a path that starts with your computer's root directory, for example 'C:\Python\scripts..' if you're on Windows.
  • A relative path is a path that does not start with your computer's root directory, and is instead relative to something called the working directory. You can view Python's current working directory by calling os.getcwd().

If you try to do open('sortedLists.yaml'), Python will see that you are passing it a relative path, so it will search for the file inside the current working directory. Calling os.chdir will change the current working directory.

Example: Let's say file.txt is found in C:\Folder.

To open it, you can do:

import os

os.chdir(r'C:\Folder')
open('file.txt') #relative path, looks inside the current working directory

or

open(r'C:\Folder\file.txt') #full path
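
To see the relative/absolute distinction in action without depending on a particular folder, here is a small sketch that works in a temporary directory. The function name open_relative_demo and the file name file.txt are only placeholders for illustration.

```python
import os
import tempfile


def open_relative_demo():
    """Create file.txt in a temporary folder and open it via a relative path."""
    with tempfile.TemporaryDirectory() as folder:
        full_path = os.path.join(folder, "file.txt")
        with open(full_path, "w") as fh:
            fh.write("hello\n")

        previous = os.getcwd()
        os.chdir(folder)  # the working directory is now the temporary folder
        try:
            is_absolute = os.path.isabs("file.txt")
            # abspath() joins the relative name onto the working directory
            resolved = os.path.abspath("file.txt")
            same_file = os.path.samefile(resolved, full_path)
            with open("file.txt") as fh:  # relative path, resolved against cwd
                content = fh.read()
        finally:
            os.chdir(previous)  # restore the original working directory
    return is_absolute, same_file, content


result = open_relative_demo()
```

The relative name only resolves because of the os.chdir call; without it, open('file.txt') would look in whatever directory the script was launched from.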
Tuesday, June 1, 2021
 
Guesser
answered 6 Months ago
54

How about this (reads last 8 bytes for demo):

$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
    $fs.ReadByte()
}

UPDATE. To interpret bytes as string (but be sure to select correct encoding - here UTF8 is used):

$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)
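
Since the question is about Python, the same idea (seek back N bytes from the end, read, decode) might look roughly like this there. This is only a sketch: read_last_bytes is a made-up name, the file is assumed to be opened in binary mode, and errors="replace" is used because the cut may land in the middle of a multi-byte character.

```python
import io
import os


def read_last_bytes(f, n, encoding="utf-8"):
    """Decode the last n bytes of binary file f (hypothetical helper)."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    # Never seek before the start when the file is shorter than n bytes.
    f.seek(max(size - n, 0))
    return f.read(n).decode(encoding, errors="replace")


# io.BytesIO stands in for a large file opened with open(path, "rb").
sample = io.BytesIO(b"0123456789abcdef")
last_eight = read_last_bytes(sample, 8)
```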

UPDATE 2. To read last M lines, we'll be reading the file by portions until there are more than M newline char sequences in the result:

$M = 3
$fpath = "C:\10GBfile.dat"

$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size

$fs = [IO.File]::OpenRead($fpath)
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
    $fs.Seek(-($result.Length + $buffer_size), [System.IO.SeekOrigin]::End) | Out-Null
    $fs.Read($buffer, 0, $buffer_size) | Out-Null
    $result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()

($result -split $seq) | Select -Last $M

Try a bigger $buffer_size: ideally it equals the expected average line length, so that fewer disk operations are needed. Also pay attention to $seq, which could be \r\n or just \n depending on the file's line endings. This is very dirty code without any error handling or optimizations.

Monday, August 2, 2021
 
muaaz
answered 4 Months ago
76

I would use argparse to create an option parser that accepts a file path and opens it.

import argparse

def main():
    parser = argparse.ArgumentParser()
    # type must be a callable; open() turns the given path into a file object
    parser.add_argument('infile', type=open)
    args = parser.parse_args()

    for line in args.infile:
        print(line, end='')  # each line already ends with a newline

if __name__ == '__main__':
    main()

If type=open does not provide enough control, it can be replaced with argparse.FileType('r') which accepts bufsize and mode args (see http://docs.python.org/dev/library/argparse.html#type)

EDIT: My mistake. This will not support your use case. This will allow you to provide a filepath, but not pipe the file contents into the process. I'll leave this answer here as it might be useful as an alternative.
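
One way to cover both cases (a file path or piped input) is to make the positional argument optional and fall back to sys.stdin. This is a sketch rather than a tested solution for the original use case; build_parser and main are just illustrative names, and note that newer Python versions mark argparse.FileType as deprecated.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser()
    # nargs='?' makes the path optional; without it, stdin is used instead.
    parser.add_argument('infile', nargs='?',
                        type=argparse.FileType('r'), default=sys.stdin)
    return parser


def main(argv=None):
    """Echo every line of the chosen input and return the line count."""
    args = build_parser().parse_args(argv)
    count = 0
    for line in args.infile:
        sys.stdout.write(line)
        count += 1
    return count
```

With this, both `python script.py some.log` and `cat some.log | python script.py` would end up iterating over the same lines (script.py and some.log are placeholder names).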

Monday, August 16, 2021
 
NIKHIL
answered 4 Months ago
76

You can use Tie::File to handle the file as an array.

use Tie::File;
tie (@File, 'Tie::File', $Filename);
splice (@File, -125000, 125000);
untie @File;

An alternative is to use head and wc -l in the shell.

edit: grepsedawk reminds us of the -n option to head, no wc necessary:

head -n -125000 FILE > NEWFILE
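
If you would rather stay in Python, the effect of head -n -N (everything except the last N lines) can be sketched as a streaming generator that holds back N lines at a time. drop_last_lines is a made-up name, and this is an illustration rather than a drop-in replacement for the Perl code above.

```python
from collections import deque


def drop_last_lines(lines, n):
    """Yield every item of `lines` except the final n (hypothetical helper)."""
    held_back = deque()
    for line in lines:
        held_back.append(line)
        # Once more than n lines are buffered, the oldest one cannot be
        # among the last n, so it is safe to emit.
        if len(held_back) > n:
            yield held_back.popleft()


kept = list(drop_last_lines(["a\n", "b\n", "c\n", "d\n"], 2))
```

Passing an open file object as `lines` and writing the result to a new file would mirror the FILE > NEWFILE redirection; only n lines are kept in memory at any time.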
Sunday, October 3, 2021
 
Greg Malcolm
answered 2 Months ago