Python unable to process heavy files

I have a gzipped (.gz) log file, logfile.20221227.gz, and I am writing a Python script to process it. A test run on a file with 100 lines worked fine, but when I ran the same script on the actual log file, which is almost 5 GB, the script broke. I was able to process log files up to 2 GB. Unfortunately, the only log files larger than that are 5 GB+ or 7 GB+, and the script fails on both of them. My code is below.

import gzip

count = 0
toomany = 0
max_hits = 5000
# Counters and the result list must exist before the loop uses them
total_fatal = total_error = total_warn = 0
log_lines = []
logfile = '/foo/bar/logfile.20221228.gz'
with gzip.open(logfile, 'rt', encoding='utf-8') as page:
    for line in page:
        count += 1
        print("\nFor loop count is: ", count)
        # The log level is expected in the fourth space-separated field
        string = line.split(' ', 5)
        if len(string) < 5:
            continue
        level = string[3]
        shortline = line[0:499]
        if level == 'FATAL':
            log_lines.append(shortline)
            total_fatal += 1
        elif level == 'ERROR':
            log_lines.append(shortline)
            total_error += 1
        elif level == 'WARN':
            log_lines.append(shortline)
            total_warn += 1
        if not toomany and (total_fatal + total_error + total_warn) > max_hits:
            toomany = 1
# send_report is defined elsewhere in the script (not shown)
if len(log_lines) > 0:
    send_report(total_fatal, total_error, total_warn, toomany, log_lines, max_hits)

Output:

For loop count is:  1
.
.
For loop count is:  192227123    
Killed

What does Killed mean here? That one keyword does not give much to investigate. Also, is there a limit on file size, and is there a way to bypass it?

Thank you.

There is 1 answer

Answered by tturbo

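Judging from the updated code above, this may be a memory problem: log_lines grows too big. "Killed" usually means the operating system terminated the process with SIGKILL, and on Linux that is typically the out-of-memory killer.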
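To get a feel for how quickly the list can grow, here is a rough estimate, not a measurement. The function name estimate_list_memory and the figure of 10 million matching lines are made up for illustration; the real number of matching lines in the 5 GB file is unknown. It assumes CPython and mostly ASCII log lines.

```python
import struct
import sys

def estimate_list_memory(n_lines, line_len=499):
    """Rough lower bound on the bytes needed to keep n_lines strings in a list."""
    # Size of one ASCII str of line_len characters as reported by CPython.
    per_string = sys.getsizeof("x" * line_len)
    # Each list slot stores one pointer to its string.
    per_slot = struct.calcsize("P")
    return n_lines * (per_string + per_slot)

# Hypothetical count of lines at WARN or above, purely for illustration.
estimate = estimate_list_memory(10_000_000)
print(f"about {estimate / 1024**3:.1f} GiB")
```

If the result is close to or above the RAM available on the machine, the out-of-memory explanation becomes more plausible.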

Try writing shortline to a temporary file instead of calling log_lines.append, then send the file (or its contents) by email at the end.
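A minimal sketch of that idea, assuming the same log layout as in the question (level in the fourth space-separated field). The function name collect_to_tempfile and the LEVELS tuple are my own, not part of the original script, and send_report would need to be adapted to accept a file path instead of a list.

```python
import gzip
import tempfile

LEVELS = ("FATAL", "ERROR", "WARN")

def collect_to_tempfile(logfile, max_len=499):
    """Stream logfile and write matching lines to a temp file, not a list.

    Returns (path_of_temp_file, counts_per_level).
    """
    counts = dict.fromkeys(LEVELS, 0)
    # delete=False keeps the file on disk after the with block ends,
    # so the caller can send it and must remove it afterwards.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".log", delete=False
    ) as out, gzip.open(logfile, "rt", encoding="utf-8") as page:
        path = out.name
        for line in page:
            fields = line.split(" ", 5)
            if len(fields) < 5:
                continue
            level = fields[3]
            if level in counts:
                counts[level] += 1
                # Truncating can cut off the newline, so add it back.
                out.write(line[:max_len].rstrip("\n") + "\n")
    return path, counts
```

Only one line is held in memory at a time, so memory use should stay roughly flat no matter how large the log is.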

But first check how big the file is, because it may be too big to send by email. If so, you can try compressing it. You may also want to write the temp file as gzip directly:

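import gzip

# Open the output in text mode ('wt') because shortline is a str, not bytes
with gzip.open('./log_lines.txt.gz', 'wt', encoding='utf-8') as log_lines:
    with gzip.open(logfile, 'rt', encoding='utf-8') as page:
        # ...
        # line[0:499] can cut off the trailing newline, so add it back
        log_lines.write(shortline.rstrip('\n') + '\n')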
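For the size check mentioned above, something like the following could work. The helper name fits_in_email and the 10 MiB limit are assumptions for illustration; real mail servers have different limits, so use the one that applies to yours.

```python
import os

# Assumed limit; real mail servers differ, so check yours.
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024

def fits_in_email(path, limit=MAX_ATTACHMENT_BYTES):
    """Return True if the file at path is no larger than limit bytes."""
    return os.path.getsize(path) <= limit
```

You could call fits_in_email('./log_lines.txt.gz') once the with block has closed the file, and fall back to sending only the counts if it returns False.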