Converting a log.txt file to JSON using Python

I am learning Python and have very limited programming knowledge. As a learning project, I am trying to convert a .txt system log to JSON.

I want the Python program to parse the .txt file, making each event an object and splitting the entries for that event into key:value pairs, so that later on I can query and summarise the alerts in the log. Eventually I want the program to accept user input to query the JSON (but that is for another day).

My current script looks like this:

import re
import json
import os

def parse_log_file(input_file):
    events = []

    with open(input_file, 'r') as file:
        log_content = file.read()

    # extract individual events
    event_pattern = re.compile(r'Event \d+\s+(.*?)\s+(?=(?:Event \d+|$))', re.DOTALL)
    matches = event_pattern.findall(log_content)

    for match in matches:
        event_dict = {}
        lines = match.split('\n')

        for line in lines:
            if line.strip():
                key, value = map(str.strip, line.split(':', 1))
                event_dict[key] = value

        events.append(event_dict)

    # Write the JSON output with the same name as the input file
    output_file = os.path.splitext(input_file)[0] + ".json"
    with open(output_file, 'w') as json_file:
        json.dump(events, json_file, indent=4)

    print(f"JSON file saved as,{output_file}")


if __name__ == "__main__":
    input_file = "log.txt"
    parse_log_file(input_file)

Desired output:

Event 1
{
"LogName" : "System",
"MachineName" : "LAPTOP",
"ProviderName" : "Intel",
"LevelDisplayName" : "Information",
"Message" : "Check the remaining resource budget. Module exceeds resource budget, failed to AllocateFwCps,
STATUS = Insufficient system resources exist to complete the API.."
}

Event 2
{
"LogName" : "System",
"MachineName" : "LAPTOP",
"ProviderName" : "Microsoft-Windows-Kernel-Power",
"LevelDisplayName" : "Information",
"Message" : "The system session has transitioned from 186 to 188. Reason InputPoUserPresent
BootId: 67"
}

However, my output currently looks like this:

LogName : "System
MachineName : LAPTOP
ProviderName : Microsoft-Windows-Kernel-Power
LevelDisplayName : Information
Message : The system session has transitioned from 186 to 188. Reason InputPoUserPresent
BootId: 67"

Where am I going wrong? Ideally, I would like each element of the alert (LogName, MachineName, etc.) to be a key and the corresponding information to be its value.

There is 1 answer

jwP54 On

Use print statements at different stages to troubleshoot this. First, print 'matches' right after this line:

matches = event_pattern.findall(log_content)

If that is what you want, move on. Next, print 'lines' and make sure it is also what you expect:

    for match in matches:
        event_dict = {}
        lines = match.split('\n')
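As a minimal sketch of that approach, the snippet below runs the question's regular expression on a small made-up sample and prints each intermediate value with repr(), so hidden whitespace and line breaks become visible. The sample_log text is only an assumption about what log.txt looks like; the real file may be laid out differently.

```python
import re

# Hypothetical sample; the real log.txt layout may differ.
sample_log = (
    "Event 1\n"
    "LogName : System\n"
    "MachineName : LAPTOP\n"
    "Message : The system session has transitioned from 186 to 188.\n"
    "BootId: 67\n"
    "\n"
    "Event 2\n"
    "LogName : System\n"
    "MachineName : LAPTOP\n"
)

# Same pattern as in the question.
event_pattern = re.compile(r'Event \d+\s+(.*?)\s+(?=(?:Event \d+|$))', re.DOTALL)

# Stage 1: check how many events the pattern actually finds.
matches = event_pattern.findall(sample_log)
print(f"{len(matches)} event(s) matched")

for match in matches:
    # Stage 2: inspect the raw text captured for each event.
    print("match:", repr(match))
    lines = match.split('\n')
    for line in lines:
        # Stage 3: show how each line would be split before it is stored.
        print("  line:", repr(line), "->", line.split(':', 1))
```

On this sample, the continuation line "BootId: 67" is split into its own key instead of staying part of Message, which is the kind of discrepancy the prints are meant to expose.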

Use that methodology and you'll get it.
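If the prints show that multi-line messages are the problem, one possible way to handle them is sketched below. It is not the asker's method, just an illustration under stated assumptions: each event starts with a line like "Event 1", fields use the names shown in the desired output, and any other line (such as "BootId: 67") is treated as a continuation of the previous field. The names KNOWN_KEYS, EVENT_HEADER and parse_events are hypothetical helpers introduced for this example.

```python
import json
import re

# Field names taken from the desired output in the question (assumed complete).
KNOWN_KEYS = {"LogName", "MachineName", "ProviderName", "LevelDisplayName", "Message"}
# Assumed header format: a line consisting only of "Event <number>".
EVENT_HEADER = re.compile(r"Event \d+")


def parse_events(text):
    """Return a dict mapping each 'Event N' header to its key/value pairs."""
    events = {}
    current = None   # dict for the event currently being filled
    last_key = None  # most recent field, used for continuation lines

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if EVENT_HEADER.fullmatch(line):
            current = {}
            events[line] = current
            last_key = None
            continue
        if current is None:
            continue  # ignore anything before the first event header
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in KNOWN_KEYS:
            # A recognised field name starts a new key/value pair.
            current[key] = value.strip()
            last_key = key
        elif last_key is not None:
            # Anything else is appended to the previous field's value.
            current[last_key] += "\n" + line
    return events


# Hypothetical sample built from the output shown in the question.
sample_log = """Event 1
LogName : System
MachineName : LAPTOP
ProviderName : Microsoft-Windows-Kernel-Power
LevelDisplayName : Information
Message : The system session has transitioned from 186 to 188. Reason InputPoUserPresent
BootId: 67
"""

print(json.dumps(parse_events(sample_log), indent=4))
```

Keying on a fixed set of field names is a design choice that keeps lines containing a colon, like "BootId: 67", inside Message. If the real log has other fields, they would need to be added to KNOWN_KEYS.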