Extract an occurrence of text between brackets from a text file in Python


Log file:

INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] "GET /socket.io/?polling HTTP/1.1" 200 -
INFO:engineio: Received packet MESSAGE, ["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]

I'm interested in extracting only the text within the brackets that contain the keyword "key", not every occurrence that matches the regex pattern below.

Here is what I have tried so far:

import re
with open('logfile.log', 'r') as text_file:
    matches = re.findall(r'\[([^\]]+)', text_file.read())
    with open('output.txt', 'w') as out:
        out.write('\n'.join(matches))

This writes every match of the regex to output.txt. The desired content of output.txt is:

"key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}


Best answer, by Wiktor Stribiżew:

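Text inside square brackets that must not contain [ or ] itself, but must contain some other text, can be matched with the negated character class [^][]. This class matches any character except ] and [, because a ] placed right after ^ is treated as a literal.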

The pattern \[[^][]*] matches a whole bracketed span. In Python, an unescaped ] outside a character class is a literal, so the final ] matches the closing bracket. To require some text inside the span, put that text after [^][]* and append another [^][]* before the closing ].
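The sketch below compares the two patterns on the question's two sample log lines. The variable names are illustrative and not from the original answer. The plain bracket pattern captures every bracketed span, including the timestamp. Inserting "key" between the two [^][]* parts keeps only the span that contains it.

```python
import re

# The two sample lines from the question's log file (illustrative variables).
line = ('INFO:engineio: Received packet MESSAGE, '
        '["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]')
log = ('INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] '
       '"GET /socket.io/?polling HTTP/1.1" 200 -\n' + line)

# [^][] matches any character except [ and ], so each match stays
# inside a single pair of brackets (no nesting allowed).
any_bracketed = r'\[([^][]*)]'

# Same idea, but "key" must appear somewhere between the brackets.
key_bracketed = r'\[([^][]*"key"[^][]*)]'

print(re.findall(any_bracketed, log))  # timestamp and the "key" payload
print(re.findall(key_bracketed, log))  # only the "key" payload
```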

You may use

re.findall(r'\[([^][]*"key"[^][]*)]', text_file.read()) 

See the Python demo:

import re
s = '''INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] "GET /socket.io/?polling HTTP/1.1" 200 - 
INFO:engineio: Received packet MESSAGE, ["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]'''
print(re.findall(r'\[([^][]*"key"[^][]*)]', s)) 

Output:

['"key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}']
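To produce the output.txt the question asks for, the answer's regex can be placed into the question's original file-handling code. This is a sketch only. The helper names extract_bracketed_with and extract_file, and the keyword parameter, are hypothetical and do not appear in the original. re.escape is used so that the keyword is matched as literal text.

```python
import os
import re
import tempfile


def extract_bracketed_with(text, keyword):
    """Hypothetical helper: contents of each [...] span (no nested brackets) containing keyword."""
    # re.escape makes the keyword literal inside the pattern.
    pattern = r'\[([^][]*' + re.escape(keyword) + r'[^][]*)]'
    return re.findall(pattern, text)


def extract_file(in_path, out_path, keyword='"key"'):
    """Hypothetical helper: write the matching bracket contents to out_path, one per line."""
    with open(in_path, 'r') as text_file:
        matches = extract_bracketed_with(text_file.read(), keyword)
    with open(out_path, 'w') as out:
        out.write('\n'.join(matches))
    return matches


# Demo: recreate the question's sample log in a temporary directory.
sample_log = (
    'INFO:werkzeug:127.0.0.1 - - [20/Sep/2018 19:40:00] '
    '"GET /socket.io/?polling HTTP/1.1" 200 -\n'
    'INFO:engineio: Received packet MESSAGE, '
    '["key",{"data":{"tag1":12,"tag2":13,"tag3": 14"...}}]\n'
)
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'logfile.log')
    out_path = os.path.join(tmp, 'output.txt')
    with open(log_path, 'w') as f:
        f.write(sample_log)
    extract_file(log_path, out_path)
    with open(out_path) as f:
        print(f.read())
```

The keyword parameter is an illustrative generalisation. Passing '"key"' reproduces the answer's pattern exactly.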