Python: look for text block, put in dictionary array


I am trying to find text blocks and store some of their lines in an array of dictionaries, with one dictionary for every block I find. For example, given the following text:

some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text

For example, I would like to store the host IP and the zone for each block, so that I end up with [[host:ip1,zone:zone1],[host:ip2,zone:zone2]].

I have tried looping through the text file, but I cannot get the loop over each block right. I think I need some form of iteration, but I am not sure. I end up with a single array containing all the items from the first address-object line up to the "some" keyword. I need a loop that handles each address-object and moves on to the next one when it encounters an empty line.

There are 3 answers

0
inspectorG4dget On
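# keys to collect from each block
importantKeys = {'host', 'zone'}

with open('path/to/file') as infile:
    answer = [{}]  # start with one empty dict for the first block
    for line in infile:
        # split "key value" on the first space after trimming indentation
        k,_,v = line.strip().partition(' ')
        if k in importantKeys:
            answer[-1][k] = v
        # once every key has been seen, start a dict for the next block
        if len(answer[-1]) == len(importantKeys):
            answer.append({})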

And the result:

In [28]: answer
Out[28]: [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}, {}]

In [29]: answer = [d for d in answer if d]

In [30]: answer
Out[30]: [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
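
Note that the loop above starts a new dictionary only once both keys have been seen, so a block that lacks one of them would be merged with the block after it. A minimal sketch that follows the block boundaries in the sample instead (an address-object line opens a block, an exit line or a blank line closes it). The function name parse_blocks and the shortened sample text are assumptions made for illustration, not part of the question:

```python
def parse_blocks(lines, wanted=("host", "zone")):
    """Collect the wanted keys from each address-object block."""
    blocks = []
    current = None  # None means we are currently outside a block
    for raw in lines:
        line = raw.strip()
        if line.startswith("address-object"):
            # a new block starts; keep an unterminated previous one
            if current is not None:
                blocks.append(current)
            current = {}
        elif current is not None:
            if line == "exit" or not line:
                # end of the block: store it and wait for the next one
                blocks.append(current)
                current = None
            else:
                key, _, value = line.partition(" ")
                if key in wanted:
                    current[key] = value
    if current is not None:
        blocks.append(current)
    return blocks


# shortened, hypothetical sample; the second block has no host line
sample = """\
some
text

address-object object1
    name "name1"
    zone zone1
    host ip1
    exit

address-object object2
    zone zone2
    exit

more text
"""

print(parse_blocks(sample.splitlines()))
# [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2'}]
```

Because the function only iterates over lines, an open file object can be passed in place of sample.splitlines().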
0
Andrej Kesely On

One possible solution is to use the re module:

import re

text = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text
"""


pat = r"\s+(zone|host)\s+(.+)"

out = re.findall(pat, text)
out = [dict(t) for t in zip(out[::2], out[1::2])]
print(out)

Prints:

[{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
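
Pairing the matches with zip relies on every block containing exactly one zone line followed by one host line. If that cannot be guaranteed, one option is to split the text into blank-line-separated paragraphs first and run the key regex on each paragraph separately. This is only a sketch under that assumption; the helper name blocks_to_dicts and the shortened sample are hypothetical:

```python
import re


def blocks_to_dicts(text, keys=("zone", "host")):
    """Return one dict per address-object paragraph found in text."""
    # indented "key value" lines; [ \t] keeps a match on a single line
    key_pat = re.compile(
        r"^[ \t]+(%s)[ \t]+(.+)$" % "|".join(map(re.escape, keys)),
        re.M,
    )
    result = []
    # paragraphs are separated by one or more blank lines
    for para in re.split(r"\n\s*\n", text):
        if para.lstrip().startswith("address-object"):
            result.append(dict(key_pat.findall(para)))
    return result


# shortened, hypothetical sample; the second block has no host line
text = (
    "some\n\n"
    "address-object object1\n    zone zone1\n    host ip1\n    exit\n\n"
    "address-object object2\n    zone zone2\n    exit\n\n"
    "more text\n"
)
print(blocks_to_dicts(text))
# [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2'}]
```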
2
dawg On

First get a very specific regex that describes those data blocks. HERE is an example.

Then once you have those specific blocks, you can use a very simple regex to get the data items of interest.

Working model:

import re 

inp='''\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text'''
print(
    [dict(re.findall(r'(?m)^\s+(host|zone)\s+(\S+)', block.group(1)))
        for block in re.finditer(r'(?m)^\s*$\n^(address-object\b[\s\S]+?^\s+exit\b)', inp)]
)

Prints:

[{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]

Or, with a little tweak, you can get all the data in one go:

pat = r'(?m)^\s*$\n^address-object\b.*\r?\n([\s\S]+?)\s+^\s+exit\b'
for b in re.finditer(pat, inp):
    # split each "key value" line of the block body on its first space
    print(
        {k: v for k, _, v in
            (e.strip().partition(' ')
                for e in b.group(1).splitlines())})

Prints:

{'name': '"name1"', 'uuid': '4ac9cf52-02b5-eecf-0100-18c24100da5e', 'zone': 'zone1', 'host': 'ip1'}
{'name': '"name2"', 'uuid': 'a5c02150-a47e-748d-0100-18c24100da5e', 'zone': 'zone2', 'host': 'ip2'}
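
If the full dictionaries are collected first, they can be trimmed afterwards to just the fields the question asks for. A small sketch, where the names records, wanted, and trimmed are illustrative and the data is copied from the output above:

```python
# full per-block dictionaries, as printed above
records = [
    {"name": '"name1"', "uuid": "4ac9cf52-02b5-eecf-0100-18c24100da5e",
     "zone": "zone1", "host": "ip1"},
    {"name": '"name2"', "uuid": "a5c02150-a47e-748d-0100-18c24100da5e",
     "zone": "zone2", "host": "ip2"},
]

wanted = ("host", "zone")
# keep only the wanted keys, skipping any that a block does not have
trimmed = [{k: rec[k] for k in wanted if k in rec} for rec in records]
print(trimmed)
# [{'host': 'ip1', 'zone': 'zone1'}, {'host': 'ip2', 'zone': 'zone2'}]
```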