How to convert a string of a tuple into a list in python?


So the title sounds odd because perhaps my problem is odd... I have a .txt file with thousands of lines of machine output from a different program in the following format:

candidates(6,1,0,5,[ev(-1000,'C0009814','Stenosis','Acquired stenosis',[stenosis],[patf])])

Essentially we have 'candidates' marking the start of a tuple, and 'ev' marking the start of a second tuple inside of a single element list. When I read all of this into python from the file, it reads in as a string. But I need an object so I can access the nth index of the tuple. Truly, I would be happy just finding a way to consistently get the last value of the ev() tuple from this string, in this case 'patf'.

I had considered just splitting on ',' but this is not always successful because the inner list '[stenosis]' can sometimes have values like '[regurgitation, aortic]'. That extra ',' throws off the list index by 1, and therefore it returns 'aortic]' instead of '[patf]'.

Please let me know if I can clarify anything or if I took some piece of knowledge for granted that needs to be said before this can be solved. Many thanks. I also included a second example below that illustrates the problem of splitting on ','.

candidates(8,1,0,7,[ev(-875,'C0003501','Aortic Valve','Aortic valve structure',[aortic,valve],[bpoc])])

Edit: The object doesn't need to be a list, I guess. A tuple of the same format works well. Just as long as I can consistently reference one index for the info I need. Thanks!

Edit 2: I use python 2.7.6


There are 3 answers

Christian Aichinger On

If your data is always formatted the same way, the quickest way is to use regular expressions (module re), if you know how to.
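As a rough sketch of the regex route: the pattern below assumes every line ends with the last bracketed list followed by exactly `)])`, as in both sample lines from the question. The names `LAST_LIST` and `last_list` are made up for illustration, and the pattern will need adjusting if the real data deviates from that shape.

```python
import re

# Assumption: each line ends with "[...])])", i.e. the last bracketed list
# is followed by the close of ev(...), the outer list, and candidates(...).
LAST_LIST = re.compile(r"\[([^\[\]]*)\]\)\]\)\s*$")


def last_list(line):
    """Return the items of the final bracketed list, or None if no match."""
    match = LAST_LIST.search(line)
    if match is None:
        return None
    # Split the list body on commas and drop surrounding whitespace.
    return [item.strip() for item in match.group(1).split(",")]


line = ("candidates(6,1,0,5,[ev(-1000,'C0009814','Stenosis',"
        "'Acquired stenosis',[stenosis],[patf])])")
print(last_list(line))
```

Because the pattern is anchored at the end of the line, extra commas in the earlier list (such as `[regurgitation, aortic]`) should not affect it.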

Otherwise, and this is quite an unsightly hack, you can try to "parse" the data using eval. Here's an example:

eval_globals = {
    "candidates": lambda *args: args,
    "ev": lambda *args: args,
    "aortic": "aortic",
    "valve": "valve",
    "bpoc": "bpoc",
    # Add more of the keywords you need here
}
# `line` is one line read from the .txt file; every bare word in it must be a key above
result = eval(line, eval_globals)
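If listing every keyword by hand is impractical, one possible variation is to pass a dict subclass as the *locals* mapping so that any unknown bare word evaluates to its own name. This is only a sketch: `BareWords` and `parse_line` are hypothetical names, and `eval` remains unsafe on input you do not fully trust, even with builtins emptied out.

```python
class BareWords(dict):
    """Mapping in which any missing name resolves to the name itself."""

    def __missing__(self, name):
        return name


def parse_line(line):
    # candidates(...) and ev(...) simply return their arguments as tuples.
    names = BareWords(candidates=lambda *args: args, ev=lambda *args: args)
    # Empty builtins so bare words are looked up only in `names`.
    return eval(line, {"__builtins__": {}}, names)


line = ("candidates(8,1,0,7,[ev(-875,'C0003501','Aortic Valve',"
        "'Aortic valve structure',[aortic,valve],[bpoc])])")
result = parse_line(line)
ev_tuple = result[4][0]   # the single ev(...) tuple inside the outer list
print(ev_tuple[-1])       # the last list of the ev tuple
```

The trick relies on top-level names in an `eval` expression being looked up in the locals mapping first, which is why the subclass is passed as the third argument rather than as the globals.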
ssm On

Just split on [. So you can do s.split('[')[-1].split(']')[0] where s is a line from the file ...
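Here is that expression wrapped in a small function and run on the first sample line. The function name `last_bracketed` is just for illustration. Note that it returns the raw text between the last `[` and the next `]`, so if the final list ever held several values you would get one comma-separated string back and would need a further split.

```python
def last_bracketed(s):
    # Take everything after the last '[' and cut it off at the first ']'.
    return s.split('[')[-1].split(']')[0]


line = ("candidates(6,1,0,5,[ev(-1000,'C0009814','Stenosis',"
        "'Acquired stenosis',[stenosis],[patf])])")
print(last_bracketed(line))
```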

roippi On

You have a nested grammar that you're trying to parse. It is narrowly scoped, so a regex could be constructed to deal with it, but it's going to be fragile. Like, really fragile.

Try using ast. This gets a little complex so I'll try to walk (haha) through an example. If you want the tl;dr, skip to the middle/end.

We're looking for a name in a list node, so we can start there.

import ast

s = "candidates(6,1,0,5,[ev(-1000,'C0009814','Stenosis','Acquired stenosis',[stenosis],[patf])])"

mod = ast.parse(s)

for node in ast.walk(mod):
    if isinstance(node, ast.List):
        print(node, list(ast.iter_child_nodes(node)))

<_ast.List object at 0xb3f2ddec> [<_ast.Call object at 0xb3f2de0c>, <_ast.Load object at 0xb712756c>]
<_ast.List object at 0xb3f2deec> [<_ast.Name object at 0xb3f2df0c>, <_ast.Load object at 0xb712756c>]
<_ast.List object at 0xb3f2df2c> [<_ast.Name object at 0xb3f2df4c>, <_ast.Load object at 0xb712756c>]

We see that there are three ast.List nodes in our syntax tree. The first one is the outer list that contains the call to ev, and the other two are the inner lists that contain those bare ast.Name nodes. That's what we want to get at: you specifically want the second of the two inner lists.
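To make that concrete, here is a minimal sketch that picks the last ast.List node that ast.walk yields and reads the names out of it. It assumes the list you want is always the last one in walk order, which holds for the two sample lines; `names_in_last_list` is a made-up helper name.

```python
import ast


def names_in_last_list(line):
    tree = ast.parse(line, mode="eval")
    # ast.walk goes breadth-first: outer list first, then the inner lists.
    lists = [node for node in ast.walk(tree) if isinstance(node, ast.List)]
    if not lists:
        return None
    # Each bare word in the list is an ast.Name; its text is in .id
    return [elt.id for elt in lists[-1].elts if isinstance(elt, ast.Name)]


s = ("candidates(6,1,0,5,[ev(-1000,'C0009814','Stenosis',"
     "'Acquired stenosis',[stenosis],[patf])])")
print(names_in_last_list(s))
```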


tl;dr skips here

We can make this all a lot more straightforward; I was just walking through how I personally explored this syntax tree. Here's a one-ish-liner:

s = "candidates(6,1,0,5,[ev(-1000,'C0009814','Stenosis','Acquired stenosis',[stenosis],[patf])])"

mod = ast.parse(s)

[next(ast.iter_fields(node)) for node in ast.walk(mod) if isinstance(node, ast.Name)]
Out[62]: [('id', 'candidates'), ('id', 'ev'), ('id', 'stenosis'), ('id', 'patf')]

So just grab the second item (index 1) of the last element of that list, and there's your string. This approach works for your other example too:

s = "candidates(8,1,0,7,[ev(-875,'C0003501','Aortic Valve','Aortic valve structure',[aortic,valve],[bpoc])])"

mod = ast.parse(s)

[next(ast.iter_fields(node)) for node in ast.walk(mod) if isinstance(node, ast.Name)]
Out[65]: 
[('id', 'candidates'),
 ('id', 'ev'),
 ('id', 'aortic'),
 ('id', 'valve'),
 ('id', 'bpoc')]

You can use this approach to grab really any element you want out of that syntax tree. Just explore the output of ast.walk with ast.iter_fields and ast.iter_child_nodes.
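As one example of where that exploration can lead, the sketch below finds the ev(...) call node and turns its arguments into ordinary Python values, so any field can be reached by index. It assumes each line has a single ev call whose arguments are literals or lists of bare names, as in the samples; `ev_fields` is a hypothetical helper, not something ast provides.

```python
import ast


def ev_fields(line):
    """Return the arguments of the first ev(...) call as a Python list."""
    tree = ast.parse(line, mode="eval")
    for node in ast.walk(tree):
        is_ev_call = (isinstance(node, ast.Call)
                      and isinstance(node.func, ast.Name)
                      and node.func.id == "ev")
        if not is_ev_call:
            continue
        fields = []
        for arg in node.args:
            if isinstance(arg, ast.List):
                # Lists of bare words: collect the names as strings.
                fields.append([elt.id for elt in arg.elts
                               if isinstance(elt, ast.Name)])
            else:
                # Numbers and quoted strings are plain literals.
                fields.append(ast.literal_eval(arg))
        return fields
    return None


s = ("candidates(8,1,0,7,[ev(-875,'C0003501','Aortic Valve',"
     "'Aortic valve structure',[aortic,valve],[bpoc])])")
fields = ev_fields(s)
print(fields[-1])   # last field of the ev tuple
print(fields[4])    # the list that sometimes contains extra commas
```

Nothing from the input line is executed here, since the code is only parsed and inspected, which is the main advantage over eval.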