Pythonic way to get value defined or not defined in yaml


I have a YAML file with test configurations, and there is an optional parameter "ignore-dup-txn" in the optional section "test-options".

test-name:
    test-type:        trh_txn
    test-src-format:  excel
    test-src-excel-sheet:   invalid_txns
    test-options:
      ignore-dup-txn: True

I read the "test-name" section into a dict named "test", and for now I check it this way:

if 'test-options' in test and 'ignore-dup-txn' in test['test-options']:
    ignore_dups = test['test-options']['ignore-dup-txn']
else:
    ignore_dups = None

What would be the Pythonic way to do it? Something clearer, simpler and shorter.

I was thinking of using a "getter", but if I do get(test['test-options']['ignore-dup-txn']), I will obviously get an exception if the option is not defined.


There are 3 answers

hspandher On BEST ANSWER

This would work:

test.get('test-options', {}).get('ignore-dup-txn', None)
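One caveat worth checking, sketched here with hypothetical stand-ins for the loaded "test-name" section: the {} default only applies when the key is absent. If the YAML file contains "test-options:" with nothing under it, the loader yields None for that key and the second .get() raises an AttributeError. Falling back with "or {}" appears to cover both cases (the helper name ignore_dups_of is made up for this illustration):

```python
# Hypothetical dicts standing in for the loaded "test-name" section.
test_with_option = {'test-type': 'trh_txn',
                    'test-options': {'ignore-dup-txn': True}}
test_without_section = {'test-type': 'trh_txn'}
# What a loader produces for a bare "test-options:" line with no children.
test_empty_section = {'test-type': 'trh_txn', 'test-options': None}


def ignore_dups_of(test):
    # "or {}" swaps in an empty dict for a missing section and for a None section
    return (test.get('test-options') or {}).get('ignore-dup-txn')


assert ignore_dups_of(test_with_option) is True
assert ignore_dups_of(test_without_section) is None
assert ignore_dups_of(test_empty_section) is None

# The plain chained form fails on the empty section:
try:
    test_empty_section.get('test-options', {}).get('ignore-dup-txn')
except AttributeError:
    pass  # 'NoneType' object has no attribute 'get'
else:
    raise AssertionError('expected AttributeError')
```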

Scott Hunter On

You can use the get method:

test['test-options'].get('ignore-dup-txn', default_value)

Anthon On

If you just want a "one-liner" and don't want an empty dict to be created you can do:

ignore_dups = test['test-options'].get('ignore-dup-txn') if 'test-options' in test else None

but this leads to long lines, doesn't extend well to another level, and is not very Pythonic.

For something that is, IMO, more Pythonic, first look at what happens when you have a dict and use a list as the key for assignment or as the first argument to .get() ¹:

d = dict()
l = ['a', 'b', 'c']
try:
    d[l] = 3
except TypeError as e:
    # str(e) rather than e.message, which no longer exists in Python 3
    assert "unhashable type: 'list'" in str(e)
else:
    raise NotImplementedError
try:
    d.get(l, None)
except TypeError as e:
    assert "unhashable type: 'list'" in str(e)
else:
    raise NotImplementedError

That means some_dict.get(['a', 'b', 'c'], default) will throw a TypeError. On the other hand, that would be a nice, concise syntax for getting a value from a dict within a dict within ... .
So the question becomes: how can I get such a .get() to work?

First you have to realise that you cannot just replace the .get() method on a dict instance; you'll get an AttributeError:

d = dict()
def alt_get(key, default):
    pass
try:
    d.get = alt_get
except AttributeError as e:
    assert "attribute 'get' is read-only" in str(e)
else:
    raise NotImplementedError

So you will have to subclass dict, which allows you to override the .get() method:

class ExtendedDict(dict):
    def multi_level_get(self, key, default=None):
        if not isinstance(key, list):
            # call dict.get directly: self.get is masked by this method below
            return dict.get(self, key, default)
        # assume that the key is a list of recursively accessible dicts
        # *** using [] and not .get() in the following on purpose ***
        def get_one_level(key_list, level, d):
            if level >= len(key_list):
                if level > len(key_list):
                    raise IndexError
                return d[key_list[level-1]]
            return get_one_level(key_list, level+1, d[key_list[level-1]])

        try:
            return get_one_level(key, 1, self)
        except KeyError:
            return default

    get = multi_level_get # delete this if you don't want to mask get()
                          # you can still use multi_level_get()

d = dict(a=dict(b=dict(c=42)))
assert d['a']['b']['c'] == 42

try:
    d['a']['xyz']['c'] == 42
except KeyError as e:
    assert e.args[0] == 'xyz'
else:
    raise NotImplementedError

ed = ExtendedDict(d)
assert ed['a']['b']['c'] == 42
assert ed.get(['a', 'b', 'c'], 196) == 42
assert ed.get(['a', 'xyz', 'c'], 196) == 196 # no exception!

This works fine when you only have dicts within dicts recursively, but also, to a limited extent, when you mix these with lists:

e = dict(a=[dict(c=42)])
assert e['a'][0]['c'] == 42
ee = ExtendedDict(e)
# the following works because get_one_level() uses [] and not get()
assert ee.get(['a', 0, 'c'], 196) == 42
try:
    ee.get(['a', 1, 'c'], 196) == 42
except IndexError as exc:
    assert str(exc) == 'list index out of range'
else:
    raise NotImplementedError
try:
    ee.get(['a', 'b', 'c'], 196) == 42
except TypeError as exc:
    assert 'list indices must be integers' in str(exc)
else:
    raise NotImplementedError

You can of course catch the latter two errors as well in multi_level_get() by using except (KeyError, TypeError, IndexError): and returning the default for all these cases.
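A minimal sketch of that catch-all variant, written as a standalone function instead of a dict subclass so it also works on plain dicts. The name nested_get and the sample data are hypothetical and not part of any library:

```python
def nested_get(data, keys, default=None):
    """Follow keys through nested dicts/lists; return default if any step fails."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, TypeError, IndexError):
            # KeyError: missing dict key
            # IndexError: list index out of range
            # TypeError: non-integer index on a list, or indexing a
            #            non-container such as None
            return default
    return current


config = {'a': [{'c': 42}], 'empty': None}
assert nested_get(config, ['a', 0, 'c'], 196) == 42
assert nested_get(config, ['a', 1, 'c'], 196) == 196      # IndexError swallowed
assert nested_get(config, ['a', 'b', 'c'], 196) == 196    # TypeError swallowed
assert nested_get(config, ['xyz', 'c'], 196) == 196       # KeyError swallowed
assert nested_get(config, ['empty', 'c'], 196) == 196     # None is not indexable
```

The trade-off is that swallowing TypeError can also hide a genuinely wrong key path, so keeping the narrower except KeyError may be preferable while debugging.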

In ruamel.yaml ² this multi-level-get is implemented as mlget() (which requires an extra parameter to allow lists to be part of the hierarchy):

from ruamel.yaml import YAML

yaml_str = """\
test-name:
    test-type:        trh_txn
    test-src-format:  excel
    test-src-excel-sheet:   invalid_txns
    test-options:
      ignore-dup-txn: True
"""

yaml = YAML()  # round-trip mode by default, so mappings load as CommentedMap
data = yaml.load(yaml_str)

assert data['test-name']['test-options']['ignore-dup-txn'] is True
assert data.mlget(['test-name', 'test-options', 'ignore-dup-txn'], 42) is True
assert data.mlget(['test-name', 'test-options', 'abc'], 42) == 42

print(data['test-name']['test-src-format'])

which prints:

excel

¹ In the examples I use assertions to confirm what is happening, rather than print statements followed by separate explanations of what gets printed. This keeps the explanation more concise and, in the case of assertions within try/except blocks, makes it clear that the exception was thrown, without breaking the code and preventing the following code from being executed. All of the example code comes from a Python file that runs and prints only one value.
² I am the author of that package, which is an enhanced version of PyYAML.