Python - pprint large dict nicely?


I'm trying to pretty-print a dictionary that can get quite big, and I want it to be as readable as possible. The output still does not look the way I would write it by hand (with new lines and indentation where needed).

I want a format like this (2-space indentation):

{
  'a': {
         '1': [],
         '2': []
       },
  'b': {
         '1': [],
         '2': [],
       }
}

The original dict looks like this (without using pprint):

{'test': {'0.2.0': {'deploy': {'some.host.com': {'outputs': [], 'inputs': []}},
   'release': {'some.git': {'outputs': [], 'inputs': []}}},
  '0.1.0': {'deploy': {'some.host.com': {'outputs': [], 'inputs': []}},
   'release': {'some.git': {'outputs': [], 'inputs': []}}}},
 'stage': {'0.1.0': {'deploy': {'stage.com': {'outputs': [], 'inputs': []}},
   'release': {'stage.git': {'outputs': [], 'inputs': []}}}}}

With pprint.pprint(my_dict), it looks like this:

{'stage': {'0.1.0': {'deploy': {'stage.com': {'inputs': [], 'outputs': []}},
                     'release': {'stage.git': {'inputs': [], 'outputs': []}}}},
 'test': {'0.1.0': {'deploy': {'some.host.com': {'inputs': [], 'outputs': []}},
                    'release': {'some.git': {'inputs': [], 'outputs': []}}},
          '0.2.0': {'deploy': {'some.host.com': {'inputs': [], 'outputs': []}},
                    'release': {'some.git': {'inputs': [], 'outputs': []}}}}}

Well, not that different. I tried playing with pprint.pprint options such as indent, width, and compact, but none of them formats the output the way I want. Is it possible to get the formatting shown above with pprint? Or is there a better tool for that?

P.S. If you suggest another tool, it would be great if it could also write to a file, because I'm using pprint to write directly to a file.


There are 4 answers

Priyank Chheda On BEST ANSWER

You can do that with the json module:

import json
_d = {'a': {'1': [],'2': []},'b': {'1': [],'2': [],}}
print(json.dumps(_d, indent=2))
Albert On

I also needed this and was not satisfied with the standard pprint. Specifically, I wanted normal indentation (by 2 or 4 spaces) rather than the alignment-based indentation that pprint uses.

For example, for some dict, I got this output with the standard pprint:

  {'melgan': {'class': 'subnetwork',
              'from': 'data',
              'subnetwork': {'l0': {'axes': 'spatial',
                                    'class': 'pad',
                                    'from': 'data',
                                    'mode': 'reflect',
                                    'padding': (3, 3)},
                             'la1': {'activation': None,
                                     'class': 'conv',
                                     'dilation_rate': (1,),
                                     'filter_size': (7,),
                                     'from': 'l0',
                                     'n_out': 384,
                                     'padding': 'valid',
                                     'strides': (1,),
                                     'with_bias': True},
                             'lay2': {'class': 'eval',
                                      'eval': 'tf.nn.leaky_relu(source(0), '
                                              'alpha=0.2)',
                                      'from': 'la1'},
                             'layer3_xxx': {'activation': None,
                                            'class': 'transposed_conv',
                                            'filter_size': (10,),
                                            'from': 'lay2',
                                            'n_out': 192,
                                            'output_padding': (1,),
                                            'padding': 'valid',
                                            'remove_padding': (3,),
                                            'strides': (5,),
                                            'with_bias': True},
                             'output': {'class': 'copy', 'from': 'layer3_xxx'}}},
   'output': {'class': 'copy', 'from': 'melgan'}}

But I wanted it to be like this:

  {
    'melgan': {
      'class': 'subnetwork',
      'from': 'data',
      'subnetwork': {
        'l0': {'class': 'pad', 'mode': 'reflect', 'axes': 'spatial', 'padding': (3, 3), 'from': 'data'},
        'la1': {
          'class': 'conv',
          'from': 'l0',
          'activation': None,
          'with_bias': True,
          'n_out': 384,
          'filter_size': (7,),
          'padding': 'valid',
          'strides': (1,),
          'dilation_rate': (1,)
        },
        'lay2': {'class': 'eval', 'eval': 'tf.nn.leaky_relu(source(0), alpha=0.2)', 'from': 'la1'},
        'layer3_xxx': {
          'class': 'transposed_conv',
          'from': 'lay2',
          'activation': None,
          'with_bias': True,
          'n_out': 192,
          'filter_size': (10,),
          'strides': (5,),
          'padding': 'valid',
          'output_padding': (1,),
          'remove_padding': (3,)
        },
        'output': {'class': 'copy', 'from': 'layer3_xxx'}
      }
    },
    'output': {'class': 'copy', 'from': 'melgan'}
  }

There is the Python Rich library, which provides its own pprint variant (from rich.pretty import pprint) that is close to what I want.

There is also pprintpp, which comes close as well.

I implemented my own very simple variant here. Code:

from typing import Any
import sys
import numpy


def pprint(o: Any, *, file=sys.stdout,
           prefix="", postfix="",
           line_prefix="", line_postfix="\n") -> None:
  if "\n" in line_postfix and _type_simplicity_score(o) <= _type_simplicity_limit:
    prefix = f"{line_prefix}{prefix}"
    line_prefix = ""
    postfix = postfix + line_postfix
    line_postfix = ""

  def _sub_pprint(o: Any, prefix="", postfix="", inc_indent=True):
    multi_line = "\n" in line_postfix
    if not multi_line and postfix.endswith(","):
      postfix += " "
    pprint(
      o, file=file, prefix=prefix, postfix=postfix,
      line_prefix=(line_prefix + "  " * inc_indent) if multi_line else "",
      line_postfix=line_postfix)

  def _print(s: str, is_end: bool = False):
    nonlocal prefix  # no need for is_begin, just reset prefix
    file.write(line_prefix)
    file.write(prefix)
    file.write(s)
    if is_end:
      file.write(postfix)
    file.write(line_postfix)
    if "\n" in line_postfix:
      file.flush()
    prefix = ""

  def _print_list():
    for i, v in enumerate(o):
      _sub_pprint(v, postfix="," if i < len(o) - 1 else "")

  if isinstance(o, list):
    if len(o) == 0:
      _print("[]", is_end=True)
      return
    _print("[")
    _print_list()
    _print("]", is_end=True)
    return

  if isinstance(o, tuple):
    if len(o) == 0:
      _print("()", is_end=True)
      return
    if len(o) == 1:
      _sub_pprint(o[0], prefix=f"{prefix}(", postfix=f",){postfix}", inc_indent=False)
      return
    _print("(")
    _print_list()
    _print(")", is_end=True)
    return

  if isinstance(o, set):
    if len(o) == 0:
      _print("set()", is_end=True)
      return
    _print("{")
    _print_list()
    _print("}", is_end=True)
    return

  if isinstance(o, dict):
    if len(o) == 0:
      _print("{}", is_end=True)
      return
    _print("{")
    for i, (k, v) in enumerate(o.items()):
      _sub_pprint(v, prefix=f"{k!r}: ", postfix="," if i < len(o) - 1 else "")
    _print("}", is_end=True)
    return

  if isinstance(o, numpy.ndarray):
    _sub_pprint(
      o.tolist(),
      prefix=f"{prefix}numpy.array(",
      postfix=f", dtype=numpy.{o.dtype}){postfix}",
      inc_indent=False)
    return

  # fallback
  _print(repr(o), is_end=True)


def pformat(o: Any) -> str:
  import io
  s = io.StringIO()
  pprint(o, file=s)
  return s.getvalue()


_type_simplicity_limit = 120.  # magic number


def _type_simplicity_score(o: Any, _offset=0.) -> float:
  """
  :param Any o:
  :param float _offset:
  :return: a score, which is a very rough estimate of len(repr(o)), calculated efficiently
  """
  _spacing = 2.
  if isinstance(o, bool):
    return 4. + _offset
  if isinstance(o, (int, numpy.integer)):
    if o == 0:
      return 1. + _offset
    return 1. + numpy.log10(abs(o)) + _offset
  if isinstance(o, str):
    return 2. + len(o) + _offset
  if isinstance(o, (float, complex, numpy.number)):
    return len(repr(o)) + _offset
  if isinstance(o, (tuple, list, set)):
    for x in o:
      _offset = _type_simplicity_score(x, _offset=_offset + _spacing)
      if _offset > _type_simplicity_limit:
        break
    return _offset
  if isinstance(o, dict):
    for x in o.values():  # ignore keys...
      _offset = _type_simplicity_score(x, _offset=_offset + 10. + _spacing)  # +10 for key
      if _offset > _type_simplicity_limit:
        break
    return _offset
  if isinstance(o, numpy.ndarray):
    _offset += 10.  # prefix/postfix
    if o.size * 2. + _offset > _type_simplicity_limit:  # too big already?
      return o.size * 2. + _offset
    if str(o.dtype).startswith("int"):
      a = _type_simplicity_score(numpy.max(numpy.abs(o))) + _spacing
      return o.size * a + _offset
    a = max([_type_simplicity_score(x) for x in o.flatten()]) + _spacing
    return o.size * a + _offset
  # Unknown object. Fallback > _type_simplicity_limit.
  return _type_simplicity_limit + 1. + _offset
Juan Diego Godoy Robles On

Use the JSON library.

Example

>>> my_dict = {'test': {'0.2.0': {'deploy': {'some.host.com': {'outputs': [], 'inputs': []}},
...    'release': {'some.git': {'outputs': [], 'inputs': []}}},
...   '0.1.0': {'deploy': {'some.host.com': {'outputs': [], 'inputs': []}},
...    'release': {'some.git': {'outputs': [], 'inputs': []}}}},
...  'stage': {'0.1.0': {'deploy': {'stage.com': {'outputs': [], 'inputs': []}},
...    'release': {'stage.git': {'outputs': [], 'inputs': []}}}}}
>>>
>>> import json
>>> print(json.dumps(my_dict, indent=2))
{
  "test": {
    "0.2.0": {
      "deploy": {
        "some.host.com": {
          "outputs": [],
          "inputs": []
        }
      },
      "release": {
        "some.git": {
          "outputs": [],
          "inputs": []
        }
      }
    },
    "0.1.0": {
      "deploy": {
        "some.host.com": {
          "outputs": [],
          "inputs": []
        }
      },
      "release": {
        "some.git": {
          "outputs": [],
          "inputs": []
        }
      }
    }
  },
  "stage": {
    "0.1.0": {
      "deploy": {
        "stage.com": {
          "outputs": [],
          "inputs": []
        }
      },
      "release": {
        "stage.git": {
          "outputs": [],
          "inputs": []
        }
      }
    }
  }
}
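
One caveat worth keeping in mind with this approach: the output is JSON, not Python syntax, so some values change form. The sketch below uses a small made-up dict (cfg is an illustrative name, not from the question) to show that tuples become lists, None becomes null, and True becomes true. Types that json does not know, such as sets, raise TypeError unless a default callable is supplied; passing default=repr is one possible fallback, assumed here only for illustration.

```python
import json

# Made-up sample data containing values that JSON represents differently.
cfg = {'l0': {'padding': (3, 3), 'activation': None, 'with_bias': True}}

text = json.dumps(cfg, indent=2)
print(text)

# Reading the text back shows the tuple has become a list.
roundtrip = json.loads(text)

# A set is not JSON serializable; default=repr stores its repr() as a string.
as_text = json.dumps({'tags': {'x'}}, indent=2, default=repr)
print(as_text)
```

Also note that with indent set, json puts every list element on its own line, so short lists and tuples take more vertical space than in the hand-written style.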
Karl On

This is a bit hacky and doesn't win any prizes for generalizing to other problems (though that could be fixed with some effort), but you could consider something like this as well. It prints a result that is much more compact than the JSON format:

d = {'test': {'0.2.0': {'deploy': {'some.host.com': {'outputs': [], 'inputs': []}},
   'release': {'some.git': {'outputs': [], 'inputs': []}}},
  '0.1.0': {'deploy': {'some.host.com': {'outputs': [], 'inputs': []}},
   'release': {'some.git': {'outputs': [], 'inputs': []}}}},
 'stage': {'0.1.0': {'deploy': {'stage.com': {'outputs': [], 'inputs': []}},
   'release': {'stage.git': {'outputs': [], 'inputs': []}}}}}

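import pandas as pd

# One row per leaf value, indexed by the full key path (assumes exactly 5 levels).
print(pd.DataFrame({
    (i, j, k, l, m): str(d[i][j][k][l][m])
    for i in d.keys()
    for j in d[i].keys()
    for k in d[i][j].keys()
    for l in d[i][j][k].keys()
    for m in d[i][j][k][l].keys()
}, index=[0]).T)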
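
The comprehension above is tied to exactly five levels of nesting. As a rough sketch of how it might be generalized, the recursive generator below walks a nested dict of any depth and yields one (path, value) pair per leaf. The name flatten is an assumption made for this example, and the sample data is a subset of the dict from the question; only the standard library is used.

```python
def flatten(d, path=()):
    """Yield (path, value) pairs for every leaf of a nested dict.

    A leaf is any value that is not a non-empty dict; path is the tuple of keys
    leading to it. flatten is a hypothetical helper name.
    """
    for key, value in d.items():
        if isinstance(value, dict) and value:
            yield from flatten(value, path + (key,))
        else:
            yield path + (key,), value


d = {'stage': {'0.1.0': {'deploy': {'stage.com': {'outputs': [], 'inputs': []}},
                         'release': {'stage.git': {'outputs': [], 'inputs': []}}}}}

rows = list(flatten(d))
for path, value in rows:
    # One compact line per leaf, e.g. "stage / 0.1.0 / deploy / ... = []"
    print(" / ".join(map(str, path)), "=", value)
```

Calling dict(flatten(d)) gives a mapping with tuple keys, similar in shape to the dict built by the comprehension above.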