I am parsing /proc/PID/stat for a process. The file contains:
25473 (firefox) S 25468 25465 25465 0 -1 4194304 149151169 108282 32 15 2791321 436115 846 86 20 0 84 0 9648305 2937786368 209665 18446744073709551615 93875088982016 93875089099888 140722931705632 140722931699424 140660842079373 0 0 4102 33572009 0 0 0 17 1 0 0 175 0 0 93875089107104 93875089109128 93875116752896 140722931707410 140722931707418 140722931707418 140722931707879 0
I came up with:
import re

def get_stats(pid):
    with open('/proc/{}/stat'.format(pid)) as fh:
        stats_raw = fh.read()
    stat_pattern = r'(\d+\s)(\(.+\)\s)(\w+\s)(-?\d+\s?)'
    return re.findall(stat_pattern, stats_raw)
This matches the first three groups but returns only one field for the last group, (-?\d+\s?):
[('25473 ', '(firefox) ', 'S ', '25468 ')]
I was looking for a way to match a set number of fields with the last group:
'(\d+\s)(\(.+\)\s)(\w+\s)(-?\d+\s?){49}'
You cannot access each repeated capture with the re module: a repeated capturing group keeps only its last match. You may capture the whole rest of the string into Group 4 and then split it on whitespace.
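A minimal sketch of the capture-and-split approach described above. The sample string is the line from the question cut down to its first few fields, and the pattern is an assumption about how Group 4 could be written, not necessarily the exact one originally posted:

```python
import re

# First fields of the sample line from the question, shortened for readability.
s = "25473 (firefox) S 25468 25465 25465 0 -1 4194304"

# Groups 1-3 are the pid, the command name and the state;
# Group 4 takes everything after the state field.
pattern = r'(\d+)\s+(\([^)]+\))\s+(\w+)\s+(.*)'

m = re.search(pattern, s)
if m:
    # Split the captured remainder on whitespace to get the individual numbers.
    fields = [m.group(1), m.group(2), m.group(3)] + m.group(4).split()
    print(fields)
```

Splitting after the match avoids the repeated-group limitation, because the regex only has to find where the remainder starts.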
If you literally need to get only 49 numbers into Group 4, put the {49} quantifier on a non-capturing group and wrap that whole construct in the capturing group.
With the PyPI regex module, you may use
r'(?P<o>\d+)\s+(?P<o>\([^)]+\))\s+(?P<o>\w+)\s+(?P<o>-?\d+\s?){49}'
and, after running a regex.search(pattern, s), access the .captures("o") stack with the values you need. Output:
['25473', '(firefox)', 'S', '25468 ', '25465 ', '25465 ', '0 ', '-1 ', '4194304 ', '149151169 ', '108282 ', '32 ', '15 ', '2791321 ', '436115 ', '846 ', '86 ', '20 ', '0 ', '84 ', '0 ', '9648305 ', '2937786368 ', '209665 ', '18446744073709551615 ', '93875088982016 ', '93875089099888 ', '140722931705632 ', '140722931699424 ', '140660842079373 ', '0 ', '0 ', '4102 ', '33572009 ', '0 ', '0 ', '0 ', '17 ', '1 ', '0 ', '0 ', '175 ', '0 ', '0 ', '93875089107104 ', '93875089109128 ', '93875116752896 ', '140722931707410 ', '140722931707418 ', '140722931707418 ', '140722931707879 ', '0']