I am trying to analyze a very large text string in Python containing nvidia-smi output, but I would rather spend my time analyzing the data than working on my regex skills. I came up with the regex below, but it takes forever on some rows. That might be caused by variation in the input data, but I suspect my pattern is also very compute-intensive.
extracted_line1 = r'[=]*[+][=]*[+][=]*\|\n\|(\s+(.*?)\|)+\n\|(\s+(.*?)\|)(\s+(.*?)\|)(\s+(.*?)\|)\n\|'
This pattern matches the third row in the table.
This one down below ⬇️
===============================+======================+======================|
| 0 GeForce GTX 1080 On | 00000000:04:00.0 Off | N/A |
| 27% 20C P8 6W / 180W | 2MiB / 8119MiB | 0% E. Process |
| | | N/A |
It works for most rows but seemingly at random hangs on others. What would be a simpler version of this regex? Or maybe a better question is: what is the best approach to grab each of the values in this table for every row (the corresponding metrics for each GPU)?
The truncated input string is here:
... bunch of text
nvidia-smi:
Tue Jun 8 15:00:02 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80 Driver Version: 460.80 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 On | 00000000:04:00.0 Off | N/A |
| 27% 20C P8 6W / 180W | 2MiB / 8119MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 On | 00000000:05:00.0 Off | N/A |
| 27% 23C P8 6W / 180W | 2MiB / 8119MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
... bunch of text
P.S. I am trying to extract the following values:
gpu_index = [processed result of regex output here]
gpu_model_name = [processed result of regex output here]
persistance_mode = [processed result of regex output here]
bus_id = [processed result of regex output here]
display_active = [processed result of regex output here]
volatile_ecc = [processed result of regex output here]
fan = [processed result of regex output here]
temperature = [processed result of regex output here]
perf = [processed result of regex output here]
power_usage = [processed result of regex output here]
max_power = [processed result of regex output here]
memory_usage = [processed result of regex output here]
available_mem = [processed result of regex output here]
gpu_utilization = [processed result of regex output here]
compute_mode = [processed result of regex output here]
multiple_instance_gpu_mode = [processed result of regex output here]
I suggest another pattern, easier on your machine's resources.
Pattern
First of all, I got rid of the leading part of the pattern that searches for = or + characters, because regex can find what you instruct it to find directly. No 'helper' handles are needed. Next, I found that you only need to match word characters \w, digits \d and whitespace \s, so the whole pattern was pretty easy to write.
Explanation
I'm building the whole pattern match group by match group until reaching the final result. Please note that each explanation applies only to the last match group of the pattern above it, i.e. the last
(some regex expression in parentheses)
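One way to follow this build-up is to grow the pattern one piece at a time and print only the newest group after each step. The sketch below does that for the first five groups against the sample row from the question; the names row, pieces and newest are made up for illustration.

```python
import re

# Sample data row copied from the question.
row = "| 27% 20C P8 6W / 180W | 2MiB / 8119MiB | 0% E. Process |"

# Pieces appended one at a time; each piece adds exactly one capture group.
pieces = [
    r'(\d+%)',          # fan
    r'\s+(\d+C)',       # temperature
    r'\s+(\w\d)',       # perf state
    r'\s+(\d+W)',       # power usage
    r'\s+\/\s+(\d+W)',  # power cap
]

pattern = ''
newest = []  # value captured by the last group at each step
for piece in pieces:
    pattern += piece
    m = re.search(pattern, row)
    # m.lastindex is the index of the last group that took part in the match.
    newest.append(m.group(m.lastindex))
    print(pattern, '->', newest[-1])
```

Because re.search scans the string for the first position where the pattern fits, no leading anchor on the table border is needed.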
(\d+%)
will match one or more digits followed by %
(\d+%)\s+(\d+C)
after one or more whitespace characters, will match one or more digits followed by the letter C
(\d+%)\s+(\d+C)\s+(\w\d)
will match a single word character followed by a single digit (P8 in the sample)
(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)
after one or more whitespace characters, will match one or more digits followed by the letter W
(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)
knowing there should be some whitespace, then / and some more whitespace, this expression will match one or more digits followed by W
(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)
knowing there should be some whitespace, then | and some more whitespace, this expression will match one or more digits followed by MiB
(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)
knowing there should be some whitespace, then / and some more whitespace, this expression will match one or more digits followed by MiB
(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)
knowing there should be some whitespace, then | and some more whitespace, this expression will match one or more digits followed by %
Last bit
(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\|
knowing there should be some whitespace, this expression will match as few characters as possible until it reaches one or more whitespace characters followed by |
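As a rough sketch of how the finished pattern could be used in Python, the snippet below compiles it once and collects the nine groups of every matching row into a dictionary. The dictionary keys reuse the variable names from the question; ROW, FIELDS, sample and rows are hypothetical names, and the sample text is the truncated table from the question.

```python
import re

# The final pattern from above, split over two lines for readability.
ROW = re.compile(
    r'(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|'
    r'\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\|'
)

# One name per capture group, in pattern order (names taken from the question).
FIELDS = (
    'fan', 'temperature', 'perf', 'power_usage', 'max_power',
    'memory_usage', 'available_mem', 'gpu_utilization', 'compute_mode',
)

sample = """\
|===============================+======================+======================|
| 0 GeForce GTX 1080 On | 00000000:04:00.0 Off | N/A |
| 27% 20C P8 6W / 180W | 2MiB / 8119MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 On | 00000000:05:00.0 Off | N/A |
| 27% 23C P8 6W / 180W | 2MiB / 8119MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
"""

# finditer walks the whole text and yields one match per GPU metrics row.
rows = [dict(zip(FIELDS, m.groups())) for m in ROW.finditer(sample)]

for gpu in rows:
    print(gpu)
```

Note that this pattern only covers the second line of each GPU block, and a row where one of these fields is not a number (for example an N/A entry) would simply not match, so it is worth comparing the number of matches with the number of GPUs you expect.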
Finally, the following variables are covered: