Most simplified form of the following regex / Extracting all values from nvidia-smi output


I am trying to analyze a very large text string in Python containing nvidia-smi outputs, but I would rather spend my time analyzing the data than working on my regex skills. I came up with the regex below, but it takes forever on some rows. That might be due to variation in the input data, but I suspect my pattern is also very compute-intensive.

extracted_line1 = r'[=]*[+][=]*[+][=]*\|\n\|(\s+(.*?)\|)+\n\|(\s+(.*?)\|)(\s+(.*?)\|)(\s+(.*?)\|)\n\|'

This pattern matches the third row in the table.

This one down below ⬇️

 ===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |

It works for most rows but randomly hangs on some. What would be a simpler version of this regex? Or, perhaps a better question: what is the best approach to grab each of the values in this table for every row (the corresponding metrics for each GPU)?

Here is a truncated input string:

... bunch of text
nvidia-smi:
Tue Jun  8 15:00:02 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:05:00.0 Off |                  N/A |
| 27%   23C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

... bunch of text

P.S. I am trying to extract the following values:

        gpu_index = [processed result of regex output here]
        gpu_model_name = [processed result of regex output here]
        persistance_mode = [processed result of regex output here]
        bus_id = [processed result of regex output here]
        display_active = [processed result of regex output here]
        volatile_ecc = [processed result of regex output here]
        fan = [processed result of regex output here]
        temperature = [processed result of regex output here]
        perf = [processed result of regex output here]
        power_usage =  [processed result of regex output here]
        max_power = [processed result of regex output here]
        memory_usage = [processed result of regex output here] 
        available_mem = [processed result of regex output here] 
        gpu_utilization = [processed result of regex output here]
        compute_mode = [processed result of regex output here]
        multiple_instance_gpu_mode = [processed result of regex output here]

There is 1 answer

Pavel Gomon (best answer)

I suggest another pattern, easier on your machine's resources.

Pattern

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\|

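First of all, I got rid of the leading part of the pattern that looks for = and + characters. A regex search finds the data row on its own, so no 'helper' anchors are needed. Dropping that part also removes the repeated group (\s+(.*?)\|)+, whose nested quantifiers are a likely cause of the hangs, because they can force the engine into heavy backtracking on rows that do not match.

Next, I found that you only need to match word characters \w, digits \d and whitespace \s, so the whole pattern was pretty easy to write.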
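As a minimal sketch of how this pattern might be applied in Python, it can be compiled once and run over the text with finditer, so that every GPU's metrics row is picked up in a single pass. The names ROW_PATTERN and sample are illustrative only and do not come from the question; the sample rows are copied from the question's input.

```python
import re

# The pattern from this answer, compiled once so it can be reused on large inputs.
ROW_PATTERN = re.compile(
    r"(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|"
    r"\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\|"
)

# Rows copied from the question's sample output (illustrative input).
sample = (
    "| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |\n"
    "|                               |                      |                  N/A |\n"
    "| 27%   23C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |\n"
)

# One tuple of nine captured strings per metrics row found in the text.
rows = [m.groups() for m in ROW_PATTERN.finditer(sample)]
for row in rows:
    print(row)
```

Rows that do not start with a fan percentage, such as the N/A-only row, are simply skipped by the search.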

Explanation

I'm building the whole pattern one match group at a time until reaching the final result. Please note that each explanation below applies only to the last match group, i.e. the last regex expression in parentheses.

(\d+%) will match one or more digits followed by %

(\d+%)\s+(\d+C) after one or more whitespace characters, this expression will match one or more digits followed by the letter C

(\d+%)\s+(\d+C)\s+(\w\d) after one or more whitespace characters, this expression will match any single word character followed by any single digit

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W) after one or more whitespace characters, this expression will match one or more digits followed by the letter W

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W) knowing there should be some whitespace, then / and some more whitespace, this expression will match one or more digits followed by W

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB) knowing there should be some whitespace, then | and some more whitespace, this expression will match one or more digits followed by MiB

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB) knowing there should be some whitespace, then / and some more whitespace, this expression will match one or more digits followed by MiB

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%) knowing there should be some whitespace, then | and some more whitespace, this expression will match one or more digits followed by %

Last bit

(\d+%)\s+(\d+C)\s+(\w\d)\s+(\d+W)\s+\/\s+(\d+W)\s+\|\s+(\d+MiB)\s+\/\s+(\d+MiB)\s+\|\s+(\d+\%)\s+(.*?)\s+\| knowing there should be some whitespace, this expression will match as few characters as possible until it reaches one or more whitespace characters followed by |
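The group-by-group build-up can be checked with a short sketch like the following, which grows the pattern one piece at a time and prints what the newest group captures from a sample row. The variable names (steps, partial, last_groups) are illustrative, and the row is taken from the question's sample.

```python
import re

# One metrics row from the question's sample output.
line = "| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |"

# The pattern split into the piece that each step of the explanation adds.
steps = [
    r"(\d+%)",
    r"\s+(\d+C)",
    r"\s+(\w\d)",
    r"\s+(\d+W)",
    r"\s+\/\s+(\d+W)",
    r"\s+\|\s+(\d+MiB)",
    r"\s+\/\s+(\d+MiB)",
    r"\s+\|\s+(\d+\%)",
    r"\s+(.*?)\s+\|",
]

partial = ""
last_groups = []
for step in steps:
    partial += step
    match = re.search(partial, line)
    # The groups are not nested, so lastindex is the group added in this step.
    last_groups.append(match.group(match.lastindex))
    print(f"group {match.lastindex}: {last_groups[-1]!r}")
```

Growing a pattern this way also makes it easy to see which piece stops matching when a row has an unexpected format.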

Finally, the following variables are covered:

gpu_index = not implemented
gpu_model_name = not implemented
persistance_mode = not implemented
bus_id = not implemented
display_active = not implemented
volatile_ecc = not implemented
fan = (\d+%)
temperature = (\d+C)
perf = (\w\d)
power_usage =  (\d+W)
max_power = (\d+W)
memory_usage = (\d+MiB)
available_mem = (\d+MiB)
gpu_utilization = (\d+\%)
compute_mode = (.*?)
multiple_instance_gpu_mode = not implemented
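To tie the groups to the variable names, here is a rough sketch that uses named groups and returns one dict per GPU. METRICS_ROW is the pattern from this answer with names attached. HEADER_ROW is an assumption that is not part of the pattern above: it is a hypothetical pattern for the first row of each GPU block, based only on the layout of the sample in the question (On/Off flags, a hexadecimal bus id), so it may need adjusting for other driver versions. multiple_instance_gpu_mode is still left out.

```python
import re

# Hypothetical pattern for the first row of a GPU block (an assumption based
# only on the question's sample): index, name, persistence, bus id, display, ECC.
HEADER_ROW = re.compile(
    r"\|\s+(?P<gpu_index>\d+)\s+(?P<gpu_model_name>[^|]*?)\s+(?P<persistance_mode>On|Off)\s+\|"
    r"\s+(?P<bus_id>[0-9A-Fa-f:.]+)\s+(?P<display_active>On|Off)\s+\|"
    r"\s+(?P<volatile_ecc>\S+)\s+\|"
)

# The answer's pattern for the second row, with a name on each group.
METRICS_ROW = re.compile(
    r"(?P<fan>\d+%)\s+(?P<temperature>\d+C)\s+(?P<perf>\w\d)"
    r"\s+(?P<power_usage>\d+W)\s+/\s+(?P<max_power>\d+W)\s+\|"
    r"\s+(?P<memory_usage>\d+MiB)\s+/\s+(?P<available_mem>\d+MiB)\s+\|"
    r"\s+(?P<gpu_utilization>\d+%)\s+(?P<compute_mode>.*?)\s+\|"
)


def parse_gpus(text):
    """Return one dict per GPU, pairing each header row with the next metrics row."""
    gpus = []
    for header in HEADER_ROW.finditer(text):
        # Look for the metrics row starting right after the header row.
        metrics = METRICS_ROW.search(text, header.end())
        if metrics is None:
            break
        gpus.append({**header.groupdict(), **metrics.groupdict()})
    return gpus


# The two GPU blocks from the question's sample output.
sample = """\
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
| 27%   20C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 00000000:05:00.0 Off |                  N/A |
| 27%   23C    P8     6W / 180W |      2MiB /  8119MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
"""

gpus = parse_gpus(sample)
for gpu in gpus:
    print(gpu["gpu_index"], gpu["gpu_model_name"], gpu["temperature"], gpu["memory_usage"])
```

All values come back as strings with their units attached (for example "20C" or "2MiB"), so converting them to numbers is a separate step.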