I have a shell command that returns rows like:
timestamp=1511270820724797892 eventID=1511270820724797892 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419165205232 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="nginx/1.10.1" Method="GET" RequestURI="/system/varlogmessages/" UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" WebSite="backup-server-new" Domain="backup-server-new" SrcIP="172.20.1.13" SrcPort="80" DstIP="172.18.4.181" DstPort="60065"
timestamp=1511270820735795372 eventID=1511270820735795372 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419176202992 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="probe" Method="GET" RequestURI="/system/status" WebSite="probe609:8111" Domain="probe609:8111" SrcIP="172.20.2.109" SrcPort="8111" DstIP="172.18.4.96" DstPort="49714"
I am trying to read it as:
for i, row in enumerate(csv.reader(execute(cmd), delimiter=' ', skipinitialspace=True)):
    print i, len(row)
    if i > 10:
        break
but this does not work correctly, because whitespace inside quotes is not ignored. For example, channelID="HTTP: Other" is split into two fields because of the space between HTTP: and Other.
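To make the problem concrete, here is a small reproduction on a shortened row (the sample string is abbreviated from the output above, and the variable names are only for illustration). As far as I can tell, csv only treats a quote as special when it opens a field, so the quotes that follow the = sign are kept as literal characters:

```python
import csv

# Shortened sample row (abbreviated from the command output above).
sample = 'eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false'

# Same reader settings as in the loop above, fed a single line.
row = next(csv.reader([sample], delimiter=' ', skipinitialspace=True))

# The quoted value is cut at the space between HTTP: and Other,
# so three key=value pairs come back as four fields.
for field in row:
    print(field)
```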
What is the right way to parse this type of input?
This is hackish, but it strikes me that the rules here are similar to those for parsing attributes in an HTML tag.
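A minimal sketch of that idea, assuming Python 3 and only the standard library: wrap each row in a dummy tag and let html.parser.HTMLParser split the attributes. The names AttrCollector, parse_row and the dummy tag name row are made up for this example. Two caveats: HTMLParser lowercases attribute names (channelID comes back as channelid), and it decodes character references such as &amp; inside values.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Remembers the attributes of the first start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        # and surrounding quotes are already stripped from the values.
        if not self.attrs:
            self.attrs = attrs


def parse_row(line):
    """Parse one key=value row into a dict (hypothetical helper)."""
    parser = AttrCollector()
    # Wrap the row in a dummy <row ...> tag so it looks like HTML attributes.
    parser.feed("<row " + line.strip() + ">")
    parser.close()
    return dict(parser.attrs)


# Shortened sample row, abbreviated from the question.
sample = 'timestamp=1511270820735795372 channelID="HTTP: Other" channelDir=false Server="probe"'
parsed = parse_row(sample)
for key, value in parsed.items():
    print(key, "=", repr(value))
```

All values come back as strings, so numbers and true/false would still need converting if the types matter.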
Result: