I have a pcap file captured with tshark that I want to analyze and export to a CSV or XLS file. According to the tshark documentation, I can either use the -z option with the proper arguments, or -T together with -E and -e. I'm using Python 3.6 on a Debian machine. Currently, my command looks like this:
command="tshark -q -o tcp.relative_sequence_numbers:false -o tcp.analyze_sequence_numbers:false " \
"-o tcp.track_bytes_in_flight:false -Q -l -z diameter,avp,272,Session-Id,Origin-Host," \
"Origin-Realm,Destination-Realm,Auth-Application-Id,Service-Context-Id,CC-Request-Type,CC-Request-Number," \
"Subscription-Id,CC-Session-Failover,Destination-Host,User-Name,Origin-State-Id," \
"Multiple-Services-Credit-Control,Requested-Service-Unit,Used-Service-Unit,SN-Total-Used-Service-Unit," \
"SN-Remaining-Service-Unit,Service-Identifier,Rating-Group,User-Equipment-Info,Service-Information," \
"Route-Record,Credit-Control-Failure-Handling -r {}".format(args.input_file)
Later I process the output into a pandas DataFrame like so:
# loops adding TCP and/or UDP ports to scan traffic from
if args.tcp:
    for port in args.tcp:
        command += " -d tcp.port=={},diameter".format(port)
if args.udp:
    for port in args.udp:
        command += " -d udp.port=={},diameter".format(port)
# call subprocess and redirect its output to the task variable
task = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# loop adding new data dictionaries to data_list
for line in task.stdout:
    # first, decode the byte string and strip the ' characters
    line = re.sub(r"'", "", line.decode("utf-8"))
    # second, split the string on every whitespace or = to obtain a flat list of keys and values
    line = re.split(r"\s|=", line)
    # pair item i (key) with item i+1 (its value); OrderedDict preserves column order
    # (named row rather than dict to avoid shadowing the built-in)
    row = OrderedDict(line[i:i+2] for i in range(0, len(line)-2, 2))
    data_list.append(row)
# remove last 4 dictionaries (last 4 lines of task.stdout)
data_list = data_list[:-4]
df = pd.DataFrame(data_list).fillna("-")  # create data frame from list of dicts and fill each NaN with "-"
df.to_excel("{}.xls".format(args.output_file), index=False)
print("Please remember that 'frame' column may not correspond to row index!")
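As a side note on the parsing step above: splitting on every whitespace or = breaks as soon as a value itself contains a space or an equals sign. A minimal sketch of a more tolerant parser follows, assuming each output line consists of key='value' pairs (which is what the quote stripping and splitting above imply). The names PAIR_RE and parse_avp_line and the sample line are made up for illustration.

```python
import re
from collections import OrderedDict

# Assumption: every pair on a line looks like key='value'.
PAIR_RE = re.compile(r"([\w.-]+)='([^']*)'")


def parse_avp_line(line):
    """Return an ordered key -> value mapping for one line of output."""
    return OrderedDict(PAIR_RE.findall(line))


# Hypothetical sample line; note the '=' inside the Session-Id value.
sample = "frame='5' cmd='272' Session-Id='host.example;1;2=x' CC-Request-Type='3'"
row = parse_avp_line(sample)
```

Lines that contain no key='value' pair produce an empty mapping, so they could be skipped with `if row:` instead of slicing a fixed number of entries off the end, provided the trailing lines really contain no such pairs.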
When I open the output file I can see that it works fine, except that enumerated fields such as CC-Request-Type contain numeric values instead of their string representation. For example, where Wireshark displays TERMINATION-REQUEST for a packet, the CC-Request-Type column of the output Excel file contains 3 in the row corresponding to that packet.
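One way to get the names back is to translate the codes after parsing. The sketch below assumes the affected column holds CC-Request-Type values, whose enumeration is defined in RFC 4006 (the RFC spells the names with underscores). The helper translate_enum and the sample rows are hypothetical, and other enumerated AVPs would need their own tables.

```python
# Enumerated values of the CC-Request-Type AVP as defined in RFC 4006.
CC_REQUEST_TYPE = {
    "1": "INITIAL_REQUEST",
    "2": "UPDATE_REQUEST",
    "3": "TERMINATION_REQUEST",
    "4": "EVENT_REQUEST",
}


def translate_enum(rows, column, names):
    """Replace numeric codes in `column` with names; unknown codes are kept."""
    for row in rows:
        if column in row:
            row[column] = names.get(row[column], row[column])
    return rows


# Hypothetical parsed rows; values are strings because they come from text output.
rows = [
    {"frame": "5", "CC-Request-Type": "3"},
    {"frame": "6", "CC-Request-Type": "9"},  # code not in the table, left untouched
]
translate_enum(rows, "CC-Request-Type", CC_REQUEST_TYPE)
```

Running the translation on the list of dictionaries before building the DataFrame keeps the rest of the export code unchanged.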
My question is: how can I translate this number to its string representation while using the -z option, or (as I gather from what I've seen on the web) how can I get the fields mentioned above with their values using the -T and -e options? I listed all available fields with tshark -G, but there are too many of them and I can't think of a reasonable way to find the ones I want.
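To narrow the field list down, the output of tshark -G fields can be filtered by parent protocol. The tshark manual describes that output as tab-separated records, where field rows start with F, followed by the descriptive name, the abbreviation, the type and the parent protocol abbreviation. The sketch below relies on that layout; the function names and the sample text are invented for illustration, and it only shows how fields might be discovered and passed to -e, not how enumerated values are rendered.

```python
def diameter_fields(g_fields_output, protocol="diameter"):
    """Return abbreviations of the fields whose parent protocol is `protocol`.

    Expects the text printed by `tshark -G fields`.
    """
    fields = []
    for line in g_fields_output.splitlines():
        cols = line.split("\t")
        # Field rows: F, name, abbreviation, type, parent protocol, ...
        if len(cols) >= 5 and cols[0] == "F" and cols[4] == protocol:
            fields.append(cols[2])
    return fields


def build_fields_command(pcap_path, fields):
    """Build an argument list for a `-T fields` export with a CSV-like layout."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields",
           "-E", "header=y", "-E", "separator=,", "-E", "quote=d"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


# Hypothetical excerpt in the layout of `tshark -G fields`.
sample = (
    "P\tDiameter Protocol\tdiameter\n"
    "F\tSession-Id\tdiameter.Session-Id\tFT_STRING\tdiameter\t\t0x0\t\n"
    "F\tSource Port\ttcp.srcport\tFT_UINT16\ttcp\tBASE_DEC\t0x0\t\n"
)
wanted = diameter_fields(sample)
cmd = build_fields_command("capture.pcap", wanted)
```

The returned list can be handed to subprocess.Popen directly, without shell=True, which also sidesteps quoting problems in the file name.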
Thanks to John Zwick's suggestion, this answer, and the Python documentation on the ElementTree XML API, I implemented the code presented below (I downloaded dictionary.xml and chargecontrol.xml from the official Wireshark GitHub repository):