I am creating a script that goes through a file in one format and rearranges it to match the format of another file. Here is a sample of the unformatted file:
, 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \par
, 0x40a849, jmp $+001775cbh (0x581e14), E9 C6 75 17 00, , , , \par
, 0x40a84e, int3, CC, , , , \par
, 0x40a84f, int3, CC, , , , \par
, 0x40a850, push esi, 56, , , , \par
, 0x40a851, mov esi,ecx, 8B F1, , , , \par
The end goal is to have each line of the file look like this:
0x40a846, 0x 88 41 2B ,"mov [ecx+2bh],al",,,
My main issue is that some lines have only one section of source code while others have two, which makes it difficult to write a regular expression that grabs both without accidentally grabbing the code bytes. I want to use capture groups to rearrange the information on each line. Below is my script as of now:
import csv
import string
import re, sys
file_to_change = 'testingthecodexlconverter.csv'
# = raw_input("Please specify what codexl file you would like to convert: ")
file1 = open(file_to_change, 'r+')
with file1 as f:
    for line in f:
        line = line[2:-12]
        line = line.rstrip('\n') + ',,'
        # mo = re.search(r'(.*?),.*?.*?,.*?(.*?),.*?.*?,.*?(.*?),.*?.*?,.*?(.*?)', line)
        #mo = re.search(r'(.*?),.*?(.*?,.*?.*?,).*?.*?,.*?(.*?),.*?.*?,.*?(.*?)', line)
        mo = re.search(r'(.*?),.*?(.*?.*?,\S*?,).*?.*?.*?,.*?(.*?),', line)
        if mo:
            print(mo.group(2))
Can anyone lend me a hand?
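A minimal sketch of the capture-group idea, assuming every line follows the layout seen in the sample (empty first field, address, instruction, uppercase hex code bytes, empty fields, then `\par`). The names `LINE_RE` and `rearrange` are made up for illustration. Anchoring the pattern at the end of the line is what keeps an instruction with an embedded comma from being split: the code bytes must be the last non-empty field before `\par`, so everything between the address and the code bytes lands in the instruction group.

```python
import re

# Assumed layout, inferred from the sample lines only:
#   , <address>, <instruction>, <code bytes>, , , , \par
LINE_RE = re.compile(
    r"^,\s*(?P<address>0x[0-9a-fA-F]+)"                  # address, e.g. 0x40a846
    r",\s*(?P<instruction>.+?)"                          # instruction; may contain a comma
    r",\s*(?P<code_bytes>[0-9A-F]{2}(?: [0-9A-F]{2})*)"  # hex pairs, e.g. 88 41 2B
    r"(?:\s*,)+\s*\\par\s*$"                             # empty fields, then the \par marker
)


def rearrange(line):
    """Return the line in the target layout, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # The instruction is always quoted here; the question shows only one
    # target line, so quoting every instruction is an assumption.
    return '{address}, 0x {code_bytes} ,"{instruction}",,,'.format(**match.groupdict())


sample = [
    r", 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \par",
    r", 0x40a849, jmp $+001775cbh (0x581e14), E9 C6 75 17 00, , , , \par",
    r", 0x40a850, push esi, 56, , , , \par",
]
for line in sample:
    print(rearrange(line))
```

Lines that do not fit the assumed layout return `None`, so they can be logged and inspected instead of being silently mangled.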
I'd use pandas and just rearrange the columns according to your needs, as the data seems to be in a reasonable CSV format. This method also lets you visualise how you are manipulating the data in your CSV while you edit it.
Your problem is a little unclear about what data format you are expecting in each individual column.
I believe you might have missing commas in your input CSV file. My suggestion is to search for these missing commas and add them, so that you have a properly formatted input file.
The fastest way, of course, is to just split the string using .split(), as mentioned above, but it seems you are not sure what you are doing, hence my suggestion of pandas for parsing.
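The `.split()` route could look roughly like the sketch below. It rests on an observation from the sample only, so treat it as an assumption: fields are separated by a comma followed by a space, while the comma inside an instruction such as `mov [ecx+2bh],al` has no space after it. Splitting on `", "` therefore keeps the instruction in one piece. The helper names `split_fields` and `convert` are hypothetical.

```python
import csv
import io


def split_fields(line):
    """Return [address, instruction, code_bytes], or None for unexpected lines."""
    # Assumption: field separators are ", " (comma + space); commas inside
    # an instruction are not followed by a space.
    fields = [field.strip() for field in line.split(", ")]
    # Drop the empty padding fields and the trailing \par marker.
    fields = [field for field in fields if field and field != "\\par"]
    return fields if len(fields) == 3 else None


def convert(lines, out):
    """Write the rearranged rows to the file-like object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = split_fields(line)
        if fields is None:
            continue  # skip lines that do not fit the assumed layout
        address, instruction, code_bytes = fields
        # Target column order: address, prefixed code bytes, instruction,
        # then three empty columns.
        writer.writerow([address, " 0x {} ".format(code_bytes), instruction, "", "", ""])


sample = [
    ", 0x40a846, mov [ecx+2bh],al, 88 41 2B, , , , \\par\n",
    ", 0x40a850, push esi, 56, , , , \\par\n",
]
buffer = io.StringIO()
convert(sample, buffer)
print(buffer.getvalue(), end="")
```

With its default settings, `csv.writer` quotes a field only when it contains the delimiter, so `mov [ecx+2bh],al` comes out quoted and `push esi` does not. If every instruction must be quoted, the row would need to be formatted by hand instead.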