I have hundreds of proxy log files in one folder and want to delete the auth_user column from all of them, writing the results to another folder.
The auth_user column is enclosed in double quotes. The biggest problem is that I cannot use the space character as the delimiter, because some log files have no space between timestamp and auth_user. I tried using the double quote as the delimiter, but this leads to some weird results, since sometimes there is nothing between the pair of double quotes.
What I have so far:
for src_name in glob.glob(os.path.join(source_dir, '*.log')):
    base = os.path.basename(src_name)
    dest_name = os.path.join(dest_dir, base)
    with open(src_name, 'rb') as infile:
        with open(dest_name, 'w') as outfile:
            reader = csv.reader(infile, delimiter='"')
            writer = csv.writer(outfile, delimiter='"')
            for row in reader:
                row[1] = ''
                writer.writerow(row)
The log file is as follows (time_stamp "auth_user" src_ip):
[21/Apr/2013:00:00:00 -0300]"cn=john smith,ou=central,ou=microsoft,o=com" 192.168.2.5
[21/Apr/2013:00:00:01 -0400]"jsmith" 192.168.4.5
[21/Apr/2013:00:00:01 -0400]"" 192.168.15.5
[22/Apr/2013:00:00:01 -0400]"" 192.168.4.5
[22/Apr/2013:00:00:01 -0400]"jkenndy" 192.168.14.5
I would like to change it into this (time_stamp src_ip):
[21/Apr/2013:00:00:00 -0300] 192.168.2.5
[21/Apr/2013:00:00:01 -0400] 192.168.4.5
[21/Apr/2013:00:00:01 -0400] 192.168.15.5
[22/Apr/2013:00:00:01 -0400] 192.168.4.5
[22/Apr/2013:00:00:01 -0400] 192.168.14.5
Assuming that each file has the structure:
Assuming that the first two lines need to be skipped:
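A minimal sketch of one way to do this without the csv module, assuming the auth_user column is always the first double-quoted run on a line and never contains a double quote itself. The names strip_auth_user, strip_folder and header_lines are hypothetical. Passing header_lines=2 leaves the first two lines of each file untouched; copying them through unchanged, rather than dropping them, is an assumption.

```python
import glob
import os
import re

# Optional whitespace followed by the first double-quoted field,
# which may be empty: matches "jsmith" as well as "".
AUTH_USER = re.compile(r'\s*"[^"]*"')


def strip_auth_user(line):
    """Remove the first double-quoted field from one log line."""
    return AUTH_USER.sub('', line, count=1)


def strip_folder(source_dir, dest_dir, header_lines=0):
    """Copy every *.log file from source_dir to dest_dir
    with the auth_user column removed."""
    for src_name in glob.glob(os.path.join(source_dir, '*.log')):
        dest_name = os.path.join(dest_dir, os.path.basename(src_name))
        with open(src_name) as infile, open(dest_name, 'w') as outfile:
            for number, line in enumerate(infile):
                if number < header_lines:
                    # Assumption: leading lines are copied through unchanged.
                    outfile.write(line)
                else:
                    outfile.write(strip_auth_user(line))
```

Because the pattern does not depend on a space before the opening quote, it handles both `]"jsmith"` and `] "jsmith"`, and an empty `""` is removed the same way as a populated field.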