I need to parse Apache access log files that have 16 space-delimited columns, that is:
xyz abc ... ... home?querystring
I need to count the total number of hits for each page in that file, that is, the total number of home page hits, ignoring the query string.
For some lines the URL is in column 16, and for others it is in column 14 or 15. Hence I need to parse each line in reverse order (get the last column, ignore the query string in that column, and aggregate the page hits).
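A minimal Python sketch of the first two steps above (take the last column, drop its query string). It assumes, as the question states, that the URL is always the final whitespace-separated field, wherever that falls (column 14, 15 or 16). The function name page_from_line is hypothetical.

```python
def page_from_line(line: str) -> str:
    """Return the last whitespace-separated column with its query string removed.

    Assumes the URL is always the final column of the line.
    """
    fields = line.split()          # split on any run of whitespace
    if not fields:                 # blank line: no page to report
        return ""
    last = fields[-1]              # "parse in reverse": take the final column
    return last.split("?", 1)[0]   # keep only the text before the first '?'


print(page_from_line("xyz abc 1 2 home?querystring"))  # home
print(page_from_line("a b c about"))                    # about
```

Because the function only looks at the last field, it does not matter how many columns precede the URL.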
I am new to Linux and shell scripting. How should I approach this? Do I need to look into awk, or into shell scripting? Could you give a small sample of code that would perform such a task?
ANSWER: A Perl one-liner solved the problem:
perl -lane | scalar array
Well, for starters, if you are only interested in working on columns 14-16, I would start by running
Note: there are two spaces after the d\
You can then easily count up the URLs that you see. I also think this would be much easier to solve with a few lines of Python or Perl.
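Following the suggestion to use a few lines of Python, here is a minimal sketch of the counting step. It assumes the URL is the last whitespace-separated column of each line and treats everything before the first ? as the page. The name count_page_hits is hypothetical, and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_page_hits(lines: Iterable[str]) -> Counter:
    """Count hits per page, keyed on the last column minus its query string."""
    hits: Counter = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        page = fields[-1].split("?", 1)[0]  # strip the query string
        hits[page] += 1
    return hits


# Made-up sample lines; the URL sits at a different column index in each.
sample = [
    "xyz abc ... home?querystring",
    "xyz abc ... ... ... home?other=1",
    "xyz abc ... about",
]
for page, n in count_page_hits(sample).most_common():
    print(n, page)
```

To run it on a real log, pass an open file object (for example the result of open() on your log's path) instead of the sample list. Iterating over a file object yields one line at a time, so the whole log is never loaded into memory.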