I need to sort a web log file by IP, so that lines with the same IP end up next to each other. I'm lazy, but I want to learn how to do this in C++, so I don't want to sort it in Excel. I changed the log so that on every line the IP is followed by a space and 8 q characters (qqqqqqqq), and after that comes another address. IPs don't all have the same length, so my idea was to copy only the first 16 characters of each line into an array and compare those. At least I thought that would be a good idea.
Example of the log:
85.xx.xx.58 qqqqqqqq 85.xx.xx.58.xxxxxxxxx bla,bla,bla,bla,
105.216.xx.xx qqqqqqqq - bla,bla,bla,bla,bla,bla,bla,
85.xx.xx.58 qqqqqqqq 85.xx.xx.58.xxxxxxxxx bla,bla,bla,bla,
The log has more than 60 000 lines. I already used C++ to remove the robots.txt, .js, .gif, .jpg, etc. lines, so I'd like to reuse that old code. Here is the example that deletes the "robots.txt" lines:
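#include <iostream>
#include <string>
#include <fstream>
using namespace std;

int main()
{
    ifstream infile("C:\\ips.txt");     // original log, left untouched
    ofstream myfile("C:\\ipout.txt");   // new, filtered log (opened once)
    if (!infile || !myfile) {
        cerr << "Could not open input or output file\n";
        return 1;
    }
    string line;
    // Single read loop: the old nested loop silently discarded the first line
    while (getline(infile, line)) {
        // Copy only lines that do NOT contain "robots.txt", i.e. delete those lines
        if (line.find("robots.txt") == string::npos)
            myfile << line << "\n";
    }
    infile.close();
    myfile.close();
    cout << " \n";
    cin.get();
    return 0;
}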
I know this code looks horrible, but it did its job. I'm still learning. And of course I want to keep the old file and write the result to another (new) file.
I found some help on this topic, but it didn't quite fit what I need...
I'm thinking about changing the "if" statement so that it reads only the first 16 characters, compares them, and groups matching lines under each other. Of course, each whole line should stay intact, if that is possible.
I'm not sure I really understood the log format, but I guess you can adapt this to fit your needs.
This assumes a line-based log format where each line starts with the key that you want to group on (the IP number, for example). It uses an unordered_map, but you can try a normal map too (which would also give you the IPs in sorted order). The key in the map is the IP number, and the rest of the line is put in a vector of strings.
Given this input:
This is a possible output: