C++ sort same IPs together, web log

I need to sort a web log file by IP, so that lines with the same IP end up directly under one another. I'm lazy, but I want to learn how to do this in C++, so I don't want to sort it in Excel. I made some changes to the log: in every line the IP is now followed by a marker of 8 "q" characters (qqqqqqqq), and after that comes another address. That way I can sort the lines by the characters at the start of each line. Because IPs don't all have the same length, I thought it would be a good idea to copy only the first 16 characters of each line into an array and compare those.

Example of log:

85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,
105.216.xx.xx   qqqqqqqq    - bla,bla,bla,bla,bla,bla,bla,
85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,

The log has more than 60,000 lines, and I already used C++ to remove the lines for robots.txt, .js, .gif, .jpg, etc., so I would like to reuse that old code. Here is the example that deletes the "robots.txt" lines:

#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    ifstream infile("C:\\ips.txt");
    // Open the output file once, before reading any input.
    ofstream myfile("C:\\ipout.txt");
    string line;

    // Copy every line except those that mention robots.txt.
    while (getline(infile, line)) {
        if (line.find("robots.txt") == string::npos)
            myfile << line << "\n";
    }

    infile.close();
    myfile.close();

    // Wait for Enter so the console window stays open.
    cout << " \n";
    cin.get();

    return 0;
}

I know this code looks horrible, but it did its job. I'm still learning, and of course I want to keep the old file and write the result to a second (new) file.

I found some help on this topic, but it was a bit off track for me...

I'm thinking about changing the "if" statement to read only the first 16 characters of each line, compare them, and group the matching lines together (one under another). Of course, the whole line should stay intact, if that is possible.
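The grouping idea described above can be sketched without a fixed 16-character window. This is only a minimal sketch, assuming the IP is always the first whitespace-delimited field of a line; the names `first_field` and `sort_by_first_field` are made up for illustration. It keeps every line intact and uses `std::stable_sort`, so lines with the same IP stay in their original relative order.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the first whitespace-delimited field
// of a log line (assumed here to be the IP).
std::string first_field(const std::string& line) {
    std::istringstream ss(line);
    std::string field;
    ss >> field;
    return field;
}

// Hypothetical helper: read all lines and reorder them so that lines
// with an equal first field are adjacent. The whole line is preserved.
std::vector<std::string> sort_by_first_field(std::istream& in) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line); )
        lines.push_back(line);

    // stable_sort keeps lines with the same key in input order.
    std::stable_sort(lines.begin(), lines.end(),
        [](const std::string& a, const std::string& b) {
            return first_field(a) < first_field(b);
        });
    return lines;
}
```

Passing a `std::ifstream` opened on the log file and writing the returned lines to a `std::ofstream` would give the grouped file. Note that the comparison is a plain string comparison, so "105.216..." sorts before "85...". That is enough for grouping, but it is not a numeric ordering of the addresses.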

There are 2 answers

Ted Lyngmo (best answer)

I'm not sure I really understood the log format, but I guess you can adapt this to fit your needs.

This assumes a line-based log format where each line starts with the key that you want to group on (the IP address, for example). It uses a std::unordered_map, whose iteration order is unspecified, but you can try a normal std::map too, which iterates its keys in sorted order. The key in the map is the IP address, and the rest of each line is put in a vector of strings.

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <unordered_map>

// alias for the map
using logmap = std::unordered_map<std::string, std::vector<std::string>>;

logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while(std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here 
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}

int main() {
    logmap lm = readlog(std::cin);
    for(const auto& m : lm) {
        std::cout << m.first << "\n";
        for(const auto& l : m.second) {
            std::cout << " " << l << "\n";
        }
    }
}

Given this input:

127.0.0.1 first ip first line
192.168.0.1 first line of second ip
127.0.0.1 this is the second for the first ip
192.168.0.1 second line of second ip
127.0.0.1 and here's the third for the first
192.168.0.1 third line of second ip

This is a possible output:

192.168.0.1
 first line of second ip
 second line of second ip
 third line of second ip
127.0.0.1
 first ip first line
 this is the second for the first ip
 and here's the third for the first
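The order of the groups above is not guaranteed, because `std::unordered_map` makes no promise about iteration order. If a predictable order is wanted, the same function could be written with `std::map`, as mentioned earlier. The following is a sketch under that assumption; the names `sortedlog` and `readlog_sorted` are invented here and are not part of the code above.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Same shape as logmap, but std::map iterates its keys in sorted
// (lexicographic string) order. Hypothetical alias name.
using sortedlog = std::map<std::string, std::vector<std::string>>;

// Hypothetical variant of readlog that returns the ordered map.
sortedlog readlog_sorted(std::istream& is) {
    sortedlog rv;
    std::string line;
    while (std::getline(is, line)) {
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;     // first field is the key
        std::getline(ss, rest);  // remainder of the line
        rv[ip].push_back(rest);
    }
    return rv;
}
```

With the sample input above, iterating the result would visit 127.0.0.1 before 192.168.0.1 on every run. The keys are compared as strings, so this is alphabetical order rather than numeric IP order.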

Miki

Thank you for the post and the code; it was helpful and I learned new things. You're right that my description of what I wanted was kind of strange, so I took the liberty of altering your code for my needs. For people looking for this kind of web log processing, I will share the code.

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
#include <unordered_map>

using namespace std;

using logmap = std::unordered_map<std::string, std::vector<std::string>>;

logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while (std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}

int main() {
    ifstream infile("C:\\ips.txt");
    ofstream myfile;
    myfile.open("C:\\ipout.txt");
    long nr = 0;  // running group number, one per distinct IP

    logmap lm = readlog(infile);
    for (const auto& m : lm) {
        nr++;
        // write every line of this IP with the same group number
        for (const auto& l : m.second) {
            myfile << nr << " " << m.first << " " << l << "\n";
        }
    }
    infile.close();
    myfile.close();
    std::cout << "Enter ! \n";
    std::cin.get();

    return 0;
}

Input (ips.txt) - web log file:

1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,
5.6.7.8     qqqqqqqq    code,code,code,code,code,code,code,code,tygy
9.10.11.12  qqqqqqqq    all
1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,6fg
3.6.7.2     qqqqqqqq    GET" line code,
5.6.7.8     qqqqqqqq    code,code,code,code,code,code,code,code,s5
1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,
9.10.11.12  qqqqqqqq    all

Output of code (ipout.txt) :

1 5.6.7.8 qqqqqqqq  code,code,code,code,code,code,code,code,tygy
1 5.6.7.8 qqqqqqqq  code,code,code,code,code,code,code,code,s5
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,6fg
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,
3 9.10.11.12 qqqqqqqq   all
3 9.10.11.12 qqqqqqqq   all
4 3.6.7.2 qqqqqqqq  GET" line code,

And the first code, from my question, can help you delete unwanted lines.

So once more, thank you, my hero >> Ted Lyngmo <<, live long and prosper :-).