Load a log file with multiple JSON datasets into MongoDB


Warning - I'm new to MongoDB and JSON.

I have a log file that contains JSON records. Because it captures clickstream data, a single file holds records in several different JSON formats. Here is an example of one log file.

[
        {  
           "username":"",
           "event_source":"server",
           "name":"course.activated",
           "accept_language":"",
           "time":"2016-10-12T01:02:07.443767+00:00",
           "agent":"python-requests/2.9.1",
           "page":null,
           "host":"courses.org",
           "session":"",
           "referer":"",
           "context":{  
              "user_id":null,
              "org_id":"X",
              "course_id":"3T2016",
              "path":"/api/enrollment"
           },
           "ip":"160.0.0.1",
           "event":{  
              "course_id":"3T2016",
              "user_id":11,
              "mode":"audit"
           },
           "event_type":"activated"
        },
        {  
           "username":"VTG",
           "event_type":"/api/courses/3T2016/",
           "ip":"161.0.0.1",
           "agent":"Mozilla/5.0",
           "host":"courses.org",
           "referer":"http://courses.org/16773734",
           "accept_language":"en-AU,en;q=0.8,en-US;q=0.6,en;q=0.4",
           "event":"{\"POST\": {}, \"GET\": {}}",
           "event_source":"server",
           "context":{  
              "course_user_tags":{  

              },
              "user_id":122,
              "org_id":"X",
              "course_id":"3T2016",
              "path":"/api/courses/3T2016/"
           },
           "time":"2016-10-12T00:51:57.756468+00:00",
           "page":null
        }
    ]

Now I want to store this data in MongoDB. So here are my novice questions:

  • Do I need to parse the file and split it into two datasets before storing it in MongoDB? If yes, is there a simple program to do this, given that my file has multiple dataset formats?
  • Is there some magic in MongoDB that can split the various datasets when we upload the file?

There is 1 answer

karthi On

First of all, make sure your JSON is valid; it should be formatted like the example I have cited below. Once the JSON is valid, you can load it into the database. Note that mongorestore only reads BSON dumps produced by mongodump, so for a JSON file use mongoimport with the --jsonArray option, which treats the file as a single JSON array of documents:

mongoimport --host hostname --port 27017 --db <database_name> --collection <collection_name> --file pathtojsonfile --jsonArray

For more information, refer to https://docs.mongodb.com/manual/reference/program/mongoimport/
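If you want to check that the file is valid before importing it, a small script can do that. This is only a minimal sketch using Python's standard json module; the function name load_events is hypothetical and not part of any MongoDB tool.

```python
import json


def load_events(text):
    """Parse a JSON array of event records.

    Raises ValueError with the line and column of the first syntax error,
    or if the top-level value is not an array.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of records")
    return data
```

You would call it as load_events(open("pathtojsonfile").read()); if it returns without raising, the file is a syntactically valid JSON array.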

Formatted JSON

[
        {  
           "username":"",
           "event_source":"server",
           "name":"course.activated",
           "accept_language":"",
           "time":"2016-10-12T01:02:07.443767+00:00",
           "agent":"python-requests/2.9.1",
           "page":null,
           "host":"courses.org",
           "session":"",
           "referer":"",
           "context":{  
              "user_id":null,
              "org_id":"X",
              "course_id":"3T2016",
              "path":"/api/enrollment"
           },
           "ip":"160.0.0.1",
           "event":{  
              "course_id":"3T2016",
              "user_id":11,
              "mode":"audit"
           },
           "event_type":"activated"
        },
        {  
           "username":"VTG",
           "event_type":"/api/courses/3T2016/",
           "ip":"161.0.0.1",
           "agent":"Mozilla/5.0",
           "host":"courses.org",
           "referer":"http://courses.org/16773734",
           "accept_language":"en-AU,en;q=0.8,en-US;q=0.6,en;q=0.4",
           "event":"{\"POST\": {}, \"GET\": {}}",
           "event_source":"server",
           "context":{  
              "course_user_tags":{  

              },
              "user_id":122,
              "org_id":"X",
              "course_id":"3T2016",
              "path":"/api/courses/3T2016/"
           },
           "time":"2016-10-12T00:51:57.756468+00:00",
           "page":null
        }
    ]
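
On the question of splitting: a MongoDB collection does not enforce a schema by default, so records with different fields can be stored side by side in one collection, and splitting is optional. If you still want to see which formats the file contains, or to insert from a script, here is a rough sketch. It assumes the pymongo driver is installed for the insert step; the function names, the database name "logs" and the collection name "events" are all made up for illustration.

```python
import json
from collections import defaultdict


def group_by_shape(records):
    """Group records by their sorted top-level key names."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(sorted(rec))].append(rec)
    return dict(groups)


def decode_embedded_event(record):
    """Return a copy in which a string-valued "event" is parsed as JSON if possible."""
    out = dict(record)
    event = out.get("event")
    if isinstance(event, str):
        try:
            out["event"] = json.loads(event)
        except json.JSONDecodeError:
            pass  # not JSON, keep the original string
    return out


def insert_events(records, uri="mongodb://localhost:27017",
                  db="logs", coll="events"):
    """Insert all records into one collection (names here are assumptions)."""
    from pymongo import MongoClient  # imported lazily; needs pymongo installed

    client = MongoClient(uri)
    docs = [decode_embedded_event(r) for r in records]
    result = client[db][coll].insert_many(docs)  # docs must be non-empty
    return len(result.inserted_ids)
```

With the two records above, group_by_shape returns two groups, because the first record has "name" and "session" fields that the second lacks. The decode step is optional: in the second record, "event" is a JSON string rather than an object, and parsing it makes it queryable as a subdocument.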