Turn a json list to a data frame

258 views Asked by At

I have a json list that its saved as text, and i'm trying to convert it into a data frame.List looks like that:

`{"posts": {
        "data": [
          {
            "comments": {
              "data": [
                {
                  "created_time": "2020-01-25T16:48:03+0000",
                  "message": "I love all kind of art and paintings. However 19 thousand dollars for a painting is entirely too much!!. ",
                  "id": "1579832716452874_1373756966159579"
                },
                {
                  "created_time": "2020-01-25T15:21:29+0000",
                  "message": "The wind blows a piece of paper and lands next to your house. You unravel it, \"Hey hope you are having a nice day!\"...",
                  "id": "1579832716452874_1373704542831488"
                }
              ]
            }
          }
        ]
      }
    }`

and so on. I would like for my data frame to be split in the following columns:

`created_time|message|id`

along with their respective data. I have tried the following command but with no success as i'm getting the same exact output:

` df <- data.frame(matrix(unlist(data), ncol=length(data), byrow = FALSE))`

Also because the data are saved as text, the json packages (rjson,jsonlite) won't work. Any suggestions will be greatly appreciated.

1

There are 1 answers

0
Brian On BEST ANSWER

The issue you seemed to be having was with the quotation marks. If you have your JSON object saved as a text file, {jsonlite} will automatically escape those characters when reading the file. The defaults for fromJSON flatten lists into dataframes when possible, which is what you wanted.

x <- jsonlite::fromJSON("complex_json.json")

x_df <- x$posts$data[1]$comments$data[[1]]

tibble::as_tibble(x_df)  # tibble is for pretty-printing purposes only
# A tibble: 2 x 3
  created_time        message                                                          id                   
  <chr>               <chr>                                                            <chr>                
1 2020-01-25T16:48:0~ "I love all kind of art and paintings. However 19 thousand doll~ 1579832716452874_137~
2 2020-01-25T15:21:2~ "The wind blows a piece of paper and lands next to your house. ~ 1579832716452874_137~

The only tricky bit is figuring out the x$posts$data[1]$comments$data[[1]]. I always do that interactively at the console, occasionally checking str() of the resulting object to see where there are hidden levels of nesting that aren't printing clearly.