So this should be a relatively easy question on pulling items from a list into a data frame, but I'm stuck on something.
I have the following list (I'm showing just part of it; it's far longer than this):
str(raw_jobs_list)
List of 2
$ :List of 4
..$ id : chr "3594134"
..$ score : int 1
..$ fields:List of 16
.. ..$ date :List of 3
.. .. ..$ changed: chr "2020-04-18T00:35:00+00:00"
.. .. ..$ created: chr "2020-04-07T11:15:37+00:00"
.. .. ..$ closing: chr "2020-04-17T00:00:00+00:00"
.. ..$ country :List of 1
.. .. ..$ :List of 6
.. .. .. ..$ href : chr "https://api.reliefweb.int/v1/countries/149"
.. .. .. ..$ name : chr "Mali"
.. .. .. ..$ location :List of 2
.. .. .. .. ..$ lon: num -1.25
.. .. .. .. ..$ lat: num 17.4
.. .. .. ..$ id : int 149
.. .. .. ..$ shortname: chr "Mali"
.. .. .. ..$ iso3 : chr "mli"
.. ..$ title : chr "REGIONAL MANAGER West Africa"
I tried pulling them out using:
jobs_data_df <- list.stack(list.select(raw_jobs_list,
                                       fields$title,
                                       fields$country$name,
                                       fields$date$created))
Where raw_jobs_list is the list, but I get these NAs and am not sure how to get past it.
glimpse(jobs_data_df)
Rows: 2
Columns: 3
$ V1 <chr> "REGIONAL MANAGER West Africa", "Support Relief Group Public Health Advisor (Multiple Positions)"
$ V2 <lgl> NA, NA
$ V3 <chr> "2020-04-07T11:15:37+00:00", "2020-05-04T15:20:37+00:00"
It's possible there's something obvious I'm overlooking as I haven't worked much with lists before. Any ideas?
Thanks so much! C
PS. If you're interested, I'm working with this API and this is how I got there so far.
jobs <- GET(url = "https://api.reliefweb.int/v1/jobs?appname=apidoc&preset=analysis&profile=full&limit=2")
raw_jobs_list <- content(jobs)$data
The portion displayed above is a subset of the whole data; here is an abbreviated version of both elements of the list:
dput(lapply(raw_jobs_list, function(x) c(x[c("id","score")], list(fields=x[[3]][intersect(names(x[[3]]),c("date","country","title"))]))))
list(list(id = "3594134", score = 1L, fields = list(date = list(
changed = "2020-04-18T00:35:00+00:00", created = "2020-04-07T11:15:37+00:00",
closing = "2020-04-17T00:00:00+00:00"), country = list(list(
href = "https://api.reliefweb.int/v1/countries/149", name = "Mali",
location = list(lon = -1.25, lat = 17.35), id = 149L, shortname = "Mali",
iso3 = "mli")), title = "REGIONAL MANAGER West Africa")),
list(id = "3594129", score = 1L, fields = list(date = list(
changed = "2020-05-19T00:04:01+00:00", created = "2020-05-04T15:20:37+00:00",
closing = "2020-05-18T00:00:00+00:00"), title = "Support Relief Group Public Health Advisor (Multiple Positions)")))
If you look at just one element at a time, I think that as.data.frame does a pretty decent job. I'll demonstrate using the abbreviated data (that I edited into your question), starting with just the first element.
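As a minimal sketch of that single-element conversion: the data is the abbreviated first element from the dput output in the question, and first_job / first_df are just illustrative names.

```r
# Abbreviated first element, copied from the dput() output in the question.
first_job <- list(
  id = "3594134", score = 1L,
  fields = list(
    date = list(changed = "2020-04-18T00:35:00+00:00",
                created = "2020-04-07T11:15:37+00:00",
                closing = "2020-04-17T00:00:00+00:00"),
    country = list(list(
      href = "https://api.reliefweb.int/v1/countries/149", name = "Mali",
      location = list(lon = -1.25, lat = 17.35), id = 149L,
      shortname = "Mali", iso3 = "mli")),
    title = "REGIONAL MANAGER West Africa"))

# as.data.frame() flattens the nested list into a one-row data.frame,
# joining nested names with "." (e.g., fields.date.changed).
first_df <- as.data.frame(first_job, stringsAsFactors = FALSE)

# str() gives a compact, one-column-per-line view of the same result.
str(first_df)
```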
In order to do this on all elements, we need to account for a few things:
Here's a first stab:
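One way such a first stab could look, assuming dplyr is installed: convert each element with as.data.frame and let dplyr::bind_rows fill any column that an element lacks with NA. The data is the abbreviated dput from the question; jobs_df is an illustrative name.

```r
# Abbreviated two-element list, copied from the dput() output in the question.
# Note that the second element has no "country" field.
raw_jobs_list <- list(
  list(id = "3594134", score = 1L, fields = list(
    date = list(changed = "2020-04-18T00:35:00+00:00",
                created = "2020-04-07T11:15:37+00:00",
                closing = "2020-04-17T00:00:00+00:00"),
    country = list(list(
      href = "https://api.reliefweb.int/v1/countries/149", name = "Mali",
      location = list(lon = -1.25, lat = 17.35), id = 149L,
      shortname = "Mali", iso3 = "mli")),
    title = "REGIONAL MANAGER West Africa")),
  list(id = "3594129", score = 1L, fields = list(
    date = list(changed = "2020-05-19T00:04:01+00:00",
                created = "2020-05-04T15:20:37+00:00",
                closing = "2020-05-18T00:00:00+00:00"),
    title = "Support Relief Group Public Health Advisor (Multiple Positions)")))

# Convert each element to a one-row data.frame, then stack the rows.
# bind_rows() matches columns by name and fills missing ones with NA.
per_job <- lapply(raw_jobs_list, as.data.frame, stringsAsFactors = FALSE)
jobs_df <- dplyr::bind_rows(per_job)

str(jobs_df)
```

With this data, the country columns of the second row should come out as NA, since that element has no country field.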
This also works with data.table::rbindlist. It does not work as well with do.call(rbind.data.frame, ...), since that is less tolerant of missing names. (Though it can be done without too much trouble, there are occasionally other advantages to using these two options.)
Note: if you do this on the original data, R's default mechanism of displaying a data.frame will cramp your console with all of the text, which might be annoying. If you are already using dplyr or data.table in any of your work, both of those formats provide string-limiting, so that it is more tolerable on the console.
For data.table, I already have some options set that facilitate this.
At this point, you have a data.frame that should contain all of the data (and NA for elements with missing fields). From here, if you don't like the nested-names convention (e.g., fields.date.changed), then the columns can be easily renamed using patterns or conventional methods.
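As a small sketch of pattern-based renaming in base R: the data.frame below is a hand-built stand-in with a few of the flattened column names, and the target naming scheme (drop the fields. prefix, use underscores) is only an assumption about what you might prefer.

```r
# Hypothetical stand-in for the flattened result, with nested-style names.
jobs_df <- data.frame(
  id = c("3594134", "3594129"),
  fields.title = c("REGIONAL MANAGER West Africa",
                   "Support Relief Group Public Health Advisor (Multiple Positions)"),
  fields.date.created = c("2020-04-07T11:15:37+00:00", "2020-05-04T15:20:37+00:00"),
  fields.country.name = c("Mali", NA),
  stringsAsFactors = FALSE)

# Drop the leading "fields." prefix, then turn remaining dots into underscores.
new_names <- sub("^fields\\.", "", names(jobs_df))
new_names <- gsub(".", "_", new_names, fixed = TRUE)
names(jobs_df) <- new_names

names(jobs_df)
```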