The JSON file I need to work with imports into a dataframe whose cells contain nested lists. Before conversion to a dataframe it is a list of nested dicts; the file itself is nested.
Sample JSON:
{
  "State": [
    {
      "ts": "2018-04-11T21:37:05.401Z",
      "sensor": [
        "accBodyX_ftPerSec2"
      ],
      "value": null
    },
    {
      "ts": "2018-04-11T21:37:05.901Z",
      "sensor": [
        "accBodyX_ftPerSec2"
      ],
      "value": [
        -3.38919
      ]
    },
    {
      "ts": "2018-04-11T21:37:05.901Z",
      "sensor": [
        "accBodyY_ftPerSec2"
      ],
      "value": [
        -2.004781
      ]
    },
    {
      "ts": "2018-04-11T21:37:05.901Z",
      "sensor": [
        "accBodyZ_ftPerSec2"
      ],
      "value": [
        -34.77694
      ]
    }
  ]
}
The dataframe looks like:
                 sensor                        ts        value
0  [accBodyX_ftPerSec2]  2018-04-11T21:37:05.901Z   [-3.38919]
1  [accBodyY_ftPerSec2]  2018-04-11T21:37:05.901Z  [-2.004781]
2  [accBodyZ_ftPerSec2]  2018-04-11T21:37:05.901Z  [-34.77694]
Ultimately, I'd like to remove the nesting or find a way to work with it. The goal is to extract the values for a given sensor name, with their accompanying timestamps, into another dataframe for processing/plotting, something like this:
                         ts     value
0  2018-04-11T21:37:05.901Z  -3.38919
1  2018-04-11T21:37:06.401Z  -3.00241
2  2018-04-11T21:37:06.901Z  -3.87694
To remove the nesting I've done the following. It is slow on just 100,000 rows, though thankfully much faster than a for loop (made possible thanks to this post: python pandas operations on columns).
def func(row):
    row.sensor = row.sensor[0]
    if type(row.value) is list:
        row.value = row.value[0]
    return row

df.apply(func, axis=1)
For working with the nesting I'm able to extract individual values. For example this:
print( df.iloc[:,2].iloc[1][0] )
-2.004781
However, trying to return a list of values from index 0 of each list within each row results in returning just the first value:
print( df.iloc[:,2].iloc[:][0] )
-3.38919
Of course I could do this with a for loop but I know there's a way to do it with Pandas functions that I'm not able to discover yet.
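One vectorized possibility, offered as a sketch rather than a definitive answer: the `.str` accessor can index into each element of an object column, so `.str[0]` pulls the first item out of every list. In the attempt above, `.iloc[:]` hands back the whole column, so the trailing `[0]` selects row 0 instead of indexing into each list. The small frame below is a stand-in built from the sample data, and the names `flat` and `acc_x` are illustrative only.

```python
import pandas as pd

# Stand-in for the nested dataframe shown in the question.
df = pd.DataFrame({
    "sensor": [["accBodyX_ftPerSec2"], ["accBodyY_ftPerSec2"], ["accBodyZ_ftPerSec2"]],
    "ts": ["2018-04-11T21:37:05.901Z"] * 3,
    "value": [[-3.38919], [-2.004781], [-34.77694]],
})

# .str[0] indexes into each list element-wise; missing cells come back as NaN.
flat = df.assign(sensor=df["sensor"].str[0], value=df["value"].str[0])

# Timestamp/value pairs for a single sensor.
acc_x = (
    flat.loc[flat["sensor"] == "accBodyX_ftPerSec2", ["ts", "value"]]
    .reset_index(drop=True)
)
print(acc_x)
```

This avoids the row-wise `apply`, which is usually the slow part on larger frames.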
You may need to just do some manual cleaning-up before reading into a DataFrame:
This loads the JSON file into a Python list of dictionaries, converts any length-1 lists into scalar values, and then loads that result into a DataFrame. That is admittedly not the most efficient means, but your other option of parsing the JSON itself is probably overkill unless the file is massive.
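The cleanup described above might be sketched as follows. This is a reconstruction under assumptions, not necessarily the original code: the JSON is inlined as a string instead of being read from a file, and the helper name `unwrap` is hypothetical.

```python
import json

import pandas as pd

# Inline stand-in for the file contents; with a real file this would be
# json.load() on the open file handle instead.
raw = json.loads("""
{"State": [
  {"ts": "2018-04-11T21:37:05.401Z", "sensor": ["accBodyX_ftPerSec2"], "value": null},
  {"ts": "2018-04-11T21:37:05.901Z", "sensor": ["accBodyX_ftPerSec2"], "value": [-3.38919]},
  {"ts": "2018-04-11T21:37:05.901Z", "sensor": ["accBodyY_ftPerSec2"], "value": [-2.004781]}
]}
""")


def unwrap(value):
    """Return the sole element of a length-1 list; leave anything else alone."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


# Flatten every record before handing the list of dicts to pandas.
records = [{key: unwrap(val) for key, val in rec.items()} for rec in raw["State"]]
df = pd.DataFrame(records)
print(df)
```

Because the null entry is left untouched, it ends up as a missing value in the `value` column.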
Finally, to convert to datetime:
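A minimal sketch of that conversion, assuming the column is named `ts` as in the sample and using a tiny stand-in frame. `pd.to_datetime` parses ISO 8601 strings like these directly; with the trailing `Z` the result should be timezone-aware (UTC).

```python
import pandas as pd

# Stand-in frame with the timestamp strings from the sample data.
df = pd.DataFrame({
    "ts": ["2018-04-11T21:37:05.401Z", "2018-04-11T21:37:05.901Z"],
    "value": [None, -3.38919],
})

# Parse the ISO 8601 strings into real timestamps.
df["ts"] = pd.to_datetime(df["ts"])
print(df.dtypes)
```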
You may also want to consider converting sensor to a categorical data type to save a possibly significant amount of memory:
In explicit-loop form, this would look like:
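As a hedged sketch (the variable names `state`, `records`, and `cleaned` are illustrative, and the records are inlined instead of loaded from the file), the explicit-loop version of the length-1 list cleanup could be written like this, with the categorical conversion of `sensor` applied at the end:

```python
import pandas as pd

# Already-parsed records, standing in for the "State" list from the JSON file.
state = [
    {"ts": "2018-04-11T21:37:05.401Z", "sensor": ["accBodyX_ftPerSec2"], "value": None},
    {"ts": "2018-04-11T21:37:05.901Z", "sensor": ["accBodyX_ftPerSec2"], "value": [-3.38919]},
    {"ts": "2018-04-11T21:37:05.901Z", "sensor": ["accBodyY_ftPerSec2"], "value": [-2.004781]},
]

records = []
for rec in state:
    cleaned = {}
    for key, val in rec.items():
        # Replace a length-1 list with its only element; keep everything else.
        if isinstance(val, list) and len(val) == 1:
            val = val[0]
        cleaned[key] = val
    records.append(cleaned)

df = pd.DataFrame(records)

# Repeated sensor names are stored once each when the column is categorical.
df["sensor"] = df["sensor"].astype("category")
print(df.dtypes)
```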