I was loading my data into pandas in JupyterLab (running under Anaconda3) when my VM suddenly went down. After it came back up, my code no longer works. Here is my code:
awsc = pd.DataFrame()
json_pattern = os.path.join('logs_old/AWSCloudtrailLog/','*')
file_list = glob.glob(json_pattern)
for file in file_list:
    data = pd.read_json(file, lines=True)
    awsc = awsc.append(data, ignore_index = True)
awsc = pd.concat([awsc, pd.json_normalize(awsc['userIdentity'])], axis=1).drop('userIdentity', 1)
awsc.rename(columns={'type':'userIdentity_type',
                     'principalId':'userIdentity_principalId',
                     'arn':'userIdentity_arn',
                     'accountId':'userIdentity_accountId',
                     'accessKeyId':'userIdentity_accessKeyId',
                     'userName':'userIdentity_userName',}, inplace=True)
When I run the code, it gives me a KeyError like this:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2888 try:
-> 2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'userIdentity'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-9-efd1d2e600a5> in <module>
1 # unpack nested json
2
----> 3 awsc = pd.concat([awsc, pd.json_normalize(awsc['userIdentity'])], axis=1).drop('userIdentity', 1)
4 awsc.rename(columns={'type':'userIdentity_type',
5 'principalId':'userIdentity_principalId',
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
-> 2891 raise KeyError(key) from err
2892
2893 if tolerance is not None:
KeyError: 'userIdentity'
The output for the dataframe awsc when I run print(awsc.info()):
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrameNone
Is there a solution for this issue? Does the problem come from pandas or Anaconda?
Using Code from OP
awsc is empty, which is why selecting 'userIdentity' raises the KeyError.
pd.read_json(file, lines=True) is the correct method to use.
pd.json_normalize(awsc['userIdentity']) will work on a column of dicts. It's more than likely the column is strings though.
If the dicts are str type, use ast.literal_eval to convert them to dict type.
New Code with Sample Data
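To illustrate the points about the empty dataframe and string-typed dicts, here is a minimal sketch. The helper name expand_user_identity and the two rows of data are hypothetical and only stand in for real CloudTrail records. It checks that the frame actually has data before touching 'userIdentity', converts string cells with ast.literal_eval, and then normalizes the column.

```python
import ast

import pandas as pd


def expand_user_identity(df):
    """Flatten the 'userIdentity' column into userIdentity_* columns."""
    # An empty frame has no 'userIdentity' column, which is what raises the KeyError.
    if df.empty or "userIdentity" not in df.columns:
        raise ValueError("no rows were loaded; check the path given to glob.glob")

    # Convert string cells to dicts; cells that are already dicts pass through.
    as_dicts = df["userIdentity"].apply(
        lambda v: ast.literal_eval(v) if isinstance(v, str) else v
    )

    identity = pd.json_normalize(as_dicts.tolist()).add_prefix("userIdentity_")
    identity.index = df.index  # keep rows aligned for the concat
    return pd.concat([df.drop(columns="userIdentity"), identity], axis=1)


# Hypothetical sample rows (values are made up for illustration).
awsc = pd.DataFrame({
    "eventName": ["ConsoleLogin", "ListBuckets"],
    "userIdentity": [
        "{'type': 'IAMUser', 'userName': 'alice'}",
        "{'type': 'Root', 'userName': 'bob'}",
    ],
})

result = expand_user_identity(awsc)
print(result.columns.tolist())
```

Because add_prefix names the new columns directly, the separate rename step from the question would not be needed in this sketch.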
Using .json_normalize to read the logs normalizes 'userIdentity', so a second step is not required.
Sample Data
test.json
test2.json
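The single-step approach with .json_normalize could look roughly like the sketch below. It assumes each log file holds one JSON object per line, as lines=True in the question implies. The function name load_logs, the temporary directory, and the sample records are hypothetical and are not the contents of the test files named above.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_logs(log_dir):
    """Read every file in log_dir as JSON lines and flatten nested objects."""
    files = sorted(p for p in Path(log_dir).glob("*") if p.is_file())
    if not files:
        # An empty match is what leaves the dataframe empty in the question.
        raise FileNotFoundError(f"no log files found in {log_dir}")

    records = []
    for path in files:
        with path.open() as fh:
            records.extend(json.loads(line) for line in fh if line.strip())

    # sep="_" names nested keys userIdentity_type rather than userIdentity.type
    return pd.json_normalize(records, sep="_")


# Hypothetical sample records written to a temporary directory.
sample = [
    {"eventName": "ConsoleLogin",
     "userIdentity": {"type": "IAMUser", "userName": "alice"}},
    {"eventName": "ListBuckets",
     "userIdentity": {"type": "Root", "accountId": "000000000000"}},
]

with tempfile.TemporaryDirectory() as tmp:
    for i, rec in enumerate(sample):
        Path(tmp, f"sample{i}.json").write_text(json.dumps(rec) + "\n")
    logs = load_logs(tmp)

print(logs.columns.tolist())
```

Raising an error when no files match makes a wrong or relative path visible immediately, instead of surfacing later as a KeyError on an empty dataframe.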