I'm using Python 3, numpy and pandas. I am struggling to unwrap an array as new columns in a dataframe. I am pulling data from the Zendesk Ticketing API, and reading it as csv in Colab. Each row of the dataset is data related to one ticket. Now one of the colum contains for each row a 1D Array, which stores a list of dicts. For that column, the size of the list contained within the array can have a different length. When looking at the column in the dataframe it is like so :
Reason to this is each ticket can have different fields filled out, so the length of the list matches that. Each dict within the list has 2 entries, one being the field_id, and the other being the value that was filled out in that field, like so :
{'id': 360021329158, 'value': None}
Now if I display the value of a single cell (so the data of 1 ticket) it is like below; In the list within the array, we see all the values filled out for each field for that ticket :
array([list([{'id': 27537689, 'value': None}, {'id': 27537849, 'value': None}, {'id': 44603405, 'value': None}, {'id': 45292929, 'value': None}, {'id': 45324385, 'value': None}, {'id': 44586449, 'value': None}, {'id': 44586469, 'value': None}, {'id': 360021514497, 'value': None}, {'id': 360021522437, 'value': None}, {'id': 360022367238, 'value': None}, {'id': 360021325318, 'value': None}, {'id': 360021327878, 'value': None}, {'id': 360021329158, 'value': None}, {'id': 360021516297, 'value': None}, {'id': 360021522697, 'value': None}, {'id': 360006385678, 'value': None}, {'id': 360031461137, 'value': None}, {'id': 360021515537, 'value': None}, {'id': 4419462003217, 'value': None}, {'id': 360008990738, 'value': None}, {'id': 360009218578, 'value': None}, {'id': 360021521177, 'value': None}, {'id': 360021522457, 'value': None}, {'id': 360021325338, 'value': None}, {'id': 360021327898, 'value': None}, {'id': 360021329178, 'value': None}, {'id': 14385177693981, 'value': None}, {'id': 360021516317, 'value': None}, {'id': 6236236637981, 'value': None}, {'id': 360021518877, 'value': None}, {'id': 360021522717, 'value': None}, {'id': 360013744157, 'value': None}, {'id': 360009223197, 'value': None}, {'id': 9501450359069, 'value': None}, {'id': 6636845193501, 'value': None}, {'id': 360021481757, 'value': None}, {'id': 9693901745693, 'value': None}, {'id': 360021521697, 'value': None}, {'id': 360021483297, 'value': None}, {'id': 360014826018, 'value': None}, {'id': 360008990758, 'value': None}, {'id': 360021323818, 'value': None}, {'id': 360021327918, 'value': None}, {'id': 360021522737, 'value': None}, {'id': 360021521717, 'value': None}, {'id': 360013653558, 'value': None}, {'id': 360006386998, 'value': None}, {'id': 360021333558, 'value': None}, {'id': 45541945, 'value': None}, {'id': 360008990778, 'value': None}, {'id': 360010961978, 'value': None}, {'id': 360006385978, 'value': None}, {'id': 360021323838, 'value': None}, {'id': 360009329477, 'value': None}, {'id': 360021522757, 'value': None}, {'id': 360013744197, 'value': None}, {'id': 360021521737, 'value': None}, {'id': 360008990538, 'value': None}, {'id': 360013653578, 'value': None}, {'id': 360021328458, 'value': None}, {'id': 360022279757, 'value': None}, {'id': 360006385998, 'value': None}, {'id': 360021522257, 'value': None}, {'id': 360023554897, 'value': None}, {'id': 360021328978, 'value': None}, {'id': 360008998997, 'value': None}, {'id': 360022104918, 'value': None}, {'id': 360021522777, 'value': None}, {'id': 360000189018, 'value': None}, {'id': 360009224797, 'value': None}, {'id': 360021328478, 'value': None}, {'id': 360022067038, 'value': None}, {'id': 360021516897, 'value': None}, {'id': 360031475557, 'value': None}, {'id': 360008991078, 'value': None}, {'id': 360021326438, 'value': None}, {'id': 360021519977, 'value': None}, {'id': 360009223017, 'value': None}, {'id': 360021521517, 'value': None}, {'id': 360021522797, 'value': None}, {'id': 360023397998, 'value': None}, {'id': 360021325678, 'value': None}, {'id': 360000189038, 'value': None}, {'id': 360011006577, 'value': None}, {'id': 360021521777, 'value': None}, {'id': 360021327218, 'value': None}, {'id': 360021516917, 'value': None}, {'id': 360006419577, 'value': None}, {'id': 360021521017, 'value': None}, {'id': 360009169017, 'value': None}, {'id': 360008991098, 'value': None}, {'id': 360021326458, 'value': None}, {'id': 360021519997, 'value': None}, {'id': 360021326718, 'value': None}, {'id': 360021328258, 'value': None}, {'id': 360010262658, 'value': None}, {'id': 360009218178, 'value': None}, {'id': 360000189058, 'value': None}, {'id': 360021521797, 'value': None}, {'id': 360000186757, 'value': None}, {'id': 360023400838, 'value': None}, {'id': 360006419597, 'value': None}, {'id': 360021326478, 'value': None}, {'id': 360022280337, 'value': None}, {'id': 360021326738, 'value': None}, {'id': 360023572117, 'value': None}, {'id': 360021328278, 'value': None}, {'id': 360009218198, 'value': None}, {'id': 360000186777, 'value': None}, {'id': 11450090522269, 'value': None}, {'id': 12477907213725, 'value': None}, {'id': 6236409964189, 'value': None}, {'id': 12309478391965, 'value': None}, {'id': 9498430167965, 'value': None}, {'id': 7859465116829, 'value': None}, {'id': 7859446135453, 'value': None}, {'id': 14855067062429, 'value': None}, {'id': 360021328798, 'value': None}, {'id': 360008998817, 'value': None}, {'id': 360021326498, 'value': None}, {'id': 360021289378, 'value': None}, {'id': 360021518757, 'value': None}, {'id': 45511589, 'value': None}, {'id': 360014913957, 'value': None}, {'id': 360023572137, 'value': None}, {'id': 45542825, 'value': None}, {'id': 360009168297, 'value': None}, {'id': 360021328298, 'value': None}, {'id': 360021324718, 'value': None}, {'id': 360021327278, 'value': None}, {'id': 360021522097, 'value': None}, {'id': 360021328818, 'value': None}, {'id': 360008998837, 'value': None}, {'id': 45511349, 'value': None}, {'id': 360021325238, 'value': None}, {'id': 360021518777, 'value': None}, {'id': 360021521337, 'value': None}, {'id': 360021329338, 'value': None}, {'id': 360021520317, 'value': None}, {'id': 360009223357, 'value': None}, {'id': 360021324738, 'value': None}, {'id': 360021520837, 'value': None}, {'id': 360021325258, 'value': None}, {'id': 360021518797, 'value': None}, {'id': 360021521357, 'value': None}, {'id': 360021522637, 'value': None}, {'id': 360023571917, 'value': None}, {'id': 360021328078, 'value': None}, {'id': 360021519057, 'value': None}, {'id': 360021526737, 'value': None}, {'id': 360023572177, 'value': None}, {'id': 360021324758, 'value': None}, {'id': 360021520857, 'value': None}, {'id': 360023441117, 'value': None}, {'id': 360021325278, 'value': None}, {'id': 360000016865, 'value': None}, {'id': 360021522657, 'value': None}, {'id': 360023571937, 'value': None}, {'id': 360011002337, 'value': None}, {'id': 360021329378, 'value': None}, {'id': 360021515497, 'value': None}, {'id': 360013653738, 'value': None}, {'id': 360022256877, 'value': None}, {'id': 360021522417, 'value': None}, {'id': 360021325298, 'value': None}, {'id': 360021329138, 'value': None}, {'id': 45511669, 'value': None}, {'id': 360021526777, 'value': None}, {'id': 360006385658, 'value': None}, {'id': 360021515517, 'value': None}, {'id': 360008990718, 'value': None}, {'id': 360013653758, 'value': None}, {'id': 360021328638, 'value': None}])],
dtype=object)
I have a dataframe length of 100k + values (matching 100k+ tickets) , and for that column containing one array per row, I'd like to unwrap the content of the dicts as new columns in the dataframe, using each dict 1st value as column name, and each dict 2ndvalue as cell content, like so :
So I can have 1 new column per field_id, and what was filled out for that field as value in that column. Now the issue I am bumping into is I can't find a way to achieve that, I believe mainly because eachlist has a different length.
I have tried using the column as a list type, and exploding its content, but with that method for each 1000 rows I get 170000 rows instead, and end up with a dataframe of 14M+ lines (while there are only 100k tickets) which isn't the best.
I'd like to find a way to not explode as rows, but "explode as new columns" , to eventually have 1 new column per field_id, and what was filled out for that field as value in that column, while remaining in the same dataframe.
Any help with this will be greatly appreciated !