I receive data as a list of dicts in a single column. Each list can be a different length. Sample data looks like this:
    import pandas as pd

    df = pd.DataFrame(
        [
            [[{'value': 1}, {'value': 2}, {'value': 3}]],
            [[{'value': 4}, {'value': 5}]],
        ],
        columns=['data'],
    )
    df

                                             data
    0  [{'value': 1}, {'value': 2}, {'value': 3}]
    1                [{'value': 4}, {'value': 5}]
I want to create a new column min_val which contains the minimum value for each row. I'm trying this:
    df.assign(min_val=lambda row: min(val['value'] for val in row.data))
But I get the error:
    TypeError: list indices must be integers or slices, not str
A very similar lambda/comprehension combination works in Dask Bag but not in raw Pandas, which is very confusing.
Any help would be very much appreciated.
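assign with a callable argument works on the entire dataframe, not on rows. In your lambda, row.data is therefore the whole data series, so each val is one of the lists rather than a dict, and val['value'] raises the TypeError. You need to then apply your function to the data series, so that it runs once per list.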
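As a minimal sketch of that fix, using the sample frame from the question (the names d and result are only illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{'value': 1}, {'value': 2}, {'value': 3}]],
        [[{'value': 4}, {'value': 5}]],
    ],
    columns=['data'],
)

# The callable given to assign receives the whole DataFrame (d), so d.data
# is the full Series. Series.apply then calls the inner lambda once per
# cell, where each cell is one list of dicts.
result = df.assign(
    min_val=lambda d: d.data.apply(lambda row: min(val['value'] for val in row))
)
print(result)
# min_val is 1 for row 0 and 4 for row 1
```

Note that assign returns a new dataframe, so the result has to be bound to a name (or assigned back to df) to keep the new column.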