Polars schema breaks with List type

I tried creating a simple Polars DataFrame with two columns:

import polars as pl

data = {"a": [ ["X"], ["Y"], []], "b": [3, 4, 5]}

# this works normally
df = pl.DataFrame(data)

df_schema = [("a", pl.List),
             ("b", pl.Int8)]

# this breaks - invalid series dtype: expected `Utf8`, got `null`
df = pl.DataFrame(data, schema=df_schema)

Without specifying a schema, it creates the following DataFrame:

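shape: (3, 2)
┌───────────┬─────┐
│ a         ┆ b   │
│ ---       ┆ --- │
│ list[str] ┆ i64 │
╞═══════════╪═════╡
│ ["X"]     ┆ 3   │
│ ["Y"]     ┆ 4   │
│ []        ┆ 5   │
└───────────┴─────┘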

But when I specify the exact same schema, it breaks. What could be the problem?

using polars==0.19.12

There is 1 answer

ignoring_gravity (best answer)

You need to specify the list's inner dtype, i.e. pl.List(pl.Utf8) rather than a bare pl.List:


In [59]: import polars as pl
    ...:
    ...: data = {"a": [ ["X"], ["Y"], []], "b": [3, 4, 5]}
    ...:
    ...: # this works normally
    ...: df = pl.DataFrame(data)
    ...:
    ...: df_schema = [("a", pl.List(pl.Utf8)),
    ...:              ("b", pl.Int8)]
    ...:
    ...: df = pl.DataFrame(data, schema=df_schema)

In [60]: df
Out[60]:
shape: (3, 2)
┌───────────┬─────┐
│ a         ┆ b   │
│ ---       ┆ --- │
│ list[str] ┆ i8  │
╞═══════════╪═════╡
│ ["X"]     ┆ 3   │
│ ["Y"]     ┆ 4   │
│ []        ┆ 5   │
└───────────┴─────┘