Combine Two Polars Dataframe in a complex way

113 views Asked by At

I have a dataframe with transactions and another dataframe with employees who are assigned to those transactions. Each transaction can have 0...N assignments. I would like to have one dataframe which consists of transaction ids and all of the employee assignments in separate columns. Please see example below:

I have a one dataframe as follows:

TransactionIds Other_Columns..
T1 Cell 2...These don't matter
T2 Cell 4...These don't matter
T3 Cell 4...These don't matter

I have another dataframe as follows:

TransactionIds Assignments
T1 Assignment1
T1 Assignment2
T1 Assignment3
T2 Assignment3
T2 Assignment4
T3 Assignment6
T4 NULL

I would like to have a dataframe which looks like as follows:

TransactionIds Assignment1 Assignment2 Assignment3 AssignmentN
T1 Assignment1 Assignment2 Assignment3 NULL
T2 Assignment3 Assignment4 NULL NULL
T3 NULL NULL NULL NULL

I tried group by and then agg() function. However, it gives me a list which I don't know how to convert to columns. Another problem with this approach is I wouldn't know how many columns to convert this list into. I would like to dynamically determine/create the number of assignment columns from the SELECT.

1

There are 1 answers

0
WannaBeAProgrammer On BEST ANSWER

I was able to resolve this by doing the following:

    let stns: Vec<String> = (1..6).map(|i| format!("Station {i}")).collect();
    let weather = df!(
            "station"=> &stns,
            "temperatures"=> &[
                "20 5 5 E1 7 13 19 9 6 20",
                "18 8 16 11 23 E2 8 E2 E2 E2 90 70 40",
                "19 24 E9 16 6 12 10 22",
                "E2 E0 15 7 8 10 E1 24 17 13 6",
                "14 8 E0 16 22 24 E1",
            ],
    )
    .unwrap();
    println!("{}", &weather);

    let out = weather
        .clone()
        .lazy()
        .with_columns([col("temperatures").str().split(lit(" "))])
        .with_column(col("temperatures").list().lengths().alias("counts"))
        .with_column(col("counts").max())
        .collect()
        .unwrap();

    println!("{}", &out);

    let max_is = out.column("counts").unwrap().u32().unwrap().max();

    println!("max_is: {:?}", max_is);

    let mut selects = vec![];

    for i in 0..max_is.unwrap() {
        selects.push(col("temperatures").list().get(lit(i)).alias(format!("Assignment_{}", i + 1).as_str()));
    }

    println!("Selects: {:#?}", selects);

    let out12 = out.lazy().with_columns(&selects).drop_columns(["foo"]).collect().unwrap();
    println!("{}", &out12);