How can I build a column from a Vec of Vecs for a record batch in DataFusion?


I can create a column of type Utf8 as follows:

    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("payload", DataType::Utf8, false),
    ]));

    let vec_of_strings: Vec<String> = vec!["one".to_string(), "two".to_string()];
    
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int32Array::from_slice([1, 2])),
            Arc::new(StringArray::from(vec_of_strings)),
        ],
    )?;

    ctx.register_batch("demo", batch)?;

Executing a query against it, like so:

    let df = ctx.sql(r#"
       SELECT *
       from demo
    "#).await?;

gives the expected result:

+----+---------+
| id | payload |
+----+---------+
| 1  | one     |
| 2  | two     |
+----+---------+

Now I have a use case where the payload should be an array, something like this:

+----+------------------------+
| id | payload                |
+----+------------------------+
| 1  | [piano, guitar, drums] |
| 2  | [violin, piano]        |
+----+------------------------+

How may I go about this?

Simply changing vec_of_strings to a vec_of_vecs does not work. I mean this:

    let vec_of_vecs: Vec<Vec<String>> = vec![
        vec!["piano".to_string(), "guitar".to_string(), "drums".to_string()],
        vec!["violin".to_string(), "guitar".to_string()]
    ];

When it is used to create the batch like this:

    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int32Array::from_slice([1, 2])),
            Arc::new(StringArray::from(vec_of_vecs)),
        ],
    )?;

it fails to compile with the following error:

   |
80 |             Arc::new(StringArray::from(vec_of_vecs)),
   |                      ----------------- ^^^^^^^^^^^ the trait `From<Vec<Vec<std::string::String>>>` is not implemented for `GenericByteArray<GenericStringType<i32>>`
   |                      |
   |                      required by a bound introduced by this call
   |
   = help: the following other types implement trait `From<T>`:
            <GenericByteArray<GenericBinaryType<OffsetSize>> as From<GenericByteArray<GenericStringType<OffsetSize>>>>
            <GenericByteArray<GenericBinaryType<OffsetSize>> as From<Vec<&[u8]>>>
            <GenericByteArray<GenericBinaryType<OffsetSize>> as From<Vec<Option<&[u8]>>>>
            <GenericByteArray<GenericBinaryType<T>> as From<GenericListArray<T>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<GenericByteArray<GenericBinaryType<OffsetSize>>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<GenericListArray<OffsetSize>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<Vec<&str>>>
            <GenericByteArray<GenericStringType<OffsetSize>> as From<Vec<Option<&str>>>>
          and 3 others

Any idea on how I may achieve the above?
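
For context on why the conversion above is rejected: a Utf8 column stores exactly one string per row, so there is no `From<Vec<Vec<String>>>` for `StringArray`. A list column is laid out differently, as one flat child array of values plus an offsets array that marks where each row starts and ends. The following is a minimal standard-library sketch of that layout only, not DataFusion or arrow code; `ListColumn`, `flatten_rows` and `row` are hypothetical names introduced for illustration.

```rust
/// Illustrative (hypothetical) flattened layout for a list-of-strings column:
/// row `i` owns `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, PartialEq)]
struct ListColumn {
    offsets: Vec<i32>,
    values: Vec<String>,
}

/// Flatten nested rows into one values vector plus row offsets.
fn flatten_rows(rows: &[Vec<String>]) -> ListColumn {
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    let mut values = Vec::new();
    offsets.push(0); // the first row always starts at position 0
    for row in rows {
        values.extend(row.iter().cloned());
        offsets.push(values.len() as i32); // end of this row, start of the next
    }
    ListColumn { offsets, values }
}

/// Borrow the strings that belong to row `i`.
fn row(col: &ListColumn, i: usize) -> &[String] {
    let start = col.offsets[i] as usize;
    let end = col.offsets[i + 1] as usize;
    &col.values[start..end]
}
```

With the `vec_of_vecs` from the question, `flatten_rows` yields offsets `[0, 3, 5]` over five flat values. In the arrow crate, this layout is normally produced by a list builder wrapping a string builder (`ListBuilder` over `StringBuilder`) rather than by hand, and the schema field would need a List data type instead of Utf8. The exact constructor signatures vary between arrow versions, so treat that as a pointer to check against the version in use rather than a confirmed recipe.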
