How to create a polars-arrow `Array` from raw values (`&[u8]`)

How can I create an `Array` from a set of raw values that were created elsewhere?

For example, if I have:


// offsets for 3 strings (len + 1 entries), followed by zero padding
let offsets: &[i64] = &[ 0, 5, 5, 10, 0, 0, 0, 0 ];
// "helloworld", zero-padded to 64 bytes
let values: &[u8] = &[ 104, 101, 108, 108, 111, 119, 111, 114, 108, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ];
// validity bitmap: 5 = 0b101, i.e. "hello", null, "world"
let null_bitmap: &[u8] = &[ 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ];

let utf8_array = todo!(); // how do I get from those values to a Utf8Array?

The desired outcome would be the equivalent of `[Some("hello"), None, Some("world")]`.

For reference, this is how I would do it via arrow-rs:

let array_data = ArrayData::builder(DataType::LargeUtf8)
    .len(3)
    .null_count(1)
    .add_buffer(Buffer::from_slice_ref(offsets))
    .add_buffer(Buffer::from_slice_ref(values))
    .null_bit_buffer(Some(Buffer::from_slice_ref(null_bitmap)))
    .build()
    .unwrap();
let string_array = LargeStringArray::from(array_data);

There is 1 answer

BallpointBen On

I'm not sure what the representation of the data in those arrays is, exactly, but in polars-arrow it looks like this:

use polars_arrow::{
    array::Utf8Array, bitmap::Bitmap, buffer::Buffer, datatypes::ArrowDataType,
    offset::OffsetsBuffer,
};

fn main() -> anyhow::Result<()> {
    let offsets: &[i64] = &[0, 5, 5, 10];
    let values: &[u8] = &[104, 101, 108, 108, 111, 119, 111, 114, 108, 100];

    let utf8_array = Utf8Array::try_new(
        ArrowDataType::LargeUtf8, // i64 offsets -> LargeUtf8; i32 offsets -> Utf8
        OffsetsBuffer::try_from(offsets.to_owned())?,
        Buffer::from(values.to_owned()),
        Some(Bitmap::from(vec![true, false, true])),
    )?;

    println!("{utf8_array:?}");
    println!("{:?}", utf8_array.iter().collect::<Vec<_>>());

    Ok(())
}

Output:

LargeUtf8Array[hello, None, world]
[Some("hello"), None, Some("world")]
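
On the representation of the raw buffers in the question: they appear to follow the Arrow variable-size binary layout. There are `len + 1` offsets, the value bytes are concatenated, and the validity bitmap holds one bit per element, least-significant bit first, with 1 meaning valid. Everything past that is zero padding. Below is a minimal, std-only sketch of that decoding, assuming this layout holds for your data. The helper names `unpack_validity` and `decode_large_utf8` are made up for illustration and are not part of polars-arrow, and the slicing will panic if the buffers are shorter than `len` implies.

```rust
/// Unpack the first `len` validity bits of an Arrow-style bitmap.
/// Bits are read least-significant-bit first within each byte; 1 means valid.
fn unpack_validity(bitmap: &[u8], len: usize) -> Vec<bool> {
    (0..len)
        .map(|i| (bitmap[i / 8] >> (i % 8)) & 1 == 1)
        .collect()
}

/// Decode padded raw buffers into optional string slices.
/// Only the first `len + 1` offsets are read; trailing padding is ignored.
fn decode_large_utf8<'a>(
    offsets: &[i64],
    values: &'a [u8],
    bitmap: &[u8],
    len: usize,
) -> Result<Vec<Option<&'a str>>, std::str::Utf8Error> {
    let validity = unpack_validity(bitmap, len);
    let mut out = Vec::with_capacity(len);
    for (i, &valid) in validity.iter().enumerate() {
        if valid {
            let start = offsets[i] as usize;
            let end = offsets[i + 1] as usize;
            out.push(Some(std::str::from_utf8(&values[start..end])?));
        } else {
            out.push(None);
        }
    }
    Ok(out)
}

fn main() {
    // Same data as in the question, built compactly.
    let offsets: &[i64] = &[0, 5, 5, 10, 0, 0, 0, 0];
    let mut values = b"helloworld".to_vec();
    values.resize(64, 0); // zero padding up to 64 bytes
    let mut null_bitmap = vec![0u8; 64];
    null_bitmap[0] = 5; // 0b0000_0101: elements 0 and 2 are valid

    let decoded = decode_large_utf8(offsets, &values, &null_bitmap, 3).unwrap();
    println!("{decoded:?}"); // [Some("hello"), None, Some("world")]
}
```

With that in hand, the inputs for the `Utf8Array::try_new` call above would be `offsets[..len + 1]`, `values[..last_offset]`, and the `Vec<bool>` from `unpack_validity`, so the padding never reaches polars-arrow.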