I often need to fetch large quantities of data from Microsoft SQL servers to be manipulated with Polars in Rust, and per enterprise security policy am more or less forced to use ODBC for these connections. The ODBC requirement restricts me from using mature and featureful libraries like ConnectorX. I am able to connect and efficiently read query results into RecordBatch objects from Arrow using arrow_odbc, but have not been able to convert these RecordBatch objects into Polars DataFrames.
Because the data in a RecordBatch and in a Series share the same underlying Arrow representation, I thought it would be possible to create a DataFrame from a RecordBatch zero-copy.
However, at the line columns.push(Series::from_arrow(&schema.fields().get(i).unwrap().name(), *column)?); I get the error:
mismatched types
expected struct `std::boxed::Box<(dyn polars::export::polars_arrow::array::Array + 'static)>`
found struct `Arc<dyn arrow::array::Array>`
I was under the impression that an Arc<dyn Array> is an ArrayRef. Is the real problem perhaps that I have an Arc<dyn arrow::array::Array> while Series::from_arrow() expects Polars' own array type (the Box<dyn polars_arrow::array::Array> named in the error)? If so, how do I resolve that?
My full code is below for reference.
use arrow_odbc::{odbc_api::{Environment, ConnectionOptions}, OdbcReaderBuilder};
use arrow::record_batch::RecordBatch;
use polars::prelude::*;
use anyhow::Result;

const CONNECTION_STRING: &str = "...";

pub fn test() -> Result<()> {
    let odbc_environment = Environment::new()?;
    let connection = odbc_environment.connect_with_connection_string(
        CONNECTION_STRING,
        ConnectionOptions::default()
    )?;
    let cursor = connection.execute("SELECT * FROM Backcast_Power_Plant_Map", ())?.unwrap();
    let arrow_record_batches = OdbcReaderBuilder::new().build(cursor)?;

    fn record_batch_to_dataframe(batch: &RecordBatch) -> Result<DataFrame, PolarsError> {
        let schema = batch.schema();
        let mut columns = Vec::with_capacity(batch.num_columns());
        for (i, column) in batch.columns().iter().enumerate() {
            columns.push(Series::from_arrow(&schema.fields().get(i).unwrap().name(), *column)?);
        }
        Ok(DataFrame::from_iter(columns))
    }

    for batch in arrow_record_batches {
        dbg!(record_batch_to_dataframe(&batch?));
    }
    Ok(())
}
It appears polars and arrow-odbc use different arrow crates: polars uses polars-arrow, and arrow-odbc uses arrow. The former's array type is Box<dyn polars_arrow::array::Array>, while the latter has the type ArrayRef, which is an alias for Arc<dyn arrow::array::Array>.
Luckily for us, there exists a compatibility layer in the polars-arrow crate. You can convert between the two types (and more) via From impls.
Note this requires polars-arrow with the arrow_rs feature as a dependency.
From what I can tell, this does not copy the actual data.