What is the best way to send Arrow data to the browser?


I have Apache Arrow data on the server (Python) and need to use it in the browser. It appears that Arrow Flight isn't implemented in JS. What are the best options for sending the data to the browser and using it there?

I don't necessarily even need it in Arrow format in the browser. This question hasn't received any responses, so I'm adding some additional criteria for what I'm looking for:

  • Self-describing: don't want to maintain separate schema definitions
  • Minimal overhead: For example, an array of float32s should transfer as something compact like a data type indicator, length value and sequence of 4-byte float values
  • Cross-platform: Able to be easily sent from Python and received and used in the browser in a straightforward way

Surely this is a solved problem? If it is, I've been unable to find a solution. Please help!

There is 1 answer

Answer by amoeba (best answer)

Building off of the comments on your original post by David Li, you can implement a non-streaming version of what you want without too much code, using PyArrow on the server side and the Apache Arrow JS bindings on the client. The Arrow IPC format satisfies your requirements because it ships the schema with the data, is space-efficient and zero-copy, and is cross-platform.

Here's a toy example that generates a record batch on the server and receives it on the client:

Server:

from io import BytesIO

from flask import Flask, send_file
from flask_cors import CORS
import pyarrow as pa

app = Flask(__name__)
CORS(app)

@app.get("/data")
def data():
    data = [
        pa.array([1, 2, 3, 4]),
        pa.array(['foo', 'bar', 'baz', None]),
        pa.array([True, None, False, True])
    ]
    batch = pa.record_batch(data, names=['f0', 'f1', 'f2'])

    sink = pa.BufferOutputStream()

    with pa.ipc.new_stream(sink, batch.schema) as writer:
        writer.write_batch(batch)

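    # send_file's second positional argument is the MIME type, so pass both by keyword
    return send_file(
        BytesIO(sink.getvalue().to_pybytes()),
        mimetype="application/vnd.apache.arrow.stream",
        download_name="data.arrow",
    )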

Client:

import { tableFromIPC } from "apache-arrow";

const table = await tableFromIPC(fetch(URL)); // URL: address of the /data endpoint
// Do what you like with your data

Edit: I added a runnable example at https://github.com/amoeba/arrow-python-js-ipc-example.