Firebase function Node.js transform stream


I'm creating a Firebase HTTP Function that makes a BigQuery query and returns a modified version of the query results. The query potentially returns millions of rows, so I cannot store the entire query result in memory before responding to the HTTP client. I am trying to use Node.js streams, and since I need to modify the results before sending them to the client, I am trying to use a transform stream. However, when I try to pipe the query stream through my transform stream, the Firebase Function crashes with the following error message: finished with status: 'response error'.

My minimal reproducible example is as follows. I am using a buffer because I need to make asynchronous network calls to transform the data, so I don't want to process one row (chunk) at a time.

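const { Transform, pipeline } = require("stream")
const { BigQuery } = require("@google-cloud/bigquery")

const bigQuery = new BigQuery()

// `options` is the query configuration, `response` the HTTP response object.
async function streamQuery(options, response) {
    // `await` is not allowed inside a non-async Promise executor, so start the job first.
    const [job] = await bigQuery.createQueryJob(options)
    const bqStream = job.getQueryResultsStream()

    return new Promise((resolve, reject) => {
        const buffer = new Array(5000)
        let bufferIndex = 0
        let firstBatch = true

        // Push one batch as JSON text: "[" before the first batch, "," before later ones.
        const pushBatch = (stream, rows) => {
            stream.push((firstBatch ? "[" : ",") + JSON.stringify(rows).slice(1, -1)) // Transformation should happen here.
            firstBatch = false
        }

        const transformer = new Transform({
            writableObjectMode: true,
            readableObjectMode: false,
            transform(chunk, enc, callback) {
                buffer[bufferIndex] = chunk
                bufferIndex++
                if (bufferIndex === buffer.length) {
                    pushBatch(this, buffer)
                    bufferIndex = 0
                }
                callback()
            },
            flush(callback) {
                if (bufferIndex > 0) {
                    pushBatch(this, buffer.slice(0, bufferIndex))
                }
                // Close the array (or emit an empty one if no rows arrived).
                this.push(firstBatch ? "[]" : "]")
                callback()
            },
        })

        // pipeline() forwards errors from every stage and calls back only after
        // the response has finished, unlike resolving on the query stream's "end".
        pipeline(bqStream, transformer, response, (err) => (err ? reject(err) : resolve()))
    })
}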
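The asynchronous per-batch network calls mentioned in the question fit into a Transform because `transform` may call its callback later, after a Promise settles; the stream does not deliver the next row until it does. The sketch below assumes a hypothetical `enrichBatch()` standing in for the real network call; `createBatchingTransform` and `streamAsJson` are illustrative names. It does not get around the Cloud Functions response limit discussed in the answers.

```javascript
const { Transform, pipeline } = require("stream")

// Hypothetical stand-in for the asynchronous network calls in the question.
async function enrichBatch(rows) {
  return rows.map((row) => ({ ...row, enriched: true }))
}

// Collects `batchSize` rows, awaits `transformBatch` on them and emits the
// results as text fragments that together form one JSON array.
function createBatchingTransform(batchSize, transformBatch) {
  let rows = []
  let first = true

  // Transforms one batch and pushes it with "[" or "," in front.
  async function emit(stream, batch) {
    const transformed = await transformBatch(batch)
    if (transformed.length === 0) return
    stream.push((first ? "[" : ",") + JSON.stringify(transformed).slice(1, -1))
    first = false
  }

  return new Transform({
    writableObjectMode: true,
    readableObjectMode: false,
    transform(row, _enc, callback) {
      rows.push(row)
      if (rows.length < batchSize) {
        callback()
        return
      }
      const batch = rows
      rows = []
      // The next row is not delivered until callback() runs, so only one
      // batch is being transformed at any time (backpressure).
      emit(this, batch).then(() => callback(), callback)
    },
    flush(callback) {
      // Emit the final partial batch, then close the array.
      emit(this, rows).then(() => {
        this.push(first ? "[]" : "]")
        callback()
      }, callback)
    },
  })
}

// Streams object rows to `out` as JSON; in the question, `rows` would be the
// BigQuery results stream and `out` the HTTP response.
function streamAsJson(rows, out, batchSize = 5000) {
  return new Promise((resolve, reject) => {
    pipeline(rows, createBatchingTransform(batchSize, enrichBatch), out, (err) =>
      err ? reject(err) : resolve(),
    )
  })
}
```

Because `callback` only runs after a batch has been pushed, at most one batch is in flight; higher throughput would need a more elaborate design.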

There are 2 answers

Doug Stevenson (best answer)

I cannot store the entire query result in memory before responding to the HTTP client

Unfortunately, when using Cloud Functions, this is precisely what must happen.

There is a documented 10 MB limit on the response payload, and the response is effectively held in memory as your code continues to write to it. Streaming of requests and responses is not supported.
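One way to make that cap visible in your own code is a pass-through stream that counts bytes and errors out once the total passes the limit. This is a sketch: `createByteLimit` is a hypothetical helper, and the `MAX_RESPONSE_BYTES` value is an assumption based on the 10 MB figure quoted above, so check the current Cloud Functions quotas before relying on it.

```javascript
const { Transform } = require("stream")

// Assumption: the 10 MB response cap from the answer, read as 10 MiB.
const MAX_RESPONSE_BYTES = 10 * 1024 * 1024

// Passes bytes through unchanged but fails the stream as soon as the running
// total exceeds `limit`, so the overflow shows up as an error in your code.
function createByteLimit(limit = MAX_RESPONSE_BYTES) {
  let total = 0
  return new Transform({
    transform(chunk, _enc, callback) {
      total += chunk.length // chunk is a Buffer, so length is in bytes
      if (total > limit) {
        callback(new Error(`response exceeds ${limit} bytes`))
        return
      }
      callback(null, chunk)
    },
  })
}
```

It would sit just before the response, for example `pipeline(bqStream, transformer, createByteLimit(), response, callback)` using the names from the question.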

One alternative is to write your response to an object in Cloud Storage, then send a link or reference to that file to the client so it can read the response fully from that object.
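A minimal sketch of the Cloud Storage approach, assuming a `@google-cloud/bigquery` client and a `@google-cloud/storage` bucket are created elsewhere and passed in. `exportQueryToStorage`, `jsonArray`, the file name, and the 15-minute expiry are illustrative choices, not part of the answer. It uses `stream/promises`, which needs Node.js 15 or later.

```javascript
const { Transform } = require("stream")
const { pipeline } = require("stream/promises")

// Serializes an object-mode row stream into a single JSON array.
function jsonArray() {
  let first = true
  return new Transform({
    writableObjectMode: true,
    transform(row, _enc, callback) {
      const prefix = first ? "[" : ","
      first = false
      callback(null, prefix + JSON.stringify(row))
    },
    flush(callback) {
      callback(null, first ? "[]" : "]")
    },
  })
}

// Runs the query, streams the rows into a Cloud Storage object and returns
// a time-limited read URL. The clients are passed in so this function has
// no hard dependency on the SDK packages.
async function exportQueryToStorage(bigQuery, bucket, options, fileName) {
  const [job] = await bigQuery.createQueryJob(options)
  const file = bucket.file(fileName)
  await pipeline(
    job.getQueryResultsStream(),
    jsonArray(),
    file.createWriteStream({ contentType: "application/json" }),
  )
  const [url] = await file.getSignedUrl({
    version: "v4",
    action: "read",
    expires: Date.now() + 15 * 60 * 1000, // 15 minutes, an arbitrary choice
  })
  return url
}
```

In a handler this might be called as `await exportQueryToStorage(new BigQuery(), new Storage().bucket("my-bucket"), options, "results.json")`, with `"my-bucket"` standing in for your bucket. Creating signed URLs from a function may also require the runtime service account to be allowed to sign blobs.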

If you need to send a large streamed response, Cloud Functions is not a good choice. Neither is Cloud Run, which is similarly limited. You will need to look into other solutions that allow direct socket access, such as Compute Engine.

Wilhelm Mauch

I tried to implement the workaround suggested by Doug Stevenson and got the following error:

@firebase/firestore: Firestore (9.8.2): 
Connection GRPC stream error. 
Code: 3 
Message: 3 
INVALID_ARGUMENT: Request payload size exceeds the limit: 11534336 bytes.

My workaround stores the data in Firestore first. It works fine as long as the content is smaller than 10 MB.

import * as functions from "firebase-functions";
import * as firestore from "firebase/firestore";
import { initializeApp } from "firebase/app";
import { firebaseConfig }  from '../conf/firebase'
// `api` is the author's own data-access module (import not shown).


// Initialize Firebase
const app = initializeApp(firebaseConfig);
const fs = firestore.getFirestore(app);


export async function storeStudents(data, context) {

    const students = await api.getTermStudents()
    // All writes go into one batch, which is committed as a single request
    // and is therefore subject to the payload limit shown in the error above.
    const batch = firestore.writeBatch(fs);

    students.forEach((student) => {
        const ref = firestore.doc(fs, 'students', student.studentId)
        batch.set(ref, student)
    })

    await batch.commit()

    return 'stored'
}

export const getTermStudents = functions.https.onCall(storeStudents);
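A different way around the payload limit, not the one the author ended up using, is to split the writes across several smaller batches. In this sketch `chunk` and `commitInChunks` are hypothetical helpers and the slice size is an assumption. Splitting by count only keeps requests small if documents are of similar size (documents carrying images may need smaller slices), and the writes are no longer atomic as a whole: if a later slice fails, earlier slices stay committed.

```javascript
// Splits `items` into consecutive slices of at most `size` elements.
function chunk(items, size) {
  if (size < 1) throw new RangeError("size must be at least 1")
  const slices = []
  for (let i = 0; i < items.length; i += size) {
    slices.push(items.slice(i, i + size))
  }
  return slices
}

// Awaits `commitSlice` for each slice in turn, so only one batch request is
// in flight at a time, and resolves with the number of items written.
async function commitInChunks(items, size, commitSlice) {
  for (const slice of chunk(items, size)) {
    await commitSlice(slice)
  }
  return items.length
}

// Possible wiring with the names from the answer (not executed here; the
// slice size of 200 is an arbitrary assumption):
// await commitInChunks(students, 200, async (slice) => {
//   const batch = firestore.writeBatch(fs)
//   slice.forEach((s) => batch.set(firestore.doc(fs, "students", s.studentId), s))
//   await batch.commit()
// })
```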

UPDATE:

To get around Firestore's limit on batched writes, I looped through the array and wrote each document individually with setDoc(), which creates or overwrites a single document.

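export async function storeStudents(data, context) {

    // `Student` is the author's own type (definition not shown).
    const students: Student[] = await api.getTermStudents({images: true})

    // setDoc() returns a Promise; wait for every write to finish before
    // returning, otherwise the function may end before the data is stored.
    await Promise.all(students.map((student: Student) => {
        const ref = firestore.doc(fs, 'students', student.student_id)
        return firestore.setDoc(ref, student)
    }))

    return 'stored'
}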
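Whether the documents are written inside forEach or collected with Promise.all, every setDoc() call starts at once, which for thousands of students means thousands of concurrent requests. A sketch of capping that with a small worker pool; `forEachLimited` and the limit value are hypothetical choices, not part of the Firestore SDK.

```javascript
// Runs `task` for each item with at most `limit` calls in flight; resolves
// when all have finished and rejects if any task rejects (the remaining
// workers keep running in that case).
async function forEachLimited(items, limit, task) {
  if (limit < 1) throw new RangeError("limit must be at least 1")
  let next = 0
  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < items.length) {
      const index = next++
      await task(items[index], index)
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker())
  await Promise.all(workers)
}

// Possible wiring with the names from the update (not executed here; 20 is
// an arbitrary concurrency limit):
// await forEachLimited(students, 20, (student) =>
//   firestore.setDoc(firestore.doc(fs, "students", student.student_id), student))
```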