A client-side virtual file system for browsers that works with chunking


I am trying to port part of a desktop application to run in the browser (client-side). I need a sort of virtual file system in which I can read and write files (binary data). From what I gather, one of the only options that works broadly across browsers is IndexedDB. However, I'm struggling to find examples that read or write larger files. The API only seems to support passing or obtaining an entire file's contents to/from the database (as a blob or byte array).

What I'm trying to find is something with which I can continuously "stream" the data to/from the virtual file system, analogous to how you would do it in any non-browser application. E.g. (pseudo code):

val in = new FileInputStream(someURLorPath)
val chunkSize = 4096
val buf = new Array[Byte](chunkSize)
while (in.hasRemaining) {
  val sz = min(chunkSize, in.remaining)
  in.read(buf, 0, sz)
  processSome(buf, 0, sz)
  ...
}
in.close()

I understand that a synchronous API is a problem for browsers; it would also be fine if read were an asynchronous method instead. But I want to go through the file - which can be huge, e.g. several hundred MB - block by block. The block size doesn't matter. This applies both to reading and to writing.
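
An asynchronous equivalent of the loop above is the kind of thing I'm after - something like this sketch (the VirtualFile interface and readChunk are hypothetical names, just to illustrate the API shape, not an existing browser API):

// Hypothetical API shape (illustration only - not an existing browser API).
interface VirtualFile {
    readonly size: number;
    readChunk( offset: number, length: number ): Promise<ArrayBuffer>;
    close(): Promise<void>;
}

declare function processSome( buf: ArrayBuffer ): void;  // application-specific processing

async function processFile( file: VirtualFile ): Promise<void> {
    const chunkSize = 4096;
    for( let offset = 0; offset < file.size; offset += chunkSize ) {
        const sz  = Math.min( chunkSize, file.size - offset );
        const buf = await file.readChunk( offset, sz );
        processSome( buf );
    }
    await file.close();
}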

Random-access (being able to seek to a position within the virtual file) would be a plus, but not mandatory.


One idea I have is to use one object store per virtual file, with chunk indices as keys - a bit like the cursor example on MDN, but with each record being a blob or array of fixed size. Does that make sense? Is there a better API or approach?
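
Roughly, this is the access pattern I'm imagining (illustrative only - the database/store layout and names are made up):

// Illustration of the "one object store per file, one record per chunk" idea.
const CHUNK_SIZE = 4096;

function getChunkByIndex( db: IDBDatabase, storeName: string, chunkIndex: number ): Promise<ArrayBuffer | undefined> {
    return new Promise( ( resolve, reject ) => {
        const req = db.transaction( storeName, 'readonly' )
                      .objectStore( storeName )
                      .get( chunkIndex );                    // key = chunk index
        req.addEventListener( 'success', () => resolve( req.result as ArrayBuffer | undefined ) );
        req.addEventListener( 'error', () => reject( req.error ) );
    } );
}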


It seems that Streams would conceptually be the API I'm looking for, but I don't know how to "stream to/from" a virtual file system such as IndexedDB.

Accepted answer (by Dai)

Assuming you want to work transparently with initially remote resources that are cached (and kept consistent) locally, you can abstract over fetch (with Range: requests) and IndexedDB.

BTW, you'll really want to use TypeScript for this, because working with Promise<T> in pure JavaScript is a PITA.

Regarding your comment that "one could say either read-only or append-only write; strictly speaking, I don't need to be able to overwrite file contents (although it would be convenient to have)" - that is the scenario the sketch below assumes.

Something like this should work. I cobbled it together from MDN's docs - I haven't tested it, but I hope it points you in the right direction:

Part 1 - LocalFileStore

These classes allow you to store arbitrary binary data in chunks of 4096 bytes, where each chunk is represented by an ArrayBuffer.

The IndexedDB API is confusing at first: it doesn't use native ECMAScript Promise<T>s but instead its own IDBRequest API with oddly named properties (a sketch of how an IDBRequest can be bridged into a Promise follows this list). The gist of it is:

  • A single IndexedDB database named 'files' holds all of the files cached locally.
  • Each file is represented by its own IDBObjectStore instance.
  • Each 4096-byte chunk of each file is represented by its own record/entry/key-value-pair inside that IDBObjectStore, where the key is the 4096-aligned offset into the file.
    • Note that all IndexedDB operations happen within an IDBTransaction context, which is why class LocalFile wraps an IDBTransaction object rather than an IDBObjectStore object.
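
For illustration, the IDBRequest-to-Promise bridging mentioned above can be captured in a small generic helper (a sketch only; the classes below inline this pattern rather than using a helper):

// Resolve with the request's result on 'success', reject with its error on 'error'.
function requestToPromise<T>( req: IDBRequest<T> ): Promise<T> {
    return new Promise<T>( ( resolve, reject ) => {
        req.addEventListener( 'success', () => resolve( req.result ) );
        req.addEventListener( 'error', () => reject( req.error ) );
    } );
}
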
class LocalFileStore {
    
    static open(): Promise<LocalFileStore> {
        
        return new Promise<LocalFileStore>( function( accept, reject ) {
            
            // Surprisingly, the IndexedDB API is designed such that you add the event-handlers *after* you've made the `open` request. Weird.
            const openReq = indexedDB.open( 'files' );
            openReq.addEventListener( 'error', function() {
                reject( openReq.error );
            } );
            openReq.addEventListener( 'success', function() {
                const db = openReq.result;
                accept( new LocalFileStore( db ) );
            } );
        } );
    }

    constructor(
        private readonly db: IDBDatabase
    ) {    
    }
    
    openFile( fileName: string, write: boolean ): LocalFile {
        
        // Note: the per-file object store must already exist; object stores can only be
        // created inside an `upgradeneeded` (versionchange) handler, which is not shown here.
        const transaction = this.db.transaction( fileName, write ? 'readwrite' : 'readonly', { durability: 'strict' } );
        
        return new LocalFile( fileName, transaction, write );
    }
}

class LocalFile {
    
    constructor(
        public readonly fileName: string,
        private readonly t: IDBTransaction,
        public readonly writable: boolean
    ) {
    }

    getChunk( offset: bigint ): Promise<ArrayBuffer | null> {
        
        if( offset % 4096n !== 0n ) throw new Error( "Offset value must be a multiple of 4096." );
        
        // Arrow functions are used so that `this` refers to the LocalFile instance.
        return new Promise<ArrayBuffer | null>( ( accept, reject ) => {
        
            const key = offset.toString();
            const req = this.t.objectStore( this.fileName ).get( key );
            
            req.addEventListener( 'error', function() {
                reject( req.error );
            } );
            
            req.addEventListener( 'success', function() {
                const entry = req.result;
                if( entry instanceof ArrayBuffer ) {
                    accept( entry );
                }
                else if( typeof entry === 'undefined' ) {
                    accept( null );
                }
                else {
                    reject( "Entry was not an ArrayBuffer or 'undefined'." );
                }
            } );

        } );
    }

    putChunk( offset: bigint, bytes: ArrayBuffer ): Promise<void> {
        if( offset % 4096n !== 0n ) throw new Error( "Offset value must be a multiple of 4096." );
        if( bytes.byteLength > 4096 ) throw new Error( "Chunk size cannot exceed 4096 bytes." );
        
        return new Promise<void>( ( accept, reject ) => {
        
            const key = offset.toString();
            const req = this.t.objectStore( this.fileName ).put( bytes, key );
            
            req.addEventListener( 'error', function() {
                reject( req.error );
            } );
            
            req.addEventListener( 'success', function() {
                accept();
            } );

        } );
    }

    existsLocally(): Promise<boolean> {
        // TODO: Implement check to see if *any* data for this file exists locally.
        throw new Error( "Not yet implemented." );
    }
}
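
The existsLocally check is left as a TODO above; one possible body (a sketch, not part of the original answer, and untested like the rest) counts the records in the file's object store:

existsLocally(): Promise<boolean> {
    // Resolves true if the store holds at least one chunk of this file.
    return new Promise<boolean>( ( accept, reject ) => {
        const req = this.t.objectStore( this.fileName ).count();
        req.addEventListener( 'success', function() {
            accept( req.result > 0 );
        } );
        req.addEventListener( 'error', function() {
            reject( req.error );
        } );
    } );
}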

Part 2 - AbstractFile

  • This class wraps the IndexedDB-based LocalFileStore and LocalFile classes above and also uses fetch.
  • When you make a read request for a range of a file:
    1. It first checks with the LocalFileStore; if it has the necessary chunks then it will retrieve them.
    2. If it's lacking any chunks in the range then it will fall back to retrieving the requested range using fetch with a Range: header, and cache those chunks locally.
  • When you make a write request to a file:
    • I actually haven't implemented that bit yet, but that's an exercise left up to the reader :)
class AbstractFileStore {
    
    private readonly lfs: Promise<LocalFileStore>;

    constructor() {
        this.lfs = LocalFileStore.open();
    }

    async openFile( fileName: string, writeable: boolean ): Promise<AbstractFile> {
        
        const lfs = await this.lfs;
        return new AbstractFile( fileName, lfs.openFile( fileName, writeable ) );
    }
}

class AbstractFile {
    
    private static readonly BASE_URL = 'https://storage.example.com/';

    constructor(
        public readonly fileName: string,
        private readonly localFile: LocalFile
    ) {
        
    }

    async read( offset: bigint, length: number ): Promise<ArrayBuffer> {

        const anyExistsLocally = await this.localFile.existsLocally();
        if( !anyExistsLocally ) {
            return this.readUsingFetch( offset, length ); // TODO: Cache the returned data into the localFile store.
        }

        const concat = new Uint8Array( length );
        let count = 0;

        // TODO: Exercise for the reader: implement calculateChunks to split `offset + length` into a series of 4096-aligned chunk offsets.
        for( const chunkOffset of calculateChunks( offset, length ) ) {
            
            const fromLocal = await this.localFile.getChunk( chunkOffset );
            if( fromLocal !== null ) {
                concat.set( new Uint8Array( fromLocal ), count );
                count += fromLocal.byteLength;
            }
            else {
                const fromFetch = await this.readUsingFetch( chunkOffset, 4096 );
                concat.set( new Uint8Array( fromFetch ), count );
                count += fromFetch.byteLength;
            }
        }

        // Return the underlying ArrayBuffer of the concatenated view.
        return concat.buffer;
    }

    private async readUsingFetch( offset: bigint, length: number ): Promise<ArrayBuffer> {
        
        const url = AbstractFile.BASE_URL + this.fileName;

        const headers = new Headers();
        // The end of an HTTP Range is inclusive, hence the `- 1n`.
        headers.append( 'Range', 'bytes=' + offset + '-' + ( offset + BigInt( length ) - 1n ) );

        const opts: RequestInit = {
            credentials: 'include',
            headers    : headers
        };

        const resp = await fetch( url, opts );
        return await resp.arrayBuffer();
    }

    write( offset: bigint, data: ArrayBuffer ): Promise<void> {
        
        throw new Error( "Not yet implemented." );
    }
}
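
The calculateChunks helper is also left as an exercise; one possible version (a sketch, not part of the original answer) is a generator that yields the 4096-aligned offsets covering the requested range:

// Yields the 4096-aligned chunk offsets that cover [offset, offset + length).
function* calculateChunks( offset: bigint, length: number ): Generator<bigint> {
    const chunkSize  = 4096n;
    const firstChunk = ( offset / chunkSize ) * chunkSize;   // round down to a 4096-aligned offset
    const end        = offset + BigInt( length );
    for( let chunk = firstChunk; chunk < end; chunk += chunkSize ) {
        yield chunk;
    }
}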

Part 3 - Streams?

As the classes above use ArrayBuffer, you can make use of existing ArrayBuffer functionality to create a Stream-compatible or Stream-like representation. It will have to be asynchronous, of course, but async + await make that easy. You could write a generator function (i.e. an async iterator) that simply yields each chunk asynchronously.
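
For example, an async generator over AbstractFile might look like this (a sketch assuming the read signature above; the file size is assumed to be known to the caller), and it can be adapted into a WHATWG ReadableStream if an actual Stream object is needed:

// Sketch: asynchronously iterate over a file in 4096-byte chunks.
async function* streamFile( file: AbstractFile, fileSize: bigint ): AsyncGenerator<ArrayBuffer> {
    const chunkSize = 4096n;
    for( let offset = 0n; offset < fileSize; offset += chunkSize ) {
        const remaining = fileSize - offset;
        const length = remaining < chunkSize ? Number( remaining ) : 4096;
        yield await file.read( offset, length );
    }
}

// Sketch: adapt the async generator into a ReadableStream.
function toReadableStream( file: AbstractFile, fileSize: bigint ): ReadableStream<Uint8Array> {
    const chunks = streamFile( file, fileSize );
    return new ReadableStream<Uint8Array>( {
        async pull( controller ) {
            const next = await chunks.next();
            if( next.done ) {
                controller.close();
            }
            else {
                controller.enqueue( new Uint8Array( next.value ) );
            }
        }
    } );
}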