How to download a large file on startup in FastAPI without blocking the event loop?


I would like to download a large file when the application starts up, but the download should run in the background. That is, the application shouldn't wait for the download to complete before it finishes starting up.

What I am currently doing is:

from fastapi import FastAPI


app = FastAPI()

items = {}


@app.on_event("startup")
def startup_event():
    ...  # download the file here

This seems to work, but I get a lot of critical worker timeout errors. Is there some way to start the download when the application starts, without making the application wait for the download to finish?


There are 2 answers

1
Prudhviraj Panisetti On

Let's say, for example, that you want to download a 10 GB file (https://speed.hetzner.de/10GB.bin) on startup.

As the application starts, it triggers an asynchronous download task using aiohttp, which fetches the file from https://speed.hetzner.de/10GB.bin and saves it as downloaded_file.

The download occurs in chunks, and because it runs as a background task, the application can handle other work and respond to incoming requests without waiting for the download to complete.

import asyncio
from fastapi import FastAPI
import aiohttp

app = FastAPI()

async def download_large_file():
    async with aiohttp.ClientSession() as session:
        url = "https://speed.hetzner.de/10GB.bin"
        async with session.get(url) as response:
            if response.status == 200:
                with open('downloaded_file', 'wb') as file:
                    while True:
                        chunk = await response.content.read(1024)
                        if not chunk:
                            break
                        file.write(chunk)

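@app.on_event("startup")
async def startup_event():
    # Schedule the download on the running loop and return immediately.
    # Keeping a reference stops the task from being garbage-collected.
    app.state.download_task = asyncio.create_task(download_large_file())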

Hope this block of code helps.
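To illustrate why scheduling the download as a task keeps startup fast, here is a minimal stdlib-only sketch. It is not part of the answer above: fake_download, startup and background_tasks are hypothetical names, and asyncio.sleep stands in for the real network transfer.

```python
import asyncio

events = []               # records the order in which things happen
background_tasks = set()  # strong references keep running tasks alive


async def fake_download(chunks: int = 3) -> None:
    # Stand-in for the real download: each "chunk" yields to the event loop.
    for i in range(chunks):
        await asyncio.sleep(0.01)
        events.append(f"chunk {i}")
    events.append("download done")


async def startup() -> asyncio.Task:
    # Schedule the download and return immediately, as a startup hook would.
    task = asyncio.create_task(fake_download())
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    events.append("startup finished")
    return task


async def main() -> list:
    events.clear()
    task = await startup()
    events.append("serving requests")
    await task  # awaited here only so the demo can exit cleanly
    return list(events)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the startup step is recorded before any chunk arrives, which is the behaviour the question asks for. The set of task references follows the pattern described in the asyncio documentation for fire-and-forget tasks.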

0
Chris On

This answer derives code and information from the following answers. Hence, please take a look at them for more details and explanation:

  1. How to initialise a global object or variable and reuse it in every FastAPI endpoint?
  2. What is the proper way to make downstream Https requests inside of Uvicorn/FastAPI?
  3. Is having a concurrent.futures.ThreadPoolExecutor call dangerous in a FastAPI endpoint?
  4. FastAPI python: How to run a thread in the background?
  5. Return File/Streaming response from online video URL in FastAPI
  6. FastAPI UploadFile is slow compared to Flask
  7. How to download a large file using FastAPI?
  8. How to run another application within the same running event loop?

The solutions provided below use the httpx library, a powerful HTTP client for Python that offers an async API and supports both HTTP/1.1 and HTTP/2. The aiofiles library is also used for handling file operations (such as writing files to disk) in asyncio applications. Public videos (large files) for testing the solutions can be found here.
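As background on why aiofiles is used: a plain file.write() call is synchronous, so it holds the event loop for the duration of each write, whereas aiofiles delegates file operations to a thread pool. The following is a rough stdlib-only sketch of the same idea, assuming Python 3.9+ for asyncio.to_thread. The names fake_chunks and save_stream are hypothetical, and the chunk generator stands in for a streamed HTTP response body.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator


async def fake_chunks(n: int = 4, size: int = 1024) -> AsyncIterator[bytes]:
    # Hypothetical stand-in for a streamed HTTP response body.
    for i in range(n):
        await asyncio.sleep(0)
        yield bytes([i]) * size


async def save_stream(chunks: AsyncIterator[bytes], path: str) -> int:
    # Open, write and close in worker threads so the event loop stays free.
    f = await asyncio.to_thread(open, path, "wb")
    written = 0
    try:
        async for chunk in chunks:
            written += await asyncio.to_thread(f.write, chunk)
    finally:
        await asyncio.to_thread(f.close)
    return written


async def main() -> tuple:
    path = os.path.join(tempfile.mkdtemp(), "downloaded_file")
    written = await save_stream(fake_chunks(), path)
    return written, os.path.getsize(path)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This is only meant to show the mechanism. In the solutions below, aiofiles provides the same effect with a simpler async with / await f.write(...) interface.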

Solution 1

Use this solution if you would like to reuse the HTTP client across your application.

from fastapi import FastAPI, Request
from contextlib import asynccontextmanager
from fastapi.responses import StreamingResponse
from starlette.background import BackgroundTask
import asyncio
import aiofiles
import httpx


async def download_large_file(client: httpx.AsyncClient):
    large_file_url = 'http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4'
    path = 'save_to/video.mp4'
    req = client.build_request('GET', large_file_url)
    r = await client.send(req, stream=True)
    try:
        async with aiofiles.open(path, 'wb') as f:
            async for chunk in r.aiter_raw():
                await f.write(chunk)
    finally:
        await r.aclose()  # release the connection even if the download fails

    
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialise the Client on startup and add it to the state
    async with httpx.AsyncClient() as client:
        # Keep a reference so the task is not garbage-collected mid-download
        task = asyncio.create_task(download_large_file(client))
        yield {'client': client}
        # The Client closes on shutdown


app = FastAPI(lifespan=lifespan)


@app.get('/')
async def home():
    return 'Hello World!'


@app.get('/download')
async def download_some_file(request: Request):
    client = request.state.client  # reuse the HTTP client
    req = client.build_request('GET', 'https://www.example.com')
    r = await client.send(req, stream=True)
    return StreamingResponse(r.aiter_raw(), background=BackgroundTask(r.aclose)) 

Solution 2

Use this solution if you don't need to reuse the HTTP client and only need it at startup.

from fastapi import FastAPI
from contextlib import asynccontextmanager
import asyncio
import aiofiles
import httpx


async def download_large_file():
    large_file_url = 'http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4'
    path = 'save_to/video.mp4'
    async with httpx.AsyncClient() as client:
        async with client.stream('GET', large_file_url) as r:
            async with aiofiles.open(path, 'wb') as f:
                async for chunk in r.aiter_raw():   
                    await f.write(chunk)


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Keep a reference so the task is not garbage-collected mid-download
    task = asyncio.create_task(download_large_file())
    yield


app = FastAPI(lifespan=lifespan)


@app.get('/')
async def home():
    return 'Hello World!'
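Neither solution stops the download if the application shuts down first, and endpoints have no way to tell whether the file is complete. One possible extension is sketched below using only the standard library, so it runs without FastAPI installed: app is just a placeholder argument, asyncio.sleep replaces the real download, and the names download_ready, download_task and the delay parameter are assumptions made for illustration.

```python
import asyncio
from contextlib import asynccontextmanager, suppress


async def download_large_file(ready: asyncio.Event, delay: float) -> None:
    # Stand-in for the real download; sleeps instead of fetching anything.
    await asyncio.sleep(delay)
    ready.set()  # signal that the file is now complete


@asynccontextmanager
async def lifespan(app, delay: float = 0.01):
    ready = asyncio.Event()
    task = asyncio.create_task(download_large_file(ready, delay))
    try:
        # The yielded dict mirrors the lifespan state used in Solution 1.
        yield {"download_ready": ready, "download_task": task}
    finally:
        # On shutdown, stop a download that is still running.
        task.cancel()
        with suppress(asyncio.CancelledError):
            await task


async def demo(delay: float, wait: float) -> tuple:
    # Simulates an application that runs for `wait` seconds, then shuts down.
    async with lifespan(None, delay) as state:
        ready_at_start = state["download_ready"].is_set()
        await asyncio.sleep(wait)
        ready_later = state["download_ready"].is_set()
        task = state["download_task"]
    return ready_at_start, ready_later, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo(0.01, 0.1)))
    print(asyncio.run(demo(10, 0.01)))
```

In a real application, an endpoint could read the event from request.state and return an error status until it is set, rather than serving a partially written file.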