I have a Flask web-scraping app that works locally but fails when I deploy it.
My problems started when I added Playwright so I could scrape a page that uses JavaScript to render HTML that isn't available in the initial response. In one case I even needed to open an entirely new window and parse that new document's HTML.
I use Playwright with headless Chromium to navigate, and Beautiful Soup to do the actual scraping.
It works fine locally, but when I deploy to Render I get the following error:
Executable doesn't exist at /opt/render/.cache/ms-playwright/chromium-1091/chrome-linux/chrome
"Looks like Playwright was just installed or updated.
Please run the following command to download new browsers:
playwright install"
The issue appears to be that Render doesn't have Chromium available by default, and I've had trouble installing it natively.
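One way to confirm what the error is reporting is to check, from inside the deployed process, whether any Chromium binary exists where Playwright looks. The sketch below uses only the standard library. It assumes the `chromium-*/chrome-linux/chrome` layout seen in the error message, and that browsers live under `PLAYWRIGHT_BROWSERS_PATH` when that variable is set and under `~/.cache/ms-playwright` otherwise. The helper names `browsers_dir` and `find_chromium` are made up for illustration.

```python
import os
from pathlib import Path


def browsers_dir(env=None):
    """Return the directory Playwright is assumed to search for browsers."""
    env = os.environ if env is None else env
    override = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if override:
        return Path(override)
    # Assumed Linux default, consistent with the path in the error message.
    return Path.home() / ".cache" / "ms-playwright"


def find_chromium(root):
    """List Chromium executables under root, using the layout from the error."""
    pattern = "chromium-*/chrome-linux/chrome"
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())


if __name__ == "__main__":
    root = browsers_dir()
    found = find_chromium(root)
    print(f"browsers directory: {root}")
    print("chromium executables:", [str(p) for p in found] or "none found")
```

Running this as a one-off script in the deployed environment shows which directory is being searched and whether anything was actually downloaded into it.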
I have been pointed to these links about installing Chromium:
https://community.render.com/t/installing-headless-chromium-w-o-docker/5185
https://community.render.com/t/playwright-install-with-chromium/11218
The second link mentions, at the top, that the "Recommended solution" is to use a Dockerfile.
Here is the Dockerfile I came up with:
FROM python:3.9.6-slim
RUN apt-get update && apt-get install -y chromium
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/lib/playwright
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD 1
RUN pip install [email protected]
RUN pip install playwright==1.21.1
RUN playwright install
WORKDIR /usr/src/app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
RUN apt-get update && apt-get install -y \
libglib2.0-0 \
libnss3 \
libnspr4 \
libdbus-1-3 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libcups2 \
libdrm2 \
libxcb1 \
libxkbcommon0 \
libatspi2.0-0 \
libx11-6 \
libxcomposite1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxrandr2 \
libgbm1 \
libpango-1.0-0 \
libcairo2 \
libasound2
EXPOSE 80
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:80", "app:app"]
But I get the same error when I deploy to Render.
My build command is "pip install -r requirements.txt".
Am I missing something? I tried to update the build command, but Render doesn't recognize the "docker build" syntax.
I've been on this for over a week, and although the app runs, I'd love to include these new websites in the automatic updates to make the app more usable.
Any advice is helpful, and if I need to provide more information please let me know.
In my case, I needed to start from scratch on Render with a new web service so I could use "Deploy an existing image from a registry".
I had been updating a GitHub repository, which meant Render never even used my Dockerfile.
I'm still having trouble getting the container to run on Render, but I wanted to post this in case anyone else has this specific problem.
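One clue that the Dockerfile was not being used: it sets `PLAYWRIGHT_BROWSERS_PATH=/usr/lib/playwright`, yet the error mentions `/opt/render/.cache/ms-playwright`, which suggests the process was not running inside that image. Below is a rough sketch of that comparison. The helper name `error_uses_browsers_path` is hypothetical, and the regular expression assumes the error wording quoted in the question.

```python
import re

# Wording assumed from the error quoted in the question.
_ERROR_PATH = re.compile(r"Executable doesn't exist at (\S+)")


def error_uses_browsers_path(message, expected_root):
    """Return True if the missing executable sits under expected_root."""
    match = _ERROR_PATH.search(message)
    if match is None:
        raise ValueError("no executable path found in message")
    missing = match.group(1)
    # Normalize so "/usr/lib/playwright" does not match "/usr/lib/playwright2".
    root = expected_root.rstrip("/") + "/"
    return missing.startswith(root)


if __name__ == "__main__":
    message = (
        "Executable doesn't exist at "
        "/opt/render/.cache/ms-playwright/chromium-1091/chrome-linux/chrome"
    )
    # The Dockerfile in the question sets this value.
    print(error_uses_browsers_path(message, "/usr/lib/playwright"))  # False
```

A `False` result here means the path in the error does not come from the Dockerfile's setting, so it is worth checking how the service is actually being built before debugging the image itself.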