Deploying a Low-Memory Headless Browser API Service with FastAPI

Problem Overview

Are you struggling to fetch JavaScript-heavy web pages on a low-memory server? Maybe you need an efficient way to scrape or render dynamic content without breaking the bank on resource usage. If you’re running a VPS or a small cloud instance with just 2GB of RAM, this guide is for you. We’ll walk through setting up a headless browser API service using FastAPI and Playwright that won’t hog your server’s resources.

What You Need

Before diving in, make sure you have:

  • A server running Debian or Ubuntu.
  • Access to a terminal with sudo privileges.
  • Python 3.8 or later installed (recent FastAPI and Playwright releases no longer support 3.7).

Step 1: Prepare Your Server

First, let’s clean up your server to ensure a fresh start. These commands stop any leftover Chrome or Chromium processes and clear out unused packages:

sudo pkill -f chrome
sudo apt-get clean
sudo apt-get autoremove

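The first command kills any old Chrome or Chromium processes (pkill -f matches anything with “chrome” in its command line), and the other two empty the apt cache and remove packages nothing depends on anymore. It’s good practice to start from a clean slate.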
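
Since the whole point is to fit in a small memory budget, it can help to check how much memory is actually free before going further. The following is a minimal sketch, assuming a Linux host that exposes /proc/meminfo; the function names are hypothetical and not part of any library.

```python
from pathlib import Path
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few counters have no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text: str) -> float:
    """Return the MemAvailable figure converted from kB to MiB."""
    return parse_meminfo(text)["MemAvailable"] / 1024


def read_available_mib(path: str = "/proc/meminfo") -> float:
    """Read the live value from the running system."""
    return available_mib(Path(path).read_text())
```

Calling read_available_mib() on the server prints a number you can compare against the 2GB budget before and after starting the service.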

Step 2: Install System Dependencies

Next, we’ll install the bare minimum dependencies for running Chromium headless:

sudo apt-get update
sudo apt-get install -y wget unzip
sudo apt-get install -y libnss3 libgconf-2-4 libxss1 libx11-xcb1

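These packages provide shared libraries that Chromium loads at startup. Keeping the installation minimal helps save disk space. Some newer Debian and Ubuntu releases no longer package libgconf-2-4; if apt cannot find it, leave it out. If Chromium later refuses to start because a library is missing, Playwright’s own playwright install-deps chromium command (available once Step 4 is done) installs the full set it expects.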
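
To confirm the libraries are visible to the dynamic linker, a quick check from Python is possible. This is a sketch only: the short names below are an assumed mapping from the apt packages to their library names (libnss3 to "nss3", libxss1 to "Xss", libx11-xcb1 to "X11-xcb"), and the helper names are hypothetical.

```python
from ctypes.util import find_library
from typing import Dict, List, Optional

# Assumed linker names for the packages installed above; adjust for your distro.
CHROMIUM_LIBS = ["nss3", "Xss", "X11-xcb"]


def locate_libs(names=CHROMIUM_LIBS) -> Dict[str, Optional[str]]:
    """Map each short library name to what find_library reports (None if not found)."""
    return {name: find_library(name) for name in names}


def missing_libs(located: Dict[str, Optional[str]]) -> List[str]:
    """Return the names that could not be located, sorted for stable output."""
    return sorted(name for name, path in located.items() if path is None)
```

An empty list from missing_libs(locate_libs()) suggests the packages landed where the linker can see them.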

Step 3: Set Up Python Environment

Now, let’s create a dedicated Python virtual environment:

python3 -m venv playwright-env
source playwright-env/bin/activate

This isolates your project dependencies from the system Python installation, avoiding potential conflicts.
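
If you are unsure whether the environment is really active, Python can tell you. A minimal sketch (the function name is hypothetical):

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the system installation.
    return sys.prefix != sys.base_prefix
```

Run it with the python from playwright-env; a False result means the activate step did not take effect in the current shell.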

Step 4: Install FastAPI and Playwright

With your virtual environment activated, install FastAPI and Playwright:

pip install fastapi uvicorn playwright

This installs the latest released version of each package; nothing is pinned, so record the versions in a requirements file if you need reproducible installs. Next, download only Chromium:

playwright install chromium

This keeps your setup lightweight by skipping other browsers.
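
To record which versions pip actually resolved, the standard library can report them. The sketch below assumes Python 3.8 or later (for importlib.metadata); the helper names are hypothetical.

```python
from importlib import metadata
from typing import Dict, List, Optional

PACKAGES = ("fastapi", "uvicorn", "playwright")


def installed_versions(names=PACKAGES) -> Dict[str, Optional[str]]:
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def requirements_lines(versions: Dict[str, Optional[str]]) -> List[str]:
    """Format installed packages as 'name==version' lines for a requirements file."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]
```

Writing the output of requirements_lines(installed_versions()) to requirements.txt gives you a pinned set to reinstall from later.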

Step 5: Create the FastAPI Application

Let’s write a simple FastAPI app. Create a file named server.py:

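from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from playwright.sync_api import Error as PlaywrightError, sync_playwright

app = FastAPI()

class RenderRequest(BaseModel):
    # JSON body, e.g. {"url": "https://example.com", "wait_ms": 500}
    url: str
    wait_ms: int = 0

@app.post("/render")
def render(req: RenderRequest):
    # Launch a fresh Chromium per request so no browser stays resident between calls.
    with sync_playwright() as p:
        browser = p.chromium.launch(args=["--no-sandbox", "--disable-gpu", "--enable-low-end-device-mode"])
        try:
            page = browser.new_page()
            page.goto(req.url)
            if req.wait_ms > 0:
                page.wait_for_timeout(req.wait_ms)
            return {"html": page.content()}
        except PlaywrightError as exc:
            # Navigation failures and timeouts become a 502 instead of a bare 500.
            raise HTTPException(status_code=502, detail=str(exc))
        finally:
            browser.close()

This code sets up an endpoint at /render that takes a JSON body with a url and an optional wait_ms, then returns the rendered HTML. Declaring the body as a model matters: plain str and int parameters on a FastAPI route are read from the query string, not from JSON. The browser is closed even when navigation fails, and Playwright errors are reported as a 502 response.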
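
Because every request launches its own Chromium, several simultaneous requests can exhaust a 2GB machine. One way to cap this is a semaphore around the browser work. The following is a sketch under assumptions, not part of the original service: MAX_BROWSERS, Busy, and browser_slot are hypothetical names, and the limit of one is only a starting point.

```python
import threading
from contextlib import contextmanager

# Assumption: one Chromium at a time is a safe ceiling for a 2GB server.
MAX_BROWSERS = 1
_slots = threading.BoundedSemaphore(MAX_BROWSERS)


class Busy(Exception):
    """Raised when no browser slot frees up within the timeout."""


@contextmanager
def browser_slot(timeout_s: float = 10.0):
    """Hold one of the MAX_BROWSERS slots for the duration of the with-block."""
    if not _slots.acquire(timeout=timeout_s):
        raise Busy("all browser slots are in use")
    try:
        yield
    finally:
        # Always hand the slot back, even if rendering raised.
        _slots.release()
```

Inside render, the Playwright code would sit under `with browser_slot():`, and a Busy error could be turned into an HTTPException with status code 503 so clients know to retry.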

Step 6: Run Your Application

To run your FastAPI application, execute the following command:

uvicorn server:app --host 0.0.0.0 --port 3000 --reload

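This command launches your FastAPI server on port 3000 and listens on all network interfaces, so the /render endpoint accepts POST requests from other machines. The --reload flag restarts the server whenever server.py changes, which is convenient during development; drop it for a long-running deployment, since the reloader runs as an extra process.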

Step 7: Testing the Endpoint

To test if everything is working, you can use curl or Postman to make a request:

curl -X POST "http://your-server-ip:3000/render" -H "Content-Type: application/json" -d '{"url": "https://example.com"}'

You should receive the rendered HTML in response. If not, check your server logs for any errors.
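
If you would rather call the service from Python than from curl, the standard library is enough. This sketch mirrors the curl request above; the function names and the 60-second timeout are assumptions for illustration.

```python
import json
import urllib.request


def build_render_request(base_url: str, url: str, wait_ms: int = 0) -> urllib.request.Request:
    """Build the POST request with the same JSON body the curl example sends."""
    payload = json.dumps({"url": url, "wait_ms": wait_ms}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/render",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_html(base_url: str, url: str, wait_ms: int = 0, timeout_s: float = 60.0) -> str:
    """Send the request and return the 'html' field of the JSON response."""
    req = build_render_request(base_url, url, wait_ms)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))["html"]
```

For example, fetch_html("http://your-server-ip:3000", "https://example.com") should return the same HTML the curl command prints inside its JSON response.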

Final Touches

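Because the service listens on all interfaces and will load any URL it is given, restrict who can reach port 3000 (for example with a firewall rule) and consider requiring an API key on every request. A random 64-character hex key works well for this purpose.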
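
Generating and checking such a key takes only the standard library. A minimal sketch, with hypothetical function names; how the key reaches the server (a request header, for instance) is left to you.

```python
import secrets
from typing import Optional


def generate_api_key() -> str:
    # 32 random bytes rendered as hex gives a 64-character key.
    return secrets.token_hex(32)


def is_authorized(presented: Optional[str], expected: str) -> bool:
    """Check a client-supplied key against the configured one."""
    if not presented:
        return False
    # compare_digest is designed to resist timing attacks on the comparison.
    return secrets.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Generate the key once, store it outside the code (an environment variable is a common choice), and reject any request for which is_authorized returns False.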

And there you have it! A low-memory headless browser API service ready to scrape dynamic content.
