Problem Overview
Are you struggling to fetch JavaScript-heavy web pages on a low-memory server? Maybe you need an efficient way to scrape or render dynamic content without breaking the bank on resource usage. If you’re running a VPS or a small cloud instance with just 2GB of RAM, this guide is for you. We’ll walk through setting up a headless browser API service using FastAPI and Playwright that won’t hog your server’s resources.
What You Need
Before diving in, make sure you have:
- A server running Debian or Ubuntu.
- Access to a terminal with sudo privileges.
- Python 3 installed. Current Playwright and FastAPI releases no longer support 3.7, so use 3.9 or later to be safe.
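To confirm the interpreter is new enough before going further, you can run a quick check like this one. The MIN_VERSION value of (3, 9) is an assumption. Adjust it to whatever the Playwright and FastAPI release notes list for the versions you install.

```python
import sys

# Assumed minimum; check the Playwright and FastAPI docs for the versions you install.
MIN_VERSION = (3, 9)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    needed = f"{MIN_VERSION[0]}.{MIN_VERSION[1]}"
    if python_is_recent_enough():
        print(f"Python {current} is fine.")
    else:
        print(f"Python {current} is too old; install {needed} or later.")
        sys.exit(1)
```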
Step 1: Prepare Your Server
First, let’s clear out anything left over from earlier attempts:
sudo pkill -f chrome
sudo apt-get clean
sudo apt-get autoremove
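The first command stops every process whose command line contains "chrome", including Chromium instances left behind by Playwright (and any other Chrome or Chromium running on the server). The other two delete cached package files and remove packages that are no longer needed. Starting from a clean slate makes later problems easier to diagnose.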
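If you would rather see what pkill -f chrome is about to stop before running it, the short read-only sketch below lists processes whose full command line contains "chrome" as a substring, which is the kind of match pkill -f makes for this simple pattern. It reads Linux's /proc filesystem directly, so it only works on Linux. The function name find_processes is illustrative.

```python
import os


def find_processes(pattern="chrome"):
    """List (pid, command line) pairs whose full command line contains `pattern`.

    Read-only: shows what `pkill -f pattern` would roughly match, but kills nothing.
    """
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric directories correspond to processes
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments are NUL-separated; join them with spaces the way ps displays them.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            matches.append((int(entry), cmdline))
    return matches


if __name__ == "__main__":
    for pid, cmdline in find_processes():
        print(pid, cmdline)
```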
Step 2: Install System Dependencies
Next, we’ll install the bare minimum dependencies for running Chromium headless:
sudo apt-get update
sudo apt-get install -y wget unzip
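sudo apt-get install -y libnss3 libxss1 libx11-xcb1 python3-venv
These are some of the shared libraries headless Chromium loads. The list is deliberately minimal, so install any other library Chromium reports as missing. python3-venv provides the venv module used in Step 3, which Debian and Ubuntu package separately. Older guides also list libgconf-2-4, but it is missing from some newer releases, and apt-get aborts the whole install when it cannot find a requested package.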
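To check whether the system can already locate some of these libraries, Python's standard ctypes.util.find_library offers a quick test. The mapping from apt package names to linker names below is an assumption, and find_library resolves names through tools such as ldconfig, so a None result means "not found by that lookup", not necessarily "not installed".

```python
from ctypes.util import find_library

# Assumed linker names for some libraries Chromium loads:
#   libnss3 -> nss3, libx11-xcb1 -> X11-xcb, libxss1 -> Xss
REQUIRED_LIBS = ["nss3", "X11-xcb", "Xss"]


def missing_libraries(names=REQUIRED_LIBS):
    """Return the library names that find_library() could not resolve."""
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed libraries were found.")
```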
Step 3: Set Up Python Environment
Now, let’s create a dedicated Python virtual environment:
python3 -m venv playwright-env
source playwright-env/bin/activate
This isolates your project dependencies from the system Python installation, avoiding potential conflicts.
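You can confirm from Python itself that the virtual environment is active: inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the system installation. The helper name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv():
    """True when running inside a venv created with `python3 -m venv`."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Active virtual environment:", sys.prefix)
    else:
        print("Not in a virtual environment; run: source playwright-env/bin/activate")
```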
Step 4: Install FastAPI and Playwright
With your virtual environment activated, install FastAPI and Playwright:
pip install fastapi uvicorn playwright
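This installs the latest release of each package. For reproducible deployments, record the versions that work for you (for example with pip freeze > requirements.txt) and pin them. Then download only the Chromium browser: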
playwright install chromium
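This keeps your setup lightweight by skipping Firefox and WebKit. If Chromium later fails to start because system libraries are missing, playwright install-deps chromium installs them through apt (it needs root privileges).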
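Because pip installs whatever is newest, it helps to record the versions that actually ended up in the environment so you can reinstall the same set later. Here is a minimal sketch using the standard importlib.metadata module. It assumes the three package names from the pip install line above and produces lines in requirements-file format.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["fastapi", "uvicorn", "playwright"]


def pinned_requirements(packages=PACKAGES):
    """Map each package to a 'name==version' line, or None if it is not installed."""
    pins = {}
    for name in packages:
        try:
            pins[name] = f"{name}=={version(name)}"
        except PackageNotFoundError:
            pins[name] = None  # not installed in this environment
    return pins


if __name__ == "__main__":
    for name, line in pinned_requirements().items():
        print(line if line else f"# {name} is not installed")
```

Saving the printed lines as requirements.txt lets you recreate the environment later with pip install -r requirements.txt.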
Step 5: Create the FastAPI Application
Let’s write a simple FastAPI app. Create a file named server.py:
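from fastapi import FastAPI, HTTPException
from playwright.sync_api import Error as PlaywrightError, sync_playwright
from pydantic import BaseModel

app = FastAPI()

class RenderRequest(BaseModel):
    url: str
    wait_ms: int = 0  # extra time to let page scripts run, in milliseconds

@app.post("/render")
def render(req: RenderRequest):
    # Declared with plain "def", so FastAPI runs it in a worker thread,
    # where Playwright's sync API is allowed.
    with sync_playwright() as p:
        browser = p.chromium.launch(args=["--no-sandbox", "--disable-gpu", "--enable-low-end-device-mode"])
        try:
            page = browser.new_page()
            page.goto(req.url)
            if req.wait_ms > 0:
                page.wait_for_timeout(req.wait_ms)
            content = page.content()
        except PlaywrightError as exc:
            # Navigation failures and timeouts become a 502 instead of a crash.
            raise HTTPException(status_code=502, detail=str(exc))
        finally:
            browser.close()  # always free Chromium's memory, even on errors
    return {"html": content}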
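This code sets up a POST endpoint at /render that takes a URL and an optional wait time in milliseconds, loads the page in headless Chromium, and returns the rendered HTML. The browser is launched and closed on every request, which frees its memory between calls.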
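On a 2GB server the main risk is several requests arriving at once, each starting its own Chromium process. One way to bound that, sketched here as an assumption rather than part of the setup above, is a semaphore that lets only a fixed number of renders run at the same time. MAX_CONCURRENT_RENDERS, render_slots, and run_limited are hypothetical names, and peak_concurrency only simulates requests with a sleep.

```python
import threading
import time

# Hypothetical limit: how many Chromium instances may run at once on a small server.
MAX_CONCURRENT_RENDERS = 2
render_slots = threading.BoundedSemaphore(MAX_CONCURRENT_RENDERS)


def run_limited(task, *args, **kwargs):
    """Run `task` only when a render slot is free; other callers block until one frees up."""
    with render_slots:
        return task(*args, **kwargs)


def peak_concurrency(n_requests=6, work_seconds=0.05):
    """Simulate simultaneous requests and report the most renders seen running at once."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_render():
        # Stand-in for launching Chromium and loading a page.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(work_seconds)
        with lock:
            state["active"] -= 1

    threads = [threading.Thread(target=run_limited, args=(fake_render,)) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrent renders:", peak_concurrency())
```

In server.py you could wrap the browser work inside render in a with render_slots: block, so FastAPI's worker threads wait for a slot instead of starting extra browsers.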
Step 6: Run Your Application
To run your FastAPI application, execute the following command:
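uvicorn server:app --host 0.0.0.0 --port 3000
This starts the FastAPI server on port 3000 in a single process, ready to accept POST requests at /render. Add --reload only while developing: it runs an extra process that watches your files for changes, which costs memory you can ill afford on a 2GB server.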
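To see how much headroom the server has while Chromium is running, you can read MemAvailable from Linux's /proc/meminfo, which reports the kernel's estimate of memory available for new work. This is a minimal parsing sketch; the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def available_mib(path="/proc/meminfo"):
    """Return MemAvailable from the live system, converted from kB to MiB."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"] // 1024


if __name__ == "__main__":
    print(f"MemAvailable: {available_mib()} MiB")
```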
Step 7: Test the Endpoint
To test if everything is working, you can use curl or Postman to make a request:
curl -X POST "http://your-server-ip:3000/render" -H "Content-Type: application/json" -d '{"url": "https://example.com"}'
You should receive the rendered HTML in response. If not, check your server logs for any errors.
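If you prefer Python to curl, the same request can be sent with the standard library's urllib.request. The helper names are hypothetical, and the body mirrors the curl example with the optional wait_ms field added.

```python
import json
import urllib.request


def build_render_request(base_url, url, wait_ms=0):
    """Build a POST /render request carrying a JSON body like the curl example."""
    body = json.dumps({"url": url, "wait_ms": wait_ms}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/render",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_rendered_html(base_url, url, wait_ms=0, timeout=60):
    """Send the request and return the "html" field of the JSON response."""
    request = build_render_request(base_url, url, wait_ms)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["html"]
```

Usage, with your server's address substituted: fetch_rendered_html("http://your-server-ip:3000", "https://example.com", wait_ms=1000).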
Final Touches
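Before leaving the service running, restrict who can use it. Because it listens on 0.0.0.0, anyone who can reach port 3000 can make your server fetch arbitrary URLs, so firewall the port (or bind to 127.0.0.1 behind a reverse proxy) and require an API key. A random 64-character hex key, which is 32 random bytes, works well; openssl rand -hex 32 generates one.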
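Python's secrets module can produce such a key, and when checking a key sent by a client it is safer to compare in constant time with hmac.compare_digest than with ==. How the key reaches your endpoint (for example, a header named X-API-Key) is up to you; the header name and the helper names here are hypothetical.

```python
import hmac
import secrets


def generate_api_key():
    """Return a random 64-character hex string (32 bytes from the OS CSPRNG)."""
    return secrets.token_hex(32)


def key_is_valid(provided, expected):
    """Compare a client-supplied key against the real one in constant time."""
    if provided is None:
        return False  # no key sent at all
    return hmac.compare_digest(provided.encode(), expected.encode())


if __name__ == "__main__":
    print(generate_api_key())
```

One approach is to store the generated key in an environment variable on the server and have render raise HTTPException(status_code=401) whenever key_is_valid returns False for the incoming header value.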
And there you have it! A low-memory headless browser API service ready to scrape dynamic content.