In today’s fast-paced digital world, URLs are more than just links. They’re entryways into vast resources and opportunities. But as with anything digital, these URLs can sometimes become inactive or “broken”. For those of us who manage vast lists of URLs, whether they’re for product listings, reference materials, or any other purpose, ensuring these links are active becomes crucial.
Imagine you’re an event manager at a global corporation, and you’re tasked with validating a spreadsheet full of URLs for upcoming event locations. These links direct visitors to landing pages that provide crucial details about each event. Now, you have thousands of URLs to verify, and visiting each manually isn’t feasible. This is where our Python script comes in handy!
Using Python together with the openpyxl library for reading Excel files and the Requests library for making HTTP requests, we wrote a script that:
- Reads URLs from an Excel file.
- Checks the HTTP status of each URL.
- Writes the result (“ACTIVE” or “NOT ACTIVE”) to a new CSV file.
Breaking Down the Script
Libraries and Dependencies
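The core of our script rests on three Python modules: openpyxl, requests, and csv. openpyxl and requests are third-party packages that need to be installed (for example with pip), while csv ships with Python’s standard library. openpyxl lets us read Excel files, requests lets us send HTTP requests to each URL, and csv writes the results.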
The URL Checking Function
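The function get_http_status_code(url) takes a URL, sends an HTTP request to it, and looks at the status code of the response. If the link leads to a “404 – Not Found” error, it’s deemed “NOT ACTIVE”. Otherwise, it’s “ACTIVE”. Keep in mind that under this rule any other response, including a server error such as 500, still counts as “ACTIVE”.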
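Here is a minimal sketch of how such a check could look. The script’s actual body isn’t reproduced in this post, so treat the details as assumptions: the 10-second timeout, the helper name status_label, and the choice to report a URL as “NOT ACTIVE” when no response arrives at all.

```python
import requests


def get_http_status_code(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        # A GET is used here because some servers reject HEAD requests.
        response = requests.get(url, timeout=timeout)
    except requests.RequestException:
        # Covers DNS failures, timeouts, refused connections, and malformed URLs.
        return None
    return response.status_code


def status_label(status_code):
    """Map a status code to the report label (hypothetical helper)."""
    # Assumption: a URL that never answered (None) is reported as NOT ACTIVE.
    if status_code is None or status_code == 404:
        return "NOT ACTIVE"
    return "ACTIVE"
```

Calling status_label(get_http_status_code(url)) then gives the label for a single link. Keeping the network call and the labelling rule apart makes it easy to tighten the rule later, for instance to treat every 4xx or 5xx response as inactive.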
Sometimes, the spreadsheet might have incomplete URLs, missing “http” or “https” at the beginning. Our reconstruct_url(url) function adds the missing scheme so that these links can actually be requested.
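One way this repair step might be written is shown below. It is only a sketch: defaulting to “https://” is our assumption here, and a site that only answers over plain HTTP would need “http://” instead.

```python
def reconstruct_url(url):
    """Return url with a scheme in front if it does not already have one."""
    cleaned = url.strip()
    # Leave entries alone when they already start with http:// or https://.
    if cleaned.lower().startswith(("http://", "https://")):
        return cleaned
    # Assumption: default to https; leading slashes such as "//host" are dropped.
    return "https://" + cleaned.lstrip("/")
```

With this version, an entry like example.com/events comes back as https://example.com/events, while entries that already carry a scheme pass through unchanged apart from surrounding whitespace.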
Before checking, we ensure the data looks like a URL using the is_url(string) function. This basic validation step prevents unnecessary checks and potential errors.
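A basic check of this kind can be sketched with the standard library’s urllib.parse. The exact rules are our assumption, not necessarily what the original function does: the entry must be a string without spaces, use http or https if it names a scheme, and have a host that contains a dot.

```python
from urllib.parse import urlparse


def is_url(string):
    """Return True if string plausibly is a web address (rough heuristic)."""
    if not isinstance(string, str):
        return False
    candidate = string.strip()
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    # Accept scheme-less entries such as "example.com/page" for this test.
    if "://" not in candidate:
        candidate = "http://" + candidate
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed hosts, e.g. unbalanced brackets.
        return False
    return parsed.scheme in ("http", "https") and "." in parsed.netloc
```

This is deliberately loose. It filters out header cells, notes, and blanks, and leaves the real verdict to the HTTP request.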
Reading, Checking, and Writing
The script reads each URL from the Excel sheet, validates it, checks its status, and then writes the results (along with the original URL) to a new CSV file.
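The read-and-write loop could be sketched as follows. Several things here are assumptions for illustration: the function name write_url_report, the URLs sitting in the first column of the active sheet, the “URL” and “Status” header row, and passing the checking step in as a function so the loop stays independent of how a link is judged.

```python
import csv

from openpyxl import load_workbook


def write_url_report(xlsx_path, csv_path, check, column=1):
    """Read one column of URLs from an Excel file and write URL/status rows to a CSV."""
    # read_only mode streams rows, which helps with sheets holding thousands of URLs.
    workbook = load_workbook(xlsx_path, read_only=True)
    try:
        sheet = workbook.active
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["URL", "Status"])
            rows = sheet.iter_rows(min_col=column, max_col=column, values_only=True)
            for row in rows:
                value = row[0]
                if value is None:
                    continue  # skip empty cells
                url = str(value).strip()
                if not url:
                    continue
                # check is any function that maps a URL to "ACTIVE" or "NOT ACTIVE".
                writer.writerow([url, check(url)])
    finally:
        workbook.close()
```

A call might look like write_url_report("events.xlsx", "results.csv", check), where both file names are placeholders and check combines the validation, reconstruction, and status steps described above.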
Automating repetitive tasks, like checking the activity of a list of URLs, can save hours of manual work and reduce errors. Python, with its vast ecosystem of libraries and its simplicity, is an excellent tool for such jobs. So next time you’re presented with a mammoth task at your workplace, take a moment to ponder – can Python simplify this?
Remember, it’s not just about working hard, but working smart. Happy coding!