Python

Checking Website Activity with Python

Introduction

In today’s fast-paced digital world, URLs are more than just links. They’re entryways into vast resources and opportunities. But as with anything digital, these URLs can sometimes become inactive or “broken”. For those of us who manage vast lists of URLs, whether they’re for product listings, reference materials, or any other purpose, ensuring these links are active becomes crucial.

Imagine you’re an event manager at a global corporation, and you’re tasked with validating a spreadsheet full of URLs for upcoming event locations. These links direct visitors to landing pages that provide crucial details about each event. Now, you have thousands of URLs to verify, and visiting each manually isn’t feasible. This is where our Python script comes in handy!

The Solution

Using a combination of Python, the Openpyxl library for reading Excel files, and the Requests library for making HTTP requests, we devised a script that:

  • Reads URLs from an Excel file.
  • Checks the HTTP status of each URL.
  • Writes the result (“ACTIVE” or “NOT ACTIVE”) to a new CSV file.

Breaking Down the Script

Libraries and Dependencies

The core of our script rests on three Python packages: openpyxl, requests, and csv. openpyxl lets us interact with Excel files, requests lets us check URLs, and csv is useful for writing results.

The URL Checking Function

The function get_http_status_code(url) takes a URL and checks its HTTP status. If the link leads to a “404 – Not Found” error, it’s deemed “NOT ACTIVE”. Otherwise, it’s “ACTIVE”.

URL Reconstruction

Sometimes, the spreadsheet might have incomplete URLs, missing “http” or “https” at the beginning. Our reconstruct_url(url) function ensures these links are valid for checking.

URL Validation

Before checking, we ensure the data looks like a URL using the is_url(string) function. This basic validation step prevents unnecessary checks and potential errors.

Reading, Checking, and Writing

The script reads each URL from the Excel sheet, validates it, checks its status, and then writes the results (along with the original URL) to a new CSV file.

Conclusion

Automating repetitive tasks, like checking the activity of a list of URLs, can save hours of manual work and reduce errors. Python, with its vast ecosystem of libraries and its simplicity, is an excellent tool for such jobs. So next time you’re presented with a mammoth task at your workplace, take a moment to ponder – can Python simplify this?

Remember, it’s not just about working hard, but working smart. Happy coding!

No Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: