The Problem: The "Fragile Scraper" Syndrome

You spend hours mapping out a website's HTML, only for the developer to update the site a week later. Suddenly, your class names like class="price-v2-blue" are gone, replaced by class="sc-1px7j9".

Your script crashes, your database stays empty, and you have to rewrite your selectors by hand. This is the brittleness problem in web scraping: minor UI updates break the entire data pipeline.

The Solution: Building Resilient Data Pipelines

To stop your scrapers from breaking every time a button changes color, move away from rigid selectors and use these "defensive" scraping techniques:

1. Avoid Highly Specific Selectors

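Stop relying on "Copy selector" in Chrome DevTools, which gives you long positional paths like: div > div:nth-child(2) > span > b. These break as soon as an element is moved or wrapped in a new container. Instead, target attributes that describe what the element means rather than how it looks, such as itemprop, data-testid, or aria-label. These are less likely to change in a redesign.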

Bad: div.main > div.content > h1

Good: h1[itemprop="name"]
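
To make the difference concrete, here is a minimal, dependency-free sketch using Python's built-in html.parser. It finds an element by its attribute alone, so it does not care where that element sits in the page. The AttrTextFinder class, the extract_text helper, and both sample HTML snippets are illustrative assumptions, not part of any library. With a CSS-selector library you would pass the "Good" selector above directly.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AttrTextFinder(HTMLParser):
    """Collect the text of the first non-void element where attr == value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self.depth = 0       # > 0 while we are inside the matched element
        self.found = False   # True once a matching element has been seen
        self.closed = False  # True once the matched element has ended
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.closed or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside the match
        elif dict(attrs).get(self.attr) == self.value:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.closed = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, attr, value):
    """Return the matched element's text, or None if nothing matches."""
    finder = AttrTextFinder(attr, value)
    finder.feed(html)
    finder.close()
    return "".join(finder.chunks).strip() if finder.found else None


# Hypothetical "before" and "after" versions of the same page.
old_layout = ('<div class="main"><div class="content">'
              '<h1 itemprop="name" class="price-v2-blue">Blue Widget</h1>'
              '</div></div>')
new_layout = ('<section class="sc-1px7j9"><header>'
              '<h1 class="sc-9zk2" itemprop="name">Blue <b>Widget</b></h1>'
              '</header></section>')

name_before = extract_text(old_layout, "itemprop", "name")
name_after = extract_text(new_layout, "itemprop", "name")
```

Both calls return "Blue Widget" even though every class name and wrapper element changed between the two layouts. A path like div.main > div.content > h1 would only match the first one.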

2. The "Hidden API" First Strategy

Before you even touch the HTML, open the Network tab in your browser’s developer tools and filter by Fetch/XHR. Many modern sites load their data as JSON from a backend endpoint and render it in the browser. If you scrape the API directly:

The data structure is consistent.

The site layout can change completely without affecting your script.

It is usually faster than downloading and parsing full HTML pages.
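
As a rough sketch of what this looks like in practice, the snippet below uses only the Python standard library. The endpoint URL, the User-Agent string, and the payload shape (a "products" list with "name" and "price" keys) are all assumptions for illustration. Substitute whatever you actually see in the Network tab.

```python
import json
import urllib.request

# Hypothetical endpoint; copy the real one from the Network tab.
API_URL = "https://example.com/api/products?page=1"


def fetch_json(url, timeout=10.0):
    """Request a JSON endpoint and return the decoded payload."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "User-Agent": "my-scraper/0.1",  # placeholder value
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def extract_products(payload):
    """Keep only the fields we need from the assumed payload shape."""
    return [
        {"name": item["name"], "price": item["price"]}
        for item in payload.get("products", [])
    ]


# Offline demonstration with a payload shaped like the assumed response.
sample_payload = json.loads(
    '{"products": [{"name": "Blue Widget", "price": 19.99, "sku": "BW-1"}]}'
)
products = extract_products(sample_payload)

# Against a live site you would call:
# products = extract_products(fetch_json(API_URL))
```

Keeping the fetch and the extraction in separate functions means you can test the extraction against saved responses without touching the network.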

3. Implement Schema Validation

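Use tools like Pydantic (for Python) to validate each record before it is stored. If the price field suddenly comes back as text such as "N/A" instead of a number, or disappears entirely, your script should alert you immediately rather than polluting your database with bad data. Note that Pydantic's default mode converts numeric strings like "19.99" into numbers, so use its strict types if you want those flagged as well.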
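
A minimal sketch of this check is below, assuming Pydantic is installed. The Product model, its two fields, and the validate_record helper are hypothetical; adapt them to your own data. StrictStr and StrictFloat are used so that a price arriving as a string is rejected rather than silently converted.

```python
import logging

from pydantic import BaseModel, StrictFloat, StrictStr, ValidationError

logger = logging.getLogger("scraper")


class Product(BaseModel):
    """Expected shape of one scraped record (hypothetical fields)."""

    name: StrictStr
    price: StrictFloat  # strict: the string "19.99" is rejected, not coerced


def validate_record(raw):
    """Return a Product if raw matches the schema, otherwise log and return None."""
    try:
        return Product(**raw)
    except ValidationError as exc:
        # Replace this with your real alerting (email, chat webhook, etc.).
        logger.warning("Rejected record %r: %s", raw, exc)
        return None


good = validate_record({"name": "Blue Widget", "price": 19.99})
bad_type = validate_record({"name": "Blue Widget", "price": "19.99"})
missing = validate_record({"name": "Blue Widget"})
```

Here good is a Product instance, while bad_type and missing are both None, so only the valid record would reach your database.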

Summary

Websites are living organisms: they change constantly. By prioritizing API discovery, semantic selectors, and schema validation, you can build scrapers that last months instead of days. Don't just scrape hard; scrape smart.