I've built web scrapers with Claude Code for lead generation, competitor price monitoring, market research, and content aggregation. Projects that used to mean hiring a developer for $3,000–$8,000 now take me an afternoon, and the build time costs less than $1,000. But here's the part nobody talks about: most businesses don't need a scraper. They need better data hygiene and a clearer question.
In this post I'll walk through when web scraping makes sense, how Claude Code handles the build, and the legal and technical guardrails you need to stay out of trouble. This isn't a tutorial for hobbyists — it's a practical guide for business owners who need real data at scale.
When Web Scraping Actually Makes Sense
Before you build a scraper, ask yourself: is the data I need available through an API? Many major platforms, including Google Maps, Yelp, and some industry directories, offer official (often paid) API access that's faster, more reliable, and legally cleaner than scraping. LinkedIn is more restrictive: most of its API access is limited to approved partners. If an API exists and the cost is reasonable, use it. Scraping should be your second choice, not your first.
That said, there are legitimate use cases where scraping is the only option:
- Lead generation from niche directories — local business listings, regional association member pages, industry-specific directories that don't offer bulk export
- Competitor price monitoring — tracking product pricing, service packages, or promotional offers across competitor websites
- Market intelligence — aggregating public job postings, real estate listings, event calendars, or news from sources without RSS feeds
- Content aggregation for programmatic SEO — pulling structured public data (city demographics, business hours, product specs) to populate landing pages at scale
If your use case fits one of these patterns and there's no API alternative, scraping is worth exploring. If not, you're solving the wrong problem.
How Claude Code Builds a Web Scraper
Claude Code approaches web scraping the same way a developer would: it writes a script that fetches a web page, parses the HTML, extracts the target data, and outputs it in a usable format (usually JSON or CSV). The difference is speed. In my experience, a task that would take a junior developer two days takes Claude Code about 90 minutes, including testing.
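As a small illustration of that last step, here is a standard-library sketch that turns a list of scraped records (plain dicts) into JSON or CSV. The function names `to_json` and `to_csv` are illustrative, not something Claude Code is guaranteed to produce, and joining list fields with "; " is an assumption to keep CSV cells flat.

```python
import csv
import io
import json


def to_json(records):
    """Serialize scraped records (a list of dicts) as a pretty-printed JSON array."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records, fieldnames):
    """Serialize records as CSV text with a header row.

    CSV cells are flat, so list values (e.g. a list of services) are joined
    with "; ". Missing keys and None become empty cells; extra keys are ignored.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        writer.writerow(
            {k: "; ".join(v) if isinstance(v, list) else v for k, v in rec.items()}
        )
    return buf.getvalue()
```

JSON keeps nested fields intact and suits feeding another script; CSV is what most CRMs and spreadsheets import directly.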
Here's the typical workflow I follow when building a scraper:
Step 1: Define the Target and Data Schema
I start by identifying the exact pages I want to scrape and the fields I need. For a lead generation scraper targeting Vancouver marketing agencies, that might be:
- Agency name
- Website URL
- Contact email (if publicly listed)
- Services offered
- Team size (if available)
Claude Code needs a clear schema. Vague instructions like "get all the info" produce messy output. Specificity matters.
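One way to pin that schema down before prompting is to write it as a typed record. This is a sketch assuming Python 3.9+; the class name `AgencyLead` is hypothetical, and treating name and website as required is my assumption, based on email and team size being the only fields marked optional in the list above.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AgencyLead:
    """One row in the lead list. Fields mirror the schema above."""

    name: str
    website: str
    email: Optional[str] = None       # only if publicly listed
    services: list[str] = field(default_factory=list)
    team_size: Optional[int] = None   # only if the site shows it

    def __post_init__(self):
        # Reject rows missing required data instead of passing junk downstream.
        if not self.name.strip():
            raise ValueError("name is required")
        if not self.website.startswith(("http://", "https://")):
            raise ValueError(f"website must be an absolute URL, got {self.website!r}")
```

Handing Claude Code a definition like this gives you a concrete contract to check its output against, and `asdict()` turns each record back into a plain dict for export.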
Step 2: Inspect the Page Structure
I use browser DevTools to examine the HTML structure of the target pages. I'm looking for consistent selectors — CSS classes, IDs, or data attributes — that identify the elements I want to extract. If the site uses semantic HTML, this step is straightforward. If it's a JavaScript-heavy single-page app, I'll need to use a headless browser (more on that below).
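A quick way to tell which case you're in: elements visible in DevTools may have been added by JavaScript after the page loaded, so check whether your selector's class also appears in the raw HTML the server sends. A minimal standard-library sketch; the helper name `count_class` is hypothetical, and in practice you would pass it the body of an ordinary HTTP GET.

```python
from html.parser import HTMLParser


class _ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes a given class name."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(raw_html, css_class):
    """Count elements with css_class in the HTML as served (no JavaScript is run)."""
    counter = _ClassCounter(css_class)
    counter.feed(raw_html)
    counter.close()
    return counter.count
```

If `count_class(raw_html, "agency-title")` returns 0 for a page where DevTools shows listings, the content is rendered client-side, and a plain requests/BeautifulSoup scraper will come back empty.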
Step 3: Write the Scraper Script
I give Claude Code a prompt like this:
You are building a web scraper in Python using requests and BeautifulSoup.
Target URL: https://example.com/agencies
Extract the following fields from each listing:
- Agency name (inside h2.agency-title)
- Website URL (href of a.website-link)
- Contact email (text inside span.contact-email, if present)
- Services (comma-separated list from ul.services li)
Output a JSON array of objects. Handle pagination by following the "Next" link until none exists. Include error handling for missing fields and rate limiting (2 seconds between requests).
Claude Code generates a working script in about 60 seconds. I test it on a small sample (5–10 pages) to verify accuracy before running the full scrape.
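For reference, here is a sketch of the kind of script that prompt describes. To keep it dependency-free and testable, it uses the standard library's `urllib` and `html.parser` instead of requests and BeautifulSoup, so it is not what Claude Code would literally generate. It assumes the hypothetical markup from the prompt, that each listing begins with its `h2.agency-title`, and that the pagination link's text starts with "Next". Services are kept as a list of strings.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class ListingParser(HTMLParser):
    """Collects agency records and the "Next" link from one listing page.

    Assumes each listing starts with <h2 class="agency-title"> and that its
    other fields follow in document order (hypothetical markup).
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self.next_url = None
        self._capture = None  # (tag, field) whose text we are collecting
        self._buf = []
        self._in_services = False
        self._link = None  # href of the <a> we are inside, if any
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a":
            self._link, self._link_text = attrs.get("href"), []
        if tag == "h2" and "agency-title" in classes:
            # New listing; optional fields stay None / [] if never found.
            self.records.append({"name": None, "website": None, "email": None, "services": []})
            self._capture, self._buf = (tag, "name"), []
        elif not self.records:
            return  # ignore markup before the first listing
        elif tag == "a" and "website-link" in classes:
            self.records[-1]["website"] = attrs.get("href")
        elif tag == "span" and "contact-email" in classes:
            self._capture, self._buf = (tag, "email"), []
        elif tag == "ul" and "services" in classes:
            self._in_services = True
        elif tag == "li" and self._in_services:
            self._capture, self._buf = (tag, "services"), []

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)
        if self._link is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if self._capture and tag == self._capture[0]:
            field, text = self._capture[1], "".join(self._buf).strip()
            if field == "services":
                if text:
                    self.records[-1]["services"].append(text)
            else:
                self.records[-1][field] = text or None
            self._capture = None
        if tag == "ul":
            self._in_services = False
        if tag == "a" and self._link is not None:
            if "".join(self._link_text).strip().startswith("Next"):
                self.next_url = self._link
            self._link = None


def parse_listing_page(html):
    """Return (records, next_href) for one page of HTML."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.records, parser.next_url


def fetch_html(url, timeout=30):
    # Placeholder User-Agent; identify your bot honestly.
    req = Request(url, headers={"User-Agent": "agency-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html, delay=2.0, max_pages=20, sleep=time.sleep):
    """Follow "Next" links from start_url and return every record found."""
    results, url, pages = [], start_url, 0
    while url and pages < max_pages:
        records, next_href = parse_listing_page(fetch(url))
        results.extend(records)
        pages += 1
        url = urljoin(url, next_href) if next_href else None
        if url and pages < max_pages:
            sleep(delay)  # rate limit: pause before the next request
    return results
```

Calling `crawl("https://example.com/agencies", max_pages=5)` and passing the result to `json.dumps` gives you the small sample to check by hand before a full run. The `fetch` and `sleep` parameters exist so the loop can be tested without touching the network.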
Step 4: Handle Edge Cases
Real-world scraping always hits edge cases: pages with missing data, inconsistent HTML structure, anti-bot measures, or dynamic content loaded via JavaScript. Claude Code can handle most of these if you describe the problem clearly.
For JavaScript-heavy sites, I switch to Playwright (which has an official Python API) or Puppeteer (a Node.js library). The prompt changes to:
Rewrite this scraper to use Playwright. Launch a headless browser, navigate to the URL, wait for the element .agency-title to load, then extract the same fields. Close the browser after each page to avoid memory leaks.
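Headless browsers are slower and more expensive to run, but they're usually the most reliable way to scrape modern web apps that rely on client-side rendering. (It's worth checking the DevTools Network tab first: some of these apps load their data from a JSON endpoint you can request directly.)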
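For the anti-bot edge case, the most common signal you'll hit is an HTTP 429 (Too Many Requests) or a temporary 5xx error. A reasonable response is to back off and retry rather than crash or keep hammering the server. A standard-library sketch; the function name, retry count, and delays are illustrative assumptions. It honors a numeric `Retry-After` header when the server sends one and otherwise doubles the wait each attempt.

```python
import time
import urllib.error
from urllib.request import Request, urlopen

# Status codes that usually mean "slow down" or "try again shortly".
RETRYABLE = {429, 500, 502, 503, 504}


def _get_text(url, timeout=30):
    req = Request(url, headers={"User-Agent": "agency-scraper/0.1"})  # placeholder UA
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_with_retries(url, fetch=_get_text, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Fetch url, backing off on throttling and transient server errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise  # permanent error (e.g. 404) or out of attempts
            retry_after = err.headers.get("Retry-After") if err.headers else None
            try:
                wait = float(retry_after)  # server said how many seconds to wait
            except (TypeError, ValueError):
                wait = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(wait)
```

This handles throttling only. A CAPTCHA or an outright block is the site telling you no, and that is a signal to stop rather than something to engineer around.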
Legal and Ethical Guardrails
Web scraping exists in a legal gray area. In Canada and the US, scraping publicly accessible data is generally legal, but there are important limits:
- Respect robots.txt — if a site's robots.txt file disallows scraping, don't scrape it. This is the clearest signal of the site owner's intent.
- Implement rate limiting — don't hammer a server with hundreds of requests per second. I typically set a 1–3 second delay between requests to avoid triggering rate limits or causing performance issues.
- Don't scrape password-protected content — if you need to log in to access the data, you're likely violating the site's terms of service. Stick to public pages only.
- Avoid re-publishing proprietary data — scraping for internal use (lead lists, competitor analysis) is usually fine. Re-publishing that data publicly or selling it crosses into copyright and database rights issues.
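The first two guardrails can be automated. Python's standard library ships `urllib.robotparser`, which reads a site's robots.txt and answers whether a given URL may be fetched and whether the site requests a crawl delay. A sketch; the bot name `agency-scraper` and the 2-second default are placeholder assumptions.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

USER_AGENT = "agency-scraper"  # placeholder: use your own bot name
DEFAULT_DELAY = 2.0            # seconds; within the 1-3 s range above


def load_robots(site_url):
    """Download and parse the site's robots.txt (makes one HTTP request)."""
    rp = RobotFileParser(urljoin(site_url, "/robots.txt"))
    rp.read()
    return rp


def allowed(rp, url):
    """True if robots.txt permits our user agent to fetch url."""
    return rp.can_fetch(USER_AGENT, url)


def polite_delay(rp):
    """Use the site's Crawl-delay if it is longer than our own default."""
    requested = rp.crawl_delay(USER_AGENT)
    return max(DEFAULT_DELAY, float(requested)) if requested is not None else DEFAULT_DELAY
```

In a crawl loop you would skip any URL where `allowed()` returns False and sleep `polite_delay(rp)` seconds between requests.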
When in doubt, consult a lawyer. I'm not one, and this isn't legal advice — just the guardrails I follow in my own work.
What I've Built with Web Scrapers
The most common scraper I build for clients is a lead generation tool. A Vancouver-based B2B consultant wanted a list of every co-working space in Western Canada with contact info. There was no API, no exportable directory — just a website with 200+ listings spread across multiple pages. I built a scraper that extracted name, address, website, and phone number from each listing and output a clean CSV. Total build time: 4 hours. Client saved an estimated 40 hours of manual data entry.
Another common use case is price monitoring. An e-commerce client wanted daily snapshots of competitor pricing for 50 SKUs. The scraper runs nightly via cron, checks each product page, extracts the current price, and logs it to a Google Sheet. When a competitor drops their price below a threshold, the client gets an automated Slack alert. This kind of daily competitive intelligence used to require a full-time analyst. Now it's a 20-line Python script.
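The core of a nightly job like that is small. Below is a sketch of the price-parsing and alert logic, plus a helper that posts to a Slack incoming webhook (which accepts a JSON body with a `text` field). The function names, the thresholds dictionary, the cron schedule, and the North American price format are my assumptions; the Google Sheets logging step is left out.

```python
# Meant to run nightly from cron, e.g. (paths are hypothetical):
#   0 2 * * * /usr/bin/python3 /opt/scrapers/price_check.py
import json
import re
from decimal import Decimal, InvalidOperation
from urllib.request import Request, urlopen


def parse_price(text):
    """Turn displayed text like '$1,299.99' into a Decimal; None if no price is found.

    Assumes North American formatting (comma thousands separator, dot decimal).
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    try:
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None


def price_alerts(current, thresholds):
    """Messages for SKUs whose competitor price is below our alert threshold."""
    alerts = []
    for sku, price in sorted(current.items()):
        limit = thresholds.get(sku)
        if price is not None and limit is not None and price < limit:
            alerts.append(f"{sku}: competitor price {price} is below threshold {limit}")
    return alerts


def post_to_slack(webhook_url, message):
    """POST a message to a Slack incoming webhook (the URL comes from your Slack app)."""
    body = json.dumps({"text": message}).encode("utf-8")
    req = Request(webhook_url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return resp.status
```

Using `Decimal` rather than float avoids rounding surprises when comparing prices against thresholds.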
I've also used scrapers to build programmatic SEO landing pages. For a real estate client, I scraped public MLS data (legally accessible, not behind a login wall) to generate neighborhood profile pages with median home prices, school ratings, and walk scores. That scraper fed directly into a Claude Code content pipeline that generated 150 landing pages in two days.
Common Mistakes to Avoid
The biggest mistake I see: building a scraper without a clear plan for what happens to the data afterward. Scraping is easy. Cleaning, normalizing, deduplicating, and integrating the data into your CRM or marketing automation system — that's where most projects stall. Before you build anything, map out the full pipeline from scrape to action.
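Deduplication is a good example of that unglamorous work: directory sites often list the same business twice with slightly different names or URLs. A minimal sketch that keys each lead on its website domain, falling back to a case- and whitespace-normalized name. The field names follow the agency schema in Step 1; keying on domain is an assumption that breaks when a business's "website" is a shared platform such as a social media page.

```python
from urllib.parse import urlsplit


def normalize_domain(url):
    """Reduce a URL to a comparable key: lowercase host without a leading 'www.'."""
    if not url:
        return None
    if "//" not in url:
        url = "//" + url  # so urlsplit treats a bare "acme.example" as a host
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return host or None


def dedupe_leads(records):
    """Keep the first record per domain, or per normalized name if there is no website."""
    seen, unique = set(), []
    for rec in records:
        name = " ".join((rec.get("name") or "").split()).lower()
        key = normalize_domain(rec.get("website")) or name
        if key and key in seen:
            continue  # duplicate of a lead we already kept
        seen.add(key)
        unique.append(rec)
    return unique
```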
Second mistake: scraping a site that changes frequently without building in error handling. Websites redesign. HTML structure changes. Selectors break. A scraper that works today might fail silently in six weeks. I always include logging and error alerts so I know immediately when something breaks.
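One way to make those failures loud: after each run, check that the scraper returned a plausible number of records and that key fields are still being filled. A sudden drop usually means a selector stopped matching. A sketch; the `check_run` name and its thresholds (at least one record, 80% fill rate) are illustrative assumptions, and you would route the returned problems to whatever alert channel you already use.

```python
import logging

log = logging.getLogger("scraper")


def check_run(records, required=("name", "website"), min_records=1, min_fill_rate=0.8):
    """Return (and log) problems suggesting the site's markup has changed."""
    problems = []
    if len(records) < min_records:
        problems.append(f"only {len(records)} records scraped (expected at least {min_records})")
    if records:
        for key in required:
            filled = sum(1 for rec in records if rec.get(key))
            rate = filled / len(records)
            if rate < min_fill_rate:
                problems.append(f"field {key!r} filled in only {rate:.0%} of records")
    for problem in problems:
        log.error("scrape health check failed: %s", problem)
    return problems
```

Call it at the end of every scheduled run; an empty list means the run looks healthy.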
Third mistake: over-engineering. You don't need a distributed scraping infrastructure for most business use cases. A single Python script running on your laptop or a $5/month VPS is enough to scrape thousands of pages per day. Start simple. Scale only if you actually need to.
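A rough ceiling for a single sequential script, assuming the 2-second delay from the earlier prompt and about 1 second per response (an illustrative figure; real response times vary):

```latex
\text{pages per day} \le \frac{86{,}400\ \text{s}}{2\ \text{s} + 1\ \text{s}} = 28{,}800
```

Even at a fraction of that ceiling, one polite worker covers thousands of pages a day, which is why a cheap VPS is usually enough.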
How to Get Started
If you want to test this for your business, start with a small, well-defined project. Pick a single directory or competitor site, define exactly what data you need, and build a scraper for 10–20 pages first. Once you've verified the data quality and confirmed the legal and technical feasibility, scale up.
If you're not comfortable writing the prompts yourself, I build custom scrapers for clients as part of broader AI automation projects. The typical engagement includes the scraper itself, data normalization, and integration into your existing CRM or spreadsheet workflow.
And if you have questions about whether scraping is the right approach for your specific use case, the FAQ covers most of the common scenarios I've encountered.
Web scraping is a tool, not a strategy. Use it when the data you need doesn't exist anywhere else and when the legal and technical risks are manageable. Done right, it's one of the most cost-effective ways to build competitive intelligence and fuel growth. Done wrong, it's a liability. The difference is knowing when to scrape and when to look for a better solution.