This Python scraper extracts search result data from Costco using the Playwright framework. It navigates Costco's product listings, can route requests through ScrapeOps to reduce anti-bot blocking, and captures product information including pricing, ratings, and availability in a structured format.
- What This Scraper Extracts
- Quick Start
- Supported URLs
- Configuration
- Output Schema
- Anti-Bot Protection
- How It Works
- Error Handling & Troubleshooting
- Alternative Implementations
- breadcrumbs (array): Navigation path leading to the current search results.
- pagination (object): Details on current page, total pages, and navigation links.
- products (array): Comprehensive list of product items containing:
  - Product name (string)
  - Product ID (string)
  - Price (number) and Currency (string)
  - Ratings (value and count)
  - Availability status
  - Images (array of URLs and alt text)
  - Promotions and Badges (e.g., "Instant Savings")
- recommendations: Suggested products based on the search.
- relatedSearches (array): List of similar search terms.
- searchMetadata (object): Metadata regarding the search query and results count.
- sponsoredProducts (array): List of promoted items appearing in search.
- Python 3.7 or higher
- pip package manager (for Python) or npm (for Node.js)
- Install required dependencies:

      pip install playwright beautifulsoup4

- Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-scraper
- Update the API key in the scraper:

      API_KEY = "YOUR-API-KEY"

- Navigate to the scraper directory:

      cd python/playwright/product_search

- Edit the URLs in scraper/costco_scraper_product_search_v1.py:

      # In costco_scraper_product_search_v1.py
      url = "https://www.costco.com/s?keyword=laptop"

- Run the scraper:

      python costco_scraper_product_search_v1.py

The scraper will generate a timestamped JSONL file (e.g., Costco_com_product_search_page_scraper_data_20260114_120000.jsonl) containing all extracted data.
See example-data/product_search.json for a sample of the extracted data structure.
The scraper supports the following URL patterns:
- http://residential-proxy.scrapeops.io:8181
- https://www.costco.com
- https://www.costco.com/s?keyword=laptop
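As a small illustration of the search URL pattern above, the sketch below builds a search URL from a keyword. The helper name `build_search_url` is hypothetical and not part of the scraper; only the `https://www.costco.com/s?keyword=...` shape comes from the example URL.

```python
from urllib.parse import urlencode

# Search path taken from the example URL listed above.
BASE_SEARCH_URL = "https://www.costco.com/s"


def build_search_url(keyword: str) -> str:
    """Return a Costco search URL for the keyword (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the keyword.
    return f"{BASE_SEARCH_URL}?{urlencode({'keyword': keyword})}"


print(build_search_url("laptop"))
print(build_search_url("office chair"))
```

Encoding the keyword with `urlencode` keeps multi-word or punctuated queries valid without manual string handling.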
The scraper supports several configuration options. See the scraper code for available parameters.
The scraper can use ScrapeOps for anti-bot protection and request optimization:

    # ScrapeOps proxy configuration
    proxy_url = f"https://proxy.scrapeops.io/v1/?api_key={API_KEY}"

ScrapeOps Features:
- Proxy rotation (may help reduce IP blocking)
- Request header optimization (can help reduce detection)
- Rate limiting management
- Note: CAPTCHA challenges may occur depending on site behavior and cannot be guaranteed to be resolved automatically
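One way a proxy could be wired into Playwright is through the `proxy` option of `launch()`, which accepts a dict with `server`, `username`, and `password` keys. The sketch below only builds that dict. The username value and the use of the API key as the password are assumptions to verify against the ScrapeOps documentation, and `build_proxy_settings` is a hypothetical helper rather than the scraper's own code.

```python
API_KEY = "YOUR-API-KEY"

# Proxy endpoint as listed in this README.
PROXY_SERVER = "http://residential-proxy.scrapeops.io:8181"


def build_proxy_settings(api_key: str) -> dict:
    """Build a dict in the shape Playwright accepts for its `proxy` launch option."""
    return {
        "server": PROXY_SERVER,
        "username": "scrapeops",  # assumption: confirm in the ScrapeOps docs
        "password": api_key,      # assumption: API key used as the proxy password
    }


proxy = build_proxy_settings(API_KEY)
# With Playwright installed, this would be passed at launch, for example:
#   browser = playwright.chromium.launch(headless=True, proxy=proxy)
```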
The scraper outputs data in JSONL format (one JSON object per line). Each object contains:
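| Field | Type | Description |
|---|---|---|
| breadcrumbs | array | Array of objects: navigation path leading to the current search results |
| pagination | object | Nested fields: current page, total pages, and navigation links |
| products | array | Array of objects: product items |
| recommendations | null | Suggested products based on the search (null value) |
| relatedSearches | array | Similar search terms |
| searchMetadata | object | Nested fields: search query and results count |
| sponsoredProducts | array | Promoted items appearing in search |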
- breadcrumbs (array): Contains a list of object items
- pagination (object): Contains nested data
- products (array): Contains a list of object items
- recommendations (null): See field details
- relatedSearches (array): Contains a list of items
- searchMetadata (object): Contains nested data
- sponsoredProducts (array): Contains a list of items
Field Details:
The products array contains deeply nested objects including aggregateRating (ratingValue, reviewCount), images (url, altText), and promotions (discount descriptions and end dates). pagination includes boolean flags like hasNextPage to facilitate crawling.
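To make the nested fields concrete, here is a minimal sketch that reads `aggregateRating` and `hasNextPage` from one output record. The `name` key and the sample values are illustrative assumptions; check example-data/product_search.json for the actual key names.

```python
from typing import Any, Dict, List


def product_ratings(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect rating info for every product that carries an aggregateRating."""
    rows = []
    for product in record.get("products") or []:
        rating = product.get("aggregateRating") or {}
        if rating.get("ratingValue") is None:
            continue  # skip products without a rating
        rows.append({
            "name": product.get("name"),  # "name" is an assumed key
            "ratingValue": rating["ratingValue"],
            "reviewCount": rating.get("reviewCount"),
        })
    return rows


def has_next_page(record: Dict[str, Any]) -> bool:
    """Return True when pagination.hasNextPage is set on the record."""
    return bool((record.get("pagination") or {}).get("hasNextPage"))


# Made-up record for illustration only, not real scraped data.
sample = {
    "products": [
        {"name": "Example Laptop", "aggregateRating": {"ratingValue": 4.5, "reviewCount": 120}},
        {"name": "Unrated Item", "aggregateRating": None},
    ],
    "pagination": {"hasNextPage": True},
}
print(product_ratings(sample))
print(has_next_page(sample))
```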
This scraper can integrate with ScrapeOps to help handle Costco's anti-bot measures:
Costco may employ various anti-scraping measures including:
- Rate limiting and IP blocking
- Browser fingerprinting
- CAPTCHA challenges (may occur depending on site behavior)
- JavaScript rendering requirements
- Request pattern analysis
The scraper can use ScrapeOps proxy service which may provide:
- Proxy Rotation: May help distribute requests across multiple IP addresses
- Request Optimization: May optimize headers and request patterns to reduce detection
- Retry Logic: Built-in retry mechanism with exponential backoff
Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.
- Sign up for a free account at https://scrapeops.io/app/register/ai-scraper
- Get your API key from the dashboard
- Replace `YOUR-API-KEY` in the scraper code
- The scraper can use ScrapeOps for requests (if configured)
Free Tier: ScrapeOps offers a free tier suited to testing and small-scale scraping.
This scraper uses Playwright for browser automation, which allows it to render JavaScript just like a real user. It utilizes playwright-stealth to reduce the likelihood of detection by hiding automated browser traits.
The workflow is as follows:
- Navigation: The scraper launches a headless browser and navigates to the specified Costco search URL.
- Extraction: It uses CSS selectors and regex to identify product containers, pricing, and metadata.
- Data Processing: Extracted strings are cleaned (HTML stripped), currency is detected, and relative URLs are converted to absolute URLs.
- Pipeline: Data is passed through a `DataPipeline` that checks for duplicates before appending the structured data to a JSONL file.
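The data processing step above can be sketched roughly as follows. These helpers are illustrative assumptions rather than the scraper's actual functions, and the currency symbol mapping is a made-up minimal table.

```python
import html
import re
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "https://www.costco.com"
CURRENCY_SYMBOLS = {"$": "USD", "£": "GBP", "€": "EUR"}  # assumed mapping


def strip_html(text: str) -> str:
    """Remove tags, decode HTML entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def detect_currency(price_text: str) -> Optional[str]:
    """Return a currency code based on the first known symbol found."""
    for symbol, code in CURRENCY_SYMBOLS.items():
        if symbol in price_text:
            return code
    return None


def parse_price(price_text: str) -> Optional[float]:
    """Pull the first number out of a price string such as '$1,299.99'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price_text)
    return float(match.group().replace(",", "")) if match else None


def absolute_url(href: str) -> str:
    """Resolve a relative link against the Costco base URL."""
    return urljoin(BASE_URL, href)


print(strip_html("<span>Pots &amp; Pans</span>"))
print(parse_price("$1,299.99"), detect_currency("$1,299.99"))
print(absolute_url("/example-item.html"))
```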
- Missing Selectors: If Costco updates their website layout, selectors may break. Check the logs to see where extraction fails.
- Proxy Issues: If you encounter 403 Forbidden errors, ensure your ScrapeOps API key is active and has remaining credits.
- Timeouts: For slow-loading pages, increase the Playwright timeout settings in the code.
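For the timeout case, Playwright's `Page` object exposes `set_default_timeout` and `set_default_navigation_timeout`, both taking milliseconds. The wrapper below is a hypothetical helper showing how they might be applied; where the scraper itself sets its timeouts is not shown here, so check the code before editing.

```python
def apply_timeouts(page, timeout_ms: int = 60_000) -> None:
    """Raise the default timeouts on an existing Playwright Page object."""
    # Applies to actions and waits such as click() or wait_for_selector().
    page.set_default_timeout(timeout_ms)
    # Applies to navigations such as goto() and reload().
    page.set_default_navigation_timeout(timeout_ms)
```

A single slow navigation can also be given its own limit with `page.goto(url, timeout=90_000)`.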
Enable detailed logging:

    # Enable debug logging
    import logging
    logging.basicConfig(level=logging.DEBUG)

This will show:
- Request URLs and responses
- Extraction steps
- Parsing errors
- Retry attempts
The scraper includes retry logic with configurable retry attempts and exponential backoff.
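Retry with exponential backoff generally looks like the sketch below: each failed attempt doubles the wait before the next one. The function name, parameters, and defaults are assumptions for illustration and may differ from the scraper's own retry settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the delay after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.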
This repository provides multiple implementations for scraping Costco Product Search pages:
Use BeautifulSoup/Cheerio when:
- You need fast, lightweight scraping
- JavaScript rendering is not required
- You want minimal dependencies
- You're scraping simple HTML pages
Use Playwright when:
- You need modern browser automation with a well-designed API
- You want built-in waiting and auto-waiting features
- You need cross-browser support (Chromium, Firefox, WebKit)
- You want reliable network interception
Use Puppeteer when:
- You only need Chromium/Chrome support
- You want a mature, stable API
- You need Chrome DevTools Protocol features
- You prefer a smaller dependency footprint
Use Selenium when:
- You need maximum browser compatibility
- You're working with legacy systems
- You need WebDriver standard compliance
- You want the most widely-used framework
The scraper supports concurrent requests. See the scraper code for configuration options.
Recommendations:
- Start with minimal concurrency for testing
- Gradually increase based on your ScrapeOps plan limits
- Monitor for rate limiting or blocking
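A common way to cap concurrency in Python is an `asyncio.Semaphore`. The sketch below assumes a hypothetical async `scrape_one(url)` coroutine supplied by the caller; it illustrates the pattern and does not describe the scraper's actual concurrency settings.

```python
import asyncio
from typing import Awaitable, Callable, List


async def scrape_all(
    urls: List[str],
    scrape_one: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 1,
) -> List[dict]:
    """Run scrape_one over urls with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> dict:
        async with semaphore:  # waits here once the limit is reached
            return await scrape_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Starting with `max_concurrency=1` and raising it gradually matches the recommendations above.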
Data is saved in JSONL format (one JSON object per line):
- Efficient for large datasets
- Easy to process line-by-line
- Can be imported into databases or data processing tools
- Each line is a complete, valid JSON object
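Because each line is a standalone JSON object, the output can be read one record at a time. The generator below is a minimal sketch; `iter_jsonl` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterator


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

For example, `for record in iter_jsonl("output.jsonl"): ...` processes a file without loading it fully into memory (the filename here is a placeholder).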
The scraper processes data incrementally:
- Products are written to file immediately after extraction
- No need to load entire dataset into memory
- Suitable for scraping large pages
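Incremental writing with a duplicate check might look like the sketch below. The class borrows the `DataPipeline` name used in this README, but its interface here is an assumption, as is the `productId` key used for de-duplication.

```python
import json
from typing import Any, Dict, Set


class DataPipeline:
    """Append items to a JSONL file as they arrive, skipping duplicates."""

    def __init__(self, path: str, key: str = "productId") -> None:
        self.path = path
        self.key = key  # assumed identifier field
        self._seen: Set[str] = set()

    def add(self, item: Dict[str, Any]) -> bool:
        """Write the item unless it was seen before; return True if written."""
        identifier = item.get(self.key)
        # Fall back to the whole serialized item when the key is missing.
        fingerprint = str(identifier) if identifier is not None else json.dumps(item, sort_keys=True)
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        # Append mode writes each item immediately, so nothing accumulates in memory.
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
        return True
```

Only the set of fingerprints is kept in memory, which stays small compared with the records themselves.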
- Respect Rate Limits: Use appropriate delays and concurrency settings
- Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
- Handle Errors Gracefully: Implement proper error handling and logging
- Validate URLs: Ensure URLs are valid Costco pages before scraping
- Update Selectors: Costco may change HTML structure; update selectors as needed
- Test Regularly: Test scrapers regularly to catch breaking changes early
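As one possible take on the URL validation practice, the check below accepts only http(s) URLs on a costco.com host. The function name and the exact rules are assumptions; tighten them to match the pages you actually scrape.

```python
from urllib.parse import urlparse


def is_costco_url(url: str) -> bool:
    """Return True for http(s) URLs hosted on costco.com or a subdomain."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    on_costco = host == "costco.com" or host.endswith(".costco.com")
    return parsed.scheme in ("http", "https") and on_costco


print(is_costco_url("https://www.costco.com/s?keyword=laptop"))
print(is_costco_url("https://example.com/s?keyword=laptop"))
```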
- ScrapeOps Documentation: https://scrapeops.io/docs/intro/
- Framework Documentation: See framework-specific documentation
- Example Output: See `example-data/product_search.json` for sample data structure
- Scraper Code: See `scraper/costco_scraper_product_search_v1.py` for implementation details
This scraper is provided as-is for educational and commercial use. Please ensure compliance with Costco's Terms of Service and robots.txt when using this scraper.