Add scraper for rickbayless.com #1889

Open
joeygerovac wants to merge 1 commit into hhursev:main from joeygerovac:add-rickbayless-scraper

@joeygerovac

Add scraper for rickbayless.com

Adds support for scraping recipes from rickbayless.com.

Site structure

Rick Bayless's site uses custom HTML markup with no JSON-LD structured
data, so it needs a dedicated scraper. Recipe data lives in elements
identified by semantic class names and itemprop attributes:

  • Title: <h1> inside div.page-header
  • Description: div.recipe-description
  • Ingredients: <li itemprop="ingredients"> with separate spans for
    quantity/unit, name, and preparation notes
  • Instructions: <p> tags inside div.recipe-instructions
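
As a rough illustration of the selectors listed above, here is a minimal sketch that pulls the title, description, and instructions out of a hand-written HTML fragment. It uses only the standard library's html.parser so that it runs on its own; it is not the scraper code in this PR. The `ScopedText` class, the `texts` and `extract` helpers, and the sample markup are all hypothetical, and the sketch assumes well-formed markup with closed tags.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they are skipped when tracking nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ScopedText(HTMLParser):
    """Collect text inside an element carrying `container_class`.

    With `inner_tag` set, one string is collected per `inner_tag` element
    inside the container; otherwise the container's whole text is collected.
    """

    def __init__(self, container_class, inner_tag=None):
        super().__init__()
        self.container_class = container_class
        self.inner_tag = inner_tag
        self.container_depth = 0  # > 0 while inside the container
        self.inner_depth = 0      # > 0 while inside an inner_tag element
        self.results = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.container_depth:
            self.container_depth += 1
            if self.inner_depth:
                self.inner_depth += 1
            elif tag == self.inner_tag:
                self.inner_depth = 1
                self._buf = []
        elif self.container_class in (dict(attrs).get("class") or "").split():
            self.container_depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.container_depth:
            return
        if self.inner_depth:
            self.inner_depth -= 1
            if self.inner_depth == 0:
                self._flush()
        self.container_depth -= 1
        if self.container_depth == 0 and self.inner_tag is None:
            self._flush()

    def handle_data(self, data):
        if self.inner_depth or (self.container_depth and self.inner_tag is None):
            self._buf.append(data)

    def _flush(self):
        # Collapse runs of whitespace and drop empty results.
        text = " ".join("".join(self._buf).split())
        if text:
            self.results.append(text)


def texts(html, container_class, inner_tag=None):
    parser = ScopedText(container_class, inner_tag)
    parser.feed(html)
    parser.close()
    return parser.results


def extract(html):
    """Apply the selectors from the list above to an HTML string."""
    return {
        "title": next(iter(texts(html, "page-header", "h1")), None),
        "description": next(iter(texts(html, "recipe-description")), None),
        "instructions": "\n".join(texts(html, "recipe-instructions", "p")),
    }


# Hypothetical fragment that mirrors the structure described in this PR.
SAMPLE = """
<div class="page-header"><h1>Classic Guacamole</h1></div>
<div class="recipe-description"><p>A simple, bright guacamole.</p></div>
<div class="recipe-instructions">
  <p>Mash the avocados.</p>
  <p>Season with salt and serve.</p>
</div>
"""

recipe = extract(SAMPLE)
```

Each instruction paragraph becomes its own line in `recipe["instructions"]`, which covers both the single-paragraph and multi-paragraph cases with the same code path.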

Testing

Tested against multiple recipes including:

  • Simple recipes (classic guacamole)
  • Recipes with quantity-less ingredients ("Salt")
  • Recipes with linked ingredient names (anchor tags)
  • Single and multi-paragraph instruction sets
  • Recipes with and without serving size data

All 1030 tests pass.

Edge cases handled

  • Ingredients with no quantity parse correctly
  • Anchor tags within ingredient names are stripped to plain text
  • yields() returns None when serving size div is empty
  • image() falls back gracefully when no og:image tag exists
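
The edge cases above can be sketched in a few lines. This is an illustrative stand-in written against the standard library's html.parser, not the code in this PR. The `EdgeCaseParser` class, the `parse` helper, the `recipe-servings` class name for the serving size div, and the sample markup are assumptions made for the example; the real class name on the site may differ.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
SERVINGS_CLASS = "recipe-servings"  # hypothetical class name for the serving size div


class EdgeCaseParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ingredients = []
        self.image = None      # stays None when no og:image tag exists
        self.yields_text = ""
        self._target = None    # "ingredient" or "yields" while capturing text
        self._depth = 0
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            self.image = attrs.get("content") or None
        if tag in VOID_TAGS:
            return
        if self._target:
            self._depth += 1
        elif tag == "li" and attrs.get("itemprop") == "ingredients":
            self._start("ingredient")
        elif tag == "div" and SERVINGS_CLASS in (attrs.get("class") or "").split():
            self._start("yields")

    def _start(self, target):
        self._target, self._depth, self._buf = target, 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._target:
            return
        self._depth -= 1
        if self._depth == 0:
            # Only text nodes were buffered, so nested <span> and <a> tags
            # are already gone; collapse whitespace left between them.
            text = " ".join("".join(self._buf).split())
            if self._target == "ingredient" and text:
                self.ingredients.append(text)
            elif self._target == "yields":
                self.yields_text = text
            self._target = None

    def handle_data(self, data):
        if self._target:
            self._buf.append(data)


def parse(html):
    parser = EdgeCaseParser()
    parser.feed(html)
    parser.close()
    return {
        "ingredients": parser.ingredients,
        "yields": parser.yields_text or None,  # empty div -> None
        "image": parser.image,
    }


# Hypothetical markup: a linked ingredient name, a quantity-less "Salt",
# an empty serving size div, and no og:image meta tag.
SAMPLE = """
<ul>
  <li itemprop="ingredients">
    <span>3</span>
    <span><a href="#">ripe avocados</a></span>
    <span>(about 1 1/4 pounds)</span>
  </li>
  <li itemprop="ingredients"><span></span> <span>Salt</span></li>
</ul>
<div class="recipe-servings"></div>
"""

result = parse(SAMPLE)
```

Collecting only text nodes handles the anchor-tag and empty-quantity cases with the same code path, at the cost of relying on whitespace between spans to separate quantity, name, and preparation notes.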

Adds support for scraping recipes from rickbayless.com using custom
HTML parsing (no schema.org markup on the site). Extracts title,
description, ingredients, instructions, image, and yields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>