Fix: halfbakedharvest.com instruction list parsing #1887

Open
Nikash-B wants to merge 5 commits into hhursev:main from Nikash-B:fix/halfbakedharvest-instruction-list-parsing

Conversation

Nikash-B commented Apr 9, 2026

Resolves #1880

This PR improves the extraction and formatting of recipe instructions for the halfbakedharvest.com scraper. The main change splits inline numbered instructions into separate steps and cleans them up, so the instructions list keeps a consistent format.

Improvements to instruction parsing and formatting:

  • Added logic in halfbakedharvest.py to detect and split inline numbered steps in the instructions, removing leading step numbers for clarity. On failure, the new code falls back to returning the entire instruction string as was done previously.
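For reviewers who want a feel for the approach, the split-and-fallback behavior described above could look roughly like the sketch below. It is an illustration only, not the code in this PR: the function name `split_numbered_steps` and the regular expression are hypothetical, and it assumes steps are marked as `1. `, `2. `, and so on, counting up from 1.

```python
import re

# Hypothetical marker pattern (not taken from the PR): a number followed by
# "." and whitespace, located at the start of the text or right after whitespace.
_STEP_MARKER = re.compile(r"(?:^|(?<=\s))(\d+)\.\s+")


def split_numbered_steps(text: str) -> list[str]:
    """Split '1. A 2. B' into ['A', 'B'].

    Falls back to returning the whole string as a single step when no
    sequence of numbered markers starting at 1 is found.
    """
    text = text.strip()
    markers = []
    expected = 1
    for match in _STEP_MARKER.finditer(text):
        # Accept only markers that continue the 1, 2, 3, ... sequence, so a
        # sentence ending in "350. " is not mistaken for a step number.
        if int(match.group(1)) == expected:
            markers.append(match)
            expected += 1

    # Fallback: fewer than two steps, or text does not begin with "1. ".
    if len(markers) < 2 or markers[0].start() != 0:
        return [text]

    steps = []
    for i, match in enumerate(markers):
        end = markers[i + 1].start() if i + 1 < len(markers) else len(text)
        steps.append(text[match.end():end].strip())
    return steps


if __name__ == "__main__":
    raw = "1. Preheat the oven to 350. 2. Mix the flour and sugar. 3. Bake for 20 minutes."
    for step in split_numbered_steps(raw):
        print(step)
```

Joining the returned list with newlines would give a newline-separated instructions string, while a string with no numbered markers comes back unchanged as a one-element list.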

Test data updates for improved instruction extraction:

  • Updated halfbakedharvest.json and halfbakedharvest_groups.json test data to reflect the new behavior: each step in the instructions list is now a separate string, matching the new parsing logic.

Nikash-B added 3 commits April 9, 2026 14:50
Adds a site-specific instructions parser for Half Baked Harvest when recipeInstructions contains a single HowToStep with multiple numbered steps concatenated together. Splits the inline numbered steps so instructions() returns newline-separated steps and instructions_list() produces a proper list.
Add a Half Baked Harvest-specific parser for schema instructions that are emitted as one concatenated HowToStep, splitting them into separate steps and stripping numeric prefixes. Update the Half Baked Harvest test fixtures to match the corrected instructions_list output and align with existing fixture conventions.
@jknndy jknndy self-requested a review April 9, 2026 20:07
Adds a test for a recipe where halfbakedharvest.com emits recipeInstructions as a single HowToStep with all steps concatenated and inline-numbered, which is the shape reported in issue hhursev#1880 that the scraper override was written to fix. Existing fixtures only exercise the well-formed path, leaving the splitter untested; this fixture exercises the full override and raises coverage of halfbakedharvest.py from 30% to 83%.
Swap the Brown Butter Orzo capture (redundant with _concatenated) for a
Chocolate Chip Banana Bread capture whose recipeInstructions is a
multi-element HowToStep list, exercising the schema.instructions()
fallback at halfbakedharvest.py:36. Brings statement coverage to 100%.

Development

Successfully merging this pull request may close these issues.

Scraper issue with halfbakedharvest.com