Fix: halfbakedharvest.com instruction list parsing #1887
Open
Nikash-B wants to merge 5 commits into hhursev:main from
Adds a site-specific instructions parser for Half Baked Harvest when recipeInstructions contains a single HowToStep with multiple numbered steps concatenated together. Splits the inline numbered steps so instructions() returns newline-separated steps and instructions_list() produces a proper list.
Add a Half Baked Harvest-specific parser for schema instructions that are emitted as one concatenated HowToStep, splitting them into separate steps and stripping numeric prefixes. Update the Half Baked Harvest test fixtures to match the corrected instructions_list output and align with existing fixture conventions.
Adds a test for a recipe where halfbakedharvest.com emits recipeInstructions as a single HowToStep with all steps concatenated and inline-numbered, the shape reported in issue hhursev#1880 that the scraper override was written to fix. Existing fixtures only exercise the well-formed path, leaving the splitter untested; this fixture exercises the full override and raises coverage of halfbakedharvest.py from 30% to 83%.
Swap the Brown Butter Orzo capture (redundant with _concatenated) for a Chocolate Chip Banana Bread capture whose recipeInstructions is a multi-element HowToStep list, exercising the schema.instructions() fallback at halfbakedharvest.py:36. Brings statement coverage to 100%.
Resolves #1880
This PR improves the extraction and formatting of recipe instructions for the halfbakedharvest.com scraper. The main change is to split and clean up inline numbered instructions to keep a consistent format for the instructions list.

Improvements to instruction parsing and formatting:
- halfbakedharvest.py now detects and splits inline numbered steps in the instructions, removing leading step numbers for clarity. On failure, the new code falls back to returning the entire instruction string, as was done previously.

Test data updates for improved instruction extraction:
- Updated the halfbakedharvest.json and halfbakedharvest_groups.json test data to reflect the new behavior: each step in the instructions list is now a separate string, matching the new parsing logic.
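
For reviewers who want a concrete picture of the splitting behavior described above, here is a minimal sketch. It is not the code in this PR: the function name `split_numbered_steps`, the regular expression, and the sample text are all assumptions made for illustration. It treats a one- or two-digit number followed by a period as a step marker when it appears at the start of the text or directly after sentence-ending punctuation, and it returns the whole string unchanged when fewer than two steps are found.

```python
import re

# Hypothetical pattern (not taken from the PR): a step number such as "2."
# at the very start of the text, or right after sentence-ending punctuation.
_STEP_MARKER = re.compile(r"(?:^|(?<=[.!?)]))\s*\d{1,2}\.\s+")


def split_numbered_steps(text: str) -> list[str]:
    """Split '1. Foo. 2. Bar.' into ['Foo.', 'Bar.'].

    Falls back to a single-element list holding the original text when
    no reliable inline numbering is detected.
    """
    parts = [part.strip() for part in _STEP_MARKER.split(text)]
    steps = [part for part in parts if part]
    # Fallback: fewer than two steps means the text was not inline-numbered.
    if len(steps) < 2:
        return [text.strip()]
    return steps


# Illustrative input only; not taken from a real Half Baked Harvest page.
concatenated = "1. Preheat the oven. 2. Mix the batter. 3. Bake until golden."
steps = split_numbered_steps(concatenated)
newline_separated = "\n".join(steps)
```

Joining the result with newlines mirrors the described behavior where instructions() returns newline-separated steps and instructions_list() returns one string per step.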