Added support for naturalharry.au #1430

Open
JackSun815 wants to merge 4 commits into hhursev:main from JackSun815:naturalharry_scraper

Conversation

@JackSun815

Pull Request: Add New Scraper for Natural Harry Recipes

This pull request adds a scraper for recipes hosted on naturalharry.au. The scraper is implemented as a subclass of AbstractScraper and extracts the recipe fields listed below. JSON test cases covering every supported field are included.

Features Added:

  • Scraper Functionality:
    The scraper extracts the following details for recipes:

    • Host URL: Identifies the source website.
    • Author: Captures the recipe author, e.g., "Harry."
    • Title: Extracts the title of the recipe.
    • Languages: Determines the language of the recipe, e.g., "en-US."
    • Description: Extracts a concise description of the recipe.
    • Category: (if available) Identifies the recipe category.
    • Total Time: Parses and calculates the total preparation and cooking time.
    • Ingredients: Accurately extracts and formats ingredients from the recipe content.
    • Instructions: Captures the step-by-step instructions, ensuring no extraneous content is included.
    • Image: Retrieves the main image associated with the recipe.
    • Yields: Extracts the yield/serving size, e.g., "about 10 tacos."
    • Cuisine: (if available) Identifies the cuisine type.
  • Testing:
    Test cases have been added to validate the scraper's functionality:

    • JSON test cases for all supported fields, ensuring accurate parsing and alignment with expected outputs.
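The Total Time item above can be illustrated with a minimal sketch. It assumes the page exposes preparation and cooking times as ISO 8601 style durations such as "PT1H30M", which this pull request does not state. The helper names `duration_to_minutes` and `total_time` are hypothetical and are not part of the scraper in this PR.

```python
import re

# Assumption: durations use only hours and minutes, e.g. "PT15M" or "PT1H30M".
_DURATION = re.compile(r"^PT(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?$")


def duration_to_minutes(value):
    """Convert a duration string such as "PT1H30M" to whole minutes."""
    match = _DURATION.match(value)
    if match is None or value == "PT":
        raise ValueError(f"unsupported duration: {value!r}")
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes") or 0)
    return hours * 60 + minutes


def total_time(prep, cook):
    """Sum preparation and cooking durations, in minutes."""
    return duration_to_minutes(prep) + duration_to_minutes(cook)


print(total_time("PT15M", "PT1H30M"))
```

Raising on unrecognised input, rather than returning 0, makes a change in the site's markup surface as a failing test instead of a silently wrong value.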

How to Test:

  1. Run the naturalharry.au test cases with the following command:
    python -m unittest -k naturalharry
  2. Validate that all test cases pass and extracted fields match the expected outputs in the JSON test files.
  3. Ensure the scraper handles variations in recipe formatting gracefully.
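Step 2 above amounts to a field-by-field comparison between the expected JSON and the scraper's output. The following is a rough sketch of that check using only the standard library. The helper `field_mismatches` and the sample dictionaries are hypothetical and only mirror the example values quoted in this description; the project's own test runner may work differently.

```python
import json


def field_mismatches(expected, actual):
    """Return {field: (expected, actual)} for every field that differs."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


# Hypothetical expected output, as it might appear in a JSON test file.
expected = json.loads(
    '{"host": "naturalharry.au", "author": "Harry", "yields": "about 10 tacos"}'
)

# Hypothetical scraper output with one deliberately wrong field.
actual = {"host": "naturalharry.au", "author": "Harry", "yields": "10 tacos"}

mismatches = field_mismatches(expected, actual)
print(mismatches)
```

An empty result means every expected field matched; any entry names the field to inspect.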

Future Improvements:

  • Dynamic error handling for unexpected changes in the site's HTML structure.

@sonarqubecloud
Owner

remove this file altogether

3 participants