ottolenghi scraper #1209

jacksgreen · 2024-08-12T19:39:07Z

No description provided.

jknndy · 2024-08-12T21:10:58Z

Hi @jacksgreen, thanks for the contribution! Just a few things to fix up here.

The failing tests need a few fixes:

- Our test JSONs now follow a specific order for the expected keys, which is checked by tests\library\test_json_order.py. There is a helper script that reorders the keys automatically; run it via python .\scripts\reorder_json_keys.py
- You'll need to add a README entry under the "Scrapers available for:" header (in alphabetical order). This can be checked via tests\library\test_readme.py
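
For anyone curious what a key-reordering step like this involves, here is a minimal sketch. It is not the project's script: the `PREFERRED_ORDER` list and the `reorder_keys` function are hypothetical, and the real key order is whatever the order test enforces.

```python
import json

# Hypothetical key order, for illustration only; the project defines the real one.
PREFERRED_ORDER = ["author", "canonical_url", "host", "title", "category"]


def reorder_keys(data, preferred=PREFERRED_ORDER):
    """Return a copy of `data` with preferred keys first, then the rest alphabetically."""
    ordered = {key: data[key] for key in preferred if key in data}
    for key in sorted(data):
        if key not in ordered:
            ordered[key] = data[key]
    return ordered


sample = {"title": "Cake", "zeta": 1, "host": "ottolenghi.co.uk", "category": "Desserts"}
print(json.dumps(reorder_keys(sample), indent=2))
```

Python dicts preserve insertion order, so rebuilding the dict is enough to control the order in which `json.dumps` writes the keys.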

Improvements:

There is additional information available on this page that would be great to include in the output, some of which may require a custom implementation.

- category: this is available via the recipe schema, so you'll just need to add a test JSON entry: "category": "Desserts",
- ingredient_groups: this site, including this specific test recipe, groups its ingredients, so ingredient_groups support could be added. This would require an additional test case showing a recipe whose ingredients are not grouped.
- yields, cook_time & prep_time: all of this information is available on the page but will require custom implementations. Note that cook_time is displayed as 5o on this recipe, but this appears to be an error on the page; other recipes list it with a 0.

jacksgreen · 2024-08-13T19:17:12Z

@jknndy Thanks for the feedback, I've updated everything in the pr!

jknndy · 2024-08-13T21:17:42Z

Just one last thing and then it should be good to go!

For sites that have ingredient_groups support we like to include two test cases (one with groupings and one without), usually with this naming convention:
sitename_1.json & testhtml for the example page without groupings
sitename_2.json & testhtml for the page with groupings (the existing test data)

jayaddison · 2024-09-02T15:59:12Z

A note for anyone who gets a bit confused by this, as I did briefly: we recently added support for books.ottolenghi.co.uk -- but the recipes listed within that website are published distinctly from the recipes on ottolenghi.co.uk. I think the existing scraper-selection code (that uses the SCRAPERS dictionary) should handle this without a problem.
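
To illustrate why the two sites should not collide, here is a simplified sketch of host-keyed lookup. It is not the library's actual selection code: the `host_name` and `pick_scraper` helpers and the string values in the dictionary are hypothetical stand-ins for the real scraper classes.

```python
from urllib.parse import urlsplit

# Placeholder strings standing in for scraper classes (hypothetical names).
SCRAPERS = {
    "ottolenghi.co.uk": "Ottolenghi",
    "books.ottolenghi.co.uk": "OttolenghiBooks",
}


def host_name(url):
    """Return the URL's host in lowercase, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def pick_scraper(url):
    """Look up a scraper by exact host; returns None for unknown hosts."""
    return SCRAPERS.get(host_name(url))


print(pick_scraper("https://www.ottolenghi.co.uk/recipes/example"))
print(pick_scraper("https://books.ottolenghi.co.uk/example"))
```

Because the lookup uses the full host as the key, the subdomain and the main domain resolve to separate entries.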

jacksgreen · 2024-09-13T11:23:41Z

Hey @jknndy,
I've added the second test case, let me know if it's all good to merge!

jknndy · 2024-09-17T22:12:21Z

recipe_scrapers/ottolenghi.py

@@ -0,0 +1,69 @@
+from ._abstract import AbstractScraper
+from ._grouping_utils import group_ingredients
+


Suggested change

+from ._utils import get_minutes, get_yields

jknndy · 2024-09-17T22:22:55Z

recipe_scrapers/ottolenghi.py

+    def yields(self):
+        return (
+            self.soup.find("div", class_="c-recipe-header__timings")
+            .find("span")
+            .get_text(strip=True)
+        )
+
+    def prep_time(self):
+        return (
+            self.soup.find("div", class_="c-recipe-header__timings")
+            .find_all("span")[1]
+            .get_text(strip=True)
+        )
+
+    def cook_time(self):
+        return (
+            self.soup.find("div", class_="c-recipe-header__timings")
+            .find_all("span")[2]
+            .get_text(strip=True)
+        )


Suggested change

-    def yields(self):
-        return (
-            self.soup.find("div", class_="c-recipe-header__timings")
-            .find("span")
-            .get_text(strip=True)
-        )
-
-    def prep_time(self):
-        return (
-            self.soup.find("div", class_="c-recipe-header__timings")
-            .find_all("span")[1]
-            .get_text(strip=True)
-        )
-
-    def cook_time(self):
-        return (
-            self.soup.find("div", class_="c-recipe-header__timings")
-            .find_all("span")[2]
-            .get_text(strip=True)
-        )
+    def _extract_timing_elements(self, prefix):
+        timings = self.soup.find("div", class_="c-recipe-header__timings").find_all("span")
+        for timing in timings:
+            if prefix.lower() in timing.get_text().lower():
+                return timing.get_text()
+
+    def yields(self):
+        yield_text = self._extract_timing_elements("serves")
+        return get_yields(yield_text)
+
+    def prep_time(self):
+        prep_text = self._extract_timing_elements("prep")
+        return get_minutes(prep_text)
+
+    def cook_time(self):
+        cook_text = self._extract_timing_elements("cook")
+        if cook_text and '5o' in cook_text:
+            cook_text = cook_text.replace('5o', '50')
+        return get_minutes(cook_text)

What do you think about refactoring this to use a shared helper that matches each element by its label text instead of by position?

I also used two existing utils, get_minutes and get_yields, to normalize the output of these fields. This will require some changes to the test JSONs.

In cook_time I also added handling for an error on the recipe page where 50 is displayed as 5o.
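
As a standalone illustration of why matching by label is more robust than matching by position, here is a stdlib-only sketch. The sample strings are made up rather than taken from the site, and `find_by_label` and `to_minutes` are hypothetical, simplified stand-ins (not the library's get_minutes).

```python
import re


def find_by_label(texts, label):
    """Return the first text containing `label` (case-insensitive), else None."""
    for text in texts:
        if label.lower() in text.lower():
            return text
    return None


def to_minutes(text):
    """Simplified minutes parser: understands 'N hr' and 'N min' style text."""
    if text is None:
        return None
    text = text.lower().replace("5o", "50")  # work around the page typo
    hours = re.search(r"(\d+)\s*(?:hours?|hrs?|h)\b", text)
    minutes = re.search(r"(\d+)\s*(?:minutes?|mins?|m)\b", text)
    if not hours and not minutes:
        return None
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    return total


# Made-up span texts in an unexpected order; positional indexing would mislabel them.
spans = ["Cook 5o min", "Serves 6", "Prep 20 min"]
print(to_minutes(find_by_label(spans, "prep")))
print(to_minutes(find_by_label(spans, "cook")))
```

With positional indexing, any page that omits or reorders one of the spans would silently return the wrong field; the label lookup returns None instead when a field is missing.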

ottolenghi scraper

5b3b86e

jacksgreen added 3 commits August 13, 2024 22:06

adding extra data

0d965ea

adding extra data

a6d5e5b

linting

2342bd1

jacksgreen and others added 2 commits September 13, 2024 14:20

adding second test case

d0421b6

Merge branch 'main' into main

376602d

jknndy reviewed Sep 17, 2024
