Add support for quitoque.fr #1163

romainchassaigne · 2024-07-02T10:47:05Z

Resolved the issue #725

Added quitoque.fr scrapper using graphql API.
Added tests using legacy tests because it uses an API.

Please let me know if anything is wrong as this is my first pull request to this project.

jayaddison

Hi @romainchassaigne - thank you for developing this!

I notice that the size of the JSON returned from the API is fairly large - more than 10 megabytes. Also, most of the information we'd like to extract from the recipe does appear already in the HTML, without making that second (large) network request.

Perhaps we could retrieve the information from the HTML? Or otherwise, at least reduce the size of the network request and test data?

romainchassaigne · 2024-07-10T08:01:30Z

Hi @jayaddison ! Thank you for your reply.

I'll try to reduce the graphql request. Maybe it could be possible to only get the 2 serving (as this is the only information I want) recipe.

romainchassaigne · 2024-07-15T07:39:53Z

Hi @jayaddison !

I've successfuly reduced the query from 10MB to approx 302KB. I've refined the graphql query only with the informations I really want. Unfortunnatly I have tried to remove "extensions" from the query, but it seems to be related to the server.

Please let me know if anything needs to be corrected.

recipe_scrapers/quitoque.py

Co-authored-by: James Addison <[email protected]>

recipe_scrapers/quitoque.py

romainchassaigne · 2024-07-16T16:57:01Z

Quitoque.fr have changed the way to retrieve the recipes. I need to change everything to scrap the html instead. GraphQL API is not available anymore.

jayaddison · 2024-07-16T17:23:19Z

Drats - sorry to hear that, @romainchassaigne. Would you like to hold this pull request open for a while in case that API returns? Thanks for making the suggested adjustments along the way so far (despite it seemingly not working out!).

romainchassaigne · 2024-07-17T07:24:18Z

Drats - sorry to hear that, @romainchassaigne. Would you like to hold this pull request open for a while in case that API returns? Thanks for making the suggested adjustments along the way so far (despite it seemingly not working out!).

No I think I'm good to rework using the HTML scrapper, it seems to be simpler than later.

romainchassaigne · 2024-07-18T09:15:01Z

Hi @jayaddison, I'm running into a weird error while running the tests.

======================================================================
FAIL: tests/test_data/quitoque.fr/quitoque.json (tests.RecipeTestCase.tests/test_data/quitoque.fr/quitoque.json) [canonical_url]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/romain.chassaigne/Documents/Perso/recipe-scrapers/recipe-scrapers/tests/__init__.py", line 92, in test_func
    self.assertEqual(
AssertionError: 'https://quitoque.fr/products/cuisse-de-ca[48 chars]3720' != 'quitoque.fr'
- https://quitoque.fr/products/cuisse-de-canard-au-romarin-et-pommes-de-terre-sarladaises-13720
+ quitoque.fr
 : The actual value for .canonical_url() did not match the expected value.

----------------------------------------------------------------------
Ran 1 test in 0.046s

FAILED (failures=1)

This is (partially) my quitoque.json file :

{
  "author": "QuiToque",
  "canonical_url": "https://quitoque.fr/products/cuisse-de-canard-au-romarin-et-pommes-de-terre-sarladaises-13720",
  "site_name": null,
  "host": "quitoque.fr",
  "language": "fr",
  "title": "Cuisse de canard au romarin et pommes de terre sarladaises"
}

It seems that cannonical URL is waiting for host instead of canonical_url. Do you have any clues ? Maybe I should open an issue instead ?

EDIT:
I got it, it is simply because testhtml (and HTML pages) does not have a link with canonical relationship. What shoud I do between thos 2 solutions :

Set the canonical_url as quitoque.fr
Add manually something like this into the quitoque.testhtml file

<link href="https://www.quitoque.fr/products/cuisse-de-canard-au-romarin-et-pommes-de-terre-sarladaises-13720" rel="canonical">

EDIT2:

I found another way implemented in #1131

jayaddison · 2024-07-18T14:17:34Z

Hi again @romainchassaigne!

It seems that cannonical URL is waiting for host instead of canonical_url. Do you have any clues ? Maybe I should open an issue instead ?

EDIT: I got it, it is simply because testhtml (and HTML pages) does not have a link with canonical relationship.

Yep, you figured it out - that's exactly it 👍

If I remember correctly, the library uses the input URL as a fallback for this if it can't be retrieved from the page; but during the test suite, there is no web URL.

I found another way implemented in #1131

Perfect, good find. If this becomes more common across websites, we could move this into a utility function for re-use by others.

I'll review the HTML-based retrieval approach in this pull request soon (either later today or tomorrow) - thanks for implementing it!

recipe_scrapers/quitoque.py

tests/test_data/quitoque.fr/quitoque.json

recipe_scrapers/quitoque.py

Co-authored-by: James Addison <[email protected]>

jayaddison · 2024-08-01T14:15:32Z

recipe_scrapers/quitoque.py

+
+    def keywords(self):
+        product_tags = self.soup.find(id="product-tags").find_all(class_="badge")
+        keywords = ",".join(self._get_text(tag) for tag in product_tags)


I'm not 100% sure about this -- your thoughts are welcome -- but I think it might be safer to filter out None values here and in similar string-join cases.

I would suggest:

Suggested change

keywords = ",".join(self._get_text(tag) for tag in product_tags)

keywords = ",".join(normalize_string(tag.text) for tag in product_tags if tag and tag.text)

jayaddison · 2024-08-01T14:17:13Z

Apologies for the delay @romainchassaigne - the pull request generally looks good to me; thank you for accommodating my code review suggestions. I have one more optional change suggestion, and this should be ready for merge soon.

feat: add scrapper for quitoque

6a831f7

romainchassaigne changed the title ~~feat: add scrapper for quitoque #725~~ feat: add scrapper for quitoque Jul 2, 2024

jayaddison reviewed Jul 8, 2024

View reviewed changes

fix(quitoque): refine graphql request

b301144

jayaddison reviewed Jul 15, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

fix(quitoque): query is now multiline

1092b56

jayaddison reviewed Jul 15, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

jayaddison reviewed Jul 15, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

Update recipe_scrapers/quitoque.py

7d567eb

Co-authored-by: James Addison <[email protected]>

jayaddison reviewed Jul 15, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

jayaddison changed the title ~~feat: add scrapper for quitoque~~ Add support for quitoque.fr Jul 15, 2024

fix(quitoque): category are added in a set

948f289

romainchassaigne added 2 commits July 18, 2024 11:17

Merge branch 'hhursev:main' into quitoque

c59ea30

fix(quitoque): scraping html instead of graphql

ef2f1a7

jayaddison reviewed Jul 18, 2024

View reviewed changes

recipe_scrapers/quitoque.py Show resolved Hide resolved

jayaddison reviewed Jul 18, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

jayaddison reviewed Jul 18, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

jayaddison reviewed Jul 18, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

jayaddison reviewed Jul 18, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

jayaddison reviewed Jul 18, 2024

View reviewed changes

tests/test_data/quitoque.fr/quitoque.json Show resolved Hide resolved

jayaddison reviewed Jul 18, 2024

View reviewed changes

recipe_scrapers/quitoque.py Outdated Show resolved Hide resolved

romainchassaigne and others added 2 commits July 19, 2024 09:50

fix(quitoque): correcting keywords ordering

b8be6be

Co-authored-by: James Addison <[email protected]>

refactor(quitoque): improve readability

9d3160a

jayaddison reviewed Aug 1, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for quitoque.fr #1163

Add support for quitoque.fr #1163

romainchassaigne commented Jul 2, 2024

jayaddison left a comment

romainchassaigne commented Jul 10, 2024

romainchassaigne commented Jul 15, 2024

romainchassaigne commented Jul 16, 2024

jayaddison commented Jul 16, 2024

romainchassaigne commented Jul 17, 2024

romainchassaigne commented Jul 18, 2024 •

edited

Loading

jayaddison commented Jul 18, 2024

jayaddison Aug 1, 2024

jayaddison commented Aug 1, 2024

	keywords = ",".join(self._get_text(tag) for tag in product_tags)
	keywords = ",".join(normalize_string(tag.text) for tag in product_tags if tag and tag.text)

Add support for quitoque.fr #1163

Are you sure you want to change the base?

Add support for quitoque.fr #1163

Conversation

romainchassaigne commented Jul 2, 2024

jayaddison left a comment

Choose a reason for hiding this comment

romainchassaigne commented Jul 10, 2024

romainchassaigne commented Jul 15, 2024

romainchassaigne commented Jul 16, 2024

jayaddison commented Jul 16, 2024

romainchassaigne commented Jul 17, 2024

romainchassaigne commented Jul 18, 2024 • edited Loading

jayaddison commented Jul 18, 2024

jayaddison Aug 1, 2024

Choose a reason for hiding this comment

jayaddison commented Aug 1, 2024

romainchassaigne commented Jul 18, 2024 •

edited

Loading