Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use openfoodfacts-query service for facet queries instead of product_tags collection #8947

Merged
merged 31 commits into from
Oct 9, 2023

Conversation

john-gom
Copy link
Contributor

@john-gom john-gom commented Sep 4, 2023

  • Include unit tests for new functionality
  • If you have multiple commits please combine them into one commit by squashing them.

What

Use the Postgres-based query service for counts and aggregations. Updated the product_tags refresh scripts to refresh this database instead

Related issue(s) and discussion

@john-gom john-gom requested a review from a team as a code owner September 4, 2023 14:57
@github-actions github-actions bot added MongoDB We have 2 mongodb collections: one for current products, and one for obsolete products 🧴 Open Beauty Facts Our cosmetic analysis project https://world.openbeautyfacts.org 📚 Documentation Documentation issues improve the project for everyone. 🐾 Open Pet Food Facts Our pet food analysis project https://world.openpetfoodfacts.org 📸 Open Products Facts Our project to increase the lifespan of objects. https://world.openproductsfacts.org JavaScript GitHub Actions Pull requests that update Github_actions code Display config labels Sep 4, 2023
@john-gom john-gom changed the title feat: Use openfoodfacts-query servie for facet queries instead of product_tags collection feat: Use openfoodfacts-query service for facet queries instead of product_tags collection Sep 4, 2023
Copy link
Member

@alexgarel alexgarel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Kudos @john-gom (for the off-query part firstly !).

lib/ProductOpener/Display.pm Outdated Show resolved Hide resolved
lib/ProductOpener/Display.pm Outdated Show resolved Hide resolved

$count_results = [$count_results->all]->[0];
if ((not $@) and (defined $count_results)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there again, I think it's wiser to save $@ just after the eval block, to avoid a future mistake.

Comment on lines 35 to 38
my $path = "$query_url/importfrommongo?from=$from";
print STDERR "Importing to $path\n";
my $ua = LWP::UserAgent->new();
$ua->get($path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this query returning quickly ? (maybe it should)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does wait until the refresh has been completed before returning at the moment. It will be quick if it is an incremental update, but takes about 1.5 hours to do a full load.

lib/ProductOpener/Display.pm Outdated Show resolved Hide resolved
@stephanegigandet
Copy link
Contributor

@john-gom
Copy link
Contributor Author

I had to do a small change to run it in a dev environment in Linux:

host.docker.internal is not available in Linux by default (only Windows and Mac), but it could be added like this: (it may have unwanted side effects)

+++ b/docker/dev.yml
@@ -20,6 +20,10 @@ x-backend-conf: &backend-conf
     - ./docker/docker-entrypoint.sh:/docker-entrypoint.sh
     # we use this for debugging, from times to times
     - ./debug:/mnt/podata/debug/
+  # Allow the container to connect to the host
+  # this is needed to access the dev openfoodfacts-query running on the host
+  extra_hosts:
+    - "host.docker.internal:host-gateway"

Hi @stephanegigandet , just to clarify, you should only need this kind of configuration if something running inside docker needs to access a service running on the host. So this would either be the query service running in docker trying to access MongoDB running on the host (not in Docker) or Product Opener running in Docker trying to access the Query service running on the host (not in Docker). If the Query service is running in Docker and is accessing MongoDB running in docker then it should be able to access the MongoDB database using the "mongodb" host name as the Query service joins the po_default network. What scenario were you trying?

@john-gom
Copy link
Contributor Author

For queries that include a country filter, the country filter is not taken into account.

e.g. http://uk.openfoodfacts.localhost/label/fair-trade/misc

[Nest] 1  - 09/14/2023, 8:46:11 AM   DEBUG [QueryService] Object:
[
  {
    "$match": {
      "labels_tags": "en:fair-trade",
      "countries_tags": "en:united-kingdom"
    }
  },
  {
    "$unwind": "$misc_tags"
  },
  {
    "$group": {
      "_id": "$misc_tags",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$sort": {
      "count": -1
    }
  },
  {
    "$skip": 0
  },
  {
    "$limit": 10000
  }
]

[Nest] 1  - 09/14/2023, 8:46:11 AM   DEBUG [QueryService] select value _id, count(*) count from "query"."product_misc_tag" as "pt" where (not pt.obsolete) and (EXISTS (select "pt2".* from "query"."product_labels_tag" as "pt2" where (pt2.product_id = pt.product_id and pt2.value = 'en:fair-trade'))) group by "pt"."value" order by 2 desc limit 10000
[Nest] 1  - 09/14/2023, 8:46:11 AM   DEBUG [QueryService] Processed misc_tags where labels_tags == en:fair-trade in 21 ms. Returning 54 records

Hi @stephanegigandet . I hadn't realised that we supported multiple filters like this. I can enhance the query handling to support this relatively easily

@john-gom
Copy link
Contributor Author

For queries that include a country filter, the country filter is not taken into account.
e.g. http://uk.openfoodfacts.localhost/label/fair-trade/misc

[Nest] 1  - 09/14/2023, 8:46:11 AM   DEBUG [QueryService] Object:
[
  {
    "$match": {
      "labels_tags": "en:fair-trade",
      "countries_tags": "en:united-kingdom"
    }
  },
  {
    "$unwind": "$misc_tags"
  },
  {
    "$group": {
      "_id": "$misc_tags",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$sort": {
      "count": -1
    }
  },
  {
    "$skip": 0
  },
  {
    "$limit": 10000
  }
]

[Nest] 1  - 09/14/2023, 8:46:11 AM   DEBUG [QueryService] select value _id, count(*) count from "query"."product_misc_tag" as "pt" where (not pt.obsolete) and (EXISTS (select "pt2".* from "query"."product_labels_tag" as "pt2" where (pt2.product_id = pt.product_id and pt2.value = 'en:fair-trade'))) group by "pt"."value" order by 2 desc limit 10000
[Nest] 1  - 09/14/2023, 8:46:11 AM   DEBUG [QueryService] Processed misc_tags where labels_tags == en:fair-trade in 21 ms. Returning 54 records

Hi @stephanegigandet . I hadn't realised that we supported multiple filters like this. I can enhance the query handling to support this relatively easily

Hi @stephanegigandet , this should be fixed now

@github-actions github-actions bot added the 🐋 Docker https://docker-curriculum.com/ label Sep 15, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2023

Codecov Report

Merging #8947 (f311104) into main (7e07fcb) will decrease coverage by 0.03%.
The diff coverage is 17.14%.

@@            Coverage Diff             @@
##             main    #8947      +/-   ##
==========================================
- Coverage   47.95%   47.93%   -0.03%     
==========================================
  Files          64       64              
  Lines       20013    20052      +39     
  Branches     4850     4868      +18     
==========================================
+ Hits         9597     9611      +14     
- Misses       9168     9186      +18     
- Partials     1248     1255       +7     
Files Coverage Δ
lib/ProductOpener/Data.pm 76.00% <50.00%> (-8.22%) ⬇️
lib/ProductOpener/Display.pm 11.27% <10.34%> (+0.06%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@stephanegigandet
Copy link
Contributor

What scenario were you trying?

I was following the instructions on https://github.com/openfoodfacts/openfoodfacts-query#prepare-for-development
I didn't realize that I already had a running query server inside docker, so I ran the npm commands that started another one on the host.

@github-actions github-actions bot removed MongoDB We have 2 mongodb collections: one for current products, and one for obsolete products JavaScript GitHub Actions Pull requests that update Github_actions code labels Sep 19, 2023
@github-actions github-actions bot added the GitHub Actions Pull requests that update Github_actions code label Oct 2, 2023
@john-gom
Copy link
Contributor Author

john-gom commented Oct 3, 2023

Hi @stephanegigandet , @raphael0202 , @alexgarel , I have reverted the code so that the products_tags collection is used if the QUERY_URL is not available (or the call to the query service fails). Would you be happy to take a look at the latest changes to check you have no concerns.

Also, the tests are not passing but I can't figure out why my changes should have affected this. If anyone can help I would be vary grateful.

@sonarcloud
Copy link

sonarcloud bot commented Oct 9, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@stephanegigandet
Copy link
Contributor

Also, the tests are not passing but I can't figure out why my changes should have affected this. If anyone can help I would be vary grateful.

You are getting a 404 because the returned product count is 0 I think.

@john-gom
Copy link
Contributor Author

john-gom commented Oct 9, 2023

Also, the tests are not passing but I can't figure out why my changes should have affected this. If anyone can help I would be vary grateful.

You are getting a 404 because the returned product count is 0 I think.

Thanks. Figured it out, the code was still going to product_tags when no_cache=1

@john-gom
Copy link
Contributor Author

john-gom commented Oct 9, 2023

Some comparisons. Local version connecting to quer.openfoodfacts.org, production still using product_tags collection:
image
image
Ecoscore by nutrition grade D:
image

Query counts seem lower - investigating...

@john-gom
Copy link
Contributor Author

john-gom commented Oct 9, 2023

I added a "select" service to compare results but realised that my own local service was still using my local query service as I hadn't rebuilt and re-started my docker containers. Results are now as expected query service is slightly higher as it was refreshed more recently than product_tags:
image
image

@john-gom john-gom merged commit 1fbbe06 into main Oct 9, 2023
14 checks passed
@john-gom john-gom deleted the issue/8676 branch October 9, 2023 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
config Display 🐋 Docker https://docker-curriculum.com/ 📚 Documentation Documentation issues improve the project for everyone. GitHub Actions Pull requests that update Github_actions code 🧴 Open Beauty Facts Our cosmetic analysis project https://world.openbeautyfacts.org 🐾 Open Pet Food Facts Our pet food analysis project https://world.openpetfoodfacts.org 📸 Open Products Facts Our project to increase the lifespan of objects. https://world.openproductsfacts.org
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move products_tags to Postgres to improve performance
5 participants