CO pull location update each day #167

Open
2 tasks done
syphax-bouazzouni opened this issue Dec 29, 2021 · 5 comments
Assignees
Labels
content Issues related to the content of AgroPortal enhancement

Comments

Contributor

syphax-bouazzouni commented Dec 29, 2021

The CO ontologies are updated every day at 18:00 from their pull URL.

Ontologies concerned:

image

This started in November 2021 (CO_325 is shown here as an example):

image

This may be the cause of these problems:

Todo:

Contributor

jonquet commented Jan 5, 2022

Long-term resolution: avoid re-parsing an ontology when the file retrieved by the automatic pull is exactly the same as the existing source file.
See #171

Solution for CO ontologies:

  • Get back in touch with @marieALaporte to see whether cropontology.org can make sure that the HTTP headers (date) do not change when the file (which is regenerated at each call) has not changed.
  • Manually list the ontologies concerned here. => @jonquet
  • Manually and temporarily unplug all the CO ontologies concerned => @jonquet
  • Write a script (in ncbo_cron) that, for a given ontology, removes all the submissions (from 4store and from the files) above a certain submission id => @syphax-bouazzouni
  • Test the script on stage
  • Run the script on each of the ontologies listed above
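
The selection step of the cleanup script described in the list above could look roughly like the following. This is only a sketch of the "above a certain submission id" filter: the `Submission` record and the `submissions_to_remove` function are hypothetical names invented for illustration, they are not part of ncbo_cron, and the actual deletion from 4store and from the files is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Submission:
    """Hypothetical stand-in for an ontology submission record."""
    acronym: str
    submission_id: int


def submissions_to_remove(
    submissions: Iterable[Submission],
    acronym: str,
    last_id_to_keep: int,
) -> List[Submission]:
    """Return the submissions of `acronym` whose id is above `last_id_to_keep`.

    Only selects candidates; deleting them from the triple store and from
    disk would be a separate step that is not sketched here.
    """
    selected = [
        s for s in submissions
        if s.acronym == acronym and s.submission_id > last_id_to_keep
    ]
    # Sort by id so the caller gets a predictable order.
    return sorted(selected, key=lambda s: s.submission_id)
```

Keeping the selection separate from the deletion makes it possible to print the list and review it on stage before anything is removed.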

Contributor

jonquet commented Jan 5, 2022

List of ontologies to process (28):
CO_358, CO_350, CO_357, CO_345, CO_339, CO_335, CO_338, CO_348, CO_325, CO_331, CO_346, CO_341, CO_330, CO_327, CO_360, CO_322, CO_337, CO_366, CO_321, CO_340, CO_320, CO_324, CO_343, CO_365, CO_334, CO_336, CO_356, CO_323

All ontologies unplugged => "pullLocation" : ""

Notes:

Also unplugged POLAPGEN_BARLEY, CO_121 and CO_020, whose pull URLs were generating an error in the log (but no notification email). To be fixed when we resume all pull URLs.

Contributor Author

syphax-bouazzouni commented Jan 5, 2022

@jonquet I checked the code to find out how the ncbo_cron job figures out that a new version of an ontology has been released.
I found that it does not look at the HTTP headers; instead, it downloads the ontologies every day and hashes each one to compare it with the local copy (see the code below, source: https://github.com/ontoportal-lirmm/ncbo_cron/blob/master/lib/ncbo_cron/ontology_pull.rb#L54)

image
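
For readers without access to the screenshot, the download-and-hash comparison can be illustrated with the minimal sketch below. It is not the ncbo_cron code (which is written in Ruby): the function names are hypothetical, and the use of MD5 is an assumption made for the example.

```python
import hashlib
from pathlib import Path


def md5_of_bytes(data: bytes) -> str:
    """Hex digest of an in-memory payload (e.g. the freshly downloaded file)."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hex digest of a file on disk, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new_version(downloaded: bytes, local_copy: Path) -> bool:
    """True when there is no local copy or its content hash differs."""
    if not local_copy.exists():
        return True
    return md5_of_bytes(downloaded) != md5_of_file(local_copy)
```

With a byte-level comparison like this, a file that is regenerated on every request is reported as a new version as soon as a single byte differs from the local copy, whatever the HTTP headers say.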

@syphax-bouazzouni syphax-bouazzouni added the content Issues related to the content of AgroPortal label Jan 11, 2022
Contributor Author

syphax-bouazzouni commented Jan 11, 2022

Summary of what to do (after the latest updates):

Contributor

jonquet commented Mar 9, 2022

After discussion with @marieALaporte, we will either:

  • Fix the situation where a new file is generated at each call on the cropontology.org site, and stay in "pull mode" in AgroPortal
  • Add an automatic push from cropontology.org to AgroPortal in the ontology modification script ("push mode")

2 participants