Move offliners definitions (or at least flags descriptions) to a data-driven source #886

Open
benoit74 opened this issue Dec 12, 2023 · 11 comments

@benoit74
Collaborator

  • Location: API

Situation

Currently, the offliner definitions live in the source code, including the description displayed for each offliner flag.

This makes editing them fairly complex (code change, code review, deployment), even though all of this is as automated as possible.

This is particularly true for flag descriptions, which should be updatable "fast".

We (content team) wonder if we should move this somewhere else (a file somewhere, or a DB table with UI screens and a special user role, ...).

Let's discuss this.

@benoit74 benoit74 added backend: API enhancement content Necessary for the content team labels Dec 12, 2023
@benoit74
Collaborator Author

Remark from @kelson42 (live in a random meeting): the priority is probably to start with flag descriptions only; the rest of the offliner definition is obviously more complex to move to a data-driven source.

@benoit74
Collaborator Author

See openzim/zimit-frontend#46; localization needs should probably be taken into account when working on this issue.

@kelson42
Contributor

kelson42 commented Jan 30, 2024

Raising priority and adding this to the Zimit2 project.

@kelson42
Contributor

@benoit74 @rgaudin What about proposing a JSON format in which the necessary information is defined by the scraper? This data would then be ingested by both the scraper and the Zimfarm to move forward.

@benoit74
Collaborator Author

I don't get where you want to put this JSON file and what it will contain.

@kelson42
Contributor

kelson42 commented Mar 19, 2024

> I don't get where you want to put this JSON file and what it will contain.

  • "Defined by the scraper", so in the scraper repository
  • "necessary information", whatever is needed (description, name, constraints on the value, etc.)

Typically, Zimfarm would download/get these files once at build time.
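As a rough illustration of the proposal above, here is a minimal sketch of what such a scraper-provided file and its Zimfarm-side parser could look like. Nothing here is an agreed format: the file layout, the field names (`flags`, `name`, `description`, `required`, `choices`) and the helper names `parse_definition` and `validate_recipe` are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, field

# Hypothetical content of a definition file published in a scraper repository.
# The layout and field names are assumptions for this sketch only.
EXAMPLE_DEFINITION = """
{
  "offliner": "example-scraper",
  "flags": [
    {"name": "url", "description": "URL to start from", "required": true},
    {"name": "lang", "description": "Content language", "choices": ["en", "fr"]}
  ]
}
"""


@dataclass
class FlagDefinition:
    name: str
    description: str
    required: bool = False
    choices: list = field(default_factory=list)


def parse_definition(raw: str) -> dict:
    """Parse the JSON text into a mapping of flag name -> FlagDefinition."""
    data = json.loads(raw)
    flags = {}
    for entry in data["flags"]:
        flags[entry["name"]] = FlagDefinition(
            name=entry["name"],
            description=entry["description"],
            required=entry.get("required", False),
            choices=entry.get("choices", []),
        )
    return flags


def validate_recipe(flags: dict, values: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for flag in flags.values():
        if flag.required and flag.name not in values:
            errors.append(f"missing required flag: {flag.name}")
        if flag.name in values and flag.choices:
            if values[flag.name] not in flag.choices:
                errors.append(
                    f"invalid value for {flag.name}: {values[flag.name]!r}"
                )
    for name in values:
        if name not in flags:
            errors.append(f"unknown flag: {name}")
    return errors
```

Such a file carries data only (descriptions and simple constraints), so anything requiring logic would still have to live elsewhere.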

@benoit74
Collaborator Author

Yes, it could work.

However, I wonder whether adding all scrapers as Zimfarm dependencies wouldn't make more sense, with a plugin approach (i.e. dependencies are not present at build time but are discovered at runtime when the Zimfarm starts).

This would be a great enabler for #891 where we might want to run some scraper code in Zimfarm backend to check for recipe validity / capability for the scraper to proceed with the task.
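One possible way to get this kind of runtime discovery in Python is the standard entry-points mechanism. The sketch below is only an illustration: the group name `zimfarm.offliners` and the function name `discover_offliner_plugins` are hypothetical, and it assumes each scraper would ship an installable Python package declaring an entry point in that group.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group that scraper packages would register under.
PLUGIN_GROUP = "zimfarm.offliners"


def discover_offliner_plugins(group: str = PLUGIN_GROUP) -> dict:
    """Load every installed plugin registered under `group`.

    Returns a mapping of entry-point name -> loaded object. Nothing is
    known at build time; whatever is installed when this runs is found.
    """
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = all_entry_points.get(group, [])

    plugins = {}
    for entry_point in selected:
        plugins[entry_point.name] = entry_point.load()
    return plugins
```

With no scraper package installed, the function simply returns an empty mapping, so the Zimfarm could start without any plugin present.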

@kelson42
Contributor

> I however wonder if adding all scrapers as Zimfarm dependencies wouldn't make more sense.

I have no strong opinion.

My proposal has the advantage of being pretty light. And yes, it is intended to fix #891 too. You just need to define the format and the ingester/parser. The one limitation I see is that you cannot put logic in it.

If you can describe a software dependency system that is technology-agnostic and "doable", I'd be glad to read what it could look like.

@benoit74
Collaborator Author

No, it is not technology-agnostic: it would work only for Python. But Python is our main language, and I imagined other scrapers could implement a small chunk of Python code when needed.

This would allow running some simple checks like "is this URL reachable and does it correspond to the intended API", and could be quite easy to code even for scrapers not written in Python, like mwoffliner.
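To give an idea of how small such a chunk of Python could be, here is a hedged sketch of a reachability check. The function names and the notion of an `expected_key` in a JSON response are assumptions made for this example; a real scraper would decide what "corresponds to the intended API" means for its own target.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """Default fetcher: GET the URL and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_api(url: str, expected_key: str, fetch=fetch_json):
    """Return (ok, message) for a simple "reachable and looks right" check.

    `expected_key` is a hypothetical marker: a top-level key the intended
    API is assumed to return. `fetch` can be replaced, e.g. in tests.
    """
    try:
        payload = fetch(url)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers network failures, bad URLs and non-JSON bodies.
        return False, f"unreachable or invalid response: {exc}"
    if not isinstance(payload, dict) or expected_key not in payload:
        return False, f"response has no {expected_key!r} key"
    return True, "ok"
```

Passing the fetcher as a parameter keeps the check usable without network access, which matters if it is meant to run inside the Zimfarm backend.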

@rgaudin
Member

rgaudin commented Mar 19, 2024

I think depending on the actual scraper code is too much of a requirement; it brings additional constraints on ZF. I understand the value of your example @benoit74, but I don't think it's a good approach, at least in the short term (this issue is prioritized as part of zimit2).

@kelson42's proposal sure is lighter but still requires changing every scraper to work off that new data file (otherwise it's still duplicated and useless). I think it's more robust and maintainable at the moment though.

@benoit74
Collaborator Author

Then why not stick with the original plan: only move flag descriptions to one single data-driven source (one JSON file), so that it achieves the goal of this issue, which is to help content editors quickly edit flag descriptions so they better match their understanding?
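A minimal sketch of that descriptions-only plan, assuming the flags themselves stay defined in code and a single JSON file only overrides their descriptions. The data shapes and the name `apply_description_overrides` are hypothetical, chosen for illustration.

```python
import json

# Descriptions as they might be hard-coded today (illustrative values only).
CODE_DEFAULTS = {
    "example-scraper": {"url": "Start URL", "lang": "Language"},
}

# Hypothetical content of the single JSON file edited by the content team.
OVERRIDES_JSON = """
{
  "example-scraper": {
    "url": "The URL the scraper starts crawling from",
    "typo-flag": "Ignored because this flag does not exist in code"
  }
}
"""


def apply_description_overrides(defaults: dict, raw_json: str) -> dict:
    """Return a copy of `defaults` with descriptions replaced from JSON.

    Only flags that already exist in code are touched, so the file can
    change wording but cannot add or remove flags.
    """
    overrides = json.loads(raw_json)
    merged = {}
    for offliner, flags in defaults.items():
        merged[offliner] = dict(flags)
        for flag, description in overrides.get(offliner, {}).items():
            if flag in flags:
                merged[offliner][flag] = description
    return merged
```

Flags without an entry in the file keep their in-code description, so the file can be filled in gradually.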

3 participants