First draft of a Global Cache Checklist #52
If there are no comments, can this be merged? |
I need clarification from the WMO Secretariat.
There are a lot of "will"s in the text, in addition to SHALL, SHOULD and may.
Is "will" a mandatory requirement, a recommendation, or an option?
@kaiwirt - Thanks for drafting the content about Global Caches. Your additions are good. I propose some amendments - see below. I note that there are already a couple of pull requests affecting this adoc resource, so I wonder if it would be easier for you to modify your fork and add to this pull request?
Line 78. Correct reference to Unified Data Policy is: WMO Unified Data Policy, Resolution 1 (Cg-Ext(2021))
Line 85. Update discussion about handling metadata records.
Line 92 (and others). Formatting - use in-line code format rather than bold-text, e.g.,
Line 93. Remove reference to special treatment of metadata resources. Also amending comment about retention times. Should read:
++++ Under Technical Considerations ... Add note about the
Add a note about fixed IP address:
Add a note about integrity checking of data:
Add a note about the GC conceptual architecture (like is written for the GB):
|
In the SLA part I put some figures on those aspects. IMHO 100GB is too small. I'd suggest pushing this to 500 GB. At the moment poor ol' GTS is already more than 50... In GBON era doubling this is not enough. |
You might be right. Before I commit to expanding the current 100GB limit, I want to demonstrate that the UK/USA Global Cache is providing good value. Anyway - that's a task for 2024! |
@6a6d74: Yes, I think we need the expected cache (storage) size and estimate the downloaded data volume. Depending on the estimated data volume, we may need to consider something. I wish you a Happy New Year. |
Pushed an update regarding the comments from @6a6d74 to this pull request |
@kaiwirt - thanks for updating the PR. One thing to verify: in line 95 you state that the Global Cache shall republish a notification message even for recommended data. I hadn't anticipated Global Caches having any involvement (or, indeed, awareness of) recommended data. |
@6a6d74 I am not sure if we made a decision on this. The idea was that Global Caches subscribe to all messages. For data that they do not cache (either cache:false, data too big, or recommended) they just republish the original message on the cache topic. For data they do cache, they send out the message with the href updated. Therefore, users only need to subscribe to cache/#, and the origin/# topic hierarchy is not available to end-users. |
I don't think we agreed on the "correct" behaviour.
As recommended data may have a specific licence, access rights,... it is probably a good thing not to hide that from the user and to force them to subscribe to origin/... Gut feeling, I'd go for 2. |
@KenRJTD -- "shall" (for an obligation) and "should" (very strong recommendation) are the official WMO words to indicate directives. All other terms do not have the same status. |
The main problem I see with 2 is that if users subscribe to origin/#, they also see the core messages from the nodes and will eventually try to download core data from the nodes directly. |
Considering that we have, in theory, 4 Global Caches (DWD, JMA, US/UK/Synoptic, KMA) and using those numbers, it is 7500 GB/day (bytes) per cache. A 1 Gb/s (bits) link used at 80% over 24 h gives roughly 8 TB of download. It means that a GC would require, on average, 1 Gb/s of download bandwidth. Nowadays, it seems quite achievable to me... |
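The arithmetic in the comment above can be checked with a short sketch. It assumes decimal units (1 GB = 8 Gb, 1 TB = 1000 GB); the function name `daily_download_tb` is made up for illustration, and the 7500 GB/day figure is taken from the comment as given.

```python
def daily_download_tb(link_gbit_per_s: float, utilisation: float) -> float:
    """Volume (decimal TB) that fits through a link in 24 h at the given utilisation."""
    seconds_per_day = 24 * 60 * 60
    gbytes_per_s = link_gbit_per_s / 8 * utilisation  # bits -> bytes, then apply utilisation
    return gbytes_per_s * seconds_per_day / 1000      # GB -> TB


capacity_tb = daily_download_tb(1.0, 0.8)  # 1 Gb/s link used at 80%
required_tb = 7500 / 1000                  # 7500 GB/day per cache, from the comment above

print(f"capacity: {capacity_tb:.2f} TB/day, required: {required_tb:.2f} TB/day")
```

With these inputs the capacity comes out at 8.64 TB/day against 7.5 TB/day required, so the "roughly 8 TB" estimate holds, with limited headroom.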
Good point. Nevertheless, we are recommending that WIS Nodes protect downloads from unknown sources (everyone but the GCs). |
@golfvert Did you write something about this (in which section?), or should I add a paragraph to the Global Cache section? |
ET-W2AT 08.01.2024: Caches do not subscribe to recommended data. Only messages for core data are republished. Around the midpoint of the pre-operational phase, it should be reviewed whether there are issues with this behaviour. Will open an issue on this. |
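A minimal sketch of the republishing rule as decided above, combined with the href handling described earlier in the thread. The message fields (`topic`, `policy`, `cache`, `size`, `href`), the simplified topic layout and the function name `republish` are all assumptions made for illustration. They are not the actual WIS2 notification schema.

```python
from typing import Optional


def republish(msg: dict, cache_base_url: str, max_bytes: int) -> Optional[dict]:
    """Return the message a Global Cache would publish, or None if it publishes nothing.

    Hypothetical fields: 'topic' starts with 'origin/', 'policy' is 'core' or
    'recommended', 'cache' is the publisher's caching preference.
    """
    if msg["policy"] != "core":
        return None  # caches do not handle recommended data

    out = dict(msg)
    # Same topic, but under the cache/ hierarchy instead of origin/.
    out["topic"] = "cache/" + msg["topic"].split("/", 1)[1]

    # Only point href at the cache when the data is actually cached.
    if msg.get("cache", True) and msg["size"] <= max_bytes:
        filename = msg["href"].rsplit("/", 1)[-1]
        out["href"] = cache_base_url.rstrip("/") + "/" + filename
    return out


example = {
    "topic": "origin/centre-a/core/obs",
    "policy": "core",
    "cache": True,
    "size": 10,
    "href": "https://node.example/data/file1.bufr",
}
print(republish(example, "https://cache.example", max_bytes=1000))
```

Core data that is not cached (`cache` set to false, or too large) is still republished on the cache topic, but keeps its original href.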
I did add a "sla" part in https://github.com/wmo-im/wis2-guide/blob/main/guide/sections/part2/global-services.adoc :
A Global Cache:
- should support a minimum of 100 GB of data in the cache
- should support a minimum of 1000 simultaneous downloads
- could limit the number of simultaneous connections from a user (known by its originating source IP) to 5
- could limit the bandwidth usage of the service to 1Gb/s |
Ok, then in my PoV this PR can be merged. |
Summary of discussion from Jan 8th:
|
To prevent that, we can use authorization in MQTT. The everyone/everyone account would NOT have access to origin...core, only to cache...core and origin...recommended. The Global Services would then not use everyone/everyone, which would be a good thing anyway. Typically, if we want to use shared subscriptions (for a Global Cache that would be very useful), then each Global Service would need its own user/pwd. |
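A rough sketch of how such a per-account topic ACL could be evaluated. The topic layout (`<channel>/<centre>/<policy>/...`), the `gc-example` account and the helper names are hypothetical and chosen only for illustration. A real broker would be configured through its own ACL mechanism. Only the standard MQTT wildcard semantics (`+` for one level, `#` for the remainder) are taken as given.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a concrete topic against an MQTT filter using '+' and '#' wildcards."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True            # '#' matches the parent level and everything below it
        if i >= len(t_parts):
            return False           # filter is longer than the topic
        if part != "+" and part != t_parts[i]:
            return False           # literal level differs
    return len(f_parts) == len(t_parts)


# Hypothetical ACL: the public account cannot read core data from origin.
ACL = {
    "everyone": ["cache/+/core/#", "origin/+/recommended/#"],
    "gc-example": ["origin/+/core/#"],  # a Global Cache with its own credentials
}


def may_read(user: str, topic: str) -> bool:
    """True if any filter granted to the user matches the topic."""
    return any(topic_matches(f, topic) for f in ACL.get(user, []))


print(may_read("everyone", "origin/centre-a/core/obs"))    # public user, core at origin
print(may_read("gc-example", "origin/centre-a/core/obs"))  # Global Cache account
```

Giving each Global Service its own account, as suggested above, is what makes a table like `ACL` possible in the first place: rules can only differ between users that the broker can tell apart.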
Moved the discussion to here: #65 |