First draft of a Global Cache Checklist #52

Merged
merged 5 commits into from
Jan 25, 2024

Conversation

kaiwirt
Contributor

@kaiwirt kaiwirt commented Nov 20, 2023

No description provided.

@kaiwirt
Contributor Author

kaiwirt commented Dec 5, 2023

@6a6d74
@KenRJTD

Comments?

@kaiwirt
Contributor Author

kaiwirt commented Dec 19, 2023

If there are no comments, can this be merged?


@KenRJTD KenRJTD left a comment

I need clarification from WMO secretariat.
There are a lot of "will"s in the text, in addition to SHALL, SHOULD and may.
Is "will" mandatory, a recommendation, or a permission?

@6a6d74
Collaborator

6a6d74 commented Dec 22, 2023

@kaiwirt - Thanks for drafting the content about Global Caches. Your additions are good. I propose some amendments - see below. I note that there are already a couple of pull requests affecting this adoc resource, so I wonder if it would be easier for you to modify your fork and add to this pull request?

Line 78. Correct reference to Unified Data Policy is: WMO Unified Data Policy, Resolution 1 (Cg-Ext(2021))

Line 85. Update discussion about handling metadata records.

A Global Cache will temporarily cache all resources published on the metadata topic. A Global Discovery Catalogue will subscribe to notifications about publication of new metadata, download the metadata record from the Global Cache, and insert it into the catalogue. A Global Discovery Catalogue will also publish a "metadata record archive" each day containing the complete content of the catalogue and advertise its availability with a notification message. This resource will also be cached by a Global Cache.

Line 92 (and others). Formatting - use in-line code format rather than bold-text, e.g., `origin/a/wis2/+/data/#` rather than **origin/a/wis2/+/data/#**

Line 93. Remove reference to special treatment of metadata resources. Also amending comment about retention times. Should read:

A Global Cache SHALL retain the data and metadata they receive for a minimum period of 24 hours. Requirements relating to varying retention times for different types of data may be added later.

++++

Under Technical Considerations ...

Add note about the cache property:

The default behaviour for a Global Cache is to cache all data published under the data/core topic. A data publisher may indicate that data should not be cached by adding the properties.cache=false assertion in the WIS Notification Message.

A Global Cache may decide not to cache data, for example if the data is considered too large or a WIS2 node publishes an excessive number of small files. Where a Global Cache decides not to cache data, it should behave as though the cache property is set to false and flag this with a report or log. The Global Cache operator should work with the originating WIS center and their GISC to remedy the issue.
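
The caching decision described in the two paragraphs above could be sketched roughly as follows. This is only an illustration: the function name `should_cache`, the size threshold, and the use of a `length` field on each link are assumptions, not text from the Guide.

```python
def should_cache(message: dict, max_size_bytes: int = 1_000_000_000) -> bool:
    """Decide whether a Global Cache stores the resource behind a
    notification received on a data/core topic (hypothetical helper)."""
    properties = message.get("properties", {})

    # Publisher opt-out: properties.cache=false means "do not cache".
    # When the property is absent, the default is to cache.
    if properties.get("cache", True) is False:
        return False

    # Operator opt-out (assumed rule): skip resources whose declared size
    # exceeds a local limit and behave as though cache=false had been set.
    for link in message.get("links", []):
        length = link.get("length")
        if length is not None and length > max_size_bytes:
            return False

    return True
```

A real implementation would also log or report every operator-side refusal, as the proposed text asks.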

Add a note about fixed IP address:

A Global Cache should operate with a fixed IP address so that WIS Nodes can permit access to download resources based on IP address filtering.

Add a note about integrity checking of data:

A Global Cache should validate the integrity of the resources it caches.
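
One possible shape for such an integrity check, as a minimal sketch. It assumes the notification carries an integrity object with a `method` (hash algorithm name) and a base64-encoded `value`; those field names and the helper name `verify_integrity` are assumptions made for illustration.

```python
import base64
import hashlib
import hmac


def verify_integrity(payload: bytes, integrity: dict) -> bool:
    """Compare the digest of a downloaded resource with the advertised one."""
    # Map names such as "sha3-256" onto hashlib's "sha3_256" spelling.
    method = integrity["method"].lower().replace("-", "_")
    if method not in hashlib.algorithms_available:
        return False  # unknown algorithm: treat as a failed check

    actual = hashlib.new(method, payload).digest()
    expected = base64.b64decode(integrity["value"])

    # Constant-time comparison of the two digests.
    return hmac.compare_digest(actual, expected)
```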

Add a note about the GC conceptual architecture (as is written for the GB):

A Global Cache is built around three software components:

  • A highly available data server allowing data consumers to download cached resources.
  • A message broker implementing both MQTT 3.1.1 and MQTT 5.0 for publishing notification messages about resources that are available from the Global Cache.
  • Cache management implementing the features needed to connect with the WIS ecosystem and manage the content of the cache.

@6a6d74
Collaborator

6a6d74 commented Dec 22, 2023

@kaiwirt, @KenRJTD - do you think we should add a comment under Technical Considerations about the expected size of the Global Cache? For example, the UK/USA Global Cache is designed around an assumption that the Global Cache will contain approximately 100GB of data, refreshed daily.

@golfvert
Collaborator

In the SLA part I put some figures on those aspects. IMHO 100GB is too small. I'd suggest pushing this to 500 GB. At the moment poor ol' GTS is already more than 50... In GBON era doubling this is not enough.

@6a6d74
Collaborator

6a6d74 commented Dec 22, 2023

> In the SLA part I put some figures on those aspects. IMHO 100GB is too small. I'd suggest pushing this to 500 GB. At the moment poor ol' GTS is already more than 50... In GBON era doubling this is not enough.

You might be right. Before I commit to expanding the current 100GB limit, I want to demonstrate that the UK/USA Global Cache is providing good value. Anyway - that's a task for 2024!

@KenRJTD

KenRJTD commented Dec 23, 2023

@6a6d74: Yes, I think we need the expected cache (storage) size and estimate the downloaded data volume. Depending on the estimated data volume, we may need to consider something.
@golfvert: I don't know exactly about the total core volume yet, but if 60 users (30% of WMO Members) download 500GB (for GBON?), the total downloaded volume will be 30,000GB/day, nearly one petabyte/month. But I know it’s difficult for us to estimate their total core volume.

I wish you a Happy New Year.
Kenji

@kaiwirt
Contributor Author

kaiwirt commented Dec 27, 2023

Pushed an update regarding the comments from @6a6d74 to this pull request

@6a6d74
Collaborator

6a6d74 commented Jan 2, 2024

@kaiwirt - thanks for updating the PR.

One thing to verify: in line 95 you state that the Global Cache shall republish a notification message even for recommended data. I hadn't anticipated Global Caches having any involvement (or, indeed, awareness of) recommended data.

@kaiwirt
Contributor Author

kaiwirt commented Jan 2, 2024

@6a6d74 I am not sure if we made a decision on this. The idea was that Global Caches subscribe to all messages. For data that they do not cache (either cache:false, data too big, or recommended) they just republish the original message on the cache topic. For data they cache they send out the message with href updated.

Therefore, users only need to subscribe to cache/# and the origin/# topic hierarchy is not available to end-users.

@golfvert
Collaborator

golfvert commented Jan 2, 2024

I don't think we agreed on the "correct" behaviour.
I can see two opposite choices:

  1. As @kaiwirt suggested, republish on cache/a/wis2/... without download. It hides the complexity from the users. Same as "core".
  2. GCs ignore recommended data. Users will have to subscribe to origin/a/wis2/... from the Global Broker. Users have to be aware of that. It could be a good thing, as users will have to obey the conditions of use/reuse.

As recommended data may have a specific licence, access rights,... it is probably a good thing not to hide that from the user and to force them to subscribe to origin/...

Gut feeling, I'd go for 2.

@amilan17
Member

amilan17 commented Jan 2, 2024

> I need clarification from WMO secretariat.
> There are a lot of "will"s in the text, in addition to SHALL, SHOULD and may.
> Is "will" mandatory, a recommendation, or a permission?

@KenRJTD -- "shall" (for an obligation) and "should" (very strong recommendation) are the official WMO words to indicate directives. All other terms do not have the same status.

@kaiwirt
Contributor Author

kaiwirt commented Jan 2, 2024

The main problem I see with option 2 is that if users subscribe to origin/# they also see the core messages from the nodes and eventually will try to download core data from the nodes directly.

@golfvert
Collaborator

golfvert commented Jan 2, 2024

> @golfvert: I don't know exactly about the total core volume yet, but if 60 users (30% of WMO Members) download 500GB (for GBON?), the total downloaded volume will be 30,000GB/day, nearly one petabyte/month. But I know it’s difficult for us to estimate their total core volume.

Considering that we have, in theory, 4 Global Caches (DWD, JMA, US/UK/Synoptic, KMA) and using those numbers, that is 7500 GB/day (bytes) per cache. A 1Gb/s (bits) link used at 80% over 24h gives roughly 8TB of download. It means that a GC would require, on average, 1Gb/s of download bandwidth. Nowadays, it seems quite achievable to me...
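
For anyone who wants to replay the arithmetic with other inputs, here is a small sketch. The function name is made up for illustration, the defaults are the figures quoted in this thread, and decimal units (1 PB = 1,000,000 GB) plus a 30-day month are assumed.

```python
def estimate_cache_bandwidth(users: int = 60,
                             gb_per_user_per_day: float = 500,
                             caches: int = 4,
                             link_gbit_per_s: float = 1.0,
                             utilisation: float = 0.8) -> dict:
    """Reproduce the back-of-the-envelope download estimate."""
    seconds_per_day = 86_400

    total_gb_per_day = users * gb_per_user_per_day      # 60 * 500 = 30,000 GB/day
    per_cache_gb_per_day = total_gb_per_day / caches    # 7,500 GB/day per cache
    monthly_pb = total_gb_per_day * 30 / 1_000_000      # about 0.9 PB/month

    # Volume a link can deliver in a day: Gbit/s -> GB/day (8 bits per byte).
    link_gb_per_day = link_gbit_per_s * utilisation * seconds_per_day / 8

    # Average rate one cache must sustain, in Gbit/s.
    required_gbit_per_s = per_cache_gb_per_day * 8 / seconds_per_day

    return {
        "total_gb_per_day": total_gb_per_day,
        "per_cache_gb_per_day": per_cache_gb_per_day,
        "monthly_pb": monthly_pb,
        "link_gb_per_day": link_gb_per_day,
        "required_gbit_per_s": required_gbit_per_s,
    }
```

With the defaults, a 1Gb/s link at 80% delivers about 8,640 GB/day against 7,500 GB/day needed per cache, i.e. an average of roughly 0.69 Gb/s.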

@golfvert
Collaborator

golfvert commented Jan 2, 2024

> The main problem I see with option 2 is that if users subscribe to origin/# they also see the core messages from the nodes and eventually will try to download core data from the nodes directly.

Good point. Nevertheless, we are recommending that WIS Nodes protect downloads from unknown sources (all but GCs).
Each option has some advantages and drawbacks... Let's have a discussion on this in the next W2AT meeting. January 8th?

@kaiwirt
Contributor Author

kaiwirt commented Jan 8, 2024

> In the SLA part I put some figures on those aspects. IMHO 100GB is too small. I'd suggest pushing this to 500 GB. At the moment poor ol' GTS is already more than 50... In GBON era doubling this is not enough.

@golfvert Did you write something about this (in which section?) or should I add a paragraph to the Global Cache section?

@kaiwirt
Contributor Author

kaiwirt commented Jan 8, 2024

ET-W2AT 08.01.2024: Caches do not subscribe to recommended data. Only messages for core data are republished.

Around the midpoint of the pre-operational phase, we should review whether there are issues with this behaviour. I will open an issue on this.
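
A rough sketch of the agreed behaviour, for illustration only. The function name `republish_for_cache`, the `cache_base_url` parameter, the "base URL plus file name" href scheme, and the choice to rewrite only links with `rel` equal to `canonical` are all assumptions, not something decided in the meeting.

```python
import copy
from typing import Optional, Tuple


def republish_for_cache(topic: str, message: dict,
                        cache_base_url: str) -> Optional[Tuple[str, dict]]:
    """Return (cache_topic, message) to publish, or None to ignore."""
    parts = topic.split("/")  # origin/a/wis2/{centre-id}/data/core/...

    # Only notifications from the origin branch are candidates.
    if len(parts) < 5 or parts[0] != "origin":
        return None
    # Decision of 08.01.2024: recommended data is ignored by Global Caches.
    if parts[4:6] == ["data", "recommended"]:
        return None

    # Same topic, but under the cache branch of the hierarchy.
    cache_topic = "/".join(["cache"] + parts[1:])

    # Point the download link at the cache instead of the WIS2 Node.
    out = copy.deepcopy(message)
    for link in out.get("links", []):
        if link.get("rel") == "canonical":
            filename = link["href"].rsplit("/", 1)[-1]
            link["href"] = f"{cache_base_url.rstrip('/')}/{filename}"
    return cache_topic, out
```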

@golfvert
Collaborator

golfvert commented Jan 8, 2024

I did add a "sla" part in https://github.com/wmo-im/wis2-guide/blob/main/guide/sections/part2/global-services.adoc :

A Global Cache:

  • should support a minimum of 100 GB of data in the cache
  • should support a minimum of 1000 simultaneous downloads
  • could limit the number of simultaneous connections from a user (known by its originating source IP) to 5
  • could limit the bandwidth usage of the service to 1Gb/s

@kaiwirt
Contributor Author

kaiwirt commented Jan 9, 2024

Ok, then in my PoV this PR can be merged.

@6a6d74
Collaborator

6a6d74 commented Jan 9, 2024

Summary of discussion from Jan 8th:

  • @6a6d74 expects notification messages about recommended data to be at least as numerous as those for core data - getting the Global Caches to process all those extra messages for zero functional value seems wasteful
  • Data Consumers may accidentally (i.e., cut and paste) or deliberately (i.e., because messages arrive a fraction quicker) subscribe to origin/a/wis2/{centre-id}/data/core/* instead of cache/a/wis2/{centre-id}/data/core/*
  • We note that it may be useful for Data Consumers to see origin vs. cache (and core vs recommended) to signal (potential) differences in usage conditions for data.
  • Main problem with subscribing to origin/a/wis2/{centre-id}/data/core/* is that the URLs in notification messages published to that topic will point to the WIS2 Node (not the Global Cache) - and we don't want people to download from origin!
  • For now we'll assume that Data Consumers behave properly :) - GCs will ignore notification messages for "recommended data" (Guide to be updated by @kaiwirt)
  • New Issue to be created by @kaiwirt describing the potential that data consumers may start subscribing to origin/a/wis2/{centre-id}/data/core/* instead of the cache branch of the topic hierarchy
  • We'll review what happens in practice (i.e., which topics people subscribe to / where people download from) and take remedial action - assessment pending mid-2024.

@golfvert
Collaborator

> Data Consumers may accidentally (i.e., cut and paste) or deliberately (i.e., because messages arrive a fraction quicker) subscribe to origin/a/wis2/{centre-id}/data/core/* instead of cache/a/wis2/{centre-id}/data/core/*

To prevent that, we can use authorization in MQTT. The everyone/everyone user would NOT have access to origin...core, only to cache...core and origin...recommended. The Global Services would then not use everyone/everyone, which would be a good thing anyway. Typically, if we want to use shared subscriptions (for a Global Cache that would be very useful), then each Global Service would need its own user/pwd.
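
The rule could look something like the sketch below. It is not a broker configuration: both function names are hypothetical, and it assumes the broker checks each delivered topic against the rule. It takes the statement above literally, so the public user is allowed nothing beyond cache...core and origin...recommended. The `$share/{group}/{filter}` form is the standard MQTT 5.0 shared-subscription syntax.

```python
def may_read(username: str, topic: str) -> bool:
    """Decide whether a user may receive messages on a concrete topic."""
    # Global Services authenticate with their own credentials and
    # are not restricted by this rule.
    if username != "everyone":
        return True

    parts = topic.split("/")  # {channel}/a/wis2/{centre-id}/data/{policy}/...
    channel = parts[0]
    policy = parts[4:6]

    # Public user: only the cache branch for core data and the
    # origin branch for recommended data.
    if channel == "cache" and policy == ["data", "core"]:
        return True
    if channel == "origin" and policy == ["data", "recommended"]:
        return True
    return False


def shared_subscription(group: str, topic_filter: str) -> str:
    """Build an MQTT 5.0 shared-subscription topic filter."""
    return f"$share/{group}/{topic_filter}"
```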

@kaiwirt
Contributor Author

kaiwirt commented Jan 15, 2024

Moved the discussion to here: #65

@tomkralidis tomkralidis merged commit 022d281 into wmo-im:main Jan 25, 2024