Collections Docs #599
Major feedback is to make it clear that `collections` is something in OpenFn and not an external adaptor.
Then generally, services team can help add more examples and use cases that will help users better imagine how to use this... but we can do that later and debate where those docs go.
adaptors/collections.md (outdated)
## Collections Overview

The Collections API is a key/value storage solution. It is designed for high
We need to make clear that this is a data store in OpenFn... not something external (like all other adaptors).
@@ -0,0 +1,200 @@
---
title: Collections Adaptor
Wondering if we can somehow tag `collections` and `common` as special OpenFn adaptors?
Maybe. Collections is definitely special. Common is only sort of special 🤔
So I guess to answer this I'd have to ask what we mean by "special" and what exactly we want to flag
You can also fetch multiple items with `get()`, which supports the same query
options as `each()`.

Bear in mind that all the items will be loaded into memory at once. For large
Hmm... will ping you next week to help me understand the tradeoffs of using `get()` and `each()` for different use cases, so I can better understand when we might start bumping into limits and which is best for different scenarios. We can then beef up these docs with some examples.
Don't ping me - let me fix it here!
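For anyone following this thread, here is a rough sketch of the tradeoff being discussed. It is a toy in-memory model, not the collections adaptor: `ToyCollection`, `getAll`, `eachItem` and the prefix-based query are all hypothetical names invented for illustration. It only shows the shape of the difference between materialising every match at once (the `get()` behaviour the docs warn about) and visiting items one by one (the `each()` style).

```javascript
// Toy stand-in for a collection. NOT the OpenFn adaptor API:
// ToyCollection, getAll and eachItem are hypothetical names.
class ToyCollection {
  constructor() {
    this.items = new Map();
  }

  set(key, value) {
    this.items.set(key, value);
  }

  // Lazily yields matching items one at a time.
  *query(prefix = '') {
    for (const [key, value] of this.items) {
      if (key.startsWith(prefix)) yield { key, value };
    }
  }

  // get()-style: materialise every match into a single array.
  getAll(prefix) {
    return Array.from(this.query(prefix));
  }

  // each()-style: pass items to a callback one by one; nothing
  // accumulates unless the callback chooses to keep it.
  eachItem(prefix, callback) {
    for (const { key, value } of this.query(prefix)) callback(value, key);
  }
}

const store = new ToyCollection();
store.set('2024-01', { count: 1 });
store.set('2024-02', { count: 2 });
store.set('2025-01', { count: 5 });

// Needs the whole result set at once: memory grows with the number of matches.
const all2024 = store.getAll('2024');

// Needs only a running aggregate: constant extra memory.
let total = 0;
store.eachItem('2024', (value) => {
  total += value.count;
});

console.log(all2024.length, total);
```

In this toy everything already lives in one `Map`, so the saving is only in the result array. With a real remote store the difference should matter more, since the array form has to hold every fetched item for the rest of the step.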
@josephjclark awesome to see and v excited to use this! Left some feedback. Also we definitely need a docs page for this. Could you (or ask Rita or someone else on the team for help) draft a basic docs page describing the feature? I feel like either it could go on the Steps page or maybe even its own page in Adaptor docs. It should also probably link to these mapping design docs, which we can also update to talk about some practical use cases for this. Thoughts?
Looks good, but I wanted to try out collections on staging just now and was hoping to see how to add items to a collection. Is that documented somewhere else?
@taylordowns2000 My bad - I forgot to add set values. That's coming up in a minute
@aleksa-krolls So to be clear (and the context is probably a bit lost), this is the Overview adaptor docs. If you go to the doc site and the adaptor list, this is the Overview. It's designed as a high level introduction to the collections feature generally and the collections API specifically.

Collections support is implemented as an adaptor. That's how we inject an API in job code, and how we provide documentation. We should not hide or obfuscate this because it's kind of relevant: the collections adaptor appears in run logs, and if you're using the CLI, you need to pass the collections adaptor explicitly for support.

I agree that we need some more general, less technical docs somewhere in the main docs page. I'm not sure where. I suppose there's a general But I don't know WHERE that goes. Maybe under Platform as a section called Collections? Or nested under Platform / Build & Manage Workflows (that seems very deeply nested, but I think it makes sense as a peer to eg Credentials, Limits, Snapshots, Triggers)
@josephjclark I agree it makes sense there. Can we do that to start, and can you please raise in engineering standup to get this done sometime in the coming days? This page should then link to the adaptor docs.
@aleksa-krolls I'm taking responsibility for docs on this, so I'll get it done tomorrow.
So:
@stuartc Would you mind looking over both pages and calling out any egregious errors? These docs DO NOT include:
Please also note that I've raised an issue for better collections support in the CLI (eg,
Collections is suitable for buffering, caching and aggregating data from
Webhooks, storing large mapping files, and sharing state between workflows.

Collections can be used to store a very large number of items (in the order of
@stuartc your nervousness please about the phrase `in the order of millions`.
I just want to give some idea of what "high volume" means. I know it's woolly.
I think we can leave it as is for now. We will need to determine two things:
- How much work it is for the db to store really large values, right now they are strings and not indexed, but can be really large; save for our middleware that will reject large (>10MB) HTTP requests.
- If users are ok without being able to query by value.
If either of these are an issue, then the number of records we would want to limit would be clearer. But for now, millions should be ok.
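As a rough illustration of the request-size point above, a job author could estimate how large a value is before trying to store it. This is only a sketch under assumptions: the thread says requests over 10MB are rejected but not how that is measured, so the 10 MiB figure, the JSON/UTF-8 measurement, and the helper names `serialisedSize` and `fitsInOneRequest` are all hypothetical.

```javascript
// Assumption: the limit applies to the serialised request body and is
// 10 MiB. The real threshold and how it is measured are not stated here.
const ASSUMED_LIMIT_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: byte size of a value once serialised as UTF-8 JSON.
function serialisedSize(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Hypothetical helper: true if the value alone stays within the assumed limit.
function fitsInOneRequest(value, limit = ASSUMED_LIMIT_BYTES) {
  return serialisedSize(value) <= limit;
}

const small = { id: 'a', name: 'mapping row' };
const large = { blob: 'x'.repeat(ASSUMED_LIMIT_BYTES) };

console.log(fitsInOneRequest(small), fitsInOneRequest(large));
```

A real request would carry keys and other overhead on top of the value, so a check like this is a lower bound rather than a guarantee.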
Wonderful! 🫶
Short Description
Docs for Collections:
There is a polite notice indicating that collections are in Beta (ie, not totally production ready right now, thanks, but have a play)
AI Usage
Please disclose how you've used AI in this work (it's cool, we just want to
know!):
You can read more details in our Responsible AI Policy