Collections Docs #599

josephjclark · 2024-11-07T17:47:17Z

Short Description

Docs for Collections:

Under Platform / Build & Manage Workflows we have a high-level Collections overview. This has a couple of code examples but is designed as a user guide, rather than a developer / job writing guide.
Under Adaptors / Collections is a detailed API overview and explanation of how it all works (the important bits anyway).

There is a polite notice indicating that collections are in Beta (i.e., not totally production ready right now, thanks, but have a play).

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to
know!):

You can read more details in our
Responsible AI Policy

aleksa-krolls

Major feedback is to make it clear that Collections is something in OpenFn and not an external adaptor.

Then generally, services team can help add more examples and use cases that will help users better imagine how to use this... but we can do that later and debate where those docs go.

aleksa-krolls · 2024-11-07T19:04:34Z

adaptors/collections.md

+
+## Collections Overview
+
+The Collections API is a key/value storage solution. It is designed for high


We need to make clear that this is a data store in OpenFn... not something external (like all other adaptors).

aleksa-krolls · 2024-11-07T19:08:43Z

adaptors/collections.md

@@ -0,0 +1,200 @@
+---
+title: Collections Adaptor


Wondering if we can somehow tag collections and common as special OpenFn adaptors?

Maybe. Collections is definitely special. Common is only sort of special 🤔

So I guess to answer this I'd have to ask what we mean by "special" and what exactly we want to flag

aleksa-krolls · 2024-11-07T19:12:48Z

adaptors/collections.md

+You can also fetch multiple items with `get()`, which supports the same query
+options as `each()`.
+
+Bear in mind that all the items will be loaded into memory at once. For large


Hmm... will ping you next week to help me understand tradeoffs of using get() and each() for different use cases so I can better understand when we might start bumping into limits, and to know which is best for different scenarios... we can then beef up these docs with some examples.

Don't ping me - let me fix it here!
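
A minimal sketch of the trade-off in this thread, assuming nothing about the real implementation: `makeMockCollection` is a hypothetical in-memory stand-in, not the Collections API, and the keys and values are made up. It only shows why `get()` holds every matching item at once while `each()` lets a job keep just a running result.

```javascript
// Hypothetical in-memory stand-in for a collection: NOT the real Collections API.
// The mock keeps everything in memory anyway; a real backend would fetch
// matching items from the server. The point is what the CALLER has to hold.
function makeMockCollection(items) {
  const entries = Object.entries(items);

  // Convert a simple glob pattern like "2024*" into a RegExp.
  const toMatcher = (pattern) => {
    const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
  };

  return {
    // get(): every matching item is materialised in one array.
    get(pattern) {
      const matcher = toMatcher(pattern);
      return entries
        .filter(([key]) => matcher.test(key))
        .map(([key, value]) => ({ key, value }));
    },

    // each(): matching items are handed to the callback one at a time,
    // so the caller never needs the full result set.
    each(pattern, callback) {
      const matcher = toMatcher(pattern);
      for (const [key, value] of entries) {
        if (matcher.test(key)) callback(value, key);
      }
    },
  };
}

// Made-up sample data for illustration.
const collection = makeMockCollection({
  '2024-01': { total: 10 },
  '2024-02': { total: 20 },
  '2023-12': { total: 5 },
});

// get(): convenient for small result sets, but the whole array lives in memory.
const all = collection.get('2024*');
const sumFromGet = all.reduce((acc, item) => acc + item.value.total, 0);

// each(): only the running aggregate is kept.
let sumFromEach = 0;
collection.each('2024*', (value) => {
  sumFromEach += value.total;
});
```

Both approaches give the same total here; the difference only matters once the matching set is large enough that holding it all becomes a problem.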

aleksa-krolls · 2024-11-07T19:18:12Z

@josephjclark awesome to see and v excited to use this! Left some feedback.

Also we definitely need a docs page for this. Could you (or ask Rita or someone else on the team for help) draft a basic docs page describing the feature? I feel like either it could go on the Steps page or maybe even its own page in Adaptor docs. It should also probably link to these mapping design docs which we can also update to talk about some practical use cases for this. Thoughts?

taylordowns2000

Looks good, but I wanted to try out collections on staging just now and was hoping to see how to add items to a collection. Is that documented somewhere else?

josephjclark · 2024-11-08T12:15:15Z

@taylordowns2000 My bad - I forgot to add set values. That's coming up in a minute

josephjclark · 2024-11-08T12:43:07Z

@aleksa-krolls So to be clear (and the context is probably a bit lost), this is the Overview adaptor docs. If you go to the doc site and the adaptor list, this is the Overview. It's designed as a high level introduction to the collections feature generally and the collections API specifically.

Collections support is implemented as an adaptor. That's how we inject an API in job code, and how we provide documentation. We should not hide or obfuscate this because it's kind of relevant - the collections adaptor appears in run logs, and if you're using the CLI, you need to pass the collections adaptor explicitly for support.

I agree that we need some more general, less technical docs somewhere in the main docs page. I'm not sure where. I suppose there's a general Use Collections style page which explains the feature at a high level (this can show API examples and will link to the adaptor docs, but otherwise doesn't treat collections as an adaptor), plus a Collections Admin page which explains how to create, delete and generally manage collections through the admin API.

But I don't know WHERE that goes. Maybe under Platform as a section called Collections? Or nested under Platform / Build & Manage Workflows (that seems very deeply nested, but I think it makes sense as a peer to eg Credentials, Limits, Snapshots, Triggers)

aleksa-krolls · 2024-11-10T16:49:59Z

@josephjclark I agree it makes sense there. Can we do that to start, and can you please raise in engineering standup to get this done sometime in the coming days? This page should then link to the adaptor docs.

josephjclark · 2024-11-10T18:09:05Z

@aleksa-krolls I'm taking responsibility for docs on this, so I'll get it done tomorrow.

josephjclark · 2024-11-11T14:33:37Z

So:

The Adaptors page is now a technical overview and API reference for the Collections API.
The Docs page is a high level overview of the feature. It has a couple of code examples and refers to the API, but otherwise is a conceptual overview and usage guide more than a technical one.

@stuartc Would you mind looking over both pages and calling out any egregious errors?

These docs DO NOT include:

performance tips for big data sets (eg, use each not get). I would like to do this later but I just don't have time and I'm cutting cloth
Details on usage limits and costs. I have no idea what the rules are here
A good sample workflow which uses collections. Not that we have a good place to share this at the moment...

Please also note that I've raised an issue for better collections support in the CLI (eg, openfn collections set would be awesome), see OpenFn/kit#819

josephjclark · 2024-11-11T14:35:37Z

docs/build/collections.md

+Collections is suitable for buffering, caching and aggregating data from
+Webhooks, storing large mapping files, and sharing state between workflows.
+
+Collections can be used to store a very large number of items (in the order of


@stuartc your nervousness please about the phrase "in the order of millions"

I just want to give some idea of what "high volume" means. I know it's woolly.

stuartc

Wonderful! 🫶

stuartc · 2024-11-12T06:58:27Z

I think we can leave it as is for now. We will need to determine two things:

How much work it is for the db to store really large values. Right now they are stored as strings and are not indexed, and they can be really large: the only cap is our middleware, which rejects large (>10MB) HTTP requests.

Whether users are OK without being able to query by value.

If either of these is an issue, it will be clearer what limit we want to put on the number of records. But for now, millions should be OK.
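
To make the size constraint concrete, here is a rough sketch of a pre-flight check a job could run before writing a value. It assumes the 10MB figure applies to the serialised request body and that values are sent as JSON strings; `serializedSize` and `fitsInRequest` are hypothetical helpers, not part of any OpenFn API, and how the middleware actually measures a request is not specified here.

```javascript
// Assumption: 10MB is the request-size cap mentioned above, read as 10 * 1024 * 1024 bytes.
const MAX_REQUEST_BYTES = 10 * 1024 * 1024;

// Size in bytes of a value once serialised to a UTF-8 JSON string.
function serializedSize(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Hypothetical pre-flight check: true if the serialised value is within the limit.
// A real request also carries keys and headers, so treat this as an approximation.
function fitsInRequest(value, limit = MAX_REQUEST_BYTES) {
  return serializedSize(value) <= limit;
}

// Made-up sample values for illustration.
const small = { id: 'abc' };
const large = { blob: 'x'.repeat(MAX_REQUEST_BYTES) };

const smallFits = fitsInRequest(small);
const largeFits = fitsInRequest(large);
```

A value that fails a check like this would need to be split across several keys or trimmed before being stored.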

first swing at collections docs

ef66d02

josephjclark requested review from stuartc, taylordowns2000 and aleksa-krolls November 7, 2024 17:47

aleksa-krolls requested changes Nov 7, 2024

taylordowns2000 reviewed Nov 8, 2024

update set values

4b846c8

better high level intro

2fff983

typo and notes

e9cdee0

collections: more docs

e7c0e8d

josephjclark marked this pull request as ready for review November 11, 2024 14:34

josephjclark commented Nov 11, 2024

stuartc approved these changes Nov 12, 2024

aleksa-krolls approved these changes Nov 12, 2024

aleksa-krolls merged commit f85a89b into main Nov 12, 2024
1 check passed

aleksa-krolls deleted the collections-docs branch November 12, 2024 08:00
