Collections Docs #599

Merged
@aleksa-krolls merged 5 commits into main from collections-docs on Nov 12, 2024

Conversation

josephjclark
Contributor

@josephjclark josephjclark commented Nov 7, 2024

Short Description

Docs for Collections:

  • Under Platform / Build & Manage Workflows we have a high-level Collections overview. This has a couple of code examples but is designed as a user guide, rather than a developer / job writing guide
  • Under Adaptors / Collections is a detailed API overview and explanation of how it all works (the important bits anyway)

There is a polite notice indicating that Collections is in Beta (i.e., not totally production ready right now, thanks, but have a play).

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to
know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our
Responsible AI Policy

Member

@aleksa-krolls aleksa-krolls left a comment

Major feedback is to make it clear that collections is something in openfn and not an external adaptor.

Then generally, services team can help add more examples and use cases that will help users better imagine how to use this... but we can do that later and debate where those docs go.


## Collections Overview

The Collections API is a key/value storage solution. It is designed for high
Member

We need to make clear that this is a data store in OpenFn... not something external (like all other adaptors).

adaptors/collections.md
@@ -0,0 +1,200 @@
---
title: Collections Adaptor
Member

Wondering if we can somehow tag collections and common as special OpenFn adaptors?

Contributor Author

Maybe. Collections is definitely special. Common is only sort of special 🤔

So I guess to answer this I'd have to ask what we mean by "special" and what exactly we want to flag

adaptors/collections.md
You can also fetch multiple items with `get()`, which supports the same query
options as `each()`.

Bear in mind that all the items will be loaded into memory at once. For large
Member

Hmm... will ping you next week to help me understand tradeoffs of using get() and each() for different use cases so I can better understand when we might start bumping into limits, and to know which is best for different scenarios... we can then beef up these docs with some examples.

Contributor Author

Don't ping me - let me fix it here!
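
For readers following this thread, the `get()` versus `each()` trade-off can be illustrated with a minimal, self-contained sketch. It does not call the real Collections adaptor: `makeCollection`, `getAll` and `eachItem` are hypothetical stand-ins backed by an in-memory `Map`, and the `item-` key prefix is an assumption made up for the example. The point is only the shape of each access style: one builds the whole result array before any processing starts, the other hands items to a callback one at a time.

```javascript
// Hypothetical in-memory stand-in for a collection (NOT the real adaptor API).
// Values are stored as JSON strings, keyed by a sortable string.
function makeCollection(size) {
  const items = new Map();
  for (let i = 0; i < size; i++) {
    items.set(`item-${i}`, JSON.stringify({ id: i, amount: i * 2 }));
  }
  return items;
}

// get()-style access: every matching item is parsed and held in one array.
function getAll(collection, prefix) {
  const results = [];
  for (const [key, raw] of collection) {
    if (key.startsWith(prefix)) {
      results.push({ key, value: JSON.parse(raw) });
    }
  }
  return results;
}

// each()-style access: items are passed to a callback one at a time,
// so the caller never needs to hold the full result set.
function eachItem(collection, prefix, callback) {
  for (const [key, raw] of collection) {
    if (key.startsWith(prefix)) {
      callback(JSON.parse(raw), key);
    }
  }
}

const collection = makeCollection(1000);

// Aggregate with the get()-style helper: 1000 parsed objects exist at once.
const all = getAll(collection, 'item-');
const totalFromGet = all.reduce((sum, item) => sum + item.value.amount, 0);

// Aggregate with the each()-style helper: only a running total is kept.
let totalFromEach = 0;
eachItem(collection, 'item-', (value) => {
  totalFromEach += value.amount;
});

console.log(all.length, totalFromGet, totalFromEach);
```

Both approaches produce the same total. The difference is that the first holds every parsed item in memory before aggregating, which is the concern the docs raise for large collections.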

@aleksa-krolls
Member

@josephjclark awesome to see and v excited to use this! Left some feedback.

Also we definitely need a docs page for this. Could you (or ask Rita or someone else on the team for help) draft a basic docs page describing the feature? I feel like it could go either on the Steps page or maybe even on its own page in the Adaptor docs. It should also probably link to these mapping design docs, which we can also update to talk about some practical use cases for this. Thoughts?

Member

@taylordowns2000 taylordowns2000 left a comment

Looks good, but I wanted to try out collections on staging just now and was hoping to see how to add items to a collection. Is that documented somewhere else?

@josephjclark
Contributor Author

@taylordowns2000 My bad - I forgot to add set values. That's coming up in a minute

@josephjclark
Contributor Author

@aleksa-krolls So to be clear (and the context is probably a bit lost), this is the Overview adaptor docs. If you go to the doc site and the adaptor list, this is the Overview. It's designed as a high level introduction to the collections feature generally and the collections API specifically.

Collections support is implemented as an adaptor. That's how we inject an API in job code, and how we provide documentation. We should not hide or obfuscate this because it's kind of relevant - the collections adaptor appears in run logs, and if you're using the CLI, you need to pass the collections adaptor explicitly for support.

I agree that we need some more general, less technical docs somewhere in the main docs page. I'm not sure where. I suppose there's a general Use Collections style page which explains the feature at a high level (this can show API examples and will link to the adaptor docs, but otherwise doesn't treat collections as an adaptor), plus a Collections Admin page which explains how to create, delete and generally manage collections through the admin API.

But I don't know WHERE that goes. Maybe under Platform as a section called Collections? Or nested under Platform / Build & Manage Workflows (that seems very deeply nested, but I think it makes sense as a peer to e.g. Credentials, Limits, Snapshots, Triggers)

@aleksa-krolls
Member

@josephjclark I agree it makes sense there. Can we do that to start, and can you please raise in engineering standup to get this done sometime in the coming days? This page should then link to the adaptor docs.

But I don't know WHERE that goes. Maybe under Platform as a section called Collections? Or nested under Platform / Build & Manage Workflows (that seems very deeply nested, but I think it makes sense as a peer to e.g. Credentials, Limits, Snapshots, Triggers)


@josephjclark
Contributor Author

@aleksa-krolls I'm taking responsibility for docs on this, so I'll get it done tomorrow.

@josephjclark
Contributor Author

So:

  • The Adaptors page is now a technical overview and API reference for the Collections API.
  • The Docs page is a high level overview of the feature. It has a couple of code examples and refers to the API, but otherwise is a conceptual overview and usage guide more than a technical one.

@stuartc Would you mind looking over both pages and calling out any egregious errors?

These docs DO NOT include:

  • performance tips for big data sets (eg, use each not get). I would like to do this later but I just don't have time and I'm cutting cloth
  • Details on usage limits and costs. I have no idea what the rules are here
  • A good sample workflow which uses collections. Not that we have a good place to share this at the moment...

Please also note that I've raised an issue for better collections support in the CLI (e.g., openfn collections set would be awesome), see OpenFn/kit#819

@josephjclark josephjclark marked this pull request as ready for review November 11, 2024 14:34
Collections is suitable for buffering, caching and aggregating data from
Webhooks, storing large mapping files, and sharing state between workflows.

Collections can be used to store a very large number of items (in the order of
Contributor Author

@stuartc how nervous are you about the phrase "in the order of millions"?

I just want to give some idea of what "high volume" means. I know it's woolly.

Member

I think we can leave it as is for now. We will need to determine two things:

  1. How much work it is for the db to store really large values. Right now they are strings and not indexed, but they can be really large, limited only by our middleware, which rejects large (>10MB) HTTP requests.
  2. If users are ok without being able to query by value.

If either of these are an issue, then the number of records we would want to limit would be clearer. But for now, millions should be ok.
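
The two open questions above can be made concrete with a small sketch. It is a toy model under stated assumptions, not the Collections implementation: `SketchStore`, `keysMatching`, `findByValue` and the `MAX_REQUEST_BYTES` constant are all hypothetical names, the `*` wildcard in key patterns is assumed for illustration, and the size check treats each `set` as one HTTP request. It shows values held as unindexed strings, so lookups by key pattern only touch keys, while any query by value has to parse every stored item.

```javascript
// Hypothetical limit mirroring the ">10MB" figure mentioned in the comment above.
const MAX_REQUEST_BYTES = 10 * 1024 * 1024;

// Toy key/value store: values are serialised to strings and are not indexed.
class SketchStore {
  constructor() {
    this.items = new Map();
  }

  // Store a value as a JSON string, rejecting oversized payloads up front.
  set(key, value) {
    const serialised = JSON.stringify(value);
    if (Buffer.byteLength(serialised, 'utf8') > MAX_REQUEST_BYTES) {
      throw new Error(`Value for "${key}" exceeds the request size limit`);
    }
    this.items.set(key, serialised);
  }

  get(key) {
    const raw = this.items.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }

  // Key query: only keys are inspected, values are never parsed.
  // The "*" wildcard syntax is an assumption for this sketch.
  keysMatching(pattern) {
    const escaped = pattern
      .split('*')
      .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
      .join('.*');
    const regex = new RegExp(`^${escaped}$`);
    return [...this.items.keys()].filter((key) => regex.test(key));
  }

  // Value query: with no index, every stored string must be parsed and tested.
  findByValue(predicate) {
    const matches = [];
    for (const [key, raw] of this.items) {
      if (predicate(JSON.parse(raw))) {
        matches.push(key);
      }
    }
    return matches;
  }
}

const store = new SketchStore();
store.set('patient:1', { name: 'A', district: 'north' });
store.set('patient:2', { name: 'B', district: 'south' });
store.set('facility:1', { name: 'C', district: 'north' });

console.log(store.keysMatching('patient:*'));
console.log(store.findByValue((value) => value.district === 'north'));
```

In this model the cost of `findByValue` grows with both the number of records and the size of each value, which is why the two questions above would bear on any record limit.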

Member

@stuartc stuartc left a comment

Wonderful! 🫶

@aleksa-krolls aleksa-krolls merged commit f85a89b into main Nov 12, 2024
1 check passed
@aleksa-krolls aleksa-krolls deleted the collections-docs branch November 12, 2024 08:00