
Standard Storage Solution #2

Open
0x4007 opened this issue Jun 4, 2024 · 14 comments

Comments

@0x4007
Member

0x4007 commented Jun 4, 2024

Standardizing Plug-in Data Storage in Organization-Wide Configuration Repository

Objective

Establish a standardized method for storing plug-in data in the .ubiquibot-config repository, ensuring data integrity and security. An additional benefit is that this allows partners full control over their data and decentralizes the data storage.

Specification

Storage Structure

  • Each plug-in will have its own JSON database file.
  • The filename of each JSON database will be the plug-in ID.
  • This ensures that plug-ins cannot tamper with each other's data.

JSON Database Format

  • Each JSON file will store data specific to its corresponding plug-in.
  • The structure within the JSON file is determined by the plug-in's requirements.

Example

For a plug-in with ID @ubiquibot/command-start-stop, the JSON file will be named ubiquibot-command-start-stop.json.

{
    "dataKey1": "value1",
    "dataKey2": "value2",
    ...
}

Access Control

  • The kernel will manage read and write permissions.
  • Write access will be restricted to ensure plug-ins can only modify their own JSON file.
  • Read access can be granted based on plug-in ID, allowing access to other plug-ins' data as needed.
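The access rules above can be sketched as a small kernel-side check. This is an illustrative sketch, not an existing API: `dbFileFor` and `isAllowed` are hypothetical names, and the filename normalization follows the `@ubiquibot/command-start-stop` → `ubiquibot-command-start-stop.json` example from the spec.

```typescript
type Access = "read" | "write";

// The plugin ID doubles as the database filename, with the leading "@"
// dropped and "/" replaced so it is a valid single filename.
function dbFileFor(pluginId: string): string {
  return pluginId.replace(/^@/, "").replace(/\//g, "-") + ".json";
}

function isAllowed(requesterId: string, targetFile: string, access: Access): boolean {
  if (access === "write") {
    // Write access is restricted to the plugin's own JSON file.
    return targetFile === dbFileFor(requesterId);
  }
  // Read access could be granted per plugin ID; this sketch allows all reads.
  return true;
}
```

Under these rules a plugin can never write another plugin's file, which is the tamper-protection the bullet points describe.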

Implementation

  1. Repository Setup

    • Use the .ubiquibot-config repository as the general-purpose utility repository per organization.
    • Configure GitHub App permissions to allow the kernel to manage repository access.
  2. Kernel Configuration

    • Ensure the kernel has write access to the repository.
    • Implement read access control based on plug-in IDs.

Security Considerations

  • Restrict write permissions to prevent unauthorized modifications.

GitHub App Permissions

  • The kernel requires the following GitHub App permissions:
    • Read and write access to the configuration repository.

Benefits

  • Data Integrity and Security: By isolating each plug-in's data in its own JSON file, we ensure that plug-ins cannot interfere with each other’s data.
  • Partner Control: Partners have full control over their data, enhancing privacy and security.
  • Decentralized Storage: Decentralizing data storage minimizes the risk of data breaches and central points of failure.
  • Simplified Development: Standardizing data storage eliminates the need to handle different data providers when developing plugins. Methods in our SDK will make it simple for plugin developers to store and access data.

Summary

By standardizing the storage of plug-in data in separate JSON files named after the plug-in ID, we ensure data integrity and security. The kernel will manage access control, providing a robust framework for plug-in data management, and simplifying the development process for plugin developers.

@0x4007
Member Author

0x4007 commented Jun 4, 2024

First step is to ensure that assumptions are accurate.

  1. Have the kernel push code to the repository when working in another repository.
  2. Be able to read JSON databases from the other plugins.
  3. Do all of this without requiring overly broad permissions.
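Assumption 1 could be exercised with the GitHub contents API (`PUT /repos/{owner}/{repo}/contents/{path}`). A minimal sketch, assuming the kernel holds an installation token; only the request payload is built here, and the helper name is hypothetical. The actual call would go through something like Octokit's `repos.createOrUpdateFileContents`.

```typescript
// Build the payload for committing a plugin's JSON database to the
// partner's .ubiquibot-config repo via the GitHub contents API.
function buildCommitPayload(org: string, pluginDbFile: string, data: unknown, priorSha?: string) {
  // The contents API requires the file body to be base64-encoded.
  const body = Buffer.from(JSON.stringify(data, null, 2)).toString("base64");
  return {
    owner: org,
    repo: ".ubiquibot-config",
    path: pluginDbFile,
    message: `chore: update ${pluginDbFile}`,
    content: body,
    // `sha` of the previous blob is required when updating an existing file.
    ...(priorSha ? { sha: priorSha } : {}),
  };
}
```

Whether this works with a narrowly scoped `contents: write` permission on a single repo is exactly assumption 3 to verify.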

@gentlementlegen
Member

Having something self-contained is a great idea, and it would probably make plugin development easier if we didn't have to spin up a DB instance for each plugin. However, the JSON format might become limiting at some point, which is why I would suggest something more robust like SQLite.

I think ideally plugins should not rely on the kernel for reading their own content, but be responsible for it themselves. However, we will always reach a limitation when it comes to user / wallet retrieval, as this data should be shared across everything; otherwise we would end up with duplicate DBs, which would be difficult to maintain and update.

We still have one remaining issue, which is the storage itself. Whether we use JSON, SQLite, or any file-based system, we need to store / read / update the content. First, it might raise security issues if the data becomes sensitive. Second, it has atomicity requirements, since many runs could occur in parallel.

@0x4007
Member Author

0x4007 commented Jun 11, 2024

On the fence about SQLite. It's nice that it handles so many catastrophic errors out of the box, but I would also rather ensure that plugin development is as easy as possible for new developers.

Auditing a plaintext JSON object is way easier than working with a database or having to find a database viewer. SQLite stores its data in a binary file format.

@gentlementlegen
Member

Yes, that's a nice thing to consider. For me the advantage is also that it is easier to have:

  • generated types based on the schema
  • query engine, so easier to aggregate, sort etc.
  • migration system, if any change in the schema is needed
  • backup and copies
  • security for data loss (ACID)
  • atomicity
  • lower memory consumption, so less resource hungry (JSON would put the whole file in memory)

With JSON, you would need to write a manual script for any schema change. Each plugin would have its own custom query code, which is very error-prone, tedious to maintain, and far less performant. If two plugins access the data at once, or if the server crashes, there is a very high chance of breaking and losing the whole content. All of these reasons make it quite a trade-off just to be able to view the data.
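The schema-change point can be made concrete. With plain JSON, every change needs a hand-written migration like this sketch (a hypothetical v1 → v2 rename of `wallet` to `walletAddress`), whereas SQLite would handle the same change with a single `ALTER TABLE` inside a transaction.

```typescript
// Hypothetical versioned shapes for a plugin's JSON database.
type V1 = { version: 1; wallet: string };
type V2 = { version: 2; walletAddress: string };

// Manual migration every plugin would have to write and maintain itself.
function migrate(db: V1 | V2): V2 {
  if (db.version === 2) return db; // already migrated
  return { version: 2, walletAddress: db.wallet };
}
```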

For me, IntelliJ comes with a built-in viewer for my DB, so I actually never leave my IDE. VS Code has a similar extension to view them:
https://marketplace.visualstudio.com/items?itemName=qwtel.sqlite-viewer

@rndquu
Member

rndquu commented Jun 13, 2024

Plain JSON storage is useful only for really simple and small plugins. It is not scalable at all compared to any RDBMS. Why don't we let plugin developers choose the storage they want (i.e. what they need for a specific task) instead of forcing them to use a solution that covers only a small subset of storage use cases?

@0x4007
Member Author

0x4007 commented Jun 13, 2024

Let's start with plain JSON files and then we can add more advanced support later if needed. None of our existing plugins have any sort of complex data querying needs.

There is no need to over-engineer things "just in case" if we haven't gotten close to those hypothetical problems in a couple of years of R&D for the existing bot capabilities.

@rndquu
Member

rndquu commented Jun 13, 2024

Let's start with plain JSON files and then we can add more advanced support later if needed. None of our existing plugins have any sort of complex data querying needs.

There is no need to over-engineer things "just in case" if we haven't gotten close to those hypothetical problems in a couple of years of R&D for the existing bot capabilities.

My point is that we don't need to add storage support to the SDK at all, since:

  1. We won't cover all possible use cases
  2. We should give plugin developers the freedom to select whatever storage solution they want that also fits the plugin's use case

There is no need to over-engineer

Exactly, there is no need to implement a "save to JSON file" SDK from scratch when plugin developers can set up this feature in an hour using any npm package

@gentlementlegen
Member

My question would be more "where do we store it?", because in my experience so far the major problem I encounter with plugins is where to store the data. Currently I have access to Supabase, but other contributors don't.

Letting developers choose their own solution is OK, but say they somehow choose Neo4j for their plugin, how do we handle that? We should not rely on an external contributor to host their own instance, so we should definitely be in control of the data. Also, JSON would mean anything can read it, and potentially write to it.

@rndquu
Member

rndquu commented Jun 13, 2024

Letting developers choose their own solution is OK, but say they somehow choose Neo4j for their plugin, how do we handle that?

Why do we need to handle it? Let the developers use Neo4j.

we should definitely be in control of the data

We should be in control of the data related only to the core plugins (conversation rewards, permit generation, etc.). We don't need access to 3rd-party plugins' data.

@0x4007
Member Author

0x4007 commented Jun 14, 2024

It is attractive to DAOs especially to decentralize the storage and to allow them to own their own data. In addition, it makes plugin development simple and straightforward for debugging. That is why JSON storage in the utility repository that the bot already requires (.ubiquibot-config) makes sense.

The implementation logic can be any existing framework, that's fine. But it needs to authenticate via the kernel in order to write to the repository.

@Keyrxng
Member

Keyrxng commented Jul 4, 2024

Is it possible for the kernel to restrict the fetching of repo contents down to a specific file? Or is the intention to pass the data via the payload? Via the payload makes custom handling difficult. If it's possible, the public & private repo approach, with restrictions placed on what is shared only for private repo storage, would be great.


I think we should add support for JSON storage out of the box, but from the SDK, not the kernel, as it'll be the most common case and good DX.

Allow custom solutions, but the burden is on the developer to make them easy to integrate with other plugins.

Two new flags:

  • publicStorage
  • accessibleWhilePrivate

The first determines whether it's in the public storage repo, accessible by all. The second determines whether the kernel will allow it to be shared across plugins while in the private repo. This would be ideal if possible, in my opinion.

These flags should make it possible for the org to configure the visibility of aspects of their storage as they see fit without affecting plugin usage.

If publicStorage: false and accessibleWhilePrivate: false, plugins can't access the data so developers must handle output specifically for each plugin.
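The two proposed flags reduce to a small visibility decision, sketched here with illustrative names (`StorageFlags` and `canOtherPluginsRead` are not existing SDK symbols): other plugins can read the data when it lives in the public storage repo, or when the owner opted in to cross-plugin sharing while private.

```typescript
// Per-plugin storage visibility flags as proposed in this comment.
interface StorageFlags {
  publicStorage: boolean;          // stored in the public storage repo?
  accessibleWhilePrivate: boolean; // shareable across plugins while private?
}

// Kernel-side decision: can a *different* plugin read this data?
function canOtherPluginsRead(flags: StorageFlags): boolean {
  return flags.publicStorage || flags.accessibleWhilePrivate;
}
```

With both flags false, `canOtherPluginsRead` returns false, matching the case where developers must hand output to each plugin specifically.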

@0x4007
Member Author

0x4007 commented Jul 5, 2024

We will continue to use the .ubiquibot-config repo. There's no point in making a separate storage repo.

Not sure about the implementation details otherwise, but I did see under the GitHub App settings that you can share a specific file. Maybe there are some other similar permission settings that could be of use.

@Keyrxng
Member

Keyrxng commented Oct 4, 2024

I've implemented an approach for this here.

  • using ubiquibot-config repository
  • targeting the storage branch
  • path looks like: ubiquibot-config/plugin-storage/telegram-bot/<dbObject>.json

I think we should have a couple of global storage objects that all plugins can use, and I think we should have partners create an app dedicated to their storage needs. AFAIK partners currently only need to create one app (the bot itself), isn't that right? I don't think it's too much to ask to create one more that would remove the DB dependency completely; plus we may need it for safer private access.

user-base.json: it makes sense to track a partner's user base globally, as opposed to each plugin having to build up its own user database. Commands like /wallet, /register, etc., whether via GitHub or Telegram, should fetch/push to the same object. 9 out of 10 plugins will need user-base info, as most current plugins do.

Here is the user-base.json I have created for the telegram-bot plugin. It pretty much covers everything we need about a contributor (minus the notification-specific props). I propose that we do something like this detached from any specific plugin. Here we could also store their level, USD earned, etc. as well, maybe?
[screenshot: user-base.json schema]

  • org-ownership.json: If a partner has multiple orgs, like we do, we can use this to store references to the other orgs the partner owns, which we can leverage to build a single storage location. Prior to any fetch/push, we check whether they specify a main org to use for storage. Setting this up could be done via the TG bot or a UI (as we'd need some kind of validation that doesn't use app_private_key; requires more thought). It'll suck having to use /register, /subscribe, /wallet, etc. 4x for new contributors to be able to set themselves up across all of our orgs.

In my mind it may become hard to work with if we keep all storage completely walled off from any plugin other than the one that created it. I feel we need a few globals that can make interoperability more feasible. We could expose DB shapes, locations, etc. via a plugin's manifest (the author decides) to make things really accessible.

Additionally, I think that "GitHub as a storage layer" should be documented entirely separately from the plugins, in our official Ubiquity OS ecosystem docs, and/or extensively in the README if it needs to be a plugin.
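The global objects proposed in this comment could be typed roughly as below. This is a hypothetical sketch: the real user-base.json lives in the linked screenshot and the field names here are illustrative, not the actual schema.

```typescript
// One entry per contributor, shared by all plugins instead of each
// plugin building its own user database.
interface UserBaseEntry {
  githubId: number;
  githubUsername: string;
  telegramId?: number;     // populated via the Telegram bot, if linked
  walletAddress?: string;  // populated via /wallet or /register
}

// user-base.json: keyed by GitHub username (illustrative choice).
type UserBase = Record<string, UserBaseEntry>;

// org-ownership.json: lets a partner with several orgs point all of
// them at a single storage location.
interface OrgOwnership {
  mainOrg: string;     // the org designated for storage
  ownedOrgs: string[]; // the partner's other orgs
}
```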

@0x4007
Member Author

0x4007 commented Oct 5, 2024

Let's make it possible to read other plugins' values, that's fine. The plugin developer simply must specifically request it by ID, like:

const data = storage.get(`ubiquity-os-marketplace/conversation-rewards`);

In other news, I realize that a simple solution for global storage could be to create a dedicated organization with a special repo that is hard-coded into the kernel to fetch from, or something.

For example:

@ubiquity-os-storage/ubiquibot-config/.github/plugin-store/*.json

We can consider making a batch-writing system when we have scaling problems¹

Footnotes

  1. The writes are committed on separate branches based on the org name, then consolidated daily with a cron job or something. We can make reads intelligently look for updates on the org branch first, then fall back to the consolidated results on the global main branch.
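The read path in that footnote amounts to a branch fallback, sketched here with a hypothetical `fetchFromBranch` standing in for a real repo-contents fetch:

```typescript
// Returns the file contents for a branch, or undefined if absent.
type Fetch = (branch: string) => string | undefined;

// Check the per-org branch (fresh, unconsolidated writes) first,
// then fall back to the consolidated global main branch.
function readWithFallback(org: string, fetchFromBranch: Fetch): string | undefined {
  return fetchFromBranch(org) ?? fetchFromBranch("main");
}
```

A real implementation would also need to merge, not just shadow, the org-branch results into the consolidated ones, which is where the daily cron job comes in.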
