Database Schema Design: October 2023 #787

0x4007 · 2023-09-20T13:10:26Z

Overview

We need a robust schema to manage settlements, which include both credits and debits in our payments ledger.
Every table should start with the following columns: id, created, updated and have foreign keys in the last columns.
We also should split off the permits into a separate table, and associate them with credits as needed.
- Perhaps in the future we can allow for credits without permits.
The bot is expected to be stable after these changes are implemented which implies that all of the functions that interact with the database should also be refactored to accomodate this new architecture.

Schema

my_modified_backup.sql.zip


access
Access control now has a column for JSON arrays, so that any project can set access control for fully customizable labels.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


user_id
integer NOT NULL


location_id
integer NOT NULL


multiplier
smallint DEFAULT '1'::smallint NOT NULL


labels
json



credits
Permits are optional. Perhaps we can assume XP calculations for credits without permits. Alternatively we need to think about other ways to compensate. For Gnosis Safe "permits", what data can do store instead of permits? rfc @rndquu



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


amount
numeric NOT NULL


permit_id
integer



debits
No remarks



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric NOT NULL


location_id
integer



location
Represents where the event occurred. org/repo/issue/comment are all optional fields, because I anticipate that we will track only the most granular level of detail. For example, if we are tracking a comment, we will not need to track the issue or repository.
We are using node_ids because GitHub does not allow direct lookup of resources using numerical IDs. Instead, we would have to "search" for those results.
I'm unsure if it makes the most sense to consolidate to a single column for the node_id or if we should keep them separate. I'm leaning towards keeping them separate, because it will be easier to query for specific events. For example, if we want to see all events that occurred on a specific repository, we can query for node_id_repository.
I included a URL for easy auditing while we are in development.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


node_id_organization
text


node_id_repository
text


node_id_issue
text


node_id_comment
text


url
text



logs
I did not put significant thought into this, but I figured we could use it for generic remarks/comments in the database and dump everything into log_entry.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


log_entry
text


location_id
integer



partners
location_id is intended to save the partner organization.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer


location_id
integer NOT NULL



permits
Foreign keys to recognized tokens, partners, and users (beneficiary)
Just realized but not sure if I need to store the network for the transactions as well or if we should derive from token_id.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric


nonce
numeric NOT NULL


deadline
numeric NOT NULL


signature
character(132) NOT NULL


token_id
integer


partner_id
integer


beneficiary_id
integer NOT NULL


transaction
character(66)


location_id
integer



tokens
Tokens take a lot of space. Let's make a recognized table of tokens. Perhaps we should also store the token symbol since they are unlikely to change.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


network
smallint NOT NULL


address
character(42) NOT NULL



users
I am debating on if we should store all their GitHub data. We don't have any planned use for it (other than generating embeddings for their solved issues) but even that isn't included in the original user schema which just had their profile information.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer



wallets
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


address
character(42)


CONSTRAINT
wallets_address_check CHECK ((length(address) = 42))



settlements
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


user_id
integer NOT NULL


location_id
integer


credit_id
integer


debit_id
integer

Old Spec (For Reference)

Tables

I think it might make sense to do them in reverse order. Here I listed parents then children, but creating the children tables in order to link to the parent tables might be easiest for implementation.

Implementation Notes

The Credits table now has a foreign key permitId that can link to the Permits table, allowing you to associate a credit with a permit if necessary. However, this is optional, so you can have credits without permits as well.
Set up foreign key constraints for creditId and debitId in the Settlements table.
Application logic should prevent a Settlements entry from having both a creditId and a debitId.
Adjust data types like BigInteger or Decimal for large amounts if needed.

The text was updated successfully, but these errors were encountered:

0x4007 · 2023-09-20T18:01:10Z

Debating on this but still thinking about when we eventually make a command to allow repository maintainers to override the contributor's XP: we will need to create a new column in the settlements table. xpId or something. It will be a foreign key to an "XP Modifiers" table. There we can store all the details around that modification. For example: id, created time, updated time, amount.

We can exclude modifiers from the global leaderboard calculation (globalXP) so that its more difficult to exploit (i.e. collusion between an org and a contributor). We could even make a whistleblower incentive to make sure that people police each other for exploiting the XP system possibly.

Since it would be available in the settlements table, that means the penalties parent table can associate the user, orgId, repoId, issueId and commentURL to see exactly what happened.

wannacfuture · 2023-09-20T22:39:17Z

/start

ubiquibot · 2023-09-20T22:39:24Z

Too many assigned issues, you have reached your max of 2

seprintour · 2023-09-21T05:22:56Z

/start

ubiquibot · 2023-09-21T05:23:01Z


  
  
    
    
  
  
    Deadline
    Fri, 22 Sep 2023 05:23:00 UTC
  
  
    Registered Wallet
    0x3623338046b101ecEc741De9C3594CC2176f39E5

Tips:

Use /wallet 0x0000...0000 if you want to update your registered payment wallet address @user.
Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the bounty.

seprintour · 2023-09-21T23:16:51Z

@pavlovcik There was already a Permit table, guess we should use that instead..

It has all of the fields here and more

0x4007 · 2023-09-22T07:55:45Z

@pavlovcik There was already a Permit table, guess we should use that instead..

It has all of the fields here and more

Thanks for highlighting this. The only concern I have with the current implementation is that it has an issue ID associated with every permit, but I want to allow the flexibility for us to issue permits without necessarily associating them to issue IDs.

I believe that my specification allows for more flexibility, but I also do not claim to be a database architecture expert.

I also just realized that ChatGPT jumbled up some of the rows. It left some permit specific data inside of the credits table.

seprintour · 2023-09-22T08:02:12Z

I also just realized that ChatGPT jumbled up some of the rows. It left some permit specific data inside of the credits table.

Yea, network, signature and owner is permit data.

Previous permit table contains repository data and issue id to pinpoint the exact origin of the permit. Are you saying we don't need that anymore.

Because none of the tables here refers to a repository (origin) of the payment

seprintour · 2023-09-23T09:21:08Z

Rfc

0x4007 · 2023-09-23T09:26:52Z

The parent table does. #767

seprintour · 2023-09-23T11:04:10Z

The parent table does. #767

Okay then, this issue entail just creating the tables, moving permit data to the new table. Is creating functions to add these data to DB included? @pavlovcik

0x4007 · 2023-09-23T13:07:38Z

The bot should be stable after the database is changed out so yes it includes modifying all the functions that interact with the database.

0x4007 · 2023-09-23T13:07:52Z

/ask how long do you estimate that this task would take, in hours?

ubiquibot · 2023-09-23T13:07:56Z

! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

0x4007 · 2023-09-23T13:08:29Z

! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

@Keyrxng time for compression/prioritization? Not a great first real world attempt lol.

Prioritization order:

Current issue specification
Linked issue specification (in order of linked, the first link taking higher priority than the next link)
Current issue conversation
Linked issue conversations (same ordering system)

We should use a tokenization estimator to know how much we should exclude.

seprintour · 2023-09-23T13:24:04Z

@pavlovcik I'm adding penalties to this too?

I see the penalty issue was rolled onto this

0x4007 · 2023-09-23T13:25:21Z

Most likely but let me just get a little time to refine this spec. Sorry I realized after filing the issue that there were some mistakes in my specification. I might break this up into separate tasks.

0x4007 · 2023-09-23T13:26:26Z

/ask how long do you estimate that this task would take, in hours?

ubiquibot · 2023-09-23T13:26:29Z

! Error: This model's maximum context length is 16385 tokens. However, you requested 21257 tokens (4873 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

0x4007 · 2023-09-23T13:41:22Z

@seprintour I broke it down into four subtasks

seprintour · 2023-09-23T13:47:08Z

@seprintour I broke it down into four subtasks

Okay, credit depends on permit (because permit is going to be modified and it needs to be linked to the permit id after modification)

Settlements depends on credit and debit (If its not merged in this order, the supabase migration will fail to run automatically)

0x4007 · 2023-09-23T13:47:56Z

I noted that as well on the Tables section.

seprintour · 2023-09-23T13:57:59Z

Debits Table Schema Debits Table Schema #809

I also think the amount for debit and credit should be a float or big integer if we are converting decimal to bigint because permit amount is not always in whole numbers

0x4007 · 2023-09-23T13:59:53Z

Lets chat under that issue instead.

Keyrxng · 2023-09-23T17:24:08Z

4023 in the messages, 16384 in the completion

Same as other shout out, you've set the output to be the maximum token amount for that model. max_tokens represents output so you gotta drop the token limit to half or 2/3.

the actual context fed into GPT was only 4k tokens

0x4007 · 2023-09-23T17:42:03Z

I set to 8000 max tokens in the org config and it silent crashes

0x4007 · 2023-09-26T18:37:14Z

just realized that ids are useless. need node_ids and graphql api

0x4007 · 2023-09-26T23:29:38Z

0x4007 · 2023-09-27T00:12:26Z

0x4007 · 2023-09-27T00:38:25Z

I finished the schema but i still need to verify the node id situation. I might need to change the ids under 'location' table to track references on github e.g. org/repo/issue/comment ids
final-database.zip

0x4007 · 2023-09-27T07:00:10Z


access
Access control now has a column for JSON arrays, so that any project can set access control for fully customizable labels.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


user_id
integer NOT NULL


location_id
integer NOT NULL


multiplier
smallint DEFAULT '1'::smallint NOT NULL


labels
json



credits
Permits are optional. Perhaps we can assume XP calculations for credits without permits. Alternatively we need to think about other ways to compensate. For Gnosis Safe "permits", what data can do store instead of permits? rfc @rndquu



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


amount
numeric NOT NULL


permit_id
integer



debits
No remarks



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric NOT NULL


location_id
integer



location
Represents where the event occurred. org/repo/issue/comment are all optional fields, because I anticipate that we will track only the most granular level of detail. For example, if we are tracking a comment, we will not need to track the issue or repository.
We are using node_ids because GitHub does not allow direct lookup of resources using numerical IDs. Instead, we would have to "search" for those results.
I'm unsure if it makes the most sense to consolidate to a single column for the node_id or if we should keep them separate. I'm leaning towards keeping them separate, because it will be easier to query for specific events. For example, if we want to see all events that occurred on a specific repository, we can query for node_id_repository.
I included a URL for easy auditing while we are in development.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


node_id_organization
text


node_id_repository
text


node_id_issue
text


node_id_comment
text


url
text



logs
I did not put significant thought into this, but I figured we could use it for generic remarks/comments in the database and dump everything into log_entry.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


log_entry
text


location_id
integer



partners
location_id is intended to save the partner organization.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer


location_id
integer NOT NULL



permits
Foreign keys to recognized tokens, partners, and users (beneficiary)



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric


nonce
numeric NOT NULL


deadline
numeric NOT NULL


signature
character(132) NOT NULL


token_id
integer


partner_id
integer


beneficiary_id
integer NOT NULL


transaction
character(66)


location_id
integer



tokens
Tokens take a lot of space. Let's make a recognized table of tokens. Perhaps we should also store the token symbol since they are unlikely to change.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


network
smallint NOT NULL


address
character(42) NOT NULL



users
I am debating on if we should store all their GitHub data. We don't have any planned use for it (other than generating embeddings for their solved issues) but even that isn't included in the original user schema which just had their profile information.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer



wallets
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


address
character(42)


CONSTRAINT
wallets_address_check CHECK ((length(address) = 42))



settlements
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


user_id
integer NOT NULL


location_id
integer


credit_id
integer


debit_id
integer

0x4007 · 2023-09-27T19:51:25Z

1695844219.zip

0x4007 mentioned this issue Sep 20, 2023

Penalty Improvement - Traceback #767

Closed

0x4007 changed the title ~~Database: creditId~~ Database: settlementId Sep 20, 2023

0x4007 changed the title ~~Database: settlementId~~ Database Schema Design: Settlements Sep 20, 2023

0x4007 added Time: <1 Day Priority: 3 (High) labels Sep 20, 2023

ubiquibot bot added the Price: 450 USD label Sep 20, 2023

ubiquibot bot mentioned this issue Sep 20, 2023

Database Schema Design: Settlements ubiquity/devpool-directory#934

Closed

ubiquibot bot assigned seprintour Sep 21, 2023

0x4007 mentioned this issue Sep 23, 2023

Database Rearchitecture #806

Closed

0x4007 mentioned this issue Sep 23, 2023

/ask Priority - Token Optimization #807

Open

0x4007 removed the Time: <1 Day label Sep 23, 2023

0x4007 removed Priority: 3 (High) Price: 450 USD labels Sep 23, 2023

0x4007 unassigned seprintour Sep 23, 2023

0x4007 mentioned this issue Sep 24, 2023

chore: create permit table and move data #817

Open

seprintour mentioned this issue Sep 25, 2023

Settlements Table Schema #808

Closed

0x4007 mentioned this issue Sep 25, 2023

build: added a settlements schema #833

Closed

0x4007 changed the title ~~Database Schema Design: Settlements~~ Database Schema Design: October 2023 Sep 27, 2023

0x4007 mentioned this issue Sep 27, 2023

Combine unclaimed permits tool ubiquity/pay.ubq.fi#139

Closed

whilefoo mentioned this issue Oct 9, 2023

chore: add credits schema #851

Closed

Database Schema Design: October 2023 #787

Database Schema Design: October 2023 #787

Comments

0x4007 commented Sep 20, 2023 • edited Loading

Overview

Schema

access

credits

debits

location

logs

partners

permits

tokens

users

wallets

settlements

Old Spec (For Reference)

Tables

Implementation Notes

0x4007 commented Sep 20, 2023

wannacfuture commented Sep 20, 2023

ubiquibot bot commented Sep 20, 2023

seprintour commented Sep 21, 2023

ubiquibot bot commented Sep 21, 2023

Tips:

seprintour commented Sep 21, 2023

0x4007 commented Sep 22, 2023 • edited Loading

seprintour commented Sep 22, 2023

seprintour commented Sep 23, 2023

0x4007 commented Sep 23, 2023

seprintour commented Sep 23, 2023

0x4007 commented Sep 23, 2023

0x4007 commented Sep 23, 2023

ubiquibot bot commented Sep 23, 2023

0x4007 commented Sep 23, 2023 • edited Loading

seprintour commented Sep 23, 2023

0x4007 commented Sep 23, 2023 • edited Loading

0x4007 commented Sep 23, 2023

ubiquibot bot commented Sep 23, 2023

0x4007 commented Sep 23, 2023

seprintour commented Sep 23, 2023

0x4007 commented Sep 23, 2023

seprintour commented Sep 23, 2023

0x4007 commented Sep 23, 2023

Keyrxng commented Sep 23, 2023

0x4007 commented Sep 23, 2023

0x4007 commented Sep 26, 2023 • edited Loading

0x4007 commented Sep 26, 2023

0x4007 commented Sep 27, 2023

0x4007 commented Sep 27, 2023 • edited Loading

0x4007 commented Sep 27, 2023 • edited Loading

access

credits

debits

location

logs

partners

permits

tokens

users

wallets

settlements

0x4007 commented Sep 27, 2023

0x4007 commented Sep 20, 2023 •

edited

Loading

0x4007 commented Sep 22, 2023 •

edited

Loading

0x4007 commented Sep 23, 2023 •

edited

Loading

0x4007 commented Sep 23, 2023 •

edited

Loading

0x4007 commented Sep 26, 2023 •

edited

Loading

0x4007 commented Sep 27, 2023 •

edited

Loading

0x4007 commented Sep 27, 2023 •

edited

Loading