Log Service Setup #1721

Merged
merged 14 commits into from
Feb 16, 2024

Conversation

Contributor

@weiligu weiligu commented Feb 15, 2024

Description of changes

https://linear.app/trychroma/issue/CHR-241/stand-up-log-service

  • Stand up Log Service in Dev
    • stand up postgres DB
    • stand up migration: atlas - depend on postgres
    • stand up logservice - depend on migration
    • stand up coordinator - depend on migration
  • database migration
    • change env name
    • change database name
    • add definition for record log (we can test perf for this later, not hard to change)
  • log service: go
    • entry point: main with Cmd
    • grpc service: with proto change
  • coordinator
    • connect to docker postgres
    • reorganize packages to accommodate logservice
    • rename bin to coordinator instead of chroma
    • tests connect to local postgres instead of sqlite
      • fix a bug from segment delete

Test plan

How are these changes tested?

  • `tilt up` runs successfully and the Docker containers start successfully
  • Coordinator tests pass

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Cmd.Flags().StringVar(&conf.Username, "username", "root", "MetaTable username")
Cmd.Flags().StringVar(&conf.Password, "password", "", "MetaTable password")
Cmd.Flags().StringVar(&conf.Address, "db-address", "127.0.0.1", "MetaTable db address")
Cmd.Flags().StringVar(&conf.SystemCatalogProvider, "system-catalog-provider", "database", "System catalog provider")
Collaborator

are we sure we want to change the default?

Contributor Author

yes because memory is not Postgres and I want to have local end to end set up to include postgres
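
For readers following the thread, the effect of a changed flag default can be shown with a minimal sketch. It uses the standard library `flag` package as a stand-in for the `Cmd.Flags()` call in the diff, so the `config` type and `parseConfig` function are hypothetical. The value `memory` is assumed to be the other accepted provider, as the reply above implies.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the single setting discussed in this thread.
type config struct {
	SystemCatalogProvider string
}

// parseConfig parses args with a default of "database", matching the new
// default in the diff. The standard library flag package is used here in
// place of the Cmd.Flags() call from the PR.
func parseConfig(args []string) (config, error) {
	var conf config
	fs := flag.NewFlagSet("coordinator", flag.ContinueOnError)
	fs.StringVar(&conf.SystemCatalogProvider, "system-catalog-provider", "database", "System catalog provider")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return conf, nil
}

func main() {
	// No flag given: the default applies.
	def, _ := parseConfig(nil)
	// Explicit override: the caller opts back into the in-memory provider.
	mem, _ := parseConfig([]string{"--system-catalog-provider=memory"})
	fmt.Println(def.SystemCatalogProvider, mem.SystemCatalogProvider)
}
```

With this shape, anyone who relied on the old default has to pass the flag explicitly, which is the trade-off the reviewer is asking about.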

Collaborator

@HammadB HammadB Feb 15, 2024

ok, can we add some tooling to clear/nuke postgres in that case as a follow-up? It's useful while debugging to be able to easily clear the sysdb

Contributor Author

3 parts of this:

  1. unit tests: should clean up the state - add setup and teardown (I can add this, very easy)
  2. local tests: scripts to nuke, for now the easiest thing is to just drop and re-create the db
  3. docker end to end test: there is a refresh button in tilt UI, one click should do the work
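
Item 2 above (drop and re-create the database) could look roughly like the following sketch. It is not code from this PR: `execer`, `resetDatabase`, `recorder`, and the database name `chroma_test` are hypothetical, and the `execer` interface is just the subset of `*sql.DB` needed here, so the example runs without a live Postgres instance.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// execer is the subset of *sql.DB this sketch needs, so it can be
// exercised without a running Postgres instance.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// quoteIdent double-quotes a Postgres identifier and escapes embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// resetStatements returns the SQL that drops and re-creates a database.
func resetStatements(dbName string) []string {
	q := quoteIdent(dbName)
	return []string{
		"DROP DATABASE IF EXISTS " + q,
		"CREATE DATABASE " + q,
	}
}

// resetDatabase runs the statements in order. The connection must point at
// a different database (for example "postgres"), since Postgres will not
// drop the database the current session is connected to.
func resetDatabase(db execer, dbName string) error {
	for _, stmt := range resetStatements(dbName) {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("running %q: %w", stmt, err)
		}
	}
	return nil
}

// recorder is a fake execer that only records the statements it receives.
type recorder struct{ got []string }

func (r *recorder) Exec(query string, _ ...any) (sql.Result, error) {
	r.got = append(r.got, query)
	return nil, nil
}

func main() {
	r := &recorder{}
	if err := resetDatabase(r, "chroma_test"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(r.got, "; "))
}
```

In a real script the `recorder` would be replaced by a `*sql.DB` opened against the local Postgres, and the migration would need to be re-applied after the database is re-created.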

Collaborator

Docker restart should not remove state. We should use volumes correctly and provide a script for people to reset the db

Contributor Author

I do not use a volume for the local Docker setup as of now. I can add that later if we want persistence.

Collaborator

We should use volumes. Let's do the right thing from the get-go please!

@@ -21,7 +21,7 @@ func (s *segmentMetadataDb) DeleteBySegmentID(segmentID string) error {
 func (s *segmentMetadataDb) DeleteBySegmentIDAndKeys(segmentID string, keys []string) error {
 	return s.db.
 		Where("segment_id = ?", segmentID).
-		Where("`key` IN ?", keys).
+		Where("key IN ?", keys).
Collaborator

Reason for change?

Contributor Author

because unit tests were using sqlite and this throws an error when I change them to postgres
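
Some background on why the backticks fail: backtick-quoted identifiers are a MySQL convention that SQLite accepts for compatibility, while Postgres only accepts SQL-standard double quotes around identifiers. The PR sidesteps the issue by leaving `key` unquoted. The sketch below is an illustration of the two quoting styles, not the query builder's own logic; `quoteStyle`, `quoteIdent`, and `whereIn` are hypothetical names.

```go
package main

import "fmt"

// quoteStyle selects how an identifier is quoted.
type quoteStyle int

const (
	// doubleQuote is the SQL-standard form, accepted by Postgres and SQLite.
	doubleQuote quoteStyle = iota
	// backtick is the MySQL form, accepted by SQLite but rejected by Postgres.
	backtick
)

// quoteIdent wraps name in the quote character for the given style,
// doubling any embedded quote characters.
func quoteIdent(style quoteStyle, name string) string {
	q := `"`
	if style == backtick {
		q = "`"
	}
	escaped := ""
	for _, r := range name {
		if string(r) == q {
			escaped += q
		}
		escaped += string(r)
	}
	return q + escaped + q
}

// whereIn builds the condition string used in the diff above for a column.
func whereIn(style quoteStyle, column string) string {
	return quoteIdent(style, column) + " IN ?"
}

func main() {
	fmt.Println(whereIn(backtick, "key"))    // form removed by this PR
	fmt.Println(whereIn(doubleQuote, "key")) // portable quoted alternative
}
```

If quoting is ever needed again (for example, for a column whose name is a reserved word), the double-quoted form is the one that both databases in this discussion accept.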

@@ -0,0 +1,39 @@
apiVersion: apps/v1
Collaborator

note: probably not wanted in dev/. dev/ is meant for deployments we only use while developing (like the postgres service)

Contributor

@nicolasgere nicolasgere Feb 15, 2024

maybe we should rename it, but /dev is used by Tilt only

Collaborator

woops! my bad - i confused this for test/ - I didn't realize we added dev/.

Collaborator

Should we add these to deployments/ too then?

Contributor

@nicolasgere nicolasgere Feb 15, 2024

I don't think we have decided yet how to do SQL migration. EDIT: my comment was not related to the right file.

Yes, we should also add it in deployment.

Contributor Author

Sorry, I'm confused. This is already in dev, and we don't want to add it to deployment because we will probably do this differently?

Contributor

https://github.com/chroma-core/chroma/blob/main/k8s/deployment/kubernetes.yaml we should add the deployment for the log service in that file too.

Contributor Author

oh I see, this is what is used for minikube and kube apply. ok I can add that. Can we make them into one version later?

Collaborator

@HammadB HammadB left a comment

This makes sense. Please try to break up larger PRs in the future. There are several logically separate changes in this PR. I ask that we make sure @Ishiihara has a chance to look at this before we merge, given that he's done the bulk of our Go work to date.

Collaborator

HammadB commented Feb 15, 2024

Also the coordinator tests are segfaulting.

FAIL: TestCreateGetDeleteCollections (0.01s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference

https://github.com/chroma-core/chroma/actions/runs/7909839228/job/21591525871?pr=1721

Contributor Author

weiligu commented Feb 15, 2024

> Also the coordinator tests are segfaulting.
>
> FAIL: TestCreateGetDeleteCollections (0.01s)
> panic: runtime error: invalid memory address or nil pointer dereference [recovered]
> 	panic: runtime error: invalid memory address or nil pointer dereference
>
> https://github.com/chroma-core/chroma/actions/runs/7909839228/job/21591525871?pr=1721

yea this is because we don't have postgres in our github setups. I will post to the team channel about this

Contributor Author

weiligu commented Feb 15, 2024

> This makes sense. Please try to break up larger PRs in the future. There are several logically separate changes in this PR. I ask that we make sure @Ishiihara has a chance to look at this before we merge, given that he's done the bulk of our Go work to date.

He is out until next week. Are you OK with me merging now and asking him to review and make changes in a separate PR next week?

Collaborator

HammadB commented Feb 15, 2024

> > This makes sense. Please try to break up larger PRs in the future. There are several logically separate changes in this PR. I ask that we make sure @Ishiihara has a chance to look at this before we merge, given that he's done the bulk of our Go work to date.
>
> He is out until next week. Are you OK with me merging now and asking him to review and make changes in a separate PR next week?

Yeah that is fine - this was before he sent his OOO :)

Contributor

@beggers beggers left a comment

If I'm understanding correctly this seems like two separate changes to me:

  • Initialize logservice code and stand up log service.
  • Move coordinator off sqlite for testing.

Could we split these into two PRs? If I'm mistaken or there's a strict requirement for these to be 1 PR please tell me.

@@ -188,7 +258,7 @@ spec:
spec:
containers:
- command:
-  - "chroma"
+  - "coordinator"
   - "coordinator"
Contributor

Nit: I feel a little weird about these commands being coordinator coordinator and logservice logservice. Could we make them coordinator run and logservice run or something like that?

Contributor Author

yea sounds good to me I will open a follow up pr for this

Contributor

I don't like the logservice living in go/coordinator. Could we move it to go/logservice instead?

Contributor Author

@weiligu weiligu Feb 16, 2024

My plan is to have:

  • coordinator (we can rename this to sth else later)
    • logservice
    • sysdb
    • cluster manager

Does that sound good to you?

@weiligu weiligu merged commit 93194c8 into main Feb 16, 2024
94 of 97 checks passed
Contributor Author

weiligu commented Feb 16, 2024

> If I'm understanding correctly this seems like two separate changes to me:
>
>   • Initialize logservice code and stand up log service.
>   • Move coordinator off sqlite for testing.
>
> Could we split these into two PRs? If I'm mistaken or there's a strict requirement for these to be 1 PR please tell me.

Yes they don't have to come in one PR. The logservice shares the same code connecting to Postgres so I made the change in one PR. Will separate them next time!

atroyn pushed a commit to csbasil/chroma that referenced this pull request Apr 3, 2024
## Description of changes
https://linear.app/trychroma/issue/CHR-241/stand-up-log-service

- Stand up Log Service in Dev
  - stand up postgres DB
  - stand up migration: atlas - depend on postgres
  - stand up logservice - depend on migration
  - stand up coordinator - depend on migration
- database migration
  - change env name
  - change database name
  - add definition for record log (we can test perf for this later, not hard to change)
- log service: go
  - entry point: main with Cmd
  - grpc service: with proto change
- coordinator
  - connect to docker postgres
  - reorganize packages to accommodate with logservice
  - rename bin to coordinator instead of chroma
  - tests connect to local postgres instead of sqlite
    - fix a bug from segment delete

- system_test fix will be in a separate PR