feat(#107): Adds multi-db watcher support #113

dianabarsan · 2024-06-10T09:46:01Z

Adds a watcher for continuous doc import. The watcher waits 5 seconds between tries.
Adds multi-database support.
Ignores Postgres deadlock errors and retries.
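
As a rough illustration of the behaviour described above, here is a minimal sketch and not the couch2pg code itself. All names (`importBatch`, `watchDatabase`, `watchDatabases`, `RETRY_DELAY_MS`) are hypothetical. It assumes the import function returns the number of docs it copied, that the watcher sleeps after every try, and that the Postgres driver exposes the SQLSTATE on `err.code` (`40P01` is `deadlock_detected`).

```javascript
// Hypothetical sketch: none of these names are taken from couch2pg.
const RETRY_DELAY_MS = 5 * 1000; // delay between tries, per the description
const POSTGRES_DEADLOCK_CODE = '40P01'; // SQLSTATE for deadlock_detected

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

const isDeadlock = (err) => Boolean(err) && err.code === POSTGRES_DEADLOCK_CODE;

// Runs one import attempt. A deadlock is swallowed so the next loop
// iteration retries; any other error propagates to the caller.
const tryImport = async (importBatch, dbName) => {
  try {
    return await importBatch(dbName);
  } catch (err) {
    if (isDeadlock(err)) {
      return 0; // assumption: count a deadlocked attempt as "nothing imported"
    }
    throw err;
  }
};

// Watches a single database until `shouldStop()` returns true,
// sleeping `delay` milliseconds after every attempt.
const watchDatabase = async (importBatch, dbName, options = {}) => {
  const { delay = RETRY_DELAY_MS, shouldStop = () => false } = options;
  let total = 0;
  while (!shouldStop()) {
    total += await tryImport(importBatch, dbName);
    await sleep(delay);
  }
  return total;
};

// Multi-db support: one independent watcher per database, run concurrently.
const watchDatabases = (importBatch, dbNames, options) =>
  Promise.all(dbNames.map(dbName => watchDatabase(importBatch, dbName, options)));
```

The `shouldStop` option exists only so the loop can be ended in a test; a real watcher would presumably run until the process exits.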

Updates e2e tests to:

- not use two additional containers, which used Python, to push data to CouchDB
- not keep data in a zipped file, where it's literally invisible to the developer; instead use the scalability CSV docs and cht-conf to generate documents that get uploaded before the test runs
- wait until DBT processing is complete and check the number of results
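
The "wait until DBT processing is complete" step could look roughly like the polling helper below. This is a sketch under assumptions, not the test code from this PR: `waitForCount` and `countRows` are hypothetical names, and `countRows` stands in for whatever query counts rows in the dbt output table.

```javascript
// Hypothetical polling helper: resolves once `countRows()` reports the
// expected number of rows, rejects if that has not happened by the time
// the timeout elapses.
const waitForCount = async (countRows, expected, options = {}) => {
  const { timeout = 60 * 1000, interval = 1000 } = options;
  const deadline = Date.now() + timeout;
  let actual;
  do {
    actual = await countRows();
    if (actual === expected) {
      return actual;
    }
    await new Promise(resolve => setTimeout(resolve, interval));
  } while (Date.now() < deadline);
  // Report the last observed count so a model regression is easy to spot.
  throw new Error(`Timed out waiting for ${expected} rows, last count was ${actual}`);
};
```

Including the last observed count in the error is a small design choice: if the dbt models produce a different number of rows than the generated documents, the failure message says so instead of only reporting a timeout.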

#107

# Conflicts: # couch2pg/src/db.js # couch2pg/src/index.js

witash

LGTM

Because I was messing around with dbt models at the same time, I had trouble running the e2e tests; the counts in the dbt tables didn't match the number of documents generated by csv-to-docs, and so waitForDBT waited forever and timed out. This means that the dbt models were not working and it's valid for tests to fail, but it's unexpected to have changes in one repository break tests in another. I don't have any alternative suggestions, just noting it, there is already a long (and not very productive) discussion of this here.

I also used this branch to add more detailed tests for base models; having dbt dockerized and using csv-to-docs to add test data is much easier to work with than what is in cht-pipeline currently, where I'm getting version and test data issues. But then, do those more detailed tests belong in this repository? Maybe we should instead do something similar in the cht-pipeline repository... but I also don't want copy/pasted code that diverges.

witash · 2024-06-17T07:53:43Z

tests/utils/couchdb-utils.js

+  const opts =  { auth: { username: env.COUCHDB_USER, password: env.COUCHDB_PASSWORD, skip_setup: false } };
+  const db = new PouchDb(dbUrl(dbName), opts);
+
+  const dbDocs = docs.map(doc => ({  ...doc, _id: `${dbName}-${doc._id}` }));


`${dbName}-${doc._id}` instead of `${doc._id}` adds some weirdness with references, is it really necessary?

Yes :)
Because otherwise both databases would have the same docs. And I want them to have "different" docs.

not sure exactly what you mean... like the two couchdb databases, medic and medic_sentinel for example, would have the same docs? if so, it's a problem for references: in the json docs, if a contact "aaa" has a parent "bbb", the parent ends up in the db with id "medic-bbb" but the reference is not replaced, so "aaa" ends up in postgres with parent id "bbb", which now doesn't exist...

or the "source" (the json docs) and "target" (test couchdb) database have the same docs? in which case why do they need to be different?

Ummmm lol. welcome to how our PG works.
All CHT databases are copied into a single PG table, this has been the case forever.
The unique field for documents is _id, which should not have conflicts across these CHT databases (ideally - this is sort of naive but this is how people want it), mainly because they are uuids with low conflict rates, and we purposefully NEVER create a doc with the same uuid in two databases. Even if two docs are "connected", their _id fields are either suffixed or prefixed with something.

In the case of this test, I wanted to check that both databases have their docs copied to the main pg table. This means that all docs would need to have different _id fields, otherwise they would get overwritten when we dump the _rev field, so I chose to prefix them.

I don't exactly understand the source of confusion here.

OK so it's just for this test where the same document is in both databases, which is not a situation we would expect in the real world.

the problem is when expanding these tests beyond just counting the number of documents; in postgres the expectation is that if you have a document in couchdb with an _id, you can find a row in the corresponding tables in postgres with that same id. Having to add a prefix is maybe ok, but adds complexity. Then if a document in couchdb refers to another document (e.g. contact with a parent), that needs to translate into a foreign key relationship. And there it breaks completely if the documents have their ids changed but references not changed.

But it is only an issue when expanding these tests, so I guess beyond the scope of this PR

Oh, yea, you're right, when we introduce relations between these docs and want to test the DBT model, then the _ids are relevant.
I agree that this would be an issue in a dbt model oriented test, but I structured these as sync copy sanity check tests.
We can easily revisit them when or if we want to change the tests to cover models too instead of just the copying.
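
To make the reference problem from this thread concrete, here is a small sketch. `prefixIds` mirrors the id prefixing in the diff above; `prefixIdsAndParents` and `danglingParents` are hypothetical helpers that are not part of this PR. The sketch assumes references live in a nested `parent: { _id, parent: { ... } }` object, as in the contact/parent example discussed above.

```javascript
// Same transformation as the test helper in the diff: only _id is prefixed.
const prefixIds = (dbName, docs) =>
  docs.map(doc => ({ ...doc, _id: `${dbName}-${doc._id}` }));

// Hypothetical: prefixes the _id of a parent reference and of its ancestors.
const prefixParent = (dbName, parent) => {
  if (!parent || !parent._id) {
    return parent;
  }
  const prefixed = { ...parent, _id: `${dbName}-${parent._id}` };
  if (parent.parent) {
    prefixed.parent = prefixParent(dbName, parent.parent);
  }
  return prefixed;
};

// Hypothetical variant that keeps references consistent with the new ids.
const prefixIdsAndParents = (dbName, docs) =>
  docs.map(doc => {
    const prefixed = { ...doc, _id: `${dbName}-${doc._id}` };
    if (doc.parent) {
      prefixed.parent = prefixParent(dbName, doc.parent);
    }
    return prefixed;
  });

// Lists direct parent ids that do not match any document in the set.
const danglingParents = (docs) => {
  const ids = new Set(docs.map(doc => doc._id));
  return docs
    .filter(doc => doc.parent && doc.parent._id && !ids.has(doc.parent._id))
    .map(doc => doc.parent._id);
};

const docs = [{ _id: 'aaa', parent: { _id: 'bbb' } }, { _id: 'bbb' }];
console.log(danglingParents(prefixIds('medic', docs)));           // [ 'bbb' ]
console.log(danglingParents(prefixIdsAndParents('medic', docs))); // []
```

For tests that only count copied documents the first helper is enough; something like the second would only matter once tests cover relations between documents.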

dianabarsan · 2024-06-17T07:56:00Z

I agree with the whole weirdness around e2e tests in this repo depending on another repository. I really struggled not to make this change in this iteration, because I wanted to keep this relatively contained to the watcher + multi db support.

I think the simplest solution right now is to have a sort of test-branch for cht-pipeline instead of passing main to e2e tests here.
But the best would be to somehow source the DBT models to this repo.

# 1.0.0 (2024-09-10)

### Bug Fixes

* Change env variables according to cht pipeline updates ([#71](#71)) ([c89aadf](c89aadf))
* Fix numbering ([#50](#50)) ([5c93300](5c93300))

### Features

* **#107:** Adds multi-db watcher support ([#113](#113)) ([279d8f2](279d8f2)), closes [#107](#107) [#107](#107)
* **#112:** drop support for multiple copies of every document ([#115](#115)) ([b46f288](b46f288)), closes [#112](#112) [#118](#118)
* **#129:** add back automatic pipeline updates ([#130](#130)) ([fc73fd7](fc73fd7)), closes [#129](#129) [#129](#129) [#129](#129)
* **#1:** first release ([ff0fedd](ff0fedd)), closes [#1](#1)
* **#25:** custom databases ([#33](#33)) ([cd10db0](cd10db0)), closes [#25](#25)
* **#78:** full refresh on changed objects, only incremental runs continously ([0869ee9](0869ee9)), closes [#78](https://github.com/medic/cht-sync/issues/78)
* add versioning and releases ([a528aba](a528aba))
* bind sequence token path to host for persistence ([#88](#88)) ([e1c3953](e1c3953))
* remove superset container and update Readme ([#64](#64)) ([8acbc93](8acbc93))
* update logstash base image version and update default configuration files ([#61](#61)) ([674582d](674582d))
* update postgres version to 16 ([8bf1e84](8bf1e84))

medic-ci · 2024-09-10T14:27:41Z

🎉 This PR is included in version 1.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

dianabarsan added 25 commits June 3, 2024 20:34

initial commit

ca62d9a

adds batch one unit tests

b2a9e2d

adds more unit tests

6c5daec

install on build

cfbdb9c

remove logstash and small dbt image

c8fdb36

js e2e tests

d74c108

bunch of test updates

3348555

bunch of test updates

4618b33

adds ignore scripts

ef01326

remove postgrest

b56f4cb

adds watcher tests

848350f

Merge branch 'refs/heads/main' into 107-couch2pg-watch

d18702f

# Conflicts: # couch2pg/src/db.js # couch2pg/src/index.js

fix linting

e84037f

use npm and latest node

5040421

use npm and latest node

ac2fa0e

use npm and latest node

2954432

use npm and latest node

404bb25

use npm and latest node

db1f568

reduce delay

74bfe61

reduce delay

d4a8c12

update e2e test

1230916

fix linting

789e2c6

fix linting

89d386f

remove debug log

090d22c

rename test

3758584

dianabarsan requested a review from witash June 12, 2024 08:10

witash approved these changes Jun 17, 2024

witash reviewed Jun 17, 2024

dianabarsan merged commit 279d8f2 into main Jun 18, 2024
5 checks passed

dianabarsan deleted the 107-couch2pg-watch branch June 18, 2024 11:44

medic-ci added the released label Sep 10, 2024
