Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add lens migration engine to defra #1564

Merged
merged 60 commits into from
Jul 13, 2023

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Jun 9, 2023

Relevant issue(s)

Resolves #1448 #1556

Known out of scope issues documented with inline todo comments:

I see none of these as blockers for a first release, feel very free to argue with me on that though. Some may be ticked off anyway if I find myself waiting for a while on this PR. Some are quite small and likely quick to fix, but this PR is plenty large already.

Todos (within this PR):

  • Update enumerable package dependency
  • Update lens repo dependencies
  • Document new test-build requirements (rust+rust-wasm)
  • Document breaking changes (key stuff)
  • The run tests job is failing in the CI only, it appears to deadlock calling wasmer.NewModule, GOMAXPROCS has no impact on my machine. Code-cov job works fine. Disabling -race had no effect. lens pool size was unnecessarily large for the tests, it has been shrunk.
  • Add Db restart test
  • Only test lens in make test:lens and make test:ci, update contributors doc
  • Add MIT license to tests/lenses dir root
  • Add test for fetcher with non queried field (two requests required) (likely broken, set source fetcher fields to nil if migrations found)
  • chop lens prop name prefix from registry props

Description

Adds lens as a migration engine to defra.

Migrations are run lazily, on fetch. This includes on updates (locally, and via P2P). The migration result is cached within the datastore. The DAG is never updated to reflect the migration result, including when the migrations are executed during an update. The current schema version of items in the datastore is tracked at the document level.

Commits should be clean, and contain some hopefully handy documentation, although the bulk of the work is in the last, large commit.

Notes

  • Noted briefly in the todo section, this does make rust, and rust-wasm dependencies of make test, I'll document that before merge as documented within CONTRIBUTING.md. Just in case the question is asked - no I really do not want to write the test modules in Go, Go wasm support is currently awful and it is not worth the hassle at the moment.
  • The CI build does not pass as a few dependencies are just local paths at the moment. This will change when the Lens repo is open sourced, and the immutable (enumerables) repo is updated. The linter and make test passes on my machine. The change detector will not pass atm, as this PR contains breaking changes that will be documented shortly before merge.
  • Lens repo is here, if anyone wants a quick link: https://github.com/lens-vm/lens
  • Performance has taken a bit of a hit - roughly 150µs per fetcher Start call (on my laptop) due to the getHistory func.

@AndrewSisley AndrewSisley added feature New feature or request area/schema Related to the schema system action/no-benchmark Skips the action that runs the benchmark. labels Jun 9, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.6 milestone Jun 9, 2023
@AndrewSisley AndrewSisley self-assigned this Jun 9, 2023
@AndrewSisley AndrewSisley force-pushed the 1448-lens-in-defra branch 24 times, most recently from 96d12e1 to fee3881 Compare June 15, 2023 17:13
@AndrewSisley AndrewSisley merged commit 5d74fe2 into sourcenetwork:develop Jul 13, 2023
10 checks passed
@AndrewSisley AndrewSisley deleted the 1448-lens-in-defra branch July 13, 2023 10:26
@islamaliev
Copy link
Contributor

islamaliev commented Jul 21, 2023

note (bug bash): no way to get schemaVersionID

At the moment there is no easy way to get a schema's version id (that is required to create migrations). I used quick fix made by John on branch https://github.com/sourcenetwork/defradb/tree/jsimnz/feat/schema-list-id-extension that extends client schema list to show schema version ids

@islamaliev
Copy link
Contributor

islamaliev commented Jul 21, 2023

todo (bug bash): improve UX with ~ in path

Path from home doesn’t work. Executed command:

defradb client schema migration set bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq '{"Lenses": [{"Path": "~/dev/wasm/set_default.wasm", "Arguments": {"dst":"verified", "value": true}}]}'

Got

2023-07-21T10:42:38.278+0200, ERROR, cli, logging/logger.go:107, Failed to set schema migration, {"Errors": [{"message":"open ~/dev/wasm/set_default.wasm: no such file or directory"}]}


With a path from root works
:

defradb client schema migration set bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq '{"Lenses": [{"Path": "/Users/islam/dev/wasm/set_default.wasm", "Arguments": {"dst":"verified", "value": true}}]}'
2023-07-21T10:42:02.350+0200, INFO, cli, logging/logger.go:98, Successfully set schema migration

If we don't support ~ then I think the message should clearly say that, instaed of "no such file or directory"

@islamaliev
Copy link
Contributor

islamaliev commented Jul 21, 2023

todo (bug bash): remove duplicated output for migrate get

Migration get outputs the result twice:

defradb client schema migration get
2023-07-21T10:59:18.911+0200, INFO, cli, logging/logger.go:98, {"data":{"configuration":[{"SourceSchemaVersionID":"bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta","DestinationSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq","Lenses":[{"Path":"/Users/islam/dev/wasm/set_default.wasm","Inverse":false,"Arguments":{"dst":"verified","value":true}}]}]}}
2023-07-21T10:59:18.911+0200, INFO, cli, logging/logger.go:98, Successfully got schema migrations, {"Configuration": [{"SourceSchemaVersionID":"bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta","DestinationSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq","Lenses":[{"Path":"/Users/islam/dev/wasm/set_default.wasm","Inverse":false,"Arguments":{"dst":"verified","value":true}}]}]}

That's not ideal user experience and we should fix it.

@islamaliev
Copy link
Contributor

todo (bug bash): fix migration

created a schema:

type Users {
	age: Int
	name: String
}

with schema version bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta

Created a doc:

{
    name: "John"
    age: 27
}

Dockey: bae-88b63198-7d38-5714-a9ff-21ba46374fd1

Patched the schema by adding verified field:

type Users {
	age: Int
	name: String
	verified: Boolean
}

schema version: bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq

Added migration:

defradb client schema migration set bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq '{"Lenses": [{"Path": "/Users/islam/dev/wasm/set_default.wasm", "Arguments": {"dst":"verified", "value": true}}]}'
2023-07-21T10:42:02.350+0200, INFO, cli, logging/logger.go:98, Successfully set schema migration

Queried users with Altair GraphQL client:

query QueryUsers {
  Users(dockey: "bae-88b63198-7d38-5714-a9ff-21ba46374fd1") {
    _key
    name
    age
    verified
    _version {
      cid
      schemaVersionId
    }
  }
}

Got:

{
  "data": [
    {
      "_key": "bae-88b63198-7d38-5714-a9ff-21ba46374fd1",
      "_version": [
        {
          "cid": "bafybeievhpxpptz2y2e4xzutpm2xek4zbvdv25hevt3hv6elxvhysan2t4",
          "schemaVersionId": "bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta"
        }
      ],
      "age": 27,
      "name": "John",
      "verified": null
    }
  ]
}

@islamaliev
Copy link
Contributor

note (bug bash): successful (allegedly) migration is not shown with client migration get

Executed migration set command using non-existing source schema version id:

$ defradb client schema migration set bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxqw '{"lenses": [{"path": "/Users/islam/dev/wasm/set_default.wasm", "arguments": {"dst":"verified", "value": true}}]}'
2023-07-24T10:52:32.055+0200, INFO, cli, logging/logger.go:98, Successfully set schema migration

Execuged migration get:

$ defradb client schema migration get
2023-07-24T10:52:35.003+0200, INFO, cli, logging/logger.go:98, {"data":{"configuration":[{"SourceSchemaVersionID":"bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfntae","DestinationSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq","Lenses":[{"Path":"/Users/islam/dev/wasm/set_default.wasm","Inverse":false,"Arguments":{"dst":"verified","value":true}}]},{"SourceSchemaVersionID":"bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta","DestinationSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxqw","Lenses":[{"Path":"/Users/islam/dev/wasm/set_default.wasm","Inverse":false,"Arguments":{"dst":"verified","value":true}}]}]}}

Formattet json:

{
	"data": {
		"configuration": [{
			"SourceSchemaVersionID": "bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfntae",
			"DestinationSchemaVersionID": "bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq",
			"Lenses": [{
				"Path": "/Users/islam/dev/wasm/set_default.wasm",
				"Inverse": false,
				"Arguments": {
					"dst": "verified",
					"value": true
				}
			}]
		}, {
			"SourceSchemaVersionID": "bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta",
			"DestinationSchemaVersionID": "bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq",
			"Lenses": [{
				"Path": "/Users/islam/dev/wasm/set_default.wasm",
				"Inverse": false,
				"Arguments": {
					"dst": "verified",
					"value": true
				}
			}]
		}]
	}
}

Observe that just added migration is not there. Looks like it shouldn't be successful.

@islamaliev
Copy link
Contributor

todo (bug bash): regard source schema version id or remove it

I entered non-existing schema version id (with "w" in the end) for migration:

$ defradb client schema migration set bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxqw '{"lenses": [{"path": "/Users/islam/dev/wasm/set_default.wasm", "arguments": {"dst":"verified", "value": true}}]}'
2023-07-24T11:11:37.568+0200, INFO, cli, logging/logger.go:98, Successfully set schema migration
$ defradb client schema migration get
2023-07-24T11:12:38.420+0200, INFO, cli, logging/logger.go:98, {"data":{"configuration":[{"SourceSchemaVersionID":"bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta","DestinationSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxqw","Lenses":[{"Path":"/Users/islam/dev/wasm/set_default.wasm","Inverse":false,"Arguments":{"dst":"verified","value":true}}]}]}}

Afterwards fetched the document:

{
  "data": [
    {
      "_key": "bae-88b63198-7d38-5714-a9ff-21ba46374fd1",
      "age": 27,
      "name": "John",
      "verified": true
    }
  ]
}

@islamaliev
Copy link
Contributor

note (bug bash): client migrate get doesn't show the first migration

I have 2 migrations, but client migrate get shows only the second one:

$ defradb client schema migration get
2023-07-24T12:29:07.769+0200, INFO, cli, logging/logger.go:98, {"data":{"configuration":[{"SourceSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq","DestinationSchemaVersionID":"bafkreidady7elh62qwextzkyfzzqohfrqvt2hcqhufgoclvfbypsrqx52q","Lenses":[{"Path":"/Users/islam/dev/wasm/remove.wasm","Inverse":false,"Arguments":{"target":"verified"}},{"Path":"/Users/islam/dev/wasm/copy.wasm","Inverse":false,"Arguments":{"dst":"exp","src":"age"}}]}]}}

The first one was:

{
	"data": {
		"configuration": [{
			"SourceSchemaVersionID": "bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta",
			"DestinationSchemaVersionID": "bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxqw",
			"Lenses": [{
				"Path": "/Users/islam/dev/wasm/set_default.wasm",
				"Inverse": false,
				"Arguments": {
					"dst": "verified",
					"value": true
				}
			}]
		}]
	}
}

@AndrewSisley
Copy link
Contributor Author

todo (bug bash): regard source schema version id or remove it
I entered non-existing schema version id (with "w" in the end) for migration:

This is by design, and quite important for P2P. Discussed in discord.

note (bug bash): client migrate get doesn't show the first migration

There can only be a single migration between any two versions, the second set will have overwritten the first. This is by design.

@islamaliev
Copy link
Contributor

islamaliev commented Jul 24, 2023

question (bug bash): is it fine to delete _key as part of a migration?

I created a migration:

$ defradb client schema migration set bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq bafkreidady7elh62qwextzkyfzzqohfrqvt2hcqhufgoclvfbypsrqx52q '{"lenses": [{"path": "/Users/islam/dev/wasm/remove.wasm", "arguments": {"target":"_key"}}]}'

Now queries:

query QueryUsers {
  Users {
    _key
    name
    age
    _version {
      schemaVersionId
    }
  }
}

return:

{
  "data": [
    {
      "_key": null,
      "_version": null,
      "age": 27,
      "name": "John",
    },
    {
      "_key": null,
      "_version": null,
      "age": 33,
      "name": "Islam",
    }
  ]
}

@AndrewSisley
Copy link
Contributor Author

question (bug bash): is it fine to delete _key as part of a migration?

It shouldn't break anything I think (worth checking the datastore dump to make sure it is still cached under a non-nil key prefix), but it would be very strange to write/configure a migration that would do such a thing. That said, I'm not sure the nil key value would be cached, and it might be if you re-query (to read from cache) that the key value reappears.

Might be worth adding a set of tests to document whatever behaviour exists here.

@islamaliev
Copy link
Contributor

question (bug bash): is invalid migration supposed to be cached?

For this patch:

[{ "op": "add", "path": "/Users/Schema/Fields/-", "value": {"Name": "verified", "Kind": "Boolean"} }]

created a migration:

{"lenses": [{"path": "/Users/islam/dev/wasm/set_default.wasm", "arguments": {"dst":"verified", "value": "invalid"}}]}

On querying document verified field was null.
Then set this migration:

{"lenses": [{"path": "/Users/islam/dev/wasm/set_default.wasm", "arguments": {"dst":"verified", "value": 5}}]}

Field verified was still null
Then set this migration:

{"lenses": [{"path": "/Users/islam/dev/wasm/set_default.wasm", "arguments": {"dst":"verified", "value": false}}]}

Field verified was still null!
From the user's perspective that doesn't look healthy.

$ defradb client schema migration get
2023-07-25T09:18:37.778+0200, INFO, cli, logging/logger.go:98, {"data":{"configuration":[{"SourceSchemaVersionID":"bafkreidovoxkxttybaew2qraoelormm63ilutzms7wlwmcr3xru44hfnta","DestinationSchemaVersionID":"bafkreia4bbxhtqwzw4smby5xsqxv6ptoc6ijc6v3lmnlv66twpfak5gxxq","Lenses":[{"Path":"/Users/islam/dev/wasm/set_default.wasm","Inverse":false,"Arguments":{"dst":"verified","value":false}}]}]}}

@islamaliev
Copy link
Contributor

islamaliev commented Jul 25, 2023

note (bug bash): linking a relational object using alias

Schema:

# Schema ID: bafkreibaextmibnllnex7ewwn5ednh6r3znil55mljbl2t3gjgh6unxwra
# Version ID: bafkreibaextmibnllnex7ewwn5ednh6r3znil55mljbl2t3gjgh6unxwra
type Companies {
	employees: [Users]
	name: String
}

# Schema ID: bafkreia7zqb4jlmjqdksyqxhfimwgxotp3qgflfku7nhwkayy7mr4ojflq
# Version ID: bafkreia7zqb4jlmjqdksyqxhfimwgxotp3qgflfku7nhwkayy7mr4ojflq
type Users {
	company: Companies
	name: String
}

Added users.
Added (via patching) verified field to Users.
Set migration for Users:

$ defradb client schema migration set bafkreia7zqb4jlmjqdksyqxhfimwgxotp3qgflfku7nhwkayy7mr4ojflq bafkreiazbfbzel5iixrhgleqrnbb7u6sqkqav7oh54x4qfiguo6eofllzq '{"lenses": [{"path": "/Users/islam/dev/wasm/set_default.wasm", "arguments": {"dst":"company", "value": "bae-de8c99bf-ee0e-5655-8a72-919c2d459a30"}}]}'

Upon requesting a user whose company value was null observed the same:

{
  "data": [
    {
      "_key": "bae-3f1174ba-d9bc-5a6a-b0bc-8f19581f199d",
      "company": null,
      "name": "Islam"
    },
    {
      "_key": "bae-f456e9a1-d539-5ce4-94d7-a2f008df34b4",
      "company": {
        "_key": "bae-de8c99bf-ee0e-5655-8a72-919c2d459a30",
        "name": "Source"
      },
      "name": "John"
    }
  ]
}

Note: when using company_id in lens config it works

@AndrewSisley
Copy link
Contributor Author

linking a relational object using alias

I've added a ticket for this: #1710

@islamaliev
Copy link
Contributor

todo (bug bash): fix copying into _key

The setup up the same as above with one Users collection.
Created a migration:

{"lenses": [{"path": "/Users/islam/dev/wasm/copy.wasm", "arguments": {"src":"age", "dst": "_key"}}]}

Queried users and got:

{
  "errors": [
    "invalid key string"
  ],
  "data": null
}

shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
## Relevant issue(s)

Resolves sourcenetwork#1448 sourcenetwork#1556

Adds lens as a migration engine to defra.

Migrations are run lazily, on fetch. This includes on updates (locally,
and via P2P). The migration result is cached within the datastore. The
DAG is never updated to reflect the migration result, including when the
migrations are executed during an update. The current schema version of
items in the datastore is tracked at the document level.

Commits should be clean, and contain some hopefully handy documentation,
although the bulk of the work is in the last, large commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/schema Related to the schema system feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add schema migration system to Defra
5 participants