fix: Remove limit for fetching secondary docs #2594

islamaliev · 2024-05-06T14:38:54Z

Relevant issue(s)

Resolves #2590 and #2600

Description

Make join fetch all secondary docs of a fetched-by-index primary doc.

fredcarle · 2024-05-06T17:08:02Z

@islamaliev are we sure the removal of the limit is the way to go? It feels like it's a bug with passing the correct limit along.

islamaliev · 2024-05-06T17:15:22Z

> @islamaliev are we sure the removal of the limit is the way to go? It feels like it's a bug with passing the correct limit along.

That particular limit was a way to differentiate between 1-to-1 and 1-to-many relations. All previous tests passed before because they happened to have (or filtered out all but) one related object, so the limit accidentally matched.
You can see from the tests how few tests I had to adjust to accommodate this change.
This needs to be solved anyway in a more intelligent way by indexing relations, which I wanted to discuss today.
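
To illustrate why a hard-coded limit can go unnoticed, here is a minimal sketch. It is not the DefraDB planner code: `Device`, `fetchDevices`, the `limit` parameter, and the sample data are all hypothetical names chosen for illustration. It shows that a limit of 1 returns the complete result whenever an owner has exactly one related document, and silently drops documents as soon as there are more.

```go
package main

import "fmt"

// Device is a hypothetical secondary document that points at its owner.
type Device struct {
	Model   string
	OwnerID string
}

// fetchDevices returns the devices owned by ownerID. A limit of 0 means
// "no limit"; any positive limit stops the scan once that many are found.
func fetchDevices(all []Device, ownerID string, limit int) []Device {
	var out []Device
	for _, d := range all {
		if d.OwnerID != ownerID {
			continue
		}
		out = append(out, d)
		if limit > 0 && len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	// Made-up sample data: user-1 owns two devices, user-2 owns one.
	devices := []Device{
		{Model: "Device A", OwnerID: "user-1"},
		{Model: "Device B", OwnerID: "user-1"},
		{Model: "Device C", OwnerID: "user-2"},
	}

	// With a limit of 1 the second device of user-1 is never returned.
	fmt.Println(len(fetchDevices(devices, "user-1", 1)))
	// Without a limit both devices are returned.
	fmt.Println(len(fetchDevices(devices, "user-1", 0)))
	// For an owner with a single device the limit makes no visible difference.
	fmt.Println(len(fetchDevices(devices, "user-2", 1)))
}
```

A test suite whose fixtures only ever contain one related document per owner would exercise only the last case, which matches the situation described above.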

AndrewSisley · 2024-05-06T17:35:36Z

tests/integration/index/docs.go

@@ -216,7 +216,7 @@ func getUserDocs() predefined.DocsList {
 					},
 					{
 						"model": "Playstation 5",
-						"year":  2022,
+						"year":  2021,


question: Why has this changed?

so that this query can fetch it:

query { Device(filter: { year: {_eq: 2021} }) { model owner { name } } }

And also to make it consistent with another "Playstation 5".
It's not used in other tests anyway.

AndrewSisley · 2024-05-06T17:38:17Z

tests/integration/index/query_with_relation_filter_test.go

@@ -553,3 +553,113 @@ func TestQueryWithIndexOnOneToOne_IfFilterOnIndexedRelation_ShouldFilter(t *test

 	testUtils.ExecuteTestCase(t, test)
 }
+
+func TestQueryWithIndexOnManyToOne_IfFilterOnIndexedField_ShouldFilterWithExplain(t *testing.T) {


question: Was there a behaviour change here? Before the production code changed what would the results have been?

no, there was no change here. The test passed from the beginning, but I realized that an explicit test was missing for this scenario.

What has changed RE publicly visible production behaviour?

You are asking the same question, just rephrased. The answer is the same: there was no change.
Production code did not change to make this test green.

My second question was not targeted at this particular test, I should have made that clearer.

You have answered that question for this test, and the existing explain one, but what about TestQueryWithIndexOnManyToOne_IfFilterOnIndexedRelation_ShouldFilterWithExplain - does that behave the same way before and after the production code change?

primary is User, Device is secondary.

It is not, device references user, not the other way around. Not really important though :)

it's not important indeed, but I'd like to get the terminology straight.
We usually apply these terms to relations 'User.devices' or 'Device.owner'.
In these examples User.devices is secondary and Device.owner is primary, right? (I hope I'm not mixing it up)
For me it means that if a relation is primary, it refers to a primary document, as opposed to being a field of a primary document.

> In these examples User.devices is secondary and Device.owner is primary

Yes :) The side that contains the id of the other document is the primary side - it is perhaps easier to see in one-ones.

> For me it means that if a relation is primary, it refers to a primary document, as opposed to being a field of a primary document.

Ah I see how you are seeing it, that makes sense to me. We are just talking about different things :) The primary side of the relation is on Device, but the primary object could be seen as the User.

That means we agree that a self-sufficient object (meaning it has no references to another one) is considered primary. In our case User.
And the dependent object that has a reference (via primary relation field) to another object is considered secondary. In this case Device.
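
The terminology agreed on above can be sketched with two plain Go structs. This is only an illustration under assumed names (`User`, `Device`, `OwnerID`, `devicesOf` are hypothetical and not taken from the DefraDB codebase): the document that stores the other document's ID carries the primary side of the relation, while the document that stores no reference is the self-sufficient, primary object.

```go
package main

import "fmt"

// User holds no reference to any device, so in the thread's terms it is
// the primary (self-sufficient) object.
type User struct {
	ID   string
	Name string
}

// Device stores the ID of its owner. The relation field Device.owner is
// therefore the primary side of the relation, while the Device document
// itself is the secondary (dependent) object.
type Device struct {
	ID      string
	Model   string
	OwnerID string // the side that holds the other document's ID
}

// devicesOf resolves the secondary side of the relation (User.devices).
// Nothing is stored on the user, so it has to be derived from the devices.
func devicesOf(u User, all []Device) []Device {
	var out []Device
	for _, d := range all {
		if d.OwnerID == u.ID {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	owner := User{ID: "user-1", Name: "Example User"}
	devices := []Device{
		{ID: "dev-1", Model: "Device A", OwnerID: "user-1"},
		{ID: "dev-2", Model: "Device B", OwnerID: "user-1"},
		{ID: "dev-3", Model: "Device C", OwnerID: "user-2"},
	}
	for _, d := range devicesOf(owner, devices) {
		fmt.Println(d.Model, "is owned by", owner.Name)
	}
}
```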

AndrewSisley · 2024-05-06T17:41:34Z

tests/integration/index/query_with_relation_filter_test.go

+	testUtils.ExecuteTestCase(t, test)
+}
+
+func TestQueryWithIndexOnManyToOne_IfFilterOnIndexedRelation_ShouldFilterWithExplain(t *testing.T) {


todo: This test is really painful to read. To figure out what has/hasn't been filtered out I am required to read (and hold in memory) nearly 500 lines worth of json documents (without help from any comments). Can you please simplify this a little bit so that we can see what is going on?

I'm not quite sure what you mean by "simplify". The whole test is around 40 lines of code.
I added some comments.

I cannot readily see the relationship between the (hidden) inputs, and the asserted outputs. At the moment this test is not very valuable as documentation - it only asserts that the current (obscured) behaviour does not change, it does not describe what the current behaviour is (in a way that is readable to humans).

One of the bugs that was fixed last week was leaked due to a similar failing in the tests. It was tested, however the test asserted the behaviour that was observed, not the behaviour that was desired - and because the test was impossible to read the bug was discovered by a partner instead of us during development.

> The whole test is around 40 lines of code.

Only if you don't care about what data the test is querying. If you do, it is closer to 500 lines.

> At the moment this test is not very valuable as documentation

no tests are particularly valuable as documentation if they contain over 100 lines of code with most of them being copy&paste of one another.

> it does not describe what the current behaviour is (in a way that is readable to humans).

Are you sure you are referring to the right test? The test in question initially failed, which means it wasn't an observed behaviour. In order to make it green - so that it behaves in the desired way - I had to change production code.

> One of the bugs that was fixed last week was leaked due to a similar failing in the tests. It was tested, however the test asserted the behaviour that was observed, not the behaviour that was desired - and because the test was impossible to read the bug was discovered by a partner instead of us during development.

I'm not sure if you intentionally phrased it this way, oversimplifying. I can give you a dozen other factors that contributed to it.

The tests are possible to read. I would kindly ask you to write "it was hard for me to read" next time, so that we don't spend time arguing about subjective matters and try to find a solution instead.

Do you think addresses and device specs are useful to this test? They are never queried. Most of the fields on the two collections used are not touched either. Anyone reading or debugging the test is forced to deal with them.

holy cow! Now I see why you are concerned so much about the setup.
But this is not how generation of predefined documents works. All the extra collections and even fields of collections that are not defined in the schema (given to SchemaUpdate) are ignored. Please refer to the readme of the package https://github.com/sourcenetwork/defradb/tree/develop/tests/predefined
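
As a rough illustration of the behaviour described above (fields that are not part of the schema are dropped before documents are created), here is a minimal sketch. It is not the implementation from the `tests/predefined` package: `pruneDoc`, the `schemaFields` set, and the sample document are hypothetical and exist only to show the idea.

```go
package main

import "fmt"

// pruneDoc returns a copy of doc that keeps only the fields listed in
// schemaFields. Everything else is silently ignored.
func pruneDoc(doc map[string]any, schemaFields map[string]bool) map[string]any {
	out := make(map[string]any, len(schemaFields))
	for name, value := range doc {
		if schemaFields[name] {
			out[name] = value
		}
	}
	return out
}

func main() {
	// Assumed schema for the sketch: only "model" and "year" are defined.
	schema := map[string]bool{"model": true, "year": true}

	// The predefined document carries an extra field that the schema
	// does not know about.
	doc := map[string]any{
		"model": "Playstation 5",
		"year":  2021,
		"specs": "not part of the schema",
	}

	// Only the fields known to the schema survive.
	fmt.Println(pruneDoc(doc, schema))
}
```

Under this assumption, a reader of a test only needs to look at the fields named in the schema passed to the test, not at every field in the shared document list.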

Hahahaha 😁 That does make things an awful lot better than I thought 😁

It doesn't really remove the readability problem though (and in some ways makes it worse, although the trade-off is well worth it in order to trim the data down).

I think documentation on the test action needs to be expanded, that readme is not as useful as it could be if it were linked to from the action, with a brief 2 line summary (out of scope here, not asking for that in this PR :))

That's nice though, we can largely ignore my prior debugging concerns here :)

I think with #2592 the importance of the normal non-explain request would be greatly reduced, as the 3 results that are returned would be covered by our normal tests, so perhaps we can focus on explaining the explain tests here.

suggestion: Could you perhaps add a comment explaining the 44 and the 1 explain request expected values? Both here and in the other 2 tests in this PR. Given the size of getUserDocs and the magic of CreatePredefinedDocs it remains very hard to understand what those values should be and how they are what they are.

Created an issue for the action documentation here: #2600

added comments to tests and to predefined docs action

> added comments to tests and to predefined docs action

They look good, and thanks for sorting out #2600 here too!

AndrewSisley

Can you please clarify a few things regarding the public behaviour? It is hard to understand at the moment.

AndrewSisley · 2024-05-07T12:42:31Z

tests/integration/index/query_with_relation_filter_test.go

@@ -265,7 +265,7 @@ func TestQueryWithIndexOnOneToOnePrimaryRelation_IfFilterOnIndexedFieldOfRelatio
 			},
 			testUtils.Request{
 				Request:  makeExplainQuery(req2),
-				Asserter: testUtils.NewExplainAsserter().WithFieldFetches(15).WithIndexFetches(3),
+				Asserter: testUtils.NewExplainAsserter().WithFieldFetches(33).WithIndexFetches(3),


question: Did this change because of the production code change, or because of the Playstation 5 year change?

it's because the production code changed. The planner no longer stops after finding the first matching related secondary doc; it continues until all secondary docs are exhausted.
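
A toy model of why the field-fetch count grows once the early stop is removed, assuming every secondary doc that is read costs a fixed number of field fetches. `fieldFetches` and its parameters are hypothetical, and the numbers below are illustrative only; they are not meant to reproduce the 15 and 33 asserted in the test.

```go
package main

import "fmt"

// fieldFetches estimates how many fields a join reads for one primary doc.
// secondaryCount is the number of related secondary docs, fieldsPerDoc the
// number of fields read from each, and stopAtFirst mimics the old behaviour
// of returning after the first related doc.
func fieldFetches(secondaryCount, fieldsPerDoc int, stopAtFirst bool) int {
	read := 0
	for i := 0; i < secondaryCount; i++ {
		read += fieldsPerDoc
		if stopAtFirst {
			break
		}
	}
	return read
}

func main() {
	// Three related docs with two fields each (made-up numbers).
	fmt.Println("old behaviour:", fieldFetches(3, 2, true))
	fmt.Println("new behaviour:", fieldFetches(3, 2, false))
}
```

In this model the two counts only coincide when there is a single related doc, which is consistent with the explain asserts changing only in tests whose data contains several secondary docs per primary doc.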

AndrewSisley

Approving now, assuming #2592 gets picked up soon, and some comments are added for the explain asserts.

Thanks for the fix Islam, and explaining it all to me :)


islamaliev added bug Something isn't working area/query Related to the query component labels May 6, 2024

islamaliev requested a review from a team May 6, 2024 14:38

islamaliev self-assigned this May 6, 2024

islamaliev force-pushed the fix/sec-index-fetch-all-sec-objects branch from 8e04962 to 8e25086 Compare May 6, 2024 15:11

AndrewSisley reviewed May 6, 2024


AndrewSisley requested changes May 6, 2024


fredcarle added this to the DefraDB v0.12 milestone May 6, 2024

islamaliev requested a review from AndrewSisley May 7, 2024 08:55

AndrewSisley reviewed May 7, 2024


islamaliev requested a review from AndrewSisley May 7, 2024 22:07

islamaliev force-pushed the fix/sec-index-fetch-all-sec-objects branch from 87d6dad to 8c2648e Compare May 8, 2024 07:16

AndrewSisley approved these changes May 8, 2024


islamaliev added 3 commits May 8, 2024 22:55

Remove limit for fetching secondary docs

6c7193e

Add comments, simplify filter

d3410a3

Add more documentation to predefined docs

53290bb

islamaliev force-pushed the fix/sec-index-fetch-all-sec-objects branch from 8c2648e to 53290bb Compare May 8, 2024 20:55

Add comments to tests

2e28d67

islamaliev merged commit ed3550a into sourcenetwork:develop May 8, 2024
28 of 29 checks passed

islamaliev mentioned this pull request May 8, 2024

Expand documentation on CreatePredefinedDocs test action #2600

Closed

shahzadlone pushed a commit that referenced this pull request May 14, 2024

fix: Remove limit for fetching secondary docs (#2594)

73ad077

## Relevant issue(s) Resolves #2590 and #2600 ## Description Make join fetch all secondary docs of a fetched-by-index primary doc.
