Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Remove limit for fetching secondary docs #2594

Conversation

islamaliev
Copy link
Contributor

@islamaliev islamaliev commented May 6, 2024

Relevant issue(s)

Resolves #2590 and #2600

Description

Make join fetch all secondary docs of a fetched-by-index primary doc.

@islamaliev islamaliev added bug Something isn't working area/query Related to the query component labels May 6, 2024
@islamaliev islamaliev requested a review from a team May 6, 2024 14:38
@islamaliev islamaliev self-assigned this May 6, 2024
@islamaliev islamaliev force-pushed the fix/sec-index-fetch-all-sec-objects branch from 8e04962 to 8e25086 Compare May 6, 2024 15:11
@fredcarle
Copy link
Collaborator

@islamaliev are we sure the removal of the limit is the way to go? It feels like it's a bug with passing the correct limit along.

@islamaliev
Copy link
Contributor Author

@islamaliev are we sure the removal of the limit is the way to go? It feels like it's a bug with passing the correct limit along.

That particular limit was a way to differentiate between 1-to-1 and 1-to-many relations. All previous tests passed before because they happened to have (or filtered out all but) one related object, so the limit accidentally matched.
You can see from the tests how few tests I had to adjust to accommodate this change.
This needs to be solved anyway in a more intelligent way by indexing relations, which I wanted to discuss today.

@@ -216,7 +216,7 @@ func getUserDocs() predefined.DocsList {
},
{
"model": "Playstation 5",
"year": 2022,
"year": 2021,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why has this changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that this query can fetch it:

query {
  Device(filter: {
    year: {_eq: 2021}
  }) {
    model
    owner {
      name
    }
  }
}

And also to make it consistent with another "Playstation 5".
It's not used in other tests anyway.

@@ -553,3 +553,113 @@ func TestQueryWithIndexOnOneToOne_IfFilterOnIndexedRelation_ShouldFilter(t *test

testUtils.ExecuteTestCase(t, test)
}

func TestQueryWithIndexOnManyToOne_IfFilterOnIndexedField_ShouldFilterWithExplain(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Was there a behaviour change here? Before the production code changed what would the results have been?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, there was no change here. The test was passed in the beginning, but I realized that an explicit test was missing for this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has changed RE publicly visible production behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you ask the same question but rephrased. The answer is the same: there was no change.
Production code did not change to make this test green.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My second question was not targeted at this particular test, I should have made that clearer.

You have answered that question for this test, and the existing explain one, but what about TestQueryWithIndexOnManyToOne_IfFilterOnIndexedRelation_ShouldFilterWithExplain - does that behave the same way before and after the production code change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

primary is User, Device is secondary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not, device references user, not the other way around. Not really important though :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not important indeed, but I'd like to get the terminology straight.
We usually apply these terms to relations 'User.devices' or 'Device.owner'.
In these examples User.devices is secondary and Device.owner is primary, right? (I hope I'm not mixing it up)
For me it means is a relation is primary, it refers to primary document, as opposed to being a field of a primary document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these examples User.devices is secondary and Device.owner is primary

Yes :) The side that contains the id of the other document is the primary side - it is perhaps easier to see in one-ones.

For me it means is a relation is primary, it refers to primary document, as opposed to being a field of a primary document.

Ah I see how you are seeing it, that makes sense to me. We are just talking about different things :) The primary side of the relation is on Device, but the primary object could be seen as the User.

Copy link
Contributor Author

@islamaliev islamaliev May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means we agree that a self-sufficient object (meaning it has no references to another one) is considered primary. In our case User.
And the dependent object that has a reference (via primary relation field) to another object is considered secondary. In this case Device.

testUtils.ExecuteTestCase(t, test)
}

func TestQueryWithIndexOnManyToOne_IfFilterOnIndexedRelation_ShouldFilterWithExplain(t *testing.T) {
Copy link
Contributor

@AndrewSisley AndrewSisley May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: This test is really painful to try and read, to try and figure out what has/hasn't been filtered out I am required to read (and hold in memory) nearly 500 lines worth of json documents (without help from any comments). Can you please simplify this a little bit so that we can see what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you mean by "simplify". The whole test is around 40 lines of code.
I added some comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot readily see the relationship between the (hidden) inputs, and the asserted outputs. At the moment this test is not very valuable as documentation - it only asserts that the current (obscured) behaviour does not change, it does not describe what the current behaviour is (in a way that is readable to humans).

One of the bugs that was fixed last week was leaked due to a similar failing in the tests. It was tested, however the test asserted the behaviour that was observed, not the behaviour that was desired - and because the test was impossible to read the bug was discovered by a partner instead of us during development.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole test is around 40 lines of code.

If you don't care about what data the test is querying, if you do, it is closer to 500 lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment this test is not very valuable as documentation

no tests are particularly valuable as documentation if they contain over 100 lines of code with most of them being copy&paste of one another.

it does not describe what the current behaviour is (in a way that is readable to humans).

Are you sure you are referring to the right test? The test under question initially failed, which means it wasn't an observable behaviour. In order to make it green - so that it behaves in the desired way - I had to change production code.

One of the bugs that was fixed last week was leaked due to a similar failing in the tests. It was tested, however the test asserted the behaviour that was observed, not the behaviour that was desired - and because the test was impossible to read the bug was discovered by a partner instead of us during development.

I'm not sure if you intentionally phrased it this way oversimplifying. I can give you a dozen of other factor thats contribute to it.

The test are possible to read. I would kindly ask you to write next time "it was had for me to read" so that we don't spend time arguing about subjective matter and try find a solution instead.

Copy link
Contributor Author

@islamaliev islamaliev May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think addresses and device specs are useful to this test? They are never queried. Most of the fields on the two collections used are not touched either. Anyone reading or debugging the test is forced to deal with them.

holy cow! Now I see why you are concerned so much about the setup.
But this is not how generation of predefined documents work. All the extra collections and even fields of collections that are not defined in the schema (given to SchemaUpdate) are ignored. Please refer to the readme of the package https://github.com/sourcenetwork/defradb/tree/develop/tests/predefined

Copy link
Contributor

@AndrewSisley AndrewSisley May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahahaha 😁 That does make things an awful lot better than I thought 😁

It doesn't really remove the readabilty problem though (and in some ways makes it worse, although the trade off is well worth it in order to trim the data down).

I think documentation on the test action needs to be expanded, that readme is not as useful as it could be if it were linked to from the action, with a brief 2 line summary (out of scope here, not asking for that in this PR :))

That's nice though, we can largely ignore my prior debugging concerns here :)

I think with #2592 the importance of the normal non-explain request would be greatly reduced, as the 3 results that are returned would be covered by our normal tests, so perhaps we can focus on explaining the explain tests here.

suggstion: Could you perhaps add a comment explaining the 44 and the 1 explain request expected values? Both here and in the other 2 tests in this PR. Given the size of getUserDocs and the magic of CreatePredefinedDocs it remains very hard to understand what those values should be and how they are what they are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue for the action documentation here: #2600

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments to tests and to predefined docs action

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments to tests and to predefined docs action

They look good, and thanks for sorting out #2600 here too!

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please clarify a few things regarding the public behaviour? It is hard to understand at the moment.

@fredcarle fredcarle added this to the DefraDB v0.12 milestone May 6, 2024
@@ -265,7 +265,7 @@ func TestQueryWithIndexOnOneToOnePrimaryRelation_IfFilterOnIndexedFieldOfRelatio
},
testUtils.Request{
Request: makeExplainQuery(req2),
Asserter: testUtils.NewExplainAsserter().WithFieldFetches(15).WithIndexFetches(3),
Asserter: testUtils.NewExplainAsserter().WithFieldFetches(33).WithIndexFetches(3),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Did this change because of the production code change, or because of the Playstation 5 year change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because the production code changed. The planner now doesn't stop after finding first matching related secondary doc and continues until all sec. docs are exhausted.

@islamaliev islamaliev force-pushed the fix/sec-index-fetch-all-sec-objects branch from 87d6dad to 8c2648e Compare May 8, 2024 07:16
Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving now, assuming #2592 gets picked up soon, and some comments are added for the explain asserts.

Thanks for the fix Islam, and explaining it all to me :)

@islamaliev islamaliev force-pushed the fix/sec-index-fetch-all-sec-objects branch from 8c2648e to 53290bb Compare May 8, 2024 20:55
@islamaliev islamaliev merged commit ed3550a into sourcenetwork:develop May 8, 2024
28 of 29 checks passed
shahzadlone pushed a commit that referenced this pull request May 14, 2024
## Relevant issue(s)

Resolves #2590 and #2600

## Description

Make join fetch all secondary docs of a fetched-by-index primary doc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/query Related to the query component bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sec. Index: query with filter on an indexed field of a primary subtype always returns only 1 doc.
3 participants