feat: Ability to explain an executed request #1188

Merged
merged 23 commits into from
Apr 1, 2023

Conversation

shahzadlone
Member

@shahzadlone shahzadlone commented Mar 16, 2023

Relevant issue(s)

Resolves #326

Description

Adds the ability to return the datapoints / information gathered at every planner step. The information is recorded during execution and collected once execution has finished.
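
As a rough illustration of "record during execution, collect afterwards", here is a minimal sketch. Everything in it (`node`, `scanNode`, `limitNode`, `gather`, and the `iterations` datapoint) is a hypothetical stand-in chosen for the example, not DefraDB's actual planner types or the datapoints this PR records.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a planner node (not DefraDB's planNode).
type node interface {
	Next() bool                 // advances the node and records datapoints
	Source() node               // child node, or nil for a leaf
	Kind() string               // key used for this node in the explain graph
	Datapoints() map[string]any // datapoints recorded so far
}

// scanNode yields a fixed number of documents.
type scanNode struct {
	docs, pos  int
	iterations int
}

func (s *scanNode) Next() bool {
	s.iterations++ // recorded during execution
	if s.pos >= s.docs {
		return false
	}
	s.pos++
	return true
}
func (s *scanNode) Source() node { return nil }
func (s *scanNode) Kind() string { return "scanNode" }
func (s *scanNode) Datapoints() map[string]any {
	return map[string]any{"iterations": s.iterations}
}

// limitNode stops after limit results from its source.
type limitNode struct {
	limit, returned int
	iterations      int
	src             node
}

func (l *limitNode) Next() bool {
	l.iterations++
	if l.returned >= l.limit || !l.src.Next() {
		return false
	}
	l.returned++
	return true
}
func (l *limitNode) Source() node { return l.src }
func (l *limitNode) Kind() string { return "limitNode" }
func (l *limitNode) Datapoints() map[string]any {
	return map[string]any{"iterations": l.iterations}
}

// gather walks the plan after execution and nests each child's
// datapoints under its parent, keyed by node kind.
func gather(n node) map[string]any {
	out := n.Datapoints()
	if child := n.Source(); child != nil {
		for kind, points := range gather(child) {
			out[kind] = points
		}
	}
	return map[string]any{n.Kind(): out}
}

func main() {
	root := &limitNode{limit: 2, src: &scanNode{docs: 5}}
	for root.Next() {
		// Results are ignored here; only the recorded datapoints matter.
	}
	fmt.Println(gather(root))
}
```

In this sketch the limit node is invoked three times (two results plus the call that ends iteration), while the scan node is only invoked twice, which is the kind of per-node difference an execute explain makes visible.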

Usage

Add @explain(type: execute) after the query or mutation operation.

  • Execute explain request for query operation - example:
query @explain(type: execute) {
	Address(groupBy: [country]) {
		country
		_group {
			city
		}
	}
}
  • Execute explain request for mutation operation - example:
mutation @explain(type: execute) {
	update_address(
		ids: ["bae-c8448e47-6cd1-571f-90bd-364acb80da7b"],
		data: "{\"country\": \"USA\"}"
	) {
		country
		city
	}
}
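
For callers that build requests programmatically, the directive placement described above could be sketched as follows. `withExecuteExplain` is a hypothetical helper written for this example and is not part of DefraDB; it assumes the operation starts with the `query` or `mutation` keyword followed by whitespace or `{`, and that the operation header itself contains no `{`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// executeExplainDirective is the directive text from the usage notes.
const executeExplainDirective = "@explain(type: execute)"

// withExecuteExplain is a hypothetical helper (not part of DefraDB) that
// places the directive between the operation header and the first
// selection set. It assumes the header contains no "{" of its own.
func withExecuteExplain(operation string) (string, error) {
	trimmed := strings.TrimSpace(operation)
	brace := strings.Index(trimmed, "{")
	if brace < 0 {
		return "", errors.New("operation has no selection set")
	}
	head := strings.TrimSpace(trimmed[:brace])
	fields := strings.Fields(head)
	if len(fields) == 0 || (fields[0] != "query" && fields[0] != "mutation") {
		return "", errors.New("operation must start with the query or mutation keyword")
	}
	return head + " " + executeExplainDirective + " " + trimmed[brace:], nil
}

func main() {
	explained, err := withExecuteExplain("query {\n\tAddress(groupBy: [country]) {\n\t\tcountry\n\t}\n}")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(explained)
}
```

The shorthand form without a keyword (`{ Address { ... } }`) is rejected on purpose, since the usage notes attach the directive to an explicit `query` or `mutation` operation.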

For Reviewers

  • Commits should be fairly clean and easy to review commit by commit.

Note

  • I wanted to add the TotalElapsedTime datapoint for the request, however, it will need to wait until the explain testing framework is integrated properly (Integrate the new explain test setup into the new test action system #1243) and we can control how we want to test the varying time datapoint.
  • The information from this execute explain graph will be dumped to a metric system outside the planner nodes, should follow in a PR after this.
  • I think we can do better than typeIndexJoin, as most of the join execution happens under typeJoinMany and typeJoinOne. In the future perhaps make them explainable nodes (to avoid the hacky explaining like we did for simple explain).
  • In the future, also split the Explain() interface function into separate SimpleExplain() and ExecuteExplain() functions.
  • In the future, introduce a verbose flag to allow simple explain attributes inside the execute explain.

Need Feedback:

  • Wondering if we should have a verbose = false/true option to add ability to hide some results like the actual document results? Resolved: if needed can be added in a later PR.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

CI running the integration tests.

Specify the platform(s) on which this was tested:

  • WSL 2 (Manjaro)

@source-devs

This comment was marked as off-topic.

@shahzadlone shahzadlone added the action/no-benchmark Skips the action that runs the benchmark. label Mar 16, 2023
@shahzadlone shahzadlone added this to the DefraDB v0.5 milestone Mar 22, 2023
@shahzadlone shahzadlone added area/query Related to the query component area/parser Related to the parser components area/planner Related to the planner system labels Mar 22, 2023
@shahzadlone shahzadlone self-assigned this Mar 22, 2023
@shahzadlone shahzadlone force-pushed the lone/feat/execute-explain branch 2 times, most recently from d67024c to 8cd844e Compare March 23, 2023 22:05
@codecov

codecov bot commented Mar 23, 2023

Codecov Report

Merging #1188 (5d8eb56) into develop (a261c2f) will decrease coverage by 0.05%.
The diff coverage is 74.25%.

@@             Coverage Diff             @@
##           develop    #1188      +/-   ##
===========================================
- Coverage    70.17%   70.13%   -0.05%     
===========================================
  Files          184      184              
  Lines        17392    17700     +308     
===========================================
+ Hits         12205    12414     +209     
- Misses        4251     4340      +89     
- Partials       936      946      +10     
Impacted Files                           Coverage Δ
planner/errors.go                        0.00% <0.00%> (ø)
planner/planner.go                       76.84% <ø> (-0.55%) ⬇️
planner/top.go                           68.53% <0.00%> (-3.01%) ⬇️
request/graphql/schema/types/types.go    100.00% <ø> (ø)
planner/explain.go                       56.02% <53.33%> (-7.78%) ⬇️
planner/average.go                       84.61% <83.33%> (-0.84%) ⬇️
planner/type_join.go                     73.25% <84.00%> (+0.34%) ⬆️
planner/create.go                        67.70% <84.61%> (+2.23%) ⬆️
planner/order.go                         83.17% <84.61%> (+0.01%) ⬆️
planner/commit.go                        81.72% <85.00%> (+0.10%) ⬆️
... and 9 more

... and 6 files with indirect coverage changes

@shahzadlone shahzadlone requested a review from a team March 27, 2023 19:11
@AndrewSisley
Contributor

Wondering if we should have a verbose = false/true option to add ability to hide some results like the actual document results?
Any other datapoints that others would like to add?

IMO we can add that/those in later if we want, no need to complicate things straight off

planner/average.go (outdated, resolved)
planner/group.go (outdated, resolved)
@shahzadlone shahzadlone force-pushed the lone/feat/execute-explain branch 4 times, most recently from 02eaf3c to b5f8ed1 Compare March 29, 2023 13:29
Member

@jsimnz jsimnz left a comment

Going into this task I had a different structure in mind, but after looking at the concrete implementation, much like Andy had mentioned, there are some unknown issues lurking in other designs that would've been trying too hard to be smart and fancy about collecting runtime metrics.

I will say that the goal of doing #970 (metrics) first was to use them here, instead of just raw "counters" as you're doing now. You mentioned in another comment that you didn't want the otel stuff leaking around, but the goal of the metrics lib is to act as the abstracted interface between the underlying metrics provider/collector and the metric types (gauges, counters, histograms, etc). This would be a much more powerful primitive to use, as you could do more interesting collection of metrics beyond a simple increment.

However, going through the various metrics packages, there are some issues in trying to track metrics on a "per request" basis, rather than the traditional app-wide aggregates. So we will likely need to look further into that. There is likely some combination of tracing and metrics that will be ideal, as tracing is great for per-request tracking of info.

There's probably more to be said, but a lot of it is useless until we come up with a more long-term design that protects the prod code a bit more. But I recognize the difficulties here.

At the moment, I think this is a good enough solution, short of the noted todos and suggestions.

planner/explain.go (outdated, resolved)
planner/average.go (outdated, resolved)
Member

todo: The execution explain should technically also include the info from the simple explain, in addition to the runtime execution metrics.

So you wouldn't need to run them both separately. This can technically be left as is, and we can just run two, but it feels nicer to have Execute include both.

Member Author

@shahzadlone shahzadlone Mar 30, 2023

I thought about that and I was thinking to have that as an option either under the verbose: false to hide it if someone doesn't want to see the simple attributes, or a separate flag maybe like showSimpleAttributes: true to show/hide them. I did not want to group the somewhat "static" simple attributes with the execution datapoints. Whichever approach we take, will be outside the scope of this PR

planner/explain.go (outdated, resolved)
@shahzadlone shahzadlone force-pushed the lone/feat/execute-explain branch 4 times, most recently from 36af49c to ad1f99c Compare March 30, 2023 19:36
@shahzadlone shahzadlone marked this pull request as ready for review March 30, 2023 19:38
@shahzadlone shahzadlone requested review from a team, jsimnz and AndrewSisley March 30, 2023 23:24
Contributor

@AndrewSisley AndrewSisley left a comment

praise: I appreciated the clean commits! I found it made it easier to review.

suggestion: You do seem to be doing what I used to do when I first started using git like this though - some of the commits are too small for reviewers. For example, by splitting the new tests from the commit that introduces the feature, you create a disconnect that reduces the context that the reviewer has. It can make life easier for sure whilst developing, but in the future I would suggest squashing them before opening the PR so that any individual commit will contain both the production code changes and the tests for that change.

Overall the change looks good - my brain is getting a bit foggy though, so I'll have another look in the morning (I covered the core code, but mostly skimmed a bit over the node-specific metrics and the tests today).

request/graphql/schema/types/types.go (resolved)
@@ -100,6 +109,17 @@ func (n *averageNode) SetPlan(p planNode) { n.plan = p }

 // Explain method returns a map containing all attributes of this node that
 // are to be explained, subscribes / opts-in this node to be an explainablePlanNode.
-func (n *averageNode) Explain() (map[string]any, error) {
-	return map[string]any{}, nil
+func (n *averageNode) Explain(explainType request.ExplainType) (map[string]any, error) {
Contributor

suggestion: It feels a bit wasteful both in terms of execution, and maintenance to have Explain take explainType as a param, and then perform the same switch for each node. It might be nicer to instead have two (parameterless) functions on planNode ExplainSimple() and ExplainExecute() and just do the switch once in explain.go
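
The suggested shape could look roughly like the sketch below. This is only an illustration under stated assumptions: `ExplainType` here is a local stand-in for `request.ExplainType`, its string values and the `fieldName` / `iterations` attributes are invented for the example, and `explainNode` is a hypothetical name for the single dispatch point.

```go
package main

import "fmt"

// ExplainType is a local stand-in for request.ExplainType; the string
// values are assumptions made for this sketch.
type ExplainType string

const (
	SimpleExplain  ExplainType = "simple"
	ExecuteExplain ExplainType = "execute"
)

// explainablePlanNode exposes one parameterless method per explain type,
// as suggested in the review comment.
type explainablePlanNode interface {
	ExplainSimple() (map[string]any, error)
	ExplainExecute() (map[string]any, error)
}

// averageNode is a reduced stand-in; both fields are hypothetical.
type averageNode struct {
	fieldName  string // static attribute, known at plan time
	iterations uint64 // runtime datapoint, recorded during execution
}

func (n *averageNode) ExplainSimple() (map[string]any, error) {
	return map[string]any{"fieldName": n.fieldName}, nil
}

func (n *averageNode) ExplainExecute() (map[string]any, error) {
	return map[string]any{"iterations": n.iterations}, nil
}

// explainNode performs the switch once, so individual nodes never need to
// inspect the explain type themselves.
func explainNode(n explainablePlanNode, t ExplainType) (map[string]any, error) {
	switch t {
	case SimpleExplain:
		return n.ExplainSimple()
	case ExecuteExplain:
		return n.ExplainExecute()
	default:
		return nil, fmt.Errorf("unsupported explain type: %q", t)
	}
}

func main() {
	n := &averageNode{fieldName: "age", iterations: 4}
	for _, t := range []ExplainType{SimpleExplain, ExecuteExplain} {
		out, err := explainNode(n, t)
		fmt.Println(t, out, err)
	}
}
```

The trade-off is the one raised in the reply that follows: every explainable node gains a second method, in exchange for removing the per-node switch.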

Member Author

Is a good point, I think I went this way because I didn't want to add another function to this interface:

type explainablePlanNode interface {
	planNode
	Explain(explainType request.ExplainType) (map[string]any, error)
}

Which would then result in another public function on all the explainable nodes:

averageNode
countNode
createNode
dagScanNode
deleteNode
groupNode
limitNode
orderNode
scanNode
selectNode
selectTopNode
sumNode
topLevelNode
typeIndexJoin
updateNode

How strong of a preference do you have for this?

Contributor

Nothing too strong, as it doesn't really affect anything besides the explain code - I just think it would be slightly nicer. If you prefer it as is, for sure keep it as is - you'll almost certainly be working more than me with it :)

Member

Actually, the potential upside of making it 2 explicit methods instead of one is controlling which nodes actually need a "custom" explain implementation, and which can kinda just skate by with either no explain, or a basic wrappedExplain which just tracks some basic info that is common across all nodes (like # of invocations, results, etc.).
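
A basic wrapper of that kind might be sketched as a decorator around any node. The type name `wrappedExplain` comes from the comment above, but its fields, the reduced `planNode` interface, and `countingSource` are assumptions made for this example rather than DefraDB code.

```go
package main

import "fmt"

// planNode is a reduced, hypothetical version of a planner node interface.
type planNode interface {
	Next() (bool, error)
}

// wrappedExplain decorates any planNode and tracks datapoints that are
// common to all nodes, so the wrapped node needs no explain code itself.
type wrappedExplain struct {
	inner       planNode
	invocations uint64 // number of Next calls
	results     uint64 // number of Next calls that produced a result
}

func (w *wrappedExplain) Next() (bool, error) {
	w.invocations++
	ok, err := w.inner.Next()
	if err == nil && ok {
		w.results++
	}
	return ok, err
}

// ExplainExecute reports the generic datapoints gathered so far.
func (w *wrappedExplain) ExplainExecute() (map[string]any, error) {
	return map[string]any{
		"invocations": w.invocations,
		"results":     w.results,
	}, nil
}

// countingSource is a trivial node that yields a fixed number of results.
type countingSource struct{ remaining int }

func (c *countingSource) Next() (bool, error) {
	if c.remaining == 0 {
		return false, nil
	}
	c.remaining--
	return true, nil
}

func main() {
	w := &wrappedExplain{inner: &countingSource{remaining: 3}}
	for {
		ok, err := w.Next()
		if err != nil || !ok {
			break
		}
	}
	fmt.Println(w.ExplainExecute())
}
```

Nodes that need richer datapoints could still provide their own implementation, while everything else gets the generic counters for free.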

Member Author

All good points, will keep in mind to split these outside this PR when implementing debug or predict explain, keeping as is for now, unless someone has a really strong preference.

planner/explain.go (resolved)
EDIT: I had a merge conflict that was due to the name change of plan to
planNode. The extra line diff you see is due to that resolve.
 to the `planner/explain.go` file.
@shahzadlone shahzadlone requested review from jsimnz and a team April 1, 2023 13:26
Collaborator

@fredcarle fredcarle left a comment

I'm late to reviewing this PR as I had my own struggles this week with the document delete stuff but I just took the time to go through it and it's nice work Shahzad! Andy has already approved and John will certainly do so shortly so I don't feel like I need to add mine.

I like that the testing covers a wide range of possibilities and that it includes a more complex set of schemas. I also appreciate the effort to get this through. 🤘

@fredcarle fredcarle added the feature New feature or request label Apr 1, 2023
planner/group.go (outdated, resolved)
Member

@jsimnz jsimnz left a comment

Looks amazing! 2 minuscule suggestions. Approving!

Really great job on all the tests, and going through my various refactors 👍

planner/select.go (outdated, resolved)
@shahzadlone shahzadlone merged commit b9d2e32 into develop Apr 1, 2023
@shahzadlone shahzadlone deleted the lone/feat/execute-explain branch April 1, 2023 23:18
@islamaliev
Contributor

Tested by running all (similar) queries from the tests/integration/explain/default package and a few more, and made sure the results show meaningful values.

shahzadlone added a commit that referenced this pull request Apr 13, 2023
- Resolves #326 

- Description: Adds ability to return datapoints / information gathered at every planner step. The information is stored during execution, and gathered post execution.

- Usage: Add `@explain(type: execute) ` after the `query` or `mutation` operation.

- Execute explain request for `query` operation - example:
```
query @explain(type: execute) {
	Address(groupBy: [country]) {
		country
		_group {
			city
		}
	}
}
```

- Execute explain request for `mutation` operation - example:
```
mutation @explain(type: execute) {
	update_address(
		ids: ["bae-c8448e47-6cd1-571f-90bd-364acb80da7b"],
		data: "{\"country\": \"USA\"}"
	) {
		country
		city
	}
}
```
shahzadlone added a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Labels
action/no-benchmark Skips the action that runs the benchmark. area/parser Related to the parser components area/planner Related to the planner system area/query Related to the query component feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explain Request - Execution
6 participants