Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Internal doc restructure #471

Merged
merged 14 commits into from
Jun 27, 2022

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented May 26, 2022

RELEVANT ISSUE(S)

Resolves #448, #395, #390, #443, #427, #387, #484, #495

DESCRIPTION

Reworks the internal data structure used to house documents within the planner package. Solves a number of existing issues and allows for further code improvements. Automated tests should not have changed (minus a couple of tiny exceptions), and a couple have been added. More may be added as I start looking at the linked issues.

Key changes include:

  • Documents in the planner package are now []interface{}s, field location is now deterministic instead of map driven (inherently prevents the alias type bugs that drove me to do this now instead of in v0.4).
  • New mapper.Select struct and friends to replace parser.Select, this is currently constructed from parser.Select as it seems quicker to get it working that way first - the parser package could be removed later once the consuming code is more stable. This should contain all the info nodes will need, and allowed the removal of a lot of the complexity from the aggregate constuctors and hopefully type joins, and references to the ast stuff from the planner package.
  • RenderNode is no longer required and has been deleted.
  • The connor package has been pasted into our repo and has been adapted to work with the new doc structure.

Very strongly recommend reviewing commit by commit - they should be clean:

  • TEMP - enable logging caller will be dropped before merge
  • Alias internal doc object can be glossed over as it is overwritten by the restructure commit, but might be useful for context
  • Copy-paste connor fork into repo is a clean copy paste from our connor fork, no code changes were made, although a handful of build/CI related files were omitted
  • Restructure internal document data structure contains the bulk of the work and should be reviewed very carefully - it is a very large commit and pretty much all of it is 'manual' coding - please also keep an eye out for anything that looks half done

To do/confirm/discuss before merge:

  • Connor location and future. I've managed to strip away most of the changes to this package, but there are some, including a reference to []core.Doc as []core.Doc is a struct and is not handled by a switch-case []interface{}. I dont think we should change all inner collections to []*core.Doc (which would solve this), but you guys might have other opinions. UPDATE: decision made, connor will live in defra.
  • Most/all of connor's unit tests have been disabled for now, possibly depending on whether connor remains in the defra repo or not, we should decide to delete or preserve them. I am in favour of deletion, and if you guys want to keep them I would be in favour of doing that in a new issue. UPDATE: these will be deleted.
  • A handful of our unit tests have been disabled (func signatures changed). I see these as more of a liability that should be deleted, but am open to discussion here. No integration tests have been disabled (if you spot any please flag it). UPDATE: decision made, tests deleted.
  • I'll run the benchmarks locally and check to see if there any significant changes. Performance is not a focus here though, but it is a significant change and might have an impact. RESULT: No significant changes, results in another comment.
  • If connor is staying in the defra repo, cherry-pick the connor copy-paste commit into develop before merging this PR. Ensures proper credit is preserved regardless of how much it may then change.
  • Expand code-documentation for mapper-types (raised by Fred)
  • Document mapper 'entry' funcs
  • Revisit mapper 'entry' func names
  • Dont forget to remove the temp logging commit...
  • Post merge, review outline query spec and update if necessary
  • Post merge, create issue to abstract away core.Doc from external planNode interface refactor: Internal doc restructure #471 (comment)

Stuff I will be adding as I wait for reviewers (will merge as soon as properly approved, regardless of how many of these are done):

  • I'll go through any linked issues that will likely be solved by this and add tests. In some cases I might tweak the prod code if required and effort is minimal. EDIT: Issues added individually below - resolving this task early.
  • I would like to use this excuse to test out the coz profiler, will post results if I get round to this. UPDATE: I didnt get round to this
  • I would like to rework the join code. I think there can be significant improvements there making use of this stuff.
  • I would like to go through the parser package and drop any now-dead code. Is probably a fair bit in there that is not needed now, but I dont see this as a mandatory task pre-merge - feel very free to disagree with me here and ask that I do it before merging.
  • If connor is staying in the defra repo, delete the dead code and look at potential simplifications. UPDATE: This has been broken out of this PR and can be found here: refactor: Trim imported connor package  #530
  • Boost integration test coverage of filters - is really low. UPDATE: new issue for this Integration test coverage of filter conditions is really poor #511 - will do outside of this PR, but possibly before this PR is merged. UPDATE2: tests in PR - test: Restructure and expand filter tests #512

Related issues have been solved (commit per issue):

Super happy to have a call to walk through this if anyone feels the need. Is probably going to be a bit of a pain to review.

HOW HAS THIS BEEN TESTED?

Int. tests.

CHECKLIST:

  • I have commented the code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the repo-held documentation.
  • I have made sure that the PR title adheres to the conventional commit style (subset of the ones we use can be found under: tools/configs/chglog/config.yml

ENVIRONMENT / OS THIS WAS TESTED ON?

Please specify which of the following was this tested on (remove or add your own):

  • Debian Linux

@AndrewSisley AndrewSisley added bug Something isn't working feature New feature or request area/query Related to the query component area/parser Related to the parser components refactor This issue specific to or requires *notable* refactoring of existing codebases and components action/no-benchmark Skips the action that runs the benchmark. labels May 26, 2022
@AndrewSisley AndrewSisley added this to the DefraDB v0.3 milestone May 26, 2022
@AndrewSisley AndrewSisley self-assigned this May 26, 2022
@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I448-doc-restructure branch 15 times, most recently from 7f24e1e to b69bf71 Compare May 27, 2022 16:06
core/doc.go Outdated Show resolved Hide resolved
core/doc.go Show resolved Hide resolved
@AndrewSisley
Copy link
Contributor Author

This cost is like max 10-15 mins more in my opinion on the odd occasion the plan graph is changed significantly. If the plan graph is changed with a small change (example: removal of renderNode) then the maintenance is a couple of seconds that would be taken to do that small change (remove that node for example, just 2 lines per test). On top of that #521 will further lessen the pain.

I disagree, and I think we are going in circles. Will leave these two deleted unless really really pushed back on.

@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I448-doc-restructure branch 2 times, most recently from dac7e5d to 09b95b0 Compare June 27, 2022 19:55
@AndrewSisley AndrewSisley requested a review from jsimnz June 27, 2022 20:01
Are hard to maintain with the coming changes, and prod code should be covered by integration tests
Maintaining them is not worth the effort with connor now being covered by our integration tests, and the upcoming changes.
Work could be taken futher by creating a new mapper.relation struct, but would suggest that is out of scope here. Also removes an explain parameter as it is not useful, duplicating info already highly visible in the result whilst using an otherwise dead property.
Looks like case A in the issue was already tested, and the new test passes in develop - so I guess this was just a coverage issue, unless something else fixed it before I448.
Checked test against develop, it failed.  Was a bug and it is now fixed.
Turned out the bug wasnt really related to this PR (unless there was another bug hiding this one maybe), but I fixed it anyway.
@AndrewSisley AndrewSisley force-pushed the sisley/refactor/I448-doc-restructure branch from ebb45a0 to 3c7f82a Compare June 27, 2022 21:23
@AndrewSisley AndrewSisley merged commit 10f5fb0 into develop Jun 27, 2022
@AndrewSisley AndrewSisley deleted the sisley/refactor/I448-doc-restructure branch June 27, 2022 21:34
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
* Alias internal doc object

* Reduce Select mutation

* Remove unused return param (arb. join)

* Remove extra funcs from MultiNode

* Remove deprecated unit tests

Are hard to maintain with the coming changes, and prod code should be covered by integration tests

* Remove connor unit tests

Maintaining them is not worth the effort with connor now being covered by our integration tests, and the upcoming changes.

* Restructure internal document data structure

* Rework type join to make use of new select

Work could be taken futher by creating a new mapper.relation struct, but would suggest that is out of scope here. Also removes an explain parameter as it is not useful, duplicating info already highly visible in the result whilst using an otherwise dead property.

* Remove dead code from parser package

* I443: Expand deep group filter tests

Looks like case A in the issue was already tested, and the new test passes in develop - so I guess this was just a coverage issue, unless something else fixed it before I448.

* I387: Handle multiple aggregate sources

* I390: Handle multiple aliased joins with filters

Checked test against develop, it failed.  Was a bug and it is now fixed.

* I484: Handle filters on unrendered children

* I495: Handle version select within create

Turned out the bug wasnt really related to this PR (unless there was another bug hiding this one maybe), but I fixed it anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/parser Related to the parser components area/query Related to the query component bug Something isn't working feature New feature or request refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Projects
None yet
4 participants