Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvstore: simplified stmt exec #467

Merged
merged 3 commits into from
Jan 28, 2022
Merged

kvstore: simplified stmt exec #467

merged 3 commits into from
Jan 28, 2022

Conversation

arnetheduck
Copy link
Member

iterator for exec - avoid callbacks etc for simplified row reading

iterator for `exec` - avoid callbacks etc for simplified row reading
@zah
Copy link
Contributor

zah commented Jan 26, 2022

4 yield statements inside an iterator would be quite heavy when it comes to code bloat. In the generated code, each yield statement is replaced by the full body of the consuming loop and the iterator is inlined everywhere where it's used.

@arnetheduck
Copy link
Member Author

3 (one is when), but ok - I think I'll note it in a TODO though - the whole machinery for exec is quite unergonomic because it tries to be too clever with the tuple-vs-non-tuple types etc, ie I lost half a day trying to fight it - using the raw sqlite interface would have been a lot easier - I think this code needs a round of simplifications in general, and then the iterator can be improved as well

@arnetheduck
Copy link
Member Author

down to 2 with reasonable code: 2079341

@zah
Copy link
Contributor

zah commented Jan 27, 2022

If I understand correctly, the benefits of providing this API are rather superficial (some people will like the syntax better), but there are objective technical downsides (code bloat due to inlining).

Were there any problems with the fact that the existing API is based on closures (e.g. were you running into limitations regarding captured variables)?

@arnetheduck
Copy link
Member Author

arnetheduck commented Jan 27, 2022

Were there any problems with the fact that the existing API is based on closures (e.g. were you running into limitations regarding captured variables)?

yes, var parameters need ugly workarounds which means you can't update an existing instance which leads to a host of downstream problems - nim introduces an uncontrolled amount of temporaries along the way which becomes significant when dealing with lots of data - we use pointers in some places, but that's ugly and the compiler is too dumb to understand that a closure is local and therefore really should inline itself (like a function operator in c++ / not create an environment.

regarding bloat, I have sympathy for not introducing unnecessary bloat, but I have more sympathy for readability in general - I think some of your fears about bloat miss the bigger picture a bt: if we're going to combat bloat, we need to look at the real costs, and as we've discussed with actual numbers before, macros (async) and generics (ie Table, serialization) are the actual real sources of bloat - not a yield here and there used in limited contexts - but we use them anyway because of the readability and reasonability benefits they bring.

iterators are really really sad already in nim - if we take away the ability to yield more than once, we might as well remove them entirely from our codebase because at that point, the mental overhead of keeping them in short-term active language feature memory is way too large, and we should advocate for their complete removal from Nim.

finally, compilers are really good at removing bloat through CSE in cases like this when there there is a boolean choice between the branches (works / doesn't work) - the actual bloat in the final binary will be nonexistent.

@mratsim
Copy link
Contributor

mratsim commented Jan 28, 2022

yes, var parameters need ugly workarounds which means you can't update an existing instance which leads to a host of downstream problems - nim introduces an uncontrolled amount of temporaries along the way which becomes significant when dealing with lots of data - we use pointers in some places, but that's ugly and the compiler is too dumb to understand that a closure is local and therefore really should inline itself (like a function operator in c++ / not create an environment.

I've mentioned to @Araq a couple of times that having something akin to "Heap Allocation Elision" would be quite important, both for async with a future that doesn't escape its scope (and multithreading with Flowvar) and for chained closure iterators:

I can open a RFC.

See: https://discord.com/channels/371759389889003530/371759389889003532/809802135629856808

I don't mind heap elision optimizations and we have the technology ready for it (it's not that different from escape analysis, deep immutability or anything we already do)

@arnetheduck
Copy link
Member Author

I've mentioned to @Araq a couple of times that having something akin to "Heap Allocation Elision" would be quite important, both for async with a future that doesn't escape its scope (and multithreading with Flowvar) and for chained closure iterators:

this is not quite relevant here: the reason to pass in a var is not necessarily to avoid allocations or to avoid closures, but because desired semantics dictate that we modify an existing instance, and doing that with captures is not .. safe / possible / bla bla..

the heap allocation is of less importance generally, even though avoiding it would be nice (for orthogonal reasons)

@zah zah merged commit ce4acc1 into master Jan 28, 2022
@zah zah deleted the exec-iterator branch January 28, 2022 13:23
@zah
Copy link
Contributor

zah commented Jan 28, 2022

When you author code, there are many layers of knowledge that you take into account (algorithmic, hardware, compilation theory, etc). Claiming that some of this knowledge is too much overhead, while other parts are acceptable is unreasonable IMO.

Iterators should be used whenever this makes sense. If there are simple ways to optimize them, they should be exploited. If you weighting the two sides on a decision like "should I use a function or an iterator", the size of the body of the iterator and the number of yield statements should just add weight towards selecting the function approach. Technically, it's possible to implement the iterators with multiple code generation strategies in the compiler (e.g. you can pass the body as a closure), so perhaps the code bloat concerns can be addressed in a systematic way in the future without requiring much rework of the code.

@arnetheduck
Copy link
Member Author

Claiming that some of this knowledge is too much overhead, while other parts are acceptable is unreasonable IMO.

price vs performance - if a feature has so many caveats that it becomes an exception that you manage to use it correctly, it's the feature that is wrong, not the developer.

@arnetheduck
Copy link
Member Author

so perhaps the code bloat concerns can be addressed in a systematic way in the future without requiring much rework of the code.

yes, this is what I would hope for, and as long as there's no measurable downside, it's good to write code as naturally as possible so as to give the right priority signals to the compiler devs: maybe it's an imaginary downside or maybe it's really prevalent and important, and sometimes, you just don't know until you try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants