-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Expand enumerable types (queue, socket) #7
Conversation
This will be used by lens within Defra.
Will be used by lens within defra
Will be used by lens-in-defra
bf8492a
to
6063ec4
Compare
func (s *enumerableConcat[T]) Next() (bool, error) { | ||
startSourceIndex := s.currentSourceIndex | ||
hasLooped := false | ||
|
||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I messed up this logic briefly and there is a strong case for adding concat
tests within this PR I think as this type is now a bit more complicated. It is tested by defra however. Let me know, otherwise I will just see how I feel once approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have it tested within the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this Next
logic is a lot more complicated then it needs to be, and a few potential oddities, holding off specifics as you mentioned it was briefly messed up, is the current version the corrected version.
To make sure we're on the same page on the intended behavior of Concatenation
types, can you add a docstring to the interface describing the behavior of the new one. I think its straight forward, but just want to confirm, and make sure others that come across this also know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this Next logic is a lot more complicated then it needs to be
The complication involved is to preserve the order of items, so that Next does not bounce in and out of sources. We want to yield items from source A, then B, then C, then check if A has new items.
To make sure we're on the same page on the intended behavior of Concatenation types, can you add a docstring to the interface describing the behavior of the new one.
I'm not sure I understand you here. There is only one concrete Concatenation type, and one interface, and they are both doucmented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests have been added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few suggestions related to return types and a question.
func Concat[T any](sources ...Enumerable[T]) Enumerable[T] { | ||
// | ||
// New sources may be added after iteration has begun. | ||
func Concat[T any](sources ...Enumerable[T]) Concatenation[T] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Not sure if I mentioned this before but we should return concrete type when possible. This way type methods don't need to be strictly tied to the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concrete types are all private, and the new functions public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make the concrete types public then. I really think that unless an interface method needs an interface return type that we try to avoid returning interface types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make the concrete types public then. I really think that unless an interface method needs an interface return type that we try to avoid returning interface types
I am unsure as to how to express just how much I really do not want to make the concrete types public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? You don't have to make the fields public. It's much better than returning an interface type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. This are internal types, and I highly value the ability to rip and replace them, or allow for more complex constructors which may not return a single concrete type without having to worry about people directly referencing them.
These are not run of the mill business-logic types, but super generic data structures, and I am somewhat sceptical of anyone's ability to convince me to expose them in the short-medium term. I think that is very bad thing to do.
There is absolutely no reason for anyone to need to directly interact with the concrete types in this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the interfaces (granted I'm a little interface happy compared to average Go dev 😉)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. This are internal types, and I highly value the ability to rip and replace them, or allow for more complex constructors which may not return a single concrete type without having to worry about people directly referencing them.
Returning a concrete type makes zero difference for this statement. You can still rip and replace without worrying about people directly referencing them.
There is absolutely no reason for anyone to need to directly interact with the concrete types in this package.
Again, returning a concrete type makes zero difference for this statement. All the methods relate to the concrete type and if there are no public fields the liming factor is the same.
The drawback with returning an interface is that if you want to add functionality to the type and still be able to use the constructor, you have to also change the interface which one might not want to do. It's much more flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the interfaces (granted I'm a little interface happy compared to average Go dev 😉)
The number of times I've hit walls because of us using interfaces for return types 😤... lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NewQueue creates an empty FIFO queue. | ||
// | ||
// It is implemented using a dynamically sized ring-buffer. | ||
func NewQueue[T any]() Queue[T] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Same as above. Return concrete type instead of interface.
// If enumeration begins before a source has been set it will behave as if empty. | ||
// Reseting the Socket will reset the source if there is one, and then remove it | ||
// as the source of this Socket. | ||
func NewSocket[T any]() Socket[T] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Same as above. Return concrete type.
|
||
// Socket is an extention of the enumerable interface allowing the source | ||
// to be replaced after initial construction. | ||
type Socket[T any] interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Can you explain why you decided to name this type Socket
? At the moment it doesn't jibe with my current definition of a socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A socket is a fixture into which other stuff may slot into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it confusing as it leads me to believe that it relates to network sockets. Don't you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
network sockets took the name from the same non-computing concept and I see them as being conceptually quite similar, only one is remote/cross-process, and this is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some questions on some of the design choices, no immedage "todos" at the moment, depending on the conversations, there may be followup suggestions/todos.
Marking as "Request Changes" for now just to prevent merge (sorry).
func Concat[T any](sources ...Enumerable[T]) Concatenation[T] { | ||
return &enumerableConcat[T]{ | ||
sources: sources, | ||
currentSourceIndex: 0, | ||
} | ||
} | ||
|
||
// Append appends a new source to this concatenation. | ||
// | ||
// This may be done after enumeration has begun. | ||
func (s *enumerableConcat[T]) Append(newSource Enumerable[T]) { | ||
s.sources = append(s.sources, newSource) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Little confused on the balance of having both Append
and Concat
. Where Concat
will take a list of sources, append them together, and produce a Concatentation[T]
type, which also allows additional sources to be appeneded together via Append
method.
Would it not be better to have a single Concat
or Append
top level function with no methods? This would produce a pattern similar to the existing native append
in go, where you always use the append, and the first parameter of the append is the target slice, and the rest are elements to be added. Should be easy enough to define a
EnumerableSlice
type for example, which still implements the Enumerable interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be better to have a single Concat or Append top level function with no methods?
I am not sure I understand what you mean here. Concat
is the constructor, Append
allows adding new source in to an existing item.
Append needs to live on the concat type, as only a concat type can handle this, and the stuff required for implementation are internal/private.
func (s *enumerableConcat[T]) Next() (bool, error) { | ||
startSourceIndex := s.currentSourceIndex | ||
hasLooped := false | ||
|
||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this Next
logic is a lot more complicated then it needs to be, and a few potential oddities, holding off specifics as you mentioned it was briefly messed up, is the current version the corrected version.
To make sure we're on the same page on the intended behavior of Concatenation
types, can you add a docstring to the interface describing the behavior of the new one. I think its straight forward, but just want to confirm, and make sure others that come across this also know.
|
||
// NewQueue creates an empty FIFO queue. | ||
// | ||
// It is implemented using a dynamically sized ring-buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why does a FIFO queue need to be implemented as a ring-buffer? Does this imply you can lose elements in the queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a dynamically sized ring buffer, no items can be lost. It is just a more efficienct way of implementing a queue.
enumerable/queue.go
Outdated
// For now, increasing the size one at a time is likely optimal | ||
// for the only useage of the queue type. We may wish to change | ||
// this at somepoint however. | ||
newValues := make([]T, len(q.values)+1) | ||
copy(newValues, q.values) | ||
q.values = newValues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this is a preferred method for growing the size of the values
slice. Why not just append
it? This current method is extremely ineffecient, as it allocates and copys the whole slices every time (short of the ring buffer loop back, but I have a seperate comment above regarding the ring-buffer design).
The native append
is pretty optimized to avoid unnecessary allocations/copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The +1 is temporary-ish and is likely to change. copy
is much better to use than append
when adding more than 1 item at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy most certainly is not better when adding more than 1 item. append can append as many items as you want, and will determine the ideal allocations/copies when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are mistaken, check the benches in this blog post: https://eremeev.ca/posts/golang-copy-vs-append/
values []T | ||
currentIndex int | ||
lastSetIndex int | ||
zeroIndexSet bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: This seems to be a config/behavior flag, can you document it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, it is internal stuff required for managing internal state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add, although I'm not sure much can be added to the declarations themselves.
- doc private internal state props
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Great tests 👍
|
||
func (s *socket[T]) Value() (T, error) { | ||
if !s.source.HasValue() { | ||
var v T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Slight preference for var zero T
when using the default/zero value in a generic function like this. (SMALL PREFERENCE, feel free to ignore :) )
|
||
// Socket is an extention of the enumerable interface allowing the source | ||
// to be replaced after initial construction. | ||
type Socket[T any] interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Can you describe the usecase for this particular type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see this being used here: https://github.com/sourcenetwork/defradb/pull/1564/files#diff-e7f1317923231a3017c3a580d21ce39346555446638d045a69310b1a1aeab15d in the MigrateUp function. It allows enumerables to be defined before their source, and for sources to be swapped in and out.
639847a
to
599e1c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general looks good, but there are some major concerns related to performance.
@@ -1,5 +1,15 @@ | |||
package enumerable | |||
|
|||
// Concatenation is an extention of the enumerable interface allowing new sources | |||
// to be added after initial construction. | |||
type Concatenation[T any] interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I think the name does not quite reflect it's purpose.
If it's Enumerable
that can be expanded why not to call it ExpendableEnumerable
or DynamicEnumerable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expanded
or Dynamic
is way too vague for my liking as it can mean a whole bunch of stuff. And concat/concatenation is a fairly common term for this behaviour.
func (s *enumerableConcat[T]) Next() (bool, error) { | ||
startSourceIndex := s.currentSourceIndex | ||
hasLooped := false | ||
|
||
for { | ||
if s.currentSourceIndex >= len(s.sources) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I think it's worth mentioning that this scenario is only possible if Next
is called on an exhausted enumerable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, will add
- doc if case
|
||
hasNext, err = concat.Next() | ||
require.NoError(t, err) | ||
require.False(t, hasNext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: these repetitive checks can be put in a loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of dislike that. We are testing a small number of specific items, and in some cases, they are from different sources. I see a loop as a complication and abstraction that detracts from the readability of the test.
I also thought about refactoring it out into a private test func, but that also masks the Act and Assert part of the test and I much prefer them to be highly visible.
// | ||
// This may include empty space where yield items previously resided. | ||
// Useful for testing and debugging. | ||
Size() int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why don't we have Size
for Enumerable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would mean complicating the core interface with a function that is not used in production code. And implementing it on every type, regardless as to whether it makes any sense on that type (I dont think it does for a lot of them, many just have a source
enumerable and a predicate/int/etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you just stated Andy is one of the reasons why returning the concrete type is better practice in Go. If you want a Size
when it make sense you don't need to modify the interface for it :)
enumerable/queue.go
Outdated
// For now, increasing the size one at a time is likely optimal | ||
// for the only useage of the queue type. We may wish to change | ||
// this at somepoint however. | ||
const growthRate int = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: this is not a rate. Rate should be a floating number. The better name would be growthSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, and I felt a little uncomfortable calling it rate initially but failed to come up with an alternative at the time. Size is much more accurate, thanks.
- Rename const
enumerable/queue.go
Outdated
} | ||
|
||
if index >= len(q.values) { | ||
newValues := make([]T, len(q.values)+growthRate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: why to increment capacity by 1?
This is extremely inefficient for any reasonably sized data structure.
I'd say we need to go with 1.5 ratio. Can't think of any disadvantages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is documented on the const.
type queue[T any] struct { | ||
values []T | ||
currentIndex int | ||
lastSetIndex int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: wouldn't it be more intuitive to call these two readIndex
and writeIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think read/write is less accurate though, especially given that the two props actually differ in terms of temporality: currentIndex
is the last index produced by a call to Next
, that may or may not be out of bounds, and may or may not have been read via Value
. lastSetIndex
has been set, was valid, and is essentially a record of the past.
Writing all that out though, I may have missed something whilst testing.
- Test post-loop Next=>Put=>Value - I'm pretty sure it is broken, and that a space between these two integers should always been maintained to prevent the premature overwrite of
Value
} else if index == q.currentIndex+growthRate { | ||
// If the write index has caught up to the read index | ||
// the new value needs to be written between the two | ||
// e.g: [3,4,here,1,2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: why not rearrange the elements?
I see now the reason why you grow the size by 1: it's easier to handle this scenario.
I would say when this happens we can rearrange from:
[3,4,1,2]
^
|
read and write index
to this:
[1,2,3,4,_,_,_]
^ ^
| |
| write index
read index
The queue is not thread-safe anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an interesting thought. Will look at this some more. I do worry that this might make an future threadsafetiness harder/more-expensive to achieve, but maybe not really anymore than allowing the buffer to grow already does.
What benefit do you see in this though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though this was the main reason why you chose (for now) to grow the array by 1. So that it's easier to manage indexes. Otherwise your read index will have to keep in mind that there are empty slots for future writing that it will have to jump over.
[1,2,_,_,_,3,4]
^ ^ ^
| | next read index
| write index
read index
But with the rearrangement you can grow it as much as you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah no, the +1 is because I only want it to grow by one. The problem you just described doesnt exist, as read never crosses the gap (it is a queue, if an item is read it is consumed and never gets read again - if the read index reaches the gap it means the queue is currently empty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I actually tried using this setup last week to help solve an edge case. It did not simplify the code unfortunately.
"github.com/stretchr/testify/require" | ||
) | ||
|
||
func TestConcatYieldsNothingGivenEmpty(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: may try the agreed upon test name structure: <category>_<condition>_<result>
?
This example would be TestConcat_IfEmpty_YieldNothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had already added tests before that conversation took place, and I'd prefer not to mix and match within a PR, and I'd also rather not rename everything. The agreement was more of a 'lets try this for a bit' agreement, not a 'lets convert the entire codebase right now' to this agreement.
enumerable/queue_test.go
Outdated
require.True(t, hasNext) | ||
|
||
r2, err := queue.Value() | ||
// [4, 5, 6, , 3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: helpful visualisation
|
||
// For now, increasing the size one at a time is likely optimal | ||
// for the only useage of the queue type. We may wish to change | ||
// this at somepoint however. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I can't see why we would want to change it in the future and not now. (original comment #7 (comment))
Would be nice to hear some benefits of growing the size by 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because +1 fits the current use case very well, but in the future that may change as both the scope of Lens-in-Defra grows, and/or if other areas wish to use this type.
Please remember that Lens in Defra is currently as simple as it will get, and it will very likely continue to grow in complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a valid argument when developing an independent component, which the whole enumerable
is.
We should keep in mind ideally all or at least major use cases, and not only specific ones.
What if we decide (which is highly likely) to use the queue in some other component and stuff it with thousands of items?
The cost of making it more future-proof is low, so I still don't understand why we don't do it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you Islam and I'm not a fan of using copy in this situation either but for the sake of moving this PR ahead, I think we can leave it as is and if we end up using it elsewhere, we can make the change at that time.
599e1c8
to
bda585e
Compare
bda585e
to
ca35901
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Resolves #6
Expands enumerable types, adding a ring-buffer queue, and a socket, as well as allowing the Concat enumerable to receive new sources after enumeration has begun.
These are to be used by the lens-in-defra code.
You can see how they are to be used in the WIP draft here: sourcenetwork/defradb#1564 - the code in
Next
function inlens/lens.go
is the most relevant, and whilst still fairly unpolished it should be fairly stable.