Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expand enumerable types (queue, socket) #7

Merged
merged 8 commits into from
Jun 30, 2023

Conversation

AndrewSisley
Copy link
Collaborator

@AndrewSisley AndrewSisley commented Jun 9, 2023

Resolves #6

Expands enumerable types, adding a ring-buffer queue, and a socket, as well as allowing the Concat enumerable to receive new sources after enumeration has begun.

These are to be used by the lens-in-defra code.

You can see how they are to be used in the WIP draft here: sourcenetwork/defradb#1564 - the code in Next function in lens/lens.go is the most relevant, and whilst still fairly unpolished it should be fairly stable.

This will be used by lens within Defra.
Will be used by lens within defra
Will be used by lens-in-defra
func (s *enumerableConcat[T]) Next() (bool, error) {
startSourceIndex := s.currentSourceIndex
hasLooped := false

for {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I messed up this logic briefly and there is a strong case for adding concat tests within this PR I think as this type is now a bit more complicated. It is tested by defra however. Let me know, otherwise I will just see how I feel once approved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have it tested within the repo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this Next logic is a lot more complicated then it needs to be, and a few potential oddities, holding off specifics as you mentioned it was briefly messed up, is the current version the corrected version.

To make sure we're on the same page on the intended behavior of Concatenation types, can you add a docstring to the interface describing the behavior of the new one. I think its straight forward, but just want to confirm, and make sure others that come across this also know.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this Next logic is a lot more complicated then it needs to be

The complication involved is to preserve the order of items, so that Next does not bounce in and out of sources. We want to yield items from source A, then B, then C, then check if A has new items.

To make sure we're on the same page on the intended behavior of Concatenation types, can you add a docstring to the interface describing the behavior of the new one.

I'm not sure I understand you here. There is only one concrete Concatenation type, and one interface, and they are both doucmented.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests have been added

Copy link
Contributor

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few suggestions related to return types and a question.

func Concat[T any](sources ...Enumerable[T]) Enumerable[T] {
//
// New sources may be added after iteration has begun.
func Concat[T any](sources ...Enumerable[T]) Concatenation[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Not sure if I mentioned this before but we should return concrete type when possible. This way type methods don't need to be strictly tied to the interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concrete types are all private, and the new functions public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make the concrete types public then. I really think that unless an interface method needs an interface return type that we try to avoid returning interface types

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make the concrete types public then. I really think that unless an interface method needs an interface return type that we try to avoid returning interface types

I am unsure as to how to express just how much I really do not want to make the concrete types public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? You don't have to make the fields public. It's much better than returning an interface type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This are internal types, and I highly value the ability to rip and replace them, or allow for more complex constructors which may not return a single concrete type without having to worry about people directly referencing them.

These are not run of the mill business-logic types, but super generic data structures, and I am somewhat sceptical of anyone's ability to convince me to expose them in the short-medium term. I think that is very bad thing to do.

There is absolutely no reason for anyone to need to directly interact with the concrete types in this package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the interfaces (granted I'm a little interface happy compared to average Go dev 😉)

Copy link
Contributor

@fredcarle fredcarle Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This are internal types, and I highly value the ability to rip and replace them, or allow for more complex constructors which may not return a single concrete type without having to worry about people directly referencing them.

Returning a concrete type makes zero difference for this statement. You can still rip and replace without worrying about people directly referencing them.

There is absolutely no reason for anyone to need to directly interact with the concrete types in this package.

Again, returning a concrete type makes zero difference for this statement. All the methods relate to the concrete type and if there are no public fields the liming factor is the same.

The drawback with returning an interface is that if you want to add functionality to the type and still be able to use the constructor, you have to also change the interface which one might not want to do. It's much more flexible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the interfaces (granted I'm a little interface happy compared to average Go dev 😉)

The number of times I've hit walls because of us using interfaces for return types 😤... lol

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// NewQueue creates an empty FIFO queue.
//
// It is implemented using a dynamically sized ring-buffer.
func NewQueue[T any]() Queue[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Same as above. Return concrete type instead of interface.

// If enumeration begins before a source has been set it will behave as if empty.
// Reseting the Socket will reset the source if there is one, and then remove it
// as the source of this Socket.
func NewSocket[T any]() Socket[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Same as above. Return concrete type.


// Socket is an extention of the enumerable interface allowing the source
// to be replaced after initial construction.
type Socket[T any] interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can you explain why you decided to name this type Socket? At the moment it doesn't jibe with my current definition of a socket.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A socket is a fixture into which other stuff may slot into.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it confusing as it leads me to believe that it relates to network sockets. Don't you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

network sockets took the name from the same non-computing concept and I see them as being conceptually quite similar, only one is remote/cross-process, and this is not.

Copy link
Member

@jsimnz jsimnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some questions on some of the design choices, no immedage "todos" at the moment, depending on the conversations, there may be followup suggestions/todos.

Marking as "Request Changes" for now just to prevent merge (sorry).

Comment on lines +23 to +35
func Concat[T any](sources ...Enumerable[T]) Concatenation[T] {
return &enumerableConcat[T]{
sources: sources,
currentSourceIndex: 0,
}
}

// Append appends a new source to this concatenation.
//
// This may be done after enumeration has begun.
func (s *enumerableConcat[T]) Append(newSource Enumerable[T]) {
s.sources = append(s.sources, newSource)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Little confused on the balance of having both Append and Concat. Where Concat will take a list of sources, append them together, and produce a Concatentation[T] type, which also allows additional sources to be appeneded together via Append method.

Would it not be better to have a single Concat or Append top level function with no methods? This would produce a pattern similar to the existing native append in go, where you always use the append, and the first parameter of the append is the target slice, and the rest are elements to be added. Should be easy enough to define a

EnumerableSlice type for example, which still implements the Enumerable interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be better to have a single Concat or Append top level function with no methods?

I am not sure I understand what you mean here. Concat is the constructor, Append allows adding new source in to an existing item.

Append needs to live on the concat type, as only a concat type can handle this, and the stuff required for implementation are internal/private.

func (s *enumerableConcat[T]) Next() (bool, error) {
startSourceIndex := s.currentSourceIndex
hasLooped := false

for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this Next logic is a lot more complicated then it needs to be, and a few potential oddities, holding off specifics as you mentioned it was briefly messed up, is the current version the corrected version.

To make sure we're on the same page on the intended behavior of Concatenation types, can you add a docstring to the interface describing the behavior of the new one. I think its straight forward, but just want to confirm, and make sure others that come across this also know.


// NewQueue creates an empty FIFO queue.
//
// It is implemented using a dynamically sized ring-buffer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why does a FIFO queue need to be implemented as a ring-buffer? Does this imply you can lose elements in the queue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a dynamically sized ring buffer, no items can be lost. It is just a more efficienct way of implementing a queue.

Comment on lines 51 to 56
// For now, increasing the size one at a time is likely optimal
// for the only useage of the queue type. We may wish to change
// this at somepoint however.
newValues := make([]T, len(q.values)+1)
copy(newValues, q.values)
q.values = newValues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is a preferred method for growing the size of the values slice. Why not just append it? This current method is extremely ineffecient, as it allocates and copys the whole slices every time (short of the ring buffer loop back, but I have a seperate comment above regarding the ring-buffer design).

The native append is pretty optimized to avoid unnecessary allocations/copies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The +1 is temporary-ish and is likely to change. copy is much better to use than append when adding more than 1 item at a time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy most certainly is not better when adding more than 1 item. append can append as many items as you want, and will determine the ideal allocations/copies when needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are mistaken, check the benches in this blog post: https://eremeev.ca/posts/golang-copy-vs-append/

values []T
currentIndex int
lastSetIndex int
zeroIndexSet bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: This seems to be a config/behavior flag, can you document it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not, it is internal stuff required for managing internal state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation would be helpful

Copy link
Collaborator Author

@AndrewSisley AndrewSisley Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add, although I'm not sure much can be added to the declarations themselves.

  • doc private internal state props

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Great tests 👍


func (s *socket[T]) Value() (T, error) {
if !s.source.HasValue() {
var v T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Slight preference for var zero T when using the default/zero value in a generic function like this. (SMALL PREFERENCE, feel free to ignore :) )


// Socket is an extention of the enumerable interface allowing the source
// to be replaced after initial construction.
type Socket[T any] interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can you describe the usecase for this particular type?

Copy link
Collaborator Author

@AndrewSisley AndrewSisley Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see this being used here: https://github.com/sourcenetwork/defradb/pull/1564/files#diff-e7f1317923231a3017c3a580d21ce39346555446638d045a69310b1a1aeab15d in the MigrateUp function. It allows enumerables to be defined before their source, and for sources to be swapped in and out.

@AndrewSisley AndrewSisley force-pushed the sisley/6-concat-fanciness branch 4 times, most recently from 639847a to 599e1c8 Compare June 20, 2023 16:53
Copy link
Contributor

@islamaliev islamaliev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general looks good, but there are some major concerns related to performance.

@@ -1,5 +1,15 @@
package enumerable

// Concatenation is an extention of the enumerable interface allowing new sources
// to be added after initial construction.
type Concatenation[T any] interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think the name does not quite reflect it's purpose.
If it's Enumerable that can be expanded why not to call it ExpendableEnumerable or DynamicEnumerable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expanded or Dynamic is way too vague for my liking as it can mean a whole bunch of stuff. And concat/concatenation is a fairly common term for this behaviour.

func (s *enumerableConcat[T]) Next() (bool, error) {
startSourceIndex := s.currentSourceIndex
hasLooped := false

for {
if s.currentSourceIndex >= len(s.sources) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think it's worth mentioning that this scenario is only possible if Next is called on an exhausted enumerable.

Copy link
Collaborator Author

@AndrewSisley AndrewSisley Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem, will add

  • doc if case


hasNext, err = concat.Next()
require.NoError(t, err)
require.False(t, hasNext)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: these repetitive checks can be put in a loop

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of dislike that. We are testing a small number of specific items, and in some cases, they are from different sources. I see a loop as a complication and abstraction that detracts from the readability of the test.

I also thought about refactoring it out into a private test func, but that also masks the Act and Assert part of the test and I much prefer them to be highly visible.

//
// This may include empty space where yield items previously resided.
// Useful for testing and debugging.
Size() int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why don't we have Size for Enumerable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would mean complicating the core interface with a function that is not used in production code. And implementing it on every type, regardless as to whether it makes any sense on that type (I dont think it does for a lot of them, many just have a source enumerable and a predicate/int/etc).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you just stated Andy is one of the reasons why returning the concrete type is better practice in Go. If you want a Size when it make sense you don't need to modify the interface for it :)

// For now, increasing the size one at a time is likely optimal
// for the only useage of the queue type. We may wish to change
// this at somepoint however.
const growthRate int = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: this is not a rate. Rate should be a floating number. The better name would be growthSize

Copy link
Collaborator Author

@AndrewSisley AndrewSisley Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and I felt a little uncomfortable calling it rate initially but failed to come up with an alternative at the time. Size is much more accurate, thanks.

  • Rename const

}

if index >= len(q.values) {
newValues := make([]T, len(q.values)+growthRate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: why to increment capacity by 1?

This is extremely inefficient for any reasonably sized data structure.
I'd say we need to go with 1.5 ratio. Can't think of any disadvantages

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is documented on the const.

type queue[T any] struct {
values []T
currentIndex int
lastSetIndex int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: wouldn't it be more intuitive to call these two readIndex and writeIndex?

Copy link
Collaborator Author

@AndrewSisley AndrewSisley Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read/write is less accurate though, especially given that the two props actually differ in terms of temporality: currentIndex is the last index produced by a call to Next, that may or may not be out of bounds, and may or may not have been read via Value. lastSetIndex has been set, was valid, and is essentially a record of the past.

Writing all that out though, I may have missed something whilst testing.

  • Test post-loop Next=>Put=>Value - I'm pretty sure it is broken, and that a space between these two integers should always been maintained to prevent the premature overwrite of Value

} else if index == q.currentIndex+growthRate {
// If the write index has caught up to the read index
// the new value needs to be written between the two
// e.g: [3,4,here,1,2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why not rearrange the elements?

I see now the reason why you grow the size by 1: it's easier to handle this scenario.
I would say when this happens we can rearrange from:

[3,4,1,2]
     ^
     |
     read and write index

to this:

[1,2,3,4,_,_,_]
 ^       ^
 |       |
 |       write index
 read index

The queue is not thread-safe anyway

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an interesting thought. Will look at this some more. I do worry that this might make an future threadsafetiness harder/more-expensive to achieve, but maybe not really anymore than allowing the buffer to grow already does.

What benefit do you see in this though?

Copy link
Contributor

@islamaliev islamaliev Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though this was the main reason why you chose (for now) to grow the array by 1. So that it's easier to manage indexes. Otherwise your read index will have to keep in mind that there are empty slots for future writing that it will have to jump over.

[1,2,_,_,_,3,4]
   ^ ^     ^
   | |     next read index
   | write index
   read index

But with the rearrangement you can grow it as much as you want.

Copy link
Collaborator Author

@AndrewSisley AndrewSisley Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, the +1 is because I only want it to grow by one. The problem you just described doesnt exist, as read never crosses the gap (it is a queue, if an item is read it is consumed and never gets read again - if the read index reaches the gap it means the queue is currently empty).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I actually tried using this setup last week to help solve an edge case. It did not simplify the code unfortunately.

"github.com/stretchr/testify/require"
)

func TestConcatYieldsNothingGivenEmpty(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: may try the agreed upon test name structure: <category>_<condition>_<result>?

This example would be TestConcat_IfEmpty_YieldNothing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had already added tests before that conversation took place, and I'd prefer not to mix and match within a PR, and I'd also rather not rename everything. The agreement was more of a 'lets try this for a bit' agreement, not a 'lets convert the entire codebase right now' to this agreement.

require.True(t, hasNext)

r2, err := queue.Value()
// [4, 5, 6, , 3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: helpful visualisation


// For now, increasing the size one at a time is likely optimal
// for the only useage of the queue type. We may wish to change
// this at somepoint however.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I can't see why we would want to change it in the future and not now. (original comment #7 (comment))

Would be nice to hear some benefits of growing the size by 1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because +1 fits the current use case very well, but in the future that may change as both the scope of Lens-in-Defra grows, and/or if other areas wish to use this type.

Please remember that Lens in Defra is currently as simple as it will get, and it will very likely continue to grow in complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a valid argument when developing an independent component, which the whole enumerable is.
We should keep in mind ideally all or at least major use cases, and not only specific ones.
What if we decide (which is highly likely) to use the queue in some other component and stuff it with thousands of items?
The cost of making it more future-proof is low, so I still don't understand why we don't do it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you Islam and I'm not a fan of using copy in this situation either but for the sake of moving this PR ahead, I think we can leave it as is and if we end up using it elsewhere, we can make the change at that time.

Copy link
Contributor

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AndrewSisley AndrewSisley merged commit 65c38cd into main Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand Concat capability, and add Input pipe/queue
4 participants