
feat: Expand enumerable types (queue, socket) #7

Merged 8 commits on Jun 30, 2023
42 changes: 39 additions & 3 deletions enumerable/concat.go
@@ -1,5 +1,15 @@
package enumerable

// Concatenation is an extension of the enumerable interface allowing new sources
// to be added after initial construction.
type Concatenation[T any] interface {
Contributor:
suggestion: I think the name does not quite reflect its purpose.
If it's an Enumerable that can be expanded, why not call it ExpandableEnumerable or DynamicEnumerable?

Collaborator Author:
Expandable or Dynamic is way too vague for my liking, as it can mean a whole bunch of stuff, and concat/concatenation is a fairly common term for this behaviour.

Enumerable[T]
// Append appends a new source to this concatenation.
//
// This may be done after enumeration has begun.
Append(Enumerable[T])
}

type enumerableConcat[T any] struct {
sources []Enumerable[T]
currentSourceIndex int
@@ -8,29 +18,55 @@ type enumerableConcat[T any] struct {
// Concat takes zero to many source `Enumerable`s and stacks them on top
// of each other, resulting in one enumerable that will iterate through all
// the values in all of the given sources.
//
// New sources may be added after iteration has begun.
func Concat[T any](sources ...Enumerable[T]) Concatenation[T] {
Contributor:
suggestion: Not sure if I mentioned this before, but we should return a concrete type when possible. That way type methods don't need to be strictly tied to the interface.

Collaborator Author:
The concrete types are all private, and the new functions public.

Contributor:
You can make the concrete types public then. I really think that, unless an interface method needs an interface return type, we should try to avoid returning interface types.

Collaborator Author:
> You can make the concrete types public then. I really think that, unless an interface method needs an interface return type, we should try to avoid returning interface types.

I am unsure as to how to express just how much I really do not want to make the concrete types public.

Contributor:
Why? You don't have to make the fields public. It's much better than returning an interface type.

Collaborator Author:
Nope. These are internal types, and I highly value the ability to rip and replace them, or to allow for more complex constructors which may not return a single concrete type, without having to worry about people directly referencing them.

These are not run-of-the-mill business-logic types, but super generic data structures, and I am somewhat sceptical of anyone's ability to convince me to expose them in the short-medium term. I think that would be a very bad thing to do.

There is absolutely no reason for anyone to need to directly interact with the concrete types in this package.

Member:
I like the interfaces (granted I'm a little interface happy compared to the average Go dev 😉)

Contributor (@fredcarle, Jun 19, 2023):
> Nope. These are internal types, and I highly value the ability to rip and replace them, or to allow for more complex constructors which may not return a single concrete type, without having to worry about people directly referencing them.

Returning a concrete type makes zero difference for this statement. You can still rip and replace without worrying about people directly referencing them.

> There is absolutely no reason for anyone to need to directly interact with the concrete types in this package.

Again, returning a concrete type makes zero difference for this statement. All the methods relate to the concrete type, and if there are no public fields the limiting factor is the same.

The drawback of returning an interface is that if you want to add functionality to the type and still be able to use the constructor, you also have to change the interface, which one might not want to do. Returning the concrete type is much more flexible.

Contributor:
> I like the interfaces (granted I'm a little interface happy compared to the average Go dev 😉)

The number of times I've hit walls because of us using interfaces for return types 😤... lol

return &enumerableConcat[T]{
sources: sources,
currentSourceIndex: 0,
}
}
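
As an aside, the trade-off debated above can be shown with a small self-contained toy example. The Greeter names below are invented purely for illustration and are unrelated to this package: a method added to the concrete type after the fact is directly usable when the constructor returns the concrete type, but needs an interface change or a type assertion when the constructor returns an interface.

package main

import "fmt"

// Toy types invented purely for illustration; not part of the enumerable package.
type Greeter interface{ Greet() string }

type ConcreteGreeter struct{ name string }

func (g *ConcreteGreeter) Greet() string { return "hello " + g.name }

// Shout was added after the Greeter interface was defined and is not part of it.
func (g *ConcreteGreeter) Shout() string { return "HELLO " + g.name }

// Interface-returning constructor: callers only see Greet.
func NewGreeter(name string) Greeter { return &ConcreteGreeter{name: name} }

// Concrete-returning constructor: callers see Greet and Shout.
func NewConcreteGreeter(name string) *ConcreteGreeter { return &ConcreteGreeter{name: name} }

func main() {
	a := NewGreeter("interface")
	fmt.Println(a.Greet())
	// a.Shout() would not compile; the extra method is only reachable via a type assertion.
	if c, ok := a.(*ConcreteGreeter); ok {
		fmt.Println(c.Shout())
	}

	b := NewConcreteGreeter("concrete")
	fmt.Println(b.Greet())
	fmt.Println(b.Shout()) // available directly, no interface change needed
}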

// Append appends a new source to this concatenation.
//
// This may be done after enumeration has begun.
func (s *enumerableConcat[T]) Append(newSource Enumerable[T]) {
s.sources = append(s.sources, newSource)
}
Comment on lines +23 to +35
Member:
question: I'm a little confused on the balance of having both Append and Concat, where Concat takes a list of sources, appends them together, and produces a Concatenation[T] type, which also allows additional sources to be appended via the Append method.

Would it not be better to have a single Concat or Append top level function with no methods? This would produce a pattern similar to the existing native append in Go, where you always use the append, and the first parameter of the append is the target slice, and the rest are elements to be added. Should be easy enough to define an EnumerableSlice type, for example, which still implements the Enumerable interface.

Collaborator Author:
> Would it not be better to have a single Concat or Append top level function with no methods?

I am not sure I understand what you mean here. Concat is the constructor; Append allows adding a new source into an existing item.

Append needs to live on the concat type, as only a concat type can handle this, and the stuff required for the implementation is internal/private.
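
For readers skimming the diff, a minimal sketch of the usage described above (New and Concat are constructors from this package; the sketch assumes the surrounding enumerable package):

// Minimal usage sketch: Concat constructs the Concatenation, Append feeds it
// further sources, possibly after enumeration has already begun.
func concatUsageSketch() error {
	concat := Concat(New([]int{1, 2}))

	// A new source may be appended at any time, including after Next has been called.
	concat.Append(New([]int{3}))

	for {
		hasNext, err := concat.Next()
		if err != nil {
			return err
		}
		if !hasNext {
			return nil
		}

		value, err := concat.Value()
		if err != nil {
			return err
		}
		_ = value // yields 1, 2, then 3
	}
}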


func (s *enumerableConcat[T]) Next() (bool, error) {
startSourceIndex := s.currentSourceIndex
hasLooped := false

for {
Collaborator Author:
thought: I briefly messed up this logic, and I think there is a strong case for adding concat tests within this PR, as this type is now a bit more complicated. It is tested by defra, however. Let me know; otherwise I will just see how I feel once approved.

Contributor:
It would be nice to have it tested within the repo.

Member:
I feel like this Next logic is a lot more complicated than it needs to be, and has a few potential oddities. I'm holding off on specifics since you mentioned it was briefly messed up; is the current version the corrected version?

To make sure we're on the same page on the intended behavior of Concatenation types, can you add a docstring to the interface describing the behavior of the new one? I think it's straightforward, but I just want to confirm, and make sure others that come across this also know.

Collaborator Author:
> I feel like this Next logic is a lot more complicated than it needs to be

The complication involved is to preserve the order of items, so that Next does not bounce in and out of sources. We want to yield items from source A, then B, then C, then check if A has new items.

> To make sure we're on the same page on the intended behavior of Concatenation types, can you add a docstring to the interface describing the behavior of the new one?

I'm not sure I understand you here. There is only one concrete Concatenation type, and one interface, and they are both documented.

Collaborator Author:
Tests have been added

// If we have reached the end of the sources slice we need to loop
// back to the beginning. It may be that earlier sources have gained
// items whilst we iterated through later sources.
if s.currentSourceIndex >= len(s.sources) {
Contributor:
suggestion: I think it's worth mentioning that this scenario is only possible if Next is called on an exhausted enumerable.

Collaborator Author (@AndrewSisley, Jun 21, 2023):
Not a problem, will add

  • doc if case

if len(s.sources) < 1 || hasLooped {
return false, nil
}
s.currentSourceIndex = 0
hasLooped = true
}

currentSource := s.sources[s.currentSourceIndex]
hasValue, err := currentSource.Next()
if err != nil {
return false, err
}
if hasValue {
return true, nil
}

s.currentSourceIndex += 1

if s.currentSourceIndex == startSourceIndex {
// If we are here it means that we have re-cycled
// all the way through the source slice and have found
// no new items.
return false, nil
}
}
}

130 changes: 130 additions & 0 deletions enumerable/concat_test.go
@@ -0,0 +1,130 @@
package enumerable

import (
"testing"

"github.com/stretchr/testify/require"
)

func TestConcatYieldsNothingGivenEmpty(t *testing.T) {
Contributor:
suggestion: maybe try the agreed-upon test name structure: <category>_<condition>_<result>? This example would be TestConcat_IfEmpty_YieldNothing.

Collaborator Author:
I had already added tests before that conversation took place, and I'd prefer not to mix and match within a PR; I'd also rather not rename everything. The agreement was more of a 'let's try this for a bit' agreement, not a 'let's convert the entire codebase right now' agreement.

concat := Concat[int]()

hasNext, err := concat.Next()
require.NoError(t, err)
require.False(t, hasNext)
}

func TestConcatYieldsItemsFromSource(t *testing.T) {
v1 := 1
v2 := 2
v3 := 3
source1 := New([]int{v1, v2, v3})

concat := Concat(source1)

hasNext, err := concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r1, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v1, r1)

hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r2, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v2, r2)

hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r3, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v3, r3)

hasNext, err = concat.Next()
require.NoError(t, err)
require.False(t, hasNext)
Contributor:
suggestion: these repetitive checks can be put in a loop

Collaborator Author:
I kind of dislike that. We are testing a small number of specific items, and in some cases, they are from different sources. I see a loop as a complication and abstraction that detracts from the readability of the test.

I also thought about refactoring it out into a private test func, but that also masks the Act and Assert part of the test and I much prefer them to be highly visible.

}
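
For comparison, a hedged sketch of the loop-based alternative suggested above (not adopted in this PR); it reuses New, Concat, and require from this test file, and trades the explicit Act/Assert steps for less repetition.

func TestConcatYieldsItemsFromSourceLooped(t *testing.T) {
	expected := []int{1, 2, 3}
	concat := Concat(New(expected))

	// Each expected item is asserted inside the loop instead of inline.
	for _, want := range expected {
		hasNext, err := concat.Next()
		require.NoError(t, err)
		require.True(t, hasNext)

		value, err := concat.Value()
		require.NoError(t, err)
		require.Equal(t, want, value)
	}

	hasNext, err := concat.Next()
	require.NoError(t, err)
	require.False(t, hasNext)
}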

func TestConcatYieldsItemsFromSourceInOrder(t *testing.T) {
v1 := 1
v2 := 2
v3 := 3
v4 := 4
v5 := 5
v6 := 6
source1 := NewQueue[int]()
var s1 Enumerable[int] = source1
source2 := New([]int{v1, v2, v3})
source3 := New([]int{v4, v5})

concat := Concat(s1, source2, source3)

// Start yielding *before* source1 has any items
hasNext, err := concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r1, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v1, r1)

// Put an item into source1
err = source1.Put(v6)
require.NoError(t, err)

// Assert that the yielding of items from source2 is
// not interrupted by source1 receiving items
hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r2, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v2, r2)

hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r3, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v3, r3)

hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

// Assert that source3's items are yielded after
// source2's
r4, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v4, r4)

hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r5, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v5, r5)

// Then assert that source1's items are yielded
// as the concat circles back round
hasNext, err = concat.Next()
require.NoError(t, err)
require.True(t, hasNext)

r6, err := concat.Value()
require.NoError(t, err)
require.Equal(t, v6, r6)

hasNext, err = concat.Next()
require.NoError(t, err)
require.False(t, hasNext)
}
154 changes: 154 additions & 0 deletions enumerable/queue.go
@@ -0,0 +1,154 @@
package enumerable

// Queue is an extension of the enumerable interface allowing individual
// items to be added into the enumerable.
//
// Added items will be yielded in a FIFO order. Items may be added after
// enumeration has begun.
type Queue[T any] interface {
Enumerable[T]
// Put adds an item to the queue.
Put(T) error
// Size returns the current length of the backing array.
//
// This may include empty space where yielded items previously resided.
// Useful for testing and debugging.
Size() int
Contributor:
question: why don't we have Size for Enumerable?

Collaborator Author:
It would mean complicating the core interface with a function that is not used in production code, and implementing it on every type, regardless of whether it makes any sense on that type (I don't think it does for a lot of them; many just have a source enumerable and a predicate/int/etc.).

Contributor:
What you just stated, Andy, is one of the reasons why returning the concrete type is better practice in Go. If you want a Size where it makes sense, you don't need to modify the interface for it :)

}

// For now, increasing the size one at a time is likely optimal
// for the only usage of the queue type. We may wish to change
// this at some point, however.
Contributor:
question: I can't see why we would want to change it in the future and not now. (original comment #7 (comment))

Would be nice to hear some benefits of growing the size by 1.

Collaborator Author:
Because +1 fits the current use case very well, but in the future that may change as both the scope of Lens-in-Defra grows, and/or if other areas wish to use this type.

Please remember that Lens in Defra is currently as simple as it will get, and it will very likely continue to grow in complexity.

Contributor:
I don't think it's a valid argument when developing an independent component, which the whole enumerable is.
We should ideally keep in mind all, or at least the major, use cases, and not only specific ones.
What if we decide (which is highly likely) to use the queue in some other component and stuff it with thousands of items?
The cost of making it more future-proof is low, so I still don't understand why we don't do it now.

Contributor:
I agree with you, Islam, and I'm not a fan of using copy in this situation either, but for the sake of moving this PR ahead I think we can leave it as is, and if we end up using it elsewhere we can make the change at that time.
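
For reference, a hedged sketch of the growth strategy discussed above: growing the backing slice geometrically rather than by a constant. This is not what the PR implements, and the ring-buffer index bookkeeping below would need adjusting to cope with the extra empty slots; it only illustrates the amortised-cost idea.

// Sketch only: grow a slice geometrically so that repeated Puts cost amortised
// O(1) copying, instead of copying the whole slice on every Put as constant
// growth does.
func grownCopy[T any](values []T, minCapacity int) []T {
	newCap := len(values)
	if newCap == 0 {
		newCap = 1
	}
	for newCap < minCapacity {
		newCap *= 2 // double until there is room
	}
	newValues := make([]T, newCap)
	copy(newValues, values)
	return newValues
}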

const growthSize int = 1

type queue[T any] struct {
// The values slice of this queue.
//
// Note: queue is implemented as a dynamically sized ring buffer; the zero index
// does not necessarily hold the next/current value. Also note that values are not explicitly
// removed from this slice; which values are still 'in' the queue is tracked by index.
values []T

// The index of the current value.
currentIndex int

// The index of the last value added to the queue.
lastSetIndex int
Contributor:
suggestion: wouldn't it be more intuitive to call these two readIndex and writeIndex?

Collaborator Author (@AndrewSisley, Jun 21, 2023):
I think read/write is less accurate though, especially given that the two props actually differ in terms of temporality: currentIndex is the last index produced by a call to Next, which may or may not be out of bounds, and may or may not have been read via Value; lastSetIndex has been set, was valid, and is essentially a record of the past.

Writing all that out though, I may have missed something whilst testing.

  • Test post-loop Next=>Put=>Value - I'm pretty sure it is broken, and that a space between these two integers should always be maintained to prevent the premature overwrite of Value


// Will be true if values[0] has been set.
zeroIndexSet bool
Member:
question: This seems to be a config/behavior flag, can you document it :)

Collaborator Author:
It is not, it is internal stuff required for managing internal state.

Contributor:
documentation would be helpful

Collaborator Author (@AndrewSisley, Jun 21, 2023):
Will add, although I'm not sure much can be added to the declarations themselves.

  • doc private internal state props


// Will be true if a value has been attempted to be read from an empty queue.
waitingForWrite bool
}

var _ Queue[any] = (*queue[any])(nil)

// NewQueue creates an empty FIFO queue.
//
// It is implemented using a dynamically sized ring-buffer.
Member:
question: Why does a FIFO queue need to be implemented as a ring-buffer? Does this imply you can lose elements in the queue?

Collaborator Author:
It is a dynamically sized ring buffer; no items can be lost. It is just a more efficient way of implementing a queue.
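
To orient readers before the implementation, a minimal sketch of the expected Queue usage based on the interface above: items go in via Put (possibly after enumeration has begun) and come out via Next/Value in FIFO order.

// Minimal usage sketch for the Queue type defined below.
func queueUsageSketch() error {
	q := NewQueue[int]()

	if err := q.Put(1); err != nil {
		return err
	}
	if err := q.Put(2); err != nil {
		return err
	}

	for {
		hasNext, err := q.Next()
		if err != nil {
			return err
		}
		if !hasNext {
			break
		}

		value, err := q.Value()
		if err != nil {
			return err
		}
		_ = value // yields 1, then 2, in FIFO order
	}

	// Items may also be added after enumeration has begun; a later Next call
	// will pick them up.
	return q.Put(3)
}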

func NewQueue[T any]() Queue[T] {
Contributor:
suggestion: Same as above. Return concrete type instead of interface.

return &queue[T]{
values: []T{},
currentIndex: -1,
lastSetIndex: -1,
}
}

func (q *queue[T]) Put(value T) error {
index := q.lastSetIndex + 1

if index >= len(q.values) {
if len(q.values) == 0 {
q.values = make([]T, growthSize)
q.currentIndex = -1
} else if q.zeroIndexSet {
// If the zero index is occupied, we cannot loop back to it here
// and instead need to grow the values slice.
newValues := make([]T, len(q.values)+growthSize)
copy(newValues, q.values[:index])
q.values = newValues
} else {
index = 0
if q.currentIndex >= len(q.values) {
q.currentIndex = -1
}
}
} else if index == q.currentIndex {
// If the write index has caught up to the read index
// the new value needs to be written between the two
// e.g: [3,4,here,1,2]
Contributor:
question: why not rearrange the elements?

I see now the reason why you grow the size by 1: it's easier to handle this scenario.
I would say when this happens we can rearrange from:

[3,4,1,2]
     ^
     |
     read and write index

to this:

[1,2,3,4,_,_,_]
 ^       ^
 |       |
 |       write index
 read index

The queue is not thread-safe anyway

Collaborator Author:
That is an interesting thought; I will look at this some more. I do worry that this might make any future thread-safety harder/more expensive to achieve, but maybe not really any more than allowing the buffer to grow already does.

What benefit do you see in this though?

Contributor (@islamaliev, Jun 22, 2023):
I thought this was the main reason why you chose (for now) to grow the array by 1: so that it's easier to manage indexes. Otherwise your read index would have to keep in mind that there are empty slots for future writing that it will have to jump over.

[1,2,_,_,_,3,4]
   ^ ^     ^
   | |     next read index
   | write index
   read index

But with the rearrangement you can grow it as much as you want.

Collaborator Author (@AndrewSisley, Jun 22, 2023):
Ah no, the +1 is because I only want it to grow by one. The problem you just described doesn't exist, as read never crosses the gap (it is a queue; if an item is read it is consumed and never gets read again - if the read index reaches the gap it means the queue is currently empty).

Collaborator Author:
Note: I actually tried using this setup last week to help solve an edge case. It did not simplify the code unfortunately.

// Note: The last value read should not be overwritten, as `Value`
// may be called multiple times on it after a single `Next` call.
newValues := make([]T, len(q.values)+growthSize)
copy(newValues, q.values[:index])
copy(newValues[index+growthSize:], q.values[index:])
q.values = newValues
// Shift the current read index to reflect its new location.
q.currentIndex += growthSize
}

if index == 0 {
q.zeroIndexSet = true
}

q.values[index] = value
q.lastSetIndex = index

return nil
}

func (q *queue[T]) Next() (bool, error) {
// If the previous index was the zero-index the value is consumed (implicitly), so we update
// the flag here.
if q.currentIndex == 0 {
q.zeroIndexSet = false
}

nextIndex := q.currentIndex + 1
var hasValue bool
if nextIndex >= len(q.values) {
if q.zeroIndexSet {
// Circle back to the beginning
nextIndex = 0
hasValue = true
} else {
hasValue = false
if q.currentIndex == len(q.values) {
// If we have reached the end of the values slice, and the previous
// index was already out of bounds, we should avoid growing it further.
nextIndex = q.currentIndex
}
}
} else {
// If the previous read index was the last index written to then the value has been
// consumed and we have reached the edge of the ring: [v2, v3,^we are here, , v1]
hasValue = q.currentIndex != q.lastSetIndex
}

q.currentIndex = nextIndex
q.waitingForWrite = !hasValue
return hasValue, nil
}

func (q *queue[T]) Value() (T, error) {
// The read index might be out of bounds at this point (either outside the slice, or the ring)
// and we should not return a value here if that is the case.
if q.waitingForWrite {
var zero T
return zero, nil
}
return q.values[q.currentIndex], nil
}
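
A small illustrative sketch of the waitingForWrite behaviour above, based on reading the code rather than on a test in this PR: a Next on an empty queue reports no value and Value returns the zero value, while a Put arriving afterwards is picked up by the following Next.

// Illustrative sketch of reading from an empty queue; not a test from this PR.
func queueEmptyReadSketch() (bool, int) {
	q := NewQueue[int]()

	hasNext, _ := q.Next() // queue is empty, so this reports no value (false)
	value, _ := q.Value()  // waitingForWrite is set, so this returns 0
	_, _ = hasNext, value

	_ = q.Put(5) // a write arriving after the failed read

	hasNext, _ = q.Next() // true: the queued item is now visible
	value, _ = q.Value()  // 5
	return hasNext, value
}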

func (q *queue[T]) Reset() {
q.values = []T{}
q.currentIndex = -1
q.lastSetIndex = -1
q.zeroIndexSet = false
q.waitingForWrite = false
}

func (q *queue[T]) Size() int {
return len(q.values)
}