Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add document change event broadcast using members list #189

Merged
merged 12 commits into from
Jun 9, 2021
Merged

Conversation

dc7303
Copy link
Member

@dc7303 dc7303 commented May 24, 2021

What this PR does / why we need it:
It broadcasts document change events to replicated agents.

  • Add grpc broadcast service.
  • Implement the Publish API.
  • Broadcast client management.
  • Add grpc_recover.
  • When requesting from the server to the server, the server should not be stopped when the replicated agent is in a state where communication is not possible.
  • Add stress test
  • Add design document for broadcast
  • Code cleanup

Added issue

  • If you enable the server in multiple tests for the cluster mode test, they affect each other and fail.
    example)
 Error Trace:    cluster_stress_test.go:32
                Error:          Received unexpected error:
                                listen tcp :21501: bind: address already in use
---
 Error Trace:    cluster_mode_test.go:59
                                cluster_mode_test.go:128
                Error:          "map[c2oi......]" should have 2 item(s), but has 7

Which issue(s) this PR fixes:

Address #183

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Didn't break anything

@hackerwins hackerwins marked this pull request as draft May 25, 2021 00:52
@dc7303 dc7303 force-pushed the broadcasting branch 2 times, most recently from ec27bbb to a5ed817 Compare May 26, 2021 14:01
1. Improvements so that each agent can simultaneously send requests and
conduct tests.
2. Added handling of `negative WaitGroup counter` errors that appear
intermittently in stress tests.
3. Fixed a situation where change reject occurred in the test.
@dc7303
Copy link
Member Author

dc7303 commented May 28, 2021

var portOffset = 0
// TestConfig returns config for creating Yorkie instance.
func TestConfig(authWebhook string) *yorkie.Config {
portOffset += 100
return &yorkie.Config{
RPC: &rpc.Config{
Port: RPCPort + portOffset,
},
Metrics: &prometheus.Config{
Port: MetricsPort + portOffset,
},
Backend: &backend.Config{
SnapshotThreshold: SnapshotThreshold,
AuthorizationWebhookURL: authWebhook,
},
Mongo: &mongo.Config{
ConnectionURI: MongoConnectionURI,
ConnectionTimeoutSec: MongoConnectionTimeoutSec,
PingTimeoutSec: MongoPingTimeoutSec,
YorkieDatabase: TestDBName(),
},
ETCD: &etcd.Config{
Endpoints: ETCDEndpoints,
},
}
}

Port conflicts occur when multiple packages use this function because it recompiles when the package being tested is replaced. I am looking for a way to use ports stably to run multiple servers.
go-test doc

'Go test' recompiles each package along with any files with names matching
the file pattern "*_test.go"
....

@dc7303 dc7303 marked this pull request as ready for review May 28, 2021 19:09
Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.
I left a few simple questions.

api/yorkie.proto Outdated Show resolved Hide resolved
test/integration/cluster_mode_test.go Outdated Show resolved Hide resolved
test/stress/cluster_stress_test.go Outdated Show resolved Hide resolved
yorkie/backend/sync/etcd/membermap.go Outdated Show resolved Hide resolved
yorkie/backend/sync/etcd/client.go Outdated Show resolved Hide resolved
yorkie/rpc/broadcast.go Outdated Show resolved Hide resolved
yorkie/rpc/broadcast.go Outdated Show resolved Hide resolved
yorkie/rpc/server.go Outdated Show resolved Hide resolved
@hackerwins hackerwins assigned ppeeou and unassigned hackerwins May 29, 2021
api/yorkie.proto Outdated Show resolved Hide resolved
yorkie/backend/sync/etcd/pubsub.go Outdated Show resolved Hide resolved
docEvent, err := converter.ToDocEvent(event)
if err != nil {
log.Logger.Error(err)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we broke the error chain here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think it was necessary to return an error from the API where a particular Document event occurred just because it couldn't broadcast to another agent.
So I left an error log at the time of occurrence. This is because we thought that document sync could be synced through other sync events.

I have been thinking about how to alert when a problem that can occur within the method (such as a convert failure or an abnormal problem in the target agent) occurs continuously, but I was not sure, so I tried to suggest it later.

Can you please let me know if there are more things I should consider?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we try to convert an event type that is not supported, an error occurs, and there is no need to loop further in the caller. I wish we could handle errors explicitly.

1. Improved 'watch document across agents test'.
2. Remove cluster stress test.
3. Cleanup code.
4. Change service's name Broadcast to Cluster.
5. Remove grpc recovery
@codecov
Copy link

codecov bot commented May 31, 2021

Codecov Report

Merging #189 (28cc71b) into main (b0af224) will decrease coverage by 1.21%.
The diff coverage is 22.61%.

❗ Current head 28cc71b differs from pull request most recent head 6d267ab. Consider uploading reports for the commit 6d267ab to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #189      +/-   ##
==========================================
- Coverage   61.58%   60.36%   -1.22%     
==========================================
  Files          41       42       +1     
  Lines        3454     3550      +96     
==========================================
+ Hits         2127     2143      +16     
- Misses       1131     1211      +80     
  Partials      196      196              
Impacted Files Coverage Δ
api/converter/from_pb.go 65.22% <0.00%> (-1.88%) ⬇️
api/converter/to_pb.go 82.93% <0.00%> (-2.72%) ⬇️
client/client.go 12.13% <0.00%> (ø)
yorkie/backend/sync/etcd/membermap.go 0.00% <0.00%> (ø)
yorkie/backend/sync/etcd/pubsub.go 0.00% <0.00%> (ø)
yorkie/backend/sync/memory/coordinator.go 0.00% <0.00%> (ø)
yorkie/rpc/cluster.go 0.00% <0.00%> (ø)
yorkie/rpc/server.go 50.72% <3.57%> (+0.91%) ⬆️
yorkie/backend/sync/memory/pubsub.go 91.89% <89.18%> (-1.33%) ⬇️
yorkie/backend/sync/etcd/client.go 71.42% <100.00%> (+1.73%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0af224...6d267ab. Read the comment docs.

yorkie/rpc/server.go Outdated Show resolved Hide resolved
dc7303 and others added 4 commits June 6, 2021 21:04
1. Improve lock usage when accessing cluster server clients.
2. Improved error handling.
3. Remove duplicate lock calls.
The DocEvent and Event types seem to overlap each other. Integrated
these two types with each other and improved the logic that previously
used DocEvent.
Broadcasting should also stop when the parent context terminates.
- Remove Members from pubsub
- Remove code that uses two locks together
- Change the key of clusterClientMap to id
- Cleanup comments
@hackerwins hackerwins self-requested a review June 8, 2021 12:10
Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work. In particular, changing the topic to multi-key based in PubSub seems to be neat.

Broadcast seems to have many weaknesses compared to Gossip-based algorithms, but first of all, it seems to be able to operate a small-scale cluster.

(I made a few changes during the review.)

@hackerwins hackerwins merged commit bfe712d into main Jun 9, 2021
@hackerwins hackerwins deleted the broadcasting branch June 9, 2021 00:51
hackerwins added a commit to yorkie-team/yorkie-js-sdk that referenced this pull request Jun 9, 2021
@hackerwins hackerwins added the protocol changed 📝 Whether the protocol has changed label Jun 9, 2021
parkeunae pushed a commit to parkeunae/yorkie-js-sdk that referenced this pull request Jul 23, 2022
jeonjonghyeok pushed a commit to jeonjonghyeok/yorkie that referenced this pull request Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
protocol changed 📝 Whether the protocol has changed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants