Increasing alive socket count, looks like a leak #148

Closed
omerkirk opened this issue Apr 19, 2018 · 8 comments

@omerkirk

Hi,

I am using the driver with Go 1.10. First, I noticed that the number of goroutines waiting in the socket readLoop is constantly increasing.

goroutine profile: total 3300
3131 @ 0x42ee6a 0x42a18a 0x429807 0x49150b 0x49158d 0x4923ed 0x5f089f 0x60195a 0x6c3193 0x6c3852 0x45c761
#	0x429806	internal/poll.runtime_pollWait+0x56			/usr/local/go110/src/runtime/netpoll.go:173
#	0x49150a	internal/poll.(*pollDesc).wait+0x9a			/usr/local/go110/src/internal/poll/fd_poll_runtime.go:85
#	0x49158c	internal/poll.(*pollDesc).waitRead+0x3c			/usr/local/go110/src/internal/poll/fd_poll_runtime.go:90
#	0x4923ec	internal/poll.(*FD).Read+0x17c				/usr/local/go110/src/internal/poll/fd_unix.go:157
#	0x5f089e	net.(*netFD).Read+0x4e					/usr/local/go110/src/net/fd_unix.go:202
#	0x601959	net.(*conn).Read+0x69					/usr/local/go110/src/net/net.go:176
#	0x6c3192	github.com/globalsign/mgo.fill+0x52			/home/omerkirk/Projects/go/src/github.com/globalsign/mgo/socket.go:567
#	0x6c3851	github.com/globalsign/mgo.(*mongoSocket).readLoop+0x601	/home/omerkirk/Projects/go/src/github.com/globalsign/mgo/socket.go:583
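
For anyone trying to reproduce this, a dump like the one above can be collected via the standard net/http/pprof goroutine endpoint; a minimal sketch (the port is a placeholder):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// The goroutine dump above corresponds to
	// GET http://localhost:6060/debug/pprof/goroutine?debug=1
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```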

To confirm that this is not due to high load, I checked the mongo stats daily and found that the number of goroutines matches the SocketsAlive stat. The service using the driver handles between 400 and 4000 rps depending on the time of day. For the past two weeks the number of alive sockets has never decreased, even when traffic is very low and the number of sockets in use fluctuates between 1 and 2.

For example, a snapshot of the mongo stats is below. As you can see, the number of sockets needed to handle the traffic is very low, yet the number of alive sockets is still very high.

"Clusters": 2,
"MasterConns": 32967,
"SlaveConns": 2845,
"SentOps": 66839799,
"ReceivedOps": 66801986,
"ReceivedDocs": 66801986,
"SocketsAlive": 3131,
"SocketsInUse": 0,
"SocketRefs": 0

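The snapshot above comes from mgo's package-level stats collector. A minimal sketch of how it can be enabled and read (assuming the mgo.SetStats/mgo.GetStats API; the dial URL is a placeholder):

```go
package main

import (
	"encoding/json"
	"log"
	"time"

	"github.com/globalsign/mgo"
)

func main() {
	mgo.SetStats(true) // enable the package-level stats collector

	session, err := mgo.Dial("mongodb://localhost:27017") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Periodically log a snapshot like the one above.
	for range time.Tick(time.Minute) {
		stats := mgo.GetStats()
		out, _ := json.MarshalIndent(stats, "", "  ")
		log.Printf("mgo stats: %s", out)
	}
}
```
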
I've checked the code, and what I do is dial a session once at the start of the runtime, then for each mongo request I copy that session and defer session.Close() in every function. The sessions all have a 1-minute timeout. Please let me know if you need any more information on the issue.
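
Concretely, the pattern looks roughly like this (the HTTP wiring, database and collection names are placeholders, not the real service code):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"

	"github.com/globalsign/mgo"
	"github.com/globalsign/mgo/bson"
)

// rootSession is dialed once at startup; every request works on a copy.
var rootSession *mgo.Session

func main() {
	var err error
	rootSession, err = mgo.Dial("mongodb://localhost:27017") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	rootSession.SetSocketTimeout(time.Minute)

	http.HandleFunc("/doc", docHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func docHandler(w http.ResponseWriter, r *http.Request) {
	// The copy inherits the root session's settings (including the timeout)
	// and returns its socket to the pool when closed.
	s := rootSession.Copy()
	defer s.Close()

	var doc bson.M
	err := s.DB("mydb").C("mycoll").Find(bson.M{"_id": r.URL.Query().Get("id")}).One(&doc)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	json.NewEncoder(w).Encode(doc)
}
```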

@domodwyer

Hi @omerkirk

Great report, thanks for including all the details! This is unfortunately a known issue, but work has been done towards fixing it: #116 by @gnawux added pruning of unused connections (thanks!), and I'm currently working on a performance tweak to mitigate the cost of that PR (see #142), which I expect to have ready for review some time today. After that we'll cut a new release that fixes this for everyone 👍

As for the number of goroutines, each socket has an event loop in the background so I expect this to be solved by the connection PR too. For what it's worth, the event loop goroutine for an unused socket should be blocked on network I/O and wouldn't be eligible for scheduling, but it's still far from ideal.

Hopefully we'll have a release in the next week or so (it takes time to test) - would you be willing to try a pre-release version and confirm it solves the issue for you?

Thanks again!

Dom

@omerkirk
Author

Hi @domodwyer ,

Sure, I'd love to help test this.

Just a suggestion: the Go database/sql package has a very elegant pool implementation with similar requirements (minPoolSize, etc.) that might guide a solution without the performance hit in #142.
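
To illustrate, database/sql exposes its pool knobs directly on *sql.DB (the driver choice and DSN below are placeholders, only for illustration):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql" // any SQL driver works; this one is just an example
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(localhost:3306)/mydb") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// database/sql keeps the pool bounded and prunes idle/expired connections itself.
	db.SetMaxOpenConns(100)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(5 * time.Minute)
}
```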

Thanks for the quick answer and the hard work.

@domodwyer

Hi @omerkirk

The development branch now has all the bits to fix this - if you have the time please do try it and let us know!

Dom

@omerkirk
Author

Hi @domodwyer ,

I'll test it tomorrow and let you know the results.

Omer

@omerkirk
Author

Hi @domodwyer ,

I just deployed the service using the development branch. Everything seems to be working fine so far; I will keep it in production to observe socket usage under high load and will let you know.

@domodwyer

Hi @omerkirk

That's great news - we've run the development branch through some tests too and it looks good. I'll cut a release later today 👍

Thanks for your help!

Dom

@omerkirk
Author

omerkirk commented Apr 24, 2018

Hi again @domodwyer ,

The development branch works great. We are 5 days in and the number of alive sockets is holding steady at 20, which is the minPoolSize I configured. Great work. Closing the issue.
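
For reference, this is roughly how the pool can be configured with the new options (the MinPoolSize/MaxIdleTimeMS field names are from memory, so double-check them against the release; the address is a placeholder):

```go
package main

import (
	"log"
	"time"

	"github.com/globalsign/mgo"
)

func main() {
	info := &mgo.DialInfo{
		Addrs:         []string{"localhost:27017"}, // placeholder address
		Timeout:       10 * time.Second,
		MinPoolSize:   20,    // keep at least 20 sockets alive
		MaxIdleTimeMS: 60000, // prune sockets idle for longer than a minute
	}

	session, err := mgo.DialWithInfo(info)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
}
```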

@domodwyer

Thanks @omerkirk, you've been a big help!

We've confirmed it's all working as intended in our test environment too - we've just cut a new release so everyone can enjoy the changes.

Thanks again!

Dom
