Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent connections #44

Closed
multimeric opened this issue Jan 29, 2023 · 5 comments · Fixed by #63 or #100
Closed

Persistent connections #44

multimeric opened this issue Jan 29, 2023 · 5 comments · Fixed by #63 or #100

Comments

@multimeric
Copy link
Collaborator

It isn't ideal to establish a new database connection each time we run get_metadata() because:

  • It's slightly slower
  • It burdens the database server
  • We get the annoying message: Warning message: In .Internal(gc(verbose, reset, full)) : call dbDisconnect() when finished working with a connection

The API needs to be changed in some way to persist the database connection between queries.

@stemangiola
Copy link
Owner

I am not sure if we have a persistent connection. Something I have noticed is that if I wait one minute or so, the SQL table gives an error about the connection being dropped (timeout). This is slightly annoying as I have to reinitiate the connection myself. Users will be frustrated.

I'm not sure if we could reinitiate the connection automatically in case of a timeout.

I will post the error here if I encounter it again.

@multimeric
Copy link
Collaborator Author

I am not sure if we have a persistent connection.

We don't, but I want one for the reasons stated above.

Something I have noticed is that if I wait one minute or so, the SQL table gives an error about the connection being dropped (timeout).

This is a separate issue. It's easily fixed by putting a larger timeout parameter value when connecting to the MySQL database.

@mtmorgan
Copy link
Collaborator

mtmorgan commented Feb 6, 2023

'Persistent' connections go wrong when they are opened on one process, and used on another, e.g., via mclapply / parLapply / bplapply (because the underlying C code is not expecting to be used in this 'reentrant' way).

I think the warning message about closing unused connection can be addressed by adding a 'finalizer' (c.f., reg.finalizer()) to the opened database connection. Care needs to be taken to only use functionality guaranteed to be available when the finalizer runs (maybe dbDisconnect + base, but maybe more restrictive than that...); I've forgotten the details...

@multimeric
Copy link
Collaborator Author

I'm not particularly worried about cross process use. It might happen but will be rare.

I think the connection objects probably have their own finalizer type process. e.g. the MySQL one's destructor seems to disconnect:https://github.com/r-dbi/RMariaDB/blob/082847dbff7267f0d308e6677aeea8fc00335835/src/DbConnection.cpp#L13-L20. But it might be nice to add a custom one just to avoid this error.

What I'm more worried about is that we're sort of implicitly encouraging users to make multiple connections and not quickly clean them up until the garbage collector does it. I think in the vignette we need to make one instance of the table, and then call various dplyr verbs on that table, rather than using get_metadata() multiple times.

@multimeric
Copy link
Collaborator Author

Probably a better future solution is to use: master...mtmorgan:CuratedAtlasQueryR:ensure-dbDisconnect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants