Improve collections count #260

Merged

JeanFrancoisGuena
Contributor

Hi,

This PR concerns the feature that drops suffixed collections when they are empty.

Until now, we have used the countDocuments method, which performs an aggregate operation to count every document in the collection. This can be resource-intensive when the collection holds a huge amount of data.

However, since we only need to know whether the collection is empty, not how many documents it contains, a better approach could be the 'estimatedDocumentCount' method, which answers immediately from collection metadata without counting anything (as I understand from the MongoDB documentation).

Happy reviewing

# Conflicts:
#	rxmongo/src/main/scala/akka/contrib/persistence/mongodb/RxMongoJournaller.scala
#	rxmongo/src/main/scala/akka/contrib/persistence/mongodb/RxMongoSnapshotter.scala
@scullxbones
Owner

Hi @JeanFrancoisGuena

I'm a bit concerned this is unsupported on some of the mongo versions that this driver supports. Looks like a 4.x feature?

https://docs.mongodb.com/manual/reference/method/db.collection.estimatedDocumentCount/#behavior

@JeanFrancoisGuena
Contributor Author

Hi @scullxbones

I missed that point, sorry.

So, after digging into the code and running MongoDB tests on a large data set, I can propose a new solution:

  • we can use the estimatedDocumentCount method for MongoDB server versions greater than 4.0.3
  • we can use a count with a local readConcern for older versions

In both cases, this first count may be inaccurate in a cluster environment. So, if it returns zero, let's perform a second count using the countDocuments method, as we are pretty sure it will run fast and return zero again.
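
The two-step check described above could be sketched roughly as follows. This is a minimal, driver-independent illustration and not the code in this PR: `CountOps`, `cheapCount`, `exactCount` and `EmptinessCheck` are hypothetical names standing in for whatever each driver exposes.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical abstraction over a collection; not part of any MongoDB driver API.
trait CountOps {
  def cheapCount(): Future[Long] // assumed: a fast but possibly inaccurate count
  def exactCount(): Future[Long] // assumed: an accurate count such as countDocuments
}

object EmptinessCheck {
  // True only when both the cheap count and the confirming exact count are zero.
  def isEmpty(c: CountOps)(implicit ec: ExecutionContext): Future[Boolean] =
    c.cheapCount().flatMap {
      case 0L => c.exactCount().map(_ == 0L) // confirm before reporting "empty"
      case _  => Future.successful(false)    // non-zero first count: skip the exact count
    }
}
```

With this shape, the exact count only runs when the first count already suggests the collection is empty, which is the case where it is expected to be cheap.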

However, we can optimize this second count by passing an appropriate index as the hint field of the aggregate operation that countDocuments uses. Again, we have to test the MongoDB server version, as this is available only for versions greater than 3.6...
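
The version checks mentioned in this comment could be modelled along these lines. This is only a sketch under assumptions: `ServerVersion`, `CountStrategy` and `CountPlanner` are hypothetical names, and treating 4.0.3 and 3.6.0 as inclusive lower bounds is my reading of "greater than", not something confirmed here.

```scala
import scala.util.Try

// Hypothetical model of a server version; only major.minor.patch is considered.
final case class ServerVersion(major: Int, minor: Int, patch: Int)

object ServerVersion {
  val ordering: Ordering[ServerVersion] =
    Ordering.by((v: ServerVersion) => (v.major, v.minor, v.patch))

  // Parses strings such as "4.0.3" or "3.6.12-rc1"; returns None when malformed.
  def parse(s: String): Option[ServerVersion] = {
    val parts = s.takeWhile(c => c.isDigit || c == '.').split('.').toList
    parts match {
      case major :: minor :: rest =>
        // A missing patch component is treated as 0.
        Try(ServerVersion(major.toInt, minor.toInt, rest.headOption.fold(0)(_.toInt))).toOption
      case _ => None
    }
  }
}

sealed trait CountStrategy
case object EstimatedCount extends CountStrategy        // metadata-based estimate
case object LocalReadConcernCount extends CountStrategy // count with a local readConcern

object CountPlanner {
  // Assumed thresholds, taken from the versions quoted in this thread.
  private val EstimatedCountSince = ServerVersion(4, 0, 3)
  private val HintSince           = ServerVersion(3, 6, 0)

  // Chooses how to perform the first, cheap count.
  def firstCount(v: ServerVersion): CountStrategy =
    if (ServerVersion.ordering.gteq(v, EstimatedCountSince)) EstimatedCount
    else LocalReadConcernCount

  // Whether the confirming count may pass an index hint.
  def canUseHint(v: ServerVersion): Boolean =
    ServerVersion.ordering.gteq(v, HintSince)
}
```

For example, `CountPlanner.firstCount(ServerVersion(3, 6, 12))` selects `LocalReadConcernCount`, while `CountPlanner.canUseHint` is still true for that version.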

# Conflicts:
#	scala/src/main/scala/akka/contrib/persistence/mongodb/ScalaDriverPersistenceSnapshotter.scala
@scullxbones
Owner

Can you implement this for the other drivers as well? At minimum the ReactiveMongo driver.

It's important to me to keep parity.

@JeanFrancoisGuena
Contributor Author

Done.

There is no simple way to call the estimatedDocumentCount method with RxMongo, so the implementation is a bit different...

Owner

@scullxbones scullxbones left a comment

Thanks for adding the RXM implementation, LGTM

@scullxbones scullxbones merged commit 9fa6043 into scullxbones:master Sep 17, 2019
@JeanFrancoisGuena JeanFrancoisGuena deleted the improve-collections-count branch September 30, 2019 08:57