Storing large amount of data in orleans grain state #1756
I’ll offer some general comments, but more details about your data and the nature of the filtering would be helpful.

For storing large amounts of data, I would expect the AzureBlobStorage storage provider to work. I’ve not worked with it myself, so I am not aware of its limits, but blobs don’t have the 1 MB limit that Azure Table entities do.

For performance (and in general) I’d suggest partitioning your data as much as is reasonably possible. The atomicity of your update requirements should dictate how small the data sets can be. Partitioning the data will mean more grain activations, but that’s not a bad thing.

Because the Orleans storage provider model is a simple abstraction over storage, it does not include filtering capabilities. Customers that have needed to load only subsets of data from storage, based on some filter, have accessed storage directly from grains rather than via storage providers. This is more work, but it allows the full feature set of the selected storage technology. For instance, storing one’s data in multiple tables in Azure Table Storage would allow for a filtered read based on partition and row keys.
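The "access storage directly from grains" approach above could look something like the following sketch. This is not from the thread: the grain interface, entity type, table name, and connection string are all hypothetical, and it assumes the classic WindowsAzure.Storage SDK and Orleans 1.x APIs.

```csharp
// Sketch only: a grain that bypasses the Orleans storage provider and queries
// Azure Table Storage directly, so filtering happens at the storage level.
// IOrderGrain, OrderEntity, the "Orders" table and the connection string are
// hypothetical names, not from the thread.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
using Orleans;

public class OrderEntity : TableEntity
{
    public double Amount { get; set; }
}

public interface IOrderGrain : IGrainWithStringKey
{
    Task<List<OrderEntity>> GetOrdersAsync();
}

public class OrderGrain : Grain, IOrderGrain
{
    private CloudTable _table;

    public override async Task OnActivateAsync()
    {
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        _table = account.CreateCloudTableClient().GetTableReference("Orders");
        await _table.CreateIfNotExistsAsync();
        await base.OnActivateAsync();
    }

    // Filtered read: only rows in this grain's partition are fetched,
    // rather than loading the whole data set into memory.
    public async Task<List<OrderEntity>> GetOrdersAsync()
    {
        var query = new TableQuery<OrderEntity>().Where(
            TableQuery.GenerateFilterCondition(
                "PartitionKey", QueryComparisons.Equal, this.GetPrimaryKeyString()));

        var results = new List<OrderEntity>();
        TableContinuationToken token = null;
        do
        {
            var segment = await _table.ExecuteQuerySegmentedAsync(query, token);
            results.AddRange(segment.Results);
            token = segment.ContinuationToken;
        } while (token != null);
        return results;
    }
}
```

Using the grain's primary key as the partition key, as sketched here, is one way to align the Orleans partitioning with the storage partitioning so each activation only ever touches its own slice of data.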
@somnathnitb As @jason-bragg mentions, this splits roughly two ways: grains can save data in application-specific ways (very doable, just use your favourite ORM), or you can use one of the Orleans persistence providers, which save, well, what they save. :)

One option for large blobs could be relational storage too, but to my chagrin it is still in the making; see #1682. I thought I could make it this month, but it will be a close call due to some unforeseen meetings with investors in some personal projects (happy incidents, so to say). The idea is to use streaming on larger blobs too, but currently there is code to do that only one way. As you may know, for relational storage the space limit is practically whatever the file system allows. Manipulating data on the storage side is possible too, if it makes sense; you can see the queries and ideas there. I don't see the code itself being much work (sans sharding); arranging proper testing is maybe more.
Many thanks. Just to clarify: we are exposing an OData interface and we want to do the filtering at the DB/storage level rather than in memory. One possible solution is to move to SQL and use EF so that we can bind it directly to OData. For this specific functionality we would access data directly from the DB using a SQL EF-OData interface, without using grains. Thoughts?
@somnathnitb Filtering at the DB level is almost always the correct thing to do. It looks to me like adding application-specific code for the EF-OData link is the path of least resistance, and you get the usual benefits. In your place I would also consider using schema-bound views exposed to OData, for a few reasons:

You might get somewhat awkward-looking outer joins doing this, so it might not always feel like, or be, the most appropriate thing.

Persistence storage is more like a "blob". It has some overhead on small amounts of data (not much) and is geared towards saving and retrieving the state as such, say a list of integers. The plan is currently to save them as

One thing going on is that I have a plan to introduce sharding to the relational persistence storage provider; it should work even beyond SQL Server and, I think (no code yet), even across different storage engines.
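For the schema-bound view idea mentioned above, a minimal SQL Server sketch might look like this. The table, view, and column names are hypothetical, not from the thread; the pattern is simply a `WITH SCHEMABINDING` view that OData could be pointed at instead of the base tables.

```sql
-- Sketch only: a schema-bound view that could be exposed to OData.
-- dbo.Orders, CustomerId and Amount are hypothetical names.
CREATE VIEW dbo.OrderSummary
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       COUNT_BIG(*)  AS OrderCount,
       SUM(o.Amount) AS TotalAmount
FROM dbo.Orders AS o
GROUP BY o.CustomerId;
GO

-- Optionally, a unique clustered index turns it into an indexed view,
-- so filtered reads can be served from materialized data.
CREATE UNIQUE CLUSTERED INDEX IX_OrderSummary
ON dbo.OrderSummary (CustomerId);
```

Schema binding also prevents the underlying tables from being changed in ways that would silently break the exposed shape, which is one of the benefits alluded to above.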
@somnathnitb - For your needs, as @veikkoeeva also seemed to advocate, talking directly to storage (via EF in this case) rather than going through grain state storage via a storage provider is probably your best course.

"For the specific functionality we will access data directly from DB using SQL EF-Odata interface without using Grains."

In the above comment, I'm not sure whether by 'using Grains' you meant 'without using grain state storage' or, more literally, 'not using grains'. I agree that the supported grain state storage via storage providers is not sufficient for your needs; however, depending on your query model, grains may still be quite valuable for scaling purposes. Using stateful grains (regardless of how or where the state came from) adds much to the maintainability and scalability of a service. My suggestion is that you use grains just as you would if grain state persistence were sufficient, but instead of expecting the state to be loaded with the grain, explicitly load the grain state via EF in the OnActivateAsync call, or via some sort of Load(..) call on the grain.
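The suggestion above (keep the grain, but load its state yourself on activation) might be sketched as follows. This assumes EF6 and Orleans 1.x; the `ShopContext`, `Customer` entity, and grain names are hypothetical illustrations, not APIs from the thread.

```csharp
// Sketch only: loading grain state explicitly via Entity Framework in
// OnActivateAsync instead of relying on an Orleans storage provider.
// ShopContext, Customer and ICustomerGrain are hypothetical names.
using System.Data.Entity;
using System.Threading.Tasks;
using Orleans;

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

public interface ICustomerGrain : IGrainWithStringKey
{
    Task<string> GetNameAsync();
}

public class CustomerGrain : Grain, ICustomerGrain
{
    private Customer _state;

    public override async Task OnActivateAsync()
    {
        // Load the state ourselves; the grain stays stateful in memory
        // even though no Orleans storage provider is involved.
        using (var db = new ShopContext())
        {
            _state = await db.Customers.FindAsync(this.GetPrimaryKeyString());
        }
        await base.OnActivateAsync();
    }

    public Task<string> GetNameAsync()
    {
        return Task.FromResult(_state?.Name);
    }
}
```

The same shape works with an explicit `Load(..)` grain method instead of `OnActivateAsync` if you want lazy loading rather than load-on-activation.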
Experts,
We have a requirement to store a large amount of data in grain state, i.e. millions of records. Are there any recommendations on the design approaches we should take, with performance in mind?
Additionally, we might have to do a lot of filtering on the data, so there is also a requirement to filter over the entire data set.
Any suggestions on this will be really helpful.
Thanks,
Somnath