Add automatic swap-to-disk support #60

Merged: @jpsamaroo merged 9 commits into master from jps/storage on Jun 30, 2022

@jpsamaroo (Collaborator) commented Jan 2, 2022

This PR implements support for "storage devices", which represent memory, disks, and storage manager algorithms that MemPool can leverage to perform automatic data management. We implement a SimpleRecencyAllocator device which, when assigned to one or more DRefs (via a poolset kwarg), manages migrating refs from memory to disk (or really between any two devices) as memory fills up, and from disk back to memory as data is accessed via poolget.

The SimpleRecencyAllocator isn't intended to be a perfect, all-encompassing storage manager; instead, it's a simple example of how to do memory management in this new scheme. Interested users are encouraged to implement their own storage devices and managers to suit their workload.
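As a rough illustration of the recency-based scheme described above, here is a minimal Python sketch. It is not MemPool's implementation or API: the class name `RecencyAllocatorSketch`, its methods, and the use of plain dictionaries in place of real memory and disk devices are all assumptions made for the example. It shows data being pushed to a lower tier when the upper tier is full, and promoted back on read.

```python
from collections import OrderedDict


class RecencyAllocatorSketch:
    """Hypothetical two-tier store: `upper` stands in for memory, `lower` for disk."""

    def __init__(self, upper_limit, policy="LRU"):
        if policy not in ("LRU", "MRU"):
            raise ValueError("policy must be 'LRU' or 'MRU'")
        self.upper_limit = upper_limit  # capacity of the upper tier, in bytes
        self.policy = policy
        self.upper = OrderedDict()      # key -> bytes, ordered oldest to newest access
        self.lower = {}                 # key -> bytes; a real device would write files
        self.upper_used = 0

    def _evict_until_fits(self, needed):
        # Move entries to the lower tier until `needed` bytes fit in the upper tier.
        while self.upper and self.upper_used + needed > self.upper_limit:
            # LRU evicts the oldest entry, MRU the most recently used one.
            key, value = self.upper.popitem(last=(self.policy == "MRU"))
            self.upper_used -= len(value)
            self.lower[key] = value

    def write(self, key, value):
        self.delete(key)                # drop any stale copy first
        if len(value) > self.upper_limit:
            self.lower[key] = value     # too large for the upper tier
            return
        self._evict_until_fits(len(value))
        self.upper[key] = value
        self.upper_used += len(value)

    def read(self, key):
        if key in self.upper:
            self.upper.move_to_end(key)  # mark as most recently used
            return self.upper[key]
        value = self.lower.pop(key)      # raises KeyError for unknown keys
        self.write(key, value)           # promote back to the upper tier
        return value

    def delete(self, key):
        if key in self.upper:
            self.upper_used -= len(self.upper.pop(key))
        self.lower.pop(key, None)
```

With an 8-byte upper tier, writing three 4-byte values moves the oldest one to the lower tier, and reading it again promotes it while demoting the next-oldest entry.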

This is also intended as a replacement for some of the logic in JuliaParallel/Dagger.jl#289. This PR should be better able to prevent in-memory data and files on disk from lingering for longer than they're needed, and should give a storage manager fine-grained control over data migration decisions. It doesn't currently integrate metrics about data migration speeds and data compression ratios (which Dagger would want to know), but we should be able to easily expose those metrics later on.

Todo:

  • Add tests for SimpleRecencyAllocator
  • Add MRU support
  • Add docs for writing storage devices
  • Add tests for CPURAMDevice and SerializationFileDevice
  • Ensure LRU promotes data to upper device during read_from_device
  • Expose resource queries (available, capacity)
  • Test resource queries
  • Try to make all storage-related GC finalizers non-yielding
  • Clean-up refcounting debug statements
  • Add file name pattern callback to SerializationFileDevice
  • Test refcounting implementation
  • Update API names to use underscores
  • Test nested LRU
  • Stress-test concurrent access
  • Support retain alternative to destroyonevict and test it
  • Test multiple leaf devices
  • Add Windows disk query method
  • Use Base.diskstats if available
  • Determine how to support aliased storage resources (out of scope for now)
  • Ensure that errors propagate correctly and do not hang the system
  • Update JuliaParallel/Dagger.jl#289 (Add memory and storage awareness) for this PR
  • Add mechanism to ensure lazy tasks complete during atexit
  • Add stream filters to SerializationDiskDevice for compression/encryption/etc.

@jpsamaroo
Collaborator Author

This PR is basically good-to-go. I plan to test this with Dagger's DTable to confirm that it can automatically do swap-to-disk when the environment variable JULIA_MEMPOOL_EXPERIMENTAL_FANCY_ALLOCATOR=1 is set.

@jpsamaroo
Collaborator Author

I think this will need to change to avoid the implicit assumption that movetodevice! can move data between devices directly; instead, we should generally assume that moving data between devices implies:

  • A read from the old device into memory, if necessary
  • A delete from the old device
  • A write from memory to the new device
  • An optional delete from memory
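The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PR's code: `move_between_devices` is a hypothetical helper, and the devices and the in-memory staging area are modeled as plain dictionaries rather than MemPool storage devices.

```python
def move_between_devices(key, old_device, new_device, memory, keep_in_memory=False):
    """Hypothetical helper: migrate `key` by staging its data through `memory`.

    `old_device`, `new_device`, and `memory` are plain dicts standing in for
    real storage devices and RAM; none of these names come from MemPool.
    """
    # 1. Read from the old device into memory, if it is not already there.
    if key not in memory:
        memory[key] = old_device[key]
    # 2. Delete from the old device.
    old_device.pop(key, None)
    # 3. Write from memory to the new device.
    new_device[key] = memory[key]
    # 4. Optionally delete the staged copy from memory.
    if not keep_in_memory:
        del memory[key]
```

The sketch never copies directly from one device to the other, which is the assumption the comment proposes to drop.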

@jpsamaroo force-pushed the jps/storage branch 2 times, most recently from 5c434f3 to 2410b06 on June 30, 2022 03:50
Set Julia 1.7 lower-bound
Add more `show` methods
datastore: Split global datastore lock
counters: Add datastore counter lock
counters: Add send and recv datastore counters lock
storage: Implement RCU for storage access
storage: Thread `RefState` through methods
storage: Make some operations lazy
storage: Document design and internals
LRUAllocator: Rename to SimpleRecencyAllocator
SimpleRecencyAllocator: Add MRU support
SimpleRecencyAllocator: Implement batched migration
SimpleRecencyAllocator: Add locking for concurrent access
SimpleRecencyAllocator: Cache `RefState`
@jpsamaroo merged commit c9c1139 into master Jun 30, 2022
@jpsamaroo deleted the jps/storage branch June 30, 2022 21:07
@jpsamaroo mentioned this pull request Nov 3, 2022