Roadmap: cache and sweeping improvements #6539
Shared my opinion in the thread where this was discussed, but reposting here for visibility: I vote for option 2 (global sweepers) first, as this is likely achievable as a semver-minor change, and then work to also do option 1 (global cache) if folks feel it's viable down the road.
Any plan for async cache methods eventually, so we can offload the cache out of process memory to something like Redis? Or does the structured aspect of the data negate the ability to serialize it for something like Redis? This would be good for sharded data that has duplicates across processes (reactions, users, etc.). Thanks for the thorough explanation on everything. The memory issue is probably the single most limiting factor for larger bots. Looking forward to this.
We haven't discussed anything about that matter in the last few months, but fortunately, we can go two ways around that, @JMTK:
If you ask me, the former will most likely happen before the latter. Besides, users who want to use Redis also prefer having finer control over what they cache and how the library interacts with it.
Edit: Added reference to new issue.
Starting with Discord.js v13.0.0, we removed a lot of add-ons, such as `Channel#_typing` (#6114), `User#lastMessageId`, `User#lastMessageChannelId` (#6046), and so on. Those changes have helped reduce Discord.js' memory footprint greatly.

With Discord adding archivable thread channels, we saw a need to add a second sweeper, alongside the message sweeper. In an attempt to generalize sweeping caches, we added custom caching (#6013) and later sweeping caches (#6110). The latter (`SweptCollection`, now `LimitedCollection`) is the new alternative for sweeping, and its biggest improvement was allowing us to use them even in very nested collections (such as `client` -> `channel` -> `message` -> `reactions`). But as much as it helped, it has a bounce effect and actually increases memory usage when used.

A `LimitedCollection` differs from `Collection` by several properties:

- `maxSize`, which is usually a small integer.
- `keepOverLimit`, which is a function, often a reference.
- `sweepFilter`, which is also a function, often a reference.
- `interval`, which is either an interval (with a lot of tracking properties) or null.

If you were to use `LimitedCollection` on a very duplicated manager, such as `ReactionManager`, which also stays empty most of the time, it would use far more memory than it should, despite the collection's very purpose being to reduce it.

Additionally, if a `LimitedCollection` is used, a lot of extra memory is spent on the `FinalizationRegistry`, which hooks into the runtime's garbage collector to clear the timers in said collections so that they don't leak memory. However, this brings a serious issue: the cleanup is bound in every single instance of them; that's a full function (which has a large memory overhead) being created, as well as being stored in both a `Set` and finally the finalizer itself, alongside a message and a name:

discord.js/src/managers/CachedManager.js (lines 17 to 28 in fc51f61)
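The referenced excerpt isn't reproduced in this copy, but the mechanism it describes can be sketched roughly like this (a simplified illustration, not discord.js' actual implementation; all names here are made up):

```javascript
// Simplified illustration of the pattern described above (not discord.js'
// actual code). Every instance registers its own timer with a shared
// FinalizationRegistry so the timer is cleared once the cache is GC'd.
class SweptCache extends Map {
  static registry = new FinalizationRegistry((interval) => clearInterval(interval));

  constructor(sweepFilter, intervalMs) {
    super();
    this.sweepFilter = sweepFilter;
    // Hold the cache weakly so the timer callback can't keep it alive forever.
    const ref = new WeakRef(this);
    this.interval = setInterval(() => ref.deref()?.sweep(), intervalMs);
    // Per-instance cost: a fresh closure, a registry entry, and a live timer.
    SweptCache.registry.register(this, this.interval);
  }

  sweep() {
    for (const [key, value] of this) {
      if (this.sweepFilter(value)) this.delete(key);
    }
  }
}

const cache = new SweptCache((m) => Date.now() - m.createdAt > 60_000, 120_000);
cache.set('old', { createdAt: Date.now() - 120_000 }); // stale: swept below
cache.set('new', { createdAt: Date.now() });           // fresh: kept
cache.sweep();                 // one manual pass for the demo
clearInterval(cache.interval); // so this snippet can exit cleanly
```

Note the `WeakRef` indirection: if the interval callback captured the cache directly, the timer itself would keep the cache reachable and the registry's cleanup would never fire.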
As a side note, sweeping collections are cancelled a bit late: if you set the interval to 2 minutes, chances are that once a collection's manager is no longer accessible, it'll run at least twice, as the GC usually runs every 5 or 10 minutes, and of course we don't want to be sweeping unreachable memory.
With this, `LimitedCollection` became what it swore to destroy.

Following the discussion in the #Koyamies caching hell thread (874411732527886336, from #library-discussion), @ckohen and I discussed a few alternatives to this caching hell, in the hope of making v13's memory usage go down.

1. Global caches
Everything would be easily accessible in the first layer, and no nested iterations would happen; everything would be accessed as `client.caches.channels`, `client.caches.messages`, `client.caches.reactions`, and so on. This is the most memory-efficient approach, and it works especially well with the planned `Structure#data` (more information about this soon!), as we'd just access the structures without needing to go through their parents (such as accessing a `Message` without its parent `Channel`); its ECS-like approach of removing the relationships (and thus promoting statelessness) makes everything far more efficient.

However, it has its drawbacks, all of which relate to supporting scoped caches:
Scoped caches (such as `channel.messages.cache`) would need to be backed by either an `Array` or a `Set` of the structures' IDs, so when we do an operation, we need to check whether the key is stored in the backing cache before accessing it.

Deleting structures should also delete all of their dependencies, so if a channel is deleted, all of its messages, permission overwrites, etc., should be deleted too. We can achieve something similar to this with a `destroy()` method (similar to a destructor in other languages), being `CachedManager#destroy()`:

For simplicity.
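The inline code samples from the original post aren't preserved in this copy; a hypothetical sketch of both ideas (a scoped cache backed by a key `Set`, plus a cascading `destroy()`) could look like this — the names are illustrative, not actual discord.js API:

```javascript
// Hypothetical sketch (names are illustrative, not actual discord.js API):
// a scoped cache holds only a Set of IDs and reads values from the global
// cache, and destroy() cascades the scope's entries out of the global cache.
class ScopedCache {
  constructor(globalCache) {
    this.global = globalCache; // e.g. a flat global message cache
    this.keys = new Set();     // only the IDs that belong to this scope
  }

  get(id) {
    // Membership is checked against the backing key set first.
    return this.keys.has(id) ? this.global.get(id) : undefined;
  }

  set(id, value) {
    this.keys.add(id);
    this.global.set(id, value);
    return this;
  }

  // Destructor-like cleanup: when the parent structure is deleted, remove
  // everything this scope owns from the global cache as well.
  destroy() {
    for (const id of this.keys) this.global.delete(id);
    this.keys.clear();
  }
}

const globalMessages = new Map(); // stand-in for a global message cache
const chanA = new ScopedCache(globalMessages);
const chanB = new ScopedCache(globalMessages);
chanA.set('a1', { content: 'in A' });
chanB.set('b1', { content: 'in B' });
chanA.destroy(); // channel A deleted: its messages leave the global cache too
```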
(Key) cache duplication, since we'd store both global and scoped keys, translating to a slight increase in memory, especially for large caches.
With the `Structure#data` proposal, we could generate managers (and their caches) on the fly, rather than requiring them to always be instantiated. This can lower memory usage and be on par with other libraries, as discord.js would use nearly as much memory as required to store the data Discord sends us. However, this comes at the expense of CPU time, and we'd ideally want to avoid issues similar to the one described in #6248, so this needs further investigation.

One could argue that the performance cost of scoped caches is amortized, as a global sweeper wouldn't need to traverse many structures (see below), and several Discord.js internals benefit from this (no need to map arrays into collections when building the structures, yay!), especially since v14 aims to be less dependent on state. So unless the user requires scoped values very frequently, the performance benefits outweigh the long-term drawbacks.
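To make the "managers on the fly" idea concrete, here is a minimal sketch under assumed names (not actual discord.js code): a manager is built per access from the raw payload instead of being permanently stored on its parent.

```javascript
// Illustrative sketch only (not actual discord.js code): instead of every
// channel permanently holding a manager instance, build one on demand from
// the raw data, trading some CPU time for a smaller resident footprint.
class MessageManager {
  constructor(channel) {
    this.channel = channel;
  }

  get cache() {
    // Rebuilt from the flat payload on each access instead of being stored.
    return new Map(this.channel.data.messages.map((m) => [m.id, m]));
  }
}

class Channel {
  constructor(data) {
    this.data = data; // the raw payload Discord sent us is all we keep
  }

  get messages() {
    return new MessageManager(this); // generated on the fly, never retained
  }
}

const channel = new Channel({ messages: [{ id: 'm1' }, { id: 'm2' }] });
const built = channel.messages.cache; // computed from data; size is 2
```

The trade-off described above is visible here: the resident state is just `data`, but every `channel.messages` access pays the construction cost.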
2. Global sweepers
`messageSweepInterval` and `messageCacheLifetime` were very efficient: there was a single interval used for everything, it did not require `FinalizationRegistry`, it did not require clean-ups, there was no binding of functions N times (N being the number of `CachedManager` instances of one type), and so on. However, we cannot afford having that many properties in `ClientOptions`, so we could do `ClientOptions.sweepers` instead, similar to:

This interface is very similar to the current one, and could perhaps be re-used, although it does not need to use the manager names, as the sweepers are global and managers do not need to read them.
This system comes with two major problems: the per-collection limiting behavior (`maxSize`/`keepOverLimit`) no longer has an obvious home, and the global sweepers need to know how to traverse the nested caches.

The former problem can be solved with one of the two following approaches:
1. Internal helpers that read a collection and the cache properties, such as `CachingUtils.set(channel.messages, client.sweepers.messages)`.

   Pros:

   Cons:

2. Keep `LimitedCollection` with all of its properties (except `interval`); we'd probably also want to keep the manager names so it's easy to build from internal code.

   Pros:

   Cons: Requires `maximumSize`, `keepOverLimit`, and `sweepFilter` to be stored in each of those collections, which translates to only a small reduction of the memory usage overhead they have.
to be stored in each one of those collections, which translates to a small reduction of the memory usage overhead they have.We can alleviate the latter problem by making some helpers that builds code on how to traverse the data, e.g. for
reactions
it'd betraverse('channels', 'messages', 'reactions')
, which would basically iterate overclient.channels.cache
, and for each entry, iterate overchannel.messages.cache
, and for each entry, iterate overmessage.reactions.cache
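A hedged sketch of what such a `traverse` helper could look like (the helper and the mock shapes below are illustrative, not actual discord.js code):

```javascript
// Illustrative sketch of a traverse() helper (not actual discord.js code):
// it builds a sweep function that walks nested caches one path segment at a
// time, e.g. traverse('channels', 'messages', 'reactions').
function traverse(...path) {
  return function sweep(client, filter) {
    let swept = 0;
    const walk = (holder, segments) => {
      const [name, ...rest] = segments;
      const cache = holder[name].cache;
      if (rest.length === 0) {
        // Innermost level: apply the sweep filter and delete matches.
        for (const [key, value] of cache) {
          if (filter(value)) {
            cache.delete(key);
            swept++;
          }
        }
        return;
      }
      // Intermediate level: recurse into each cached entry.
      for (const entry of cache.values()) walk(entry, rest);
    };
    walk(client, path);
    return swept;
  };
}

// Minimal mock of the nested client -> channels -> messages -> reactions shape:
const message = (...reactions) => ({
  reactions: { cache: new Map(reactions.map((r, i) => [i, r])) },
});
const channel = (...messages) => ({
  messages: { cache: new Map(messages.map((m, i) => [i, m])) },
});
const client = {
  channels: { cache: new Map([[0, channel(message({ count: 0 }, { count: 5 }))]]) },
};

const sweepReactions = traverse('channels', 'messages', 'reactions');
const swept = sweepReactions(client, (reaction) => reaction.count === 0);
// swept === 1: the zero-count reaction was removed
```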
And because we need to write this code manually, we can hard-forbid specific managers from being swept, such as guilds and permission overwrites. It also provides additional flexibility, so we can handle channels significantly better in two ways:
`ChannelManager`, `GuildChannelManager`, and `ThreadManager`). We can do this because every channel is stored in the client channel manager, which already has a way to traverse guild channel managers and thread managers to remove the desired channel.

3. Mixed
One of the global sweepers' problems is that we need to somehow tell each sweeper how to traverse, and this is a manual process. With the global caches proposal, we'd be able to just map to the property in `client.caches`, much like we do with managers, so `client.sweepers.messages` reads `client.caches.messages`.

In a nutshell, the first approach supports the near-stateless design we desire in the near future, and the second approach reduces memory usage to the point where we could theoretically have the next version of Discord.js use less memory than v11 and v12. And of course, both ideas work very well with each other, so combining them will most likely be where we'd like to end up.
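A small sketch of the mixed idea under the same naming assumptions as above (`client.caches` and `client.sweepers` are hypothetical names, not a real API):

```javascript
// Hypothetical sketch of the mixed approach: each global sweeper maps 1:1
// onto a flat global cache, so no nested traversal code is needed at all.
// This is the body of one sweeper tick; the real thing would wrap it in a
// single setInterval per cache type.
function sweepOnce(client, name, filter) {
  const cache = client.caches[name]; // e.g. a messages sweeper reads client.caches.messages
  let swept = 0;
  for (const [id, value] of cache) {
    if (filter(value)) {
      cache.delete(id);
      swept++;
    }
  }
  return swept;
}

const client = { caches: { messages: new Map() } };
client.caches.messages.set('m1', { createdAt: Date.now() - 3_600_000 }); // an hour old
client.caches.messages.set('m2', { createdAt: Date.now() });             // fresh

const swept = sweepOnce(client, 'messages', (m) => Date.now() - m.createdAt > 1_800_000);
// swept === 1: only the stale message is removed
```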