Feature/garbage collect cache #4681

Closed
102 changes: 98 additions & 4 deletions packages/apollo-cache-inmemory/src/inMemoryCache.ts
@@ -5,11 +5,11 @@ import { DocumentNode } from 'graphql';

import { Cache, ApolloCache, Transaction } from 'apollo-cache';

import { addTypenameToDocument } from 'apollo-utilities';
import { addTypenameToDocument, isIdValue } from 'apollo-utilities';

import { wrap } from 'optimism';

import { invariant, InvariantError } from 'ts-invariant';
import { invariant } from 'ts-invariant';

import { HeuristicFragmentMatcher } from './fragmentMatcher';
import {
@@ -228,8 +228,50 @@ export class InMemoryCache extends ApolloCache<NormalizedCacheObject> {
};
}

public evict(query: Cache.EvictOptions): Cache.EvictionResult {
throw new InvariantError(`eviction is not implemented on InMemory Cache`);
public evict(evictee: Cache.EvictOptions): Cache.EvictionResult {
let keys = this.diff({
query: evictee.query,
variables: evictee.variables,
optimistic: false,
}).involvedFields;
const unique = new Set(keys);
//Don't delete the root
unique.delete('ROOT_QUERY');

//Delete each part
unique.forEach(k => {
this.data.delete(k);
});

this.cleanupCacheReferences();

return { success: true };
}

//Garbage Collect Cache entries that are not being watched
Contributor

@wtrocki wtrocki Jun 4, 2019

I'm wondering why this is tied to cache.watch.
Is this expected to remove items that are not currently watched?

Would it make sense to gc elements that are simply not used in any query, meaning they can no longer be fetched?
It is common to watch certain queries within a view and then, after navigating to different components, move on to another set of elements to watch.

Does this mean that, to run gc right now, we will need to
watch all queries that should stay in the cache?

Author

> Is this expected to remove items that are not currently watched?

Yes.

I'm not sure I fully understand the point you are making, so I am going to answer the rest of your questions in reverse order.

> Does this mean that, to run gc right now, we will need to
> watch all queries that should stay in the cache?

Yes. If desired, this method could also accept a list of queries and variables whose data should not be removed.

> Would it make sense to gc elements that are simply not used in any query, meaning they can no longer be fetched?

Nothing in the cache has "no ability to be fetched anymore": if you ran a one-off query and then ran the same query again, the second run would be served from the cache. So as a compromise, I assume anything the app cares about is being watched. I would be interested to hear use cases where data needs to remain in the cache and is accessed via non-watched queries.

> It is common to watch certain queries within a view and then, after navigating to different components, move on to another set of elements to watch.

That is the use case this targets. We have queries returning large amounts of data, but once they leave the view it is unlikely the user will access the same data again. If they do, it is acceptable to incur the cost of hitting the server again.

This is also why I think gc should have to be called explicitly: the app author can then decide the best time to perform the operation, for example after 2 or 20 minutes of user inactivity, inside a requestIdleCallback.
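As a rough illustration of the "call gc explicitly after inactivity" idea, here is a minimal sketch. It assumes only that the cache exposes the `gc()` method proposed in this PR; `GarbageCollectable`, `createInactivityGc`, `noteActivity`, and `collectIfInactive` are hypothetical names invented for this example, and the clock is injected so the sketch can run outside a browser.

```typescript
// Hypothetical shape: only the gc() method proposed in this PR is assumed.
interface GarbageCollectable {
  gc(): void;
}

// Hypothetical helper, not part of apollo-cache-inmemory.
function createInactivityGc(
  cache: GarbageCollectable,
  inactivityMs: number,
  now: () => number = () => Date.now(),
) {
  let lastActivity = now();

  return {
    // Call on user interaction to restart the inactivity window.
    noteActivity(): void {
      lastActivity = now();
    },

    // Intended to be called from an idle callback. Runs gc() only if the
    // user has been inactive for at least `inactivityMs`, and reports
    // whether a collection happened.
    collectIfInactive(): boolean {
      if (now() - lastActivity < inactivityMs) {
        return false;
      }
      cache.gc();
      lastActivity = now(); // avoid collecting again on the very next idle tick
      return true;
    },
  };
}
```

In a browser, one way to wire this up would be to call `noteActivity()` from input event handlers and periodically schedule `window.requestIdleCallback(() => idleGc.collectIfInactive())`; the threshold (2 minutes, 20 minutes, or something else) stays an application decision.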

Contributor

@wtrocki wtrocki Jun 4, 2019

My use case comes mainly from projects that offer offline support, where the whole cache is persisted with apollo-cache-persist.
All data that lands in the cache stays there even after a restart, so we need a way to remove items from the cache once they have been removed from a query, since at that point they are no longer needed. For example, after executing a deleteItem mutation we know the item should be gone and we can remove it from the query, but sadly the item itself still stays at the root of the cache. The workaround we have used in many projects is to write an empty object containing just the ID.

When querying the server we sometimes need to handle objects that have been removed, and there is currently no way to do that in apollo-client. While this PR is amazing, it would be nice to extend it to handle a rootId on evict as well, so that we can remove that item when performing cache updates. I have intentionally skipped the code in this PR and just added the idea. More info here:

#4917 and apollo-feature-requests/issues/4
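To make the "evict by root ID" idea concrete, here is a minimal sketch on a plain-object model of a normalized cache. It does not use the real InMemoryCache API: `IdRef`, `NormalizedStore`, `isIdRef`, and `evictById` are hypothetical names, and the `{ type: "id", id }` reference shape is a simplified assumption. Unlike the PR's `cleanupCacheReferences`, which deletes a whole field when it contains a dangling reference, this sketch filters the dangling reference out of lists; that is a design choice of the example, not of the PR.

```typescript
// Simplified, hypothetical model of a normalized cache.
type IdRef = { type: "id"; id: string };
type FieldValue = string | number | boolean | null | IdRef | FieldValue[];
type StoreObject = { [field: string]: FieldValue };
type NormalizedStore = { [dataId: string]: StoreObject };

function isIdRef(value: FieldValue): value is IdRef {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    value.type === "id"
  );
}

// Returns the value with every reference to `id` removed, or undefined
// if the value itself was a reference to `id`.
function stripRef(value: FieldValue, id: string): FieldValue | undefined {
  if (Array.isArray(value)) {
    return value
      .map(item => stripRef(item, id))
      .filter((item): item is FieldValue => item !== undefined);
  }
  return isIdRef(value) && value.id === id ? undefined : value;
}

// Returns a new store without the entity `id` and without references to it.
function evictById(store: NormalizedStore, id: string): NormalizedStore {
  const next: NormalizedStore = {};
  for (const dataId of Object.keys(store)) {
    if (dataId === id) continue; // drop the evicted entity itself
    const cleaned: StoreObject = {};
    for (const field of Object.keys(store[dataId])) {
      const kept = stripRef(store[dataId][field], id);
      if (kept !== undefined) cleaned[field] = kept;
    }
    next[dataId] = cleaned;
  }
  return next;
}
```

After a deleteItem mutation, an update function could call something like `evictById(store, "Item:1")` so that neither the entity nor the list entry pointing at it survives in the persisted cache.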

Author

@thomassuckow thomassuckow Jun 4, 2019

You are totally right, I spaced on the case where the query is refreshed and may leave dangling things in the cache.

I believe a modified version of cleanupCacheReferences() in my proof of concept may cover your case. Where cleanupCacheReferences looks for missing keys and nukes the parent item, you would instead want to traverse the cache and find things that are not referenced at all.

Edit: Your use case may be more appropriate for the name "gc" than mine.
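The "traverse the cache and find things that are not referenced at all" variant can be sketched as a mark-and-sweep pass. This is an illustration on a plain-object model, not the PR's implementation: the types, `isIdRef`, `hasEntry`, and `collectUnreachable` are hypothetical names, the `{ type: "id", id }` reference shape is a simplified assumption, and treating `ROOT_QUERY` as the only root is also an assumption.

```typescript
// Simplified, hypothetical model of a normalized cache.
type IdRef = { type: "id"; id: string };
type FieldValue = string | number | boolean | null | IdRef | FieldValue[];
type StoreObject = { [field: string]: FieldValue };
type NormalizedStore = { [dataId: string]: StoreObject };

function isIdRef(value: FieldValue): value is IdRef {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    value.type === "id"
  );
}

function hasEntry(store: NormalizedStore, dataId: string): boolean {
  return Object.prototype.hasOwnProperty.call(store, dataId);
}

function collectUnreachable(
  store: NormalizedStore,
  rootIds: string[] = ["ROOT_QUERY"],
): { store: NormalizedStore; removed: string[] } {
  const reachable = new Set<string>();
  const pending: string[] = rootIds.filter(id => hasEntry(store, id));

  // Mark: follow references outward from the roots.
  while (pending.length > 0) {
    const dataId = pending.pop() as string;
    if (reachable.has(dataId)) continue; // already visited, also guards cycles
    reachable.add(dataId);

    const values: FieldValue[] = Object.keys(store[dataId]).map(
      field => store[dataId][field],
    );
    while (values.length > 0) {
      const value = values.pop() as FieldValue;
      if (Array.isArray(value)) {
        values.push(...value); // flatten nested lists
      } else if (isIdRef(value) && hasEntry(store, value.id)) {
        pending.push(value.id);
      }
    }
  }

  // Sweep: keep only the entries that were marked.
  const kept: NormalizedStore = {};
  const removed: string[] = [];
  for (const dataId of Object.keys(store)) {
    if (reachable.has(dataId)) kept[dataId] = store[dataId];
    else removed.push(dataId);
  }
  return { store: kept, removed };
}
```

Because reachability is computed from the roots, entries that only reference each other (for example a deleted item and its author pointing back at it) are collected too, which simple reference counting would miss.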

public gc() {
let keys: string[] = [];
this.watches.forEach(c => {
const d = this.diff({
query: c.query,
variables: c.variables,
previousResult: c.previousResult && c.previousResult(),
optimistic: c.optimistic,
});
keys.push(...d.involvedFields);
});
const unique = new Set(keys);

const cacheObject = this.data.toObject();
const cacheKeys = Object.keys(cacheObject);

for (let k of cacheKeys) {
if (!unique.has(k)) {
this.data.delete(k);
}
}

this.cleanupCacheReferences();
}

public reset(): Promise<void> {
@@ -331,6 +373,58 @@ export class InMemoryCache extends ApolloCache<NormalizedCacheObject> {
}
}

protected cleanupCacheReferences() {
const cacheObject = this.data.toObject();
const cacheKeys = Object.keys(cacheObject);

//See if the array is invalid
//Note: Types don't permit nested arrays
function checkArray(ary: any[]): boolean {
for (let v of ary) {
if (Array.isArray(v)) {
if (checkArray(v)) {
return true;
}
} else if (isIdValue(v)) {
if (!cacheKeys.includes(v.id)) {
return true;
}
} else {
//Nothing to do
}
}
return false;
}

cacheKeys.forEach(k => {
const item = this.data.get(k);
let newItem = null;

for (let field in item) {
const value = item[field];

if (Array.isArray(value)) {
const invalid = checkArray(value);
if (invalid) {
if (!newItem) newItem = { ...item };
delete newItem[field];
}
} else if (isIdValue(value)) {
if (!cacheKeys.includes(value.id)) {
if (!newItem) newItem = { ...item };
delete newItem[field];
}
} else {
//Nothing to do
}
}

if (newItem) {
this.data.set(k, newItem);
}
});
}

// This method is wrapped in the constructor so that it will be called only
// if the data that would be broadcast has changed.
private maybeBroadcastWatch(c: Cache.WatchOptions) {
19 changes: 17 additions & 2 deletions packages/apollo-cache-inmemory/src/readFromStore.ts
@@ -77,6 +77,9 @@ export type ExecResult<R = any> = {
result: R;
// Empty array if no missing fields encountered while computing result.
missing?: ExecResultMissingField[];

// Fields involved in constructing the result for tracking of cache utilization
involvedFields: string[];
};

type ExecStoreQueryOptions = {
@@ -254,6 +257,7 @@ export class StoreReader {

return {
result: execResult.result,
involvedFields: execResult.involvedFields,
complete: !hasMissingFields,
};
}
@@ -308,7 +312,7 @@ export class StoreReader {
execContext,
}: ExecSelectionSetOptions): ExecResult {
const { fragmentMap, contextValue, variableValues: variables } = execContext;
const finalResult: ExecResult = { result: null };
const finalResult: ExecResult = { result: null, involvedFields: [rootValue.id] };

const objectsToMerge: { [key: string]: any }[] = [];

@@ -324,6 +328,8 @@ export class StoreReader {
finalResult.missing = finalResult.missing || [];
finalResult.missing.push(...result.missing);
}

finalResult.involvedFields.push(...result.involvedFields);
return result.result;
}

@@ -459,15 +465,19 @@ export class StoreReader {
...execResults: ExecResult<T>[]
): ExecResult<T> {
let missing: ExecResultMissingField[] = null;
let involvedFields: string[] = [];
execResults.forEach(execResult => {
if (execResult.missing) {
missing = missing || [];
missing.push(...execResult.missing);
}

involvedFields.push(...execResult.involvedFields);
});
return {
result: execResults.pop().result,
missing,
involvedFields,
};
}

@@ -477,13 +487,16 @@ export class StoreReader {
execContext: ExecContext,
): ExecResult {
let missing: ExecResultMissingField[] = null;
let involvedFields: string[] = [];

function handleMissing<T>(childResult: ExecResult<T>): T {
if (childResult.missing) {
missing = missing || [];
missing.push(...childResult.missing);
}

involvedFields.push(...childResult.involvedFields);

return childResult.result;
}

@@ -516,7 +529,7 @@ export class StoreReader {
Object.freeze(result);
}

return { result, missing };
return { result, missing, involvedFields };
Member

@benjamn benjamn Apr 9, 2019

Really glad you figured out this information should be returned from the execute* functions, since that allows us to cache it along with the result and missing fields.

}
}

@@ -598,6 +611,7 @@ function readStoreResolver(
fieldName: storeKeyName,
tolerable: false,
}],
involvedFields: [],
};
}

@@ -607,5 +621,6 @@ function readStoreResolver(

return {
result: fieldValue,
involvedFields: [],
};
}
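
The readFromStore.ts changes above thread `involvedFields` through every exec result: each object read starts its list with its own ID, and each parent appends the lists of its children. A minimal sketch of that accumulation on a plain-object model follows. It is not the StoreReader code: `readEntity` and the types are hypothetical, the `{ type: "id", id }` reference shape is a simplified assumption, every field is read instead of following a selection set, and the data is assumed to be acyclic.

```typescript
// Simplified, hypothetical model of a normalized cache.
type IdRef = { type: "id"; id: string };
type FieldValue = string | number | boolean | null | IdRef | FieldValue[];
type StoreObject = { [field: string]: FieldValue };
type NormalizedStore = { [dataId: string]: StoreObject };
type ReadResult = { result: unknown; involvedFields: string[] };

function isIdRef(value: FieldValue): value is IdRef {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    value.type === "id"
  );
}

// Assumes acyclic data; a cycle would recurse forever in this sketch.
function readEntity(store: NormalizedStore, dataId: string): ReadResult {
  // Mirrors `involvedFields: [rootValue.id]` in the diff above.
  const involvedFields: string[] = [dataId];

  const resolve = (value: FieldValue): unknown => {
    if (Array.isArray(value)) return value.map(item => resolve(item));
    if (isIdRef(value)) {
      const child = readEntity(store, value.id);
      // Parent results absorb the IDs their children touched.
      involvedFields.push(...child.involvedFields);
      return child.result;
    }
    return value;
  };

  const source: StoreObject = store[dataId] || {};
  const result: { [field: string]: unknown } = {};
  for (const field of Object.keys(source)) {
    result[field] = resolve(source[field]);
  }
  return { result, involvedFields };
}
```

The accumulated list can contain duplicates when two objects reference the same entity, which is why `evict` and `gc` in this PR wrap it in a `Set` before deleting anything.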
1 change: 1 addition & 0 deletions packages/apollo-cache/src/types/DataProxy.ts
@@ -72,6 +72,7 @@ export namespace DataProxy {

export type DiffResult<T> = {
result?: T;
involvedFields?: string[];
complete?: boolean;
};
}