Use sparse Maps rather than dense sources arrays in mergeType code. #3069

Merged

sachindshinde

From @benjamn :

In the mergeType code responsible for merging subgraph schemas into a supergraph schema, I noticed we were often looping over a sources array with as many elements as there are subgraphs, inside a loop over all the fields currently being merged. The resulting running time is proportional to the product of the number of subgraphs and the total number of fields being merged.

This PR demonstrates that we can instead represent just the contributions from the subgraphs that are directly relevant to the current type or field being merged, so the sources data structure no longer has to be as large as the number of subgraphs. Swapping in type Sources<T> = Map<number, T | undefined> for all the array parameters can be done without altering behavior or performance (see the 2nd commit), and that foundation enables the sparse representation optimization (preserving behavior while improving performance; see the addFieldsShallow changes in the 3rd commit).
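
As a rough illustration of the dense versus sparse representations described above, here is a minimal sketch. The `Sources<T>` alias is the one quoted in the description; `sourcesFromArray` is named in the diff, but its body below is an assumption, and `sparseSourcesFromArray` is a hypothetical helper that does not come from the PR.

```typescript
// The alias quoted in the PR description: keys are subgraph indices.
type Sources<T> = Map<number, T | undefined>;

// Assumed body: a dense conversion that keeps one entry per subgraph,
// including the undefined ones, so it behaves like the old array.
function sourcesFromArray<T>(array: (T | undefined)[]): Sources<T> {
  const sources: Sources<T> = new Map();
  array.forEach((value, index) => sources.set(index, value));
  return sources;
}

// Hypothetical sparse variant: only contributing subgraphs get an entry.
function sparseSourcesFromArray<T>(array: (T | undefined)[]): Sources<T> {
  const sources: Sources<T> = new Map();
  array.forEach((value, index) => {
    if (value !== undefined) sources.set(index, value);
  });
  return sources;
}

// Five subgraphs, of which only two (indices 1 and 4) define the type.
const perSubgraph = [undefined, "type A", undefined, undefined, "type A"];
const dense = sourcesFromArray(perSubgraph);
const sparse = sparseSourcesFromArray(perSubgraph);

// Looping over `dense` visits all five subgraphs; looping over `sparse`
// visits only the two contributors.
console.log(dense.size, sparse.size);
```

With many subgraphs and few contributors per type, a loop over the sparse map does work proportional to the number of contributors rather than the number of subgraphs.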

The goal of this commit is to switch from using a dense array for the
various 'sources' passed around in mergeType code, to using a Map
instead. These Map structures have the potential to omit many elements
corresponding to irrelevant subgraphs, becoming sparser than an array,
but this commit preserves the dense array-like behavior, for now.

Since we already loop over all input sources to add shallow versions of
fields to the dest object in addFieldsShallow, we can return a
representation of what was added to be used later in mergeObject,
mergeInterface, and mergeInput, yielding a speedup.

The source of the speedup is the use of sparse Sources maps, so not all
subgraphs need to have an entry in Sources, though some subgraphs still
do have (intentionally) undefined entries, indicating the subgraph does
not contribute a particular field, but might matter for validation of
the field.
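
The idea of returning "what was added" can be sketched roughly as follows. This is not the PR's implementation: `FieldDef`, `ObjectTypeDef`, and the simplified `addFieldsShallow` signature are hypothetical stand-ins, and the sketch leaves out the intentionally undefined entries mentioned above.

```typescript
type Sources<T> = Map<number, T | undefined>;

// Hypothetical, heavily simplified stand-ins for schema elements.
interface FieldDef { fieldName: string; type: string }
interface ObjectTypeDef { typeName: string; fields: FieldDef[] }

// Sketch: while copying field names into `dest`, record which subgraphs
// contribute each field, so later merge steps need not rescan every subgraph.
function addFieldsShallow(
  sources: Sources<ObjectTypeDef>,
  dest: Set<string>,
): Map<string, Sources<FieldDef>> {
  const added = new Map<string, Sources<FieldDef>>();
  sources.forEach((source, subgraphIndex) => {
    if (!source) return; // this subgraph does not define the type
    for (const field of source.fields) {
      dest.add(field.fieldName);
      let fieldSources = added.get(field.fieldName);
      if (!fieldSources) {
        fieldSources = new Map();
        added.set(field.fieldName, fieldSources);
      }
      fieldSources.set(subgraphIndex, field);
    }
  });
  return added;
}

// Three subgraphs; the one at index 1 does not define Product.
const productSources: Sources<ObjectTypeDef> = new Map();
productSources.set(0, {
  typeName: "Product",
  fields: [{ fieldName: "id", type: "ID!" }, { fieldName: "price", type: "Int" }],
});
productSources.set(1, undefined);
productSources.set(2, { typeName: "Product", fields: [{ fieldName: "id", type: "ID!" }] });

const dest = new Set<string>();
const added = addFieldsShallow(productSources, dest);
// `added` maps each field name to a sparse Sources map of its contributors.
```

Here the per-field map for `price` holds a single entry, so merging that field only needs to look at one subgraph.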

I'm aware JavaScript also supports "sparse arrays" which are Array
objects with "holes" in them (missing elements, not just undefined), but
not all operations (such as sparseArray.entries()) skip the holes, so it
seemed better/safer to use an explicitly sparse data structure like Map.
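
A small sketch of the pitfall mentioned above, using only standard Array and Map methods (the variable names are made up for illustration):

```typescript
// An Array with holes at indices 0 and 2 (never assigned, not just undefined).
const sparseArray: (string | undefined)[] = [];
sparseArray[1] = "b";
sparseArray[3] = "d";

// forEach skips holes...
const visitedByForEach: number[] = [];
sparseArray.forEach((_value, index) => visitedByForEach.push(index));

// ...but entries() yields every index, reporting holes as undefined.
const visitedByEntries = Array.from(sparseArray.entries()).map(([index]) => index);

// A Map only ever iterates over keys that were explicitly set.
const sparseMap = new Map<number, string>([[1, "b"], [3, "d"]]);
const visitedByMap = Array.from(sparseMap.keys());

console.log(visitedByForEach); // [1, 3]
console.log(visitedByEntries); // [0, 1, 2, 3]
console.log(visitedByMap);     // [1, 3]
```
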
@sachindshinde requested a review from a team as a code owner July 8, 2024 23:37

changeset-bot bot commented Jul 8, 2024

🦋 Changeset detected

Latest commit: f5f6a79

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@apollo/composition Patch
@apollo/gateway Patch
@apollo/federation-internals Patch
@apollo/query-planner Patch
@apollo/query-graphs Patch
@apollo/subgraph Patch
apollo-federation-integration-testsuite Patch

codesandbox-ci bot commented Jul 8, 2024

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

@sachindshinde force-pushed the benjamn/sparse-map-mergeType-sources branch from 69f2b89 to eca8ce6 July 8, 2024 23:41
@sachindshinde force-pushed the benjamn/sparse-map-mergeType-sources branch from cfd7f6c to f5f6a79 July 8, 2024 23:48
@sachindshinde merged commit 3dff8a3 into version-2.8.3-beta Jul 8, 2024
13 checks passed
@sachindshinde deleted the benjamn/sparse-map-mergeType-sources branch July 8, 2024 23:54
sachindshinde pushed a commit that referenced this pull request Jul 9, 2024
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to
version-2.8.3-beta, this PR will be updated.

⚠️⚠️⚠️⚠️⚠️⚠️

`version-2.8.3-beta` is currently in **pre mode** so this branch has
prereleases rather than normal releases. If you want to exit
prereleases, run `changeset pre exit` on `version-2.8.3-beta`.

⚠️⚠️⚠️⚠️⚠️⚠️

# Releases
## @apollo/[email protected]

### Patch Changes

- Error messages are now lazily evaluated for satisfiability
validations.
([#3068](#3068))

- Query graph caches now use maps instead of sparsely-populated arrays
for per-subgraph data.
([#3066](#3066))

- Add a fast path to skip override validation for fields without any
subgraph `@override`s.
([#3070](#3070))

- Type merging now uses maps instead of sparsely-populated arrays for
per-subgraph data.
([#3069](#3069))

- Stop duplicating hints for inconsistent value type fields per
subgraph.
([#3071](#3071))

- Use sets instead of arrays for tracking schema type/directive
referencers.
([#3067](#3067))

- Updated dependencies
\[[`38debcf2f9af1a719bd1c8acbd9335efa8427ddb`](38debcf),
[`860aace9904e787f9bf05aad94be5b5920f10543`](860aace),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- Updated dependencies
\[[`38debcf2f9af1a719bd1c8acbd9335efa8427ddb`](38debcf),
[`860aace9904e787f9bf05aad94be5b5920f10543`](860aace),
[`67b70c6e68b1cdbf8f03dacafd636e27ed9b7814`](67b70c6),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`f5f6a799d6b3675eecb0eaec7a816d746cd136b2`](f5f6a79),
[`42bd27af6a23bcfdd36951dbfa3fb9f7ba833f3a`](42bd27a),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]
    -   @apollo/[email protected]
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- For very large graphs cloning types with lots of join directives can
be expensive. Since these directives will not be used in the Schema that
is cloned for toAPISchema(), add the ability to optionally omit them
([#3053](#3053))

- Use sets instead of arrays for tracking schema type/directive
referencers.
([#3067](#3067))

## @apollo/[email protected]

### Patch Changes

- Error messages are now lazily evaluated for satisfiability
validations.
([#3068](#3068))

- Query graph caches now use maps instead of sparsely-populated arrays
for per-subgraph data.
([#3066](#3066))

- Updated dependencies
\[[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- Query graph caches now use maps instead of sparsely-populated arrays
for per-subgraph data.
([#3066](#3066))

- Use sets instead of arrays for tracking schema type/directive
referencers.
([#3067](#3067))

- Updated dependencies
\[[`38debcf2f9af1a719bd1c8acbd9335efa8427ddb`](38debcf),
[`860aace9904e787f9bf05aad94be5b5920f10543`](860aace),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- Updated dependencies
\[[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]

## [email protected]

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Comment on lines +805 to +806
private subgraphsTypes<T extends NamedType>(supergraphType: T): Sources<T> {
return sourcesFromArray(this.subgraphs.values().map(subgraph => {

For future work: we could probably prune the Sources<T> maps even more by not including an undefined entry for every subgraph without a match for T here in the subgraphsTypes method.

However, naively omitting all undefined entries causes problems because some subgraphs are relevant (for validation, or because of @interfaceObject) even if they don't have a declaration matching T. Future investigation should identify these cases and include them in the resulting Sources<T> map.

Additionally, it might be beneficial to memoize the result of subgraphsTypes (if it turns out to be called redundantly)?
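
One possible shape for the memoization idea, purely as a sketch: `MergerSketch`, `NamedTypeStub`, and `SubgraphStub` are hypothetical names, and the real `subgraphsTypes` does more than a name lookup. It also assumes the subgraphs do not change after the first call and that callers never mutate the returned map.

```typescript
type Sources<T> = Map<number, T | undefined>;

// Hypothetical minimal shapes; the real classes are far richer.
interface NamedTypeStub { typeName: string }
interface SubgraphStub { types: Map<string, NamedTypeStub> }

class MergerSketch {
  private readonly subgraphs: SubgraphStub[];
  // Keyed by the supergraph type object, so cache entries can be
  // garbage-collected together with the type.
  private readonly cache = new WeakMap<NamedTypeStub, Sources<NamedTypeStub>>();
  computations = 0; // only here to make the caching observable

  constructor(subgraphs: SubgraphStub[]) {
    this.subgraphs = subgraphs;
  }

  subgraphsTypes(supergraphType: NamedTypeStub): Sources<NamedTypeStub> {
    const cached = this.cache.get(supergraphType);
    if (cached) return cached;
    this.computations++;
    const sources: Sources<NamedTypeStub> = new Map();
    this.subgraphs.forEach((subgraph, index) => {
      // Dense on purpose: undefined entries are kept for every subgraph.
      sources.set(index, subgraph.types.get(supergraphType.typeName));
    });
    this.cache.set(supergraphType, sources);
    return sources;
  }
}

const productType: NamedTypeStub = { typeName: "Product" };
const merger = new MergerSketch([
  { types: new Map([["Product", { typeName: "Product" }]]) },
  { types: new Map() },
]);
const firstCall = merger.subgraphsTypes(productType);
const secondCall = merger.subgraphsTypes(productType);
// The second call returns the cached map without recomputing it.
```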

@@ -143,7 +144,7 @@ export class MismatchReporter {
   // Not meant to be used directly: use `reportMismatchError` or `reportMismatchHint` instead.
   private reportMismatch<TMismatched extends { sourceAST?: ASTNode }>(
     supergraphElement:TMismatched | undefined,
-    subgraphElements: (TMismatched | undefined)[],
+    subgraphElements: Sources<TMismatched>,

Note that when includeMissingSources is true, the absence of pairs in the Map will result in different behavior for this block down below:

if (!subgraphElt) {
  if (includeMissingSources) {
    distributionMap.add('', this.names[i]);
  }
  continue;
}
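
To make the concern concrete, here is a minimal sketch of how the two representations can diverge. Both helpers are hypothetical and are not taken from the PR or its follow-up fix; they only contrast looping over the Map's own entries with looking up every subgraph index.

```typescript
type Sources<T> = Map<number, T | undefined>;

// Looping over the Map's own entries only notices explicit undefined values.
function missingSeenByEntries<T>(subgraphNames: string[], subgraphElements: Sources<T>): string[] {
  const missing: string[] = [];
  subgraphElements.forEach((element, i) => {
    if (element === undefined) missing.push(subgraphNames[i]);
  });
  return missing;
}

// Looking up every subgraph index also catches pairs that are absent.
function missingSeenByLookup<T>(subgraphNames: string[], subgraphElements: Sources<T>): string[] {
  return subgraphNames.filter((_name, i) => subgraphElements.get(i) === undefined);
}

const subgraphNames = ["accounts", "products", "reviews"];
const elements: Sources<string> = new Map();
elements.set(0, "field in accounts");
elements.set(1, undefined); // explicit undefined entry
// No pair at all for index 2.

const byEntries = missingSeenByEntries(subgraphNames, elements);
const byLookup = missingSeenByLookup(subgraphNames, elements);
console.log(byEntries); // ["products"]
console.log(byLookup);  // ["products", "reviews"]
```

With the old dense array both approaches agree; once pairs can be absent, only the lookup-based version still reports every subgraph that lacks the element.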

sachindshinde added a commit that referenced this pull request Jul 11, 2024
This PR fixes the issue identified in [this review
comment](#3069 (comment)).
Specifically, this PR fixes the bug in composition hint/error message
generation where missing sources were computed incorrectly. (No
changeset needed since the referenced PR hasn't been released yet.)
sachindshinde pushed a commit that referenced this pull request Jul 12, 2024
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @apollo/[email protected]

### Patch Changes

- Error messages are now lazily evaluated for satisfiability
validations.
([#3068](#3068))

- Query graph caches now use maps instead of sparsely-populated arrays
for per-subgraph data.
([#3066](#3066))

- Add a fast path to skip override validation for fields without any
subgraph `@override`s.
([#3070](#3070))

- Type merging now uses maps instead of sparsely-populated arrays for
per-subgraph data.
([#3069](#3069))

- Stop duplicating hints for inconsistent value type fields per
subgraph.
([#3071](#3071))

- Fix logic to compute missing subgraphs when generating composition
hints/errors
([#3076](#3076))

- Use sets instead of arrays for tracking schema type/directive
referencers.
([#3067](#3067))

- Updated dependencies
\[[`38debcf2f9af1a719bd1c8acbd9335efa8427ddb`](38debcf),
[`50d648ccffb05591878de75dc5522914ed48698f`](50d648c),
[`860aace9904e787f9bf05aad94be5b5920f10543`](860aace),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- Updated dependencies
\[[`38debcf2f9af1a719bd1c8acbd9335efa8427ddb`](38debcf),
[`50d648ccffb05591878de75dc5522914ed48698f`](50d648c),
[`860aace9904e787f9bf05aad94be5b5920f10543`](860aace),
[`67b70c6e68b1cdbf8f03dacafd636e27ed9b7814`](67b70c6),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`f5f6a799d6b3675eecb0eaec7a816d746cd136b2`](f5f6a79),
[`42bd27af6a23bcfdd36951dbfa3fb9f7ba833f3a`](42bd27a),
[`f376447a820e3c0ae41d16d1fd3b681d2f1e8c14`](f376447),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]
    -   @apollo/[email protected]
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- dummy commit to force beta.2
([#3078](#3078))

- For very large graphs cloning types with lots of join directives can
be expensive. Since these directives will not be used in the Schema that
is cloned for toAPISchema(), add the ability to optionally omit them
([#3053](#3053))

- Use sets instead of arrays for tracking schema type/directive
referencers.
([#3067](#3067))

## @apollo/[email protected]

### Patch Changes

- Error messages are now lazily evaluated for satisfiability
validations.
([#3068](#3068))

- Query graph caches now use maps instead of sparsely-populated arrays
for per-subgraph data.
([#3066](#3066))

- Updated dependencies
\[[`50d648ccffb05591878de75dc5522914ed48698f`](50d648c),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- Query graph caches now use maps instead of sparsely-populated arrays
for per-subgraph data.
([#3066](#3066))

- Use sets instead of arrays for tracking schema type/directive
referencers.
([#3067](#3067))

- Updated dependencies
\[[`38debcf2f9af1a719bd1c8acbd9335efa8427ddb`](38debcf),
[`50d648ccffb05591878de75dc5522914ed48698f`](50d648c),
[`860aace9904e787f9bf05aad94be5b5920f10543`](860aace),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]
    -   @apollo/[email protected]

## @apollo/[email protected]

### Patch Changes

- Updated dependencies
\[[`50d648ccffb05591878de75dc5522914ed48698f`](50d648c),
[`f753d55e9a49d11389ee4f8d7976533447e95ede`](f753d55),
[`3af790517d662f3bec9064c0bf243014c579e9cd`](3af7905)]:
    -   @apollo/[email protected]

## [email protected]

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>