
chore: update all otel deps #2291

Merged

Conversation

pichlermarc
Member

Updates package-lock.json to pull in updated OTel deps.


codecov bot commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.42%. Comparing base (dfb2dff) to head (67e5564).
Report is 179 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2291      +/-   ##
==========================================
- Coverage   90.97%   90.42%   -0.56%     
==========================================
  Files         146      148       +2     
  Lines        7492     7322     -170     
  Branches     1502     1518      +16     
==========================================
- Hits         6816     6621     -195     
- Misses        676      701      +25     

see 62 files with indirect coverage changes

@trentm
Contributor

trentm commented Jun 25, 2024

Just noticed the failing test. Timing error. And I see you've also tried re-running the tests.

@pichlermarc
Member Author

Just noticed the failing test. Timing error. And I see you've also tried re-running the tests.

Yes, fastify instrumentation failed 3 times, hapi once. I'm looking into it.

@trentm
Contributor

trentm commented Jun 26, 2024

I couldn't repro the hapi one locally. A guess: are the GH Action runners just so slow that the runTestFixture-based tests (at least the hapi failure was in the "ESM" test, which uses runTestFixture) can occasionally hit the default 2s mocha timeout?
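
For reference, a minimal sketch of how a mocha suite could be given more headroom than the default 2000ms timeout, in case slow runners really are the cause. The suite and test names mirror the hapi test from the timing lists below; the fixture body is elided, so this is illustrative only:

describe('Hapi Instrumentation - Hapi.Plugin Tests', function () {
  // mocha's default timeout is 2000ms; give slow CI runners more headroom.
  // A regular `function` (not an arrow function) is needed so that `this`
  // is the mocha suite context.
  this.timeout(10000);

  it('should work with ESM modules', async function () {
    // ... run the ESM test fixture here ...
  });
});

The same effect can be had per run with mocha's --timeout flag.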

@pichlermarc
Member Author

I couldn't repro the hapi one locally. A guess: are the GH Action runners just so slow that the runTestFixture-based tests (at least the hapi failure was in the "ESM" test, which uses runTestFixture) can occasionally hit the default 2s mocha timeout?

Yes, that might be it. One odd thing, though: looking at the logs, the ESM tests now consistently take longer to complete. I wonder if some change is incurring a large performance hit.

From this run on main:

  • fastify
    • ✓ should work with ESM usage (1278ms)
  • Koa Instrumentation
    • ✓ should work with ESM usage (1183ms)
  • ExpressInstrumentation
    • ✓ should work with ESM usage (1215ms)
  • PinoInstrumentation ESM usage
    • ✓ should work with ESM default import (1109ms)
    • ✓ should work with ESM named import (1069ms)
  • Hapi Instrumentation - Hapi.Plugin Tests
    • ✓ should work with ESM modules (1677ms)

Latest run on this PR:

  • fastify
    • ✓ should work with ESM usage (1872ms) -> +594ms
  • Koa Instrumentation
    • ✓ should work with ESM usage (1392ms) -> +209ms
  • ExpressInstrumentation
    • ✓ should work with ESM usage (1843ms) -> +628ms
  • PinoInstrumentation ESM usage
    • hapi failed so it did not run
  • Hapi Instrumentation - Hapi.Plugin Tests
    • ✓ should work with ESM modules (>2000ms) -> +>323ms

@pichlermarc
Member Author

pichlermarc commented Jun 27, 2024

Huh, really interesting: fixed this with commit 67e5564. It looks like there were a bunch of duplicate packages (1.25.0 and 1.25.1), which slowed down startup so much that the tests timed out. (cc @trentm)

Test timing is now on par again with what it was before the update.

@pichlermarc pichlermarc merged commit 65bc979 into open-telemetry:main Jun 27, 2024
16 checks passed
@pichlermarc pichlermarc deleted the chore/update-otel-bugfix branch June 27, 2024 11:27
@trentm
Contributor

trentm commented Jun 27, 2024

It looks like there were a bunch of duplicate packages (1.25.0 and 1.25.1), which slowed down startup so much that the tests timed out.

The only ones I see with dupes:

@opentelemetry/[email protected]
@opentelemetry/[email protected]
@opentelemetry/[email protected]
@opentelemetry/[email protected]
@opentelemetry/[email protected]
@opentelemetry/[email protected]

Good job finding those.
Here is a command to count how many installs of each npm package are in the tree:

% npm ls --package-lock-only -ap 2>/dev/null | awk -F'node_modules/' '{print $NF}' | sort | uniq -c | sort -n | tail
  32 minipass
  33 color-convert
  33 color-name
  33 tslib
  33 yallist
  37 semver
  38 ansi-styles
  39 lru-cache
  39 supports-color
  53 minimatch

Love having 53 different installs of minimatch. :)
Just otel packages:

% npm ls --package-lock-only -ap 2>/dev/null | rg @opentelemetry | awk -F'node_modules/' '{print $NF}' | sort | uniq -c | sort -n | tail
   1 @opentelemetry/sdk-metrics
   1 @opentelemetry/sdk-node
   1 @opentelemetry/sdk-trace-base
   1 @opentelemetry/sdk-trace-node
   1 @opentelemetry/sdk-trace-web
   1 @opentelemetry/sql-common
   1 @opentelemetry/winston-transport
   2 @opentelemetry/propagator-b3
  21 @opentelemetry/core
  22 @opentelemetry/semantic-conventions

So many Qs :)

  1. I'm curious how you tracked this down, but don't take much time on it.
  2. Was this a failure of ./scripts/update-otel-deps.js? If so, I could try to dig in to avoid this in the future. Perhaps it could finish by running that command above (or equivalent) and error or warn if there are dupe installs of any otel packages; see the sketch after this list.
  3. Hundreds of ms to just load these modules. 😢
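
As a rough illustration of that idea, here is a hypothetical post-update check (it is not part of the existing ./scripts/update-otel-deps.js) that scans package-lock.json and warns when any @opentelemetry package resolves to more than one version:

// check-otel-dupes.ts (hypothetical): report @opentelemetry/* packages that
// are locked at more than one version. Assumes lockfileVersion >= 2, where
// every install is keyed by its node_modules path under "packages".
import { readFileSync } from 'node:fs';

interface LockPackage {
  version?: string;
}

const lock = JSON.parse(readFileSync('package-lock.json', 'utf8')) as {
  packages?: Record<string, LockPackage>;
};
const packages: Record<string, LockPackage> = lock.packages ?? {};

const versions = new Map<string, Set<string>>();

for (const [path, info] of Object.entries(packages)) {
  // Keys look like "node_modules/@opentelemetry/core" or, for nested
  // installs, "foo/node_modules/@opentelemetry/core".
  const idx = path.lastIndexOf('node_modules/');
  if (idx === -1 || !info.version) continue;
  const name = path.slice(idx + 'node_modules/'.length);
  if (!name.startsWith('@opentelemetry/')) continue;
  if (!versions.has(name)) versions.set(name, new Set());
  versions.get(name)!.add(info.version);
}

let dupes = 0;
for (const [name, vs] of versions) {
  if (vs.size > 1) {
    dupes += 1;
    console.warn(`WARN: ${name} is locked at multiple versions: ${[...vs].join(', ')}`);
  }
}
process.exitCode = dupes > 0 ? 1 : 0;

Checking distinct versions (rather than raw install counts) avoids false positives from legitimately nested same-version installs in the workspace tree.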

@pichlermarc
Member Author

So many Qs :)

  1. I'm curious how you tracked this down, but don't take much time on it.

At first I thought we had introduced some issue in import-in-the-middle that increased the loading time, but that was not what happened: just updating IITM did not increase the loading time. So I started updating packages one by one, as the script does, and tried to find which package increased the loading time (at that point I still assumed that some code change was causing the issue, even though we had very few changes to core in that release). The load time ended up increasing with a package that we did not change, so I checked the code in that package to verify that nothing fishy was going on.

Then I checked package-lock.json for versions that were left behind and saw that there were a bunch of them. Knowing that the only difference was the duplicated packages, I made the change from 67e5564 and saw that the test times dropped back to their old levels.

  2. Was this a failure of ./scripts/update-otel-deps.js? If so, I could try to dig in to avoid this in the future. Perhaps it could finish by running that command above (or equivalent) and error or warn if there are dupe installs of any otel packages.

Hmm, I think it was really the @opentelemetry/propagator-b3 update that held back the other packages. Maybe just changing that to a caret version would already do the trick.

  3. Hundreds of ms to just load these modules. 😢

Yes, I was actually somewhat shocked by this. I think it's worth investigating further. At some point it would be great to have benchmarks that measure the added load time of OTel in various scenarios (plain CJS, plain ESM, ESM with the customization hook, ...) so that we can work on reducing it.
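
As a very rough sketch of that benchmark idea, here is a minimal measurement for the plain-ESM case; @opentelemetry/sdk-node is used purely as an example package, and the CJS and customization-hook scenarios would need their own variants:

// measure-load-time.mts (hypothetical): time how much a single OTel import
// adds to process startup. Run under ESM so top-level await is available.
import { performance } from 'node:perf_hooks';

const start = performance.now();
await import('@opentelemetry/sdk-node'); // the package whose load time is being measured
const elapsed = performance.now() - start;

console.log(`import('@opentelemetry/sdk-node') took ${elapsed.toFixed(1)} ms`);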
