Package data misses information about the user who released versions of packages on npm #838

Open
lirantal opened this issue Jul 31, 2024 · 13 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@lirantal

Hi folks! 👋

I would expect the Packages API's versions data to also include information about the npm user who published each version.

For example, the npm registry API at https://registry.npmjs.org/safe-regex2 returns that information for each entry under the versions key (look for the _npmUser key below):

```json
{
  "_id": "safe-regex2",
  "_rev": "5-87f8c2e9312b92a9b82d5fe7f1fa9348",
  "name": "safe-regex2",
  "dist-tags": {
    "latest": "4.0.0"
  },
  "versions": {
    "2.0.0": {
      "name": "safe-regex2",
      "version": "2.0.0",
      "keywords": [
        "catastrophic",
        "exponential",
        "regex",
        "safe",
        "sandbox"
      ],
      "author": {
        "url": "http://substack.net",
        "name": "James Halliday",
        "email": "[email protected]"
      },
      "license": "MIT",
      "_id": "[email protected]",
      "maintainers": [
        {
          "name": "matteo.collina",
          "email": "[email protected]"
        }
      ],
      "contributors": [
        {
          "name": "Matteo Collina",
          "email": "[email protected]"
        }
      ],
      "homepage": "https://github.com/fastify/safe-regex",
      "bugs": {
        "url": "https://github.com/fastify/safe-regex/issues"
      },
      "dist": {
        "shasum": "b287524c397c7a2994470367e0185e1916b1f5b9",
        "tarball": "https://registry.npmjs.org/safe-regex2/-/safe-regex2-2.0.0.tgz",
        "fileCount": 7,
        "integrity": "sha512-PaUSFsUaNNuKwkBijoAPHAK6/eM6VirvyPWlZ7BAQy4D+hCvh4B6lIG+nPdhbFfIbP+gTGBcrdsOaUs0F+ZBOQ=="
      },
      "main": "index.js",
      "gitHead": "6af6b35b1609474d928a5e9a8af4f95ab6771628",
      "scripts": {
        "test": "standard && tape test/*.js"
      },
      "_npmUser": {
        "name": "matteo.collina",
        "email": "[email protected]"
      },
      "repository": {
        "url": "git://github.com/fastify/safe-regex.git",
        "type": "git"
      },
      "_npmVersion": "6.7.0",
      "description": "detect possibly catastrophic, exponential-time regular expressions",
      "directories": {},
      "_nodeVersion": "10.15.1",
      "dependencies": {
        "ret": "~0.2.0"
      },
```

However, I couldn't find this information anywhere in the existing Packages API, either in the versions data or in any other endpoint. I expected it to appear in this endpoint: https://packages.ecosyste.ms/api/v1/registries/npmjs.org/packages/safe-regex2
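A minimal sketch of reading the publisher of each version directly from the npm registry, assuming the document layout shown in the excerpt above (a top-level `versions` object whose entries may carry an `_npmUser` object with `name` and `email`). Some versions may lack `_npmUser`, so the helper returns `None` for those. The function names `fetch_packument` and `publishers_by_version` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def fetch_packument(name: str) -> dict:
    """Fetch the full package document from the public npm registry."""
    # Scoped names such as "@scope/pkg" have their slash percent-encoded.
    url = "https://registry.npmjs.org/" + name.replace("/", "%2F")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def publishers_by_version(packument: dict) -> dict[str, Optional[str]]:
    """Map each version string to the npm username that published it, if recorded."""
    result: dict[str, Optional[str]] = {}
    for version, doc in packument.get("versions", {}).items():
        npm_user = doc.get("_npmUser")
        # Only trust the field when it has the expected {"name": ..., "email": ...} shape.
        result[version] = npm_user.get("name") if isinstance(npm_user, dict) else None
    return result
```

Going by the excerpt above, `publishers_by_version(fetch_packument("safe-regex2"))["2.0.0"]` should be `"matteo.collina"`.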

@andrew added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Jul 31, 2024
@andrew
Member

andrew commented Jul 31, 2024

Agreed, this should be collected and stored on each version record in the metadata field.
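Purely as a hypothetical illustration of "stored on each version record in the metadata field" (the `VersionRecord` type and the `npm_user` key are made-up names, not the project's actual schema, which lives in its Ruby codebase): copy the publisher into the version's metadata without discarding anything already stored there.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VersionRecord:
    # Hypothetical stand-in for a stored version row.
    number: str
    metadata: dict[str, Any] = field(default_factory=dict)


def with_publisher(record: VersionRecord, registry_doc: dict) -> VersionRecord:
    """Return a copy of `record` whose metadata includes the npm publisher, if known."""
    metadata = dict(record.metadata)  # shallow copy so the input record is untouched
    npm_user = registry_doc.get("_npmUser")
    if isinstance(npm_user, dict) and npm_user.get("name"):
        # Illustrative key name; only the username is kept in this sketch.
        metadata["npm_user"] = npm_user["name"]
    return VersionRecord(number=record.number, metadata=metadata)
```

This sketch deliberately keeps only the username; whether to also store the email address is a separate decision.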

@lirantal
Author

lirantal commented Nov 7, 2024

@andrew, I don't mean this as a rude ask, and I really respect your time; I'd like to ask whether this and #839 are part of any roadmap action items to land in the ecosyste.ms API and database.

edit: I don't know Ruby too well, but if implementing these is relatively straightforward and you'd be willing to give me some pointers, I'm happy to attempt this myself.

@andrew
Member

andrew commented Nov 7, 2024

@lirantal not a problem, happy to receive nudges. There's way more work to do than time I have, so prioritizing things is necessary.

I've implemented code for both this and #839 in 7e48822 and new versions will start picking up that data automatically.

For existing versions I'm going to need to resync all versions of all packages on npm, which will take some time (50,726,808 versions for npm in the db at the moment).

@lirantal
Author

lirantal commented Nov 8, 2024

That's awesome, thank you!
How long does it usually take for the background processing to catch up on the backlog?

@andrew
Member

andrew commented Nov 8, 2024

Based on the current processing rate, I'd guess a full resync of all 50 million records will take a few weeks. I'm going to prioritize resyncing the most popular npm packages first, which shouldn't take too long, maybe a couple of days.
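A minimal sketch of that "most popular first" ordering, assuming each package comes with a popularity score such as a download count (the `downloads` field, the `resync_batches` name, and the batch size are assumptions, not details from this thread): sort by descending popularity and hand packages to the resync queue in fixed-size batches.

```python
from typing import Iterable, Iterator


def resync_batches(packages: Iterable[dict], batch_size: int = 1000) -> Iterator[list[str]]:
    """Yield package names in batches, most popular first."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Packages without a download count sort last; ties break alphabetically.
    ordered = sorted(packages, key=lambda p: (-(p.get("downloads") or 0), p["name"]))
    batch: list[str] = []
    for pkg in ordered:
        batch.append(pkg["name"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

For example, with batches of 2, packages with downloads b=50, d=50, a=5 and c (unknown) come out as `[["b", "d"], ["a", "c"]]`. Because the popular head of the distribution is small, the first few batches should cover most real-world installs.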

@lirantal
Author

lirantal commented Nov 8, 2024

Sounds like a good plan! Thank you, Andrew.

I am planning to rebuild my npq tool around the ecosyste.ms API, using it as a holistic data foundation instead of the many different registries and their endpoints. I'll keep you posted on how this progresses.

@voxpelli

voxpelli commented Nov 8, 2024

Exciting that you will also start using this more @lirantal 👏

Also: A reminder of the existence of this one: https://opencollective.com/ecosystems

@andrew
Member

andrew commented Nov 10, 2024

So far I've processed around 6% of all versions, primarily from the top 10% of packages.

@andrew
Member

andrew commented Nov 11, 2024

Up to 17% now

@lirantal
Author

Andrew, I noticed a lag of at least a few days for a random package I picked: tldts 6.1.59 was published on npm 2 days ago, while ecosyste.ms still shows tldts 6.1.58 from 8 days ago as the latest (https://packages.ecosyste.ms/registries/npmjs.org/packages/tldts/versions).

Similarly, this package is also behind: ecosyste.ms shows 0.68.0 from August, while 0.70.0 was released on GitHub 2 weeks ago.

Is this expected? I'm wondering what expectation I should set for users, given that they'd use it for ad-hoc package installs like npq install <packagename> and might be installing very new packages that were just published to the registry.

@andrew
Member

andrew commented Nov 11, 2024

A couple of weeks ago I had some issues with the background queue skipping jobs it shouldn't have, and I'm still working to catch up on all of the lagging projects. The goal is to discover new versions within an hour of publication for package managers that have a feed of recent releases, and within a few days to a week for package managers that don't have a good way of finding new versions (i.e. everything needs to be synced manually on a regular basis to check).

There is also a /ping endpoint you can hit to request an update be checked for any package, for example: https://packages.ecosyste.ms/api/v1/registries/npmjs.org/packages/tldts/ping
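A sketch of how a client such as npq might combine the lag check and the /ping endpoint, under stated assumptions: the `latest_release_number` field name in the ecosyste.ms package response, the percent-encoding of scoped names, and the use of a plain GET to trigger the ping are all assumptions, and the function names are hypothetical. The idea is to compare npm's `dist-tags.latest` with what ecosyste.ms reports and request a refresh when they disagree.

```python
import urllib.parse
import urllib.request

API_BASE = "https://packages.ecosyste.ms/api/v1/registries/npmjs.org/packages"


def ping_url(name: str) -> str:
    """Build the /ping URL for a package (scoped names are percent-encoded)."""
    return f"{API_BASE}/{urllib.parse.quote(name, safe='')}/ping"


def is_behind(npm_packument: dict, ecosystems_package: dict) -> bool:
    """True when ecosyste.ms's newest version differs from npm's `latest` dist-tag."""
    npm_latest = npm_packument.get("dist-tags", {}).get("latest")
    # ASSUMPTION: the ecosyste.ms package JSON names this field "latest_release_number".
    eco_latest = ecosystems_package.get("latest_release_number")
    # Any mismatch is treated as stale; no semver ordering is attempted here.
    return npm_latest is not None and npm_latest != eco_latest


def request_refresh(name: str) -> int:
    """Ask ecosyste.ms to re-check a package and return the HTTP status code."""
    # ASSUMPTION: a plain GET request is enough to trigger the ping.
    with urllib.request.urlopen(ping_url(name), timeout=30) as resp:
        return resp.status
```

For the tldts case above, `ping_url("tldts")` reproduces the URL given in the comment, and an npm `latest` of 6.1.59 against an ecosyste.ms value of 6.1.58 would be flagged as behind. A client could then fall back to the npm registry data for that one install while the refresh runs.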

@lirantal
Author

Thanks, that's useful to know we can ping it if necessary :-)

@andrew
Member

andrew commented Nov 12, 2024

Up to 25% now
