Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Externalize pypi map into a ro sqlite database #116

Merged
merged 12 commits into from
Jul 24, 2023
Merged

Conversation

airportyh
Copy link
Contributor

@airportyh airportyh commented Jul 24, 2023

Why?

Having to load in the large maps within pypi_map.gen.go file into memory at start is causing upm 0.2 - 0.3 seconds.
Context: https://github.com/replit/goval/pull/9988#issuecomment-1646262367
We can eliminate this slowdown by externalizing this data into a lightweight database.

Changes

  1. Replaced gen_code.go with gen_db.go which generates a read-only sqlite database containing the equivalent data
  2. created pypi_map.go - convenience wrapper for accessing the db's data
  3. updated all accesses to any map in pypi_map.gen.go to accessing the same data using pypi_map.go instead
  4. updated upm.nix to generate the db and setup an env var to point to its location for pypi_map.go to access it

Operations to be re-tested:

  • python search
  • python package guess

Performance enhancement should affect all languages and all operations.

Testing

  1. nix build
  2. mkdir play; cd play
  3. create main.py containing:
import yaml
import requests
import flask
  1. time ../result/bin/upm lock
  2. time ../result/bin/upm guess -f

If you compare to the version in head, should see a sizable decrease in run time. In my local testing I saw ~200x for lock (when no op) and ~60x for guess.

Also, re-test upm guess -f -l python and upm search -l python <query> to make sure those still work.

@airportyh airportyh marked this pull request as ready for review July 24, 2023 15:56
@airportyh airportyh requested a review from cdmistman July 24, 2023 15:57
Copy link
Collaborator

@ryantm ryantm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ tree result
result
├── bin
│   ├── gen_pypi_map
│   └── upm
└── pypi_map.sqlite

Do we need the bin/gen_pypi_map file? It is pretty big.

@airportyh
Copy link
Contributor Author

airportyh commented Jul 24, 2023

Do we need the bin/gen_pypi_map file? It is pretty big.

That's only used for dev. I can look into getting rid of it. Probably a config to the buildGoModule call.

internal/backends/python/gen_pypi_map/db_gen.go Outdated Show resolved Hide resolved
internal/backends/python/pypi_map.go Outdated Show resolved Hide resolved
@airportyh airportyh merged commit 8d6353b into main Jul 24, 2023
@airportyh airportyh deleted the th-externalize-pypi-map branch July 24, 2023 19:32
@blast-hardcheese blast-hardcheese added the enhancement New feature or request label Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants