
3.0.0 - Complete Overhaul

gabe-l-hart released this 26 Jul 19:28 (commit 6975e14)

Description

This is a major change! It fundamentally rewrites the core logic for tracking imports and rearranges the arguments to the import tracking functionality. The high-level gist is:

  • Rather than capturing imports while an import_module call is being processed, the imports are now computed after the target module has been imported, by recursively inspecting the bytecode of every module stemming from the target.
  • The tracking no longer needs to launch subprocesses to perform recursion, because it does not rely on the diff in sys.modules.
  • It's way faster!
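To make the bytecode idea concrete, here is a minimal sketch that uses the standard library dis module to list the module names referenced by IMPORT_NAME instructions in a compiled code object. It is not import_tracker's actual implementation: the function name imported_names is hypothetical, and the sketch neither resolves relative imports nor follows the discovered modules recursively the way the real tracker does.

```python
import dis
import types


def imported_names(code: types.CodeType) -> set[str]:
    """Collect module names referenced by IMPORT_NAME in a code object.

    Recurses into nested code objects (functions, classes, comprehensions)
    stored in co_consts, so imports inside function bodies are found too.
    Relative import levels are ignored in this sketch.
    """
    names: set[str] = set()
    for instr in dis.get_instructions(code):
        if instr.opname == "IMPORT_NAME":
            names.add(instr.argval)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= imported_names(const)
    return names


source = """
import os
from collections import OrderedDict

def helper():
    import json
    return json.dumps({})
"""

# Compiling is enough: nothing in `source` is executed or imported here.
code = compile(source, "<example>", "exec")
print(sorted(imported_names(code)))  # ['collections', 'json', 'os']
```

Because the analysis only reads bytecode, it can run after the target has been imported normally, without hooking into the import machinery.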

But why?

OK, the old way was working pretty well, so why refactor it all? The obvious answer is speed, but the less obvious answer is the real one: the old implementation was not answering the right question. It answered this question:

What modules are brought into sys.modules between starting the import of <target> and concluding the import of <target>?
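That question can be approximated in a few lines by snapshotting sys.modules around an import. This is a simplified illustration, not the original implementation; modules_added_by_import and the throwaway demo_pkg package are hypothetical names used only for this sketch.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def modules_added_by_import(name: str) -> set[str]:
    """Return the modules that land in sys.modules while importing `name`.

    Modules that were already imported are invisible to this diff, which is
    why a sys.modules-based tracker has to recurse in fresh subprocesses.
    """
    before = set(sys.modules)
    importlib.import_module(name)
    return set(sys.modules) - before


# Build a throwaway package on disk: demo_pkg/__init__.py imports demo_pkg.helper.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("from . import helper\n")
(pkg / "helper.py").write_text("import colorsys\n")
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

added = modules_added_by_import("demo_pkg")
# Contains demo_pkg and demo_pkg.helper, plus colorsys if it was not loaded yet.
print(sorted(added))
```

Note that a second call for the same name returns an empty set, since everything is already in sys.modules.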

Instead, what we really want to know is:

If we stripped away all code not required for <target>, what modules would we need to have installed for the import of <target> to work?

The difference here comes down to whether you count siblings of nested dependencies. This is much easier to describe with an example:

deep_siblings/
├── __init__.py
├── blocks
│   ├── __init__.py
│   ├── bar_type
│   │   ├── __init__.py
│   │   └── bar.py # imports alog
│   └── foo_type
│       ├── __init__.py
│       └── foo.py # imports yaml
└── workflows
    ├── __init__.py
    └── foo_type
        ├── __init__.py
        └── foo.py # imports ..blocks.foo_type.foo

In this example, under the old implementation, workflows.foo_type.foo would depend on both alog and yaml, because resolving the ..blocks portion of the import executes blocks/__init__.py and brings all of the dependencies of blocks into sys.modules. This, however, defeats the whole purpose of finding separable import sets. Under the new implementation, workflows.foo_type.foo depends only on yaml, because it imports blocks.foo_type.foo, the deepest module in the chain, whose only requirement is yaml.
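The contrast can be reproduced with a toy dependency graph of the deep_siblings tree. This is an illustrative model, not how import_tracker works internally: DIRECT_IMPORTS, parents, and third_party_deps are hypothetical, and the graph assumes that the __init__.py files under blocks import their children while those under workflows are empty (the listing above does not show their contents). Setting include_parents=True approximates the old behavior, where importing a dotted module also executes every parent package.

```python
# Toy model of the deep_siblings example. Keys are first-party modules;
# any name that is not a key is treated as a third-party dependency.
# Assumption: blocks' __init__.py files import their children,
# while workflows' __init__.py files are empty.
DIRECT_IMPORTS = {
    "deep_siblings": [],
    "deep_siblings.blocks": [
        "deep_siblings.blocks.bar_type",
        "deep_siblings.blocks.foo_type",
    ],
    "deep_siblings.blocks.bar_type": ["deep_siblings.blocks.bar_type.bar"],
    "deep_siblings.blocks.bar_type.bar": ["alog"],
    "deep_siblings.blocks.foo_type": ["deep_siblings.blocks.foo_type.foo"],
    "deep_siblings.blocks.foo_type.foo": ["yaml"],
    "deep_siblings.workflows": [],
    "deep_siblings.workflows.foo_type": [],
    "deep_siblings.workflows.foo_type.foo": ["deep_siblings.blocks.foo_type.foo"],
}


def parents(module: str) -> list[str]:
    """Return the parent packages Python imports before `module`."""
    parts = module.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def third_party_deps(target: str, include_parents: bool) -> set[str]:
    """Collect third-party modules reachable from `target`.

    include_parents=True mimics the old behavior: every parent package's
    __init__.py is followed too, so sibling subpackages leak in.
    include_parents=False follows only the modules that are imported directly.
    """
    seen: set[str] = set()
    deps: set[str] = set()
    stack = [target]
    while stack:
        mod = stack.pop()
        if mod in seen:
            continue
        seen.add(mod)
        if mod not in DIRECT_IMPORTS:
            deps.add(mod)  # not first-party: record it and stop here
            continue
        stack.extend(DIRECT_IMPORTS[mod])
        if include_parents:
            stack.extend(parents(mod))
    return deps


target = "deep_siblings.workflows.foo_type.foo"
print(sorted(third_party_deps(target, include_parents=True)))   # ['alog', 'yaml']
print(sorted(third_party_deps(target, include_parents=False)))  # ['yaml']
```

The only difference between the two runs is whether parent packages are followed; that single choice is what pulls alog in through blocks/__init__.py.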

What breaks in the API?

  • The side_effects_modules argument is gone. This was a hack to work around the fact that some modules, when trapped by a DeferredModule, would cause the overall import to fail. With the refactor, this is unnecessary because the import proceeds exactly as normal with no interference.
  • The output with track_import_stacks is different. It no longer attempts to look like stack traces, which actually makes it more useful. Instead of a partially useful stack trace, it is now a list of lists, where each entry is a stack of module imports that caused the given dependency to be allocated.
  • By default, import_module stops looking for imports at the boundary of the target module's parent library. This means that if a third party module transitively imports another third party module, it won't be allocated to the target unless full_depth=True is given.
  • LazyModule is gone! This tool was a bit of a hack anyway and is no longer necessary.
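The track_import_stacks and full_depth changes can be pictured with another toy model. This is not import_tracker's real output format or algorithm; GRAPH, import_stacks, and the my_lib package are hypothetical. For each third-party dependency, the sketch records a list of import stacks (each a list of module names), and with full_depth=False it does not descend into third-party modules, so transitive dependencies such as urllib3 only show up with full_depth=True.

```python
# Hypothetical dependency graph: my_lib is the target's own library,
# everything else is third party.
GRAPH = {
    "my_lib.core": ["my_lib.utils", "requests"],
    "my_lib.utils": ["yaml"],
    "requests": ["urllib3", "idna"],
    "yaml": [],
    "urllib3": [],
    "idna": [],
}


def import_stacks(target: str, full_depth: bool = False) -> dict[str, list[list[str]]]:
    """Map each third-party dependency of `target` to the import stacks reaching it.

    Each stack lists the modules from `target` down to the module that imports
    the dependency. With full_depth=False the search stops at the first module
    outside the target's top-level package.
    """
    root = target.split(".")[0]
    stacks: dict[str, list[list[str]]] = {}

    def visit(module: str, path: list[str]) -> None:
        for dep in GRAPH.get(module, []):
            if dep in path:
                continue  # skip import cycles
            if dep.split(".")[0] != root:
                stacks.setdefault(dep, []).append(list(path))
                if not full_depth:
                    continue  # do not look inside third-party code
            visit(dep, path + [dep])

    visit(target, [target])
    return stacks


print(import_stacks("my_lib.core"))
print(import_stacks("my_lib.core", full_depth=True))
```

A dependency reached along several routes simply accumulates several stacks, which is why each value is a list of lists.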