[AIRFLOW-6706] Lazy load operator extra links #7327
One comment: I think we should not load plugins_manager in __init__.py at all. Why do we do it? Could we remove it from __init__.py altogether and import it only where it is really needed?
I think the integrate_plugins() method should only be called when:
a) the webserver is run
b) the scheduler is run
c) maybe when tests are run (?)
We can call that method explicitly in those places rather than in airflow/__init__.py.
This way we could avoid lazy loading for specific imports and also have the problem solved for all future cases.
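To make the suggestion above concrete, here is a minimal sketch of calling the integration step explicitly from entry points instead of at package import time. Every name in it (ensure_plugins_integrated, run_webserver, run_scheduler, integration_calls) is hypothetical, and integrate_plugins is only a stand-in, not Airflow's actual function.

```python
# Hypothetical sketch: nothing runs at import time; each entry point asks
# for plugin integration explicitly, and a guard keeps it idempotent.
_plugins_integrated = False
integration_calls = []  # records each real integration, for illustration only


def integrate_plugins():
    """Stand-in for the expensive plugin integration step."""
    integration_calls.append("integrated")


def ensure_plugins_integrated():
    """Run the integration once, no matter how many entry points call it."""
    global _plugins_integrated
    if not _plugins_integrated:
        integrate_plugins()
        _plugins_integrated = True


def run_webserver():
    ensure_plugins_integrated()
    return "webserver started"


def run_scheduler():
    ensure_plugins_integrated()
    return "scheduler started"


run_webserver()
run_scheduler()
```

Note that this only moves where the cost is paid: once an entry point triggers the integration, everything it pulls in is still loaded.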
Codecov Report
@@            Coverage Diff             @@
##           master    #7327     +/-    ##
==========================================
- Coverage   85.87%    85.7%   -0.17%
==========================================
  Files         862      863      +1
  Lines       40484    40510     +26
==========================================
- Hits        34767    34721     -46
- Misses       5717     5789     +72
I think this is a completely different problem. I also think it is worth solving, but it will require more time and testing. If we call the integrate plugins method elsewhere, it will still load the GCP and Qubole classes when they are not needed. In this PR I want to focus on one small problem only, not change how the whole plugin mechanism works. My change is not related to the plugin loading mechanism at all; it applies only to operators that are in the core. It is a mistake that this code is in the same module. BTW, I think this will also be much easier to do once we drop plugin support for operators, sensors, and hooks.
We want lazy loading because not every user uses every operator, so an operator's code should be loaded only when it is needed. Currently all classes are loaded for every user. If we only move this method call, it will still load all classes for all users, regardless of what each user needs.
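The idea of loading a class only when it is needed can be sketched as follows. This is an illustration under assumptions, not the code from this PR: the registry name LAZY_CLASS_PATHS and the helper load_class are hypothetical, and standard-library classes stand in for the operator classes.

```python
import importlib
from functools import lru_cache

# Hypothetical registry: classes are named by dotted path, so registering
# them costs nothing at import time. Standard-library classes are used
# here as stand-ins.
LAZY_CLASS_PATHS = frozenset({
    "fractions.Fraction",
    "decimal.Decimal",
})


@lru_cache(maxsize=None)
def load_class(dotted_path):
    """Import the defining module on first request and return the class."""
    if dotted_path not in LAZY_CLASS_PATHS:
        raise KeyError(f"{dotted_path} is not a registered class")
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# The module is imported only here, when the class is actually requested.
fraction_cls = load_class("fractions.Fraction")
half = fraction_cls(1, 2)
```

A user who never asks for a given class never pays for importing its module, which is the per-user behavior described above.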
Other than the name of the variable, looks good
Good work @mik-laj
Co-authored-by: Kamil Breguła <[email protected]> Backported from #7327 cherry-picked from b180e4b
Co-authored-by: Kamil Breguła <[email protected]> Backported from apache#7327 cherry-picked from b180e4b
Co-authored-by: Kamil Breguła <[email protected]> Backported from apache/airflow#7327 cherry-picked from b180e4b GitOrigin-RevId: ea32d0d83ce915798ba9779dbf7c1df9faf7c241
When we import the airflow package, many modules are loaded, so I looked at exactly which modules are loaded. I found many classes that should not be loaded and that significantly delay application startup. I suggest loading some classes lazily, when they are needed.
Performance benchmark:
Before:
After:
Result:
and 580 fewer modules - 61%
If anyone is interested, I attach an exact log that shows the import process.
https://gist.github.com/mik-laj/002f5a714c221ba04bc638970094519c
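For readers who want to reproduce this kind of measurement, here is a minimal sketch that counts how many modules an import pulls into a fresh interpreter. It is not the script used for the numbers above; the helper name count_modules_after_import is hypothetical, and the standard-library json module stands in for the package under test.

```python
import subprocess
import sys


def count_modules_after_import(module_name=None):
    """Start a fresh interpreter, optionally import module_name,
    and return how many entries sys.modules holds afterwards."""
    code = "import sys\n"
    if module_name:
        code += f"import {module_name}\n"
    code += "print(len(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# A fresh process per measurement avoids counting modules that the
# current process has already cached in sys.modules.
baseline = count_modules_after_import()
with_json = count_modules_after_import("json")
print(f"importing json adds {with_json - baseline} modules")
```

For per-module timing rather than counts, CPython 3.7+ also offers the -X importtime interpreter option.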
CC: @evgenyshulman
Issue link: AIRFLOW-6706
Make sure to mark the boxes below before creating PR: [x]
[AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
* For document-only changes, the commit message can start with [AIRFLOW-XXXX].
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.