
AIP-44 Migrate to new serializer #31213

Open
mhenc opened this issue May 11, 2023 · 4 comments
@mhenc
Contributor

mhenc commented May 11, 2023

Use the new serializer in place of BaseSerialization for @internal_api_call and the RPC server
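For context, here is a minimal stdlib-only sketch of the pattern the issue is about (all names are hypothetical and this is not the actual Airflow implementation): an @internal_api_call-style decorator serializes the call's arguments, ships them to the RPC server, and deserializes the result. The issue is about which serializer performs that round trip:

```python
import functools
import json

USE_INTERNAL_API = True  # hypothetical stand-in for the AIP-44 feature flag


def serialize(obj):
    # Stand-in for the serializer under discussion: BaseSerialization
    # today, the new serde serializer after this issue.
    return json.dumps(obj)


def deserialize(data):
    return json.loads(data)


def internal_api_call(func):
    """Sketch of the decorator: either run locally, or round-trip the
    arguments and result through the serializer as an RPC call would."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not USE_INTERNAL_API:
            return func(*args, **kwargs)
        # In reality this payload would be POSTed to the internal API
        # server; here we just serialize and deserialize in-process.
        payload = serialize({"args": list(args), "kwargs": kwargs})
        decoded = deserialize(payload)
        result = func(*decoded["args"], **decoded["kwargs"])
        return deserialize(serialize(result))
    return wrapper


@internal_api_call
def get_dag_run_count(dag_id):
    # Toy stand-in for a DB-touching call that must cross the RPC boundary.
    return {"dag_id": dag_id, "count": 3}
```

Anything that is not JSON-friendly (DAGs, TaskInstances, ...) is exactly where the choice of serializer matters.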

@mhenc mhenc self-assigned this May 11, 2023
@mhenc mhenc added the AIP-44 Airflow Internal API label May 11, 2023
@potiuk
Member

potiuk commented Jul 27, 2024

Might still be a good idea actually. It should not be that hard.

@potiuk
Member

potiuk commented Jul 27, 2024

This is one of the remaining points for AIP-44, and I think we might still consider using the "serde" serializer instead of the "DAG" serializer. The code to wire it up should just be moved to a "Pydantic" module, I guess, and since we are going to have automated tests running in Database Isolation mode #41067 - it should still be safe to change it while we are preparing for 2.10.0rc1.

If any of the assigned people would like to take care of it - I think it's also a good opportunity to learn about the serializer.

@gyli - maybe as preparation, and to get a better understanding of the scope of #41067, this might be a good task for you?

@gyli
Contributor

gyli commented Jul 28, 2024

Yeah, I can try that, and hopefully I can co-own this issue with someone who's familiar with internal_api_call and the RPC server. I'm still trying to figure out how I can test it other than with unit tests right now.
Also, could you elaborate on what needs to be "moved to a Pydantic module"?

@potiuk
Member

potiuk commented Jul 28, 2024

Sure. It's all rather well defined. Let me describe the testing options. This might also be useful for others:

  1. Testing it locally with airflow:
breeze start-airflow --database-isolation --standalone-dag-processor --executor CeleryExecutor --load-example-dags --load-default-connections

This will start the Breeze container (the first time, the image might need to be built if you've never done that before). Once you enter the container it will run airflow in multiple terminals (it splits the terminal using tmux), so you will see all the airflow components running. You will be able to connect to airflow locally on http://localhost:28080, and DAGs should be nicely triggerable and generally should work.

Breeze mounts the code from your project into the container, so once you modify the code, it's enough to Ctrl-C the component, press the up arrow to see the last command, and run it again.

This is the single most useful tool for testing things end-to-end.

  2. Unit tests: our internal_api tests in tests/api_internal - there you can test both the endpoint and the call. These run as regular unit tests; you just need an airflow venv for that. pip install -e ".[devel,pydantic]" should, I think, be enough to run the tests (maybe some other devel extras are needed) - see https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html#development-extras

  3. As of recently we also have an easy way to run unit tests in isolation mode. Not all tests are supposed to succeed there, and we are working on fixing/excluding the tests that should be excluded in Remove Database Isolation feature flag and run DB isolation tests #41067 - but you can already run some tests that use database isolation. It works like this: you enter breeze shell --database-isolation, split the terminals with tmux, run the internal-api component in one of them, and then run pytest tests in the other terminal. Again - the sources are mounted inside the container, so you can easily iterate on tests (Ctrl-C and restart internal-api to pick up changes).

One test that fully works for now is dag_processing/test_job_runner.py, for example - a good place to start.

  4. What needs to be moved:

if use_pydantic_models and not _ENABLE_AIP_44:
-> currently everything is controlled by this one flag, plus a lot of Pydantic classes and some other classes that were added to serialized_objects to make internal_api work - and it should all be done via the serde module instead. For now, running end-to-end with start-airflow is the easiest way to test it - once we complete the "db-isolated tests" #41067 we will be able to verify it in a regular CI PR.
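To illustrate the direction with a rough stdlib-only sketch (toy names throughout - Airflow's real implementation lives in airflow.serialization.serde): a serde-style serializer tags each payload with its class name and version, so any class that knows how to serialize itself can cross the internal API boundary, instead of maintaining a parallel set of Pydantic mirror classes behind a use_pydantic_models flag:

```python
import json

_REGISTRY = {}  # classname -> class; a toy version of serde's type lookup


def register(cls):
    # Toy decorator so deserialize() can find the class by name.
    _REGISTRY[cls.__name__] = cls
    return cls


def serialize(obj):
    """Tag the payload with its type and version, serde-style."""
    if hasattr(obj, "serialize"):
        return json.dumps({
            "__classname__": type(obj).__name__,
            "__version__": getattr(obj, "__version__", 1),
            "__data__": obj.serialize(),
        })
    # Plain JSON-friendly values pass through untagged.
    return json.dumps({"__data__": obj})


def deserialize(blob):
    decoded = json.loads(blob)
    classname = decoded.get("__classname__")
    if classname is None:
        return decoded["__data__"]
    return _REGISTRY[classname].deserialize(decoded["__data__"])


@register
class TaskInstanceKey:
    """Toy stand-in for an object crossing the internal API boundary."""
    __version__ = 1

    def __init__(self, dag_id, task_id):
        self.dag_id = dag_id
        self.task_id = task_id

    def serialize(self):
        return {"dag_id": self.dag_id, "task_id": self.task_id}

    @classmethod
    def deserialize(cls, data):
        return cls(**data)
```

With this shape there is no feature-flag branch and no second class hierarchy to keep in sync, which is the point of moving the wiring out of serialized_objects.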
