Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apache Beam pipeline option parsed incorrectly if value is False #38457

Closed
1 of 2 tasks
gblazq opened this issue Mar 25, 2024 · 3 comments · Fixed by #38496
Closed
1 of 2 tasks

Apache Beam pipeline option parsed incorrectly if value is False #38457

gblazq opened this issue Mar 25, 2024 · 3 comments · Fixed by #38496

Comments

@gblazq
Copy link

gblazq commented Mar 25, 2024

Apache Airflow Provider(s)

apache-beam

Versions of Apache Airflow Providers

Reproduced in 5.3.0 but probably affects up to 5.6.2

Apache Airflow version

2.6.3

Operating System

macOS 14.1.1

Deployment

Virtualenv installation

Deployment details

No response

What happened

The Beam hook is parsing options passed to the pipeline incorrectly if the value is False.

According to the docs, if the value is False, the option will be skipped

* If the value is False, this option will be skipped

This is consistent with the Apache Beam pipeline options behaviour and makes it possible to define flag pipeline options. Setting a foo option as True in the provider operators pipeline_options and default_pipeline_options parameters would pass --foo to the Beam pipeline, while setting it to False should skip it, effectively acting as a flag argument.

However, the code in

def beam_options_to_args(options: dict) -> list[str]:
"""
Return a formatted pipeline options from a dictionary of arguments.
The logic of this method should be compatible with Apache Beam:
https://github.com/apache/beam/blob/b56740f0e8cd80c2873412847d0b336837429fb9/sdks/python/
apache_beam/options/pipeline_options.py#L230-L251
:param options: Dictionary with options
:return: List of arguments
"""
if not options:
return []
args: list[str] = []
for attr, value in options.items():
if value is None or (isinstance(value, bool) and value):
args.append(f"--{attr}")
elif isinstance(value, list):
args.extend([f"--{attr}={v}" for v in value])
else:
args.append(f"--{attr}={value}")
return args
is not skipping the option if the value is False. Instead, it's applying the default behaviour of replacing the value with its textual representation.

What you think should happen instead

The option should be skipped if the value is False.

How to reproduce

import datetime

from airflow import models
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator

with models.DAG(dag_id="beam-option", start_date=datetime.datetime.min) as dag:
    beam_operator = BeamRunPythonPipelineOperator(
        task_id="beam-operator",
        py_file="my_file.py",
        runner="DirectRunner",
        pipeline_options={"flag_option_true": True, "flag_option_false": False},
    )

Running this DAG will log:

[2024-03-25T12:52:19.067+0100] {beam.py:131} INFO - Running command: python3 my_file.py --runner=DirectRunner --flag_option_true --flag_option_false=False --labels=airflow-version=v2-6-3

instead of the expected without the flag_option_false option.

If flag_option_false is defined as a flag option in the Beam pipeline, the pipeline will fail:

error: argument --flag_option_false: ignored explicit argument 'False'

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@gblazq gblazq added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Mar 25, 2024
Copy link

boring-cyborg bot commented Mar 25, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@potiuk potiuk added good first issue and removed needs-triage label for new issues that we didn't triage yet labels Mar 25, 2024
@dondaum
Copy link
Contributor

dondaum commented Mar 25, 2024

I would be happy to support here and take this one over.

@potiuk
Copy link
Member

potiuk commented Mar 25, 2024

Assigned you @dondaum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants