Pandas version pinned to < 1.0.0 #7905

Closed
JPFrancoia opened this issue Mar 27, 2020 · 7 comments

Comments

@JPFrancoia
Contributor

JPFrancoia commented Mar 27, 2020

Hi,

I noticed that pandas is pinned to < 1.0.0 in Airflow's dependencies. It has now started to impact other dependencies in my pipelines, and it will gradually become more and more difficult to solve conflicts.

Do you have an idea when Airflow will become compatible with pandas 1.0.0 ?

Cheers

@boring-cyborg

boring-cyborg bot commented Mar 27, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@zhongjiajie
Member

Good point. I found that the latest pandas release below 1.0.0 came out on October 31, 2019 (https://pandas.pydata.org/docs/whatsnew/v0.25.3.html), and I think that is new enough, although I know that 1.0.0 is a major version change.

Maybe you could open a draft PR to upgrade the version, since you have experience with it from your daily usage, @JPFrancoia.

@JPFrancoia
Contributor Author

I created the PR, but jeez the contribution process is convoluted...

I tried testing as much as possible locally. Let's see what Circle CI has to say.

@JPFrancoia
Contributor Author

Closing since PR was merged.

@ashb
Member

ashb commented May 5, 2020

@JPFrancoia Sorry about the process, Airflow is a big project with lots of moving parts. We're always trying to make it friendlier for new contributors!

For future changes: you can just create a PR, no need to create an issue first.

@ashb ashb removed the invalid label May 5, 2020
@potiuk
Member

potiuk commented May 6, 2020

Indeed, @JPFrancoia. We do have quite a process, and I think you hit the hardest part of it. We've been thinking about and discussing how to solve the dependency problems, and it's still not perfect. One day maybe we will make it super easy. Thanks for the feedback; for me that's a sign we need to do a better job at it. But for now, it is a bit convoluted just to keep us safe from transitive dependency problems. You can read a bit about why it is so complex here:
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-dependencies

TL;DR: Airflow is a bit of both, a library and an application. The current approach tries to accommodate both at the same time (keep dependencies open for the library and pinned for the application).
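
To make the "open for library, pinned for application" distinction concrete, here is a minimal sketch. It is not Airflow's actual setup: the version range, the pin, and the candidate list are hypothetical values chosen for illustration, and the tiny specifier parser only handles plain numeric versions.

```python
import operator
import re

# Comparison operators supported by this simplified specifier parser.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}
_CLAUSE = re.compile(r"(<=|>=|==|<|>)\s*([0-9.]+)")


def parse_version(text, width=3):
    """Turn '1.0' into (1, 0, 0) so versions compare numerically."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause."""
    return all(
        _OPS[op](parse_version(version), parse_version(bound))
        for op, bound in _CLAUSE.findall(specifier)
    )


# Hypothetical values, chosen only to illustrate the two styles.
LIBRARY_RANGE = ">=0.17.1,<1.0.0"  # open range, as a library would declare
APPLICATION_PIN = "==0.25.3"       # exact pin, as an application would lock
CANDIDATES = ["0.24.2", "0.25.3", "1.0.0", "1.0.3"]

library_ok = [v for v in CANDIDATES if satisfies(v, LIBRARY_RANGE)]
application_ok = [v for v in CANDIDATES if satisfies(v, APPLICATION_PIN)]

print("library range admits:  ", library_ok)
print("application pin admits:", application_ok)
```

The open range leaves room for other packages to pick a compatible version, while the exact pin gives a reproducible install but conflicts with anything that needs a different release.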

I think the problem can (eventually) be solved more easily only if we actually split the "airflow application" from the "airflow library" and treat the dependencies of each differently. I think it's possible, and we have a number of ideas on how to do it, but it is not a priority for Airflow 2.0. Maybe in Airflow 2.1 we can do something about it.

@JPFrancoia
Contributor Author

I understand, thanks for providing more explanations.

Indeed, separating the application and library parts of Airflow seems to be a good idea.
To give you a bit of context, I was trying to get AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler) and Airflow to coexist. AWS Data Wrangler moves fast and only supports pandas > 1.0. Ultimately it was possible, but it was a rabbit hole of dependencies and I ended up modifying a setup.py by hand.
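
For anyone chasing a similar conflict, one possible starting point is to list which installed packages declare a requirement on pandas. This is only a sketch using the standard library's `importlib.metadata` (Python 3.8+); `who_requires` is a hypothetical helper name, and the output depends entirely on what is installed in the current environment.

```python
import re
from importlib import metadata


def who_requires(package):
    """Map each installed distribution to its requirement strings for `package`."""
    # Match the package name only when it is the whole name,
    # so "pandas" does not also match "pandas-gbq".
    pattern = re.compile(
        rf"^{re.escape(package)}(?![A-Za-z0-9._-])", re.IGNORECASE
    )
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "<unknown>")
        # dist.requires is None when a distribution declares no requirements.
        for req in dist.requires or []:
            if pattern.match(req.strip()):
                found.setdefault(name, []).append(req)
    return found


if __name__ == "__main__":
    for name, reqs in sorted(who_requires("pandas").items()):
        print(name, "->", reqs)
```

Comparing the printed specifiers side by side shows which package carries the upper bound and which one needs the newer release.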

Since Airflow is so versatile, I imagine people are trying, or will try, to plug different libraries on top of it, so this dependency issue will probably happen again. But it's nice that you're thinking ahead!

Thanks for the support.
