
[Testing] tfx sample pipeline broken in release not caught by postsubmit #5178

Closed
Bobgy opened this issue Feb 24, 2021 · 13 comments

Labels
lifecycle/stale The issue / pull request is stale, any activities remove this label.

Comments

Bobgy commented Feb 24, 2021

In #5137, the tfx sample pipeline was broken, but the breakage was not caught by the postsubmit test.

IIRC our integration test should cover this sample; do we know why this wasn't caught?

Originally posted in #5137 (comment)

I think this is a high-priority issue, because it caused a lot of extra effort after 1.4.0 was released.

/cc @numerology @chensun

Bobgy commented Feb 24, 2021

[Screenshot: postsubmit test run history around the release commit]

Notice that the postsubmit test before the release commit passed, yet the released version had a broken tfx sample pipeline (ignore the failure on the release commit itself; that one was expected).

Integration test of the passing commit: https://oss-prow.knative.dev/view/gs/oss-prow/logs/kubeflow-pipeline-postsubmit-integration-test/1360157617023881216

What's even weirder: after the fix #5165, postsubmit tests started to fail on the parameterized_tfx_sample pipeline.

Bobgy commented Feb 24, 2021

One problem I found: the postsubmit integration test incorrectly uses the presubmit test script: https://github.com/GoogleCloudPlatform/oss-test-infra/blob/f29fc29cd617497ea44164ff6a1734c7dee3c0f4/prow/prowjobs/kubeflow/pipelines/kubeflow-pipelines-postsubmits.yaml#L50

EDIT: this is not the root cause of this problem.

Bobgy commented Feb 24, 2021

I think I found the root cause: an incomplete upgrade of the tfx dependencies.

Let me explain the order of events:

  1. pip started to fail for Python 3.5 images, so we fixed the image build by upgrading to tfx==0.26.0 in #5052 (Upgrade tfx version to 0.26.0 in backend).

    At this point, the built-in sample was already broken, because the tfx SDK version was bumped to 0.26.0 while the tfx image stayed at 0.22.0.
    However, postsubmit tests did not fail, because sample-test/requirements.in was still at tfx==0.22.0.

  2. we released KFP 1.4.0

  3. we got a user report that the built-in tfx sample fails -- as expected given the above

  4. we fixed the built-in sample by updating the tfx image in #5165 (fix(samples): Update tfx image. Fixes #5137).

    However, postsubmit tests then started to fail for the tfx sample, because the tfx image is now at 0.27.0 while sample-test/requirements.in is still at tfx==0.22.0.

Conclusion

  • We should clearly document how to upgrade the TFX dependency in KFP, so that people do not mistakenly upgrade only a subset of its pins.
  • It would be more ideal if the tfx version in these multiple places had a single source of truth, or could be updated programmatically by a script. A quick win is letting sample-test/requirements.in derive from the backend compiler requirements, so the two are always in sync -- the sample-test env then matches the prebuilt-sample compilation env. (See the sketch below.)
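A minimal sketch of that quick win, assuming pip-tools is used to pin versions (the include path is illustrative, not the repo's exact layout):

# sample-test/requirements.in -- hypothetical sketch
# Reuse the backend compiler's top-level pins as the single source of
# truth, so a TFX bump there propagates to the sample tests too.
-r ../backend/requirements.in
# sample-test-only dependencies go below this line.

Regenerating the pinned file is then one command: pip-compile sample-test/requirements.in --output-file sample-test/requirements.txt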

Bobgy commented Feb 25, 2021

The high-priority problems are fixed, and I made sample-test import the same requirements.in as the backend requirements.in.

What's still missing: people may update requirements.in but forget to regenerate all the requirements.txt files.
EDIT: this still seems high priority, and it's an easy mistake to make.
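A hypothetical CI guard against that mistake could recompile each requirements.in with pip-tools and fail when the committed requirements.txt no longer matches. A minimal sketch (the paths and the helper itself are illustrative, not part of the repo):

import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# (source, pinned) pairs to keep in sync; these paths are illustrative.
PAIRS = [
    (Path("sdk/python/requirements.in"), Path("sdk/python/requirements.txt")),
    (Path("sample-test/requirements.in"), Path("sample-test/requirements.txt")),
]

def pins(text):
    # Ignore comments and blank lines so pip-compile's generated header
    # doesn't produce false positives in the comparison.
    return [line for line in text.splitlines()
            if line.strip() and not line.startswith("#")]

def is_stale(req_in, req_txt):
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "compiled.txt"
        # Seed with the committed pins; pip-compile keeps pins that still
        # satisfy the .in constraints and only changes what it must.
        shutil.copy(req_txt, out)
        subprocess.run(
            ["pip-compile", "--quiet", "--output-file", str(out), str(req_in)],
            check=True,
        )
        return pins(out.read_text()) != pins(req_txt.read_text())

if __name__ == "__main__":
    stale = [str(txt) for src, txt in PAIRS if is_stale(src, txt)]
    if stale:
        sys.exit("Stale pins; re-run pip-compile for: " + ", ".join(stale))

Running something like this in presubmit would turn "forgot to regenerate the pins" into an immediate, attributable failure instead of a latent release break.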

/assign @chensun
Do you think you can take this?

Bobgy commented Feb 25, 2021

And #5187 is pending review.

chensun commented Feb 25, 2021

> The high-priority problems are fixed, and I made sample-test import the same requirements.in as the backend requirements.in.
>
> What's still missing: people may update requirements.in but forget to regenerate all the requirements.txt files.
> EDIT: this still seems high priority, and it's an easy mistake to make.
>
> /assign @chensun
> Do you think you can take this?

Aren't we deprecating requirements.in, given that sdk/python/requirements.txt is covered by #5056?

Bobgy commented Feb 25, 2021

@chensun for clarification, #5056 explicitly disabled renovate for python packages.

You can make the decision to enable it.
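For reference, re-enabling it could be a small renovate.json change, along these lines (a sketch; manager names per Renovate's docs, scoping and schedule omitted):

{
  "pip_requirements": { "enabled": true },
  "pip_setup": { "enabled": true }
}

That would have Renovate open update PRs for requirements files and setup.py respectively.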

chensun commented Feb 25, 2021

> @chensun for clarification, #5056 explicitly disabled renovate for python packages.
>
> You can make the decision to enable it.

I see, thanks for the clarification. I checked again, and it did disable Python. (I was under the wrong impression that sdk/python/requirements.txt was covered, because I saw an ignore list with some components paths and sdk is not in that list.)

chensun self-assigned this Feb 25, 2021
Bobgy commented Feb 25, 2021

> I want to echo again what I said here. I think pip install -r sdk/python/requirements.txt doesn't represent the most common user journey -- think about our notebook samples: they only have pip install kfp or pip install kfp==<pinned version>; I've rarely seen pip install -r sdk/python/requirements.txt.
>
> I would suggest we move away from installing requirements.txt in tests, so that the tests create an environment closer to a fresh pip install kfp. If there's a newly emerged dependency issue, we would probably see it in tests.
>
> P.S.: Taking tfx as an example, their requirements.txt contains nothing but -e . (read from setup.py)
>
> -- @chensun

Moving some of the discussion from the PR thread back to the main issue.

I agree with Chen that testing what users would actually get is important. However, we used to do exactly that before introducing requirements.{in,txt}, and the result was that from time to time presubmit broke without any clues and we had to dig through the dependencies to find out why.

I just want to make sure we are not going in circles; we should admit both approaches have their pros and cons.

I think the discussion laid out in #4682 is not significantly different from what we have here. Maybe the best approach is to keep a requirements.txt, but set up a bot that updates it periodically via PRs. That way, if the update PR fails, we know users might hit the same problem, but it won't block presubmit tests (or other people not working on this problem).
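For concreteness, the refresh step such a bot would run could be as small as this (pip-tools assumed; the PR automation around it is out of scope here):

pip-compile --upgrade sdk/python/requirements.in --output-file sdk/python/requirements.txt

If CI fails on the resulting PR, the breakage came from a new dependency release -- exactly the signal we want, without blocking everyone else's presubmits.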

chensun commented Mar 8, 2021

> However, we used to do exactly that before introducing requirements.{in,txt}, and the result was that from time to time presubmit broke without any clues and we had to dig through the dependencies to find out why.

If I'm not mistaken, this is usually due to new versions of dependencies being incompatible with other existing dependencies. That's a sign we need to fix setup.py by adding upper-bound restrictions on the dependencies. Using requirements.txt in tests doesn't solve the underlying issue; it hides it.

The fact that many of the dependencies listed in the kfp setup.py don't have an upper version limit is problematic, IMHO:

REQUIRES = [
'absl-py>=0.9,<=0.11',
'PyYAML>=5.3',
'google-cloud-storage>=1.13.0',
'kubernetes>=8.0.0, <12.0.0',
'google-auth>=1.6.1',
'requests_toolbelt>=0.8.0',
'cloudpickle',
# Update the upper version whenever a new major version of the
# kfp-server-api package is released.
# Update the lower version when kfp sdk depends on new apis/fields in
# kfp-server-api.
# Note, please also update ./requirements.in
'kfp-server-api>=1.1.2, <2.0.0',
'jsonschema >= 3.0.1',
'tabulate',
'click',
'Deprecated',
'strip-hints',
'docstring-parser>=0.7.3',
'kfp-pipeline-spec>=0.1.5, <0.2.0',
'fire>=0.3.1',
'protobuf'
]

So I think one action item is to add upper limits regardless of whether we use requirements.txt in tests. WDYT?

EDIT: created #5258
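For illustration only (the concrete bounds were left to #5258; these are not the ones that landed), capping each dependency below its next major version would look like:

REQUIRES = [
    'absl-py>=0.9,<=0.11',
    'PyYAML>=5.3,<6',                   # upper bound added
    'google-cloud-storage>=1.13.0,<2',  # upper bound added
    'kubernetes>=8.0.0,<12.0.0',
    'google-auth>=1.6.1,<2',            # upper bound added
    'requests_toolbelt>=0.8.0,<1',      # upper bound added
    'cloudpickle>=1.3.0,<2',            # lower and upper bounds added
    # ... remaining entries bounded the same way ...
]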

Bobgy commented Apr 16, 2021

TODO: update TFX upgrade documentation

stale bot commented Jul 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label Jul 20, 2021
stale bot commented Mar 3, 2022

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

stale bot closed this as completed Mar 3, 2022