Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use temp .netrc file for integration tests and support NETRC environment variable #808

Merged
merged 21 commits into from
Oct 6, 2024

Conversation

chuckwondo
Copy link
Collaborator

@chuckwondo chuckwondo commented Sep 16, 2024

This primary intent for the PR was to address #806. Because the changes required for #806 also produced at least partial completion of #743 and #480, it also includes the full set of changes for them as well. All 3 issues are very closely related, so it didn't make much sense to attempt to separate them out.

As a result,

  • contributors may now run integration tests locally without clobbering their existing ~/.netrc file, if they have one
  • because the bullet above required a means for configuring integration tests to use a temporary .netrc file location, it made sense to do so by setting a NETRC environment variable for this purpose, so users may now also set this environment variable, if they need to specify a different location for their .netrc file
  • to better enable local execution of integration tests, some refactoring was required, which led to simply folding in the changes required to fix Update integration tests to use pytest conventions #743 as well

Fixes #806
Fixes #743
Fixes #480

Pull Request (PR) draft checklist - click to expand
  • Please review our
    contributing documentation
    before getting started.
  • Ensure an issue exists representing the problem being solved in this PR.
  • Populate a descriptive title. For example, instead of "Updated README.md", use a
    title such as "Add testing details to the contributor section of the README".
  • Populate the body of the pull request with:
  • Update CHANGELOG.md with details about your change in a section titled
    ## Unreleased. If such a section does not exist, please create one. Follow
    Common Changelog for your additions.
  • Update the documentation and/or the README.md with details of changes to the
    earthaccess interface, if any. Consider new environment variables, function names,
    decorators, etc.

Click the "Ready for review" button at the bottom of the "Conversation" tab in GitHub
once these requirements are fulfilled. Don't worry if you see any test failures in
GitHub at this point!

Pull Request (PR) merge checklist - click to expand

Please do your best to complete these requirements! If you need help with any of these
requirements, you can ping the @nsidc/earthaccess-support team in a comment and we
will help you out!

  • Add unit tests for any new features.
  • Apply formatting and linting autofixes. You can add a GitHub comment in this Pull
    Request containing "pre-commit.ci autofix" to automate this.
  • Ensure all automated PR checks (seen at the bottom of the "conversation" tab) pass.
  • Get at least one approving review.

📚 Documentation preview 📚: https://earthaccess--808.org.readthedocs.build/en/808/

@codecov-commenter
Copy link

codecov-commenter commented Sep 17, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 60.57692% with 41 lines in your changes missing coverage. Please review.

Project coverage is 70.08%. Comparing base (2f49c08) to head (ab066bc).
Report is 39 commits behind head on main.

Files with missing lines Patch % Lines
earthaccess/auth.py 39.13% 14 Missing ⚠️
earthaccess/kerchunk.py 13.33% 13 Missing ⚠️
earthaccess/store.py 0.00% 9 Missing ⚠️
tests/integration/test_kerchunk.py 0.00% 4 Missing ⚠️
tests/integration/test_cloud_open.py 80.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #808      +/-   ##
==========================================
- Coverage   73.88%   70.08%   -3.80%     
==========================================
  Files          31       31              
  Lines        2003     2123     +120     
==========================================
+ Hits         1480     1488       +8     
- Misses        523      635     +112     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 37 to 50
#
# ORNLDAAC no longer has any on-prem collections. This returns 0 collections:
# https://cmr.earthdata.nasa.gov/search/collections?data_center=ORNL_DAAC&cloud_hosted=false
# The following is commented out because the test in this file will now always fail
# because there are no longer any on-prem collections.
#
# {
# "short_name": "ORNLDAAC",
# "collections_count": 100,
# "collections_sample_size": 3,
# "granules_count": 100,
# "granules_sample_size": 2,
# "granules_max_size_mb": 50,
# },
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to remove this completely?

Comment on lines 37 to 50
#
# ORNLDAAC no longer has any on-prem collections. This returns 0 collections:
# https://cmr.earthdata.nasa.gov/search/collections?data_center=ORNL_DAAC&cloud_hosted=false
# The following is commented out because the test in this file will now always fail
# because there are no longer any on-prem collections.
#
# {
# "short_name": "ORNLDAAC",
# "collections_count": 100,
# "collections_sample_size": 2,
# "granules_count": 100,
# "granules_sample_size": 2,
# "granules_max_size_mb": 50,
# },
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to remove this completely?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, yeah. This is ringing a bell, and if I wrote this, I dislike past-me's decision to leave anything here 😆 We can put the details in a commit message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What details would you like in the commit message? Just something that indicates that ORNLDAAC no longer has any on-prem collections?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant to say that (if this comment were my doing, which I think it was but haven't git blamed yet) I should have included the details in this comment in the commit message instead, and deleted the comment :)

Copy link
Collaborator Author

@chuckwondo chuckwondo Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I commented out that code, specifically so anybody reviewing this PR would see it and could comment on it. It worked :-)

I'll update the PR by now removing the commented out code and putting the details into the commit message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad. I didn't even look at the full diff yet and did a drive-by on the "conversation" tab 😆 I ran in to the same problem in a past hackathon, which is why I think it looked so familiar :) Sorry for adding confusion! 😆

@chuckwondo chuckwondo marked this pull request as ready for review September 23, 2024 10:24
@chuckwondo
Copy link
Collaborator Author

Apologies for the relatively large PR. I could perhaps split out the changes for #743, if anybody feels this is too large.

I would at least like to keep the changes addressing both #480 and #806 together since the changes for #806 effectively necessitate the changes for #480.

This also now fixes #815. It made sense for me to add the changes for that issue so that I could actually run integration tests locally since this issue is precisely for handling integration tests.

Regardless, integration tests now run successfully on PRs from forks. For PRs from forks owned by those of use who are maintainers of this repo, the integration tests run successfully (barring actual failing tests, of course) without intervention, as we have write permission.

For PRs from forks owned by anybody without write access to this repo, the integration tests job will now fail due to a permission check, not due to failure to read the necessary secrets, but now (once this PR is merged) it will be possible for a maintainer to check the PR, and if all looks safe, re-run the failed job. When this happens, the job re-runs with the maintainer's permissions, and thus the integration tests will run (and ideally succeed as well).

@chuckwondo chuckwondo marked this pull request as draft September 23, 2024 20:08
@chuckwondo chuckwondo marked this pull request as ready for review September 24, 2024 22:30
@chuckwondo
Copy link
Collaborator Author

@mfisher87, @betolink, @jhkennedy, I've merged the recent changes from the main branch and resolved some conflicts and a broken integration test (previously hidden due to not being run previously). I've taken it out of draft, so please review when you have some time.

@mfisher87
Copy link
Collaborator

I'm getting really busy in my last week at NSIDC, may not be able to make time until after 😬 I'll be at hack day though, maybe we can fit this in then!

@chuckwondo
Copy link
Collaborator Author

I'm getting really busy in my last week at NSIDC, may not be able to make time until after 😬 I'll be at hack day though, maybe we can fit this in then!

I'll be at SatCamp during the Oct. 1 hack day, so I'll miss the hack day.

Although I touched quite a few files, there's really not much to this PR, so I'm happy for us to continue through it async. There are no gnarly bits of code.

@mfisher87
Copy link
Collaborator

Shoot, hope to catch you at the next one :)

Although I touched quite a few files, there's really not much to this PR, so I'm happy for us to continue through it async. There are no gnarly bits of code.

👍

@betolink
Copy link
Member

Looking through this PR, so far so good. @chuckwondo

@chuckwondo
Copy link
Collaborator Author

Looking through this PR, so far so good. @chuckwondo

Thanks @betolink. Did you get a chance to finish reviewing?

Copy link
Member

@betolink betolink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR!!! a lot of code simplification and quality improvements, I just left a few minor comments and some questions. My email is not a typo, I wanted the one with a K but someone had it already 😄. I really like what you did on the tests mocking the missing netrc etc. 🚀

@@ -78,8 +83,6 @@ jobs:
env:
EARTHDATA_USERNAME: ${{ secrets.EDL_USERNAME }}
EARTHDATA_PASSWORD: ${{ secrets.EDL_PASSWORD }}
EARTHACCESS_TEST_USERNAME: ${{ secrets.EDL_USERNAME }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why this was here, a meta test about environment variables being present I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I searched through the code and found where this appeared, and the logic did not appear to be doing anything in this regard, so I eliminated its code use as well. I suspect this was perhaps originally a way to configure 2 sets of creds locally, where 1 set was someone's "real" creds (the vars without TEST in the names), and a second set for "test" creds, so that when running integration tests locally, you could avoid using your "real" creds. However, I don't see any real value in that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work on the docs!

@@ -46,7 +47,9 @@
"download",
"auth_environ",
# search.py
"DataGranule",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for testing purposes or we want users use these classes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we return a list of them to the user from the public function search_data, the user should have access to the type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the user could import from the search module, but for consistency with the plural names being at the top level, along with the functions themselves, it makes sense to include this as well.

otherwise, the path of the platform-specific default location:
`~/_netrc` on Windows systems, `~/.netrc` on non-Windows systems.
"""
sys_netrc_name = "_netrc" if platform.system() == "Windows" else ".netrc"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This distinction is a good catch! I've never used the library on Windows, I wonder if someone could share an example with us.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not urs_cookies_path.exists():
urs_cookies_path.write_text("")

# Create and write to .dodsrc file
dodsrc_path = Path.home() / ".dodsrc"

if not dodsrc_path.exists():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for OPeNDAP? amazing!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so, but this was existing code. I just added a bit of whitespace for readability.

) -> List[EarthAccessFile]:
def multi_thread_open(data: tuple) -> EarthAccessFile:
urls, granule = data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think urls is correct on this one, the relation is a granule can have many files/urls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tuple is of type tuple[str, Optional[DataGranule]], so the str is a singular url. It comes from the mapping returned by _get_url_granule_mapping (as well as a couple of other places using Mapping[str, Optional[DataGranule]]).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refined the type of the data argument to reflect this.

pyproject.toml Outdated
description = "Client library for NASA Earthdata APIs"
authors = [
{name = "earthaccess contributors"}
]
maintainers = [
{name = "Luis Lopez", email = "[email protected]"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a typo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! I'll restore it.

@@ -29,3 +33,31 @@ def pytest_sessionfinish(session, exitstatus):
failure_rate = (100.0 * session.testsfailed) / session.testscollected
if failure_rate <= ACCEPTABLE_FAILURE_RATE:
session.exitstatus = 99


@pytest.fixture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah this is better.

if "podaac-tools.jpl.nasa.gov/drive" in url:
return False
return True
return all("podaac-tools.jpl.nasa.gov/drive" not in url for url in data_links)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Down the road I think we should create a regex dictionary with servers we know work without custom auth loops etc, related to this when we use S3://granule-url users have to tell this is PODAC, maybe the same approach will work to get the credentials from whatever DAAC or mission endpoint controls access for the URL.

Copy link

github-actions bot commented Oct 5, 2024

Binder 👈 Launch a binder notebook on this branch for commit 1fe472c

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit 17542d7

Binder 👈 Launch a binder notebook on this branch for commit ab066bc

Copy link
Member

@betolink betolink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the PR again, great refactoring and code improvements!

```

Nox handles everything for you, including setting up a temporary virtual
environment for each run.

**NOTE:** In order to run integration tests locally, you must set the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@chuckwondo chuckwondo merged commit c8d6838 into nsidc:main Oct 6, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants