Implement a GDAC ftp fetcher #157
Merged
Conversation
This is a possible solution, but it fails on performance. We need to avoid loading mono-profile files when fetching data for a given float; this requires working with multi-profile files instead.
Have a nice trip!
- Add logging
- Raise an error when no data are found, even though the server does not raise an error on its own
- BGC index file option
- Switch to multi-profile files for requests leading to more than 50 mono-profile files, hence adding post-processing filtering (``filter_points``) to re-enforce the search request criteria
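The post-processing filtering idea above can be sketched as follows. ``filter_points`` is named after the option in this PR, but the box layout and implementation below are purely illustrative, not argopy's actual code:

```python
# Illustrative sketch only: after switching to multi-profile files, the
# fetched points can spill outside the requested box, so the search
# criteria are re-applied point by point.

def filter_points(points, box):
    """Keep only (lon, lat) points inside [lon_min, lon_max, lat_min, lat_max]."""
    lon_min, lon_max, lat_min, lat_max = box
    return [
        (lon, lat)
        for lon, lat in points
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
    ]

box = [-60, -55, 40.0, 45.0]
points = [(-58.0, 42.0), (-70.0, 42.0), (-58.0, 50.0)]
print(filter_points(points, box))  # [(-58.0, 42.0)]
```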
New ``InvalidDataset``
Improved logging
- Catch HTTP client response error 413 (Payload Too Large)
- ``open_json`` now returns None instead of an empty list when no data are returned
- ``_mfprocessor_json`` raises an error if ``open_json`` returns None
- Improve logging
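A minimal stand-alone sketch of that error-handling contract — the network call is stubbed out, and all names besides ``open_json`` are hypothetical:

```python
import logging

class PayloadTooLarge(Exception):
    """Stand-in for an HTTP client response error with status 413."""

def open_json(url):
    # Placeholder for the real network call: rejects oversized requests,
    # and returns None (not an empty list) when nothing is found.
    if "too_big" in url:
        raise PayloadTooLarge(url)
    if "empty" in url:
        return None
    return {"url": url}

def fetch_all(urls):
    """Collect results, skipping 413 rejections with a warning; return
    None instead of an empty list when no data came back."""
    data = []
    for url in urls:
        try:
            result = open_json(url)
        except PayloadTooLarge:
            logging.warning("413 Payload Too Large, request must be split: %s", url)
            continue
        if result is not None:
            data.append(result)
    return data or None
```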
Allow ``distributed.client.Client`` in ``open_mfdataset``
Move generic methods to Proto
Fix box definitions to find data from all sources and modes
Refactor some methods from the pyarrow implementation up to proto (prepares for the pandas implementation refactoring to come)
Fix DataNotFound message
- Rename option ``gdac_ftp`` to ``ftp``
- Fix bug in options
- New ``deprecated`` utility
- Deprecate ``localftp`` fetchers
- Misc docstring improvements
- Updated doc
- Fix bug in erddap
Fix bug causing data to be fetched when not necessary
- the cache was not used because ``_same_origin`` always returned False ...
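A sketch of what an origin check like ``_same_origin`` has to do: two URLs share an origin when scheme and network location match, which is what lets cached files from the same server be reused. The implementation below is illustrative, not argopy's actual code:

```python
from urllib.parse import urlparse

def same_origin(url_a, url_b):
    """Two URLs share an origin when their scheme and host:port match."""
    a, b = urlparse(url_a), urlparse(url_b)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)

print(same_origin("https://data-argo.ifremer.fr/ar_index_global_prof.txt",
                  "https://data-argo.ifremer.fr/dac/"))       # True
print(same_origin("https://data-argo.ifremer.fr/",
                  "ftp://ftp.ifremer.fr/ifremer/argo"))       # False
```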
- Fix issue with indexstore access to cached files for search
- Implement new ``Registry`` class utility to ease internal management of specific lists of items
- Changed ``src`` from ``localftp`` to ``gdac`` in plotters tests
- Add tests for new ``Registry`` and ``float_wmo`` utilities
- Make the ``errors`` option consistent between ``Registry``, ``float_wmo``, ``is_wmo`` and ``is_cyc``, with 'raise', 'warn', and 'ignore'
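The 'raise'/'warn'/'ignore' convention can be sketched with a toy validator. The function name and the digit-count rule below are illustrative simplifications, not argopy's real ``is_wmo`` check:

```python
import warnings

def check_wmo(value, errors="raise"):
    """Illustrative validator following the 'raise'/'warn'/'ignore'
    convention: treats a WMO as valid when it is an integer with
    5 to 7 digits (a simplification for the sketch)."""
    valid = isinstance(value, int) and 5 <= len(str(value)) <= 7
    if valid:
        return True
    if errors == "raise":
        raise ValueError(f"Invalid WMO: {value!r}")
    if errors == "warn":
        warnings.warn(f"Invalid WMO: {value!r}")
    return False

print(check_wmo(6903076))                    # True
print(check_wmo("oops", errors="ignore"))    # False
```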
Hopefully we're finally good 👍🏻 to go

The new data fetcher API:

```python
from argopy import DataFetcher

argo = DataFetcher(src='gdac')
argo = DataFetcher(src='gdac', ftp="https://data-argo.ifremer.fr")  # http server, the default and fastest!
argo = DataFetcher(src='gdac', ftp="ftp://ftp.ifremer.fr/ifremer/argo")  # works with ftp
argo = DataFetcher(src='gdac', ftp=argopy.tutorial.open_dataset("localftp")[0])  # or with a local GDAC folder
```

This works also with the IndexFetcher.

The new fetcher can return the index without downloading the data:

```python
argo = DataFetcher(src='gdac').float(6903076)
argo.index
```

This relies on the new index store:

```python
from argopy.stores.argo_index_pa import indexstore_pyarrow as indexstore
# or if you don't have/want pyarrow:
# from argopy.stores.argo_index_pd import indexstore_pandas as indexstore

idx = indexstore(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt")  # Default
idx.load()
idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])
idx.N_MATCH  # Return number of search results
idx.to_dataframe()  # Convert search results to a dataframe
```
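For readers without argopy at hand: the GDAC index file is a plain CSV with `#`-prefixed header lines, so a bare-bones read with pandas alone looks like this. The column names follow the ``ar_index_global_prof.txt`` format; the data row below is made up for illustration:

```python
import io
import pandas as pd

# Illustrative stand-in for ar_index_global_prof.txt; the data row is fake.
index_txt = io.StringIO(
    "# Title : Profile directory file of the Argo GDAC\n"
    "file,date,latitude,longitude,ocean,profiler_type,institution,date_update\n"
    "aoml/1900000/profiles/R1900000_001.nc,20070815120000,42.0,-58.0,A,846,AO,20220101000000\n"
)
df = pd.read_csv(index_txt, comment="#")  # comment lines are skipped
print(df.shape)  # (1, 8)
```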
quai20 approved these changes on Apr 12, 2022
👍
Well, this simply adds a long-time-required GDAC ftp fetcher.
This PR has many commits because implementing the GDAC ftp fetcher required a fully new index store (based on #113) and a new pytest design.
But the new pytest design was very very slow, so I had to redesign CI tests as well 😭