Implement a GDAC ftp fetcher #157

Merged
merged 197 commits into from
Apr 13, 2022
gmaze
Member

@gmaze gmaze commented Nov 29, 2021

This PR adds the long-requested GDAC ftp fetcher.

It has many commits because implementing the GDAC ftp fetcher required an entirely new index store (based on #113) and a new pytest design.

The new pytest design turned out to be very slow, so I had to redesign the CI tests as well 😭

@gmaze gmaze added enhancement New feature or request backends labels Nov 29, 2021
This is a possible solution, but it performs poorly
Need to avoid loading mono-profile files when fetching data for a given float; should work with multi-profile files instead
Have a nice trip !
Add logging
Raise an error when no data are found, even though the server does not raise an error on its own
- BGC index file option
- Change to multi-profile files for requests leading to more than 50 mono-profile files, hence adding post-processing filtering (``filter_points``) to reinforce search request criteria
New InvalidDataset
Improved logging
- Catch HTTP client response error 413 (Payload Too Large)
- ``open_json`` now returns None instead of an empty list when no data are returned
- ``_mfprocessor_json`` raises an error if ``_mfprocessor_json`` returns None
- Improve logging
Allow distributed.client.Client in ``open_mfdataset``
@gmaze gmaze added this to the Go from alpha to beta milestone Jan 14, 2022
Move generic methods to Proto
Fix box definitions to find data from all sources and modes
Refactoring some methods from pyarrow implementation up to proto (prepare for pandas implementation refactoring to come)
Fix DataNotFound message
@gmaze gmaze requested a review from quai20 April 1, 2022 14:15
- rename option "gdac_ftp" to "ftp"
- fix bug in options
- new deprecated utility
- deprecate ``localftp`` fetchers
- misc docstring improvements
- updated doc
- fix bug in erddap
fix bug causing data fetching when not necessary
- cache was not used because _same_origin always returned False ...
@gmaze gmaze removed the request for review from quai20 April 6, 2022 08:44
- fix issue with indexstore access to cached files for search
- implement new Registry class utility to ease internal management of specific list of items
@gmaze gmaze mentioned this pull request Apr 7, 2022
- changed src from localftp to gdac in plotters tests
- add tests for new Registry and float_wmo utilities
- make errors option consistent between Registry, float_wmo, is_wmo and is_cyc with 'raise', 'warn', and 'ignore'
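The three-way ``errors`` option named in the last commit message can be pictured with the minimal sketch below. It is not the argopy implementation: `handle_invalid` and `looks_like_wmo` are hypothetical names, and the rule that a valid identifier is a 5 or 7 digit number is an assumption made for illustration only.

```python
import warnings


def handle_invalid(message, errors="raise"):
    """Apply one shared policy to an invalid input (hypothetical helper)."""
    if errors == "raise":
        raise ValueError(message)
    if errors == "warn":
        warnings.warn(message)
    elif errors != "ignore":
        raise ValueError("errors must be 'raise', 'warn' or 'ignore'")


def looks_like_wmo(value, errors="raise"):
    """Return True if value looks like a float WMO number.

    Assumption for this sketch: a valid identifier is a 5 or 7 digit number.
    """
    text = str(value)
    ok = text.isdigit() and len(text) in (5, 7)
    if not ok:
        handle_invalid(f"'{value}' is not a valid WMO number", errors)
    return ok


print(looks_like_wmo(6903076))              # valid, no policy involved
print(looks_like_wmo(12, errors="ignore"))  # invalid, silently reported as False
```

Routing every validator through a single helper is one way to keep the three modes behaving identically across utilities.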
@gmaze
Member Author

gmaze commented Apr 8, 2022

Hopefully we're finally good to go 👍🏻

The new data fetcher API:

import argopy  # needed for argopy.tutorial below
from argopy import DataFetcher
argo = DataFetcher(src='gdac')
argo = DataFetcher(src='gdac', ftp="https://data-argo.ifremer.fr")  # HTTP server: the default, and the fastest
argo = DataFetcher(src='gdac', ftp="ftp://ftp.ifremer.fr/ifremer/argo")  # works with an FTP server
argo = DataFetcher(src='gdac', ftp=argopy.tutorial.open_dataset("localftp")[0])  # or with a local GDAC folder

This also works with the IndexFetcher.
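The ``ftp`` option above accepts three kinds of value: an HTTP URL, an FTP URL, or a local folder. A fetcher has to tell them apart somehow; the sketch below shows one plausible way, using only the standard library. `guess_protocol` is a hypothetical helper written for this illustration, not a function from argopy, and the local path in the usage lines is made up.

```python
from urllib.parse import urlparse


def guess_protocol(host):
    """Classify a GDAC host string as 'http', 'ftp' or 'local' (hypothetical helper)."""
    scheme = urlparse(str(host)).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "ftp":
        return "ftp"
    # Anything without a recognised URL scheme is treated as a path on disk
    return "local"


for host in ("https://data-argo.ifremer.fr",
             "ftp://ftp.ifremer.fr/ifremer/argo",
             "/home/user/gdac"):  # made-up local path
    print(host, "->", guess_protocol(host))
```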

The new fetcher can return the index without downloading the data:

argo = DataFetcher(src='gdac').float(6903076)
argo.index

This relies on the new index store:

from argopy.stores.argo_index_pa import indexstore_pyarrow as indexstore
# or if you don't have/want pyarrow:
# from argopy.stores.argo_index_pd import indexstore_pandas as indexstore

idx = indexstore(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt")  # Default
idx.load()
idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])
idx.N_MATCH  # Return number of search results
idx.to_dataframe()  # Convert search results to a dataframe
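To make the box search above concrete, here is a small standard-library sketch of the same idea applied to a toy index. It is not the index store code: `search_box` is a hypothetical function, the rows and file paths are invented, and the box order [lon_min, lon_max, lat_min, lat_max, date_min, date_max] is assumed from the values used in the call above.

```python
from datetime import datetime

# Toy index rows: (file, date as YYYYMMDDHHMMSS, latitude, longitude).
# All values are invented for illustration.
ROWS = [
    ("aoml/1900001/profiles/D1900001_001.nc", "20070815120000", 42.0, -57.5),
    ("aoml/1900001/profiles/D1900001_002.nc", "20071001120000", 42.5, -57.0),
    ("coriolis/6900001/profiles/R6900001_010.nc", "20070820060000", 10.0, -57.0),
]


def search_box(rows, box):
    """Return the files whose position and date fall inside the box.

    Assumed box order: [lon_min, lon_max, lat_min, lat_max, date_min, date_max].
    """
    lon_min, lon_max, lat_min, lat_max = box[:4]
    t_min = datetime.strptime(box[4], "%Y-%m-%d")
    t_max = datetime.strptime(box[5], "%Y-%m-%d")
    hits = []
    for path, date, lat, lon in rows:
        when = datetime.strptime(date, "%Y%m%d%H%M%S")
        inside_space = lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
        if inside_space and t_min <= when <= t_max:
            hits.append(path)
    return hits


matches = search_box(ROWS, [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"])
print(len(matches))  # number of matching profiles, analogous to idx.N_MATCH
```

Only the first row survives here: the second falls outside the date range and the third outside the latitude range.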

@gmaze gmaze requested a review from quai20 April 8, 2022 06:39
Member

@quai20 quai20 left a comment

👍

@gmaze gmaze merged commit 4480503 into master Apr 13, 2022
@gmaze gmaze deleted the gdac-ftp-fetcher branch April 13, 2022 08:34