simpler file navigation utilities #8
Comments
Hi @BillMills, the WOD format puts limitations on what we can do with regard to this kind of thing. We can probably tighten things up a bit, but I think the main thing will be to improve the documentation.
|
G'day guys, just a quick one to say that it would be great to see some improved doco around this. Perhaps a few example scripts could be added to the repo that show common tasks (e.g. iterating through profiles in a file to populate a netCDF file)? I'll try to dig into the code a bit over the next few days and if I can contribute anything I'll send through some pull requests (although I am a bit of a Python noob!). Cheers |
@s-good ah yes, I see what you're saying; there's no unambiguous flag denoting the start of a profile (there is for WOD01, 05, 09 and 13, since they all start with A, B or C after a newline, but WOD98 blows it). I think the reason for
Where one might have imagined that middle call to skip a profile, no profile is skipped since
I haven't dug into the other functions, but I suspect there are similar stories for the ones with surprising behavior. But anyway, I think you're right: putting underscores in front of all of these and discouraging end users from poking at them is the easiest and most robust solution.
@rowanwins that would be awesome! @s-good is correct that a number of the functions I described exist, but as I discuss above, they rely on being used in a very particular context. The canonical usage example is:
I'd be delighted to get some PRs with example usage - you've got the right idea, to think of the simplest relevant minimal working examples, so that |
Hmmm well I'm still not getting very far on the looping front I'm afraid, I can get through the first two profiles but then the third record freaks out
The error seems to be something along these lines...
:( |
Have a look at the end of your datafile and check:
The wod spec requires ascii wod files to consist exclusively of 80-character lines terminated in a newline character - which is easy to miss! I can reproduce your error iff I remove my trailing whitespace. |
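The line-length requirement described above can be checked mechanically before handing a file to wodpy. The following is a minimal sketch, not part of wodpy: the function name `find_bad_wod_lines` and the default width of 80 are illustrative assumptions taken from the description in this comment, and the check tolerates Windows-style `\r\n` endings.

```python
def find_bad_wod_lines(path, width=80):
    """Return (line_number, length, terminated) for every suspicious line.

    A line is flagged when its content is not exactly `width` bytes long
    or when it is not terminated by a newline (only possible for the
    final line of the file).
    """
    bad = []
    # Binary mode avoids any newline translation; iteration splits on b"\n".
    with open(path, "rb") as fid:
        for number, line in enumerate(fid, start=1):
            terminated = line.endswith(b"\n")
            # Drop the line terminator only; padding spaces are kept.
            body = line.rstrip(b"\r\n")
            if not terminated or len(body) != width:
                bad.append((number, len(body), terminated))
    return bad
```

An empty result means every line has the expected width and the file ends with a newline; a flagged final line with `terminated` set to `False` corresponds to the missing trailing newline mentioned above.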
OK, so I had a look at that and all seemed fine. Rather, it appears I'm somehow being foiled by the WODselect download tool: I tried again with one of the pre-canned geographic datasets and it worked fine. My completely uneducated guess is that when I run my WODselect query the resulting dataset is too large, so it's being split over multiple files (e.g. file1.dat, file2.dat, file3.dat). I suspect I somehow need to append these files back together before passing them into wodpy. All a good learning experience; it's as much about the WOD download tools as it is about wodpy! |
Hi @rowanwins, the WODselect tool does split the data into multiple files but I think that each file should be readable without appending them together. @BoyerWOD, do you have any advice? @BillMills, sorry to be pedantic but I think that advance_file_position_to_next_profile can only work as it currently does because it is a method of an instance of WodProfile. Its function is to move the file pointer to the end of the data record corresponding to that instance of WodProfile in the data file, and it wouldn't make sense for it to do anything else. Maybe we need to start a utils module that does file manipulations that are not tied to a particular profile. |
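One shape such a profile-independent helper could take is a generator that hides the file-pointer bookkeeping. This is only a sketch under stated assumptions: `iter_profiles` is a hypothetical name, and it assumes that constructing a profile from an open file reads exactly one record and leaves the pointer at the start of the next, and that the profile object offers `is_last_profile_in_file(fid)` as discussed in this thread. It does not handle an empty file.

```python
def iter_profiles(fid, profile_cls):
    """Yield profile objects one at a time until the file is exhausted.

    Assumptions (not verified against wodpy here):
      * profile_cls(fid) reads one profile from the current file position
        and leaves the pointer at the start of the next profile;
      * the resulting object exposes is_last_profile_in_file(fid).
    """
    while True:
        profile = profile_cls(fid)
        yield profile
        if profile.is_last_profile_in_file(fid):
            return


# Hypothetical usage with wodpy ("example.dat" is a placeholder file name):
#     from wodpy import wod
#     with open("example.dat") as fid:
#         for profile in iter_profiles(fid, wod.WodProfile):
#             ...  # work with one profile at a time
```

Because the class is passed in as an argument, the same loop can be exercised with a small stand-in class in unit tests, and files split by WODselect can be processed one after another without appending them together.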
@s-good I think you're right - long term, a module along those lines could be good; might even be worth thinking about smart ways to unpack into a database there, like our recent discussion with Gui, potentially after AutoQC 1.0. |
Hi IQuOD/wodpy people. Firstly, thanks for all your efforts on these packages. Great to see. Hopefully this is an appropriate question: WODselect has many options. Is there any guidance on what settings work best with wodpy? I intend to give it a go and see if I can build some tools with wodpy as the base. Thanks. [edit] |
Right now wodpy works only with the WOD native ASCII option. I do not know if a netCDF option will be added, but this would be a nice feature. Other options:
I think that covers all relevant options. Let me know if I missed any. Thanks, |
Hi Tim, thanks for the quick and very useful response - much appreciated. I guess when starting out building some tools in Python the question is: should one start from a base of (A) WODselect files with IQuOD flags and wodpy or (B) IQuOD netCDF files with something else (like xarray)? [This is largely a rhetorical question - I will work through these options.] |
Yes, xarray would work. There is still the need to read the netCDF files into xarray and make sure the depths relate to the other measured variables, even though the other variables may have different dimensions. The netCDF files produced by WODselect are IQuOD (and WOD) netCDF files. The full set of IQuOD and WOD contiguous ragged array files as they currently stand can be found through their landing pages on a THREDDS server:
Tim |
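To make the ragged-array point concrete: in a contiguous ragged array layout, all observations are stored in one flat sequence and each cast carries a count of how many of them belong to it. The sketch below shows how those counts translate into per-cast slices. It uses plain Python lists rather than xarray, and the function and argument names are illustrative assumptions, not names taken from the WOD or IQuOD files.

```python
from itertools import accumulate


def split_by_row_size(values, row_sizes):
    """Split a flat per-observation sequence into one list per cast.

    `row_sizes[i]` is the number of consecutive entries of `values`
    that belong to cast i (a cast may have zero entries).
    """
    if sum(row_sizes) != len(values):
        raise ValueError("row sizes do not add up to the number of observations")
    # Running totals give the end offset of each cast; the start of a cast
    # is the end of the previous one.
    ends = list(accumulate(row_sizes))
    starts = [0] + ends[:-1]
    return [list(values[start:end]) for start, end in zip(starts, ends)]
```

If depth and another variable have different dimensions, each would need to be split with its own set of counts, which is why the relationship between them has to be checked rather than assumed.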
Hi @Thomas-Moore-Creative @BoyerWOD - re: netcdf compatibility, sure, I’m game to implement this for WOD data if there’s demand for it. To get things started, it’d be helpful if you provided a netcdf profile or two, with a description of the ‘right answer’ - what it should decode to, so we can build unit tests to validate this properly. Make that available and netcdf support can be next up. |
Thanks for the reply Bill. It's great having someone with your IT experience working with oceanographers - hopefully I / we don't frustrate you with our poor programming practices! =) I don't yet have a 'right answer' for my current task. I'm waiting for that myself - and it's very specific to our current uses. And I'm not looking at NC "profile" files but the yearly files recently released by IQuOD. What I'm currently working on is sucking in 30 years of observations at one time (about 400M obs and 1M casts), creating some xarray datasets and / or pandas dataframes that make sense given the different dimensions, and writing some basic tools that allow me to slice & dice by typical things like time, space, and flags. Tim and I are just working through some of the questions I have about what I'm seeing in the data that doesn't make immediate sense to me. BUT - this is not meant to discourage following your suggestions above. |
At the risk of getting off-topic here is the toy analogous problem I worked through (VERY SIMPLE problem and approach) to get some code that can help me "merge" the "casts" data with the "obs" data in the current crop of v0.1 IQuOD datasets. I'm not sure how useful it is for others but I'm pushing code up to a public repo in the chance it's useful or might spark discussion > https://github.com/Thomas-Moore-Creative/IQuOD_scratch/blob/master/Toy_problem_merge_by_row_size.ipynb |
@Thomas-Moore-Creative thanks for sharing, and sorry for the slow reply - what you've got there is almost an outer join between your two input tables, but it assumes some things about the ordering of rows in your dataframes, specifically that if the first child has n pets, those are the first n rows of the pets table, etc. This makes me nervous, since if anyone ever re-sorts those tables, information gets lost. Would it be possible to introduce a foreign key into one table or another? Then:
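As a hedged illustration of the foreign-key idea (a sketch, not the snippet originally posted here): if every row of the pets table records which child it belongs to, the join no longer depends on row order. The names `join_on_key`, `owners`, `pets` and `owner_id` are hypothetical, and plain dictionaries stand in for dataframe rows; in pandas the same kind of join would typically be done with `DataFrame.merge` on the key column.

```python
def join_on_key(owners, pets, key):
    """Attach each pet row to its owner row by matching the `key` column.

    Rows are plain dicts. The result has one combined row per pet and
    does not depend on the order of either input table.
    """
    # Index the owner table by its key for direct lookup.
    by_key = {row[key]: row for row in owners}
    joined = []
    for pet in pets:
        owner = by_key.get(pet[key])
        if owner is None:
            raise KeyError(f"no owner row with {key}={pet[key]!r}")
        # Combine the two rows; the shared key has the same value in both.
        joined.append({**owner, **pet})
    return joined
```

Re-sorting either table changes only the order of the output rows, not which pet is paired with which owner, which addresses the information-loss concern raised above.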
|
Regarding #6, it's not as simple as it could be for a user to walk around the file. For example, is_last_profile_in_file works in a guessable way on a WodProfile class, but advance_file_position_to_next_profile appears to do nothing, thanks (I think) to some pass-by-value-ism in how Python is thinking of the fid variable.
It'd be nice to have a set of functions with the semantic meaning:
Much of this functionality already exists, but as detailed above, isn't totally obvious in usage.