Rework intro to Section 8: Accuracy & precision #330

Closed
erget opened this issue Jun 15, 2021 · 5 comments
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@erget
Member

erget commented Jun 15, 2021

@JonathanGregory @AndersMS @davidhassell @oceandatalab (Sylvain) FYI

In pursuing #327, @JonathanGregory raised the following text (#327 (comment)). We will not pursue it in the course of #327, but it should nonetheless be captured so that it can be referenced and addressed separately. I've quoted it below:

In the first paragraph of Sect 8 we distinguish three methods of reducing dataset size. I would suggest minor clarifications:

There are three methods for reducing dataset size: packing, lossless compression, and lossy compression. By packing we mean altering the data in a way that reduces its precision (but has no other effect on accuracy). By lossless compression we mean techniques that store the data more efficiently and result in no loss of precision or accuracy. By lossy compression we mean techniques that store the data more efficiently and retain its precision but result in some loss in accuracy.
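To make the packing definition concrete, here is a minimal sketch, assuming the usual CF `scale_factor` / `add_offset` scheme and 16-bit integer storage. The temperature values and the helper names `pack` and `unpack` are hypothetical and chosen only for illustration; this is not taken from the conventions text.

```python
import struct

def pack(values, scale_factor, add_offset):
    """Map floats to integers: packed = round((value - add_offset) / scale_factor)."""
    return [round((v - add_offset) / scale_factor) for v in values]

def unpack(packed, scale_factor, add_offset):
    """Reverse the mapping: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]

# Hypothetical temperatures in kelvin (illustrative values only).
values = [271.312, 273.15, 288.904, 301.477]
scale_factor = 0.01   # assumed precision to keep: 0.01 K
add_offset = 273.15   # assumed offset near the middle of the data range

packed = pack(values, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)

# Storage cost: 2 bytes per value as int16 versus 8 bytes per value as float64.
packed_bytes = struct.pack(f"<{len(packed)}h", *packed)
float_bytes = struct.pack(f"<{len(values)}d", *values)

# The reconstruction error is bounded by half the scale factor.
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(len(float_bytes), len(packed_bytes), max_error)
```

In this sketch the storage drops by a factor of four, and the only effect on the data is that each value is rounded to the nearest multiple of `scale_factor`.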

Then I think we could start a new paragraph with "Lossless compression only works in certain circumstances ...". By the way, isn't it the case that HDF supports per-variable gzipping? That wasn't available in the old netCDF data format for which this section was first written, so it's not mentioned, but perhaps it should be now.
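As a rough illustration of why lossless compression "only works in certain circumstances", and of what compressing each variable separately means, the following sketch uses Python's standard `zlib` module (the DEFLATE algorithm that gzip is built on). It does not use the netCDF or HDF5 libraries; the variable names and data are hypothetical.

```python
import random
import struct
import zlib

def compress_variable(values, level=6):
    """Serialize one variable as float64 and compress it on its own."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return raw, zlib.compress(raw, level)

random.seed(0)
# Two hypothetical variables, each compressed independently of the other.
variables = {
    "land_mask": [float(i % 2) for i in range(1000)],  # highly repetitive
    "noise": [random.random() for _ in range(1000)],   # little redundancy
}

results = {}
for name, values in variables.items():
    raw, compressed = compress_variable(values)
    restored = list(struct.unpack(f"<{len(values)}d", zlib.decompress(compressed)))
    # Lossless: the round trip reproduces every value exactly.
    assert restored == values
    results[name] = (len(raw), len(compressed))

for name, (raw_size, compressed_size) in results.items():
    print(name, raw_size, compressed_size)
```

The repetitive variable shrinks substantially while the noisy one barely changes, and in both cases the restored values are identical to the originals, so neither precision nor accuracy is affected.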

@JonathanGregory
Contributor

Dear Daniel @erget

I think it's fine to make this a separate issue, but in that case, will you leave that paragraph unchanged (as in version 1.8) in #327?

Jonathan

@erget
Member Author

erget commented Jun 22, 2021

@JonathanGregory I'd pass this question on to @AndersMS, who is doing the lion's share of the work on this topic. I currently don't have an overview of whether you had other comments that would touch that paragraph, but we're addressing them systematically there, so anything directly relevant to #327 belongs there. With respect to the comment quoted above: as packing and compression, both lossy and lossless, are already part of the standard, I would see that as a separate improvement from those proposed in #327.

@AndersMS
Contributor

Hi @JonathanGregory,

We propose to change the first paragraph in #327 as you suggested. Additionally, we have opened the present issue #330 to address per-variable gzipping and to verify that the terms precision and accuracy are used correctly.

Would you support that or would you prefer that we keep the paragraph as is for #327?

Anders

@JonathanGregory
Contributor

Dear @AndersMS and Daniel @erget

I see. I misunderstood. If you think my version of the revised paragraph is OK, it's fine to include it in your pull request for #327. I agree that my question about gzipping is a separate point, not related to #327. If that's the subject of this issue, it makes sense to me.

Thanks, Jonathan

@davidhassell davidhassell added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Jun 28, 2021
@JonathanGregory
Contributor

Fixed by #326 of @AndersMS and @erget, thanks.


4 participants