Presubmission Inquiry for sciform (float -> string scientific formatting) #114
Welcome @jagerber48 and thanks so much for your detailed pre-submission inquiry. After a first read-through, sciform looks like it could be in scope for a pyOpenSci review, but we'd like to ask you for a little more information.
|
Hello @NickleDave, thanks for your response!
I can include the tabulate and matplotlib use cases in the documentation. I think those would be illustrative use cases people could look at. |
Hey @jagerber48!! 👋 Welcome! I just had a question not related to this specific review. What on that form would make it clearer that Pangeo is an optional thing? We have an affiliated partner program and that checkbox just allows someone to ALSO become Pangeo affiliated, but it's not a requirement. How could we make that clearer? You are not the first person to be confused by that!! Also, I'm wondering if this tool would really be a support tool for reproducible reports (which are important to our open science goals), if it's really about printing and output. Does that type of application (reproducible reports, Jupyter notebook output, etc.) resonate with your goals for the tool? |
@lwasser About the Pangeo option from my perspective: Something like "You may optionally choose to affiliate your package with additional communities by checking the boxes below. These affiliations may come with XYZ benefits/additional requirements" Even just an "(optional)" flag may have cleared me. "If your package fits into an
"would really be a support tool for reproducible reports". What are "reproducible reports"? The tool takes python floats or float pairs and converts them to formatted (hopefully human readable) strings. There are many ways these strings could be used, it sounds like "reproducible reports" is definitely a use case that this tool could support. You mention Jupyter notebook output, that's definitely something I use it for, so I would say this does resonate with my goals for the tool. |
I've updated the documentation to include my prototypical use case: https://sciform.readthedocs.io/en/stable/examples.html. Here I am doing two visualization tasks. I have x, y data that I am fitting and extracting best fit parameters from. The first visualization task is plotting the data. The second visualization task is displaying the best fit parameters (and their uncertainties from the fit routine) in a table.
I imagine Instead of using I imagine adding an option to format strings into a "pretty" format using unicode characters and also a "latex" format similar to the |
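As a rough illustration of the table use case described above (best fit parameters shown with their uncertainties), here is a minimal standard-library sketch. The helper name `format_val_unc` and the fit values are made up for illustration; this is not sciform's API. It assumes the common convention of rounding the uncertainty to a chosen number of significant figures and then rounding the value to the same decimal place.

```python
import math


def format_val_unc(value: float, uncertainty: float, unc_sig_figs: int = 1) -> str:
    """Format a value/uncertainty pair as 'value +/- uncertainty'.

    Hypothetical helper, not part of sciform. The uncertainty is rounded to
    `unc_sig_figs` significant figures and the value to the same decimal place.
    Edge cases where rounding bumps the uncertainty up a decade
    (e.g. 0.096 -> 0.1) are not handled in this sketch.
    """
    if not math.isfinite(uncertainty) or uncertainty <= 0:
        raise ValueError("uncertainty must be a positive finite number")
    # Decimal position of the uncertainty's most significant digit.
    top_digit = math.floor(math.log10(uncertainty))
    # Decimal places to keep; negative means rounding left of the decimal point.
    decimals = unc_sig_figs - 1 - top_digit
    unc_rounded = round(uncertainty, decimals)
    val_rounded = round(value, decimals)
    shown = max(decimals, 0)
    return f"{val_rounded:.{shown}f} +/- {unc_rounded:.{shown}f}"


# Made-up fit results (name, value, uncertainty), purely illustrative.
fit_results = [
    ("amplitude", 3.14159, 0.0271),
    ("offset", 12345.678, 123.4),
]
for name, val, unc in fit_results:
    print(f"{name:<10} {format_val_unc(val, unc)}")
```

With the built-in formatting alone, the number of decimal places has to be chosen by hand for each row; tying it to the uncertainty is the part a dedicated formatter can automate.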
@NickleDave I'm curious what next steps are for this. It seems like the package is likely in scope for pyopensci. Does that mean the next step is to actually submit the package and work towards meeting those requirements? |
Hi @jagerber48 thank you for your patience--we wanted to get input from other community members about whether this package was in scope. Thank you also for updating the documentation with a use case. That is exactly the kind of concrete example that really helps users understand what you are trying to do for them. We have decided that, yes, we will proceed with a review. Please go ahead and make a full submission. Be sure to mention this issue by number when you do so ("as discussed in #114") and please be sure to complete the pre-review survey when you do make the submission. Appreciate it! Once you have opened that issue referencing this one, I will close this. We will then put out a call for an editor and reviewers. |
@NickleDave ok great, thank you for your response! I'll be going on a two week vacation starting this weekend and I haven't yet had time to make the full submission. I will work on it, as per all your instructions, when I return. |
Thanks for letting me know @jagerber48 -- no rush. Have a good vacation! |
@NickleDave I've made the full submission at #121. One question before next steps: I have a few high-level and lower-level design questions about the package. Some are about the overall architecture of the code and some are about "should I include this feature or this requirement". I'm curious if these types of questions are in scope for the code review, or if the code review should be thought of as reviewing the quality of the code and giving general advice based on the code at one snapshot in time (at one version number). I may as well mention some specific questions I have here and then you can better inform me about their appropriateness for discussion. These are the questions I have that I'm not sure are in scope for review. I also have some questions that I'm more sure are in scope for review (like should I add more unit tests, how can I improve continuous integration).
|
Hi @jagerber48, happy to help. These are all good questions to ask yourself as a developer, and I have definitely found myself pondering similar questions before. However, I can't give you a detailed answer here, because I would feel like I'm starting to review. In fact, some of these questions start to be about scope, and ideally we should not run a review just for the purpose of figuring out scope. That's something that should be determined ahead of time. We do want to help you though.
A related practice that I find helpful is to keep a "dev diary". I write down questions like this each day I do dev work, and I also prioritize my to-dos. If the same questions or ideas keep popping up, then it helps me know that I really need to prioritize working on them. I also include links to other code, papers, etc., that give me concrete examples--if I can't find anyone else who is doing what I have in mind, then that tells me something. Hope that's somewhat helpful--I'm only telling you because I wish I had gotten into this practice much sooner, along with using project management tools like GitHub Projects. Please ask these questions on our forum and let's take it from there. Let's time box that process--say, two weeks max--and then we'll start the review. |
@NickleDave thank you very much for the response, that is the sort of stuff I was looking for and is very helpful! The dev diary would definitely be helpful for me and I will look into GitHub Projects. Thank you for these pieces of advice. I asked my question about the formatting options proliferation here. That is one spot where I hope to improve the code. Perhaps this specific question about code organization/repetition is actually in scope for the review process? After typing out but not posting a new topic on the scope questions (especially the list and arithmetic features) I've decided to take the following approach. I'll start out with the most conservative approach. So the package will be strictly for formatting individual numbers or pairs of numbers with a lot of possible formatting options. No arithmetic, no sequence/array handling and no The |
That's perfect, thank you @jagerber48. The question on the forum is very well stated and I think you will get good feedback. I think you are exactly right to take a more conservative approach for now. One thing I see happen is that developers get excited about adding new features and solving the related programming problems. There's nothing wrong with that, of course. (It's one of the reasons we like doing this stuff!) But it can take time away from "road-testing" the existing functionality out in the real world. My sense is that you'll get more out of focusing on that for now.
Yes. Let's do the following:
|
I'm going to close this presubmission issue since we have the submission open. Let's continue discussion there |
Submitting Author: Justin Gerber (@jagerber48)
Package Name: sciform
One-Line Description of Package: Provides extended functionality for formatting floats into strings according to scientific standards
Repository Link (if existing): https://github.com/jagerber48/sciform
Code of Conduct & Commitment to Maintain Package
Description
Community Partnerships
We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:
Scope
Please indicate which category or categories.
Check out our package scope page to learn more about our
scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
Sciform allows for improved formatting of floats into strings according to scientific standards. These strings will be output to terminals, plots, data documents, text documents, and possibly more. Making the displayed strings more readable as per scientific standards improves the visualization of "printed number" data.
There are no existing community partnerships for this project, though there may be opportunities for education around significant figures and uncertainty.
Who is the target audience and what are the scientific applications of this package?
Any scientist who uses Python is in the potential target audience for this package, but especially those who are concerned with displaying data values in a way that is commensurate with the corresponding uncertainties. Most scientists likely use the Python built-in string formatting for this purpose, but there are some shortcomings to Python built-in formatting. Scientists who seek more formatting features could consider sciform.
Are there other Python packages that accomplish similar things? If so, how does yours differ?
Yes there are similar packages.
sciform includes its own string formatting mini language closely based on the built-in one, but with some differences. Notably, sciform includes well-controlled significant figure formatting, engineering notation, binary formatting, SI/IEC prefix substitution, digit grouping and decimal symbol options (helpful for a diversity of locales), exponent value coercion, as well as value +/- uncertainty formatting functionality.
sciform was heavily motivated by the uncertainties package. This package has sophisticated statistical handling of value +/- uncertainty pairs, handling error propagation and simulation under the hood. In addition, it has its own extension of the mini language for formatting value +/- uncertainty pairs. sciform has more formatting functionality than the uncertainties package including, especially, engineering notation, grouping separator controls, and prefix substitution. sciform is also a much lighter weight requirement than the uncertainties package. This may be desirable when a user wants to format strings but doesn't need the full statistical machinery of the uncertainties package.
sciform was also motivated by the prefixed package. This package provides a sort of engineering notation where exponents are rounded to multiples of 3, and then exponents are always replaced with their corresponding SI prefix. The prefixed package is a more conservative extension of the built-in formatting language. sciform includes more functionality, including engineering notation without prefix substitution and more grouping/decimal symbol control. sciform also includes global configuration options for handling optional SI prefixes such as c, d, da, and h.
The sigfig package has similar functionality to sciform, including sig fig rounding, separator control, and value +/- uncertainty formatting, including some features that are only forthcoming in sciform. sigfig does not currently support binary formatting. sigfig also does not provide a format specification mini language for formatting floats. Rather, floats are formatted using an overload of the built-in round function, which I find to be slightly awkward compared to a Formatter object or function.
Much of the code is still a work in progress. I'm still working on documenting the existing features, more unit tests are necessary for existing features, and the value +/- uncertainty features are still young and not thoroughly tested. I have important ideas in mind for more value +/- uncertainty formatting features. But I would say the core of the package is in place. One glaring gap for this package is support for Decimal numbers rather than float numbers. I would like to add that functionality after the functionality for formatting floats is stable.
This package is very new and has 1 user so far. Me. But I've been kicking around code for this sort of formatting for quite some time now and think many others would find it useful. Having a small authoritative package for this sort of formatting could be useful for the scientific community. There is also some interest in getting some of these features into the Python built-in string formatting feature set, which would be very useful. Having a package like this could be a stepping stone towards that. See https://discuss.python.org/t/new-format-specifiers-for-string-formatting-of-floats-with-si-and-iec-prefixes/26914/46. Though I do note that the format specification mini language is intentionally not 100% backwards compatible with the built-in format specification mini language, so it would not be a top candidate for that role.
I'm also not very experienced when it comes to contributing to open source software. This is one of my first forays into that world, so I am learning as I go.
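For readers unfamiliar with two of the features compared above, engineering notation and SI prefix substitution, the following is a rough standard-library sketch of the idea. The function name `to_engineering`, its parameters, and the prefix table are assumptions made for illustration; they are not taken from sciform, prefixed, or sigfig, whose interfaces may look quite different.

```python
import math

# Subset of SI prefixes keyed by power of ten (illustrative, not exhaustive).
SI_PREFIXES = {-9: "n", -6: "µ", -3: "m", 3: "k", 6: "M", 9: "G"}


def to_engineering(value: float, sig_figs: int = 3, use_prefix: bool = False) -> str:
    """Format `value` with an exponent that is a multiple of 3.

    Hypothetical helper for illustration only.
    """
    if value == 0 or not math.isfinite(value):
        return f"{value}"
    # Let the built-in 'e' presentation type do the significant-figure rounding.
    mantissa_str, exp_str = f"{value:.{sig_figs - 1}e}".split("e")
    exp = int(exp_str)
    # Largest multiple of 3 not exceeding the exponent (floor division
    # also gives the right answer for negative exponents).
    eng_exp = 3 * (exp // 3)
    shift = exp - eng_exp  # 0, 1, or 2 digits moved left of the decimal point
    decimals = max(sig_figs - 1 - shift, 0)
    body = f"{float(mantissa_str) * 10 ** shift:.{decimals}f}"
    if use_prefix and eng_exp in SI_PREFIXES:
        return f"{body} {SI_PREFIXES[eng_exp]}"
    return f"{body}e{eng_exp:+03d}"


print(f"{123456:.2e}")                          # built-in scientific notation
print(to_engineering(123456))                   # exponent forced to a multiple of 3
print(to_engineering(123456, use_prefix=True))  # exponent replaced by an SI prefix
```

The built-in mini language offers the first form only; the second and third are the kind of output the packages discussed here aim to provide.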
P.S. Have feedback/comments about our review process? Leave a comment here