Make it possible to convert sciform output strings back into SciNum objects. #104

Closed
jagerber48 opened this issue Jan 7, 2024 · 3 comments

Comments

@jagerber48
Owner

This is an idea for a parser, used by the SciNum object, that converts sciform output strings back into SciNum objects. There is such a diversity of sciform outputs that it is not obvious whether this is possible. The parser would need to determine whether it is looking at a single number or a number/uncertainty pair. It would also need to determine what type of exponent is being used, including standard or custom exponents and possible SI/IEC prefix or parts-per translations. Then it would need to extract the exponent and the mantissa(s) and construct the SciNum object. It would also need to parse the different user-selected separators. All of this means the parser should depend on the format options. Perhaps the problem would be more tractable if the parser were constrained to depend on both an input string and format options, i.e. it would only look for superscript-formatted exponents if the supplied format options have superscript=True. Perhaps the parsing could be applied from the scope of a Formatter rather than SciNum.

It would be nice if this operation round tripped str -> SciNum -> str under the global options. It is impossible for it to round trip SciNum -> str -> SciNum because the SciNum -> str conversion (involving a Formatter) in general rounds the number.
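
To illustrate the round-trip asymmetry, here is a minimal sketch that uses only Python's built-in Decimal formatting as a stand-in for a sciform Formatter (it does not use the sciform API). Formatting rounds, so the value cannot be recovered, but the recovered value formats back to the same string.

```python
from decimal import Decimal

original = Decimal("123.456")

# Stand-in for SciNum -> str: format to three significant figures.
formatted = f"{original:.2e}"

# Stand-in for str -> SciNum: parse the formatted string back.
recovered = Decimal(formatted)

# value -> str -> value loses the rounded-off digits ...
assert recovered != original
# ... but str -> value -> str reproduces the string under the same options.
assert f"{recovered:.2e}" == formatted
```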

@jagerber48
Owner Author

Much of the parsing machinery was figured out for the FormattedNumber parsing to convert to LaTeX/html.

There is only one ambiguity in sciform output that prevents it being read back in as input. Consider

100,000

Is this one hundred (comma is decimal separator, 6 sig figs) or one hundred thousand (comma is upper separator)?

I think the ambiguous cases can be summarized as follows: the number contains a single period or comma that is followed by exactly three digits and preceded by one, two, or three digits. In all other cases I think it can be worked out whether the periods or commas in the string are being used as decimal or thousands separators, e.g.

100,0  # Must be 100 with comma decimal separator
1000.000  # Must be 1000 with period decimal separator
100,000.000  # Must be 100000 with period decimal separator
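
The rule above can be sketched as a single regular expression. This is only an illustration of the rule as stated; `is_ambiguous` is a hypothetical helper, not a sciform function, and it assumes the input is a bare mantissa with an optional sign and no exponent or uncertainty.

```python
import re

# Hypothetical helper, not part of sciform.
# Ambiguous: exactly one "." or ",", preceded by one to three digits
# and followed by exactly three digits.
_AMBIGUOUS_PATTERN = re.compile(r"[+-]?\d{1,3}[.,]\d{3}")


def is_ambiguous(number_str: str) -> bool:
    """Return True if the separator role cannot be inferred from the string."""
    return _AMBIGUOUS_PATTERN.fullmatch(number_str) is not None


assert is_ambiguous("100,000")
assert not is_ambiguous("100,0")
assert not is_ambiguous("1000.000")
assert not is_ambiguous("100,000.000")
```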

I think the proper way to handle this is for the parser to accept an ambiguous_decimal_separator input (that defaults to "."). Then, when an ambiguous case is encountered, this variable will be used to resolve the ambiguity. However, this parameter does NOT imply that e.g. "." will ALWAYS be interpreted as the decimal separator. That is, if 100.000,000 is encountered, the parser should know that "." is the upper separator and "," is the decimal separator.
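
A rough sketch of how this resolution might look is below. The function name `parse_number` and its structure are assumptions made for illustration, not the implementation that eventually landed in sciform. It only handles "." and "," (no space or underscore separators, exponents, or uncertainties) and does not validate digit grouping.

```python
import re
from decimal import Decimal


def parse_number(number_str: str,
                 ambiguous_decimal_separator: str = ".") -> Decimal:
    """Hypothetical sketch: infer separator roles from the string and use
    ambiguous_decimal_separator only when they cannot be inferred."""
    seps = [c for c in number_str if c in ".,"]
    if not seps:
        return Decimal(number_str)

    if len(set(seps)) == 2:
        # Both symbols appear: the last one is the decimal separator.
        decimal_sep = seps[-1]
    elif len(seps) > 1:
        # One symbol repeated: it can only be an upper separator.
        decimal_sep = None
    elif re.fullmatch(r"[+-]?\d{1,3}[.,]\d{3}", number_str):
        # Ambiguous case: defer to the caller's choice.
        decimal_sep = ambiguous_decimal_separator
    else:
        # Single separator in an unambiguous position: it is the decimal.
        decimal_sep = seps[0]

    cleaned = ""
    for c in number_str:
        if c == decimal_sep:
            cleaned += "."
        elif c not in ".,":
            cleaned += c
        # Any other "." or "," is an upper separator and is dropped.
    return Decimal(cleaned)
```

With the default, `parse_number("100,000")` gives one hundred thousand, while `parse_number("100,000", ambiguous_decimal_separator=",")` gives one hundred. `parse_number("100.000,000")` gives one hundred thousand regardless of the argument, because both separators are present.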

@jagerber48
Owner Author

The ambiguous_decimal_separator could default to None and resolve to the global options for decimal_separator in this case.

@jagerber48
Owner Author

Closed by #149
