Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vstrtonum util as replacement for atof/strtod, etc. #19309

Draft
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

markcmiller86
Copy link
Member

@markcmiller86 markcmiller86 commented Feb 14, 2024

Description

This is a draft PR for developers to see where I am headed with this. It defines vstrtonum<sometype>() with some special sauce to handle things like default values, error checking, range checking, etc. Read the comment in StringHelpers.h for an overview...

// ****************************************************************************
// Function: vstrtonum
//
// Purpose: Replacement for strtoX() and atoX() methods.
//
// Instead of any of these kinds of uses...
//
// int k = atoi(numstr);
// float f1 = atof(numstr);
//
// unsigned u = (unsigned) strtoul(numstr, 0);
// if (errno != 0) u = 0xFFFFFFFF; // set to default val
//
// float f2 = (float) strtod(numstr, 0);
// if (errno != 0) {
// f2 = -1.0; // set to default value
// debug5 << numstr << " bad value" << endl; // log error
// }
//
// ...do this...
//
// int k = vstrtonum<int>(numstr);
// float f1 = vstrtonum<float>(numstr);
//
// unsigned u = vstrtonum<unsigned>(numstr, 0xFFFFFFFF);
// float f = vstrtonum<float>(numstr, -1.0, debug5);
//
// Templatized methods to convert strings to language-native typed
// numeric values, perform some minimal error checking and optionally
// emit error messages with potentially useful context when errors
// are encountered.
//
// This should always be used in place of strtoX() or atoX() when
// reading ascii numerical data.
//
// We do a minimal amount of error checking for a signed conversion
// by checking if first non-whitespace character is a minus sign and
// artificially setting errno to EDOM (not something strtoX/atoX
// would ever do). We could add more error checking for different
// cases too by, for example, checking value read for int type and
// seeing if it is too big to fit in an int. We currently do not
// do this but it would be easy to add. We could also easily add
// logic for other, less frequently used types such as shorts or
// maybe int64_t, etc.
//
// The default method treats all ascii as long double for the strtoX
// conversion and then casts it to correct type. We specialize some
// cases for slightly better behavior.
//
// I ran performance tests on macOS doing 1 million conversions with
// these methods (including error checking) and 1 million directly
// with strtoX and atoX methods and observed no significant diffs
// in performance. In addition, keep in mind that these methods are
// typically being used in conjunction with file I/O, which almost
// certainly dominates performance. The only time this might not be
// true is for memory resident "files", mmaps, and/or SSDs.
//
// At some point, it would make sense to enhance this to use locale
// so we can handle different character encodings as well as regional
// specific interpretations (e.g. European 1.234.456,89). The idea
// would be to set the locale VisIt is using (a pref. maybe) and then
// these methods would just use that locale in calls to strtoX. That
// could be achieved globally in VisIt with a call to setlocale().
// However, when in the United States reading an ascii data file
// formatted for human readability in Germany the desire would be
// to specify "de_DE" for the locale during the read of just that
// file suggesting something more complicated than just a global
// setting.

@markcmiller86 markcmiller86 marked this pull request as draft February 14, 2024 00:11
template<typename T> inline T _vstrtonum(char const *numstr, char **eptr, int /* unused */) { return static_cast<T>(strtold(numstr, eptr)); }

// Specialize int/long cases to use int conversion strtol which with base of 0 can handle octal and hex also
#define _VSTRTONUMI(T,F) template<> inline T _vstrtonum<T>(char const *numstr, char **eptr, int base) { return static_cast<T>(F(numstr, eptr, base)); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of macros, we can use std::function along with the template to provide a generic way to forward and apply for each type

Copy link
Member Author

@markcmiller86 markcmiller86 Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the macro is being used quite the way your comment suggests. Consumers of this interface use only the vstrtonum templatized function...never a macro.

strtold() would work for everything if we didn't care about handling bases other than 10 in conversions (e.g. octal or hex) and we would not need the specializations for integer data which include a base argument.

This macro here is simply being used to easily instantiate several specializations for both signed and unsigned integer data to also support conversions involving bases other than 10. The specialization for unsigned cases is added error detection for passing negative values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, why we don't just due a normal instantiation? Templates should be the tool, not sure why we need both templates + macros?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the issue is with using macros.

That said, my reason for using them is that it makes a) for a lot less typing and reading and, in particular, b) reading a lot of text that is substantially similar and trying to identify what the key difference is between them.

IMHO, macros make it very clear that the only difference in the three instantiations is the type and function called to perform the conversion.

@markcmiller86
Copy link
Member Author

It occurs to me...maybe we should be thinking about unicode string data as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants