Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vstrtonum util as replacement for atof/strtod, etc. #19309

Draft
wants to merge 13 commits into
base: develop
Choose a base branch
from
184 changes: 114 additions & 70 deletions src/common/utility/StringHelpers.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include <utility_exports.h>

#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <vector>
Expand All @@ -20,14 +21,6 @@

#include <maptypes.h> // for CIStringSet, CIStringSetVector defs

#if __GNUC__ >= 3
# define MUST_CHECK __attribute__ ((warn_unused_result))
#else
# define MUST_CHECK /*nothing*/
#endif



namespace StringHelpers
{
const std::string NON_RELEVANT_CHARS =
Expand Down Expand Up @@ -97,73 +90,124 @@ namespace StringHelpers

std::string UTILITY_API EscapeSpecialChars(const std::string &str);

// ****************************************************************************
// Function: str_to_u_numeric
//
// Purpose: Converts a string value into an unsigned numeric type as given by
// the template parameter.
// WARNING: This is likely to compile and silently fail if given a
// signed type in the template parameter.
//
// Programmer: Tom Fogal
// Creation: August 11, 2008
//
// Modifications:
//
// Tom Fogal, Fri Aug 29 16:15:17 EDT 2008
// Reorganized to propagate error upward.
//
// Tom Fogal, Tue Sep 23 11:08:02 MDT 2008
// Removed a statically-false branch which was causing an annoying warning.
//
// ****************************************************************************
template<typename UT>
MUST_CHECK bool str_to_u_numeric(const char * const s, UT *retval)
// ****************************************************************************
// Function: vstrtonum
//
// Purpose: Replacement for strtoX() and atoX() methods.
//
// Instead of any of these kinds of uses...
//
// int k = atoi(numstr);
// float f1 = atof(numstr);
//
// unsigned u = (unsigned) strtoul(numstr, 0);
// if (errno != 0) u = 0xFFFFFFFF; // set to default val
//
// float f2 = (float) strtod(numstr, 0);
// if (errno != 0) {
// f2 = -1.0; // set to default value
// debug5 << numstr << " bad value" << endl; // log error
// }
//
// ...do this...
//
// int k = vstrtonum<int>(numstr);
// float f1 = vstrtonum<float>(numstr);
//
// unsigned u = vstrtonum<unsigned>(numstr, 0xFFFFFFFF);
// float f = vstrtonum<float>(numstr, -1.0, debug5);
//
// Templatized methods to convert strings to language-native typed
// numeric values, perform some minimal error checking and optionally
// emit error messages with potentially useful context when errors
// are encountered.
//
// This should always be used in place of strtoX() or atoX() when
// reading ascii numerical data.
//
// We do a minimal amount of error checking for a signed conversion
// by checking if first non-whitespace character is a minus sign and
// artificially setting errno to EDOM (not something strtoX/atoX
// would ever do). We could add more error checking for different
// cases too by, for example, checking value read for int type and
// seeing if it is too big to fit in an int. We currently do not
// do this but it would be easy to add. We could also easily add
// logic for other, less frequently used types such as shorts or
// maybe int64_t, etc.
//
// The default method treats all ascii as long double for the strtoX
// conversion and then casts it to correct type. We specialize some
// cases for slightly better behavior.
//
// I ran performance tests on macOS doing 1 million conversions with
// these methods (including error checking) and 1 million directly
// with strtoX and atoX methods and observed no significant diffs
// in performance. In addition, keep in mind that these methods are
// typically being used in conjunction with file I/O, which almost
// certainly dominates performance. The only time this might not be
// true is for memory resident "files", mmaps, and/or SSDs.
//
// At some point, it would make sense to enhance this to use locale
// so we can handle different character encodings as well as regional
// specific interpretations (e.g. European 1.234.456,89). The idea
// would be to set the locale VisIt is using (a pref. maybe) and then
// these methods would just use that locale in calls to strtoX. That
// could be achieved globally in VisIt with a call to setlocale().
// However, when in the United States reading an ascii data file
// formatted for human readability in Germany the desire would be
// to specify "de_DE" for the locale during the read of just that
// file suggesting something more complicated than just a global
// setting.
//
// Mark C. Miller, Wed Jan 10 17:10:21 PST 2024
// ****************************************************************************
template<typename T> inline T _vstrtonum(char const *numstr, char **eptr, int /* unused */) { return static_cast<T>(strtold(numstr, eptr)); }

// Specialize int/long cases to use int conversion strtol which with base of 0 can handle octal and hex also
#define _VSTRTONUMI(T,F) template<> inline T _vstrtonum<T>(char const *numstr, char **eptr, int base) { return static_cast<T>(F(numstr, eptr, base)); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of macros, we can use std::function along with the template to provide a generic way to forward and apply for each type

Copy link
Member Author

@markcmiller86 markcmiller86 Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the macro is being used quite the way your comment suggests. Consumers of this interface use only the vstrtonum templatized function...never a macro.

strtold() would work for everything if we didn't care about handling bases other than 10 in conversions (e.g. octal or hex) and we would not need the specializations for integer data which include a base argument.

This macro here is simply being used to easily instantiate several specializations for both signed and unsigned integer data to also support conversions involving bases other than 10. The specialization for unsigned cases is added error detection for passing negative values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, why we don't just due a normal instantiation? Templates should be the tool, not sure why we need both templates + macros?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the issue is with using macros.

That said, my reason for using them is that it makes a) for a lot less typing and reading and, in particular, b) reading a lot of text that is substantially similar and trying to identify what the key difference is between them.

IMHO, macros make it very clear that the only difference in the three instantiations is the type and function called to perform the conversion.

_VSTRTONUMI(int,std::strtol)
_VSTRTONUMI(long,std::strtol)
_VSTRTONUMI(long long,std::strtoll)

// Specialize unsigned cases to use unsigned conversion strtoul and error checking passing negated arg
// Note that size_t is an alias almost certainly to one of these types and so we do not have to
// explicitly handle it here but any caller can still use it as in vstrtonum<size_t>().
#define _VSTRTONUMU(T,F) template<> inline T _vstrtonum<T>(char const *numstr, char **eptr, int base) { char const *s=numstr; while (isspace(*s)) s++; T retval = static_cast<T>(F(numstr, eptr, base)); if (*s=='-') errno = EDOM; return retval;}
_VSTRTONUMU(unsigned int,std::strtoul)
_VSTRTONUMU(unsigned long,std::strtoul)
_VSTRTONUMU(unsigned long long,std::strtoull)

// dummy ostream for default (no ostream) cases
static std::ostream NO_OSTREAM(std::cerr.rdbuf());

template<typename T> T
inline vstrtonum(char const *numstr, int base = 10, T dfltval = 0, std::ostream& errstrm = NO_OSTREAM, char **eptr = 0)
{
// strtoul() will happily convert a negative string into an unsigned
// integer. Do a simple check and bail out if we're given a negative
// number.
if(s[0] == '-')
{
return false;
}

const char *str = s;
// get rid of leading 0's; they confuse strtoul.
if(str[0] == '0' && str[1] != '\0' && str[1] != 'x')
{
while(*str == '0') { ++str; }
}

// One might want to think about switching this to an `unsigned long
// long' and using `strtoull' below. That will catch more cases, but
// this is more portable.
unsigned long ret;
char *end;
char *_eptr;
char **eptrptr = eptr==0?&_eptr:eptr;
errno = 0;
ret = strtoul(str, &end, 0);
*retval = static_cast<UT>(ret);
switch(errno)
T retval = _vstrtonum<T>(numstr, eptrptr, base);
int errno_save = errno;

// emit possible error messages
if (errno_save != 0)
{
case 0: /* success */ break;
case ERANGE:
// Constant does not fit in sizeof(unsigned long) bytes.
return false;
break;
case EINVAL:
// Bad base (3rd arg) given; this should be impossible.
return false;
break;
default:
// Unknown error.
return false;
break;
retval = dfltval;
if (&errstrm != &NO_OSTREAM)
{
errstrm << "Problem converting \"" << numstr << "\" to a number (\"" << strerror(errno_save) << "\")" << std::endl;
}
}
if(end == s) {
// junk characters start the string .. is this a number?
return false;
if (*eptrptr == numstr)
{
retval = dfltval;
if (&errstrm != &NO_OSTREAM)
{
errstrm << "Problem converting \"" << numstr << "\" to a number (\"no digits to convert\")" << std::endl;
}
}
return true;

return retval;
}
}
#endif
36 changes: 30 additions & 6 deletions src/common/utility/StringHelpers_test.C
Original file line number Diff line number Diff line change
Expand Up @@ -68,16 +68,16 @@ int main(int argc, char **argv)
falseNegatives.push_back(__LINE__);

size_t s_to_num_tmp;
if(str_to_u_numeric<size_t>("42", &s_to_num_tmp) == false ||
s_to_num_tmp != 42)
s_to_num_tmp = vstrtonum<size_t>("42");
if (s_to_num_tmp != 42)
falseNegatives.push_back(__LINE__);

if(str_to_u_numeric<size_t>("0", &s_to_num_tmp) == false ||
s_to_num_tmp != 0)
s_to_num_tmp = vstrtonum<size_t>("0");
if (s_to_num_tmp != 0)
falseNegatives.push_back(__LINE__);

if(str_to_u_numeric<size_t>("2147483648", &s_to_num_tmp) == false ||
s_to_num_tmp != 2147483648UL)
s_to_num_tmp = vstrtonum<size_t>("2147483648");
if (s_to_num_tmp != 2147483648UL)
falseNegatives.push_back(__LINE__);

{
Expand Down Expand Up @@ -354,6 +354,30 @@ int main(int argc, char **argv)
"../kerry/apple///banana/./././///../../../grape",
"/foo/bar/grape");

#if 0
#define CHECK_STRTONUM(T, S, V, DF, L, E) \
T v = vstrtonum<T>(S); \
if (v != V



CHECK_STRTONUM("555", 555, 555
//
// Check strtonum conversion cases
//
char *numstrs[] = {"555", "0555", "0x555", "0x555.5", "555.5", " \t-1", "abcdef", "1.5e-1207",
"0xFFFFFFFF", "4290000000", "15E+3", "7.652abc"};
for (int i = 0; i < sizeof(numstrs)/sizeof(numstrs[0]); i++)
{
int iv = vstrtonum<int>(numstrs[i], 0, i, "int", &std::cerr);
double dv = vstrtonum<double>(numstrs[i], 0, i, "double", &std::cerr);
unsigned int uiv = vstrtonum<unsigned int>(numstrs[i], 0, i, "unsigned", &std::cerr);
printf("str = \"%s\", int=%d, double=%g, unsigned int=%u\n", numstrs[i], iv, dv, uiv);
}
#endif



int all_errors = falseNegatives.size() +
falsePositives.size() +
pluralizing_errors +
Expand Down
11 changes: 8 additions & 3 deletions src/common/utility/maptypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#ifndef VISIT_MAP_TYPES_H
#define VISIT_MAP_TYPES_H
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <string>
Expand All @@ -16,15 +17,19 @@ typedef StringIntMap LevelColorMap;
typedef std::map<std::string, stringVector> StringStringVectorMap;
typedef std::map<double, int> DoubleIntMap;


// CI short for Case-Insenstive
// The two transform lines here previously used C lib `tolower` as the 4th argument
// That is incorrect as it works on ints, not chars. C++ transform over strings is
// expecting a method that operates on chars. Using C++ tolower works but there are
// multiple std::twolower methods and so we use a lambda notation to force it to
// find the correct one.
struct CIComparator {
bool operator() (const std::string& s1, const std::string& s2) const
{
std::string str1(s1.length(),' ');
std::string str2(s2.length(),' ');
std::transform(s1.begin(), s1.end(), str1.begin(), tolower);
std::transform(s2.begin(), s2.end(), str2.begin(), tolower);
std::transform(s1.begin(), s1.end(), str1.begin(), [](unsigned char c)->char{return std::tolower(c);});
std::transform(s2.begin(), s2.end(), str2.begin(), [](unsigned char c)->char{return std::tolower(c);});
return str1 < str2;
}
};
Expand Down
7 changes: 6 additions & 1 deletion src/databases/AMR/AMRreaderInterface.C
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,11 @@ AMRreaderInterface::~AMRreaderInterface()
#include <AMRreaderLowResBlkConso.h>

#include <DebugStream.h>
using DebugStream::Level1;
using DebugStream::Stream1;
#include <StringHelpers.h>
using StringHelpers::vstrtonum;
using StringHelpers::NO_OSTREAM;
#include <cstdlib>

AMRreaderInterface *
Expand All @@ -54,7 +59,7 @@ InstantiateAMRreader()
int typ=0;
char* amrlvl=getenv("AMRLEVEL");
if( amrlvl )
typ = atoi(amrlvl);
typ = vstrtonum<int>(amrlvl,10,0,Level1()?Stream1():NO_OSTREAM);

AMRreaderInterface *reader = NULL;
switch(typ)
Expand Down
Loading
Loading