Add class to facilitate serialization and validation of code points #2858
Description of Change(s)
Depends on (#2855)
#2788 was added to decode UTF-8 strings into code points. This change repurposes the encoding logic from #2120 into a wrapper around a `uint32_t`. The wrapper validates on construction and only stores valid UTF-8 code points (i.e. surrogates are excluded). Invalid code points are silently replaced with the `ReplacementValue`. A stream operator for serialization is provided.

The `TF_INVALID_CODE_POINT` `uint32_t` has been replaced with a `TfUtf8InvalidCodePoint` `constexpr` instance.

To aid testing, `EndAsIterator()` has been added to the `TfUtf8CodePointView` class; it returns an iterator of the same type as `begin()`. While the language (via range-based for loops) supports sentinel `end()` types as of C++17, many STL algorithms don't until C++20. While `EndAsIterator()` creates redundant copies of the `string_view`'s `end()` value, in non-performance-critical code it may be preferable because it enables usage of the STL.

Fixes Issue(s)
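For readers who have not seen the pattern, the validate-on-construction wrapper described under Description of Change(s) can be sketched roughly as below. This is an illustration only, not the code in this PR: the class name `CodePointSketch`, the helper `ToUtf8Sketch`, the accessor name, and the choice of U+FFFD as the replacement value are all assumptions made for the example.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the wrapper described in this PR.
// Names and the replacement value (U+FFFD) are assumptions.
class CodePointSketch {
public:
    static constexpr uint32_t ReplacementValue = 0xFFFD;
    static constexpr uint32_t MaximumValue = 0x10FFFF;

    constexpr CodePointSketch() = default;

    // Validate on construction: out-of-range values and surrogates
    // are silently replaced, so only valid values are ever stored.
    constexpr explicit CodePointSketch(uint32_t value)
        : _value(IsValid(value) ? value : ReplacementValue) {}

    constexpr uint32_t AsUInt32() const { return _value; }

private:
    static constexpr bool IsValid(uint32_t v) {
        return v <= MaximumValue && (v < 0xD800 || v > 0xDFFF);
    }

    uint32_t _value = ReplacementValue;
};

// Stream operator that serializes the code point as UTF-8 bytes.
inline std::ostream& operator<<(std::ostream& os, CodePointSketch cp) {
    const uint32_t v = cp.AsUInt32();
    if (v < 0x80) {
        os << static_cast<char>(v);
    } else if (v < 0x800) {
        os << static_cast<char>(0xC0 | (v >> 6))
           << static_cast<char>(0x80 | (v & 0x3F));
    } else if (v < 0x10000) {
        os << static_cast<char>(0xE0 | (v >> 12))
           << static_cast<char>(0x80 | ((v >> 6) & 0x3F))
           << static_cast<char>(0x80 | (v & 0x3F));
    } else {
        os << static_cast<char>(0xF0 | (v >> 18))
           << static_cast<char>(0x80 | ((v >> 12) & 0x3F))
           << static_cast<char>(0x80 | ((v >> 6) & 0x3F))
           << static_cast<char>(0x80 | (v & 0x3F));
    }
    return os;
}

// Convenience helper (hypothetical) for checking the serialized bytes.
inline std::string ToUtf8Sketch(CodePointSketch cp) {
    std::ostringstream os;
    os << cp;
    return os.str();
}
```

Because validation happens once in the constructor, the stream operator in this sketch never has to handle an unencodable value.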