Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse enum descriptions and value descriptions #2208

Merged
merged 7 commits into from
Dec 16, 2021

Conversation

eutro
Copy link
Contributor

@eutro eutro commented Dec 14, 2021

This PR updates the API parser to parse the descriptions of enums and enum values. These were marked with TODOs before. Enum value descriptions especially have quite a lot of content that was missing from the parses.

@eutro eutro marked this pull request as draft December 14, 2021 21:56
@eutro
Copy link
Contributor Author

eutro commented Dec 14, 2021

Currently introduces a syntax error in outputs with an unescaped backslash.

@eutro
Copy link
Contributor Author

eutro commented Dec 14, 2021

Now also properly escapes all strings. I hope you don't mind the FPrintfEscapes function introduced to do this.

@eutro eutro marked this pull request as ready for review December 14, 2021 22:33
@raysan5
Copy link
Owner

raysan5 commented Dec 14, 2021

@eutro Hey! That's a great improvement! It was on my TODO list!

@@ -804,6 +826,44 @@ static char *TextReplace(char *text, const char *replace, const char *by)
}
*/

// Like fprintf, but supports only %i and %S. %S prints an escaped JSON-like string.
static void FPrintfEscapes(FILE *file, const char *format, ...) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to avoid this function if possible and just escape them directly on fprintf(). What are the benefits of creating an additional function to do that? It also add an additional library dependency that I'd prefer to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaping the string requires the insertion of the extra backslashes (or possibly other characters for XML escapes), so it can't be done in-place without awkward shifting of bytes that may overflow buffers. Since they're being output directly, that's unnecessary and we can just insert the extra characters as we output.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand the problem, using backslashes is the expected way to print those characters. About the buffers size, it's a specific parser for a specific file (raylib.h), buffers could be sized for that with a secure margin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To write the raw string of value:
foo \ bar \ baz
in JSON,
"foo \\ bar \\ baz"
must be written.

These characters must be inserted into the string at specific indeces, which are the extra backslashes.

Naively, this means shifting the rest of the buffer to the right by a byte every time. Alternatively, counting backslashes ahead of time means shifting each byte at most once. Both of these are slightly more complicated than just outputting the extra backslashes with this function, but not the end of the world.

Additionally, inserting the extra backslashes into the string is destructive, meaning the original strings would be modified. This would be undesirable if we want to output the unescaped strings again. Say, if the program is modified to be able to output JSON and XML without a restart, outputting the JSON would modify the strings before the XML gets output. Again, not the end of the world, but it may be an unintuitive bug.

An alternative function would be one that just outputs an escaped string, then we can split the printf calls on those.

Which would be preferrable?

  • Using FPrintfEscapes (current)
  • A PrintEscaped function that prints a single escaped string
  • An EscapeString function that escapes a string in-place and destructively
  • Something else

Copy link
Owner

@raysan5 raysan5 Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again but I can't see the issue. As per my understanding foo, bar and baz are independently provided so they can just be properly concatenated with required backslashes when writing the required output. Also XML, JSON and others are generated in an independent way, each one with its formatting requirements, directly generated from tokenized input data.

Please, could you provide a specific examples from raylib.h parsing where that problem happens? Sorry again for not seeing the issue, english is not my first language and I may be missing the point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I want to output the description from this comment here

KEY_BACKSLASH = 92, // Key: '\'

then the string has the raw value of Key: '\'. In JSON, I should output this as "Key: '\\'", though, which is a character longer and requires the shifting of the \' to the right by a character.

This is how this would be done:

Unescaped string from description:

K e y :   ' \ ' 

If we want to modify the string in-place to have the escapes we must shift everything after the backslash to the right:

K e y :   ' _ \ '

and then we can insert the extra backslash:

K e y :   ' \ \ '

That's all, that's the issue I'm talking about.


What is currently being done (in this PR) is that we iterate over the string once, but output an extra backslash before anything that needs to be escaped:

K e y :   ' \ '
^ output K
K e y :   ' \ '
  ^ output e
...
K e y :   ' \ '
          ^ output '
K e y :   ' \ '
            ^ output two \s
K e y :   ' \ '
              ^ output '

So, which should the approach be? Should we escape the string in-place (shifting the string and inserting backslashes, as in the first description) or should we just output an extra backslash as we output the string manually (like the second description)?

Copy link
Owner

@raysan5 raysan5 Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now I understand the issue. Checking raylib.h there are only 4 cases where that's a problem (always descriptions):

Line 567:     KEY_BACKSLASH       = 92,       // Key: '\'
Line 1037: RLAPI char *LoadFileText(const char *fileName);                   // Load text data from file (read), returns a '\0' terminated string
Line 1039: RLAPI bool SaveFileText(const char *fileName, char *text);        // Save text data to file (write), string must be '\0' terminated, returns true on success
Line 1355: RLAPI unsigned int TextLength(const char *text);                  // Get text length, checks for '\0' ending

Current solution just replaces it by an empty space:

CharReplace(funcs[i].desc, '\\', ' ')

That function could be improved to replace the char by another string, just using an internal static buffer.

By the way, thank you very much for the detailed explanation.

Copy link
Contributor Author

@eutro eutro Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced it with an EscapeBackslashes function now. A general TextReplace function would be overkill, I think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it looks good to me.

@raysan5 raysan5 merged commit fffd78e into raysan5:master Dec 16, 2021
@raysan5
Copy link
Owner

raysan5 commented Dec 16, 2021

@eutro Thanks for the improvement! I'm merging and reviewing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants