Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid tokenizer exceptions for qualified expressions #56

Merged
merged 1 commit into from
Jul 8, 2023

Conversation

skaupper
Copy link

Currently qualified expressions trigger a tokenizer exception.

This PR should take care of these, by recognizing that the character pattern '( may not be followed by another single quote (which would produce the character literal '('). In those cases both characters are output as instances of CharacterToken.

An additional test case ensure a correct parsing of qualified expressions (e.g. UNSIGNED'(x"0")) and the relevant character literal (.

Copy link
Owner

@Paebbels Paebbels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a test for these sequences:

foo'('0')
foo'('(')

pyVHDLParser/Token/Parser.py Outdated Show resolved Hide resolved
elif (buffer[0] in __WHITESPACE_CHARACTERS__) and (buffer[1] in __WHITESPACE_CHARACTERS__):
tokenKind = cls.TokenKind.SpaceChars

if buffer[0] == "(": # Relevant for qualified expressions
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if buffer[0] == "(": # Relevant for qualified expressions
if buffer[0] == "(": # Relevant for qualified expressions

Python (styling) requires at least 2 spaces before #. I'm following Python coding styles loosely, but this is a good rule :).

@Paebbels
Copy link
Owner

@skaupper are you planning more fixes like this or should I prepare for a release of that code to main?

@skaupper
Copy link
Author

Should we have a test for these sequences:

foo'('0')
foo'('(')

Well, without trying it I am pretty sure these cases will fail with the current solution.
It will probably be tricky to unambiguously support all variations of character literals, qualified expressions and attributes. Maybe with a deeper lookahead?

Can you think of an example of the form foo'(',')' being valid VHDL?

I will have another look at it!

@skaupper
Copy link
Author

@skaupper are you planning more fixes like this or should I prepare for a release of that code to main?

Currently I focus on fixing things which occur while parsing some generated VHDL sources. As of now it seems qualified expressions are the last issue with the tokenizer for my use cases.

I may have a look into some issues with the block parser afterwards (e.g. #9 and #16). Personally, I do not need a release though.

@Paebbels
Copy link
Owner

Paebbels commented Jun 29, 2023

Can you think of an example of the form foo'(',')' being valid VHDL?

I don't think so.

F****

Type, qualified expression on character , followed by an attribute ...

character'(',')'length

@Paebbels Paebbels marked this pull request as draft June 29, 2023 20:41
@skaupper skaupper closed this Jun 29, 2023
@skaupper skaupper reopened this Jun 29, 2023
@skaupper
Copy link
Author

Type, qualified expression on character , followed by an attribute ...

character'(',')'length

After digging through the LRM and using ModelSim to confirm some ideas, it seems like character'(',')'length is not valid VHDL, since a qualified expression is neither a name nor a function_call according to the syntax.

On the other hand these snippets ARE valid VHDL: character'(''') and ''', both representing a value of '.

A similiar issue arises with the string literal "a""b" representing the string a"b (both a and b can be omitted for a string literal of """") but that should be part of another issue/PR, i guess.

@skaupper
Copy link
Author

For this PR, I will focus on solving the foo'('(') problem. Using a look-behind tokens seems promising, since '( may only be a qualified expression if a WordToken preceeds it (any counter examples are welcome).

Any shenanigans involving ''' and """" should probably be handled in a separate PR.

@skaupper skaupper force-pushed the bugfix/vhdl_qualified_expressions branch from 5e9754f to fd7422b Compare June 30, 2023 13:38
@Paebbels
Copy link
Owner

Please remove the draft state when ready for review / merge.

@skaupper skaupper force-pushed the bugfix/vhdl_qualified_expressions branch from fd7422b to 41daf49 Compare July 3, 2023 08:38
@skaupper skaupper marked this pull request as ready for review July 3, 2023 08:39
@Paebbels Paebbels merged commit 275a778 into Paebbels:dev Jul 8, 2023
@Paebbels Paebbels mentioned this pull request Jul 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants