Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmaCC_Elixir: parsing expr causes token after <at_op>, unary_op, or <capture_op> dropping next identifier #36

Open
mariari opened this issue Nov 3, 2024 · 2 comments · May be fixed by #37

Comments

@mariari
Copy link
Contributor

mariari commented Nov 3, 2024

Issue

Parsing multiple Expressions

expr 
	: matched_expr
	|  no_parens_expr
	|  unmatched_expr
	;

Has issues on the two parts of grammar below

matched_expr 
	: ...
	| unary_op_eol matched_expr {{}}
	| <at_op> 'op' matched_expr  {{}}
	| <capture_op> 'op' matched_expr {{}} 

unmatched_expr 
	: ...
	| unary_op_eol expr 'expression' {{}}
	| <at_op> 'op' expr 'expression' {{}}
	| <capture_op> 'op' expr 'expression' {{}} 
         ...
	;

The next parsed item drops the next identifier, unless if the next identifier is required for a valid following expression.

I believe this may be related to glr backtracking and actionsForCurrentToken. As changing expr to either unmatched_expr or matched_expr seems to create correct parses for their respective cases, and trying it out in the grammar tab (which seems not to be affected by actionsForCurrentToken) returns the proper value.

I'm not sure of the issue, as the scanner state is properly restored(?).

I.E.

&3
1

will properly display both expressions where as.

&3
foo()

will only have &3 and () as the expressions.

As a consequence, the following parses, when the following ) should be had.

@1
(3

Example with <at_op>

defmodule Nock do
  @spec nock(Noun.t(), Noun.t()) :: 1
  def nock(subject, formula) do
    nock(subject, formula, %Nock{})
  end

the def is missing and the nock( ... is next.

Example with <capture_op>

  &3
  foo()

The foo is missing and the next token is ()

Example with unary_op

not not 1

f  1

The next expression is 1

Analysis

It seems that putting the operator within a expression does not make a difference

4 * +1
f  1

will be treated as if it were

4 * +1
1

However, Putting a ; between the nodes make it work, as per

not not 1;

f  1

This may be related to #29, as the <unary_ops> seem to trigger similar errors

Debugging Information

To better debug what is going on I started to compare

4 * +1 # or not 1 ... doesn't particularly matter
f  1
# vs
4 * 1
f  1

The parser should mostly treat these the same, However, after the expr_list is made for both, the first one back tracks to be an unmatched_expression which is the next rule after matched_expression. Seems the glr back tracking kicks in.

One step later

@mariari mariari changed the title SmaCC_Elixir: 'def' token sometimes dropped SmaCC_Elixir: token after a match matched expression sometimes dropped Nov 3, 2024
@mariari mariari changed the title SmaCC_Elixir: token after a match matched expression sometimes dropped SmaCC_Elixir: token after <at_op> or <capture_op> dropping Nov 3, 2024
@mariari mariari changed the title SmaCC_Elixir: token after <at_op> or <capture_op> dropping SmaCC_Elixir: token after <at_op>, unary_op, or <capture_op> dropping next identifier Nov 3, 2024
@mariari mariari changed the title SmaCC_Elixir: token after <at_op>, unary_op, or <capture_op> dropping next identifier SmaCC_Elixir: parsing expr causes token after <at_op>, unary_op, or <capture_op> dropping next identifier Nov 3, 2024
@mariari
Copy link
Contributor Author

mariari commented Nov 3, 2024

I've made some progress, after staring at it for a long time, I did properly get it to error, however it doesn't work in general as I have some errors when throwing big files at this change

actionsForCurrentToken
	| lookahead ids id actions token position previousState |
	scanner lastWasEol
		ifTrue: [ actions := OrderedCollection new.
			self addActionsFor: scanner eolId to: actions.
			actions notEmpty
				ifTrue: [ scanner lastWasEol: false.
					position := scanner position.
					token := currentToken.
					previousState := currentState.
					currentToken := currentToken class
						value: ''
						start: token startPosition
						ids: {scanner eolId}.
					currentState := self duplicateState.
					self position: position -3.
					self parseCurrentToken.
					currentToken := token.

					self position: position.
					self parseCurrentToken.
					self restoreState: previousState.
					
					self position: position.
					scanner lastWasEol: true.
					currentToken := token ] ].

I've added all the self position: calls. Namely the self position: position - 3. Seems to help things.

Here is the current state after this change on some definition

@mariari
Copy link
Contributor Author

mariari commented Nov 3, 2024

I believe I have a solution that I will open in a PR. However I'm not sure what I'm doing precisely by this:

actionsForCurrentToken
	| lookahead ids id actions token position previousState |
	scanner lastWasEol
		ifTrue: [ actions := OrderedCollection new.
			self addActionsFor: scanner eolId to: actions.
			actions notEmpty
				ifTrue: [ scanner lastWasEol: false.
					position := scanner position.
					token := currentToken.
					previousState := currentState.
					currentToken := currentToken class
						value: ''
						start: token startPosition
						ids: {scanner eolId}.
					currentState := self duplicateState.
					self position: position - token size.
					self parseCurrentToken.
					currentToken := token.
					self position: position.
					self parseCurrentToken.
					self restoreState: previousState.
					
					self position: position.
					scanner lastWasEol: true.
					currentToken := token ] ].

Seems to work, I deduced this by noting that the number I go - by turns out to be precisely the amount of characters it successfully parses from it. This combination seems to set up everything correct. However I'm unsure why we do the parseCurrentToken routine twice in a row (Commenting out the repetition makes the parser very slow, however doesn't change the semantics.).

This was gotten by looking at how % was done and reverse engineering from there.

mariari added a commit to mariari/SmaCC that referenced this issue Nov 3, 2024
The Position wasn't being shifted before scanning forward unlike the map_op case.

We shifft it by the characters to remove the issue.
@mariari mariari linked a pull request Nov 3, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant