Minor issues with REPP #308

Closed
goodmami opened this issue Aug 3, 2020 · 6 comments

goodmami commented Aug 3, 2020

There are some parts of the REPP spec (spread across the ReppTop wiki and various publications) that are not completely clear. PyDelphin currently makes guesses about the proper behavior, but it would be good to confirm them with others.

  1. Must iterative groups be called within the module that defines them? (currently yes)
  2. Can iterative group calls occur before the group's definition? (currently no, but maybe it should be allowed)
  3. Must the tokenizer pattern appear in the top-level module (or included file)? Can it be defined in an external module instead? (currently no)
  4. Is it an error to have tokenization patterns or meta-info declarations inside iterative groups? (currently no, probably should be yes)

oepen commented Aug 12, 2020

thanks for the clarification request, mike! i have tried to answer your questions in what i consider the official specification:

http://moin.delph-in.net/ReppTop?action=diff&rev2=44&rev1=43

do those changes seem sufficient to clarify your points?


goodmami commented Aug 12, 2020 via email

@goodmami

@oepen I'm working to make my code consistent with the updated spec, and it occurs to me there's another set of issues regarding iterative groups (which I'll call "internal" groups to be consistent with the wiki text). Extending my numbering from above...

  5. Internal group definitions (considered atomically) are non-sequential; if the contents of internal groups must be sequential, then nesting of internal groups should not be allowed. By what logic should we allow nesting while disallowing meta-info/tokenizer definitions? (Note that I'm referring to the definitions, not the calls.)

  6. Assuming we allow nesting, does the module-wide identifier space for internal groups persist inside internal groups, too? That is, are both of these illegal?

    #1
    ; ...
    #
    #1
    ; ...
    #
    

    and

    #1
    #1
    ; ...
    #
    #
    
  7. If the answer to 6 is "yes", then can a nested internal group be called outside its parent group? E.g.:

    #1
    #2
    ; ...
    #
    #
    >2
    

I think the cleanest way to resolve these issues is to disallow nested internal group definitions while allowing nested calls. The problem is a potential break in backward compatibility.

Finally, I want to resolve some ambiguity around my use of "must" in (1) above. I think it is fine to define an internal group that is never called (like an inactive external group), but any internal group call must resolve to a group defined within the same module. Your text on the wiki is less ambiguous here.
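
A minimal sketch of the resolution rule described above, assuming a module is available as a list of lines in which `#N` opens an internal group and `>N` calls one. The function name `unresolved_internal_calls` is hypothetical and this is not PyDelphin's actual implementation; it only illustrates that an uncalled definition is harmless while an unresolved call is a problem.

```python
import re


def unresolved_internal_calls(lines):
    """Return the ids of internal groups that are called but never
    defined in this module (hypothetical helper, illustration only)."""
    defined, called = set(), set()
    for raw in lines:
        line = raw.strip()
        # '#N' opens the definition of internal group N
        m = re.fullmatch(r'#(\d+)', line)
        if m:
            defined.add(int(m.group(1)))
            continue
        # '>N' (optionally '> N') calls internal group N;
        # calls to named external groups are not matched here
        m = re.fullmatch(r'>\s*(\d+)', line)
        if m:
            called.add(int(m.group(1)))
    return sorted(called - defined)


module = ['#1', '; ...', '#', '#2', '; ...', '#', '>1', '>3']
print(unresolved_internal_calls(module))
```

Here group 2 is defined but never called, which is accepted, while the call to group 3 is reported because no definition exists in the module.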


oepen commented Aug 21, 2020

hi @goodmami, and thanks for pushing further! i think i have answered all of your follow-up questions by adding to the end of the section on internal groups on the REPP wiki page. in sum, i fail to see why you lean towards outlawing nested internal groups? that, to me (just now, at least), would seem like an unnecessary constraint.

@goodmami

Thanks! I see that (6) and (7) have answers, but not (5), although the question in (5) is somewhat more philosophical than practical.

i fail to see why you lean towards outlawing nested internal groups?

I'm not trying to outlaw nested internal groups. I'm just noticing that our specification would be more consistent if we outlawed nested internal group definitions. The definitions are non-sequential while the internal group calls are sequential, so disallowing nested definitions would be consistent with this passage:

Owing to their non-sequential status, the tokenizer (:) and version (@) operators cannot occur inside a numbered internal group.

Disallowing them also removes the question about whether there's a separate namespace inside internal groups (meaning the clarification text you added about the global namespace would become unnecessary).

Regarding the wiki text here:

In principle, it is possible to have an internal group nested inside another one (which could be useful, for example, to allow calling into either the outer group as a whole, or just its inner sub-group);

I'm not seeing why the nested internal group definition is useful. As things currently stand, it seems like these are equivalent:

#1
; ...
#

#2
; ...
> 1
#

>1
>2

and

#2
; ...
#1
; ...
#
> 1
#

>1
>2

I thought nested definitions weren't actually used in practice, but I see that there is one in the ERG's wiki.rpp:

#1
#2
!\[\[(?:[^[|\]]+\|)?([^[|\]]+)\]\]                      \1
#
>2
!\[(?:http|ftp)://(?:[^[\] ]+ )?([^[\]]+)\]             \1
#
>1

So the backward-compatibility-break would be more consequential than I originally thought.
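
To make the equivalence claim concrete, here is a small sketch that parses a module into a single module-wide table of internal groups, hoisting nested definitions so that they contribute nothing to the parent's sequence of operations. The function `parse_groups` and the sample rules (`!a`, `!c`) are made up for illustration, and the treatment of redefinition as an error is an assumption based on the questions above, not a statement of how PyDelphin or the LKB behave.

```python
def parse_groups(lines):
    """Map each internal group id to its sequential operations.

    Top-level operations are stored under the key None. Nested
    definitions are registered in the same module-wide table
    (hypothetical sketch, not an actual REPP implementation).
    """
    groups = {None: []}
    stack = [None]  # ids of the currently open groups
    for raw in lines:
        line = raw.rstrip('\n')
        head = line.strip()
        if not head or head.startswith(';'):
            continue  # skip blank lines and comments
        if head == '#':
            if stack[-1] is None:
                raise ValueError('group close without open group')
            stack.pop()
        elif head.startswith('#'):
            gid = int(head[1:])
            if gid in groups:
                raise ValueError(f'internal group {gid} redefined')
            groups[gid] = []
            stack.append(gid)
        elif head.startswith('>'):
            # normalize '> 1' and '>1' to the same call
            groups[stack[-1]].append('>' + head[1:].strip())
        else:
            groups[stack[-1]].append(line)
    if len(stack) > 1:
        raise ValueError('unclosed internal group')
    return groups


flat = ['#1', '!a\tb', '#', '#2', '!c\td', '> 1', '#', '>1', '>2']
nested = ['#2', '!c\td', '#1', '!a\tb', '#', '> 1', '#', '>1', '>2']
print(parse_groups(flat) == parse_groups(nested))
```

Under these assumptions the flat and nested forms yield the same table, and both redefinition examples from question 6 are rejected because the identifier space is shared across nesting levels.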

goodmami changed the title from "Unclear areas of the REPP specification" to "Minor issues with REPP" on Jul 14, 2021

goodmami commented Jul 14, 2021

Coming back to this, here are things to do based on the wiki updates by Stephan:

  • Iterative groups must be called within the module that defines them (already done; maybe needs a test)
  • Iterative group calls may occur before the group's definition
  • The tokenizer pattern must appear in the top-level module (already done; may need a test)
  • Only a tokenizer pattern in the top-level module will be used
  • It is an error to have tokenization patterns or meta-info declarations inside an iterative group
  • Modules have one global namespace
    • iterative group identifiers may not be redefined within a module
    • nested iterative group definitions are no different than flat definitions

edit: PyDelphin allows for a default tokenizer and I may have been reading the spec too strictly, so I changed the requirement about tokenization patterns in non-top-level modules.
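
As a rough illustration of the item about tokenization patterns and meta-info declarations, the following sketch tracks group nesting depth and reports any `:` or `@` line found inside a group. The name `misplaced_declarations` and the sample lines (`@1.0`, `:x`) are hypothetical, and the sketch assumes, per the wiki passage quoted earlier, that `:` marks the tokenizer and `@` the version declaration. It is not the check that was added to PyDelphin.

```python
import re


def misplaced_declarations(lines):
    """Return (line_number, text) pairs for tokenizer (':') or
    version ('@') lines that occur inside an internal group."""
    depth = 0  # number of currently open internal groups
    errors = []
    for lineno, raw in enumerate(lines, start=1):
        head = raw.strip()
        if head == '#':
            depth -= 1
        elif re.fullmatch(r'#\d+', head):
            depth += 1
        elif depth > 0 and head[:1] in (':', '@'):
            errors.append((lineno, head))
    return errors


module = ['@1.0', ':[ \\t]+', '#1', '@2.0', '#2', ':x', '#', '#', '>1']
print(misplaced_declarations(module))
```

The top-level `@` and `:` lines are accepted, while those on lines 4 and 6 are flagged because a group is open at that point, regardless of nesting depth.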

goodmami added this to the v1.6.0 milestone Jul 14, 2021
goodmami added the bug label Aug 14, 2021
goodmami added a commit that referenced this issue Aug 15, 2021
goodmami added a commit that referenced this issue Aug 18, 2021
goodmami added a commit that referenced this issue Aug 18, 2021
Part of #308

The line of code changed in repp.py is only defensive and not directly
related to the tests. It is to avoid a NameError in the event that a
group ever returns nothing, but currently that is not the case.