Fall 2024 Planning #1

Open
StoneyJackson opened this issue Aug 30, 2024 · 7 comments

Comments

@StoneyJackson
Member

No description provided.

@StoneyJackson
Member Author

StoneyJackson commented Aug 30, 2024

  • I'm going to create something like .trash in the root and move everything that is out of date there. Why not just delete it? In case we want to mine it for ideas without having to figure out where in the git history it is.
  • I'm also going to rename substructure_parser to rough_parser. Hopefully this conveys that it provides a rough parse of the spec file, and the rest of the parsing will be completed in a later stage. This draws on "rough" as used in construction.

@StoneyJackson
Member Author

StoneyJackson commented Aug 30, 2024

Below is a functional decomposition of plcc.

plcc
    load_spec
        load_rough_spec
            ...
        parse_spec
            parse_lexical_spec
            parse_syntactic_spec
            parse_semantic_spec
        validate_spec
            validate_lexical_spec
                ... Separate validator for each rule/check.
            validate_syntactic_spec
                ... Separate validator for each rule/check.
            validate_semantic_spec
                ... Separate validator for each rule/check.
    generate_code
        generate_scanner
        generate_parser
        generate_evaluator
        generate_interactive_scanner
        generate_interactive_parser
        generate_interactive_evaluator
    translate_code
        load_translator
        apply_translator
    write_code

The problem with a functional decomposition is that its elements are functions, so their names are verbs. I had been using nouns for packages (folders) and verbs for modules (files). If we want to use the above to organize code, maybe we should name packages as verbs too. That should be easy to do now, because there is only one package. Let's try it.

  • I'm going to make the existing code match the above structure.
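As a rough illustration, the decomposition above could map onto functions like this. It is only a sketch: the function names come from the tree, but every signature, argument, and return value is an assumption, and the bodies are placeholders.

```python
# Sketch only: names follow the functional decomposition; all signatures
# and return values are assumptions, and the bodies are placeholders.

def load_rough_spec(spec_path):
    # Placeholder: would read the spec file and produce rough-parsed items.
    return []

def parse_spec(rough_spec):
    # Placeholder: would call parse_lexical_spec, parse_syntactic_spec,
    # and parse_semantic_spec on the matching parts of the rough spec.
    return {'lex': [], 'syn': [], 'sem': []}

def validate_spec(spec):
    # Placeholder: would run each validator and collect error messages.
    return []

def load_spec(spec_path):
    rough_spec = load_rough_spec(spec_path)
    spec = parse_spec(rough_spec)
    errors = validate_spec(spec)
    if errors:
        raise ValueError('\n'.join(errors))
    return spec

def generate_code(spec):
    # Placeholder: hypothetically maps output file names to source text.
    return {}

def translate_code(code):
    # Placeholder: would load a translator and apply it to the code.
    return code

def write_code(code, out_dir):
    # Placeholder: returns the paths it would write instead of writing them.
    return [f'{out_dir}/{name}' for name in sorted(code)]

def plcc(spec_path, out_dir):
    spec = load_spec(spec_path)
    code = translate_code(generate_code(spec))
    return write_code(code, out_dir)
```

Each top-level stage takes the previous stage's result, so a package or module per verb would mirror this call chain.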

@StoneyJackson
Member Author

StoneyJackson commented Aug 30, 2024

OK, that makes it much clearer about what needs to be done. I'll just convert that functional decomp into a checklist ...

  • plcc
    • load_spec
      • load_rough_spec
        • split_rough_spec
      • parse_spec
        • parse_lexical_spec
        • parse_syntactic_spec
        • parse_semantic_spec
      • validate_spec
        • validate_lexical_spec
          • ... Separate validator for each rule/check.
        • validate_syntactic_spec
          • ... Separate validator for each rule/check.
        • validate_semantic_spec
          • ... Separate validator for each rule/check.

That's probably enough for now. We'll add the code generation items when we are further along. We also need to enumerate the different validations for each spec; we'll do that when we are further along, too.

OK, so now we need to think about parsing the individual specs: lexical, syntactic, and semantic. Each of these will receive only the part of the rough spec that is directly related to its section (so it won't have to deal with section dividers). We should be able to work on them in parallel, independently. Or we can work on them as a team, one at a time. Or some combination thereof.

@AksharP5
Collaborator

AksharP5 commented Sep 2, 2024

Just want to make sure I understand and that my thought process is on the right track as we move towards the parsing. Please feel free to tell me that all of this is unnecessary and that there is a better system in place to accomplish this goal; I just wanted to share the ideas I was brainstorming:

Would the best way to split the rough spec into the 3 different sections (with the possibility of multiple semantic sections) be to simply iterate over the rough_spec list of objects, create a hashmap/dict that holds the three sections, and add each object to the current section until a divider is encountered? The semantic section of the dict will likely require its own dict for multiple languages; maybe the languages can be the keys and the objects the values? Also, is it safe to assume that each divider in the semantic section will be followed by the language of the code that comes after it, as in the example below, or will we need to figure that out ourselves somehow?

Provided Input from parse_rough_test.py:

[
    Line('one', 1, None),
    Divider(Line('%', 2, None)),
    Line('two', 3, None),
    Divider(Line('% java', 4, None)),
    Include(file='/A.java', line=Line('%include /A.java', 5, None)),
    Divider(Line('% python', 6, None)),
    Include(file='/B.py', line=Line('%include /B.py', 7, None)),
    Divider(Line('% c++', 8, None)),
    Block([
        Line('%%%', 9, None),
        Line('%include nope', 10, None),
        Line('% nope', 11, None),
        Line('%%%', 12, None)
    ])
]

For example, the dict could look something like this:

sections = {
    'lex': [],
    'syn': [],
    'sem': {}
}

And using the provided example, after iteration, the output should look something like this:

{
    'lex': [ Line('one', 1, None) ],
    'syn': [ Line('two', 3, None) ],
    'sem': {
        # (this will actually be the contents of the A.java file; I am just using
        # the rough parse without the includes being processed for this example)
        'java': [ Include(file='/A.java', line=Line('%include /A.java', 5, None)) ],
        # (similar to the java section above, and then the c++ section would follow;
        # this should give the big picture of what I am trying to convey)
        'python': [...]
    }
}

Also, where would this process take place? Is it better to create its own section, or will this be done at the end of the rough_spec section? Maybe the best way would be to create a file under the parse_spec section, so that we can send the resulting sections to each part?

@StoneyJackson
Member Author

The syntax for %include is up in the air. But it looks like we need to decide this now. The good news is that how the language and tool name are determined will be isolated to this component. So if we need to change it later, it should only affect this module (yeah!).

So let's go with this...

There are three forms of separators.

%
% language
% tool language

The first defaults to Java for both language and tool name.
The second supplies the language name which will also be the tool name.
The third supplies both.

I think we need to update Divider to hold the language and tool name. For example, after parsing we should get something like this:

Divider(tool='Java', language='Java', line=Line('%', 5, None))
Divider(tool='python', language='python', line=Line('% python', 6, None))
Divider(tool='linter', language='python', line=Line('% linter python', 6, None))
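The three separator forms could be parsed along these lines. This is a minimal sketch, not the project's code: the `Line` field names (`string`, `number`, `file`), the `parse_divider` name, and the choice to raise `ValueError` on anything else are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Line:
    # Field names are assumed; the thread only shows positional values.
    string: str
    number: int
    file: Optional[str] = None

@dataclass(frozen=True)
class Divider:
    tool: str
    language: str
    line: Line

def parse_divider(line):
    """Build a Divider from '%', '% language', or '% tool language'."""
    words = line.string.split()
    if not words or words[0] != '%':
        raise ValueError(f'not a divider: {line.string!r}')
    args = words[1:]
    if len(args) == 0:
        # First form: both names default to Java.
        tool = language = 'Java'
    elif len(args) == 1:
        # Second form: the language name doubles as the tool name.
        tool = language = args[0]
    elif len(args) == 2:
        # Third form: the tool name comes first, then the language.
        tool, language = args
    else:
        raise ValueError(f'too many words in divider: {line.string!r}')
    return Divider(tool=tool, language=language, line=line)

print(parse_divider(Line('% linter python', 6, None)))
```

A line such as `%%%` is rejected here because its first word is not exactly `%`, which keeps block delimiters from being mistaken for dividers.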

Also, we'll eventually make a validator that checks that tool names are unique since they will be used to generate subdirectories in the same parent directory. But notice that having a tool name allows multiple semantic sections to use the same language.
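A uniqueness check on tool names might look like the following. It is a sketch under assumptions: the function name is hypothetical, it takes the tool names as plain strings, and it compares them exactly (a case-insensitive filesystem might call for a stricter comparison).

```python
from collections import Counter

def find_duplicate_tool_names(tool_names):
    """Return the tool names that occur more than once, sorted."""
    counts = Counter(tool_names)
    return sorted(name for name, count in counts.items() if count > 1)

# Two sections may share a language as long as their tool names differ.
print(find_duplicate_tool_names(['Java', 'python', 'linter']))
print(find_duplicate_tool_names(['Java', 'linter', 'linter']))
```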

All of the above should be part of load_rough_spec.


With the above changes, we can now include the Divider object with the semantic section that follows it. We probably don't need to include the divider in front of the syntactic section with the syntactic section.

So now we can make something like @AksharP5 described. My only adjustment would be to make 'sem' hold a list of sections instead of a dict. We don't yet know if order matters between semantic sections. So let's be conservative and assume order does matter. The top level dictionary could be a list and just assume that the first section is the lexical section, the second is the syntactic, and the remaining are semantic sections. But a dict works too.
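The split described here, with 'sem' as an ordered list, could be sketched as below. The `split_rough_spec` name comes from the checklist; the `Line` field names, the `'divider'` and `'items'` keys, and the decision to drop the first divider are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Line:
    # Field names are assumed; the thread only shows positional values.
    string: str
    number: int
    file: Optional[str] = None

@dataclass(frozen=True)
class Divider:
    tool: str
    language: str
    line: Line

def split_rough_spec(rough_spec):
    """Group rough-spec items into lexical, syntactic, and semantic sections."""
    sections = {'lex': [], 'syn': [], 'sem': []}
    current = sections['lex']
    dividers_seen = 0
    for item in rough_spec:
        if isinstance(item, Divider):
            dividers_seen += 1
            if dividers_seen == 1:
                # The first divider starts the syntactic section and is dropped.
                current = sections['syn']
            else:
                # Every later divider starts a semantic section and is kept with it.
                section = {'divider': item, 'items': []}
                sections['sem'].append(section)
                current = section['items']
        else:
            current.append(item)
    return sections

rough_spec = [
    Line('one', 1),
    Divider('Java', 'Java', Line('%', 2)),
    Line('two', 3),
    Divider('java', 'java', Line('% java', 4)),
    Line('%include /A.java', 5),
    Divider('python', 'python', Line('% python', 6)),
    Line('%include /B.py', 7),
]
sections = split_rough_spec(rough_spec)
print([s['divider'].tool for s in sections['sem']])
```

Because 'sem' is a list, the semantic sections keep the order in which they appear in the file.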


Let me try to answer the remaining questions asked by @AksharP5:

Also, where would this process take place? Is it better to create its own section, or will this be done at the end of the rough_spec section? Maybe the best way would be to create a file under the parse_spec section, so that we can send the resulting sections to each part?

I think all of the above is more closely affiliated with load_rough_spec. I don't think parse_spec should have to know how the "sub specs" were organized in the file (beyond knowing the order of the semantic sections).

So let's make all of this part of load_rough_spec. I'll uncheck that box to indicate that it is not yet complete.

@StoneyJackson
Member Author

StoneyJackson commented Sep 10, 2024

OK, I have four independent issues fleshed out: #4, #5, #6, #7

@StoneyJackson
Member Author

Starting to think about the validators for the subsections...

Lexical checks

  • Names must be all upper case letters, numbers, and underscore and cannot start with a number.
  • No duplicate names are allowed.
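The two lexical checks could each be a small function, in keeping with the idea of one validator per rule. This is a sketch: the function names are hypothetical, names are passed as plain strings, and "letters" is taken to mean ASCII letters.

```python
import re
from collections import Counter

# Upper-case letters, digits, and underscore; must not start with a digit.
TOKEN_NAME = re.compile(r'[A-Z_][A-Z0-9_]*')

def find_invalid_names(names):
    """Return the names that break the naming rule, in input order."""
    return [name for name in names if not TOKEN_NAME.fullmatch(name)]

def find_duplicate_names(names):
    """Return each name that is defined more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]

print(find_invalid_names(['NUM', '_X', '1AB', 'id', 'A-B']))
print(find_duplicate_names(['A', 'B', 'A']))
```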

Syntactic checks

  • Terminal names must be all upper-case letters, numbers, and underscore and cannot start with a number.
  • Non-terminal names must start with a lower-case letter, and may contain upper or lower case letters, numbers, and underscore.
  • Every RHS non-terminal must appear on the LHS of at least one rule.
  • The resolved names for all LHS rules must be unique.
  • Within a rule, the resolved names for all RHS symbols must be unique.
  • User-supplied names for LHS symbols must start with a capital letter, and may contain upper or lower case letters, numbers, and underscore.
  • User-supplied names for RHS symbols must start with a lower case letter, and may contain upper or lower case letters, numbers, and underscore.
  • The separator for a repetition rule must be a terminal.
  • All terminals must be defined as tokens in the lexical specification.
  • The grammar must be in LL(1).
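Two of the syntactic checks above (every RHS non-terminal has a rule, and every terminal is a defined token) could be sketched as follows. The grammar representation is an assumption: each rule is a hypothetical `(lhs, rhs)` pair of bare symbol names, with terminals and non-terminals told apart only by the naming rules in the list.

```python
import re

TERMINAL = re.compile(r'[A-Z_][A-Z0-9_]*')
NONTERMINAL = re.compile(r'[a-z][A-Za-z0-9_]*')

def find_undefined_nonterminals(rules):
    """Return RHS non-terminals that never appear on the LHS of a rule."""
    defined = {lhs for lhs, _ in rules}
    undefined = []
    for _, rhs in rules:
        for symbol in rhs:
            if (NONTERMINAL.fullmatch(symbol)
                    and symbol not in defined
                    and symbol not in undefined):
                undefined.append(symbol)
    return undefined

def find_undefined_terminals(rules, token_names):
    """Return RHS terminals that are not tokens in the lexical spec."""
    undefined = []
    for _, rhs in rules:
        for symbol in rhs:
            if (TERMINAL.fullmatch(symbol)
                    and symbol not in token_names
                    and symbol not in undefined):
                undefined.append(symbol)
    return undefined

rules = [
    ('prog', ['stmt', 'SEMI']),
    ('stmt', ['ID', 'ASSIGN', 'expr']),
]
print(find_undefined_nonterminals(rules))
print(find_undefined_terminals(rules, {'SEMI', 'ID'}))
```

The LL(1) check is left out of this sketch; it needs FIRST and FOLLOW sets and would be a separate validator.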

Semantic checks

  • Each locator must start with a valid class name (start with an upper case letter, and may contain upper or lower case letters, numbers, and underscore).

2 participants