Fall 2024 Planning #1

StoneyJackson · 2024-08-30T19:57:24Z

No description provided.

StoneyJackson · 2024-08-30T19:58:34Z

I'm going to create something like .trash in the root and move everything that is out of date there. Why not just delete it? In case we want to mine it for ideas without having to figure out where in the git history it is.
I'm also going to rename substructure_parser to rough_parser. Hopefully this conveys that it provides a rough parse of the spec file, and the rest of the parsing will be completed in a later stage. This draws on "rough" as used in construction.

StoneyJackson · 2024-08-30T21:03:15Z

Below is a functional decomposition of plcc.

plcc
    load_spec
        load_rough_spec
            ...
        parse_spec
            parse_lexical_spec
            parse_syntactic_spec
            parse_semantic_spec
        validate_spec
            validate_lexical_spec
                ... Separate validator for each rule/check.
            validate_syntactic_spec
                ... Separate validator for each rule/check.
            validate_semantic_spec
                ... Separate validator for each rule/check.
    generate_code
        generate_scanner
        generate_parser
        generate_evaluator
        generate_interactive_scanner
        generate_interactive_parser
        generate_interactive_evaluator
    translate_code
        load_translator
        apply_translator
    write_code
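To make the verb-per-stage idea concrete, here is a minimal Python sketch of how the top of this decomposition might fit together. Only the stage names come from the outline above. Every function body is a hypothetical stub, and so are two choices made here: passing the spec in as a text string, and representing generated code as a dict that maps file names to source text.

```python
from __future__ import annotations

from typing import Dict, List


# Hypothetical stubs: each stage is named with a verb, as in the outline.
def load_rough_spec(text: str) -> List[str]:
    # Placeholder: the real stage produces Line/Divider/Include/Block objects.
    return text.splitlines()


def parse_spec(rough: List[str]) -> dict:
    # Placeholder for parse_lexical_spec, parse_syntactic_spec, parse_semantic_spec.
    return {"rough": rough}


def validate_spec(spec: dict) -> List[str]:
    # Placeholder: would run every validator and collect their error messages.
    return []


def load_spec(text: str) -> dict:
    spec = parse_spec(load_rough_spec(text))
    errors = validate_spec(spec)
    if errors:
        raise ValueError("\n".join(errors))
    return spec


def generate_code(spec: dict) -> Dict[str, str]:
    # Placeholder: would call generate_scanner, generate_parser, and so on.
    # The result maps an output file name to its source text.
    return {}


def translate_code(code: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for load_translator + apply_translator.
    return code


def write_code(code: Dict[str, str], out_dir: str) -> None:
    # Placeholder: would write each file under out_dir.
    pass


def plcc(text: str, out_dir: str) -> Dict[str, str]:
    spec = load_spec(text)
    code = translate_code(generate_code(spec))
    write_code(code, out_dir)
    return code
```

Because each stage only consumes the previous stage's output, each one could live in its own package named after its verb and be tested in isolation.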

The problem with a functional decomposition is that its parts are functions, and so they are named with verbs. I had been using nouns for packages (folders) and verbs for modules (files). If we want to use the above to organize the code, maybe we should name packages with verbs too. That should be easy to do now, because there is only one package right now. Let's try it.

I'm going to make the existing code match the above structure.

StoneyJackson · 2024-08-30T21:25:57Z

AksharP5 · 2024-09-02T17:11:08Z

Just want to make sure I understand, and that my thought process is on the right track as we move toward parsing. Please feel free to tell me if all of this is unnecessary or if there is a better system in place to accomplish this goal; I just wanted to share the ideas I was brainstorming:

Would the best way to split the rough spec into the three sections (with the possibility of multiple semantic sections) be to iterate over the rough_spec list of objects, build a hashmap/dict that holds the three sections, and add each object to the current section until a divider is encountered?

The semantic entry of the dict will likely need its own dict to support multiple languages; maybe the languages can be the keys and the objects the values? Also, is it safe to assume that each divider in the semantic section will name the language of the code that follows it, as in the example below, or will we need to figure that out ourselves somehow?

Provided Input from parse_rough_test.py:

[
    Line('one', 1, None),
    Divider(Line('%', 2, None)),
    Line('two', 3, None),
    Divider(Line('% java', 4, None)),
    Include(file='/A.java', line=Line('%include /A.java', 5, None)),
    Divider(Line('% python', 6, None)),
    Include(file='/B.py', line=Line('%include /B.py', 7, None)),
    Divider(Line('% c++', 8, None)),
    Block([
        Line('%%%', 9, None),
        Line('%include nope', 10, None),
        Line('% nope', 11, None),
        Line('%%%', 12, None)
    ])
]

For example, the dict could look something like this:

sections = {
    'lex': [],
    'syn': [],
    'sem': {}
}

And using the provided example, after iteration, the output should look something like this:

{
    'lex': [ Line('one', 1, None) ],
    'syn': [ Line('two', 3, None) ],
    'sem': {
        'java': [ Include(file='/A.java', line=Line('%include /A.java', 5, None)) ],
        # (this will actually be the contents of the A.java file; I am just using
        # the rough parse without the includes being processed for this example)
        'python': [...]
        # (similar to the java section above; the c++ section would follow this,
        # but this should give the big picture of what I am trying to convey)
    }
}

Also, where would this process take place? Is it better to create its own section, or will this be done at the end of the rough_spec section? Maybe the best way would be to create a file under the parse_spec section, so that we can send the resulting sections to each part?

StoneyJackson · 2024-09-02T19:41:31Z

The syntax for %include is up in the air, but it looks like we need to decide this now. The good news is that how the language and tool name are determined will be isolated to this component. So if we need to change it later, it should only affect this module (yay!).

So let's go with this:

There are three forms of separators.

%
% language
% tool language

The first defaults to Java for both language and tool name.
The second supplies the language name which will also be the tool name.
The third supplies both.

I think we need to update Divider to hold the language and tool name. For example, after parsing we should get something like this:

Divider(tool='Java', language='Java', line=Line('%', 5, None))
Divider(tool='python', language='python', line=Line('% python', 6, None))
Divider(tool='linter', language='python', line=Line('% linter python', 6, None))
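A minimal sketch of interpreting the three divider forms, assuming the rough parser has already recognized the line as a divider (so %include lines and %%% block markers never reach this code). The Line and Divider dataclasses are simplified stand-ins for the real ones, and parse_divider is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Line:
    string: str
    number: int
    file: Optional[str]


@dataclass(frozen=True)
class Divider:
    tool: str
    language: str
    line: Line


def parse_divider(line: Line) -> Divider:
    text = line.string.strip()
    if not text.startswith("%"):
        raise ValueError(f"line {line.number}: not a divider: {line.string!r}")
    words = text[1:].split()  # the words after the leading '%'
    if len(words) == 0:
        # '%': both tool and language default to Java.
        return Divider(tool="Java", language="Java", line=line)
    if len(words) == 1:
        # '% language': the language name doubles as the tool name.
        return Divider(tool=words[0], language=words[0], line=line)
    if len(words) == 2:
        # '% tool language'
        tool, language = words
        return Divider(tool=tool, language=language, line=line)
    raise ValueError(f"line {line.number}: malformed divider: {line.string!r}")
```

Anything with more than two words after the % is rejected here rather than guessed at, which keeps the three forms the only accepted syntax.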

Also, we'll eventually make a validator that checks that tool names are unique since they will be used to generate subdirectories in the same parent directory. But notice that having a tool name allows multiple semantic sections to use the same language.
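A sketch of that future validator, assuming it only needs each divider's tool name and line number. The trimmed-down Divider and the name validate_unique_tool_names are hypothetical; the validator returns a list of error messages, so an empty list means the check passed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Divider:
    # Simplified stand-in: just the fields this check needs.
    tool: str
    language: str
    line_number: int


def validate_unique_tool_names(dividers: List[Divider]) -> List[str]:
    """Return one error per divider that reuses an earlier tool name."""
    first_seen: Dict[str, int] = {}
    errors = []
    for d in dividers:
        if d.tool in first_seen:
            errors.append(
                f"line {d.line_number}: tool name {d.tool!r} "
                f"already used on line {first_seen[d.tool]}"
            )
        else:
            first_seen[d.tool] = d.line_number
    return errors
```

Two sections that share the language python but use the tool names python and linter pass this check; a second python tool is reported. One question the sketch leaves open: on a case-insensitive file system, Python and python would map to the same subdirectory, so the comparison might need to ignore case.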

All of the above should be part of load_rough_spec.

With the above changes, we can now keep each Divider object with the semantic section that follows it. We probably don't need to keep the divider that precedes the syntactic section.

So now we can make something like what @AksharP5 described. My only adjustment would be to make 'sem' hold a list of sections instead of a dict. We don't yet know whether order matters between semantic sections, so let's be conservative and assume it does. The top-level dictionary could also be a list, with the convention that the first section is the lexical section, the second is the syntactic section, and the remaining ones are semantic sections. But a dict works too.
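A minimal sketch of that splitting step, assuming Divider already carries tool and language as proposed above. The dataclasses are simplified stand-ins, and split_sections and SemanticSection are hypothetical names. The divider in front of the syntactic section is dropped, each later divider is kept with the semantic section it introduces, and 'sem' is a list so the order of semantic sections is preserved.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Line:
    string: str
    number: int
    file: Optional[str]


@dataclass
class Divider:
    tool: str
    language: str
    line: Line


@dataclass
class Include:
    file: str
    line: Line


@dataclass
class Block:
    lines: List[Line]


Item = Union[Line, Include, Block]


@dataclass
class SemanticSection:
    divider: Divider
    items: List[Item] = field(default_factory=list)


def split_sections(rough: List[Union[Item, Divider]]) -> Dict[str, list]:
    lexical: List[Item] = []
    syntactic: List[Item] = []
    semantics: List[SemanticSection] = []
    current = lexical  # items go here until the first divider
    dividers_seen = 0
    for item in rough:
        if isinstance(item, Divider):
            dividers_seen += 1
            if dividers_seen == 1:
                # First divider: start the syntactic section, drop the divider.
                current = syntactic
            else:
                # Each later divider starts a new semantic section.
                section = SemanticSection(divider=item)
                semantics.append(section)
                current = section.items
        else:
            current.append(item)
    return {"lex": lexical, "syn": syntactic, "sem": semantics}
```

Applied to the parse_rough_test.py input (with dividers in the new form), 'lex' holds Line('one', 1, None), 'syn' holds Line('two', 3, None), and 'sem' holds the java, python, and c++ sections in file order.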

Let me try to answer the remaining questions asked by @AksharP5:

Also, where would this process take place? Is it better to create its own section, or will this be done at the end of the rough_spec section? Maybe the best way would be to create a file under the parse_spec section, so that we can send the resulting sections to each part?

I think all of the above is more closely affiliated with load_rough_spec. I don't think parse_spec should have to know how the "sub specs" were organized in the file (beyond knowing the order of the semantic sections).

So let's make all of this part of load_rough_spec. I'll uncheck that box to indicate that it is not yet complete.

StoneyJackson · 2024-09-10T20:02:05Z

OK, I have four independent issues fleshed out: #4, #5, #6, #7

StoneyJackson · 2024-09-17T15:33:58Z

Starting to think about the validators for the subsections...

Lexical checks

Names must be all upper case letters, numbers, and underscore and cannot start with a number.
No duplicate names are allowed.
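Following the "separate validator for each rule/check" idea from the decomposition, here is a sketch of the two lexical checks as independent functions. It assumes each lexical rule has already been reduced to a (name, line number) pair, which is a simplification; the function names are hypothetical. Each validator returns a list of error messages.

```python
import re
from typing import Dict, List, Tuple

# Upper-case letters, digits, and underscores; must not start with a digit.
TOKEN_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")


def validate_token_name_format(rules: List[Tuple[str, int]]) -> List[str]:
    return [
        f"line {number}: invalid token name {name!r}"
        for name, number in rules
        if not TOKEN_NAME.fullmatch(name)
    ]


def validate_no_duplicate_token_names(rules: List[Tuple[str, int]]) -> List[str]:
    first_seen: Dict[str, int] = {}
    errors = []
    for name, number in rules:
        if name in first_seen:
            errors.append(
                f"line {number}: duplicate name {name!r} "
                f"(first defined on line {first_seen[name]})"
            )
        else:
            first_seen[name] = number
    return errors
```

Keeping the checks separate means a name like 2BAD that appears twice is reported once by each validator, and each check can be tested on its own.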

Syntactic checks

Terminal names must be all upper-case letters, numbers, and underscore and cannot start with a number.
Non-terminal names must start with a lower-case letter, and may contain upper or lower case letters, numbers, and underscore.
Every RHS non-terminal must appear on the LHS of at least one rule.
The resolved names for all LHS rules must be unique.
Within a rule, the resolved names for all RHS symbols must be unique.
User-supplied names for LHS symbols must start with an upper-case letter, and may contain upper- or lower-case letters, numbers, and underscore.
User-supplied names for RHS symbols must start with a lower-case letter, and may contain upper- or lower-case letters, numbers, and underscore.
The separator for a repetition rule must be a terminal.
All terminals must be defined as tokens in the lexical specification.
The grammar must be LL(1).
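A sketch of a few of the syntactic checks, again one validator per check. It assumes each grammar rule has been reduced to an LHS non-terminal and a list of bare RHS symbol names (the real parsed rules will carry more, such as user-supplied names and repetition separators), and it leaves out the resolved-name and LL(1) checks, which need more of the parsed structure. The regular expressions transcribe the naming rules listed above; all function and variable names are hypothetical.

```python
import re
from typing import List, Set, Tuple

# Patterns transcribed from the naming rules above.
TERMINAL = re.compile(r"[A-Z_][A-Z0-9_]*")
NONTERMINAL = re.compile(r"[a-z][A-Za-z0-9_]*")
LHS_USER_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")
RHS_USER_NAME = re.compile(r"[a-z][A-Za-z0-9_]*")

# A rule is (lhs_nonterminal, [rhs_symbol, ...]).
Rule = Tuple[str, List[str]]


def validate_rhs_nonterminals_defined(rules: List[Rule]) -> List[str]:
    defined = {lhs for lhs, _ in rules}
    return [
        f"{lhs}: non-terminal {sym!r} never appears on an LHS"
        for lhs, rhs in rules
        for sym in rhs
        if NONTERMINAL.fullmatch(sym) and sym not in defined
    ]


def validate_terminals_are_tokens(rules: List[Rule], tokens: Set[str]) -> List[str]:
    return [
        f"{lhs}: terminal {sym!r} is not a token in the lexical spec"
        for lhs, rhs in rules
        for sym in rhs
        if TERMINAL.fullmatch(sym) and sym not in tokens
    ]
```

The last check is the first one that needs both sub specs, which suggests validate_syntactic_spec will receive the parsed lexical spec (or at least its token names) as an input.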

Semantic checks

Each locator must start with a valid class name (start with an upper case letter, and may contain upper or lower case letters, numbers, and underscore).
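The semantic check can be sketched the same way. What may follow the class name inside a locator is not specified here, so this sketch assumes only that the class name is the run of letters, digits, and underscores at the start of the locator; validate_locator_class_names is a hypothetical name.

```python
import re
from typing import List, Tuple

CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")
LEADING_IDENT = re.compile(r"[A-Za-z0-9_]*")  # always matches, possibly empty


def validate_locator_class_names(locators: List[Tuple[str, int]]) -> List[str]:
    errors = []
    for locator, number in locators:
        # Take the leading run of identifier characters as the class name.
        name = LEADING_IDENT.match(locator).group(0)
        if not CLASS_NAME.fullmatch(name):
            errors.append(
                f"line {number}: locator {locator!r} does not start with a valid class name"
            )
    return errors
```

Taking the whole leading identifier, rather than just the first character, means a locator such as 1Prog is rejected even though it contains a capital letter.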

StoneyJackson mentioned this issue Aug 30, 2024

Stoneyjackson/fall planning 1 #2

Merged

StoneyJackson mentioned this issue Aug 30, 2024

WIP: mv plcc/load_rough to plcc/load_spec/load_rough_spec #3

Merged
