Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BC] Split parser into php_only parser and php parser (combined php/html) #180

Closed
wants to merge 8 commits into from

Conversation

calebdw
Copy link
Collaborator

@calebdw calebdw commented Jun 26, 2023

Hello!

This is a rough draft at splitting the parser into two separate parsers: one for just PHP (which allows top level statements, start/end tags are optional), and one that parsers both PHP and HTML (which reuses the PHP-only parser to avoid duplication).

This is necessary because most modern PHP projects do not have monolithic HTML/PHP files but rather most likely follow some sort of MVC pattern/framework (I'm approaching this with Laravel in mind) where the HTML is stored in separate files (e.g., "views") and a templating engine (e.g., Blade, twig, etc.) is used to process the templates into cached php/html files. However, we need to be able to inject PHP into nodes that do not have the php tags---see #168 (provides both Blade and markdown examples), and EmranMR/tree-sitter-blade#5. This is currently not possible with the existing parser as the text is too hungry and there's not a good way of determining if something should be text or a top level statement.

I've tried to follow the same split-parser pattern that the markdown parser uses---at this point I could certainly use some feedback / help on things I'm not too familiar with (like ensuring all of the bindings are properly setup). Things like node/parser names can certainly be updated if necessary (e.g., php_only could be renamed to phpo, php_inline, or php_base...whichever makes the most sense).

Thoughts @maxbrunsfeld? Is this the best path forward (in order to accept top level statements and being able to inject PHP into nodes that do not contain php start/end tags) or is there pattern you'd like to see?

Thanks!

Checklist:

  • All tests pass in CI
  • There are sufficient tests for the new fix/feature
  • Grammar rules have not been renamed unless absolutely necessary (x rules renamed)
  • The conflicts section hasn't grown too much (x new conflicts)
  • The parser size hasn't grown too much (master: STATE_COUNT, PR: STATE_COUNT) (check the value of STATE_COUNT in src/parser.c)

@cfroystad
Copy link
Collaborator

Just popping by to say that I applaud the effort, but it will be a few weeks before I'll find the time to do a proper review (someone else might be faster of course). Just to manage expectations 🙂

@EmranMR
Copy link

EmranMR commented Aug 21, 2023

Hi @calebdw I am not a C Programmer and I have no idea how the Makefile works or gets structured. But, I reckon, we need one in each folder? one for tree-sitter-php and one for tree-sitter-php-only?

The current Makfile also utilises the git repo URL to declare the name. This will default the PARSER_NAME for both folders php, producing conflicting name for the .dylib?

Regardless, I couldn't get this Makefile work using the compile_parser.sh provided by Nova anyways. It output an unusual name. So I just used their complimentary template Makefile instead, hardcoded the php_only as PARSER_NAME and it finally output me a functioning .dylib . 👍

Both of which can be found here for examination (The link downloads the zip file containing Makefile & compile_parser.sh supplied by Panic/Nova)

Nova tree-sitter Compiling Docs: here

@vivere-dally

This comment was marked as off-topic.

@cfroystad
Copy link
Collaborator

Sorry for the extended delay. Life and all of that happened...

I really like that this approach will allow supporting more use cases and make it easier to reuse the parser across project. I suppose this could make life easier for the docblock parser as well.

What gives me pause is that it would be really nice to be able to do this without creating a new parser, since it's both basically PHP. However, I don't currently see a way around it. Though, I should be noted that I'm in no way an authority on the injection system in tree-sitter.

Another thing to consider is that this is a big change in how the parser is used, and will require quite a few projects to adapt.

I wonder if not the approach used by tree-sitter-typescript would be more beneficial here? That way I think we can share a common grammar for the base and have two different parsers where one would behave identically to the current parser (without needing two parse passes to obtain the result) and one that can be used inline.

@calebdw
Copy link
Collaborator Author

calebdw commented Oct 22, 2023

@cfroystad, no worries! My wife just had a baby so I totally understand :)

I agree that it would be best to not have to create another parser but I'm not sure if it's possible... I tried to allow top level statements similar to the Go parser (#177) but it turns out that it's really difficult to tell if something should be text or PHP without using the PHP tags---which is really no surprise since PHP was originally a templating language.

Thanks for pointing to the TS grammer! I haven't seen that pattern before---basically the grammar file is parameterized on the parser name which seems like it should work here...the only difference between the two parsers is the handling of the text.

I'll try to implement something similar in a separate branch

@calebdw
Copy link
Collaborator Author

calebdw commented Oct 22, 2023

@cfroystad, do you have any preference on the name of the new parser? Some thoughts are:

  • php_only/phpo
  • php_base
  • php_inline

@cfroystad
Copy link
Collaborator

Congratulations on the baby! 🙂

hm, naming is always difficult... First I leaned towards php_base, but it's not really a base. Then I thought php_inline sounded good, but it's also supporting php not being inline. I think php_only would be best.

In an ideal world, I'd like to name the "php_only" version "php" and the one including html "php_html" or something like that. But php_only would be my preference all things considered.

Thank you for working on this! It's much apprechiated!

@calebdw calebdw mentioned this pull request Oct 23, 2023
5 tasks
@calebdw
Copy link
Collaborator Author

calebdw commented Oct 23, 2023

Closing---moving discussion to #192

@calebdw calebdw closed this Oct 23, 2023
@calebdw
Copy link
Collaborator Author

calebdw commented Oct 23, 2023

The current Makfile also utilises the git repo URL to declare the name. This will default the PARSER_NAME for both folders php, producing conflicting name for the .dylib?

@EmranMR, I never updated the Makefile in this branch---however, the git repo name is only used if a parser name is not provided. In the new branch you can use the following:

PARSER_NAME=php_only make

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants