Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduction of Compiler Command class #243

Merged
merged 15 commits into from
Apr 2, 2024

Conversation

Corniel
Copy link
Contributor

@Corniel Corniel commented Jan 27, 2024

The original commandline parsing is a weakly typed homegrown piece of code. It is hard to debug, and not trivial to write test for. I originally started by rewriting it to become strongly typed (but still homegrown).

Thanks to@siegfriedpammer, for mentioning the CSharpCommandLineParser. Roslyn already implemented it itself (and also for VB.NET with the VisualBasicCommandLineParser).

This should make our lives way easier. The only downside (but one I can live with) is the addition of 2 dependencies:

<PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.0.1" AllowedVersion="[4,)" />
<PackageReference Include="Microsoft.CodeAnalysis.VisualBasic" Version="4.0.1" AllowedVersion="[4,)" />

@daveaglick
Copy link
Collaborator

I really like this idea. My first instinct though was that I'd have too much trouble maintaining it long-term, but then I finally caved and decided to throw in the towel (after months of agonizing over it). #247 is a request for a new maintainer(s) and I think it makes sense to let whoever takes over the project decide the direction of open PRs.

Thanks for the effort - hopefully this really helps the next team by allowing them to start from a better place on the compiler interaction!

@Corniel Corniel marked this pull request as ready for review February 16, 2024 08:18
@Corniel
Copy link
Contributor Author

Corniel commented Feb 16, 2024

@phmonte, @daveaglick : I think I finalized this PR. The parsing of the compiler is handled by a dedicated (internal) parser that uses 3 seperate tokenizers (C#, VB.NET, F#). The result is an immutable class that has direct access to Switches, Items, and Source Files. The items required by AnalyzerResult (Additional Files, Analyzers, Preprocessor Symbols, and References) have a 'short cut'.

I decorated these structures with [DebuggerDisplay]'s, hoping that it will improve the debugger experience.

@siegfriedpammer
Copy link
Contributor

siegfriedpammer commented Feb 21, 2024

Not sure, if this is already covered, but today I discovered a bug in Buildalyzer where absolute Unix paths were not added to the SourceFiles collection. They have probably been treated as "unknown" options and were ignored. I have solved the problem by switching to Roslyn's CSharpCommandLineParser in my code (this is one of the reasons why I proposed exposing the raw command-line in #212). Maybe that would also be an option for Buildalyzer? Replicating the command-line parsing logic seems to be a lot of extra work and it's easy to miss some corner cases/platform-specific stuff.

If you want me to open an issue for the bug, I can try to build a small reproducer.

@Corniel
Copy link
Contributor Author

Corniel commented Feb 22, 2024

@siegfriedpammer It might be benefitial to use Roslyn's parser here as well (stating the obvious). We do not want to re-invent the wheel. I'll look into this. Thanks for mentioning it.

@Corniel
Copy link
Contributor Author

Corniel commented Feb 22, 2024

@siegfriedpammer I rewrote this PR based on your suggestion: It way less custom code if we use Roslyn's CSharpCommandLineParser and VisualBasicCommandLineParser. I could not yet find an F# alternative. I hope it exists, as I'm not an expert on F#, and as F# unfortunately is not built on top of Roslyn, it will take some hard work to get this right for F# too.

@phmonte
Copy link
Owner

phmonte commented Feb 22, 2024

I'm evaluating the PR, I'll get back to you soon. @Corniel @siegfriedpammer!
Thank you very much.

@Corniel
Copy link
Contributor Author

Corniel commented Feb 24, 2024

@phmonte and @daveaglick Also found the F# equivalent (at least I think I did😉). Rewrote the intro for the PR, and now am working on replacing things step by step.

@Corniel
Copy link
Contributor Author

Corniel commented Feb 24, 2024

I think I have it working. Note that C# and VB.NET commandline parsers apply a path combine on source, and additional files.

I hope I did all F# mapping correct. and there are some TODO's left. Is there any F# expert who can help with those?

@slang25
Copy link
Sponsor Contributor

slang25 commented Feb 25, 2024

I don't want this to sound negative, but I really liked the early tokenizer version of this PR and think adding the extra dependencies would be more trouble than it's worth in the long run. I had mixed feelings about NuGet.Frameworks, however that was a very light leaf dependency, so made sense.

My specific issues are:

  • These packages are often used by consumers of this package, so can create unnecessary diamond dependency version issues
  • The size of these packages and transitive tree are large, FSharp.Compiler.Service will pull in packages including FSharp.Core, these packages are many megabytes (I say this as an F# fanboy). Edit: actually FSharp.Core is tiny 😆 ignore that remark
  • The utility to package size ration is disproportionately small. i.e. the command line parsing from these libraries is useful, but is dwarfed by the overall functionality provided by the libraries. This doesn't seem appropriate.

The previous Buildalyzer vs Buildalyzer.Workspaces package split did a really good job of keeping these dependency chains separate, and this feels like a step backwards. Just my two cents (please take it as constructive 🙏 ). I'm really happy to see this project receiving so much love in the past few weeks 🙂

@Corniel
Copy link
Contributor Author

Corniel commented Feb 25, 2024

@slang25 That is useful feedback, and surely something to keep in mind. The added value for adding the F# dependency is the lowest, so dropping not adding a lot of code. For the other two: they provide tons of logic on how to interpreter the command line, taking care on a lot of logic.

So there I think of the following options:

  1. Take the earliest possible version (4.0.) and explicitly state that the supported versions are 4. and up.
  2. Remove them, 'copy' most of there logic, and use them in the unit tests as a reference to ensure the outcome it the same.

That being said, the second option is a lot of work. In general, (as pointed out during this PR) we - all people involved in maintaining this package - really need F# experts to help us. Secondly, I personally need some more inside in what this package is used for. I know what Stryker does with it, and obviously what my own Roslyn Test Tools uses it for, but to be of better help to others, feedback is needed.

@phmonte
Copy link
Owner

phmonte commented Mar 3, 2024

@Corniel , I'm evaluating this PR, the Compiler Command class is great, the AnalyzerResult.cs class is much more readable.

But, I'm also worried about the issue of adding 3 more dependencies to the project, Nuget.Framework seems like something more subtle, which was made exactly with this objective, it's a complicated decision, I need more information to decide which path to follow, but I also don't see any other alternative than the ones you mentioned.

Regardless of whether we continue with option 1 or 2, I understand that your work will be used, because even following option 2 we would maintain all class contracts, and I could do this annoying part of "copying" the logic.

You asked a great question about this package, asking how and where people are using it could be an alternative, perhaps opening an issue, looking for references in public projects, I'm thinking.

@Corniel
Copy link
Contributor Author

Corniel commented Mar 4, 2024

@phmonte with the last PR, I dropped the F# dependency. I'll continue to also drop the other two dependencies. But that will take some time.

@Corniel
Copy link
Contributor Author

Corniel commented Mar 4, 2024

After some further investigation, I came to the conclusion that the benefits of using Microsoft.CodeAnalysis.CSharp and Microsoft.CodeAnalysis.VisualBasic are even bigger than I thought. The tokenization of the command line itself is do-able (and we could just copy the straight forward implementation of Micorsoft's Microsoft.CodeAnalysis.CommandLineArgumentParser). However, the parsing of those arguments is way over the 1000 lines of code, for both C# and VB.NET, and than we are just dealing with the main routines. After all, what's in those CompilerCommand's is one of the key features of Buildalyzer.

Issues like those mentioned by @siegfriedpammer will dramatically reduce, and the code will be easier to maintain.

@slang25 and @phmonte Feedback is appreciated.

Now defined as:

<PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.0.1" AllowedVersion="[4,)" />
<PackageReference Include="Microsoft.CodeAnalysis.VisualBasic" Version="4.0.1" AllowedVersion="[4,)" />

@Corniel
Copy link
Contributor Author

Corniel commented Mar 19, 2024

@phmonte I'd like to continue with this. Can we decide on how to do that?

@phmonte
Copy link
Owner

phmonte commented Mar 20, 2024

Actually using these packages saves a lot of lines of code, at first I was thinking about copying the library logic, but I confess that the idea of the minimum version seems very satisfactory to me and excludes compatibility problems if by chance the project already uses some of these libraries.

But, I would feel safer applying these changes in a next version, if this is not a dependency for the other open PRs, what do you think @Corniel ?

@daveaglick , if you have any feedback too, it would be greatly appreciated.

@Corniel
Copy link
Contributor Author

Corniel commented Mar 20, 2024

@phmonte My idea is to prepare more 'drastic' changes that would require a new major version. Eventually I think the result of an analysis would be the CompilerCommand, the properties/items and the dot-net info. So we could decide to merge this not back to master, but keep it o next-stable-major-branch and continue this process.

@Corniel Corniel changed the title Proposal: class that represents the Compiler Command Introduction of Compiler Command class Mar 21, 2024
@Corniel
Copy link
Contributor Author

Corniel commented Mar 28, 2024

@phmonte I did a rebase. I would like to create a next PR that combines this with the Compiler Items and Properties, but that obvious requires this to be merged first.

@phmonte
Copy link
Owner

phmonte commented Apr 2, 2024

Several projects that use this library have references to Microsoft.CodeAnalysis.CSharp and Microsoft.CodeAnalysis.VisualBasic, version 4+ would be compatibility from the year 2021, I think it is good compatibility and we will no longer need to worry about many lines of code, I did some tests and the behavior seems satisfactory, let's continue this way.
@Corniel , could you update the PR description? for documentation only.
As soon as I update, I'll do the merge.

@Corniel
Copy link
Contributor Author

Corniel commented Apr 2, 2024

Done. Feel free to make extra additions to the summary.

@phmonte phmonte merged commit 11e7acf into phmonte:main Apr 2, 2024
6 checks passed
@Corniel Corniel deleted the command-line-tokenizer branch April 2, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants