Proposal: Inline Languages #226
Replies: 13 comments
-
This feature can be useful at both compile-time & runtime. The compile-time aspects overlap with Source Generators. Regarding runtime, allowing specific
Also, it should be possible to initialize such processors in any scope.
-
Features like this are typically associated with programming languages that have extremely light-weight syntax (e.g. Parsing Words in Factor) or no syntax at all (e.g. Racket). Perl6 also apparently has a story for changing the language "mid-flight" (so to speak) in that the parser itself is just a Perl6 object accessible (and mutable!) at run-time. However, all of these have something in common: they are either highly-dynamic, have little to no syntax, or both. In other words: the exact opposite of C♯.
But there is a language which successfully combines strong static typing, complex syntax, and inline languages (plus also macros): Converge. Its Compile-Time Meta-Programming (CTMP) features include a feature called DSL Blocks that is exactly equivalent to this proposal.
The way it works is rather simple: the code inside a DSL Block is treated as a completely opaque black box by the Converge parser; it is simply handed off as a string to a DSL processor function. That function must then return a valid Converge AST fragment that gets inserted instead of the DSL block node into the overall AST, and normal processing (typechecking etc.) then proceeds from there.
In order to help create such AST fragments, there is a compiler API, called Compiler External Interface (similar to Roslyn). There are also language features such as quotes, quasi-quotes and splices. And the standard library provides a DSL for writing AST fragments, which can be used via a DSL block (how very meta!).
Note that many of the building blocks are already in C♯: there is an external compiler API (Roslyn). There are already discussions about macros, which will presumably include quotes, quasi-quotes, and splices. (These inline languages are a superset of macros, so it makes no sense to separate their discussion. If we assume that we are going to get inline languages, we might as well assume that we already have macros, since they are simpler.)
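To make the hand-off described above concrete, here is a minimal sketch in Python (not Converge or C♯), using only the standard ast module. It assumes a hypothetical dsl("name", "text") call as a stand-in for a DSL block, and a hypothetical calc_processor for a made-up three-word language; none of these names come from Converge or Roslyn.

```python
import ast


def calc_processor(text: str) -> ast.expr:
    """Hypothetical DSL processor: '<operand> plus|times <operand>' -> AST fragment."""
    left, word, right = text.split()
    operators = {"plus": ast.Add(), "times": ast.Mult()}
    return ast.BinOp(
        left=ast.parse(left, mode="eval").body,    # operand parsed by the host parser
        op=operators[word],
        right=ast.parse(right, mode="eval").body,
    )


# Registry of processors; in the proposal these would live in separate components.
PROCESSORS = {"calc": calc_processor}


class DslExpander(ast.NodeTransformer):
    """Replaces each dsl("<name>", "<opaque text>") call with the processor's fragment."""

    def visit_Call(self, node):
        self.generic_visit(node)
        is_dsl_block = (
            isinstance(node.func, ast.Name)
            and node.func.id == "dsl"
            and len(node.args) == 2
            and all(isinstance(a, ast.Constant) and isinstance(a.value, str)
                    for a in node.args)
        )
        if not is_dsl_block:
            return node
        name, text = (a.value for a in node.args)
        fragment = PROCESSORS[name](text)          # the host never inspects `text`
        return ast.copy_location(fragment, node)


def compile_host(source: str):
    """Parse the host program, expand DSL blocks, then continue normal compilation."""
    tree = DslExpander().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<host>", "exec")


host_source = 'x = 4\nresult = dsl("calc", "x times 10") + 2\n'
namespace = {}
exec(compile_host(host_source), namespace)
print(namespace["result"])  # 42
```

The host only recognizes the block and swaps in whatever fragment the processor returns; everything after that (here, compile and exec) treats the fragment like any other part of the tree.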
AFAIK, there already are fluent DSLs for creating Roslyn trees, so once we have inline languages, adding an inline language for Roslyn trees that helps writing processors for inline languages is more or less trivial.
However, there is one syntactic wrinkle that precludes simply "stealing" this feature: Converge's syntax is indentation-based, inspired by Python. This makes it very easy to delimit DSL blocks. But C♯'s syntax is curly-brace-based, and needs matching pairs of delimiters to delimit blocks. The problem is: inline languages are free-form, so there is simply no guarantee that they will not contain a closing delimiter themselves.
Heredocs (various Unix shells, Perl, Ruby, many others) and Ruby's percent-literals offer some examples of existing solutions to that problem. Both involve allowing the programmer to specify the delimiter.
Heredocs are string literals for large, multiline strings (they literally are for "inserting a document here"). Here is an example of Ruby's heredoc syntax:
a_long_string = <<~__END_OF_DOCUMENT__
line 1
line 2
can contain ' and " and nothing breaks
__END_OF_DOCUMENT__
Basically, the trick is: the programmer supplies their own end-of-document marker, which they know doesn't appear in the document itself. There are various flavors of heredocs: in some, the end-of-document marker must appear on a line by itself at the very beginning of the line; in others, the marker is allowed to be indented. Ruby allows three forms: in the first, the marker must be at the start of the line; in the second, the marker may be indented; and the third form (<<~, used above) takes the indentation of the least-indented line and removes it from the beginning of all lines, which allows you to write the heredoc with proper indentation inside a more complex block of code, but have that indentation removed from the string.
Ruby's percent literals are similar; they exist for strings, regexes, and many other forms of literals. They again allow the programmer to choose their own delimiters:
%Q@some string@ # %Q means "double quoted string", the @ is the chosen delimiter
%Q,some string,
%Q.some string.
%Q[some string] # some delimiters come in "pairs"
%Q{some string}
Note: there is an interesting relation to user-defined string interpolation here, akin to ECMAScript's Template Literals or Scala's String Interpolation: both allow the programmer to process strings in an arbitrary way, the main difference being that processed string literals are inserted into the program as values and interpolation is performed at runtime, whereas processed inline language blocks are inserted into the programs as code and processing happens at compile time. But a lot of the syntactic challenges are similar, in that both string literals and inline language blocks can contain arbitrary text.
Here is a proposed modification of the syntax in the OP, with just one change: it uses the user-defined marker as an end-of-document marker instead of insisting on a matched pair of
And some of the other examples from the original issue:
Cosmos X♯ (seriously, there are now three languages called X♯?) by @fanol:
Note: This issue was originally reported in dotnet/roslyn#13735; this is an expanded version of my comment there.
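As a rough illustration of the user-chosen end marker idea from this comment, the sketch below (Python, standard library only) extracts the raw body of an inline block. The `__sql <<MARKER` opener is a hypothetical syntax borrowed from heredocs purely for this example, not the syntax proposed in the OP, and extract_inline_block is an invented helper name.

```python
import textwrap


def extract_inline_block(source: str, keyword: str):
    """Return (marker, body) for the first `keyword <<MARKER ... MARKER` block, or None.

    Hypothetical syntax: the text after '<<' on the opening line is the end marker,
    and the block ends at the first later line consisting only of that marker.
    """
    lines = source.splitlines()
    opener = keyword + " <<"
    for index, line in enumerate(lines):
        position = line.find(opener)
        if position == -1:
            continue
        marker = line[position + len(opener):].strip()
        if not marker:
            raise SyntaxError("missing end marker after " + opener)
        body = []
        for candidate in lines[index + 1:]:
            if candidate.strip() == marker:        # the marker may be indented
                # Remove the indentation shared by all body lines, like Ruby's <<~.
                return marker, textwrap.dedent("\n".join(body))
            body.append(candidate)                 # body text is never interpreted
        raise SyntaxError("unterminated block, expected " + marker)
    return None


host = """\
void M() {
    __sql <<END_SQL
        SELECT name FROM users
        WHERE note = '} not a problem {'
    END_SQL
}
"""

marker, body = extract_inline_block(host, "__sql")
print(body)
```

Because the scanner only compares whole lines against the marker, braces and quotes inside the body cannot end the block early; the author just has to pick a marker that does not occur on a line by itself.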
-
That was an interesting read, thanks. Xtend honors relative indentation in "Template Expressions" which is quite nice:
-
I don't think they will ever allow this in C#. There's already a proposal for compiler intrinsics, and if you read it you will realize that this won't happen for IL, let alone for a more generalized form of it.
-
This would introduce a ridiculous amount of complexity to the parser and compiler API for very little real benefit. CLS ensures that you can already mix languages in a solution. ILMerge allows that within a single project. I'd rather see better tooling support for the latter in Visual Studio.
-
I don't think it is really that much more complex than a combination of other proposals:
-
A source generator could accomplish that with no other changes to the language or tooling. But the experience would be horrendous. It'd be limited strictly to new members and could not interact with the locals of any method, unlike inline assembly in C. It also depends on those languages having transpilers to C#, effectively requiring them to be rewritten, and they'd be limited to what can be expressed in C#. In short, a whole lot of work for nothing. You're much better off writing a second assembly in that language of choice.
-
@bbarry, I think it is significantly more complex. At the very least, it means that the compiler doesn't have full knowledge of how the stack looks and it no longer knows if the method is doing anything unsafe or unverifiable. There are only a handful of IL operations which are not currently supported by C#, and I would much rather see those get proper language support than just supporting inline IL. At the very least, providing the intrinsic support would be significantly better and would still allow the compiler to maintain end-to-end knowledge of what a method is doing. The 64-bit version of MSVC dropped support for inline assembly altogether and only allows intrinsics. Even without inline assembly and given that intrinsic support is available, essentially the only time raw assembly is needed is when the user needs to squeeze every bit of performance out of a very critical function. For the cases where intrinsics aren't good enough, we have plenty of other options that are much safer and simpler (such as ILMerge).
-
I think part of the problem with this proposal is that it tries to offer one solution to two quite distinct issues:
Inline IL is very niche and I think that compiler intrinsics (#191) are good enough. Parsing other languages is much more widely useful, and I think also a more complex issue. The problems I see with the existing solutions (like
For 1., I think something similar to heredocs (see #89) is the right solution. For 2., you could probably write an analyzer that checks the syntax, but IntelliSense and syntax highlighting would not be that simple. But it shouldn't be hard to extend Roslyn to add support for something like that. As for 3., I'm not sure it's a big issue, but I also don't know what would be a good solution. Maybe some form of code generators?
-
Let's drop inline IL if the preferred approach is to create intrinsics, which will probably be easier to use anyway, but let's analyze the other case:
So for X#:
-
It seems this feature would also enable metaprogramming? If we have inline IL, is it possible that we could run it at compile time?
-
I'd rather have multi-language projects and compiler intrinsics than this. The former is related to the project system, but the latter is already championed.
-
@HaloFour wrote:
I'm not entirely sure that's the case. The parser only has to look for the start token, then keep going until it finds the end token and simply ignore anything in between. The parsing of the inline language is done by the inline language's parser, not C♯'s. The compiler already has APIs for transforming trees. All the compiler has to do is to call a method, passing it the raw uninterpreted string as argument; that method then returns a subtree fragment, which the compiler attaches to the tree and then keeps going just as it would have otherwise. Really, the only difference is that instead of calling a method defined within the compiler, it calls a method defined in a different assembly.
The method presented here would allow LINQ Query Expression Syntax to be implemented as a library, instead of as a change to the language spec as was done 10 years ago in C♯ 3. I challenge anybody who claims that this feature can already be achieved using ILMerge and other features that already exist to implement LINQ Query Expressions as a library feature using only features that are in the current master branch of either the spec or Roslyn.
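For a feel of what "query syntax as a library" could mean, here is a toy sketch in Python using the standard ast and re modules. It is not LINQ and not Roslyn: query_processor is a hypothetical processor that accepts only a tiny from/where/select subset and translates it into a host-language list comprehension, returned as an AST fragment.

```python
import ast
import re

# Tiny, assumed grammar: from <var> in <source> [where <condition>] select <projection>
QUERY = re.compile(
    r"^\s*from\s+(\w+)\s+in\s+(\w+)(?:\s+where\s+(.+?))?\s+select\s+(.+?)\s*$",
    re.S,
)


def query_processor(text: str) -> ast.expr:
    """Hypothetical library-defined processor: raw query text in, host AST fragment out."""
    match = QUERY.match(text)
    if match is None:
        raise SyntaxError("unsupported query: " + text)
    variable, source, condition, projection = match.groups()
    comprehension = f"[{projection} for {variable} in {source}"
    if condition:
        comprehension += f" if {condition}"
    comprehension += "]"
    # Reuse the host parser to build the fragment instead of constructing nodes by hand.
    return ast.parse(comprehension, mode="eval").body


fragment = query_processor("from n in numbers where n % 2 == 0 select n * n")
code = compile(ast.Expression(body=fragment), "<query>", "eval")
evens_squared = eval(code, {"numbers": [1, 2, 3, 4]})
print(evens_squared)  # [4, 16]
```

The translation lives entirely in the processor, so the host language needs no knowledge of the query syntax; a real version would need a proper parser, error locations, and type information rather than a regular expression.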
-
Proposal
I'd like to propose an idea for C# 8: inline languages. C and C++ have the __asm keyword, which opens a new syntactical scope for inline assembly coding [1][2]. Proposed are such nested scopes in C# 8 for an open-ended set of languages.
Inline languages are envisioned as implemented by .NET components and interoperable with integrated development environments, compilers and debuggers.
Modes of Operation
Three modes of operation are compatible with inline languages:
__asm, generates program logic
Advantages
Examples
Scenarios
Ideas for inline language scenarios include:
__msil, __xml, __rdf, __n3, __sql, __sparql, __pls, __srgs, __ssml, __antlr, __grammar, __prolog
References
[1] https://msdn.microsoft.com/en-us/library/45yd4tzz.aspx
[2] https://msdn.microsoft.com/en-us/library/4ks26t93.aspx