Proposal: Inline Languages #226

AdamSobieski · 2017-03-04T05:38:22Z

Proposal

I'd like to propose an idea for C# 8: inline languages. Microsoft's C and C++ compilers provide the __asm keyword, which opens a new syntactic scope for inline assembly [1][2]. I propose similar nested scopes in C# 8 for an open-ended set of languages.

Inline languages would be implemented by .NET components that interoperate with integrated development environments, compilers, and debuggers.

Modes of Operation

Inline languages could operate in any of three modes:

1. As with __asm, the block generates program logic directly.
2. The block generates program logic which constructs a runtime object.
3. The block is embedded into the assembly as a resource, and program logic is generated to load it and parse it into a runtime object.
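To make the second and third modes concrete for XML, here is a minimal C# sketch. The generated code shown is only an assumption about what a hypothetical __xml processor might emit; it is not part of the proposal and uses nothing beyond the existing System.Xml.Linq types.

```csharp
using System;
using System.Xml.Linq;

// Mode 2: program logic that builds the object directly. This is what a
// hypothetical __xml processor might emit for <elem attr="value">text</elem>;
// no XML text survives into the compiled program.
XElement generated = new XElement("elem",
    new XAttribute("attr", "value"),
    "text");

// Mode 3: the text is kept (normally as an embedded resource) and parsed when
// needed. A string literal stands in for the resource stream here.
XElement parsed = XElement.Parse("<elem attr=\"value\">text</elem>");

Console.WriteLine(XNode.DeepEquals(generated, parsed)); // True
```

Both modes yield the same tree; the difference is whether the parsing cost and any syntax errors occur at build time or at runtime.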

Advantages

Software quality
1. Maintainability - suitability for debugging (localization and correction of errors) and for modification and extension of functionality
2. Readability
3. Testability - suitability for allowing the programmer to follow program execution (runtime behavior under given conditions) and for debugging
Optimization with MSIL
Complexity reduction in use of nested languages
1. data languages (XML, RDF, CSV, etc.) with IDE features
2. query languages (SQL, SPARQL, etc.) with IDE features
3. programming languages (Prolog, etc.) with IDE features
4. special purpose languages (SRGS, SSML, grammars) with IDE features

Examples

using language __msil = System.Runtime.MsilComponent;

class Example1
{
  void Function()
  {
    int x;

    __msil
    {
      ... x;
    }
  }
}

using language __xml = System.Xml.XmlComponent;

class Example2
{
  void Function()
  {
    System.Xml.XmlDocument x1 = __xml
    {
      <!-- -->
    };
    System.Xml.XmlDocumentFragment x2 = __xml
    {
      <!-- -->
    };
  }
}

Scenarios

Ideas for inline language scenarios include:

__msil,
__xml, __rdf, __n3,
__sql, __sparql,
__pls, __srgs, __ssml,
__antlr, __grammar,
__prolog

References

[1] https://msdn.microsoft.com/en-us/library/45yd4tzz.aspx
[2] https://msdn.microsoft.com/en-us/library/4ks26t93.aspx

YaakovDavis · 2017-03-04T09:39:42Z

This feature can be useful at both compile-time & runtime. The compile-time aspects overlap with Source Generators.

For runtime use, allowing specific language processor instances (not just types) could enable even greater flexibility:

using language xml = new XmlCodeProcessor(Config.SomeSetting | Config.SomeOtherSetting);

It should also be possible to initialize such processors in any scope.

JoergWMittag · 2017-03-04T10:38:16Z

Features like this are typically associated with programming languages that have extremely lightweight syntax (e.g. Parsing Words in Factor) or no syntax at all (e.g. Racket). Perl 6 also apparently has a story for changing the language "mid-flight" (so to speak), in that the parser itself is just a Perl 6 object accessible (and mutable!) at run-time.

However, all of these have something in common: they are either highly-dynamic, have little to no syntax, or both. In other words: the exact opposite of C♯.

But, there is a language which successfully combines strong static typing, complex syntax, and inline languages (plus also macros): Converge. Its Compile-Time Meta-Programming (CTMP) features include a feature called DSL Blocks that is exactly equivalent to this proposal. The way it works is rather simple: the code inside a DSL Block is treated as a completely opaque black box by the Converge parser; it is simply handed off as a string to a DSL processor function. That function must then return a valid Converge AST fragment that gets inserted instead of the DSL block node into the overall AST, and normal processing (typechecking etc.) then proceeds from there.
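The contract described above (raw text in, AST fragment out, then normal processing) can be sketched in today's C# with System.Linq.Expressions standing in for the compiler's AST. This is only an analogy under stated assumptions: the postfix-arithmetic "DSL" and ProcessRpnBlock are hypothetical, and expression trees are built at runtime, whereas Converge runs its DSL processors at compile time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical DSL processor: receives the raw, uninterpreted text of a block
// (here a tiny postfix arithmetic language) and returns an expression tree.
static Expression ProcessRpnBlock(string blockText)
{
    var stack = new Stack<Expression>();
    foreach (string token in blockText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        if (int.TryParse(token, out int n))
        {
            stack.Push(Expression.Constant(n));
            continue;
        }
        Expression right = stack.Pop();
        Expression left = stack.Pop();
        stack.Push(token switch
        {
            "+" => Expression.Add(left, right),
            "-" => Expression.Subtract(left, right),
            "*" => Expression.Multiply(left, right),
            _ => throw new FormatException($"Unknown token '{token}'"),
        });
    }
    if (stack.Count != 1) throw new FormatException("Malformed block");
    return stack.Pop();
}

// The host inserts the returned fragment into a larger tree (a lambda here)
// and continues with ordinary processing (compilation).
Expression fragment = ProcessRpnBlock("2 3 + 4 *");
var lambda = Expression.Lambda<Func<int>>(fragment);
Console.WriteLine(lambda.Compile()()); // 20
```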

In order to help create such AST fragments, there is a compiler API, called Compiler External Interface (similar to Roslyn). There are also language features such as quotes, quasi-quotes and splices. And the standard library provides a DSL for writing AST fragments, which can be used via a DSL block (how very meta!)

Note that many of the building blocks are already in C♯: there is an external compiler API (Roslyn), and there are already discussions about macros, which will presumably include quotes, quasi-quotes, and splices. (Inline languages are a superset of macros, so it makes no sense to discuss them separately: if we assume that we are going to get inline languages, we might as well assume that we already have macros, since they are simpler.) AFAIK, there are already fluent DSLs for creating Roslyn trees, so once we have inline languages, adding an inline language for Roslyn trees that helps write processors for other inline languages is more or less trivial.

However, there is one syntactic wrinkle that precludes simply "stealing" this feature: Converge's syntax is indentation-based, inspired by Python. This makes it very easy to delimit DSL blocks. But C♯'s syntax is curly-brace-based and needs matching pairs of delimiters to delimit blocks. The problem is that inline languages are free-form: there is simply no guarantee that they will not contain a } character. @orthoxerox already alluded to this problem in his comment on the original issue. So the syntax proposed by @AdamSobieski in the OP simply cannot work.
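A small C# sketch of the } problem, assuming a hypothetical brace-counting scanner (this is not how Roslyn works): an inline SQL block whose string literal contains a lone } is cut off early.

```csharp
using System;

// Hypothetical naive scanner: starting at an opening '{', return the index of
// the '}' that brings the nesting depth back to zero, or -1 if there is none.
static int FindMatchingBrace(string text, int openIndex)
{
    int depth = 0;
    for (int i = openIndex; i < text.Length; i++)
    {
        if (text[i] == '{') depth++;
        else if (text[i] == '}' && --depth == 0) return i;
    }
    return -1;
}

// Inline SQL whose string literal happens to contain a lone '}'.
string source = "__sql { SELECT * FROM t WHERE c = '}' } var y = 1;";
int open = source.IndexOf('{');
int close = FindMatchingBrace(source, open);

// The scanner stops at the '}' inside the SQL literal, not at the block's real end.
string captured = source.Substring(open + 1, close - open - 1);
Console.WriteLine($"[{captured}]"); // [ SELECT * FROM t WHERE c = ']
```

Teaching the scanner to skip SQL string literals would fix this particular case, but then the C♯ parser would need to know the lexical rules of every inline language, which defeats the purpose.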

Heredocs (various Unix shells, Perl, Ruby, many others) and Ruby's percent-literals offer some examples of existing solutions to that problem. Both involve allowing the programmer to specify the delimiter.

Heredocs are string literals for large, multiline strings (they literally are for "inserting a document here"). Here is an example of Ruby's heredoc syntax:

a_long_string = <<~__END_OF_DOCUMENT__
line 1
line 2
can contain ' and " and nothing breaks
__END_OF_DOCUMENT__

Basically, the trick is that the programmer supplies their own end-of-document marker, one they know doesn't appear in the document itself. There are various flavors of heredocs: in some, the end-of-document marker must appear on a line by itself at the very beginning of the line; in others, the marker may be indented. Ruby allows three forms: with <<MARKER the marker must be at the start of the line; with <<-MARKER the marker may be indented; and with <<~MARKER (the "squiggly" heredoc) the marker may be indented and the indentation of the least-indented line is removed from the beginning of every line. The last form allows you to write the heredoc with proper indentation inside a more complex block of code, but have that indentation removed from the string.
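A minimal C# sketch of the squiggly-heredoc rule, assuming a hypothetical ReadSquigglyHeredoc helper that is handed the lines following the opener. It simplifies Ruby's behavior (for example, it ignores tab-width subtleties) and is meant only to show the mechanism: scan for the terminator, then strip the common indentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper modeled on Ruby's <<~ heredoc. It collects lines until a
// line whose trimmed text equals the terminator (so the terminator may be
// indented), then removes the indentation of the least-indented non-blank line.
static string ReadSquigglyHeredoc(IEnumerable<string> lines, string terminator)
{
    var body = new List<string>();
    foreach (string line in lines)
    {
        if (line.Trim() == terminator)
        {
            int indent = body.Where(l => l.Trim().Length > 0)
                             .Select(l => l.Length - l.TrimStart().Length)
                             .DefaultIfEmpty(0)
                             .Min();
            return string.Join("\n", body.Select(l => l.Length >= indent ? l.Substring(indent) : l.TrimStart()));
        }
        body.Add(line);
    }
    throw new FormatException($"Terminator '{terminator}' not found");
}

string[] source =
{
    "    line 1",
    "      can contain } and \" and nothing breaks",
    "    line 3",
    "    __END_OF_DOCUMENT__",
    "var afterwards = 1;",
};
string doc = ReadSquigglyHeredoc(source, "__END_OF_DOCUMENT__");
Console.WriteLine(doc);
```

Because only the terminator line is special, the body may contain braces and quotes freely; relative indentation (the two extra spaces on the second line) is preserved.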

Ruby's percent literals are similar. They exist for strings, regexes, and many other kinds of literals, and again let the programmer choose their own delimiters:

%Q@some string@ # %Q means "double quoted string", the @ is the chosen delimiter
%Q,some string,
%Q.some string.

%Q[some string] # some delimiters come in "pairs"
%Q{some string}
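The delimiter rule can be sketched in C# as follows, assuming a hypothetical ScanPercentLiteral function that is given the index of the delimiter character. Escape sequences and interpolation are deliberately ignored; bracket-like delimiters nest, which is how Ruby lets %Q{...} contain balanced braces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scanner modeled on Ruby's %Q literals. delimiterIndex points at
// the character right after the introducer (e.g. after "%Q"). Bracket-like
// delimiters close with their partner and may nest; any other character
// closes with itself. Escapes and interpolation are not handled.
static (string Body, int End) ScanPercentLiteral(string text, int delimiterIndex)
{
    var pairs = new Dictionary<char, char> { ['('] = ')', ['['] = ']', ['{'] = '}', ['<'] = '>' };
    char open = text[delimiterIndex];
    char close = pairs.TryGetValue(open, out char partner) ? partner : open;
    bool nests = close != open;
    int depth = 1;
    for (int i = delimiterIndex + 1; i < text.Length; i++)
    {
        char c = text[i];
        if (nests && c == open) depth++;
        else if (c == close && --depth == 0)
            return (text.Substring(delimiterIndex + 1, i - delimiterIndex - 1), i);
    }
    throw new FormatException("Unterminated literal");
}

var (a, _) = ScanPercentLiteral("%Q@some string@", 2);
var (b, _) = ScanPercentLiteral("%Q{map { |x| x } done}", 2);
Console.WriteLine(a); // some string
Console.WriteLine(b); // map { |x| x } done
```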

Note: there is an interesting relation to user-defined string interpolation here, akin to ECMAScript's Template Literals or Scala's String Interpolation. Both allow the programmer to process strings in arbitrary ways. The main difference is that processed string literals are inserted into the program as values and interpolation is performed at runtime, whereas processed inline language blocks are inserted into the program as code and processing happens at compile time. But many of the syntactic challenges are similar, since both string literals and inline language blocks can contain arbitrary text.
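C# already has a limited, runtime-only form of this: an interpolated string converted to System.FormattableString keeps the template (Format) and the hole values (ArgumentCount, GetArgument) separate, so a library can process them however it likes. In the sketch below, ToParameterizedSql is a hypothetical processor that turns each hole into a named SQL parameter instead of splicing the value into the text.

```csharp
using System;
using System.Linq;

// Hypothetical processor: replaces each interpolation hole with a named SQL
// parameter (@p0, @p1, ...) instead of inserting the values into the text.
static string ToParameterizedSql(FormattableString query)
{
    object[] names = Enumerable.Range(0, query.ArgumentCount)
                               .Select(i => (object)$"@p{i}")
                               .ToArray();
    return string.Format(query.Format, names);
}

int id = 42;
string name = "O'Brien";
FormattableString raw = $"SELECT * FROM users WHERE id = {id} AND name = {name}";

Console.WriteLine(raw.Format);              // SELECT * FROM users WHERE id = {0} AND name = {1}
Console.WriteLine(raw.GetArgument(1));      // O'Brien
Console.WriteLine(ToParameterizedSql(raw)); // SELECT * FROM users WHERE id = @p0 AND name = @p1
```

Unlike the inline-language blocks discussed here, the template is still an ordinary string literal: the compiler neither checks the SQL nor offers IntelliSense for it.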

Here is a proposed modification of the syntax in the OP, with just one change: it uses the user-defined marker as an end-of-document marker instead of insisting on a matched pair of { and }:

using language __msil = System.Runtime.MsilComponent;

class Example1
{
  void Function()
  {
    int x;

    __msil
      ... x;
    __msil;
  }
}

using language __xml = System.Xml.XmlComponent;

class Example2
{
  void Function()
  {
    System.Xml.XmlDocument x1 = __xml
      <!-- -->
    __xml;
    System.Xml.XmlDocumentFragment x2 = __xml
      <!-- -->
    __xml;
  }
}

And some of the other examples from the original issue:

F♯ by @dsaf:

public void PrintColor(Color color)
{
    __fs
        match color with
        | Color.Red -> printfn "Red"
        | Color.Green -> printfn "Green"
        | Color.Blue -> printfn "Blue"
        | _ -> ()
    __fs;
}

Cosmos X♯ (seriously, there are now three languages called X♯?) by @fanoI:

public void Execute() {
    __x# 
        ESI = 12345              // assign 12345 to ESI
        EDX = #constantForEDX    // assign #ConstantForEDX to EDX
        EAX = EBX                // move EBX to EAX              => mov eax, ebx
        EAX--                    // decrement EAX                => dec eax
        EAX++                    // increment EAX                => inc eax
        EAX + 2                  // add 2 to eax                 => add eax, 2
        EAX - $80                // subtract 0x80 from eax       => sub eax, 0x80
        BX * CX                  // multiply BX by CX            => mul cx      -- division, multiplication and modulo should preserve registers
        CX / BX                  // divide CX by BX              => div bx
        CX mod BX                // remainder of CX/BX to BX     => div bx
    __x#
}

Brainfuck by @alrz:

class HelloWorld
{
  static void Main()
  {
    __bf
      ++++++++[>++++[>++>+++>+++>+<<<<-]>
      +>+>->>+[<]<-]>>.>---.+++++++..+++.
      >>.<-.<.+++.------.--------.>>+.>++.
    __bf
  }
}

Note: This issue was originally reported in dotnet/roslyn#13735; this is an expanded version of my comment there.

YaakovDavis · 2017-03-04T10:45:16Z

@JoergWMittag

That was an interesting read, thanks.

Xtend honors relative indentation in "Template Expressions", which is quite nice:
http://www.eclipse.org/xtend/documentation/203_xtend_expressions.html#templates

iam3yal · 2017-03-04T15:20:53Z

I don't think this will ever be allowed in C#. There is already a proposal for compiler intrinsics, and if you read it you will see that inline IL is unlikely to happen, not to mention a more generalized form of this.

HaloFour · 2017-03-04T15:43:27Z

This would introduce a ridiculous amount of complexity to the parser and compiler API for very little real benefit. CLS ensures that you can already mix languages in a solution. ILMerge allows that within a single project. I'd rather see better tooling support for the latter in Visual Studio.

bbarry · 2017-03-04T18:53:47Z

I don't think it is really that much more complex than a combination of other proposals:

1. Depending on the syntax of Proposal: Support "raw" string literals (#89), inline languages could be raw strings with a particular named delimiter.
2. A referenced generator DLL (Champion "Replace/original and code generation extensions", #107) could watch for this delimiter, allowing an analyzer to parse the string, offer syntactic/semantic tooling, and generate code into the result.

HaloFour · 2017-03-04T19:05:40Z

@bbarry

A source generator could accomplish that with no other changes to the language or tooling. But the experience would be horrendous. It would be strictly limited to new members and could not interact with the locals of any method, unlike inline assembly in C. It also depends on those languages having transpilers to C#, effectively requiring them to be rewritten, and they'd be limited to what can be expressed in C#. In short, a whole lot of work for nothing. You're much better off writing a second assembly in the language of your choice.

tannergooding · 2017-03-04T19:06:08Z · Collaborator

@bbarry, I think it is significantly more complex. At the very least, it means that the compiler doesn't have full knowledge of how the stack looks and it no longer knows if the method is doing anything unsafe or unverifiable.

There are only a handful of IL operations which are not currently supported by C# today, and I would much rather see those get proper language support than just supporting inline IL.

At the very least, providing intrinsic support would be significantly better and would still allow the compiler to maintain end-to-end knowledge of what a method is doing. The 64-bit version of MSVC dropped support for inline assembly altogether and only allows intrinsics. Even without inline assembly, and given that intrinsics are available, essentially the only time raw assembly is needed is when the user needs to squeeze every bit of performance out of a very critical function.

For the cases where intrinsics aren't good enough, we have plenty of other options that are much safer and simpler (such as ILMerge).

svick · 2017-03-04T22:08:37Z · Collaborator

I think part of the problem with this proposal is that it tries to offer one solution to two quite distinct issues:

inline IL
better support for creating objects that are result of parsing code in some other language (XML, SQL, …)

Inline IL is very niche and I think that compiler intrinsics (#191) are good enough.

Parsing other languages is much more widely useful and, I think, also a more complex issue. The problems I see with the existing solutions (like XDocument.Parse(@"<elem attr=""value"">text</elem>")) are:

1. The need to escape quotes in verbatim strings, which makes the code hard to copy&paste (both ways) and also harder to read.
2. No IntelliSense, syntax checking or syntax highlighting.
3. It's a constant string that's parsed at runtime, which could be a performance problem.
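For reference, the existing approach looks like this in C#. It needs only System.Xml.Linq and shows all three problems: doubled quotes, a string the compiler does not check, and parsing (including error detection) deferred to runtime.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Today's approach: XML in a verbatim string, parsed at runtime.
// Every " inside the XML has to be doubled, and the compiler sees only a string.
XDocument doc = XDocument.Parse(@"<elem attr=""value"">text</elem>");
Console.WriteLine(doc.Root.Attribute("attr").Value); // value
Console.WriteLine(doc.Root.Value);                    // text

// A mismatched closing tag compiles fine and is only reported when this line runs.
bool caughtAtRuntime = false;
try
{
    XDocument.Parse(@"<elem attr=""value"">text</elme>");
}
catch (XmlException ex)
{
    caughtAtRuntime = true;
    Console.WriteLine("Runtime parse error: " + ex.Message);
}
```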

For 1., I think something similar to heredocs (see #89) is the right solution.

For 2., you could probably write an analyzer that checks the syntax, but IntelliSense and syntax highlighting would not be that simple. But it shouldn't be hard to extend Roslyn to add support for something like that.

As for 3., I'm not sure it's a big issue, but I also don't know what would be a good solution. Maybe some form of code generators?

fanoI · 2017-03-07T09:12:50Z

Let's drop inline IL if the preferred approach is to create intrinsics, which will probably be easier to use anyway, but let's analyze the other cases:

F#, VB.NET, IronPython or, in general, any language that compiles to IL: this seems easy to me, since Roslyn already recognizes and parses VB.NET, IronPython has its own .NET parser, and C# could easily interoperate with them. Think of this as an alternative to creating a new DLL and then running ILMerge only because you want to use Python to parse some XML or F# to do number crunching.
XML, JSON: not really "languages"; the only thing one wants is to let them use variables defined in C#.
X# or other languages that do not compile to IL: these have to be compiled with their own compiler and could indeed generate unsafe/unverifiable code (X# is a high-level assembler, so it certainly can). Maybe for this category of languages one could require an unsafe block?

So for X#:

unsafe public void Execute() {
    __x# 
        ESI = 12345              // assign 12345 to ESI
        EDX = #constantForEDX    // assign #ConstantForEDX to EDX
        EAX = EBX                // move EBX to EAX              => mov eax, ebx
        EAX--                    // decrement EAX                => dec eax
        EAX++                    // increment EAX                => inc eax
        EAX + 2                  // add 2 to eax                 => add eax, 2
        EAX - $80                // subtract 0x80 from eax       => sub eax, 0x80
        BX * CX                  // multiply BX by CX            => mul cx      -- division, multiplication and modulo should preserve registers
        CX / BX                  // divide CX by BX              => div bx
        CX mod BX                // remainder of CX/BX to BX     => div bx
    __x#
}

Thaina · 2017-03-08T05:41:41Z

It seems this feature would also enable metaprogramming?

If we had inline IL, could we run it at compile time?

alrz · 2017-03-08T12:21:36Z

I'd rather have multi-language projects and compiler intrinsics than this. The former is related to the project system, while the latter is already championed.

JoergWMittag · 2017-03-09T20:23:25Z

@HaloFour wrote:

> This would introduce a ridiculous amount of complexity to the parser and compiler API for very little real benefit.

I'm not entirely sure that's the case. The parser only has to look for the start token, then keep going until it finds the end token and simply ignore anything in between. The parsing of the inline language is done by the inline language's parser, not C♯'s.

The compiler already has APIs for transforming trees. All the compiler has to do is to call a method, passing it the raw uninterpreted string as argument; that method then returns a subtree fragment, which the compiler attaches to the tree and then keeps going just as it would have otherwise. Really, the only difference is that instead of calling a method defined within the compiler, it calls a method defined in a different assembly.
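The two steps described above (skip to the end token, then hand the raw text to an external method and splice the result back in) can be sketched as a source-to-source pass. Everything here is hypothetical: the __upper language, the processor registry, and Expand are illustrations, and plain strings stand in for the Roslyn syntax trees a real implementation would return. The markers follow the end-of-document syntax proposed earlier in the thread (the opener ends a line; the same marker plus ; closes the block).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical registry: each inline language maps to a processor that takes
// the raw, uninterpreted block text and returns replacement C# source.
var processors = new Dictionary<string, Func<string, string>>
{
    ["__upper"] = raw => "\"" + raw.Trim().ToUpperInvariant() + "\"",
};

// A line ending in a registered marker opens a block; the same marker followed
// by ';' on its own line closes it. Lines in between are never parsed as C#.
static string Expand(string source, IReadOnlyDictionary<string, Func<string, string>> registry)
{
    var output = new StringBuilder();
    var body = new StringBuilder();
    string? marker = null;  // marker of the block currently open, if any
    string prefix = "";     // C# text that preceded the opening marker
    foreach (string line in source.Split('\n'))
    {
        string trimmed = line.TrimEnd();
        if (marker == null)
        {
            string? found = null;
            foreach (string key in registry.Keys)
                if (trimmed.EndsWith(key, StringComparison.Ordinal)) found = key;
            if (found == null) { output.Append(line).Append('\n'); continue; }
            marker = found;
            prefix = trimmed.Substring(0, trimmed.Length - found.Length);
        }
        else if (trimmed.Trim() == marker + ";")
        {
            // Hand the raw text to the processor and splice its result back in.
            output.Append(prefix).Append(registry[marker](body.ToString())).Append(";\n");
            marker = null;
            body.Clear();
        }
        else
        {
            body.Append(line).Append('\n');
        }
    }
    if (marker != null) throw new FormatException($"Missing end marker '{marker};'");
    return output.ToString();
}

string input = "var greeting = __upper\n    hello { unbalanced\n__upper;\nConsole.WriteLine(greeting);";
string expanded = Expand(input, processors);
Console.Write(expanded);
```

The unbalanced { in the block body never reaches a C# parser, which is why a user-chosen end marker sidesteps the brace problem.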

> CLS ensures that you can already mix languages in a solution. ILMerge allows that within a single project. I'd rather see better tooling support for the latter in Visual Studio.

The method presented here would allow LINQ Query Expression Syntax to be implemented as a library, instead of as a change to the language spec as was done 10 years ago in C♯ 3.

I challenge anybody who claims that this feature can already be achieved using ILMerge and other features that already exist to implement LINQ Query Expressions as a library feature using only features that are in the current master branch of either the spec or Roslyn.
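For context, the C# specification defines query expressions as a purely syntactic translation into method calls, which is exactly the kind of rewrite a library-provided inline-language processor would have to perform. A small comparison using only System.Linq:

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5 };

// Query expression syntax, built into the language since C# 3.
var viaQuery = from n in numbers
               where n % 2 == 1
               select n * n;

// The method-call form the compiler rewrites it into.
var viaMethods = numbers.Where(n => n % 2 == 1).Select(n => n * n);

Console.WriteLine(string.Join(", ", viaQuery));        // 1, 9, 25
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```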

Replies: 13 comments
