-
Notifications
You must be signed in to change notification settings - Fork 0
/
conclusion.tex
359 lines (276 loc) · 38.4 KB
/
conclusion.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
% !TEX root = omar-thesis.tex
\chapter{Discussion \& Conclusion}\label{chap:conclusion}
% \begin{quote}
% \emph{Any representation of reality we develop can be only partial. There is no finality, sometimes no single best representation. There is only deeper understanding, more revealing and enveloping representations.
% }
% -- Carl R. Woese \cite{woese}
% \end{quote}
\vspace{-4px}
\section{Summary of Contributions}
In summary, typed literal macros (TLMs) allow library providers to programmatically control the parsing and expansion of generalized expression and pattern literal forms. A proto-expansion validation step allows a TLM client to reason about types, binding and segmentation \emph{abstractly}, i.e. without examining the macro definition or the generated expansion in complete detail. Instead, the client needs only to be made aware of the type annotation on the applied TLM and the \emph{splice summary} of the generated proto-expansion, which locates and assigns a type to each spliced term within the literal body. This information can be communicated to the client using secondary notation (e.g. colors) together with a simple type inspection service similar to that available in many program editors today.
This work developed a series of progressively more sophisticated core calculi in order to formally characterize the mechanisms of TLM definition and application and the associated abstract reasoning principles. In particular:
\begin{enumerate}
\item Chapters \ref{chap:uetsms} and \ref{chap:uptsms} developed $\miniVersePat$, which communicates the central concepts of typed expansion and proto-expansion validation for simple expression and pattern TLMs, respectively. It also showed how we can formally establish the abstract reasoning principles available to TLM clients. The reasoning principles related to binding relied on the standard notion of capture-avoiding substitution (for ABTs), rather than on a macro-specific mechanism for fresh variable generation as in other formal accounts of macro systems (notably, the work of Herman and Wand \cite{Herman10:Theory,DBLP:conf/esop/HermanW08}.)
\item Chapter \ref{chap:ptsms} then developed $\miniVerseParam$, which added type and module parameters. These enrich the TLM type discipline to allow a single TLM to operate across a parameterized family of types. They also allow TLMs to refer to external bindings through module parameters, rather than requiring that all external references go through spliced terms. Support for partial application in higher-order abbreviations lowers the syntactic cost of this explicit parameter-passing style.
\item Chapter \ref{chap:static-eval} developed $\miniVersePH$, which adds a static environment shared between TLM definitions. This makes the job of the TLM provider easier, by giving them access to libraries, including those that themselves export TLM definitions. We gave examples of TLMs that are useful for defining other TLMs, including quasiquotation TLMs and a grammar-based parser generator. We also briefly discussed how TLMs interact with library management systems.
\item Finally, Chapter \ref{chap:tsls} developed Bidirectional $\miniVersePat$, which supports a mechanism of TLM implicits that further decreases the syntactic cost of applying a TLM while retaining the essential reasoning principles developed in the previous chapters.
\end{enumerate}
\vspace{-10px}
\section{Summary of Related Work}
\vspace{-4px}
Chapter \ref{chap:background} gave a detailed account of the various existing mechanisms of syntactic control. In this section, we briefly compare TLMs to this related work more directly.
Compared to dialect-oriented mechanisms of syntactic control, i.e. mechanisms that allow providers to form a new syntax dialect by extending the context-free syntax of a language, TLMs provide clients with stronger abstract reasoning principles. In particular:
\begin{enumerate}
\item \textbf{Conflict}: Clients attempting to combine general syntax dialects that have been found to be individually free of syntactic conflicts cannot be sure that the combined dialect will be free of syntactic conflicts. Only a system described by Schwerdfeger and Van Wyk allows reasoning modularly about conflicts, but for a restricted class of syntax dialects \cite{conf/pldi/SchwerdfegerW09,schwerdfeger2010context}. In contrast, TLM providers can reason modularly about syntactic conflicts because the context-free syntax of the language is fixed.
\item \textbf{Responsibility}: Clients using a combined syntax dialect cannot easily determine which constituent dialect is responsible for any given form. In contrast, TLM clients can reason about responsibility by following the binding structure of the language in the usual manner.
\item \textbf{Segmentation}: Clients of a syntax dialect cannot accurately determine which segments of the program text appear directly in the desugaring. In contrast, TLM clients can reason about segmentation by inspecting the splice summary. The information in a splice summary can be communicated straightforwardly by a program editor using secondary notation (e.g. colors.) The system guarantees that spliced segments are non-overlapping.
\item \textbf{Typing}: Clients of a syntax dialect cannot reason abstractly about types. In contrast, TLM clients clients can determine the type of any expansion by inspecting the parameter and type declarations on the TLM definition. The parse function and the expansion itself need not be inspected, i.e. these details can be held abstract. The splice summary also gives types for each spliced expression or pattern. This information can be communicated upon request by the type inspection service of a program editor.
\item \textbf{Binding}: Clients of a context-free syntax dialect cannot be sure that the desugaring is context-independent and that spliced terms are capture avoiding. In contrast, TLM clients can be sure that the expansion is completely context-independent and that spliced terms are capture-avoiding.
\end{enumerate}
Some existing typed term-rewriting macro systems, notably the Scala macro system \cite{ScalaMacros2013}, address most of these problems but sacrifice syntactic control by requiring that the macro repurpose what existing syntactic forms are otherwise defined. String literal forms cannot be repurposed flexibly because terms cannot be spliced hygienically out of string literal bodies. TLMs solve this problem by distinguishing spliced segments explicitly within the generated proto-expansions. This allows the language to selectively give spliced terms access to application-site bindings during proto-expansion validation, while requiring that the remainder of the proto-expansion be context-independent.
This work also provides the first type-theoretic account of a hygienic typed macro system integrated into a language with support for structural pattern matching and an ML-style module system. Let us consider some related work to justify this assertion.
\begin{itemize}
\item The Scala macro system has not been formally specified as of this writing.
\item The macro calculus of Herman and Wand \cite{DBLP:conf/esop/HermanW08,Herman10:Theory} uses types only to make manifest the binding structure of the generated expansion. The language that expansions are written in does not have an ML-style type structure.
Their calculus uses tree locations, which originate in the early work of Gorn \cite{gorn1965explicit}, to identify macro arguments by their location within the macro argument list. These tree locations are conceptually reminiscent of our spliced segment references, which identify spliced terms by their lexical location within the literal body. However, the tree locations serve a different purpose in Herman and Wand's calculus -- they distinguish symbolic arguments that will appear in binding positions in the expansion. In our system, there is no way for spliced identifiers to appear in binding position (but see Sec. \ref{sec:controlled-binding} for further discussion of this point.) Spliced segment references instead serve to distinguish segments that appear as sub-terms in the proto-expansion from those parsed in some other way.
Additionally, their calculus explicitly specifies the mechanism for fresh variable generation during expansion, whereas our approach relies on the standard type-theoretic notions of implicit alpha-variation and capture-avoiding substitution (for general ABTs.) This significantly simplifies our calculi.
Finally, their calculus defines term representations and quasiquotation primitively because the binding structure of the expansion must be manifest in the type of the macro. In our work, the parse functions can perform arbitrary computations on simpler term representations -- there is no need to define quasiquotation primitively or encode the binding structure of term representations in their types. This is possible because we enforce a stronger notion of context-independence -- not even definition-site bindings are available directly to expansions.
\item It is also worth mentioning MacroML \cite{ganz2001macros}. MacroML is only a staging macro system, i.e. it does not give the macro access to the syntax (parsed or unparsed) of the provided arguments but rather requires that it treat them parametically. As such, MacroML does not qualify as a mechanism of syntactic control (staging is motivated primarily by performance considerations.) MacroML also does not support pattern matching or ML-style modules.
\item Finally, the predecessor to the systems introduced in this work was the system of \emph{type-specific languages} (TSLs) that was defined in a previous paper \cite{TSLs}. That work introduced generalized literal forms and gave a bidirectionally typed protocol for type-directed parse function invocation. This work progresses beyond that work in several ways.
Most obviously, this work allows explicit TLM application. Different TLMs can therefore be defined at the same type without conflict. A subsequent short paper proposed explicit application of simple expression TLMs in a bidirectional typed setting, but did not provide any formal details \cite{sac15}. We do not assume a bidirectionally typed language when discussing explicit application (see Sec. \ref{sec:type-inference} below for more discussion on type inference.)
Moreover, TLMs are not associated directly with named type definitions, like TSLs, but rather can operate at any type.
Our mechanism of TLM implicits, described in Chapter \ref{chap:tsls}, also differs from TSLs in that implicit designations are orthogonal to the type abbreviation mechanism. Clients are free to differentially control implicit designations as desired within a lexically scoped portion of a program.
Another distinction is that the metatheory presented in the earlier work establishes only the ability to reason abstractly about types. It does not establish the remaining abstract reasoning principles that have been a major focus of this work. In particular, there is no formal notion of hygiene (though it is discussed informally), and the mechanism does not guarantee that a valid segmentation will exist, nor associate types with segments.
Finally, the prior work on TSLs did not consider the mechanisms of pattern matching, parameterized types, modules, parametric expansions or static evaluation. All of these are important for integration into an ML-like functional language.
\end{itemize}
\vspace{-8px}
\section{Limitations \& Future Research Directions}\label{sec:future-work}
Let us conclude with a discussion of the present limitations of our work. When possible, we outline future research that might resolve these limitations.
\vspace{-4px}
\subsection{Integration Into a Full-Scale Functional Language Definition}
We left many orthogonal language features out of our calculi, for the sake of simplicity and to focus our exposition on our novel contributions. We leave the work of integrating TLMs into a full-scale functional language definition as future work. We hope that this work will serve to guide a variety of different efforts in this direction.
\vspace{-4px}
\subsection{Integration Into Languages From Other Design Traditions}
\label{sec:integration}
We conjecture that all of the mechanisms we have described could be integrated into dependently typed functional languages, e.g. Coq \cite{Coq:manual}, but leave the necessary technical developments as future work.
The mechanisms described in Chapter \ref{chap:uetsms} and Chapter \ref{chap:static-eval} could also be adapted for use in languages from the imperative and object-oriented traditions without difficulty. These languages do not typically support structural pattern matching, so the mechanism in Chapter \ref{chap:uptsms} may not be relevant. Parametric TLMs as described in Chapter \ref{chap:ptsms} require an ML-style module system, i.e. a system where types and values are packaged into modules, and these modules are classified by signatures. However, it may be possible to define a variant of this mechanism that treats type and value parameters separately (the difficulty being that the type of value parameters may mention prior type parameters.)
The mechanism of TLM implicits introduced in Chapter \ref{chap:tsls} assumes a bidirectionally typed unexpanded language. A number of full-scale languages are bidirectionally typed, notably including Scala \cite{odersky2008programming}. However, Scala, like many other object-oriented languages, supports subtyping. Subtyping would complicate the question of implicit TLM dispatch, because there may be designations at multiple supertypes of a given type.
Various forms of \emph{ad hoc} polymorphism, e.g. function/method overloading or type classes \cite{Hall:1996:TCH:227699.227700}, would also complicate the question of implicit TLM dispatch. One approach that may be worth exploring in future work would associate implicits with a type class, rather than with an individual type.
TLMs could also be adapted for use in dynamic languages like Racket by eliminating the type annotation (thereby treating the language as ``uni-typed'' \cite{scott1980lambda,pfpl}.) In such a language (and even in a language with richer type structure), it could be useful to allow TLM providers to annotate a TLM with a dynamic contract governing all generated expansions \cite{DBLP:conf/icfp/FindlerF02}.
\vspace{-4px}
\subsection{Constraint-Based Type Inference}\label{sec:type-inference}
ML uses a constraint-based type inference system \emph{a la} Damas-Milner \cite{damas1982principal}. We conjecture that the mechanisms described up through Chapter \ref{chap:static-eval} are compatible with such a system -- most simply, by generating the constraints from the final expansion. Actually, it should be possible to perform type inference abstractly, using only the information in the splice summary. We leave the evaluation of this conjecture as future work.
The mechanism of TLM implicits developed in Chapter \ref{chap:tsls} assumed a bidirectional type system, i.e. one that only locally infers types \cite{Pierce:2000:LTI:345099.345100}. It may be possible to integrate implicit TLM dispatch with a constraint-based type inference system, but the approach to take is less clear. We leave the exploration of this question as future work.
\vspace{-4px}
\subsection{Module Expression Syntax Macros}
We did not consider situations where a library client wants to define syntactic sugar for module expressions matching a given signature. It should be possible to ``replicate'' the mechanisms developed up through Chapter \ref{chap:static-eval} at the level of module expressions without difficulty. Implicit dispatch would be problematic at the module level because signatures are related by a notion of subtyping.
A similar approach might be considered for syntax macros at the level of types (or, more generally, constructions.) However, for an ML-like language where all types are classified by a single kind, type-level TLMs would obscure the identity of the type. This, in turn, might obscure the type structure of the program too much to be useful.
% We have explored syntax for types in a recent short paper \cite{sac15}. Care must be taken here, however, because knowing merely that the %We also did not consider situations where a library client wants to generate an expression or a pattern based on the structure of a type (e.g. automatiCarec generation of equality comparisons.) Finally, we do not consider situations that require modifications to the underlying type structure of a language (though ``reasonably programmable type structure'' is a rich avenue for future work.)
\subsection{Parameterized Implicit Designations}\label{sec:parametric-designations}
The mechanism of TLM implicits developed in Chapter \ref{chap:tsls} allows the client to designate a TLM at a single type. A more sophisticated mechanism would allow for parameterization of the TLM implicit designation itself, so that it could operate across a parameterized family of types. For example, we may want to be able to implicitly apply the parametric TLM \li{#\dolla#list'} at \emph{all} list types, with the parameters determined implicitly from the type supplied for analysis.
Na\"ively, we might imagine a designation that quantifies over modules, \li{L}, and types, \li{'a}, like this:
\begin{lstlisting}[numbers=none]
implicit syntax (L : LIST) 'a => $list' L 'a in (* ... *) end
\end{lstlisting}
When encountering an unadorned literal form, the implicit dispatch mechanism must instantiate each implicit parameter from the type provided for analysis. The problem is that there does not always exist a unique such instantiation. For example, the type expression \li{list(int)} makes no reference to any module matching the \li{LIST} signature, and there may be many such modules in context, so we cannot uniquely instantiate \li{L}.
To solve this problem, we would need to define a unique \emph{normal form} that serves as a representative for each equivalence class of types. A designation is deemed invalid if the normal form of its type does not mention every designation parameter. For example, the normal form of \li{L.list('a)} does not mention \li{L} (because \li{LIST.list} is not abstract), so the implicit designation above can simply be deemed invalid.
The following designation does not require instantiating a module variable, so it would be valid under this restriction (recalling that \li{#\dolla#list} was defined as a synonym for \li{#\dolla#list' List}):
\begin{lstlisting}[numbers=none]
implicit syntax 'a => $list 'a in (* ... *) end
\end{lstlisting}
The normal form of an abstract type would necessarily mention a module path, so the following parametric designation would also be valid:
\begin{lstlisting}[numbers=none]
implicit syntax (R : RX) => $r R in (* ... *) end
\end{lstlisting}
It may be possible to use Crary's method for representing terms of the dependent singleton calculus in a beta-normal, eta-long form for this purpose \cite{DBLP:conf/lfmtp/Crary09}. Defining pattern matching over types of normal form, and incorporating this mechanism into the implicit dispatch mechanism, is left as future work.
% \subsection{TLM Packaging}\label{sec:tsm-packaging}
% In the exposition thusfar, we have assumed that TLMs have delimited scope. However, ideally, we would like to be able to define TLMs within a module:
% \begin{lstlisting}[numbers=none]
% structure Rxlib = struct
% type Rx = (* ... *)
% syntax $rx at Rx { (* ... *) }
% (* ... *)
% end
% \end{lstlisting}
% However, this leads to an important question: how can we write down a signature for the module \li{Rxlib}? One approach would be to simply duplicate the full definition of the TLM in the signature, but this leads to inelegant code duplication and raises the difficult question of how the language should decide whether one TLM is a duplicate of another. For this reason, in VerseML, a signature can only refer to a previously defined TLM. So, for example, we can write down a signature for \li{Rxlib} after it has been defined:
% \begin{lstlisting}[numbers=none]
% signature RXLIB = sig
% type Rx = (* ... *)
% syntax $rx = Rxlib.$rx
% (* ... *)
% end
% Rxlib : RXLIB (* check Rxlib against RXLIB after the fact *)
% \end{lstlisting}
% Alternatively, we can define the type \li{Rx} and the TLM \li{#\dolla#rx} before defining \li{Rxlib}:
% \begin{lstlisting}[numbers=none]
% local
% type Rx = (* ... *)
% syntax $rx at Rx { (* ... *) }
% in
% structure Rxlib :
% sig
% type Rx = Rx
% syntax $rx = $rx
% (* ... *)
% end = struct
% type Rx = Rx
% syntax $rx = $rx
% (* ... *)
% end
% end
% \end{lstlisting}
% Another important question is: how does a TLM defined within a module at a type that is held abstract outside of that module operate? For example, consider the following:
% \begin{lstlisting}[numbers=none]
% local
% type Rx = (* ... *)
% syntax $rx at Rx { (* ... *) }
% in
% structure Rxlib :
% sig
% type Rx (* held abstract *)
% syntax $rx = $rx
% (* ... *)
% end = struct
% type Rx = Rx
% syntax $rx = $rx
% (* ... *)
% end
% end
% \end{lstlisting}
% If we apply \li{Rxlib.#\dolla#rx}, it may generate an expansion that uses the constructors of the \li{Rx} type. However, because the type is being held abstract, these constructors may not be visible at the application site. \todo{actually, this is why doing this is a bad idea. export TLMs only from units, not modules}
% \subsection{TSLs}
% \section{pTSLs By Example}
% For example, a module \lstinline{P} can associate the TLM \lstinline{rx} defined in the previous section with the abstract type \lstinline{R.t} by qualifying the definition of the sealed module it is defined by as follows:
% \begin{lstlisting}[numbers=none]
% module R = mod {
% type t = (* ... *)
% (* ... *)
% } :> RX with syntax rx
% \end{lstlisting}
% More generally, when sealing a module expression against a signature, the programmer can specify, for each abstract type that is generated, at most one previously defined TLMs. This TLM must take as its first parameter the module being sealed.
% The following function has the same expansion as \lstinline{example_using_tsm} but, by using the TSL just defined, it is more concise. Notice the return type annotation, which is necessary to ensure that the TSL can be unambiguously determined:
% \begin{lstlisting}[numbers=none]
% fun example_using_tsl(name : string) : R.t => /SURL@EURLnameSURL: %EURLssn/
% \end{lstlisting}
% As another example, let us consider the standard list datatype. We can use TSLs to express derived list syntax, for both expressions and patterns:
% \begin{lstlisting}[numbers=none]
% datatype list('a) { Nil | Cons of 'a * list('a) } with syntax {
% static fn (body : Body) =>
% (* ... comma-delimited spliced exps ... *)
% } with pattern syntax {
% static fn (body : Body) : Pat =>
% (* ... list pattern parser ... *)
% }
% \end{lstlisting}
% Together with the TSL for regular expression patterns, this allows us to write lists like this:
% \begin{lstlisting}[numbers=none]
% let val x : list(R.t) = [/SURL\dEURL/SHTML, EHTML/SURL\d\dEURL/SHTML, EHTML/SURL\d\d\dEURL/]
% \end{lstlisting}
% From the client's perspective, it is essentially as if the language had built in derived syntax for lists and regular expression patterns directly.%However, we did not need to build in this syntax primitively.%The only constraint is that this syntax must be used in an analytic position, which we argue is actually better for code compren when encountering unfamiliar syntax.
\subsection{Exportable Implicit Designations}
Implicit designations cannot be exported from a library, because different libraries might define conflicting designations. This can be inconvenient for clients.
This restriction is perhaps too severe in cases where the designation is at an abstract type generated from within the same library. In such situation, it would be safe to export an implicit designation because no other library could do the same.
% \todo{revise this} TSLs can be associated with abstract types that are generated by parameterized modules (i.e. generative functors in Standard ML) as well. For example, consider a trivially parameterized module that creates modules sealed against \lstinline{RX}:
% \begin{lstlisting}[numbers=none]
% module F() => mod {
% type t = (* ... *)
% (* ... *)
% } :> RX with syntax rx
% \end{lstlisting}
% Each application of \lstinline{F} generates a distinct abstract type. The semantics associates the appropriately parameterized TLM with each of these as they are generated:
% \begin{lstlisting}[numbers=none]
% module F1 = F() (* F1.t has TSL rx(F1) *)
% module F2 = F() (* F2.t has TSL rx(F2) *)
% \end{lstlisting}
% As a more complex example, let us define two signatures, \lstinline{A} and \lstinline{B}, a TLM \texttt{\$G} and a parameterized module \lstinline{G : A -> B}:
% \begin{lstlisting}[numbers=none,mathescape=|]
% signature A = sig { type t; val x : t }
% signature B = sig { type u; val y : u }
% syntax $G(M : A)(G : B) at G.u { (* ... *) }
% module G(M : A) => mod {
% type u = M.t; val y = M.x } :> B with syntax $G(M)
% \end{lstlisting}
% Both \lstinline{G} and \texttt{\$G} take a parameter \lstinline{M : A}. We associate the partially applied TLM \texttt{\$G(M)} with the abstract type that \lstinline{G} generates. Again, this satisfies the requirement that one must be able to apply the TLM being associated with the abstract type to the module being sealed.
% Only fully abstract types can have TSLs associated with them. Within the definition of \lstinline{G}, type \lstinline{u} does not have a TSL available to it because it is synonymous to \lstinline{M.t}. More generally, TSL lookup respects type equality, so any synonyms of a type with a TSL will also have that TSL. We can see this in the following example, where the type \lstinline{u} has a different TSL associated with it inside and outside the definition of the module \lstinline{N}:
% \begin{lstlisting}[numbers=none,mathescape=|]
% module M : A = mod { type t = int; val x = 0 }
% module G1 = G(M) (* G1.t has TSL $G(M), per above *)
% module N = mod {
% type u = G1.t (* u = G1.t in this scope, so u also has TSL $G(M) *)
% val y = /asdf/ (* we can use it to create a value of that type *)
% } :> B (* did not specify a TSL for N.u at the point where it is sealed,
% so N.u has no TSL in the outer scope *)
% val z : N.u = /asdf/ (* ERROR: no TSL for type N.u *)
% \end{lstlisting}
% A formal specification of TSLs in a language that supports only non-parametric datatypes is available in a paper published in ECOOP 2014 \cite{TSLs}. %We will add support for parameterized TSLs in the dissertation (see Sec. %\ref{sec:syntax-timeline}).
% \subsection{TLMs and TSLs In Candidate Expansions}\label{sec:tsms-in-expansions}
% Candidate expansions cannot themselves define or apply TLMs. This simplifies our metatheory, though it can be inconvenient at times for TLM providers. We discuss adding the ability to use TLMs within candidate expansions here.\todo{write this}
% \subsection{Pattern Matching Over Values of Abstract Type}\label{sec:patterns-for-abstract-types}
% ML does not presently support pattern matching over values of an abstract data type. However, there have been proposals for adding support for pattern matching over abstract data types defined by modules having a ``datatype-like'' shape, e.g. those that define a case analysis function like the one specified by \lstinline{RX}, %shown in Sec. \ref{sec:examples}. %We leave further discussion of such a facility and of parameterized TPSMs also as remaining work (see Sec. \ref{sec:syntax-timeline}).
\subsection{Controlled Capture}\label{sec:controlled-binding}
The prohibition on capture makes it impossible to define new binding constructs. For example, consider Haskell's \textbf{do}-notation for monadic structures:
\begin{lstlisting}[numbers=none]
do x1 <- action1
x2 <- action2
action3 x1 x2
\end{lstlisting}
This desugars to:
\begin{lstlisting}[numbers=none]
action1 >>= \ x1 -> action2 >>= \ x2 -> action3 x1 x2
\end{lstlisting}
where \li{>>=} is infix application of the \emph{bind} function and \li{\ x1 -> e} is Haskell's syntax for lambda abstraction.
If we na\"ively attempted to define something like \textbf{do}-notation using a TLM, the prohibition on capture would prevent \li{x1} from being visible within \li{action2}.
Completely relaxing the prohibition on capture would be unreasonable. Instead, we conjecture that there are two important constraints that need to be enforced:
\begin{enumerate}
\item Identifiers that can be captured by spliced terms must themselves appear in the literal body. This leaves control over naming entirely to the client programmer.
\item The splice summary must state which of these are available to each spliced term.
\end{enumerate}
The prior work of Herman and Wand developed mechanisms to support these sorts of examples \cite{DBLP:conf/esop/HermanW08,Herman10:Theory}. We leave integrating related mechanisms into a system of TLMs as future work. We also leave open the question of how an editor should best convey the set of available identifiers within a spliced term to the programmer.
\subsection{Type-Aware Splicing}\label{sec:type-aware-splicing}
Although typed expansion generalizes typing, there is no mechanism by which the expansion can branch based on the type synthesized by a spliced term. Recent work by Lorenzen and Erdweg on type-dependent desugarings gives some examples where branching according to the type of a spliced term might be useful \cite{conf/popl/LorenzenE16}. Developing a reasonable extension of our TLM mechanism for doing so is left as future work.
\subsection{TLM Application in Proto-Expansions}
Proto-expansions are abstract binding trees in our work, so it is not possible to apply TLMs within proto-expansions. This might occasionally be inconvenient for TLM providers. Developing the machinery necessary to be able to take TLMs as parameters and apply TLMs in proto-expansions is left as future work.
\subsection{Mechanically Reasoning About Parse Functions}\label{sec:verifying-tsms}
A correct parse function never returns an encoding of a proto-expansion that fails proto-expansion validation. This invariant cannot be enforced by the simple type systems we have considered in this work. Using a proof assistant, it would be possible to verify that a parse function generates only encodings of valid proto-expansions. Alternatively, in a dependently typed setting, the type of the parse function itself could be enriched so as to enforce this invariant intrinsically. We leave the details of this approach as future work.
A related problem is that a parse function might diverge. Again, we can either prove that the parse function does not diverge extrinsically, or define the parse function using a total language.
Parse functions implement some intended syntax definition. It would be useful to be able to state the syntax definition separately as a formal structure, and then prove that the parse function implements it correctly. In fact, in most cases, it should be possible to generate the parse function directly from the syntax definition using a parser generator. In this case, it would be useful to be able to mechanically prove the parser generator correct.
It would also be useful to develop the notion of an \emph{splice summary specification}, i.e. a specification of the splice summary that should result from a well-formed string. This could be combined with a grammar like the one shown in Figure \ref{fig:rx-grammar-based}, with the non-terminals representing spliced terms annotated with types.
% \subsection{Improved Error Reporting}\label{sec:error-handling}
% \todo{write this}
\subsection{Refactoring Unexpanded Terms}\label{sec:refactoring}
A crucial distinction is between identifiers, which appear in unexpanded terms, and variables, which appear in expanded terms. Variables are given meaning by substitution, and ABTs are identified only up to renaming of bound variables. In contrast, identifiers are given meaning only by expansion to variables and there is no notion of renaming or substitution. Unexpanded terms are not evaluated directly, so there is no need for these operations to assign static and dynamic meaning to programs.
It would, however, be useful to support identifier renaming and substitution operations for the purposes of automatic refactoring \cite{mens2004survey}. The simplest solution would be to use the splice summaries to locate spliced terms, and then perform the renaming directly within the literal body. The problem is that there is no guarantee that the parse function will produce an alpha-equivalent expansion after such a renaming operation has been performed. Similar concerns about invariance come up for other kinds of refactorings.
There are three approaches one might take to avoid this problem. The simplest approach is for the renaming operation to re-run the parse function and check that the expansion it generates is related to the previously generated expansion as expected. Alternatively, we might seek to mechanically verify that the parse function is invariant to refactoring, either intrinsically or extrinsically as discussed in Sec. \ref{sec:verifying-tsms}.
A third approach would be to require that the TLM be defined using a grammar formalism that precludes inspection of the form of spliced expressions by construction. Exploration of these approaches is a promising avenue for future work.
\subsection{Integration with Editor Services}\label{sec:editor-integration}
Program editors often seek to provide feedback to the programmer about the syntax and semantics of the program being written. Questions remain about how various editor services should interact with TLMs. In the examples in this document, we colored spliced terms black and all other segments of a literal body some other uniform color (e.g. green.) A more sophisticated approach would allow the TLM to define its own syntax highlighting logic governing these non-spliced segments.
Another concern has to do with performance: na\"ively, a program editor would need to re-run the corresponding parse function on each edit that modified a literal body. Ideally, it should be possible to incrementally compute the resulting change to the splice summary as a function of the change to the literal body \cite{Ghezzi:1979:IP:357062.357066}. The generated expansions are context-independent, so there should be ample opportunity for caching.
Finally, we considered only reasoning principles for well-formed, well-typed programs. However, programmers often produce ill-formed or ill-typed programs. It is often useful to have error recovery heuristics that can be applied to provide useful feedback to the programmer in these circumstances. Error recovery within generalized literal forms might need the assistance of the TLM.
\subsection{Pretty Printing}\label{sec:resugaring}
We considered only the task of defining expressions and patterns using alternative syntactic forms. It is also generally useful to be able to ``pretty print'' (or ``unparse'') values of a given type using the same syntactic conventions, e.g. when using a REPL. It may be useful to explicitly associate a pretty-printer with a type using a mechanism closely related to our mechanism of TLM implicits. To avoid inconsistencies between the parsed syntax and the pretty-printed syntax, it is useful to generate both a parser and a pretty-printer from the same syntax definition, e.g. as described by van den Brand and Visser \cite{DBLP:journals/tosem/BrandV96} and implemented by many syntax definition systems.
There has also been some prior work on \emph{resugaring}, i.e. retaining syntactic sugar while stepping through the evaluation of an expression \cite{DBLP:conf/pldi/PombrioK14,DBLP:conf/icfp/PombrioK15}. Extending these techniques to support syntactic forms defined via TLMs is left as future work.
\subsection{Structure Editing}
In this work, we assumed have that the surface syntax that the programmer interacts with is textual. However, there is an alternative approach: \emph{structure editors} (a.k.a. structured editors, syntax-directed editors, projectional editors) allow for a surface syntax that is tree-shaped, with holes standing for branches of the tree that have yet to be constructed. The programmer interacts with a projection of this tree structure using a language of edit actions. There are a number of prominent examples of structure editors, starting with the Cornell Program Synthesizer \cite{teitelbaum_cornell_1981}. In recent work, we have developed a type-theoretic foundation for structure editing by assigning static meaning to terms with holes, and formally defining a type-aware action semantics \cite{DBLP:conf/popl/OmarVHAH17}. TLMs could be incorporated into such a system with some modifications. In particular, the TLM would need to define a visual representation as well as an interaction model for a projection of an expression or pattern. Every state that the projection can be in would need to map onto an expansion. Spliced segments in such a system would correspond to holes that appear within the projection. Detailing this system of ``typed projection macros'' is left as future work. We elaborated on this idea in a recent ``vision paper'' \cite{snapl17}.
\section{Concluding Remarks}
Strong abstract reasoning principles dramatically increase the usability of a programming language by allowing the programmer to ignore (i.e. hold abstract) certain details when reasoning about the behavior of a program. Similarly, syntactic sugar that captures the idioms common to an application domain more concisely or naturally can dramatically increase the usability of a programming language by decreasing the cognitive cost of producing and examining programs. This work aimed to show that these considerations need not be in opposition -- it is possible, if formal care is taken, to define a programming language with a \emph{reasonably programmable syntax}.
\newpage
\section*{\LaTeX~Sources and Errata}
\addcontentsline{toc}{chapter}{\LaTeX~Source Code and Updates}
\noindent
The \LaTeX~sources for this document can be found at the following URL:
\begin{center}
\url{https://github.com/cyrus-/thesis}
\end{center}
The latest version of this document can be downloaded from the following URL:
\begin{center}
\url{http://www.cs.cmu.edu/~comar/omar-thesis.pdf}
\end{center}
Any errors or omissions can be reported using GitHub's issue tracker, or by sending an email to the author:
\begin{center}
\url{[email protected]}
\end{center}
Any changes to this document that occured after the final dissertation has been submitted to the university will be summarized below.
\begin{enumerate}
\item There was a minor syntax highlighting mistake in Figure \ref{fig:big-html-example}.
\item There was a missing parenthesis in the proof of Typed Pattern Expansion. (Thanks, David Christiansen!)
\end{enumerate}