Maaj Script is a dynamic interpreted language based on Clojure running on JVM with interpret written in Java.
Maaj Script is a functional, dynamic scripting language with Lisp syntax. Prefers immutability and, as is now, does not allow mutable local variables. It supports tail recursion, macro expansion, lazy lists, module importing... Is easy to use from Java. Jar with all dependencies (mainly Clojure data structures) is "only" ~400 KiB.
There are many constructs familiar to people who used lisp, especially Clojure, like:
def
,defn
,defmacro
,fn
cons
,first
,rest
,car
,cdr
,cadr
do
,let
,recur
,eval
,apply
- ...
Some ideas are:
- Loading settings
- This can be potentially unsafe.
- Dynamic testing of rest of the system.
- not yet:
- Dnamic testing without requiring special namespace loader
- Direct Java interop (boxed)
- Reloading of scripts from files (for changing settings / ...)
- Only reloading is missing, rest is ready.
- General purpose scripting language.
Coming up with good names is hard...
- maa : from Maartyl
- j : JVM, Java
- script : ...
So essentially, a script by Maartyl for JVM / in Java...
I am writing MaajScript as a school project for Java class. I chose an interpreter because I always wanted to write one and because I like Clojure, of which I wanted to learn more in the process. Not to mention, every programmer has written their own lisp...
- Java 8
- https://github.com/krukow/clj-ds
- This library is currently copied into project for the sake of development (fast goToSource etc.)
- It will be made into a normal dependency at some later phase.
- Java interop (through seamless reflection)
- Bean accessors
- Integration with Java Scripting API
- Possibly some synchronization primitives.
Maaj Script is a Lisp with extended syntax for vectors and maps. Very similar to Clojure, syntactically. There is a good chance that if a feature from Clojure is implemented, it has the same or similar syntax.
Maaj Script is fully Unicode compliant up to char 65535. (as far as reading characters goes)
Which means all common Unicode strings are valid strings in MaajScript, and if they meet requirements* for identifiers (no parenthesis, ...) they are valid identifiers.
Example of valid symbols: čeřící
, 葛城
, Ꙭൽↈⴱ
For requirements see maaj.reader.MaajReader#isSymbolic(int)
.
Default loader assumes files to be in UTF-8 format.
; this is a comment
,,,,,,,;commas are considered whitespace and ignored
; this invokes foo with arguments in variables* bar a baz
; (* all normal variables are immutable)
(foo bar baz)
; lambda creation
(fn args-bind body-here)
(fn [] 5) ; takes no arguments and always returns 5
; this function returns a function that returns the argument passed to the outer function
(fn [x] (fn [] x))
(((fn [x] (fn [] x)) 5)) ;=> 5
See let for details on argument binding. Body can be multiple terms, evaluated in an implicit do block.
; named functions
(defn name-of-fn args-bind body-here)
(defn add [a b] (+ a b)) ; adds 2 numeric arguments
Any expression in Maaj Script is a term. Some terms can nest other terms, creating expression trees.
~Classical Lisp symbol.
Used as identifier.
Essentially any number of characters that don't start with number, colon or other special character (whitespace, parenthesis...)
Examples: do
,count'
, ->
,_auto_#45
, .1
Evaluated as value lookup.
Symbols and keywords can have a namespace: it is separated from name by a /
, becoming: namespace/name
- Symbols cannot begin with /
. (Special case is /
only : it is a valid name)
Examples: #core/reduce
, some.namespace/foo
, #core//
Special variant of symbol that evaluates to itself. Starts with a colon.
Used for keys / tags...
Examples: :col
, :height
, :*
2 variants of numbers are supported: Evaluates to itself.
- int : wrapper over java.lang.Long
5
,-74567
- double :
0.5
,32.
,752.1563
Also a numeric term, but not a number. Can be used in numerical operations.
Is written as escape character. Some characters have* special literals like \newline
.
Another way to specify character is using a unicode literal: \u
and 4 hexadecimal digits.
Example: \a
, \n
, \u771F
Evaluates to itself.
(* multisymbol character literals other then unicode i.e. \u00A0 are not supported yet)
String. Wrapper over native java.lang.String .
Example: "hello world!"
Evaluates to itself.
Immutable linked list. Or limited S-expression: tail must always be a seq*.
(* Some Lisps allow: (5 . 8) pairs. There is no dot syntax in MaajScript ind if there were, the second argument would have to be a seq.)
Example: (foo bar baz)
, (count' 5 coll)
Evaluated as function application.
Sequence. Variant of list that is potentially lazy*. Does not necessarily know it's own length.
(* Only partially evaluated. The rest is determined but unevaluated.)
Can be created from any immutable collection or through other operations
Example: (seq "hello") ;=> (\h \e \l \l \o)
Evaluated as function application.
A special value, can be used as: false, end of a sequence, .... - similar to null
in Java, but is a valid term in that methods can be invoked on it.
Example: ()
- There is no other syntax for this value.
Immutable array. - Supports O(1) [add to] and [remove from] end. Can hold any number of terms.
(wrapped Clojure data structure)
Example: [1 2 3 :kw 4.78 \a]
Evaluation evaluates all elements, returning resulting vector. (fmap)
Associative collection of key-value pairs. Both keys and values can by any* term.
Example: {:key :value, :key2 5, :key3 ()}
Evaluation evaluates all elements, returning resulting map. (fmap)
(* maps over 8 pairs require hashable keys. For now, all terms have correctly defined .hashCore() and .equals() - This might no longer be the case after wrapper for any JVM object is implemented. - With less then 8 pairs, only .equals() is required. )
Can by invoked. There is a variety of types (some evaluate arguments, some are treated differently in Java (different functional interfaces), ...) Evaluates to itself.
Anything that can be coerced into a seq. - seq and collections. (and string, which is a special collection of chars)
All following special forms are defined in namespace '#'. This namespace cannot be directly referenced and is loaded by default, imported into every other created namespace. In case of name clash: qualified form (i.e. #/if
) can be used instead.
Creates (or updates) a global mutable cell (Var) in the current namespace. Vars can only* be changed from the namespaces they were defined in. (* through def
special form)
Syntax: (def name value)
. Any metadata on name
will be added to the created var itself. (Not if this only changes.)
Binds local scope variables with provided values. A binding is a pair of pattern binder and expression to evaluate. One let can contain multiple such pairs.
Variables are no longer bound after leaving let construct, in which they were created.
Syntax: (let [pattern expression ...] body)
Binding: Currently only 3 version of binding are supported:
- simple: just a symbol : binds
- Syntax: any symbol, that from now on can be used to access bound value.
- seq: captures positional elements in a seq
- Syntax:
[a b c]
- This will bind first 3 terms in seq [a b & r]
- The r will capture the rest of the sequence.- & is a special symbol and cannot be bound to in this context.
- If seq contains less terms then required: rest will be bound to nil.
- Syntax:
- ignore: like simple, but ignore
- Syntax:
_
- The value will still be computed, only not captured.
- Syntax:
Example: (let [a (+ 7 5)] (* a (inc a))) ;=> 156
(let [[a b c] (50000)] (+ a b c)) ;=> 3
- Explantion:
(50000)
creates a lazy seqence of numbers from 0 to 49999. Only first 3 will be actually realized and bound toa
,b
andc
, so expression will become(+ 0 1 2)
and+
can sum any number of arguments.
Binding are recursive, so something like this is possible too: [[a b] [c d] [_ [q w r & l]]]
.
//Note: I say 'bound' but associated might be a better term... Either way, they are immutable and expressions in let are evaluated eagerly. (i.e. not "bound" to potential results if needed, but actually computed and "assigned")
Known problem: Scope retains captured values even if they are no longer needed. - Can cause memory leaks: especially if context is captured in a closure: The same problem: Closure captures the entire scope, not just what is used inside.
- This might not seem like a problem, but imagine: if scope retains head of a lazy seq, that is then being reduced: under normal circumstances, only the necessary, little, part of seq is created at a time, and already consumed beginning of the seq is garbage collected... If the head is still retained in scope, although unneeded, it will never be GCed and will cause (potentially) huge space leaks.
Do special form takes any number of expressions and evaluates them in turn, returning result of last. Other then last can be executed for side effects, as their result will never be captured anywhere.
Takes 2 or 3 arguments. Evaluates first argument and based on it's returned value* evaluates and returns second argument or [if third argument is present, evaluates and returns that, otherwise returns nil].
Syntax: (if test then else)
or (if test then)
~= (if test then ())
Evaluates it's arguments like any other function. (can only take 1) But, then evaluates the result of that. Which allows evaluation of any (possibly computed / otherwise obtained) term. As other lisps: Program is data, and possibly vice versa, so it does not take a string or anything special, but normal data structures.
Syntax: (eval '(+ 7 8))
Transforms a seq into argument list. Last argument of apply must be seqable. It replaces the last argument with all the arguments in the seq. This operation works even if last argument is infinite.
Syntax: (apply fn possible other args seq-arg)
Example: (apply + [1 2 3])
- the same as (+ 1 2 3)
.
(apply + -15 128 (100))
- sums numbers 0 to 99 and -15 and 128
Allows tail recursion. Every Fn* tests it's result for being a recur. If it is, it repeats itself with arguments captured in the recur. - Recur itself is just a lightweight box for arguments. This special form can be used anywhere, but invoking virtually any other method then getArgs() results in "Recur outside looping context" exception.
This implies that it can only be used in a tail position and only inside Fn.
Syntax: (recur arg1 arg2 ...)
Example: A function that will reverse a list.
(fn ([l] (recur () l))
([acc l] (if l
(recur (cons (first l) acc) (rest l))
acc)))
(* essentially all normal functions: not macros or special forms)
Normal symbol evaluation becomes lookup of that symbol in scope. Global scope is composed of Vars. i.e. Evaluation of a symbol accesses value of a Var. If I want to access the Var itself, rather then value stored in it, I have to use this special form. This special form completely ignores any scope variables.
Syntax: (var symbol)
There exist special reader macro that is translated into this special form: #'symbol
Loads and imports a file. Uses default Loader, that searches on CLASSPATH. Splits symbol name by .
and replaces with /
, adding .maaj
. - Loader then evaluates the file and returns created namespace. require'
then imports based on it's options
For details on options see: (meta :doc #'require')
:*
- loads not qualified- nothing - fully qualified
:as
- qualified with another namespace- there will also be an option to import as not qualified only specified symbols
Everything will be also fully qualified.
Creates a function (or a lambda; not different) that captures current context* / scope. (* Function will be evaluated in the namespace they were created in.) - The body of this expression will become the body of the function. It will not be evaluated now, only stored. The body can be potentially infinite.
Evaluating the function causes the body to be evaluated in the original context, with one* difference: there is special local variable injected: $args
. - This variable shouldn't be used directly and is prone to change name. This special form generally shouldn't be used directly, use macro fn
instead.
Syntax: (fnseq "Hello World!")
(* There is potentially more differences : if provided, name of this function will be in scope as well to allow recursion. There might be other changes. Argument list is the most important, though.)
The same as fnseq
, only has different evaluation semantics. See Macro expansion. Shouldn't be used directly, use macro macro
instead.
Macros are function, that upon evaluation generate code. In most Lisps, this is done at compilation time. I'm trying to do it once (while creating a function, essentially). - On the other hand, I can pass macros around and generally threat them as dynamically as functions. - At least some advantage of not having a compiled language.
//Note: At this point, macros sometimes work incorrectly, mainly: expand more then they should... (some combinations of macro expansion and quotation, unquotation...) - And ignores unquote - I will probaly have to make that a special term too... // I know about it and I will address it when there is time. At this point, works mostly fine and macros are not the most import part of this project.
Macros are evaluated from outside in. Macro expansion algorithm:
- on seq: Is this a macro application?
- yes: apply and expand result
- no : expand all arguments
- on other then seq
- collection: just fmap (expand all terms)
- ground: return itself
This namespace is imported by default, but kept fully qualified. Contains primitives mainly for manipulating macros.
All these special forms / terms have associated reader macros.
On evaluation simply returns it's body (i.e. first argumet) without evaluating it.
Reader macro: 'x
-> (#macro/quote x)
.
Example: (#macro/quote (1 2 3)) ;=> (1 2 3)
Similar to quote
, but traverses it's argument and : changes unqualified symbols to qualified (either by Var they represent, or current namespace) and runs unquote
and unquote-splicing
.
Reader macro: `x
. (backtick)
Example: `(if a test/b) ;=> (#/if current-namespace/a test/b)
Expects single argument. Returns it unevaluated, but macro expanded.
Example: (#macro/expand (cadr [4 5 6])) ;=> (first (rest [4 5 6]))
Unquote special terms (not forms) are terms, because they exist inside quoted terms. These are not evaluated and thus, they would never be found. - Instead a special operation unquoteTraverse is used in quote-qualified. If it encounters unquote, it evaluates it's body instead. Reader macro is the only way to create this.
Reader macro: ~x
generates something like (#macro/unquote x)
Example: (let [a 5] `[a ~a 5]) ;=> [current-namespace/a 5 5]
Note: unquote only works inside quote-qualified
Works like unquote
, only instead of injecting 1 value from evaluating it;'s body, expects body to result in something seqable. It then becomes all of the terms in the evaluated seqable, and replaces itself with that.
Reader macro: ~@x
generates something like (#macro/unquote-splicing x)
Example: (let [a [1 2 3]] `[a ~a ~@a 42] ) ;=> [cur-ns/a [1 2 3] 1 2 3 42]
Adds arity checks, arity overloads and argument list pattern binding to fnseq
.
Basic form is: (fn pattern-bind body in an implicit do block)
.
Arity check: arity is computed from pattern-bind, that must be a seq bind or simple bind ... or ignore.
- A simple bind represents an overload with variadic arity and no minimal number of arguments required.
- Is functionally equivalent with
[& r]
.
- Is functionally equivalent with
- A seq bind (i.e. vec):
- if does not capture / ignore rest: has fixed arity.
- example:
[a b c]
: has arity 3
- example:
- if captures rest : is considered variadic, but might define minimal arity : number of terms specified
- example:
[a b & r]
- this will match ifarity >= 2
. And will bind first 2 elements.
- example:
- if does not capture / ignore rest: has fixed arity.
- There cannot be multiple overloads with the same arity or multiple variadic overloads.
Arity overloading syntax: args-bind-pattern and it's body are in a seq. - For all overloads.
Example: (fn ([](recur 0)) ([x](inc x)))
- A function that increments it's argument, or calls itself with the argument 0. - Any other arity would throw an arity-exception.
The same as fn
but uses macroseq
instead.
Combines def
and fn
. You can specify meta data after name.
Form: same as fn
, only first argument is name for def.
Example: (defn name ^"the same as using just :name" [x] (:name x))
The same as defn
but uses macro
instead.
Returns first element (head) of a seqable argument.
Returns the rest (tail) of a seqable arument.
takes 2 args, second must be seqable; returns second argument with prepended first
Retuns a seqable as a seq.
takes:
- fn : invocable, that takes 2 arguments: accumulator and term, producing new accumulator
- acc : starting value of accumulator
- coll : Reducible (essentially: collection / seq)
Applies fn on acc and first term in coll, applies fn on this new accumulator and second term in coll ... until no more left. - If coll was from the start empty, returns starting accumulator.
Captures 1 argument body, that must return a seq and returns a seq thunk, with a promise of evaluating the captured body when needed. The returned value can be used as a seq.
If given 2 arguments is essentially (cons arg1 (lazy arg2))
.
Calls require'
for each argument.
Takes even number of args. eacch "pair" is interpreted as : test, body
- evaluates and returns body after first succesful test
- if nothing matches returns nil
- tests are evaluated in order
Example: (cond (0 < v) :pos, (0 > v) :neg, :else :zero)
Similar to Java switch construct.
Takes an expression and pairs. Tests each match on equality with computed value of first expression.
Tests sequentially.
Can take a special test match _
that will match anything.
Example: (case (get-value), 0 :ok, 1 :warn, 4 :err, _ :unknown)
Counts collection. Most times O(1), counting a seq can be O(n).
O(1) bound count : takes 2 args : bound and seq
- if counts a seq and gets over specified bound, returns Int.MAX_VALUE
returns new unique symbol, with possible preffix specified as first argumetn
access meta data of a term
+, *
: work on arity 0-* (0 argumets : identity in associated monoid)-, /
: work on arity 1-* (single argument : inverse in associated algebraic group)min, max
: works on arity 1-*<, >, <=, >=, ==
: works on arity 2inc, dec, neg
: works on arity 1
.equals() : works on arity 1-*
and, or
: works on arity 0-*not
: works on arity 1
... Some other functions are: take, conj, assoc
...
- Informations about them can be found thorough
(meta :doc #'fn-name)
. Insead of:doc
could be:info
.
There is no IO yet. It will be implemented together with other Java interop. - I could implement it now as something like: having readln, writeln on some reader and writer (... possibly streams) stored in Context, but it would be mostly just a stub implementation.
Thanks to Repl, it's easy to explore and can be used from Java...
Interactive read-evaluate-print loop.
run: maaj.Repl
.
What it does: It will:
- Read a term.
- Evaluate it.
- Print what it's become.
- Repeat this in a loop, until the reader encounters EOF.
You can also look at implementation of this class (it's very short) on how Maaj Script can be used from Java. I did not provide any special API because I plan to use Java Scripting API, when I get to do it.
Term is an interface, with methods like .eval(Context):Term
, .apply(Context, Arguments):Term
, .Show(Writer):Void
...
- Each term then knows how to perform these actions specifically to it's details. Possibly invoking the more methods on other Terms it contains...
Scope is stored in special type Context
that consists of mainly 2 things: reference to global variables and local scope, represented as a immutable map.
Reading mainly works as a (single pass) push-down automaton, where the entire state is position in code. Only exception to this are "terminated" constructs like (...),[...],{...},"..."
which are read in a loop, but read the insides normally.
Reader has a lot unused "slots" (special characters; switching to different dispatch table). Some are decided, but not yet implemented (like sets or regexps), some are reserved, or otherwise unused. - When these are encountered, reading fails. (as with any syntax error; the difference is : these can in future change to something meaningful)
Complex data structures (vectors, maps) are just wrappers over Clojure data structures.
Let (and fn) argument binding works such that from the bind expression is created a function that can create a bindings map from it's argument.