Defining New Schema Types
Note: as of schema 1.0, this guide is deprecated. Please check the updated version.
In this section, we will see the steps required to define new schemas.
Schemas are composable: you can write schemas that build on other schemas. Eventually all of these composite schemas bottom out in atomic schemas, schemas that do not depend on other schemas. Schemas can be recursive too, so you can write schemas that are defined in terms of themselves. Let's take a look at how to define atomic and composite schemas.
An atomic schema is one that is defined without depending on any other schema.
As an example, let's take a look at EqSchema, implemented in core.cljx.
This schema is used to check whether some input data value x is equal to a given value v.
Let's see how this schema can be created and used for checking.
For example, we can define a schema (eq "Schemas are cool!") that can be used to check whether a particular value is exactly equal to the string "Schemas are cool!".
Let's first try with a positive example:
(require '[schema.core :as s])
(s/check (s/eq "Schemas are cool!") "Schemas are cool!")
> nil
The schema check succeeds (it returns nil) because the two strings are equal.
Now let's try a negative example:
(s/check (s/eq "Schemas are cool!") "Schemas are NOT cool!")
> (not (= "Schemas are cool!" a-java.lang.String))
Here the schema check fails because the data value "Schemas are NOT cool!" does not match the value given when defining the EqSchema (namely "Schemas are cool!").
In this case, we see that the result is a validation error message explaining how the schema failed to validate.
Now that we have seen how EqSchema is used, let's see how it is implemented.
(defrecord EqSchema [v]
  Schema
  (walker [this]
    (fn [x]
      (if (= v x)
        x
        (macros/validation-error this x (list '= v (utils/value-name x))))))
  (explain [this] (list 'eq v)))
We see that EqSchema implements the walker and explain methods of the Schema protocol.
The walker method returns a function that is used by check to validate data.
This returned function takes a piece of data x as input and returns a value if we want the check to succeed, or a validation error if the check fails.
(For validation checking, the returned value is typically just the input. When we do coercion or other fancier uses of the walker, we will see use cases where a different, transformed value is returned.)
In the case of EqSchema, the heart of deciding what walker returns lies in the equality test: (= v x).
The validation error is constructed using the validation-error macro in schema.macros.
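The walker contract described above can be modelled without the library. The following is a minimal sketch in plain Clojure. ToyError, ToySchema, ToyEq, toy-walker, and toy-check are hypothetical names invented for this illustration and are not part of schema's API; the real implementation builds its errors with validation-error instead.

```clojure
;; Toy model of the walker contract. Every name here is hypothetical,
;; not schema's real API.
(defrecord ToyError [schema value])

(defprotocol ToySchema
  (toy-walker [this] "Returns a fn of x that yields x, or a ToyError on failure."))

(defrecord ToyEq [v]
  ToySchema
  (toy-walker [this]
    (fn [x]
      (if (= v x)
        x                         ; success: hand the value back
        (->ToyError this x)))))   ; failure: describe what went wrong

(defn toy-check
  "Returns nil when x matches the schema, otherwise the ToyError."
  [schema x]
  (let [result ((toy-walker schema) x)]
    (when (instance? ToyError result)
      result)))

(toy-check (->ToyEq "Schemas are cool!") "Schemas are cool!")      ; => nil
(toy-check (->ToyEq "Schemas are cool!") "Schemas are NOT cool!")  ; => a ToyError
```

The point of the sketch is the shape of the contract: the walker is called once to build a function, and that function is what gets applied to each piece of data.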
The explain method of the Schema protocol allows you to specify how the schema should be rendered; it is what is used when the schema is printed.
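As a small illustration, explain can be called directly on the schema from the earlier example. The expected result noted in the comment follows from the (list 'eq v) body in the EqSchema implementation shown above, and assumes a pre-1.0 version of the library like the one this guide describes.

```clojure
(require '[schema.core :as s])

;; explain returns plain data describing the schema
(s/explain (s/eq "Schemas are cool!"))
;; expected, per the implementation above: (eq "Schemas are cool!")
```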
Atomic schemas are small and simple. Their power really comes from composing them into bigger, more complex schemas.
A composite schema is one that is defined in terms of other schemas.
As an example, let's look at Both, implemented in core.cljx.
This schema is defined in terms of zero or more subschemas.
Validation of the Both schema succeeds if the validation for each of the subschemas succeeds.
Note that these subschemas can be arbitrary schemas, such as the EqSchema above or even other instances of Both.
As an example usage, let's look at a schema that checks whether a number is both positive and even:
(def EvenPos (s/both (s/pred even? 'even?) (s/pred pos? 'pos?)))
This is a composite schema that is defined in terms of two instances of the Predicate schema.
As an aside, a Predicate schema is built from a predicate function that returns true when we want the schema check to succeed, and false if the check should fail.
Here we're using Clojure's built-in even? and pos? predicate functions to define two Predicate schemas.
The second arg to s/pred is an optional name for the predicate, which is used to make validation errors and the explain output nicer to read.
We can now use the constructed both schema to validate numbers.
The number 4 is both even and positive, so the following check succeeds:
(s/check EvenPos 4)
> nil
On the other hand, the number 3 is not even, so it fails the first predicate and the Both schema check fails:
(s/check EvenPos 3)
> (not (even? 3)) ;; aside: here we see the predicate's name 'even?' rendered in the explain
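Two further inputs help show how the subschemas interact. This is a sketch against the same pre-1.0 API used in this guide; the expected results in the comments are inferred by analogy with the output above rather than taken from the library's documentation.

```clojure
(require '[schema.core :as s])

(def EvenPos (s/both (s/pred even? 'even?) (s/pred pos? 'pos?)))

;; -2 passes even? but fails pos?
(s/check EvenPos -2)
;; expected by analogy with the output above: (not (pos? -2))

;; -3 fails both predicates, but only the first failure should be reported
(s/check EvenPos -3)
;; expected by analogy with the output above: (not (even? -3))
```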
To validate a composite schema (such as Both), we need to perform some processing on its constituent subschemas (in our example, these are the two Predicates) and then combine the results of that processing in a meaningful way.
In the case of Both, combining the results of processing the constituent schemas essentially amounts to an "and": all of the subschemas must validate.
In general, each constituent subschema might itself be a composite schema, and therefore validation might require recursively traversing these schemas.
This recursive traversal is handled by the walker and subschema-walker methods.
As an example, let's see how the walker method is implemented in terms of the subschema-walker method to define the behavior of Both.
(defrecord Both [schemas]
  Schema
  (walker [this]
    (let [sub-walkers (mapv subschema-walker schemas)]
      (fn [x]
        (reduce
          (fn [x sub-walker]
            (if (utils/error? x)
              x
              (sub-walker x)))
          x
          sub-walkers))))
  (explain [this] (cons 'both (map explain schemas))))
Here we see the implementation of Both, which is defined in terms of schemas, a seq of the constituent subschemas.
In our example, schemas is a seq of length 2 containing the even? and pos? pred schemas.
Recall that to validate a schema, walker returns a function that takes in data x to validate; if x matches the schema, x is returned, otherwise an error is returned.
In the case of Both, x must be matched against all of the constituent subschemas, which is done by evaluating the functions created by each of their respective walker methods against the input data x.
If any of these functions returns an error, then validating the Both schema for the input x should also return an error.
The implementation of Both first gets the functions returned by the walker method of the subschemas by calling subschema-walker on each element of the schemas seq.
These returned functions are bound to sub-walkers outside of the returned function.
Then Both reduces over these sub-walkers, applying each to x, and returns the first error, or x if no error is found.
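The reduce step can be tried in isolation. Below is a minimal sketch in plain Clojure that mirrors the shape of the reduce in Both; toy-error, toy-error?, pred-walker, and both-walker are hypothetical stand-ins invented for this illustration, not schema's API.

```clojure
;; Toy stand-ins; hypothetical names, not schema's API.
(defn toy-error [msg] {:error msg})
(defn toy-error? [x] (and (map? x) (contains? x :error)))

(defn pred-walker
  "Builds a checker that returns x, or a toy error naming the predicate."
  [p pred-name]
  (fn [x]
    (if (p x)
      x
      (toy-error (list 'not (list pred-name x))))))

(defn both-walker
  "Threads x through each sub-walker in order, keeping the first error."
  [sub-walkers]
  (fn [x]
    (reduce (fn [acc sub-walker]
              (if (toy-error? acc)
                acc                 ; an earlier walker failed: pass the error along
                (sub-walker acc)))  ; otherwise feed the value to the next walker
            x
            sub-walkers)))

(def even-pos
  (both-walker [(pred-walker even? 'even?) (pred-walker pos? 'pos?)]))

(even-pos 4)   ; => 4
(even-pos 3)   ; => {:error (not (even? 3))}, pos? is never consulted
(even-pos -2)  ; => {:error (not (pos? -2))}
```

Note that each sub-walker receives the value returned by the previous one rather than the original input, so a walker that transforms its input would pass the transformed value down the chain.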
There are a couple of caveats to keep in mind when defining a composite schema.
First, rather than calling walker directly on the subschemas, we use subschema-walker, which eventually calls through to their walker methods.
This extra layer of indirection gives us a powerful hook to introduce transformations to apply to the data.
When schemas are used only to validate data, there is no transformation applied, and so calling subschema-walker on the subschemas is equivalent to calling walker.
However, to support fancier uses of schemas, such as coercion or recursive schemas, it is necessary to call subschema-walker instead of walker.
For now, the important takeaway is that whenever you want to walk a subschema, you need to do it using the subschema-walker method.
The second caveat is that the subschema walkers are bound outside of the returned function.
That is to say, the (let [sub-walkers (mapv subschema-walker schemas)] ...) happens outside of the returned (fn [x] ...).
This might seem like a trivial detail, and on the surface the two orderings might seem equivalent, but they are not.
Binding the subschema walkers eagerly, outside of the returned function, is necessary for performant and correct schema implementations, so we've tried to ensure that incorrect implementations produce informative error messages.
Concretely, let's see what happens if we incorrectly implement Both by binding the subschema walkers inside of the returned function. We'll call our broken implementation BrokenBoth:
(defrecord BrokenBoth [schemas]
  Schema
  (walker [this]
    (fn [x]
      (let [sub-walkers (mapv subschema-walker schemas)]
        (reduce
          (fn [x sub-walker]
            (if (utils/error? x)
              x
              (sub-walker x)))
          x
          sub-walkers)))))
(defn broken-both [& schemas] (BrokenBoth. schemas))
Here we have essentially copied the implementation of Both, but have swapped the order of (let [sub-walkers (mapv subschema-walker schemas)] ...) and (fn [x] ...).
Now, if we try using our broken implementation to validate a value, we get an error:
(s/check (broken-both (s/pred even? 'even) (s/pred pos? 'positive)) 4)
> RuntimeException Walking is unsupported outside of start-walker; all composite schemas must eagerly bind subschema-walkers outside the returned walker.
Therefore, the second important takeaway is to always bind the subschema walkers outside of the returned function.
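This guide does not show how the library detects the mistake. One plausible mechanism, assumed here purely for illustration, is a dynamically bound walker function that is only available while the top-level walker is being built. The sketch below models that idea in plain Clojure; *sub-walker*, toy-start-walker, toy-pred, eager-both, and lazy-both are all hypothetical names, not schema's API.

```clojure
;; Toy model; all names are hypothetical, not schema's API.
(def ^:dynamic *sub-walker*
  (fn [_schema]
    (throw (ex-info "Walking is unsupported outside of toy-start-walker" {}))))

(defn toy-start-walker
  "Builds the checker for schema while *sub-walker* is bound."
  [schema]
  (binding [*sub-walker* (fn [s] ((:walker s)))]
    (*sub-walker* schema)))

;; A toy schema is a map whose :walker is a zero-arg fn returning a checker.
(defn toy-pred [p]
  {:walker (fn [] (fn [x] (if (p x) x :toy/error)))})

(defn- run-all [sub-walkers x]
  (reduce (fn [acc w] (if (= acc :toy/error) acc (w acc))) x sub-walkers))

(defn eager-both [& schemas]
  {:walker (fn []
             ;; resolved now, while the binding is still in effect
             (let [sub-walkers (mapv *sub-walker* schemas)]
               (fn [x] (run-all sub-walkers x))))})

(defn lazy-both [& schemas]
  {:walker (fn []
             (fn [x]
               ;; resolved at check time, after the binding has been popped
               (let [sub-walkers (mapv *sub-walker* schemas)]
                 (run-all sub-walkers x))))})

((toy-start-walker (eager-both (toy-pred even?) (toy-pred pos?))) 4)  ; => 4

(try
  ((toy-start-walker (lazy-both (toy-pred even?) (toy-pred pos?))) 4)
  (catch clojure.lang.ExceptionInfo e
    (.getMessage e)))  ; the call throws, so this returns the error message
```

In this toy model the eager version also does its setup work once, when the checker is built, whereas the lazy version would redo it on every call even if it did not throw. A lazy map in place of mapv would defer the lookups in the same way, which is one reason to prefer mapv when collecting sub-walkers.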