Writing Custom Transformations

Jason Wolfe edited this page Jan 8, 2014 · 8 revisions

In Schema 0.2.0, we introduced transformations, a powerful extension for performing all sorts of manipulations on data. The new version of Schema comes with two concrete applications of transformations: validation and data coercion. We believe these two implementations are good demonstrations of what transformations can do, but there is a lot of unexplored potential. Here we will explore how to write custom transformations to tap into their power.

The workhorse for performing transformations is the walker method, defined in the Schema protocol of schema.core. This method returns a function that takes in data and returns a transformed version of that data. The walker can be thought of as a function that traverses a schema and the input data being matched against it at the same time. Often schemas are composite -- that is, built out of other schemas -- and the walker method for a composite schema knows how to call into the walkers of its subschemas with the matching pieces of an input datum.
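To make the idea of walking a schema and its data together more tangible, here is a toy model in plain Clojure. It is only a sketch and not how schema.core is implemented: the name toy-walker, the use of bare predicates as leaf schemas, and the [:error ...] marker are all assumptions made for illustration.

```clojure
;; Toy model only: toy-walker is hypothetical and not part of schema.core.
;; A "schema" here is either a predicate (leaf) or a map from key to sub-schema.
(defn toy-walker
  "Returns a function that walks `data` in step with `schema`."
  [schema]
  (if (map? schema)
    ;; Composite schema: build one walker per sub-schema up front...
    (let [sub-walkers (into {} (for [[k sub] schema] [k (toy-walker sub)]))]
      (fn [data]
        ;; ...then hand each walker the matching piece of the input.
        (into {} (for [[k walk] sub-walkers] [k (walk (get data k))]))))
    ;; Leaf schema: keep the value if the predicate accepts it,
    ;; otherwise replace it with a made-up error marker.
    (fn [data]
      (if (schema data) data [:error data]))))

((toy-walker {:a integer? :b {:c string?}}) {:a 3 :b {:c "hi"}})
;; => {:a 3, :b {:c "hi"}}

((toy-walker {:a integer? :b {:c string?}}) {:a 3 :b {:c 7}})
;; => {:a 3, :b {:c [:error 7]}}
```

The composite case never inspects leaf values itself; it only routes each piece of the input to the walker built for the corresponding sub-schema, which is the delegation described above.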

Writing a Custom Walker

There are many things that walkers can do, but most custom walkers follow this general pattern:

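;; Assumes start-walker and walker are referred from schema.core;
;; pre-process and post-process are placeholders that you supply.
(defn custom-walker [input-schema]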
  (start-walker
   (fn [schema]
     (let [walk (walker schema)]
       (fn [data]
         ;; transform the `data`
         (-> data
             pre-process
             walk
             post-process))))
   input-schema))

The custom walker takes a schema to walk as input and returns a function that takes data and returns a walked version of that data. In general, the walk proceeds by traversing the schema and data together. If the schema is composite, made of smaller schemas, its walker knows how to delegate parts of the walk to its subschemas.

The custom walker starts with a call to start-walker, which takes two arguments: a function that knows how to transform the data, and the schema on which to perform the processing (the input schema). Inside the custom walk, the data can be preprocessed, then walked, and then post-processed. This is shown using the hypothetical pre-process and post-process functions in the snippet above. The implementation of these functions can depend on the particular value of schema in the code.

Let's see a few examples to make this concrete.

Demo Walker

Inspired by clojure.walk's prewalk-demo and postwalk-demo, our first example is a walker that checks whether a value matches a schema (just like the checker in schema.core) but in addition, it will print the steps of its processing as it checks the data.

Here is the walk demo custom walker:

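;; Assumes schema.utils is aliased as utils, and that explain (like
;; start-walker and walker) is referred from schema.core.
(defn walk-demo [schema]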
  (start-walker
   (fn [s]
     (let [walk (walker s)]
       (fn [x]
         (let [result (walk x)]
           (printf "%s | checking %s against %s\n"
                   (if (utils/error? result) "FAIL" "PASS")
                   x (explain s))
           result))))
   schema))

In terms of the general walk pattern, this walker does no preprocessing. It does, however, have a post-processing step whose only effect is a side effect: it prints the data it is checking, the schema it is checking against, and whether the check passed or failed.

Here is an example:

((walk-demo {:a Long (optional-key :b) String}) {:a 3 :b "Hello"})
PASS | checking 3 against java.lang.Long
PASS | checking [:a 3] against (map-entry :a java.lang.Long)
PASS | checking Hello against java.lang.String
PASS | checking [:b "Hello"] against (map-entry (optional-key :b) java.lang.String)
PASS | checking {:a 3, :b "Hello"} against {:a java.lang.Long, (optional-key :b) java.lang.String}
> {:b "Hello", :a 3}

In the output, we see the walker checking all of the components of the input data against the given map schema. We can see, for example, how the walker traverses the maps by key-value pairs. The returned value is simply the result of calling the map schema's walker method on the input data, which returns the data if there is no error.

Let's see what this looks like when there is an error, and the input does not match the schema:

((walk-demo {:a Long (optional-key :b) String}) {:a 3 :b 3})
PASS | checking 3 against java.lang.Long
PASS | checking [:a 3] against (map-entry :a java.lang.Long)
FAIL | checking 3 against java.lang.String
FAIL | checking [:b 3] against (map-entry (optional-key :b) java.lang.String)
FAIL | checking {:a 3, :b 3} against {:a java.lang.Long, (optional-key :b) java.lang.String}
> #schema.utils.ErrorContainer{:error {:b (not (instance? java.lang.String 3))}}

Here we see the value for the optional-key :b is a long instead of a String, as specified in the schema. We can see the failures propagating up through the schema checking. The returned value from the entire walk is an error because the input data does not match the schema.

Simple Coercion

As a second example of a custom walker, let's consider a simple walker that does some basic coercion. This is a deliberately simplified version of the coercion we released in schema.coerce, just enough to get the point across.

(defn simple-coercion [schema]
  (start-walker
   (fn [s]
     (let [walk (walker s)]
       (fn [x]
         (if (and (= s Keyword) (string? x))
           (walk (keyword x))
           (walk x)))))
   schema))

This simple coercion conditionally preprocesses the data, transforming inputs that are Strings into keywords, but only when the corresponding schema expects a keyword. Notice that this is an example of the kind of fancy preprocessing you can do using transformations: it transforms the data based on both the data and the schema it is trying to match.

Here is an example:

((simple-coercion {Keyword String}) {"name" "clojure" "designer" "Rich Hickey"})
> {:designer "Rich Hickey", :name "clojure"}

The schema specifies that it is expecting a map from keyword to String, but the data is a map from String to String. Ordinarily, the schema checker would return an error on this input. Nonetheless, the output of the simple coercion is not an error, but instead a map from keyword to String. The walk coerced the keys of the map into keywords because that is what the schema was expecting, but it left the values as strings.

The simple coercion has not lost the validation power of the checker. If we try coercing something that does not fit (and cannot be coerced to fit) the schema, we still get an error:

((simple-coercion {Keyword String}) {"name" 7 "designer" "Rich Hickey"})
> #schema.utils.ErrorContainer{:error {"name" (not (instance? java.lang.String 7))}}

If you are looking to dig deeper into coercion, we recommend taking a look at schema.coerce.
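As a starting point, the sketch below runs the same keyword example through schema.coerce instead of the hand-written walker. It assumes the Schema library is on the classpath and that coercer and json-coercion-matcher are available in schema.coerce in your version; the name parse-language is made up for this example.

```clojure
(require '[schema.core :as s]
         '[schema.coerce :as coerce])

;; parse-language is a hypothetical name chosen for this sketch.
;; coercer builds a function that coerces and validates in one pass;
;; json-coercion-matcher picks a coercion based on the schema at each position
;; (for example, string to keyword where the schema expects a keyword).
(def parse-language
  (coerce/coercer {s/Keyword String} coerce/json-coercion-matcher))

(parse-language {"name" "clojure" "designer" "Rich Hickey"})
```

The difference from simple-coercion is that the rule for which values to coerce lives in a matcher function passed in from outside, so the same walk can be reused with different coercion rules.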

Capture Groups

Schema serves a purpose similar to that of regular expressions, in that both are declarative methods of pattern validation. Regular expressions match patterns in strings, whereas Schema matches patterns in Clojure data structures. Regular expressions also let you capture parts of the strings they match for use outside of the matching code. Up until this point, Schema did not support capture behavior, but with transformations, capture groups in Schema are now possible.
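For comparison, this is what capturing looks like with a plain Clojure regular expression. The date pattern is just an illustrative example and has nothing to do with Schema itself.

```clojure
;; re-matches returns the whole match followed by each captured group,
;; or nil when the string does not match the pattern.
(re-matches #"(\d{4})-(\d{2})-(\d{2})" "2014-01-08")
;; => ["2014-01-08" "2014" "01" "08"]

(re-matches #"(\d{4})-(\d{2})-(\d{2})" "not a date")
;; => nil
```

The goal in this section is the analogous behavior for data structures: validate the whole input against a pattern and pull out the pieces that line up with designated parts of that pattern.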

Here is a function that takes a schema and some data, and returns a vector of the parts of the input corresponding to Any subschemas:

(defn capture [schema data]
  (let [captured (atom [])]
    (or
     (utils/error-val ;; check for errors
      ((start-walker
        (fn [s]
          (let [walk (walker s)]
            (fn [x]
              (if (= Any s)
                ;; save result
                (do (swap! captured conj x) x)
                ;; otherwise, continue processing the rest of the input
                (walk x)))))
        schema)
       data))
     ;; when there are no errors, return the captured matches
     @captured)))

The function checks the input data against the given schema and returns a vector of the parts of the data that match the Any schema.

For example, here we make a schema that matches sequences of books, each represented by its title and year of publication.

(capture [{:title Any :year long}] [{:title "Moby Dick" :year 1851} {:title "Crime and Punishment" :year 1866}])
> ["Moby Dick" "Crime and Punishment"]

Since we use the Any schema to match the title of the book, the capture function saves all of the book titles into the captured vector as it processes the sequence of books. Before returning the vector of captured results, it checks whether there were any errors while matching the schema. If there is an error, the error is returned instead of the vector of captured matches:

(capture [{:title Any :year long}] [{:title "Moby Dick"}])
> [{:year missing-required-key}]