A while back David Nolen had a thoughtful post about using spec as a tool for thought, which included an exploration of creating a spec for clojure.core/let.
The latest Clojure alpha actually includes a spec for let that covers destructuring, and I thought it might be interesting to walk through the details of how it is implemented.
I'll pick up approximately where David left off. A typical let looks like this:
(let [a 1
      b 2]
  (+ a b))
We can define an initial spec for clojure.core/let by splitting it into bindings and body:
(require '[clojure.spec :as s]
         '[clojure.spec.gen :as gen])
(s/fdef let
  :args (s/cat :bindings ::bindings
               :body (s/* any?)))
We then need to more fully define bindings as a vector of individual bindings. Each binding is made of a binding-form and an init-expr that computes the value of the local binding:
(s/def ::bindings (s/and vector? (s/* ::binding)))
(s/def ::binding (s/cat :binding ::binding-form
                        :init-expr any?))
The expressions can be anything, so we leave those as any?. The binding-form is where things get interesting. Let's first allow binding-form to be just simple (no namespace) symbols. That's enough to create something to work with.
;; WORK IN PROGRESS
(s/def ::binding-form simple-symbol?)
Now that we have a full spec, we can actually try a few things. Let's try an example of conforming our bindings.
(s/conform ::bindings '[a 1, b 2])
;;=> [{:binding a, :init-expr 1} {:binding b, :init-expr 2}]
Looks good! We get back a vector of binding maps broken into the binding and the initial expression.
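To see what this first cut accepts and rejects, here is a minimal sketch. It assumes the clojure.spec.alpha namespace (the name that clojure.spec took in later releases), and valid-bindings? is a hypothetical helper added only for this illustration.

```clojure
(require '[clojure.spec.alpha :as s])

;; Same shape as the specs above, using the work-in-progress ::binding-form.
(s/def ::binding-form simple-symbol?)
(s/def ::binding (s/cat :binding ::binding-form
                        :init-expr any?))
(s/def ::bindings (s/and vector? (s/* ::binding)))

;; Hypothetical helper, not part of the original post.
(defn valid-bindings? [form]
  (s/valid? ::bindings form))

(valid-bindings? '[a 1, b 2])  ; pairs of symbol + expression
(valid-bindings? '[a 1, b])    ; rejected: b has no init-expr
(valid-bindings? '(a 1))       ; rejected: a list, not a vector
(valid-bindings? '[a/b 1])     ; rejected: a/b is a qualified symbol
```

Only the first call returns true. Each of the others fails on a different part of the spec: the s/cat pairing, the vector? check, and the simple-symbol? predicate.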
Now we need to expand our spec to include sequential destructuring and map destructuring.
Sequential destructuring binds a series of symbols to the corresponding elements in the bound value. Optionally, the symbols may be followed by a variadic argument (using &) and/or an alias for the overall sequence (using :as).
Some examples:
;; Sequential destructuring examples:
[a b]
[a b :as s]
[a b & r]
To describe a sequential spec we use the spec regex operators:
;; WORK IN PROGRESS
(s/def ::seq-binding-form
  (s/cat :elems (s/* simple-symbol?)
         :rest (s/? (s/cat :amp #{'&} :form simple-symbol?))
         :as (s/? (s/cat :as #{:as} :sym simple-symbol?))))
Let's try it out:
(s/conform ::seq-binding-form '[a b])
;;=> {:elems [a b]}
(s/conform ::seq-binding-form '[a b :as s])
;;=> {:elems [a b], :as {:as :as, :sym s}}
(s/conform ::seq-binding-form '[a b & r])
;;=> {:elems [a b & r]}
Hang on a sec, what happened in the last example? The elems snagged & r as well because & is a symbol. We need to redefine our notion of what a binding symbol is to exclude the symbol &, which is special in the language of destructuring:
;; WORK IN PROGRESS
(s/def ::local-name (s/and simple-symbol? #(not= '& %)))
(s/def ::seq-binding-form
  (s/cat :elems (s/* ::local-name)
         :rest (s/? (s/cat :amp #{'&} :form ::local-name))
         :as (s/? (s/cat :as #{:as} :sym ::local-name))))
(s/conform ::seq-binding-form '[a b & r :as s])
;;=> {:elems [a b], :rest {:amp &, :form r}, :as {:as :as, :sym s}}
That's better. But it turns out I've not really been spec'ing the full truth of sequential destructuring. Each of the ::elems can itself be sequentially destructured, and even the rest arg can be destructured.
We need to back up to the beginning and reconsider the definition of ::binding-form to add the possibility of either a ::local-name (our improved simple symbol) or a sequential destructuring form. (We'll add map later.)
(s/def ::local-name (s/and simple-symbol? #(not= '& %)))
;; WORK IN PROGRESS (still missing ::map-binding-form)
(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form))
(s/def ::seq-binding-form
  (s/cat :elems (s/* ::binding-form)
         :rest (s/? (s/cat :amp #{'&} :form ::binding-form))
         :as (s/? (s/cat :as #{:as} :sym ::local-name))))
Now ::binding-form is a recursive specification. Binding-forms are either symbols or sequential forms, which may themselves contain binding-forms. The registry provides naming indirection which makes this possible.
Let's try our prior example again and see how things have changed.
(s/conform ::seq-binding-form '[a b & r :as s])
;;=> {:elems [[:sym a] [:sym b]], :rest {:amp &, :form [:sym r]}, :as {:as :as, :sym s}}
Our conformed result is a bit more verbose as it now indicates for each binding form what kind of binding it is. While this is more verbose to read, it's also easier to process. Here's how a recursive binding form example looks:
(s/conform ::seq-binding-form '[a [b & c] [d :as e]])
;;=> {:elems [[:sym a]
;;            [:seq {:elems [[:sym b]], :rest {:amp &, :form [:sym c]}}]
;;            [:seq {:elems [[:sym d]], :as {:as :as, :sym e}}]]}
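To illustrate the "easier to process" point, here is a small sketch that walks the tagged, conformed structure and collects every local name it binds. It assumes the clojure.spec.alpha namespace (the later name for clojure.spec), and bound-locals is a hypothetical function invented for this example. It handles only the :sym and :seq cases defined so far.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::local-name (s/and simple-symbol? #(not= '& %)))
(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form))
(s/def ::seq-binding-form
  (s/cat :elems (s/* ::binding-form)
         :rest (s/? (s/cat :amp #{'&} :form ::binding-form))
         :as (s/? (s/cat :as #{:as} :sym ::local-name))))

;; Hypothetical helper: takes a conformed ::binding-form, i.e. a
;; [tag value] pair, and returns the locals it binds, left to right.
(defn bound-locals [[kind value]]
  (case kind
    :sym [value]
    :seq (let [{elems :elems, rest-part :rest, as-part :as} value]
           (vec (concat (mapcat bound-locals elems)
                        (when rest-part (bound-locals (:form rest-part)))
                        (when as-part [(:sym as-part)]))))))

(bound-locals (s/conform ::binding-form '[a [b & c] [d :as e]]))
;;=> [a b c d e]
```

Because s/or tags each alternative, the walker can dispatch on the tag with a plain case and never needs to inspect the raw form.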
Finally we are ready to look at map destructuring.
Map destructuring has a number of entry forms that can be used interchangeably:
<binding-form> key - for binding either a local name with (get m key) or recursively destructuring
:keys [key ...] - for binding locals with the same name as each key to the value retrieved from the map using the key as a keyword. In addition the specified keys can be either symbols or keywords and simple or qualified. In all cases, the local that gets bound is a short symbol and the value is looked up as a keyword.
:<ns>/keys [key ...] - same as :keys, but where ns is used as the namespace for every key
:syms [sym ...] - for binding locals with the same name as each sym to the value retrieved from the map using sym, which may be either simple or qualified.
:<ns>/syms [sym ...] - same as :syms, but where ns is used as the namespace for every symbol.
:strs [str ...] - for binding locals with the same name as each str to the value retrieved from the map using str as a string. Each str is written as a symbol, which must be simple.
:or {sym expr} - for providing default values for any missing local that would have been bound based on other entries. The keys should always be simple symbols (the same as the bound locals) and the exprs are any expression.
:as sym - binds the entire map to a local named sym.
There is a lot of functionality packed into map binding forms and in fact there are really three different map specs combined into this single map. We call this a "hybrid" map spec.
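Before breaking the spec down, a quick refresher on what the entry forms listed above do at runtime may help. This sketch uses plain let with no spec involved. The sample map and the entry-form-demo function are made up for illustration, and the :person/keys form assumes Clojure 1.9 or later.

```clojure
;; Hypothetical sample data covering keyword, symbol, and string keys.
(def sample
  {:name      "Ada"
   :city      "London"
   :point     [1 2]
   :person/id 7
   'role      :admin
   "lang"     "clj"})

;; Hypothetical demo: one let per entry form, results collected in a vector.
(defn entry-form-demo [m]
  [(let [{n :name} m] n)                       ; <binding-form> key
   (let [{[x y] :point} m] [x y])              ; nested sequential form
   (let [{:keys [city]} m] city)               ; :keys, simple
   (let [{:keys [person/id]} m] id)            ; :keys, qualified
   (let [{:person/keys [id]} m] id)            ; :<ns>/keys
   (let [{:syms [role]} m] role)               ; :syms
   (let [{:strs [lang]} m] lang)               ; :strs
   (let [{:keys [missing] :or {missing :default}} m] missing) ; :or
   (let [{:as whole} m] (= whole m))])         ; :as

(entry-form-demo sample)
;;=> ["Ada" [1 2] "London" 7 7 :admin "clj" :default true]
```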
The first part describes just the fixed well-known attributes in a typical s/keys spec:
(s/def ::keys (s/coll-of ident? :kind vector?))
(s/def ::syms (s/coll-of symbol? :kind vector?))
(s/def ::strs (s/coll-of simple-symbol? :kind vector?))
(s/def ::or (s/map-of simple-symbol? any?))
(s/def ::as ::local-name)
(s/def ::map-special-binding
  (s/keys :opt-un [::as ::or ::keys ::syms ::strs]))
The second part describes the basic binding form specs (examples like {n :name}), although the left hand side here can further destructure.
(s/def ::map-binding (s/tuple ::binding-form any?))
And finally we need to handle the new functionality for namespaced key or symbol sets (like :<ns>/keys or :<ns>/syms) which we'll describe here as a map entry tuple:
(s/def ::ns-keys
  (s/tuple
    (s/and qualified-keyword? #(-> % name #{"keys" "syms"}))
    (s/coll-of simple-symbol? :kind vector?)))
Then we can put all of these together into the ::map-binding-form by combining them as an s/merge of the well-known attributes and a description of the possible tuple forms:
;; collection of tuple forms
(s/def ::map-bindings
  (s/every (s/or :mb ::map-binding
                 :nsk ::ns-keys
                 :msb (s/tuple #{:as :or :keys :syms :strs} any?))
           :into {}))
(s/def ::map-binding-form (s/merge ::map-bindings ::map-special-binding))
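To get a feel for how the two halves of the merge cooperate, here is a self-contained sketch that loads the specs in one place and checks a few map forms. It assumes the clojure.spec.alpha namespace (the later name for clojure.spec). The helper map-form? is hypothetical, and the :person namespace in the last example is arbitrary.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::local-name (s/and simple-symbol? #(not= '& %)))
(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form
        :map ::map-binding-form))
(s/def ::seq-binding-form
  (s/cat :elems (s/* ::binding-form)
         :rest (s/? (s/cat :amp #{'&} :form ::binding-form))
         :as (s/? (s/cat :as #{:as} :sym ::local-name))))

;; Part 1: the fixed, well-known attributes.
(s/def ::keys (s/coll-of ident? :kind vector?))
(s/def ::syms (s/coll-of symbol? :kind vector?))
(s/def ::strs (s/coll-of simple-symbol? :kind vector?))
(s/def ::or (s/map-of simple-symbol? any?))
(s/def ::as ::local-name)
(s/def ::map-special-binding
  (s/keys :opt-un [::as ::or ::keys ::syms ::strs]))

;; Parts 2 and 3: entries viewed as tuples.
(s/def ::map-binding (s/tuple ::binding-form any?))
(s/def ::ns-keys
  (s/tuple
    (s/and qualified-keyword? #(-> % name #{"keys" "syms"}))
    (s/coll-of simple-symbol? :kind vector?)))
(s/def ::map-bindings
  (s/every (s/or :mb ::map-binding
                 :nsk ::ns-keys
                 :msb (s/tuple #{:as :or :keys :syms :strs} any?))
           :into {}))
(s/def ::map-binding-form (s/merge ::map-bindings ::map-special-binding))

;; Hypothetical helper, not part of the original post.
(defn map-form? [form]
  (s/valid? ::map-binding-form form))

(map-form? '{:keys [a b] :or {a 1} :as m}) ; accepted
(map-form? '{n :name, [x y] :point})       ; accepted: left sides destructure
(map-form? '{:person/keys [id]})           ; accepted via ::ns-keys
(map-form? '{:keys a})                     ; rejected: :keys needs a vector
(map-form? '{:as [a b]})                   ; rejected: :as needs a local name
```

The last two are rejected by the s/keys half: the tuple view only checks that :keys and :as are recognized entry names, while ::map-special-binding checks their values.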
And finally we need to go back and define our parent spec to include map bindings:
(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form
        :map ::map-binding-form))
And that's it! Here's an example binding form that shows several features of destructuring:
(s/conform ::binding-form
  '[[x1 y1 :as p1]
    [x2 y2 :as p2]
    {:keys [color weight]
     :or {color :black weight :bold}
     :as opts}])
;;=> [:seq {:elems [[:seq {:elems [[:sym x1] [:sym y1]], :as {:as :as, :sym p1}}]
;;                  [:seq {:elems [[:sym x2] [:sym y2]], :as {:as :as, :sym p2}}]
;;                  [:map {:keys [color weight]
;;                         :or {color :black, weight :bold}
;;                         :as opts}]]}]
Now that we have a spec for destructuring, we can reuse it anywhere destructuring is allowed - in fn, defn, for, etc. We could even leverage it to implement destructuring itself. Rather than recursively parsing the binding form, we could simply conform it to receive a more regular structure described in terms of the parts we've defined in the spec.
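As a rough sketch of that last idea, the code below conforms a binding form and turns the result into plain let bindings. It is an illustration only, not Clojure's actual implementation. It assumes the clojure.spec.alpha namespace, covers just the symbol and sequential cases (no maps), and the names expand-binding and let1 are hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

;; Symbol and sequential forms only; map forms are left out of this sketch.
(s/def ::local-name (s/and simple-symbol? #(not= '& %)))
(s/def ::binding-form
  (s/or :sym ::local-name
        :seq ::seq-binding-form))
(s/def ::seq-binding-form
  (s/cat :elems (s/* ::binding-form)
         :rest (s/? (s/cat :amp #{'&} :form ::binding-form))
         :as (s/? (s/cat :as #{:as} :sym ::local-name))))

;; Hypothetical helper: returns a flat vector of symbol/expression pairs
;; that binds the conformed binding form to the value of expression v.
(defn expand-binding [[kind form] v]
  (case kind
    :sym [form v]
    :seq (let [g (gensym "seq__")
               {elems :elems, rest-part :rest, as-part :as} form]
           (vec (concat
                  [g v]
                  (mapcat (fn [i elem]
                            (expand-binding elem `(nth ~g ~i nil)))
                          (range) elems)
                  (when rest-part
                    (expand-binding (:form rest-part)
                                    `(nthnext ~g ~(count elems))))
                  (when as-part [(:sym as-part) g]))))))

;; Hypothetical single-binding let built on conform instead of hand parsing.
(defmacro let1 [binding-form init & body]
  (let [conformed (s/conform ::binding-form binding-form)]
    (when (s/invalid? conformed)
      (throw (ex-info "Unsupported binding form" {:form binding-form})))
    `(let ~(expand-binding conformed init) ~@body)))

(let1 [a [b & c] :as all] [1 [2 3 4]]
  [a b c all])
;;=> [1 2 (3 4) [1 [2 3 4]]]
```

All of the parsing work happens in s/conform. expand-binding only dispatches on the :sym and :seq tags, which is the "more regular structure" the paragraph above refers to.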