Extending Transit

The transit spec defines several ground types that are supported by JSON and MessagePack, on which transit piggy-backs to convey data from one process to another. These include scalar types like strings and numbers, and composite types like maps and arrays.

The spec also defines several extension types, which are built from ground types, each represented by a tag and a value. The tag is a string, which is a scalar ground type. The value can be a scalar or composite type. When the reader encounters a composite, e.g. an array, it unpacks its values and reads each one, recursing until it reaches scalar ground types.

Transit uses this extension system to support scalar types like URI, and composite types like set. A URI is represented with the tag "r" and a string value of the URI. In the json wire format, for example, the URI http://example.com is written as the string "~rhttp://example.com".

The "~" tells the transit reader that the following character, "r", is the tag, and that the rest is the value from which to build an instance of the type represented by that tag.
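To make the scalar convention concrete, here is a small JavaScript sketch of what a reader does with such a string. It is a toy model, not the transit-js API. readScalar and scalarHandlers are hypothetical names, only the "r" tag is handled, and it leaves out details a real reader deals with, such as escaping strings that themselves start with "~".

```javascript
// Toy model of transit's scalar extension convention (NOT the transit-js API).
// A string shaped like "~" + <one-character tag> + <value> is decoded by the
// handler registered for that tag; anything else is returned unchanged.
const scalarHandlers = new Map([
  ["r", (value) => new URL(value)], // "r" is the URI tag
]);

function readScalar(str) {
  if (typeof str === "string" && str.length >= 2 && str[0] === "~") {
    const tag = str[1]; // the character right after "~"
    const handler = scalarHandlers.get(tag);
    if (handler) {
      return handler(str.slice(2)); // the rest of the string is the value
    }
  }
  return str; // plain string, or a tag this sketch doesn't know
}

const uri = readScalar("~rhttp://example.com");
console.log(uri instanceof URL, uri.hostname); // true example.com
```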

A set is represented with the tag "set" and an array value. In the json-verbose format, {"~#set":[1,2,3]} is a set containing 1, 2 and 3.

In this case, the transit reader sees a map with a single entry whose key starts with "~#" and treats the rest of that string as the tag, "set", and builds a set from the array value.
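The composite case can be sketched in JavaScript as well. This is again a toy model rather than the transit-js API (read and compositeHandlers are hypothetical names): it walks parsed json-verbose data recursively and turns a single-entry map keyed by "~#set" into a JavaScript Set.

```javascript
// Toy model of reading composite extensions in the json-verbose format
// (NOT the transit-js API). Values are read recursively; a single-entry map
// whose key starts with "~#" is treated as a tag plus a representation.
const compositeHandlers = new Map([
  ["set", (rep) => new Set(rep)], // "set" tag, array representation
]);

function read(node) {
  if (Array.isArray(node)) {
    return node.map(read); // recurse into array elements
  }
  if (node !== null && typeof node === "object") {
    const keys = Object.keys(node);
    if (keys.length === 1 && keys[0].startsWith("~#")) {
      const tag = keys[0].slice(2); // e.g. "set"
      const handler = compositeHandlers.get(tag);
      if (handler) {
        return handler(read(node[keys[0]])); // read the rep, then build the value
      }
      // Unknown tags fall through and stay a plain map in this sketch;
      // a real transit reader builds a TaggedValue instead.
    }
    const out = {};
    for (const [k, v] of Object.entries(node)) {
      out[k] = read(v); // recurse into map values
    }
    return out;
  }
  return node; // scalar ground values
}

const value = read(JSON.parse('{"~#set":[1,2,3]}'));
console.log(value instanceof Set, [...value]); // true [ 1, 2, 3 ]
```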

We use the same extension system to support custom types that are not part of the transit spec. These can be domain-specific types like Person or Account, or they can be generic datatypes like trees, sorted sets, tuples, etc. They don't need to be supported by all of the languages in our system. In fact, they don't even need to be recognized by the transit libs running in all of the processes in our system!

When a transit reader encounters a tag it doesn't recognize, it builds a TaggedValue object, which the writer knows how to write back out to the wire in the same form in which it arrived. This means the data can travel safely, with no errors and no degradation, through intermediate nodes that don't care about this type or the instance it represents. The only nodes that need to support custom extension types are the ones that actually write and read them.
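Here is a toy JavaScript model of that pass-through behavior. It is not the transit-js implementation; TaggedValue, read, write, and the "point" tag are hypothetical names chosen for the sketch. An intermediate node with no handler for "point" reads the value, keeps its tag and representation, and writes back the same JSON text it received.

```javascript
// Toy model (NOT the transit-js implementation) of how unrecognized tags
// survive a read/write round trip through an intermediate node.
class TaggedValue {
  constructor(tag, rep) {
    this.tag = tag; // e.g. "point"
    this.rep = rep; // the representation, already read
  }
}

// This node only knows about "set"; any other tag becomes a TaggedValue.
const readHandlers = new Map([["set", (rep) => new Set(rep)]]);

function read(node) {
  if (Array.isArray(node)) return node.map(read);
  if (node !== null && typeof node === "object") {
    const keys = Object.keys(node);
    if (keys.length === 1 && keys[0].startsWith("~#")) {
      const tag = keys[0].slice(2);
      const rep = read(node[keys[0]]);
      const handler = readHandlers.get(tag);
      // No error for an unknown tag: keep the tag and representation.
      return handler ? handler(rep) : new TaggedValue(tag, rep);
    }
    return Object.fromEntries(
      Object.entries(node).map(([k, v]) => [k, read(v)]));
  }
  return node;
}

function write(value) {
  if (value instanceof TaggedValue) {
    // Write it back out in the same form in which it arrived.
    return { ["~#" + value.tag]: write(value.rep) };
  }
  if (value instanceof Set) return { "~#set": [...value].map(write) };
  if (Array.isArray(value)) return value.map(write);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, write(v)]));
  }
  return value;
}

const incoming = '{"~#point":[1,2]}';
const outgoing = JSON.stringify(write(read(JSON.parse(incoming))));
console.log(outgoing === incoming); // true
```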

To define a custom extension type, we just need to pick a tag and a representation that is already supported. We'll define a sorted-set, since that is not part of the transit spec, using the tag "sorted-set" with an array as the representation. In transit's verbose JSON format, {"~#sorted-set":[1,2,3]} is a sorted set containing 1, 2 and 3.

To get transit to write, e.g., a Clojure sorted-set as a transit sorted-set, we need a custom write handler that writes the "sorted-set" tag and the array representation. transit-clj provides a write-handler function that builds this handler for us.

The write-handler function takes the tag and a function that can transform the runtime instance into a type that is already supported. In this example, we build a vector from the set, which transit then writes out as an array.
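JavaScript has no built-in sorted set, so the sketch below defines a minimal hypothetical SortedSet class to illustrate the same idea: a write handler is just a tag paired with a function that turns the runtime value into a representation transit already supports. writeHandler and encode are illustrative names, not part of transit-clj or transit-js.

```javascript
// Hypothetical minimal sorted set of numbers, used only for illustration.
class SortedSet {
  constructor(items = []) {
    this.items = [...new Set(items)].sort((a, b) => a - b); // dedupe, then sort
  }
  [Symbol.iterator]() {
    return this.items[Symbol.iterator]();
  }
}

// A write handler pairs a tag with a function from value to representation.
function writeHandler(tag, repFn) {
  return { tag, rep: repFn };
}

// Like building a vector from the set in Clojure: the rep is an array.
const sortedSetWriteHandler = writeHandler("sorted-set", (s) => [...s]);

// Apply a handler, producing the json-verbose shape {"~#tag": rep}.
function encode(handler, value) {
  return { ["~#" + handler.tag]: handler.rep(value) };
}

console.log(JSON.stringify(encode(sortedSetWriteHandler, new SortedSet([3, 1, 2]))));
// {"~#sorted-set":[1,2,3]}
```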

Once we have a write handler, we can pass it to the writer constructor function.

The writer function takes an output stream, a keyword identifying the wire format we want to use, and an optional map, which may include a map of custom write handlers bound to the key :handlers. The custom handler map binds types, as keys, to handlers. In this case we're binding PersistentTreeSet, Clojure's sorted-set implementation, to our custom sorted-set-write-handler.
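Here is a toy JavaScript model of a writer configured with a handlers map keyed by type. It is not the transit-js API, and makeWriter and SortedSet are hypothetical names. Constructors play the role that Java classes such as PersistentTreeSet play in transit-clj, lookup is by exact constructor only (a simplification), and the writer returns a JSON string rather than writing to an output stream.

```javascript
// Hypothetical minimal sorted set of numbers, for illustration only.
class SortedSet {
  constructor(items = []) {
    this.items = [...new Set(items)].sort((a, b) => a - b);
  }
  [Symbol.iterator]() {
    return this.items[Symbol.iterator]();
  }
}

// Toy writer (NOT the transit-js API). opts.handlers is a list of
// [constructor, handler] pairs, mirroring a map from types to handlers.
function makeWriter(opts = {}) {
  const handlers = new Map(opts.handlers ?? []);
  function encode(value) {
    // Exact-constructor lookup; null and undefined have no constructor.
    const handler = value != null ? handlers.get(value.constructor) : undefined;
    if (handler) {
      return { ["~#" + handler.tag]: encode(handler.rep(value)) };
    }
    if (Array.isArray(value)) return value.map(encode);
    return value; // other values pass through unchanged in this sketch
  }
  return { write: (value) => JSON.stringify(encode(value)) };
}

const writer = makeWriter({
  handlers: [[SortedSet, { tag: "sorted-set", rep: (s) => [...s] }]],
});
console.log(writer.write(new SortedSet([2, 3, 1]))); // {"~#sorted-set":[1,2,3]}
```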

Now we can write a sorted set to the output stream with the write function, passing it the writer (configured with our custom handler) and a sorted set.

At this point we have a write handler but no read handler, so the transit reader won't recognize the "sorted-set" tag and will return a TaggedValue object instead.

If the application that reads this sorted-set doesn't do anything with it, we're done. The reader throws no error when it doesn't recognize the tag, as would likely happen if we were constrained by a schema. If the application passes this data on to another process, the built-in TaggedValue write handler writes it back out exactly as it came in.

If the application does need to process the value as a sorted set, we need to add a sorted-set read handler. As you might expect, transit-clj has a read-handler function we can use to build one. We just pass it a function that converts the representation (the array) into the value we want (a sorted set), and bind the resulting handler to the "sorted-set" tag in the reader's :handlers map.
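Here is the reading side as a toy JavaScript model: a reader configured with a handlers map keyed by tag, where each read handler converts the array representation into the value we want. makeReader, SortedSet, and TaggedValue are hypothetical names for this sketch, not transit-clj or transit-js APIs. Without the handler, the same input comes back as a TaggedValue.

```javascript
// Hypothetical minimal sorted set of numbers, for illustration only.
class SortedSet {
  constructor(items = []) {
    this.items = [...new Set(items)].sort((a, b) => a - b);
  }
}

// Stand-in for the value a reader builds when it doesn't know a tag.
class TaggedValue {
  constructor(tag, rep) {
    this.tag = tag;
    this.rep = rep;
  }
}

// Toy reader (NOT the transit-js API); opts.handlers maps tags to functions.
function makeReader(opts = {}) {
  const handlers = new Map(Object.entries(opts.handlers ?? {}));
  function decode(node) {
    if (Array.isArray(node)) return node.map(decode);
    if (node !== null && typeof node === "object") {
      const keys = Object.keys(node);
      if (keys.length === 1 && keys[0].startsWith("~#")) {
        const tag = keys[0].slice(2);
        const rep = decode(node[keys[0]]);
        const handler = handlers.get(tag);
        return handler ? handler(rep) : new TaggedValue(tag, rep);
      }
    }
    return node; // plain maps and scalars are left as-is in this sketch
  }
  return { read: (json) => decode(JSON.parse(json)) };
}

const wire = '{"~#sorted-set":[1,2,3]}';
console.log(makeReader().read(wire) instanceof TaggedValue); // true: no handler yet

const reader = makeReader({
  handlers: { "sorted-set": (rep) => new SortedSet(rep) },
});
console.log(reader.read(wire).items); // [ 1, 2, 3 ]
```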

Not all languages support a sorted-set type directly.

JavaScript, for example, has no built-in SortedSet type, so to handle sorted sets in JavaScript we have some decisions to make. If we only need to round-trip them without using them, there's nothing to do: transit-js's reader will construct a TaggedValue, and its writer knows how to write it back out.

If we need to display them but not round-trip them, a read handler that returns arrays would be just fine.

The rep is already an array, so we just return it as is!
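With transit-js, read handlers are supplied as a plain object under the handlers option of transit.reader, keyed by tag. To keep the example self-contained and runnable without installing transit-js, the sketch below uses a hypothetical makeReader toy in place of the library; the point is the handler itself, an identity function on the array.

```javascript
// Toy reader (NOT transit-js): decodes json-verbose "~#tag" maps using
// handlers keyed by tag; unknown tags are left as plain maps here.
function makeReader(handlers) {
  function decode(node) {
    if (Array.isArray(node)) return node.map(decode);
    if (node !== null && typeof node === "object") {
      const keys = Object.keys(node);
      if (keys.length === 1 && keys[0].startsWith("~#")) {
        const handler = handlers.get(keys[0].slice(2));
        if (handler) return handler(decode(node[keys[0]]));
      }
    }
    return node;
  }
  return { read: (json) => decode(JSON.parse(json)) };
}

// Display-only handling: the representation is already an array, so the
// read handler returns it unchanged.
const reader = makeReader(new Map([["sorted-set", (rep) => rep]]));
const shown = reader.read('{"~#sorted-set":[1,2,3]}');
console.log(Array.isArray(shown), shown.join(", ")); // true 1, 2, 3
```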

If we want to read them, use them as sorted sets, and write them back out, we're probably best served by choosing an existing sorted-set implementation (or writing our own) and adding read and write handlers for that type. It all depends on how we want to use them.
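To sketch the roll-our-own option, the toy JavaScript below defines a minimal hypothetical SortedSet (numbers only) and registers both a read handler and a write handler for it, so a value survives a full write/read round trip as a sorted set. It is not the transit-js API; in practice we would more likely pick an existing sorted-set library and register handlers for its type.

```javascript
// Hypothetical minimal immutable sorted set of numbers.
class SortedSet {
  constructor(items = []) {
    this.items = [...new Set(items)].sort((a, b) => a - b);
  }
  add(x) {
    return new SortedSet([...this.items, x]); // returns a new set
  }
  [Symbol.iterator]() {
    return this.items[Symbol.iterator]();
  }
}

const TAG = "sorted-set";
// Read handlers keyed by tag; write handlers keyed by constructor.
const readHandlers = new Map([[TAG, (rep) => new SortedSet(rep)]]);
const writeHandlers = new Map([[SortedSet, { tag: TAG, rep: (s) => [...s] }]]);

function encode(value) {
  const h = value != null ? writeHandlers.get(value.constructor) : undefined;
  if (h) return { ["~#" + h.tag]: encode(h.rep(value)) };
  if (Array.isArray(value)) return value.map(encode);
  return value;
}

function decode(node) {
  if (Array.isArray(node)) return node.map(decode);
  if (node !== null && typeof node === "object") {
    const [key, ...rest] = Object.keys(node);
    if (rest.length === 0 && key !== undefined && key.startsWith("~#")) {
      const h = readHandlers.get(key.slice(2));
      if (h) return h(decode(node[key]));
    }
  }
  return node; // unknown tags and plain values are left as-is in this sketch
}

const original = new SortedSet([5, 1, 3]).add(2);
const wire = JSON.stringify(encode(original));
const copy = decode(JSON.parse(wire));
console.log(wire); // {"~#sorted-set":[1,2,3,5]}
console.log(copy instanceof SortedSet, [...copy]); // true [ 1, 2, 3, 5 ]
```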

And that's really all there is to it. The hardest part is deciding how to represent unsupported types, and whether we need custom read and/or write handlers in each environment. The write and read handlers themselves couldn't be easier or simpler! Easy, in that they are easy to define, construct and reason about. Simple, because they are not coupled to anything beyond the tag, representation, and concrete implementation types in each language. Nor do they impose themselves on any process that doesn't care about them! This means that intermediate nodes can remain blissfully ignorant and they, and you, can sleep soundly at night knowing that your data is making it safely and intact from one interested node to another.
