Nuqleon.Linq.Expressions.Bonsai.Serialization
Provides serialization of Bonsai expressions in JSON and binary form.
Design
The design of Bonsai serialization is centered around a few principles:
- Separate the representation of the tree structure from auxiliary information such as types, members, etc.
- Use LISP-style S-expressions to denote nodes in the tree.
- Make typing optional to enable interoperability with different languages.
An example of a Bonsai tree serialized in JSON is shown below:
{
  "Context":
  {
    "Types":
    [
      ["::", "System.Int32"]
    ]
  },
  "Expression":
  [
    "+",
    [":", 42, 0],
    ["$", "x", 0]
  ]
}
The Context provides information about assemblies, types, and members. For example, Types contains an array of types, which are themselves represented as arrays. The first element of each such array is a discriminator. In the example above:
["::", "System.Int32"]
uses :: to denote a simple type, which just has a name and an (optional) reference to an assembly in the Assembly table. While the names of types are typically inherited from CLR constructs, further rewrite steps can normalize these in a manner similar to the expression tree normalization carried out in Nuqleon (e.g. a method call to Where becomes an invocation expression of an unbound function variable named rx://filter).
Note: Various deployments of Nuqleon have been augmented with further Bonsai expression normalization steps based on the interoperability targets involved. For example, when shipping expressions down from the cloud service to devices (e.g. phones running signal processing for GPS sensors, etc.), there's an interoperability boundary between .NET and C++. In such a setting, Bonsai rewrites are used to erase assembly references such as mscorlib and type references such as System.Int32 in favor of an agreed-upon set of "virtual" assemblies and types. Similarly, any remaining references to members on types are erased by tree rewriting steps that refer to unbound functions (e.g. a Substring method call can be turned into an invocation of an unbound function called std://string/substring).
The convention of using arrays with a discriminator as the first element is carried forward in the expression's serialization format as well. In the example above, the + discriminator is used to denote an Add operation, while the : discriminator is used to denote a Constant node. Zooming in on such a constant:
[":", 42, 0]
there are two elements following the discriminator. The first one contains the value, here serialized as a JSON number. The second one is optional and, if specified, contains a reference to an entry in the type table (here referencing System.Int32). Various leaf nodes such as Default, Constant, Parameter, etc. support such optional type annotations. In the example above, $ represents a Parameter node which also has an optional type reference (besides a name).
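Because the format is meant to be consumed outside .NET as well, the dispatch-on-discriminator idea can be sketched from the side of a non-.NET consumer. The Python below is only an illustration, not part of Nuqleon: the function names are made up for this example, and it handles just the three node discriminators (+, :, $) and the one type kind (::) that appear in the example above.

```python
import json

# The serialized example from this section, as a consumer would receive it.
DOCUMENT = json.loads("""
{
  "Context": { "Types": [ ["::", "System.Int32"] ] },
  "Expression": ["+", [":", 42, 0], ["$", "x", 0]]
}
""")


def type_name(types, index):
    """Resolve a type-table index; only the simple type kind '::' is handled."""
    kind, name = types[index][0], types[index][1]
    if kind != "::":
        raise NotImplementedError(f"type kind {kind!r} is not handled in this sketch")
    return name


def render(node, types):
    """Render a node as text by dispatching on the discriminator in slot 0."""
    tag = node[0]
    if tag == "+":  # Add: two operand nodes follow the discriminator
        return f"Add({render(node[1], types)}, {render(node[2], types)})"
    if tag in (":", "$"):  # Constant (value) or Parameter (name)
        kind = "Constant" if tag == ":" else "Parameter"
        label = json.dumps(node[1]) if tag == ":" else node[1]
        if len(node) > 2:  # the type reference is optional
            return f"{kind}({label}: {type_name(types, node[2])})"
        return f"{kind}({label})"
    raise NotImplementedError(f"discriminator {tag!r} is not handled in this sketch")


def render_document(document):
    return render(document["Expression"], document["Context"]["Types"])


print(render_document(DOCUMENT))
# Add(Constant(42: System.Int32), Parameter(x: System.Int32))
```

A node without a type reference, such as [":", 42], renders as Constant(42), which mirrors the optional typing described above.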
The full list of discriminators can be found in a file called Discriminators.cs. This includes discriminators for expression nodes, type kinds, and member kinds. A couple of examples:
- [] is used as a discriminator for array types, but also for index expressions.
  - ["[]", 0] is a single-dimensional array with element type 0, which is an index into the type table.
  - ["[]", obj, [arg0, arg1]] is an index expression with obj representing the object being indexed (e.g. an array), and the arg* operands representing the arguments of the indexer.
- <> is used as the discriminator for closed generic types.
  - ["<>", 7, [0, 1]] closes an open generic type definition in slot 7 with generic arguments in slots 0 and 1.
- Etc.
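Since type entries refer to each other by slot number, resolving a type means following indexes through the type table. The Python sketch below illustrates that for the ::, [] and <> type kinds only. It is not the Nuqleon implementation: the table contents are invented for the example, and because the encoding of open generic type definitions is not shown here, a simple-type entry stands in for one.

```python
# Hypothetical type table, invented for this example.
TYPES = [
    ["::", "System.Int32"],   # slot 0: simple type
    ["::", "System.String"],  # slot 1: simple type
    ["::", "Dictionary"],     # slot 2: stand-in for an open generic definition (assumption)
    ["<>", 2, [1, 0]],        # slot 3: closes slot 2 over slots 1 and 0
    ["[]", 3],                # slot 4: single-dimensional array of slot 3
]


def describe_type(types, index):
    """Turn a type-table slot into a readable name by following slot references."""
    entry = types[index]
    kind = entry[0]
    if kind == "::":  # simple type: just a name
        return entry[1]
    if kind == "[]":  # array type: entry[1] is the element type slot
        return describe_type(types, entry[1]) + "[]"
    if kind == "<>":  # closed generic: definition slot plus argument slots
        definition = describe_type(types, entry[1])
        arguments = ", ".join(describe_type(types, slot) for slot in entry[2])
        return f"{definition}<{arguments}>"
    raise NotImplementedError(f"type kind {kind!r} is not handled in this sketch")


print(describe_type(TYPES, 4))
# Dictionary<System.String, System.Int32>[]
```

The sketch assumes the table is free of cycles and does no validation of slot numbers.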
Note: While the Bonsai specification allows for the omission of various type references (to have "optional static typing"), the implementation shared in the Nuqleon OSS codebase does not support this in all places right now. A separate Bonsai deserialization implementation with support for dynamic typing in .NET has been built, but it relies on an alternative implementation of the Dynamic Language Runtime (with different types of binders and tighter control over call site caches, enabling higher density in services running millions of compiled expression trees) which is currently not OSS. However, an implementation using the standard DLR is possible, and performance can likely be made to match fairly closely by playing more expression rewrite tricks where call sites are hoisted up and reused across various expressions. As an alternative, more aggressive type inference could be used in a pre-processing step (prior to deserialization) in an attempt to reconstruct typing earlier rather than relying on dynamic binding at runtime. Finally, note that type inference can also take place prior to serialization, e.g. if a JavaScript client loads the TypeScript compiler as a library to try to infer types (or requires type inference to be complete by running in a strict mode, disallowing any).
Serialization and deserialization
Serialization and deserialization of ExpressionSlim objects is carried out via an ExpressionSlimBonsaiSerializer instance. The two core methods are:
public Json.Expression Serialize(ExpressionSlim expression);
public ExpressionSlim Deserialize(Json.Expression expression);
Note the use of Nuqleon.Json.Expression as the object used to represent the JSON document. Having an object representation for these allows for pre- and post-processing steps using visitors for JSON nodes as well. The final conversion from and to strings is carried out in Nuqleon.Json or using interoperability helpers with other JSON frameworks.
In order to construct an ExpressionSlimBonsaiSerializer, two more pieces are needed:
- a lift factory for Object to ObjectSlim conversion, and
- a reduce factory for ObjectSlim to Object conversion.
These are passed to the constructor and provide a place to plug in serializers for values. For example, the Nuqleon.DataModel serializer is plugged in here at the Reactive layer of the stack. This supports decoupling of expression tree and value serialization.
Note: Some deployments of Nuqleon use JSON for DataModel serialization, while others use binary (and store it in a Bonsai tree as a base64 encoded string). Yet other deployments of Nuqleon have used totally different data models, for example based on Entity Framework's EDM.
The constructor is shown below:
public ExpressionSlimBonsaiSerializer(
    Func<Type, Func<object, object>> liftFactory,
    Func<Type, Func<object, object>> reduceFactory,
    Version version)
Use the Versioning.Default value for the version of Bonsai to use (currently 0.9). The lift and reduce factories are higher-order functions that deal with the ConstantExpression.Value or ConstantExpressionSlim.Value to convert to/from the .NET object. If only primitive types are used, one can create an instance of ObjectSerializer and use its lift/reduce support:
var obj = new ObjectSerializer();
var ser = new ExpressionSlimBonsaiSerializer(obj.GetJsonSerializer, obj.GetJsonDeserializer, Versioning.Default);
More advanced scenarios require parameterization with a custom lift/reduce factory to handle more complex values. Examples of this can be found in higher layers of Nuqleon where the Bonsai serializer and Data Model serializer are integrated to support any direction of nesting (e.g. an entity with a property that's an expression tree, which in turn contains a constant node holding an entity value).
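The shape of such a factory pair (a function that takes a type and returns a converter for values of that type) can be sketched independently of .NET. The Python below is a language-neutral illustration rather than the Nuqleon API: the function names are hypothetical, Python types stand in for .NET types, and the base64 handling of binary values is an assumption modeled on the binary-in-a-string approach mentioned in the note above.

```python
import base64
from typing import Any, Callable

Converter = Callable[[Any], Any]

# Types passed through unchanged because JSON can represent them directly.
PRIMITIVES = (bool, int, float, str)


def lift_factory(value_type: type) -> Converter:
    """Given a type, return a function that turns a value into its serialized form."""
    if value_type is bytes:  # binary payloads travel as base64 text
        return lambda value: base64.b64encode(value).decode("ascii")
    if value_type in PRIMITIVES:
        return lambda value: value
    raise NotImplementedError(f"no lift for {value_type.__name__} in this sketch")


def reduce_factory(value_type: type) -> Converter:
    """Given a type, return a function that turns a serialized form back into a value."""
    if value_type is bytes:
        return lambda payload: base64.b64decode(payload)
    if value_type in PRIMITIVES:
        return lambda payload: payload
    raise NotImplementedError(f"no reduce for {value_type.__name__} in this sketch")


def round_trip(value: Any) -> Any:
    """Lift a value and reduce it again, as a serializer would around a constant node."""
    value_type = type(value)
    return reduce_factory(value_type)(lift_factory(value_type)(value))


print(lift_factory(bytes)(b"hi"))  # aGk=
print(round_trip(b"hi"))           # b'hi'
```

Resolving the converter once per type and then reusing it for every value of that type is what makes the two-level function signature worthwhile.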