Nuqleon.Linq.CompilerServices
Provides expression tree utilities such as visitors, analyzers, rewriters, evaluators, and more.
Visitors
System.Linq.Expressions
comes with a default ExpressionVisitor
that visits all nodes of an expression tree and invokes the Update
method on a node if any of its children changes (as tested by a reference equality check). This library provides additional types of visitors to enable rewrites of expressions to other types, to track scope information of variables, to visit other elements of the tree (such as reflection information), etc.
Generic visitors
ExpressionVisitor<TExpression>
is the base type for expression visitors that rewrite an expression of type Expression
to an object of type TExpression
. All of the Visit*
methods are abstract
and have to be implemented by derived types to perform the recursion and the conversion, e.g.:
protected abstract TExpression VisitBinary(BinaryExpressionAlias node);
Because this is quite tedious to do (both the recursion and the construction of results), a more derived type is provided:
public abstract class ExpressionVisitor<TExpression, TLambdaExpression, TParameterExpression, TNewExpression, TElementInit, TMemberBinding, TMemberAssignment, TMemberListBinding, TMemberMemberBinding, TCatchBlock, TSwitchCase, TLabelTarget> : ExpressionVisitor<TExpression>
where TLambdaExpression : TExpression
where TParameterExpression : TExpression
where TNewExpression : TExpression
where TMemberAssignment : TMemberBinding
where TMemberListBinding : TMemberBinding
where TMemberMemberBinding : TMemberBinding;
It's quite rich in generic type parameters, but they should be straightforward to understand. While this visitor takes care of all the recursion over the structure of the tree, it still requires the user to supply a means to construct objects of the result types. These are represented as abstract
methods that use the Make
prefix, e.g.:
protected abstract TExpression MakeBinary(BinaryExpressionAlias node, TExpression left, TLambdaExpression conversion, TExpression right);
An example of a generic visitor that performs a conversion from an expression tree to another type is a printer (returning a string
for each node) or a converter to another tree representation (e.g. ExpressionSlim
for Bonsai trees).
Narrow visitors
The ExpressionVisitorNarrow<...>
class acts as a base class for expression visitors that do not support statement nodes (e.g. Block
, Loop
, etc.). It is provided as a means to deal with just expressions, as they existed in .NET 3.5 prior to the introduction of the DLR and statement nodes in .NET 4.0. All Visit*
methods for statement nodes are overridden and throw NotSupportedException
.
Partial visitors
The PartialExpressionVisitor<TExpression>
class implements all Visit*
methods by throwing NotSupportedException
. Derived types can override the nodes they want to support, therefore rejecting trees that have unsupported nodes. This is unlike the built-in ExpressionVisitor
where failure to override a virtual
method on the base class can go unnoticed.
Visitors with reflection
The ExpressionVisitorWithReflection
class is a traditional visitor but it also visits all of the "reflection" objects that exist on tree nodes. For example, BinaryExpression
has an optional Method
property of type MethodInfo
. The visitor with reflection support does invoke a VisitMethod
virtual to visit the Method
(in addition to recursing over Left
, Right
, and Conversion
). A corresponding MakeBinary
method is supplied to construct a new node in case any of the children (including the visited method) has changed:
protected virtual Expression MakeBinary(ExpressionType binaryType, Expression left, Expression right, bool liftToNull, MethodInfo method, LambdaExpression conversion);
Other examples of visiting reflection information include the Type
or ConstructorInfo
on a NewExpression
, the MemberInfo
on a MemberExpression
, the Type
in a UnaryExpression
used for purposes of Convert
, etc. By visiting all of these additional reflection objects, visitors can perform tasks such as checking reflection objects against allow lists, retargeting methods or members, etc.
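For example, a hypothetical sketch of a reflection-aware visitor that rejects methods outside an allow list could look like the following (this assumes VisitMethod takes and returns a MethodInfo, which isn't spelled out above):
class MethodAllowListVisitor : ExpressionVisitorWithReflection
{
    // Only methods on this list are allowed; anything else is rejected.
    private static readonly HashSet<MethodInfo> s_allowed = new HashSet<MethodInfo>
    {
        typeof(Math).GetMethod(nameof(Math.Abs), new[] { typeof(int) }),
    };

    protected override MethodInfo VisitMethod(MethodInfo method)
    {
        if (method != null && !s_allowed.Contains(method))
            throw new NotSupportedException("Method is not on the allow list: " + method);

        return base.VisitMethod(method);
    }
}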
Cooperative visitors
Cooperative expression visitors enable dispatching into a user-specified visitor for nodes that refer to methods, properties, fields, or constructors. For example, consider an expression representing a call to a method Bar()
on an instance of type Foo
:
(Foo f) => f.Bar()
If the expression gets visited using a CooperativeExpressionVisitor
and the Bar
method has a VisitorAttribute
applied to it, the visit of the MethodCallExpression
will dispatch into the user-specified visitor type using an interface called IRecursiveExpressionVisitor
with a single TryVisit
method:
bool TryVisit(Expression expression, Func<Expression, Expression> visit, out Expression result);
For example:
class Foo
{
[Visitor(typeof(BarVisitor))]
public void Bar() { /*...*/ }
private sealed class BarVisitor : IRecursiveExpressionVisitor
{
public bool TryVisit(Expression expression, Func<Expression, Expression> visit, out Expression result)
{
// implement custom logic here, e.g. recurse into children using the visit delegate;
// returning false leaves the node unchanged
result = null;
return false;
}
}
}
The custom visitor can recurse into child nodes (using the visit
delegate passed in), and return a rewritten expression through the result
output parameter. If the TryVisit
method returns false
, the node remains unchanged. This mechanism is useful to inject behaviors into an expression visitor across component boundaries to support loose coupling. Rather than implementing a visitor that has references to a whole bunch of types (resulting in tight coupling and a central place where a lot of dependencies come together), a cooperative visitor can support a decoupled approach where the logic lives alongside the types and members that are being operated on.
Scope-tracking visitors
Scope-tracking visitors are useful to keep track of variables (represented as a ParameterExpression
) that are being declared in expression tree nodes including Lambda
(cf. LambdaExpression.Parameters
), Block
(cf. BlockExpression.Variables
), and Catch
(cf. CatchBlock.Variable
). These are the definition sites of variables. When visiting a tree, we'll also encounter use sites, i.e. any other place where ParameterExpression
is used. Scope-tracking visitors build an environment with symbol tables enabling a visitor to locate a definition site when encountering a use site. For example:
x => y => x + y
When visiting the outermost lambda expression, a new scope is entered where x
is declared. Next, when visiting its body, we encounter another lambda expression that declares y
. Finally, when visiting the body of this inner lambda expression, we'll pass through a binary addition node to finally find the use sites for x
and y
. At this point, we may want to figure out where these variables were declared.
The ScopedExpressionVisitor<TState>
base class can be used to build scope-tracking visitors. The TState
parameter represents a piece of state to keep and associate with a ParameterExpression
variable whenever a declaration site is encountered. This piece of state can then be looked up when a use site of a ParameterExpression
variable is visited. Examples of types used for TState
include:
- another representation of a variable to translate to (e.g. when going from ParameterExpression to ParameterExpressionSlim as part of Bonsai conversion);
- a dummy value when the visitor is simply used to find unbound "global" variables that do not have a definition site within the tree being visited;
- a new instance of ParameterExpression to rewrite both definition and use sites to, for example when renaming variables or to avoid accidental aliasing or shadowing;
- etc.
When a declaration site is encountered during the visit of a node such as Lambda
, Block
, or Catch
, a call is made to the abstract GetState
method:
protected abstract TState GetState(ParameterExpression parameter);
Once the state is obtained for all variables declared in a scope established by any of the nodes mentioned, it gets pushed into an environment through a virtual Push
method (which can optionally be overridden). This is done prior to visiting the body of the node (e.g. the Body
of a Lambda
, the Expressions
in a Block
, or the Body
in CatchBlock
).
In order to handle use sites, the derived visitor class can override VisitParameter
which will only be called by the base class for use sites (and no longer for definition sites). In there, one typically uses the TryLookup
method to try to look up the state associated with the given ParameterExpression
:
protected override Expression VisitParameter(ParameterExpression node)
{
if (TryLookup(node, out var state))
{
// do something with the state
}
return node;
}
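Putting these pieces together, here's a minimal hypothetical sketch of a scoped visitor that uses a dummy state value to collect unbound variables (this assumes GetState is the only member that must be implemented):
sealed class UnboundVariableCollector : ScopedExpressionVisitor<object>
{
    public HashSet<ParameterExpression> Unbound { get; } = new HashSet<ParameterExpression>();

    // Dummy state; we only care about whether a variable has a definition site.
    protected override object GetState(ParameterExpression parameter) => null;

    protected override Expression VisitParameter(ParameterExpression node)
    {
        if (!TryLookup(node, out _))
        {
            // No enclosing Lambda, Block, or Catch declared this variable.
            Unbound.Add(node);
        }

        return node;
    }
}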
Note that scope-tracking visitors honor the scoping and binding rules for variables. For example, in x => x => x
where all variables x
are represented using the same ParameterExpression
instance, the use site of x
binds to the innermost lambda's declaration site. Consider a more complex example:
x => f(x, x => x)
The corresponding bindings are as follows:
x => f(x, x => x)
^      |  ^    |
|      |  |    |
+------+  +----+
Type-based visitors
The TypeBasedExpressionRewriter
is a specialized visitor that can be parameterized on delegate-based Visit
functions to be invoked for nodes of a specified type. This can be useful in expression trees that mix different domains (e.g. IEnumerable<T>
and IObservable<T>
) where different types of rewrites have to be carried out.
To use this class, simply create an instance and add rewrite rules using the Add
method. For example:
var rewriter = new TypeBasedExpressionRewriter();
rewriter.Add(typeof(DateTime), expr => { /* process and return an expression */ return expr; });
rewriter.AddDefinition(typeof(IObservable<>), expr => { /* process and return an expression */ return expr; });
Expression res = rewriter.Visit(expr);
The AddDefinition
method is used for open generic types.
Expression tree factories
IExpressionFactory
is an interface abstracting over the static factory methods on System.Linq.Expressions.Expression
. We use this interface in various places where we have to create new nodes. The use of an interface enables alternative implementations of factory methods that perform additional checks, disallow the use of certain node types or members referenced by nodes, return cached node instances, etc.
A default implementation called ExpressionFactory
simply implements the interface in terms of the static factory methods on System.Linq.Expressions.Expression
.
In addition, an alternative ExpressionUnsafeFactory
is provided which instantiates the expression objects with tricks to bypass the static factory methods on System.Linq.Expressions.Expression
. This can significantly speed up the creation of new nodes (due to bypassing of various expensive reflection-based checks), at the expense of potentially constructing invalid trees. When trying to Compile
a LambdaExpression
containing nodes that don't properly type check, the CLR may encounter fatal execution engine conditions. Use of this factory is reserved for cases where there's no intent to compile the tree, or when prior checks have guaranteed correctness (for example because we're deserializing an expression tree that was serialized after having been constructed through safe factories).
Note: Unsafe expression factories have been used in some deployments of Nuqleon to speed up recovery times of query engines where a lot of expression tree deserialization operations take place.
FuncletExpression
Funclets are custom extension expression tree nodes that evaluate a given expression to a constant upon reduction. They're useful for partial evaluation and can be generated in an expression visitor when encountering a subtree that cannot be processed by some remote system (e.g. in query translation or prior to expression tree shipping). By leaving a FuncletExpression
wrapper around such a subtree in the tree, further visits to the tree will trigger a reduction that replaces the funclet by a ConstantExpression
containing the result of evaluating the subtree.
For example, consider a tree of the form x => x + a
, where a
is a variable captured from an outer scope. In an expression tree, a
will be represented as some MemberExpression
accessing a field a
on some ConstantExpression
which contains a Value
that references the compiler-generated closure class (in C# parlance, a "display class"). Some visitor may figure out that this closure class is something a service doesn't support (e.g. by checking all types and members against a set of known types and members, i.e. an allow list), thus deciding that the Member(Constant(closure), a)
subexpression should be locally evaluated. By wrapping it with a FuncletExpression
the partial evaluation to a Constant
can be deferred until future visits over the tree (in visitors that are not handling FuncletExpression
directly, thus causing reduction of the extension node through a call to Reduce
).
An additional extension method called Funcletize
can be used to create a funclet around a given expression.
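As a hypothetical sketch, assuming Funcletize is an extension method on Expression and that a is a local captured in a closure:
int a = 42;
Expression<Func<int, int>> f = x => x + a;

// The right operand of the addition is Member(Constant(closure), a).
var closureAccess = (MemberExpression)((BinaryExpression)f.Body).Right;

// Wrap it in a funclet; a later call to Reduce evaluates it to Constant(42).
Expression funclet = closureAccess.Funcletize();
Expression constant = funclet.Reduce();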
Expression tree analysis
Various utilities are provided in this library to make the task of analyzing expression trees easier.
Equality comparers
ExpressionEqualityComparer
implements IEqualityComparer<Expression>
to compare two Expression
instances for equality, taking binding of variables and labels (used in Goto
and Label
expressions) into account. For example, given trees x => x
and y => y
, they will compare equal even though x
and y
are not reference equal. That is, as long as the binding of variables between use sites and definition sites is equivalent in both trees being compared, the equality requirement is met.
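For example, assuming a parameterless constructor:
Expression<Func<int, int>> f = x => x;
Expression<Func<int, int>> g = y => y;

var comparer = new ExpressionEqualityComparer();

Console.WriteLine(comparer.Equals(f, g)); // True

// Because it implements IEqualityComparer<Expression>, it can be used to key a dictionary.
var compiled = new Dictionary<Expression, Delegate>(comparer);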
Note: Because keeping track of scopes requires state, the
ExpressionEqualityComparer
is really a factory for a stateful underlying ExpressionEqualityComparator
implementation. Each call to Equals
on the comparer causes the instantiation of a new comparator, using fresh state. This makes the use of a single equality comparer instance safe across threads.
Free variable scanner
In order to find unbound variables that occur in an expression, the FreeVariableScanner.Scan
method can be used:
public static IEnumerable<ParameterExpression> Scan(Expression expression);
Alternatively, a Boolean-valued check can be used (which is more efficient than compiling a list of unbound variables):
public static bool HasFreeVariables(Expression expression);
As an example, consider an expression of the form x => x + y
, where y
is unbound. Note that a single ParameterExpression
may be used in a bound context as well as an unbound context, for example Bar.Foo(x, x => x + 1)
contains an unbound occurrence of x
in the first argument of the call to Foo
.
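A small usage sketch:
ParameterExpression x = Expression.Parameter(typeof(int), "x");
ParameterExpression y = Expression.Parameter(typeof(int), "y");

// x => x + y, where y has no definition site in the tree.
LambdaExpression f = Expression.Lambda(Expression.Add(x, y), x);

foreach (ParameterExpression free in FreeVariableScanner.Scan(f))
{
    Console.WriteLine(free.Name); // prints "y"
}

bool hasFree = FreeVariableScanner.HasFreeVariables(f); // true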
Free variable scanners are often used as part of a binding step. For example, in Nuqleon we represent query operators using Invoke(Parameter(...), ...)
, where the Parameter
is unbound and has a Name
that refers to the query operator, e.g. rx://filter
. Upon receiving such an expression, a free variable scan is carried out to find these unbound parameters, look them up in some environment (e.g. a query engine registry), and finally bind them by some form of inlining (e.g. beta reduction in lambda calculus terms, see BetaReducer
in this library).
Allow list scanner
When processing expressions, it's often useful to scan them for undesirable operations, such as calls to methods that are unsafe to the hosting environment (when evaluating an expression) or constructs that cannot be translated to some target language (e.g. when writing a query provider). This library provides allow list scanners that check expressions against a given list of allowed items. Two different types are provided:
- ExpressionTypeAllowListScanner, which checks the Type for every Expression node in a given expression tree, e.g. to make sure a tree evaluates within a certain "type domain";
- ExpressionMemberAllowListScanner, which checks occurrences of any MemberInfo on any Expression node in a given expression tree, e.g. to check for methods, properties, fields, and constructors used.
Base classes are provided that offer a single Boolean-valued Check
function that can be implemented. In particular:
public abstract class ExpressionTypeAllowListScannerBase : ExpressionVisitor
{
protected abstract bool Check(Type type);
}
public abstract class ExpressionMemberAllowListScannerBase : ExpressionVisitor
{
protected abstract bool Check(MemberInfo member);
}
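For example, a hypothetical scanner that only admits primitive types, string, and delegate types could look like this (rejections throw by default, as described further below):
sealed class PrimitiveOnlyScanner : ExpressionTypeAllowListScannerBase
{
    protected override bool Check(Type type) =>
        type.IsPrimitive || type == typeof(string) || typeof(Delegate).IsAssignableFrom(type);
}

// Usage sketch: visiting a tree is expected to throw when a disallowed type is encountered.
var primitiveScanner = new PrimitiveOnlyScanner();

Expression<Func<int, bool>> ok = x => x > 0;
primitiveScanner.Visit(ok);                  // only int, bool, and a delegate type occur

Expression<Func<DateTime, long>> bad = d => d.Ticks;
primitiveScanner.Visit(bad);                 // DateTime is rejected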
These visitors take care of recursing over the structure of the tree and inquiring about the validity of encountered Type
or MemberInfo
objects by calling Check
. In case it's more desirable to simply check types or members against a given list of allowed ones, the more derived classes can be used:
public class ExpressionTypeAllowListScanner : ExpressionTypeAllowListScannerBase
{
public TypeList Types { get; }
}
public class ExpressionMemberAllowListScanner : ExpressionMemberAllowListScannerBase
{
public TypeList DeclaringTypes { get; }
public MemberList Members { get; }
}
Instances of these scanners can be created using collection initializers for the Types
, DeclaringTypes
, and Members
properties. For example:
var scanner = new ExpressionMemberAllowListScanner
{
DeclaringTypes = { typeof(string) },
Members = { typeof(DateTime).GetProperty(nameof(DateTime.Now)) }
};
When using this scanner, all uses of members on System.String
will be allowed, as will be the use of DateTime.Now
. However, uses of methods like DateTime.Add
or Guid.Parse
will be rejected.
Upon encountering a type or member that is not supported, the default implementation will throw a NotSupportedException
. In some cases, a different action is warranted, which can be provided by overriding various Resolve
virtual methods such as:
protected virtual Expression ResolveExpression<T>(T expression, MemberInfo member, Func<T, Expression> visit) where T : Expression;
Methods like this are called when the specified member
was rejected because it was detected on the given expression
. This provides the derived class with the option to take an action other than throwing an exception. For example, one may decide to rewrite the expression to a supported construct (while optionally using visit
to first continue scanning any subexpression of the current given expression), or decide to wrap the expression in a FuncletExpression
to trigger partial local evaluation (e.g. when the allow list scanning is done prior to sending an expression to some service, where local evaluation can reduce certain constructs - such as accesses to closures - to constants first).
Note: Nuqleon has been using allow lists, either table-driven or with manual overrides for
Check
, in various layers of the system including the client (prior to submitting expressions to a Nuqleon-compliant service), various front-end services (as a defense-in-depth mechanism, e.g. making sure expressions don't do things like Environment.FailFast()
), as well as back-end services to further narrow down the set of allowed operations (or as a further defense-in-depth piece, right before evaluating an expression that may have been rehydrated from an untrusted store or has been subject to many rewrites by pluggable components).
Expression tree rewriters
This library also provides a whole plethora of expression tree rewriting utilities.
Alpha renaming
Lambda calculus has the notion of alpha renaming whereby the names of variables get changed in a way that doesn't inadvertently change the bindings of variables. For example, consider an expression x => y => x + y
. In here, we can change the name of x
to a
but not to y
because it'd cause the binding to change to the y
declared on the inner lambda expression.
This library provides an implementation of alpha renaming using the AlphaRenamer.EliminateNameConflicts
static method, shown below:
public static Expression EliminateNameConflicts(Expression expression);
Its goal is to eliminate naming conflicts, for example in expressions like x => x => x
where it's unclear if this means x0 => x1 => x0
or x0 => x1 => x1
or x0 => x1 => x2
, which depends on the reference equality of the ParameterExpression
nodes in use sites and definition sites (in this case, the Parameters
collection of a LambdaExpression
, but all of this holds for Block
and Catch
nodes as well). Given an expression like this, the alpha renamer will introduce new variables to ensure that names are unique when confusion can arise.
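For example:
ParameterExpression x = Expression.Parameter(typeof(int), "x");

// x => x => x, built with a single ParameterExpression instance.
LambdaExpression nested = Expression.Lambda(Expression.Lambda(x, x), x);

// After renaming, nested scopes no longer reuse the name x, e.g. x => x0 => x0
// (the exact generated names may differ).
Expression renamed = AlphaRenamer.EliminateNameConflicts(nested);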
Note: Alpha renaming is purely cosmetic; it doesn't ensure that
ParameterExpression
nodes are only used for a single definition site. For example, in f(x => x / 2, x => x + 1), all occurrences of x may be reference equal, but there's no name conflict in nested scopes, so the alpha renamer won't touch it. To ensure all instances of ParameterExpression nodes are unique across definition sites, one can simply use a ScopedExpressionVisitor<ParameterExpression> where the GetState override simply instantiates a new ParameterExpression with the same name (or optionally a different name) and the same type.
Beta reduction
Computation in lambda calculus is driven by beta reduction. Given an invocation of a lambda expression, a reduction can be made by substituting the arguments used in the invocation for the parameters of the lambda expression. For example:
(x => x + 1)(2)
can be reduced to 2 + 1
by substituting 2
for x
in the body of the lambda expression. That is, arguments get inlined in the lambda body. There are a few caveats when variables can shadow one another, for example:
(x => f(y => x + y))(y)
In here, the y
used for the top-level invocation may be bound to a declaration in a higher scope, or represent a global variable. If we naively carry out beta reduction, we may end up with f(y => y + y)
where both occurrences of y
are now bound to the parameter introduced by the lambda expression passed to f
. To overcome this, lambda calculus provides a means to rename variables by using alpha renaming.
This library provides a beta reduction mechanism in BetaReducer.Reduce
. A few variants are available, as shown below:
public static Expression Reduce(Expression expression);
public static Expression Reduce(Expression expression, BetaReductionNodeTypes nodeTypes, BetaReductionRestrictions restrictions);
public static Expression ReduceEager(Expression expression, BetaReductionNodeTypes nodeTypes, BetaReductionRestrictions restrictions, bool throwOnCycle);
The simplest form is the most conservative one and only performs reduction for a select set of nodes including Constant
, Default
, Quote
, and Parameter
(if no incorrect "capture" results). For example:
(x => x + x)(42)
will be reduced to 42 + 42
, because the Constant
node representing 42
is free of side-effects and can safely be inlined for every occurrence of x
in the body of the Lambda
. However, an argument like int.Parse(Console.ReadLine())
does have side-effects and reducing:
(x => x + x)(int.Parse(Console.ReadLine()))
to
int.Parse(Console.ReadLine()) + int.Parse(Console.ReadLine())
will cause the side-effect of reading from the console to be duplicated. If this is desirable, or a separate analysis has figured out that the arguments passed to an Invoke
node are side-effect free (e.g. Math.Abs(42)
is pure because Abs
is a pure function, but the BetaReducer
does not have that knowledge), other overloads of Reduce
can be used that specify the BetaReductionNodeTypes
and the BetaReductionRestrictions
.
The node types indicate which nodes are safe for inlining. By default, a value of Atoms
is used, which is a combination of Constant
, Default
, Parameter
, and Quote
. A value named Molecules
is used to refer to any other expression tree node type, and Unrestricted
combines both.
The restrictions control whether or not side-effects can be duplicated or dropped. Valid options are DisallowDiscard
(to prevent a side-effect from being dropped if the parameter does not occur in the lambda body), DisallowMultiple
(to prevent a side-effect from being duplicated if the parameter occurs more than once in the lambda body), or ExactlyOnce
(to ensure the parameter is used exactly once in the lambda body).
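A quick sketch using the conservative overload (the enum values for the other overloads are described above):
Expression<Func<int, int>> f = x => x + x;
InvocationExpression invoke = Expression.Invoke(f, Expression.Constant(42));

// (x => x + x)(42) reduces to 42 + 42; constants are safe to inline.
Expression reduced = BetaReducer.Reduce(invoke);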
Note:
ExactlyOnce
merely enforces that exactly one use site of a parameter is found. Whether or not the use site will be evaluated (and how often) is an orthogonal concern. For example, the use site of the parameter may be inside a conditional (e.g. b ? x : y
), or inside a loop construct. As such, the number of evaluations of the inlined expression depends on the control flow around the binding site. This library does not perform control flow analysis at the moment.
Note that the order of side-effects may change due to inlining; the beta reducer does not provide an option to control that. For example:
(x => int.Parse(Console.ReadLine()) - x)(int.Parse(Console.ReadLine()))
Prior to beta reduction, the first read of the console happens as part of building the argument list for the invocation. This value will get bound to x
, and then the second read happens as part of evaluating the lambda expression. Say the user entered 5
and then 3
, the result will be -2
. However, after beta reduction, the order of side-effects is changed, and the left operand of (the non-commutative operator) -
is read first, causing the operands to be flipped, with the result being 2
instead.
Note: The beta reducer is typically used when inlining definitions for global unbound parameters, i.e. a binding step. For example, consider an expression
rx://filter(xs, x => x > 0)
.
First, a free variable scanner is used to find
rx://filter
andxs
. By "lambda lifting" the unbound parameters out of the original expression, we obtain:
(Func<IObservable<int>, Func<int, bool>, IObservable<int>> rx://filter, IObservable<int> xs) => rx://filter(xs, x => x > 0)
Next, the free variables (now lifted into a top-level parameter list on the constructed lambda expression) are looked up in some registry (e.g. in a query engine), resulting in bindings like:
rx://filter := (IObservable<int> source, Func<int, bool> predicate) => Observable.Where(source, predicate)
xs := Observable.Return(42)
Next, we construct an invocation expression around the (lambda lifted) original expression, by using the binding targets for the arguments, which yields:
((Func<IObservable<int>, Func<int, bool>, IObservable<int>> rx://filter, IObservable<int> xs) => rx://filter(xs, x => x > 0))((IObservable<int> source, Func<int, bool> predicate) => Observable.Where(source, predicate), Observable.Return(42))
Finally, two successive steps of beta reduction cause inlining to happen:
((IObservable<int> source, Func<int, bool> predicate) => Observable.Where(source, predicate))(Observable.Return(42), x => x > 0)
followed by
Observable.Where(Observable.Return(42), x => x > 0)
Quite often, as illustrated in the example above, one wants to repeatedly perform beta reduction until there is no further opportunity to do so (i.e. all Invoke(Lambda, ...)
sites have been reduced). This is achieved by using ReduceEager
which repeats beta reduction until the tree no longer changes (i.e. has reached a "fixed point").
Note:
ReduceEager
accepts a throwOnCycle
parameter used for cycle detection; a cycle can occur if a tree keeps reducing to itself (exercise left to the reader; hint: look up a lambda calculus operator called "omega").
Eta converter
A final lambda calculus construct is eta conversion, which simplifies an expression of the form (x => f(x))
to f
, by taking away the redundant lambda "abstraction" whose body contains an invocation "application". The result is a simpler, more compact, expression.
This library provides eta conversion through EtaConverter.Convert
.
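A small sketch, assuming Convert accepts any Expression:
ParameterExpression x = Expression.Parameter(typeof(int), "x");
ParameterExpression f = Expression.Parameter(typeof(Func<int, int>), "f");

// x => f(x) is a redundant abstraction around an application of f.
LambdaExpression redundant = Expression.Lambda(Expression.Invoke(f, x), x);

// Eta conversion simplifies this to just f.
Expression simplified = EtaConverter.Convert(redundant);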
Compiler-generated name elimination
The C# compiler emits compiler-generated names for certain lambda expression parameters, e.g. in the context of let
clauses that introduce so-called "transparent identifiers". An example is:
from x in xs
let y = x + 1
select x + y
Compilation of this query expression results in lowering to the following form:
xs
.Select(x => new { x, y = x + 1 })
.Select(<>__blah => <>__blah.x + <>__blah.y)
where <>__blah
is some compiler-generated identifier. This can make debug output for expressions quite cumbersome to read, so this library provides a utility (CompilerGeneratedNameEliminator.Prettify
) to make such an expression look prettier by renaming these parameters.
Constant hoisting
Constant expressions often occur in expression trees due to partial evaluation of locals at the point of rewriting an expression for submission to a service. For example:
int a = 41;
IQueryable<int> query = from x in xs where x > a select x + 1;
a = 42;
foreach (var x in query)
{
// use results
}
In this example, the foreach
loop triggers the query provider underneath the IQueryable<int>
to prepare the expression tree representing the query in order to perform evaluation. In some cases, this evaluation involves translation to another language (e.g. SQL), but in systems like Nuqleon, we end up serializing the expression tree (in Bonsai format) for shipment to another machine.
The lowered query expression looks like this:
xs.Where(x => x > a).Select(x => x + 1)
where a
is a local variable in the outer scope. The compiler generates a closure to hold a
in a so-called display class, like this:
var closure = new <>__DisplayClass();
closure.a = 41;
IQueryable<int> query = xs.Where(x => x > closure.a).Select(x => x + 1);
but rather than now closing over closure
(which would go on ad infinitum), the expression tree generated for x => x > closure.a
contains a ConstantExpression
node holding the display class instance. That is, the body of the LambdaExpression
looks like this:
Expression.GreaterThan(
x,
Expression.Field(
Expression.Constant(closure),
typeof(<>__DisplayClass).GetField("a")
)
)
where x
is a ParameterExpression
.
During preparation of this expression for submission to a service, we typically perform partial evaluation (through a process called funcletization) of the MemberExpression
that reads the field from the closure, in order to avoid serializing the closure object (which refers to a compiler-generated type that does not exist in the service, nor do we want to deploy random assemblies).
Note: An alternative approach could be to rewrite display classes to
System.Tuple<...>
values which can be serialized more easily. However, we should be careful about serializing such values more than once. For example, in an expression x => x > a && x < b, both a and b end up being represented as Member(Constant(closure), field) trees. If we keep these trees for both a and b, and serialize the tuple by value, we end up with two copies of the entire tuple. By partial evaluation of these Member expressions, we effectively "slice" the closure and only retain the hoisted variable's value in each use site. This said, if a variable occurs multiple times, we still end up with copies of such values for every use site. To mitigate that, we'd need to store the closure value in a "constants table" (a later version of Bonsai supports this notion).
In the example above, the expression gets rewritten to the equivalent of:
xs.Where(x => x > 42).Select(x => x + 1)
where 42
was captured at the point of query evaluation, in this case by the foreach
loop, which happens after re-assignment of a
with value 42
.
Once we end up installing this query on the service (and in the case of Nuqleon, it'd be a standing reactive event processing query that will keep running until explicitly disposed), we may have a lot of similar queries floating around due to earlier query submission from the same client library (e.g. with different values of a
), for example:
xs.Where(x => x > 42).Select(x => x + 1)
xs.Where(x => x > 17).Select(x => x + 1)
xs.Where(x => x > 99).Select(x => x + 1)
This results in expression tree compilation of queries that are completely identical except for constants, and thus growth of the JIT-compiled code heap and extra memory used for expression trees (which are often kept resident in memory, e.g. to be checkpointed later, or to be analyzed at a later point, for example by a query optimizer). One way to solve this is by "hoisting" constants out of the expression and turning them into parameters:
((int p0, int p1) => xs.Where(x => x > p0).Select(x => x + p1))(42, 1)
Note that this starts to look like the original user code again, where a
has been replaced by a generated variable p0
. In addition, the constant 1
used in the Select
selector body has also been lifted into a variable. One can think of this as a generalization or templatization of an expression by punching holes (represented as variables) where constants used to be.
Now, when a new expression comes in, we can apply constant hoisting and compare the obtained lambda expression against already existing query expressions that may have been compiled to a delegate before. This lookup can be implemented by the ExpressionEqualityComparer
(for example used with a dictionary type) which supports equality checks and hash code computation.
Note: Constant hoisting results in a beta-reducible construct consisting of a
Lambda
expression and bindings of Parameter
to Constant
nodes. One can think of constant hoisting as a form of "outlining" where beta reduction is a form of "inlining".
This library supports constant hoisting via the ConstantHoister
class. To create a constant hoister, the Create
factory method can be used:
public static ConstantHoister Create(bool useDefaultForNull, params LambdaExpression[] exclusions);
The first parameter can be used to convert Expression.Constant(null, type)
to Expression.Default(type)
nodes, which reduces the number of null
constants to hoist (because it's common for expressions to contain null
-checks).
The second parameter can be used to specify exclusions of constants through patterns expressed as lambda expressions. For example (string s) => string.Format(s, default(object[]))
states that a constant occurring as the first argument of a string.Format
call should not be hoisted as a constant (because such uses are typically invariant across many copies of the tree, i.e. a format string is not something that varies across different copies of the tree). Other examples include ToString
format strings or Regex.Match
containing a regular expression string.
Once a ConstantHoister
has been created, the Hoist
method is used to perform hoisting:
public ExpressionWithEnvironment Hoist(Expression expression);
This returns an ExpressionWithEnvironment
which represents the hoisted expression and bindings of introduced variables to constant values:
public sealed class ExpressionWithEnvironment : IExpressionWithEnvironment
{
public Expression Expression { get; }
public IReadOnlyDictionary<ParameterExpression, object> Environment { get; }
IReadOnlyList<Binding> IExpressionWithEnvironment.Bindings { get; }
}
One typically constructs a LambdaExpression
from the Expression
and the keys in the Environment
(or the Bindings
list, which guarantees a stable order), and uses the resulting expression as a key for lookups used to check for existing expressions that have the same shape, except for constants that were outlined.
Alternatively, one can use ToInvocation
to obtain an Invoke(Lambda(...), ...)
expression that can be fed back into a BetaReducer.Reduce
call to inline the constants again.
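Putting this together, a sketch of hoisting the constant out of a simple predicate:
var hoister = ConstantHoister.Create(useDefaultForNull: true);

Expression<Func<int, bool>> predicate = x => x > 42;
ExpressionWithEnvironment hoisted = hoister.Hoist(predicate);

// hoisted.Expression is equivalent to x => x > p0 (for some generated parameter),
// and the environment binds that parameter to 42.
foreach (var binding in hoisted.Environment)
{
    Console.WriteLine(binding.Key.Name + " = " + binding.Value);
}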
Type substitution
The TypeSubstitutionExpressionVisitor
can be used to retype an expression tree from one set of types to another set of types. This involves specifying rules to rebind members involving the types being substituted. For example:
var subst = new TypeSubstitutionExpressionVisitor(new Dictionary<Type, Type>
{
{ typeof(DateTime), typeof(DateTimeOffset) }
});
Expression rewritten = subst.Apply(expr);
By default, members are located by finding a member with the same name (and parameter types, after type substitution in child nodes, in case of methods and constructors) on the target type. For example, when rewriting () => DateTime.Now.AddDays(1)
from DateTime
to DateTimeOffset
, both Now
and AddDays
will be located successfully.
However, if more advanced rules are needed to retarget members on types, a subclass of TypeSubstitutionExpressionVisitor
can be used to override various Resolve*
methods. As an example, consider the method used to resolve a MethodInfo
:
protected virtual MethodInfo ResolveMethod(MethodInfo originalMethod, Type declaringType, Type[] genericArguments, Type[] parameters, Type returnType);
Here, we're given the original method in the input expression tree, as well as all of the type constraints that are required to be met by the newly returned method to be used in the output expression tree. These constraints originate from a depth-first rewrite of the input tree, e.g. the target instance and the arguments on a MethodCallExpression
have been rewritten already.
If a Resolve
method returns null
, another virtual call is made to a FailResolve
method:
protected virtual MethodInfo FailResolveMethod(MethodInfo originalMethod, Type declaringType, Type[] genericArguments, Type[] parameters, Type returnType);
This provides a last chance to resolve the member when the default logic (using reflection) in ResolveMethod
works in most cases, but only failure cases have to be handled manually. The default implementation of FailResolve
methods is to throw InvalidOperationException
.
One final extensibility point is the conversion of constants. For example, if an expression contains a ConstantExpression
of type DateTime
containing a value of said type, any rewrite to another type (e.g. DateTimeOffset
) needs to perform a conversion of these constants. This is done through a ConvertConstant
method:
protected virtual Object ConvertConstant(Object originalValue, Type newType);
The default implementation only supports assignability checks (using IsAssignableFrom
) to check for a possible automatic conversion, as well as conversions of null
references. It does not try to invoke implicit conversions. A derived class can specify the exact behavior to use (for example, it could try to construct an expression tree that performs an Expression.Convert
, or perform partial evaluation of such a conversion to end up with a retyped ConstantExpression
).
Note: Type substitution is used in Nuqleon when erasing anonymous types or data model types for custom generated types (e.g. replacing
MappingAttribute
-annotated properties on a nominal type for a generated structural type using the names specified in such mappings). This is applicable both in the Expression
as well as the ExpressionSlim
space. Another example is finding anonymous types in a query expression and replacing them by tuple types, which involves resolving properties on anonymous types to tuple Item*
member access expressions.
Tupletization
One of the drawbacks of delegate types in .NET is the lack of support for variadic generics, causing us to end up with a whole ladder of types to support functions with different numbers of parameters (cf. Action<...>
and Func<...>
families).
While expression trees support the creation of LambdaExpression
nodes with any number of parameters, doing so relies on runtime code generation (using System.Reflection.Emit
) to manufacture delegate types of arities that exceed the available Func<>
and Action<>
types. This results in types that are tricky to carry along across machine boundaries (though Bonsai supports ways to represent function types as first-class citizens).
Note: One can argue that functions with more than 16 parameters should be avoided, but keep in mind that expressions are often machine generated. For example, when performing constant hoisting, the number of parameters generated is linear in the number of constants that occurred in an expression.
Furthermore, having to support various arities of functions on APIs can be cumbersome. For example, consider a DefineObservable<TArg1, ..., TArgN, TResult>
function that has to provide 16 overloads. While this may still be desirable for user convenience, it becomes less appealing for cases where a call to such methods is generated by another component in the system, having to perform a binding step to pick the right overload. Therefore, it's desirable to have a "normal form" that enables squeezing N-ary functions through a single unary function.
Various ways to normalize are possible, including the use of curried functions ((x, y) => x + y
becomes x => y => x + y
), the use of record types ((x, y) => x + y
becomes p => p.x + p.y
), or the use of tuple types whereby a parameter list is "materialized" as a tuple ((x, y) => x + y
becomes t => t.Item1 + t.Item2
). The ExpressionTupletizer
type provides facilities to work with this normal form.
Note: Nuqleon has standardized on tuples as the normal form, but spin-offs of the technology (ICP, i.e. IComputationProcessing, a generalization of IRP, i.e. IReactiveProcessing) have also implemented support for record normal form (where function parameters are composite record-based entities themselves) and curry normal form (which fits naturally with functional programming front-end languages).
In order to transform a LambdaExpression
to its tuple form, use the ExpressionTupletizer.Pack
method, as shown below:
Expression<Func<int, int, int>> f = (x, y) => x + y;
var g = (Expression<Func<Tuple<int, int>, int>>)ExpressionTupletizer.Pack(f);
The resulting expression will look like t => t.Item1 + t.Item2
, which can now be squeezed through APIs that accept a Func<TArgs, TResult>
parameter (or an Expression<>
variant thereof).
Note: Tupletization was introduced prior to the availability of
ValueTuple<>
types and hence uses Tuple<>
types instead. Support for value tuples can be introduced, albeit separately, because the current behavior ensures read-only usage of parameters: Item*
properties on tuples are get
-only (and, being properties, they can't be used - safely - for ref
parameters either). This restriction was intentional because the use of tupletization in Nuqleon applied to expressions with a functional programming nature. When introducing support for value tuples (as an alternative, not a replacement), assignments will also be allowed.
The reverse operation is also possible using Unpack
. For example:
var h = (Expression<Func<int, int, int>>)ExpressionTupletizer.Unpack(g);
Putting these pieces together enables flowing a normal form through various layers of the system, only ever having to worry about unary functions, while still going back to the original form in some back-end system (e.g. right before compiling an expression where the construction and deconstruction of tuple-based values is deemed an unnecessary performance hit).
To deal with lambda expressions that have no parameters, the Pack
and Unpack
methods support an additional Type
parameter named voidType
. This can be used to insert a dummy unused parameter (of the specified type) in case the given lambda expression has no parameters. Also note that unary functions get rewritten to use a Tuple<T1>
type.
Other overloads of Pack
enable conversion of a params Expression[]
to a single expression that uses a tuple to combine all of the expressions (which should have non-void
types) into a single tuple creation expression. The inverse Unpack
operation can be used to get the original array of expressions back (i.e. it applies a deconstruction).
Note: Tuples support any number of components through the use of the
Rest
properties, which can nest arbitrarily deep. Pack
and Unpack
support the use of Rest
properties to nest tuples.
Another utility that works with tuples is AnonymousTypeTupletizer
. This expression rewriter is a type substitution visitor that replaces anonymous types (e.g. produced by let
clauses in query expressions, due to the use of transparent identifiers) by tuple types. This can be useful to shake off compiler-generated types prior to carrying out further rewrite steps (or prior to serialization of the expression for shipment to another machine). To use this functionality, use the Tupletize
static method:
public static Expression Tupletize(Expression expression, Expression unitValue);
public static Expression Tupletize(Expression expression, Expression unitValue, bool excludeVisibleTypes);
The unitValue
is used to deal with empty anonymous types (new {}
in C#), and excludeVisibleTypes
is used to prevent rewriting of anonymous types that "leak" out of the expression (e.g. xs.Select(x => new { x })
causes the anonymous type to appear on the type of the entire expression, so it's not just used within the expression). As an example, consider a query like this:
from x in xs
let y = x + 1
select x + y
which is lowered into:
xs
.Select(x => new { x, y = x + 1 })
.Select(t => t.x + t.y)
After applying tupletization, the resulting expression will be similar to this:
xs
.Select(x => new Tuple<int, int>(x, x + 1))
.Select(t => t.Item1 + t.Item2)
except that the NewExpression
for the tuple creation will have its Members
collection set to properties Item1
and Item2
. This in turn enables query providers to make mappings from the arguments passed to the constructor to the properties they get exposed on (note: this is where the term transparent in transparent identifiers comes from).
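A usage sketch (the unit value is only consulted for empty anonymous types, so any placeholder expression is assumed to work here):
Expression<Func<IEnumerable<int>, IEnumerable<int>>> q =
    xs => xs.Select(x => new { x, y = x + 1 }).Select(t => t.x + t.y);

Expression tupletized = AnonymousTypeTupletizer.Tupletize(q, Expression.Constant(null));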
CPS transformations
This library provides very rudimentary support for continuation passing style (CPS) transforms of code which can be used in the context of binding query execution plans to asynchronous infrastructure.
Note: The CPS transform support in this library only works for expressions because it originates from a LINQ provider toolkit we wrote in the .NET 3.5 days, prior to support for statement nodes. A full-blown CPS transformation framework existed in the context of Volta's tier splitting work, but it was written in CCI (the Common Compiler Infrastructure) and never got ported to .NET 4.0's expressions. While Nuqleon doesn't use this implementation of CPS transform directly (anymore), we're keeping it around for other uses (e.g. execution plans in some internal stores). A more complete CPS transformation framework could be built, especially in conjunction with the work on modernized expression trees with support for async lambdas and
await
expressions.
CPS transforms involve rewriting method invocations such as Add(a, b)
to Add(a, b, ret => ...)
where the result is provided through a callback rather than a returned result. This then enables the use of asynchronous functions.
Note: This functionality has been used for execution plans in stores that have asynchronous callback-based
Get
, Read
, Enumerate
, etc. operations, but the user's code uses synchronous operations instead. By enabling rewrites to synchronous stub methods (e.g. T Get<T>(string key)
) and annotating these stubs with a UseAsyncMethod
attribute, the CPS transformation framework performs a set of tree rewrites that replace the stub method calls with their async counterparts (e.g. void Get<T>(string key, Action<T>)
).
See ClassicCpsRewriter
and ClassicCpsRewriterWithErrorPropagation
for more details.
Note: Other implementations of CPS transforms using
Begin
/End
patterns or Task<T>
can be built using the same mechanisms, but it's relatively easy to write converters from the trivial callback-based model to any of these other patterns (at the expense of another layer of indirection).
Expression tree evaluation
Expression trees support compilation and evaluation through the Expression<T>.Compile
method. This is a very powerful mechanism that enables highly efficient JIT-compiled code to be emitted, using System.Reflection.Emit
under the hood. However, compiling a lot of expression trees can be quite expensive, both due to calling into the CLR's JIT (possibly deferred until the compiled delegate gets invoked), and the memory usage for the compiled expressions. This is particularly wasteful if a lot of expressions are the same or very similar.
The compiled delegate cache support in this library tries to reuse already compiled expressions in order to reduce the cost of expression tree compilation (by avoiding it, if possible) and to increase the density when many expressions are compiled and run.
Note: The query engine in Nuqleon is all built around the idea of compiling expression trees representing standing query expressions. Compiling these expressions (rather than interpreting them one way or another) is very beneficial because many expressions are invoked for every single event flowing through an observable sequence (e.g. predicates passed to
Where
, selectors passed to Select
, etc.).
To compile expressions with support for compiled delegate caching, use the CachedLambdaCompiler.Compile
method, which looks as follows:
public static T Compile<T>(this Expression<T> expression, ICompiledDelegateCache cache, bool outliningEnabled, IConstantHoister hoister);
Various overloads exist that reduce the number of options, or that deal with LambdaExpression
rather than Expression<T>
. The top-level structure is the same as the Compile
instance method on Expression<T>
, namely that the method returns a delegate of type T
. The method differs in its caching behavior which is controlled through 3 parameters:
- cache to specify a cache (and a policy) to hold on to compiled delegates;
- outliningEnabled can make the cache more efficient by outlining nested lambdas and compiling them separately;
- hoister is used to perform constant hoisting to make trees equivalent modulo constants.
For example, when outlining is enabled and a constant hoister is specified, an expression like:
y => f(x => x + 1, y + 2)
will first get decomposed into smaller expressions for x => x + 1
and y => f(d, y + 2)
(where d is a constant that will hold the compiled delegate for x => x + 1
). This technique enables reuse of x => x + 1
in case it occurs in other unrelated expressions.
Note: Outlining is quite beneficial in stream processing systems, especially after steps to normalize queries. For example, one query may write
xs.Where(x => x > 0 && x < 10)
while another one may write xs.Take(5).Where(x => x > 0)
. By normalizing the first query to xs.Where(x => x > 0).Where(x => x < 10)
we end up with smaller expressions that have a higher likelihood of being reused (in addition, things like 0 < x
can be rewritten to x > 0
, i.e. some normal form). Now both queries contain x => x > 0
and the compiled delegate for this predicate can be reused.
Next, the resulting lambda expressions are constant-hoisted, producing c0 => (x => x + c0)
and (c1, c2) => (y => f(c1, y + c2))
, where c0
, c1
, and c2
are variables that replace the constants that used to be in that position. Note that the outlined delegate (for x => x + 1
) itself became a constant, so any other expression of the form y => f(*, y + *)
(where *
is a placeholder for any constant) can match the reused compiled delegate.
Finally, a lookup for the constant-hoisted expressions is carried out against the supplied cache. If a match is found, the compiled delegate is retrieved and an invocation is made to provide the constants that were hoisted out. Otherwise, compilation is carried out, and the resulting delegate is added to the cache.
Three built-in caches are provided:
- VoidCompiledDelegateCache does not cache anything; it's useful for debugging, performance analysis, or to disable caching without changing the core Compile code.
- SimpleCompiledDelegateCache is unbounded; it can be cleared externally by calling Clear, based on some user policy. (If not cleared, it can be a source of leaks.)
- LeastRecentlyUsedCompiledDelegateCache uses an LRU eviction policy. A capacity is specified on the constructor. When the entries in the cache exceed the threshold, the least recently used one is evicted.
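Putting it all together, a sketch assuming ConstantHoister implements IConstantHoister and that the LRU cache constructor takes a capacity:
ICompiledDelegateCache cache = new LeastRecentlyUsedCompiledDelegateCache(128);
IConstantHoister hoister = ConstantHoister.Create(useDefaultForNull: true);

Expression<Func<int, int>> f = x => x + 42;
Func<int, int> d1 = f.Compile(cache, outliningEnabled: true, hoister);

// A structurally identical expression that only differs in a constant reuses the cached delegate.
Expression<Func<int, int>> g = x => x + 43;
Func<int, int> d2 = g.Compile(cache, outliningEnabled: true, hoister);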
Note: It's relatively straightforward to build other caching policies by implementing
ICompiledDelegateCache
, for example based on the hit frequency for entries, to evict the least commonly used entries first.
Expression tree optimization
This library provides a few expression tree optimizations. More optimizations are available in Nuqleon.Linq.Expressions.Optimizers
.
Delegate invocation inlining
The DelegateInvocationInliner
provides a very narrow optimization that looks for InvocationExpression
nodes whose target is a ConstantExpression
containing a delegate. If the delegate has an invocation list with a single invocation target (i.e. it's not a multicast delegate with multiple invocation targets attached to it), the tree is rewritten to a MethodCallExpression
for the method targeted by the delegate.
This optimization can be useful after carrying out binding steps where the binding targets for functions are ConstantExpression
s containing delegates pointing at a function's implementation. For example:
f(1)
where f
is an unbound variable of type Func<int, int>
. During a binding step, f
is resolved to a ConstantExpression
containing a delegate of type Func<int, int>
(e.g. Math.Abs
). When rewriting the expression using lambda lifting and lambda application, we end up with the equivalent of:
((Func<int, int> f) => f(1))(new Func<int, int>(Math.Abs))
except that the new
expression is not represented by a NewExpression
but rather by a ConstantExpression
containing the delegate instance.
When applying DelegateInvocationInliner.Apply
to this expression, the resulting expression tree will be equivalent to:
Math.Abs(1)
and thus avoid the indirection of delegate invocation.
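A sketch, assuming Apply is a static method taking and returning an Expression:
Expression target = Expression.Constant(new Func<int, int>(Math.Abs));
Expression invoke = Expression.Invoke(target, Expression.Constant(-1));

// The single-target delegate invocation is rewritten into a direct call to Math.Abs.
Expression inlined = DelegateInvocationInliner.Apply(invoke);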
Note: Some implementations of Nuqleon query engines have historically used delegate values for binding targets, rather than
LambdaExpression
s that can get inlined by usingBetaReducer
. Both approaches can be optimized well.
Expression tree interning
Because expression trees are immutable, they can be shared safely. For example, if two expressions have a common subexpression, the same object can be used to represent that common subexpression:
f() + g() + h()
and
f() + g() + i()
can reuse the common subexpression f() + g()
. However, it's not always possible to figure out that two or more expressions have commonalities that present opportunities for sharing. For example, when expression visitors are used to manipulate expression trees, commonalities may not occur until rewrites have completed.
Expression tree interning allows for the detection of common subexpressions across a "forest" of trees, and rewrites expressions to allow for reuse of common subexpressions, thus reducing memory utilization. This is very similar to string.Intern
, although the latter only operates on complete strings, while expression tree interning can consider every subexpression.
Note: Interning is quite expensive due to the computations involved to figure out commonalities between expressions. It's only recommended to use interning if expressions are long-lived. In the context of Nuqleon, this is most useful when expression trees are kept alive in registries or on quotations that are needed to support higher-order query operators (e.g. in
xs.SelectMany(x => f(x))
, the expression x => f(x)
is kept - even after compilation to a delegate - because inner subscriptions need to be able to construct an expression, which will get checkpointed, that represents the observable f(x)
for a given value of x
).
The entrypoint of the interning functionality is the ExpressionInterning.Intern
method:
public static TExpression Intern<TExpression>(this TExpression expression, IExpressionInterningCache cache)
where TExpression : Expression;
When a cache is omitted, a global cache is used. Alternatively, an instance of ExpressionInterningCache
or a custom implementation of IExpressionInterningCache
can be provided.
Similar to string.Intern
, the Intern
method returns an object of the original type which should be used instead of the object that was passed in the first parameter. If interning was successful, the returned object will be a different instance than the one that was passed in. The original instance is no longer needed and can get garbage collected.
For example:
var expr1 = Expression.Add(Expression.Constant(1), Expression.Constant(2));
expr1 = ExpressionInterning.Intern(expr1);
var expr2 = Expression.Multiply(Expression.Constant(2), Expression.Constant(3));
expr2 = ExpressionInterning.Intern(expr2);
After interning both expressions, expr2.Left
will be reference equal to expr1.Right
, because these nodes can be shared. Interning doesn't just work for leaf nodes, it works for subexpressions (in a bottom-up fashion) of any size. For example, if another expression represents f(1 + 2, 2 * 3)
, with unique nodes for 1
, 2
, 1 + 2
, 2
, 3
, and 2 * 3
, interning can substitute the subexpressions 1 + 2
and 2 * 3
using expr1
and expr2
from the sample above.
Diagnostics
To assist with logging or debugging of expression trees, this library provides a ToCSharpString
extension method for Expression
which produces C#-like syntax to represent the tree. Not all constructs in expression trees can be (correctly) represented in C# (e.g. LoopExpression
having a result), so the resulting syntax just looks and feels like C#.
An additional Boolean parameter called allowCompilerGeneratedNames
can be passed to ToCSharpString
to control whether the resulting string can contain compiler-generated names (otherwise, an exception will be thrown when encountering such a name).
Alternatively, the ToCSharp
method can be used which returns a CSharpExpression
, providing additional information about the tree, alongside a string representation. This additional information includes a table of Constants
as well as a list of GlobalParameters
.
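A small sketch of both helpers follows; the rendered output shown in the comment is illustrative only:
using System;
using System.Linq.Expressions;

Expression<Func<int, int>> f = x => x * 2 + 1;

// C#-like rendering of the tree, e.g. something along the lines of "(int x) => x * 2 + 1".
Console.WriteLine(f.ToCSharpString());

// Richer result; its Constants and GlobalParameters members are described above.
var cs = f.ToCSharp();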
Note: Better expression tree printing support is available in https://github.com/bartdesmet/ExpressionFutures/tree/master/CSharpExpressions.
BURS
BURS stands for Bottom-Up Rewrite System and is based on the https://www.researchgate.net/publication/220752446_Simple_and_Efficient_BURS_Table_Generation paper. It provides a table-driven, weight-based approach to code generation. The tables contain rules, each consisting of a pattern to recognize, a cost, and a rewrite.
This library generalizes the BURS mechanism to conversions between a generalized notion of trees, represented as ITree<T>
. In this context, BURS is a table-driven way to build a compiler that translates from an ITree<T>
to an ITree<R>
. The special case of rewriting from an ITree<T>
to an ITree<T>
is a rule-driven optimizer.
As an example, consider a first ITree<T>
where T
represents node information for a tree of some arithmetic language providing unary operators +
and -
, as well as binary operators +
, -
, *
, and /
. Next, consider a second ITree<R>
where R
represents node information for a tree of some broader scientific computing language, which may provide operations such as Negate
, Add
, Inverse
, and Multiply
, etc. In this language, subtraction is represented as an Add
with the second operand being Negate
d first. Similarly, division
is represented as a Multiply
with the second operand being Inverse
d first. Using BURS, we can formulate rewrite rules, like this:
a + b => Add(a, b)
a - b => Add(a, Negate(b))
a * b => Multiply(a, b)
a / b => Multiply(a, Inverse(b))
Both sides really represent trees; the use of infix +
versus prefix Add
is just a means to denote the different domains. The left-hand side is the arithmetic language; the right-hand side is the scientific computing language.
In this example, the patterns on the left are pretty simple. They look for nodes such as +
, -
, etc. and recursively rewrite the left and right operands (represented as "variables" a
and b
). A rewrite across different domains requires an input tree to be fully covered by rules in the table in order to produce a tree in the output domain.
Most likely, our input and output languages also support symbolic variables and constant numbers. In the sample, a
and b
are really just used as wildcards to represent "any subtree" in the input domain, and the result of rewriting these subtrees for construction of a tree in the output domain. To add support for nodes such as Constant
or Variable
, extra rules can be provided:
c (where c is a constant) => Const(/* value of c */)
v (where v is a variable) => Variable(/* name of v */)
For now, let's ignore the details on how to encode the constraints, but let's assume a tree in the input domain that looks like:
(x - y) * 2 + (z - 1)
where x
, y
, and z
are variables, and 2
and 1
are constants. Given the rule tables above, the BURS rewriter will try to cover the tree in a bottom-up fashion:
          +
        /   \
      *       -
     / \     / \
    -   2   z   1
   / \
  x   y
First x
will turn into Variable("x")
, then y
will turn into Variable("y")
. Next, the -
node combines the result of recursively rewriting x
and y
by applying the rule with production Add(a, Negate(b))
, thus yielding Add(Variable("x"), Negate(Variable("y")))
. In a next step, 2
gets converted to Const(2)
, then *
, etc. The ultimate result is:
Add(Multiply(Add(Variable("x"), Negate(Variable("y"))), Const(2)), Add(Variable("z"), Negate(Const(1))))
BURS can also be used to write optimizers. For example, assume we want to simplify expressions in the math language. We could then construct rules tables that look like this:
Negate(Const(c)) => Const(-c)
Negate(Negate(a)) => a
This illustrates that patterns expressed on the left hand side can be more complex and deeply nested. In this case, we match two levels deep, but much more complex rules could be written. For example, if we have a Boolean logic language, we could express De Morgan's rules:
Not(And(Not(a), Not(b))) => Or(a, b)
Not(Or(Not(a), Not(b))) => And(a, b)
Building on the earlier example, we could use BURS to create a reverse translation as well, which involves more complex patterns to match on:
Add(a, b) => a + b
Add(a, Negate(b)) => a - b
Multiply(a, b) => a * b
Multiply(a, Inverse(b)) => a / b
However, in this case, we also need to supply a way to support Inverse
and Negate
when these occur by themselves, which contributes rules like these:
Negate(a) => 0 - a
Inverse(a) => 1 / a
To prefer one pattern over another, rules can be assigned weights. In the original context of BURS, weights typically corresponded to the cost of the emitted code for a pattern (e.g. if an instruction set supports some addmul
ternary operation, it may be cheaper than successive applications of add
and mul
operations, thus a rule producing addmul
would have a cost that's lower than the sum of add
and mul
individually).
BURS support in this library is provided through types such as BottomUpRewriter<...>
. An example taken from test code is shown below:
var burw = new BottomUpRewriter<ArithExpr, ArithNodeType, NumExpr, ArithWildcardFactory>
{
    // Leaf nodes
    Leaves =
    {
        { (Const c) => new Val(c.Value), 1 },
    },
    // Tree patterns
    Rules =
    {
        { (l, r) => new Add(l, r), (l, r) => new Plus(l, r), 2 },
        { (l, r) => new Mul(l, r), (l, r) => new Times(l, r), 3 },
        { (a, b, c) => new Add(new Mul(a, b), c), (a, b, c) => new TimesPlus(a, b, c), 4 },
        { x => new Add(x, new Const(1)), x => new Inc(x), 1 },
    },
    Log = logger
};
Here, we add rules for leaf nodes (through Leaves
) like constants (or variables, as we alluded to in the earlier examples), as well as for tree patterns (through Rules
). Such rules are triplets of a pattern to match on in the source domain, a production in the target domain, and a weight.
Once a BURS rewriter has been constructed with all tables populated, the rewriter can be applied to an input tree, producing an output tree (or an error). For example:
// (((2 * 3) + 1) * (4 * 5)) + 6
var e = new Add(
    new Mul(
        new Add(
            new Mul(
                new Const(2),
                new Const(3)
            ),
            new Const(1)
        ),
        new Mul(
            new Const(4),
            new Const(5)
        )
    ),
    new Const(6)
);
var res = burw.Rewrite(e);
In this example, the input is an ArithExpr
, and the output in res
is of type NumExpr
. The result for the rewrite shown above is:
TimesPlus(Inc(Times(2, 3)), Times(4, 5), 6)
where one can observe the selection of the third rewrite rule for trees of the form a * b + c
because the weight of that rule (4
) is less than the cumulative weight (5
) for successive applications of rewrite rules for l * r
(weight 3
) and l + r
(weight 2
).
To work with BURS, trees have to be implemented using the ITree<T>
abstraction. To use BURS over System.Linq.Expressions.Expression
trees, the library provides a ToExpressionTree
conversion from Expression
to ITree<ExpressionTreeNode>
.
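A minimal sketch of this conversion, assuming ToExpressionTree is exposed as an extension method:
using System;
using System.Linq.Expressions;

Expression<Func<int, int>> f = x => x + 1;

// Wraps the LINQ expression tree in the generalized ITree<T> abstraction used by BURS.
ITree<ExpressionTreeNode> tree = f.ToExpressionTree();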
Note: BURS has been used to build table-driven query providers (e.g. going from expression trees to some
ITree<SqlNode>
for translation to SQL) and optimizers. Mileage varies depending on the complexity of the rule matching involved. Further extensions of BURS have been written in spin-off projects, modeling type system checks (e.g. how does a rule involving a method invocation object.Equals(object)
relate to rules that involve an override of this virtual method on a more derived type?), supporting additional predicates to drive the rule selection process, and dynamic computation of weights.
Miscellaneous utilities
This section contains documentation for utilities that don't really fit in any other section.
ReflectionHelpers
Provides a set of InfoOf
methods that obtain a MemberInfo
from an expression tree. This is a mechanism akin to typeof
for types but targeting methods, properties, fields, and constructors instead (much like a hypothetical C# infoof
operator could do). For example:
var m = (MethodInfo)ReflectionHelpers.InfoOf((string s, int x, int y) => s.Substring(x, y));
will return the MethodInfo
representing string.Substring(int, int)
.
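The same mechanism works for other member types as well. The examples below are hedged sketches; the exact MemberInfo subtype returned depends on the member referenced in the lambda:
using System.Collections.Generic;
using System.Reflection;

// Property access yields the member info for string.Length.
var lengthInfo = ReflectionHelpers.InfoOf((string s) => s.Length);

// Object creation yields the ConstructorInfo for List<int>'s parameterless constructor.
var ctorInfo = (ConstructorInfo)ReflectionHelpers.InfoOf(() => new List<int>());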
RuntimeCompiler
The runtime compiler uses System.Reflection.Emit
to build anonymous types, closure types, and so-called record types.
Anonymous types
Runtime-generated anonymous types are analogous to C# 3.0 and VB 9.0 anonymous types. Both flavors can be built using different overloads of CreateAnonymousType
:
public static Type CreateAnonymousType(IEnumerable<KeyValuePair<string, Type>> properties);
public static Type CreateAnonymousType(IEnumerable<StructuralFieldDeclaration> properties);
public static Type CreateAnonymousType(IEnumerable<KeyValuePair<string, Type>> properties, params string[] keys);
public static Type CreateAnonymousType(IEnumerable<StructuralFieldDeclaration> properties, params string[] keys);
Overloads with StructuralFieldDeclaration
support adding custom attributes to the generated properties. The difference between overloads that lack a keys
parameter and the ones that have one has to do with which properties participate in the implementation of Equals
and GetHashCode
. Being able to specify particular properties as "keys" matches the design of anonymous types in Visual Basic.
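For example, the sketch below creates and instantiates a two-property anonymous type; the assumption (mirroring C# anonymous types) is that the generated type exposes a constructor taking all property values in declaration order:
using System;
using System.Collections.Generic;

Type anon = RuntimeCompiler.CreateAnonymousType(new[]
{
    new KeyValuePair<string, Type>("Name", typeof(string)),
    new KeyValuePair<string, Type>("Age", typeof(int)),
});

// Instantiation through reflection; the constructor shape is an assumption (see above).
object person = Activator.CreateInstance(anon, "Bart", 21);
Console.WriteLine(anon.GetProperty("Name").GetValue(person)); // Bart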
Closure types
Closure types are simply classes that declare a bunch of fields. To create a closure type at runtime, use the CreateClosureType
method:
public static Type CreateClosureType(IEnumerable<KeyValuePair<string, Type>> fields);
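For example, a sketch of a closure type holding a captured int and an int[]:
using System;
using System.Collections.Generic;

Type closure = RuntimeCompiler.CreateClosureType(new[]
{
    new KeyValuePair<string, Type>("x", typeof(int)),
    new KeyValuePair<string, Type>("xs", typeof(int[])),
});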
Record types
The notion of record types in this library predates C# 9.0's record types by almost a decade. Record types in this library are classes that are similar to anonymous types but provide control over the implementation of equality (value versus reference equality). Use the CreateRecordType
method to create them:
public static Type CreateRecordType(IEnumerable<KeyValuePair<string, Type>> properties, bool valueEquality);
public static Type CreateRecordType(IEnumerable<StructuralFieldDeclaration> properties, bool valueEquality);
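For example, a sketch of a record type that uses value equality over its two properties:
using System;
using System.Collections.Generic;

Type person = RuntimeCompiler.CreateRecordType(new[]
{
    new KeyValuePair<string, Type>("Name", typeof(string)),
    new KeyValuePair<string, Type>("Age", typeof(int)),
}, valueEquality: true);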
Define* method variants
In addition to the Create*
methods illustrated above, variants with the Define*
prefix are provided as well. Rather than returning a Type
, these accept a TypeBuilder
to define the type on. These variants are useful when trying to build recursive types (i.e. there's a cycle between declarations and uses of types), because one can use TypeBuilder
instances for the types of the properties on the anonymous or record type being constructed. Once all types have been defined, the user can then call CreateType
on the TypeBuilder
instances. Examples of types with cycles are:
// A -> A
class A
{
    public A Next { get; set; }
}

// B -> C
class B
{
    public C C { get; set; }
}

// C -> B
class C
{
    public B B { get; set; }
}
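A hedged sketch of defining the mutually recursive B and C shapes above as runtime record types; the exact Define* signatures are an assumption here (a TypeBuilder first, followed by the same arguments as the corresponding Create* method):
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

var asm = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("Records"), AssemblyBuilderAccess.Run);
var mod = asm.DefineDynamicModule("Records");

TypeBuilder b = mod.DefineType("B");
TypeBuilder c = mod.DefineType("C");

// Each type refers to the other through its TypeBuilder (a TypeBuilder is a Type).
RuntimeCompiler.DefineRecordType(b, new[] { new KeyValuePair<string, Type>("C", c) }, /* valueEquality */ true);
RuntimeCompiler.DefineRecordType(c, new[] { new KeyValuePair<string, Type>("B", b) }, /* valueEquality */ true);

// Finalize both definitions once all members have been declared.
Type bType = b.CreateType();
Type cType = c.CreateType();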