Reaqtor's Open Source Journey
Endjin's journey with Reaqtor started in 2016, with a hot cup of tea and a simple question:
"What single action could you take which would deliver the biggest positive impact on customer satisfaction?"
The question popped up in a Brain Trust session with Ilario Corna, Head of Infrastructure, Content & Operations at Talk Talk (a UK Telco). The organisation had just rolled out an objective to become the UK's "most recommended provider". As part of this strategic initiative, we had designed and implemented an Azure-based solution capable of ingesting and analysing over 200 million network telemetry events per day for anomalous behaviour. Feeling buoyed by success, we were wondering "which hard problem could we tackle next?"
Ilario had the answer immediately. "Oh, that's easy" he said, "I would send a field engineer out to each customer's house, run diagnostics on their broadband connection and hardware, upgrade their firmware and implement any other tweaks required. But with over 1 million customers that's not commercially feasible to implement."
After taking a sip of tea and pondering for a few moments I responded "So we have a solution, but we need to find a way to implement and scale it several orders of magnitude more cheaply. If I were in your shoes and could wave a magic wand, I'd want to be able to say something like 'Cortana, monitor the Quality of Service for each of my customers' connections, and if it's below our SLA, then run a troubleshooter to diagnose why and then apply a number of automatic remediation strategies'. Wouldn't that be a great way to manage the network?"
He responded with a chuckle, "Sure would! When can I have it?!?" We both laughed then, because it sounded like a far-off sci-fi fantasy.
On the train home, that part of the conversation kept coming back to me. Why couldn't we create some type of digital agent that could monitor not only the customer's device, but every part of the network infrastructure that led to the customer's home? The nuance was that unlike traditional "big data" problems, whereby you need to reduce vast amounts of data into a format a human can comprehend, this type of problem is more akin to signals analysis: you need to process the full fidelity raw data in a stateful way.
If you have 1 million customers, you need to demultiplex those 200 million messages into 1 million queries. Each query needs to look at a message and ask "does this apply to me?" If so, it processes the event, updates its quality of service metric, evaluates whether that metric has crossed a threshold and, if it has, triggers a new "threshold exceeded" event. Articulating the problem this way sounded a bit like the Actor Model, or perhaps an Rx query.
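To make that shape concrete, here is a minimal Python sketch of the idea; it is not Rx and it is not Reaqtor code. Every name in it (`CustomerQuery`, `demultiplex`, the `ok` field, the failure-rate metric and its thresholds) is a hypothetical stand-in chosen for illustration. Each customer gets a small stateful query, and a dictionary lookup stands in for the "does this apply to me?" check so that each event reaches only the query that owns it.

```python
from dataclasses import dataclass


@dataclass
class CustomerQuery:
    """Hypothetical per-customer stateful query (illustration only)."""
    customer_id: str
    threshold: float = 0.5   # assumed: alert when the failure rate exceeds this
    min_events: int = 3      # assumed: wait for a few events before judging
    total: int = 0
    failures: int = 0
    alerted: bool = False

    def on_event(self, event):
        # Update the running quality of service metric for this customer.
        self.total += 1
        if not event["ok"]:
            self.failures += 1
        rate = self.failures / self.total
        # Evaluate the threshold and emit a new event the first time it is crossed.
        if self.total >= self.min_events and rate > self.threshold and not self.alerted:
            self.alerted = True
            return {"type": "threshold_exceeded",
                    "customer_id": self.customer_id,
                    "failure_rate": rate}
        return None


def demultiplex(events, queries):
    """Route each event to the one query it applies to and collect any alerts."""
    alerts = []
    for event in events:
        query = queries.get(event["customer_id"])  # "does this apply to me?"
        if query is None:
            continue
        alert = query.on_event(event)
        if alert is not None:
            alerts.append(alert)
    return alerts


events = [
    {"customer_id": "a", "ok": False},
    {"customer_id": "b", "ok": True},
    {"customer_id": "a", "ok": False},
    {"customer_id": "a", "ok": True},
    {"customer_id": "b", "ok": True},
]
queries = {cid: CustomerQuery(cid) for cid in ("a", "b")}
alerts = demultiplex(events, queries)
```

In this toy run only customer "a" crosses the assumed threshold. The hard part, which the sketch ignores entirely, is keeping a million of these queries stateful, durable and reliable.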
A few years previously we had worked on a DevOps project for a major UK online retailer. We used Rx to aggregate and disaggregate semantic logging events from all the servers in the data centre. The Rx query we implemented was simple, elegant, and incredibly powerful; a testament to the design of Rx, and a reason we're huge fans of it at endjin.
I wondered if it would be possible to run 1 million Rx queries concurrently, and remembered that Bart De Smet had presented a session at NDC 2015 about Cloud Event Processing the previous year, so I found and watched the recording. In that talk he mentioned that he was working in the Bing organisation on an evolution of Rx which powered Cortana experiences, and this involved stateful, durable, high density Rx queries. This sounded very promising indeed!
While researching to see if this technology had been open sourced, I came across a Microsoft case study about "Bing Cortana" (unfortunately no longer public) that mentioned a technology called Reactor, which was used to evaluate 500 million queries per second. If this was the same technology, we only needed 1/500th of that capacity in order to solve our broadband Quality-of-Service monitoring scenario.
I reached out to David Goon, who was our Partner Manager at Microsoft, explained the situation and asked "do you think it would be within the realms of possibility for us to get access to this technology?" He responded "It's a really compelling use case, and I can see there being lots more across other sectors. Let me reach out to Bart De Smet and see what he says. We can only try." Within a couple of weeks, we had an hour-long call with Bart where we took him through the scenario we had elaborated, and asked if he thought Reactor would be a viable solution. He said that it seemed to be a good fit, and we agreed to start collaborating on a proof of concept.
It's hard to convey what an absolute honour and privilege it has been to work with Bart, and I will be forever grateful for the amount of his incredibly valuable time he dedicated to this endeavour. It goes without saying that without him, absolutely none of what follows would have been possible. Reaqtor is Bart's baby; we've just had the privilege of delivering it into the world.
Once Bart provided us with access to some sample Reactor code, Mike Larah and I paired on a "simple scenario": an entirely in-memory version of Reactor hosted in a Console App, based on the NDC talk code sample Bart shared with us. We wanted to see if we could solve the broadband anomaly detection problem by processing a stream of RADIUS network telemetry and detecting the specific events that indicate there were problems in the broadband network. In this prototype we modelled the network as a graph, which allowed us to determine if an issue was localised to the customer's home or was occurring at a local (an outage for a street) or regional node level (an outage for a town). It worked, and we were fizzing with excitement.
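As a rough illustration of why a graph helps here, the Python sketch below walks up a tiny, made-up topology to find the highest node whose homes are all reporting faults. It is not the prototype's code: the node names, the `PARENT` map and the "every home under the node is faulty" rule are all assumptions made for the example.

```python
# Hypothetical topology: each node maps to its parent (None for the root).
PARENT = {
    "home-1": "street-A", "home-2": "street-A",
    "home-3": "street-B", "home-4": "street-B",
    "street-A": "town-X", "street-B": "town-X",
    "town-X": None,
}


def homes_under(node):
    """Return the set of homes (leaf nodes) at or below `node`."""
    children = [n for n, parent in PARENT.items() if parent == node]
    if not children:
        return {node}  # a leaf is a home
    homes = set()
    for child in children:
        homes |= homes_under(child)
    return homes


def locate_fault(home, faulty_homes):
    """Return the highest node above `home` whose homes are all faulty.

    Assumed rule: an outage is attributed to a street or town only when
    every home beneath that node is reporting a fault.
    """
    located = home
    node = PARENT[home]
    while node is not None and homes_under(node) <= faulty_homes:
        located = node
        node = PARENT[node]
    return located


home_only = locate_fault("home-1", {"home-1"})
street_level = locate_fault("home-1", {"home-1", "home-2"})
town_level = locate_fault("home-1", {"home-1", "home-2", "home-3", "home-4"})
```

With one faulty home the fault stays at the home; with both homes on a street it moves up to the street; with every home in the town it moves up to the town.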
One of the key points you need to understand is that Reactor is orders of magnitude bigger than Reactive Extensions, with over 166 projects and 90 NuGet packages, and yet it is still a framework, not a platform. While state, durability and reliability are defined in the design of the framework, the implementation is missing, as this is tied to the hosting environment.
Reactor was born into Bing and adopted by M365 and other product teams. Bart had done a fantastic job of ensuring a clean separation of Reactor from the hosting environment, but it meant that we had to tackle the thorny problem of building our own reliable hosting platform, state store, ingress & egress adapters, management & control plane, and workbench for talking to Reactor. We had a mountain to climb.
When it comes to solving hard problems, Ian Griffiths has always been our first port of call; he's been an endjin associate since we founded the company. I knew he was very passionate about Rx as he'd included a chapter about the subject in his Programming C# series of books for O'Reilly, and we'd had many conversations about it. Over a few months we implemented an initial spike of a reliable hosting platform, and had an absolute blast while doing it. Shortly afterwards Ian joined endjin full time as our first Technical Fellow, so that he could continue to work on the endeavour.
Over the next few years, we built several technical spikes exploring different facets of the Reactor technology. We embedded ML.NET models into Reaqtor queries, and we created a proof-of-concept anomaly detection temporal query (using virtual time) that could correctly identify broadband quality of service events that indicate outages, processing a day's worth of telemetry (around 40GB) in under 90 seconds. The ability to do "what-if" simulations over historic datasets, which we could then flip a switch and turn into live queries running against real-time data, blew us away. I want to express my thanks to Ben Dyer and Alex KeySmith for providing us with the anonymised dataset for this proof of concept - they were early believers.
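The virtual time idea can be sketched in a few lines of Python. This is an illustrative toy, not how Reaqtor implements it: the `outages` function, the `"drop"` event kind and the window settings are all hypothetical. The point is that the query reads time from the event timestamps rather than from a wall clock, so the same logic can replay a historic dataset as fast as the machine allows or consume a live feed.

```python
from collections import deque


def outages(events, window_seconds=60, max_drops=3):
    """Yield a timestamp whenever more than `max_drops` drops fall in the window.

    `events` is any iterable of (timestamp_seconds, kind) pairs. Time is
    "virtual": it advances only as timestamps arrive, never via a wall clock.
    """
    recent = deque()
    for timestamp, kind in events:
        if kind != "drop":
            continue
        recent.append(timestamp)
        # Discard drops that have fallen out of the sliding window.
        while timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) > max_drops:
            yield timestamp


# Replaying a small, made-up historic dataset.
historic = [(0, "drop"), (10, "drop"), (20, "ok"),
            (30, "drop"), (50, "drop"), (200, "drop")]
detected = list(outages(historic))
```

Because `outages` accepts any iterable, passing it a generator that yields events as they arrive would run the same query against live data.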
In May 2018 Matthew Adams, my co-founder at endjin, & I flew over to Seattle for the Microsoft Build conference and we managed to spend our last day of the trip with Bart. In the morning Bart told us the story of how Reactor came to be (this is covered in detail in the "A Little History of Reaqtor" ebook you can download for free on the homepage). Over a fantastic lunch we waxed lyrical about the future of data processing, and Bart talked about his vision for IComputationProcessing (more about this in the ebook!) and then in the afternoon we talked about whether we could open source Reactor and what that process would entail.
Carmel Eve, who had interned with us the previous year, re-joined endjin on our Apprenticeship Programme. Ian and Carmel started exploring Reactor with other proofs of concept covering different customer scenarios we had identified. Throughout this time Carmel documented her learnings about Rx and writing high performance C# in a popular blog post series. The more we investigated Reactor, the more impressed we became.
In March 2019 I attended the MVP Summit in Redmond, and I bumped into Jon Galloway, who at the time was the Executive Director of the .NET Foundation. I gave him as brief a Reactor elevator pitch as I could, and said that endjin would love to become .NET Foundation Corporate Sponsors, as we wanted to demonstrate our commitment to the .NET ecosystem. At the same time, we were collaborating with Bart to create whitepapers and a business case to justify the investment from Microsoft in the legal processes required to open source Reactor to the .NET Foundation under the MIT License.
By the end of the year we'd filled out the paperwork, paid our sponsorship fee, and were featured on the .NET Foundation homepage alongside AWS, DevExpress, Microsoft, Octopus Deploy, Telerik, Uno Platform and VMware.
At this point, we had to face the problem that we had been avoiding for the past 3 years: what do we call it? "Reactor" as a name is too overused. Microsoft Reactor is the name of the Microsoft Community meet up venues around the world. Project Reactor is the name of a JVM project from VMware, inspired by Rx, and based on Reactive Streams. React is the hugely popular reactive user interface library from Facebook. One of the key concepts that powers Reactor is IQbservable. We've always loved this weirdly named interface (partly due to how Bart pronounces it - I-Cub-servable). Carmel came up with the suggestion that we include the "Q" in the name and as soon as Reaqtor was written down, it received unanimous agreement.
This also neatly solved two other naming problems we faced. Reaqtor consists of three conceptual layers. At the bottom is a set of reusable components (that can be used independently of Reaqtor) which provide compiler, JSON and LINQ, and high performance / low memory extensions and utilities, which we named Nuqleon. The middle layer contains reactive primitives that evolve many of the concepts found in Rx to enable reliable, stateful and distributed reactive processing, which we named Reaqtive. And finally the top level consists of platform level services for building reliable, stateful, distributed reactive solutions, called Reaqtor.
We started to collaborate with the .NET Foundation to onboard the project. Special thanks to Claire Novotny who helped us set up the DevOps processes on the .NET Foundation infrastructure and all the other bits of administrivia required to prepare the project to be released to the public.
We also worked with Rodney Littles and the .NET Technical Steering Group to elaborate an RFC to modernise the Expression Tree subsystem, as it has not been invested in for many years and does not have parity with the latest language features. These improvements are fundamental to the future of Reaqtor. Fortunately there seems to be some progress in this area as the Entity Framework team would also like this modernisation to occur.
Part of the problem with Expression Trees is that they are one of the most complex and least well documented parts of .NET; they are also perceived only to exist to power LINQ (and LINQ Providers). As you'll see with Reaqtor (and in particular the Nuqleon layer, which is where the Expression Tree & Bonsai subsystem lives), Expression Trees really are one of .NET's superpowers and enable mind-bending meta-programming scenarios.
One of the final puzzle pieces was how we could create a great documentation experience for the community. Ian has a long history in teaching programming: DevelopMentor, Pluralsight and his O'Reilly books. We've had many conversations about what a good learning environment looks like: the ability to tell a cohesive narrative including text, images, videos and code samples that you can execute in context (to eliminate task switching). Microsoft Docs have done an excellent job in this respect, but how can you offer a similar experience when you are a small open source project?
We initially investigated Try .NET, which led us to talking to Jon Sequeira, Dr Diego Colombo, Maria Naggaga and Brett Forsgren. Initial conversations quickly turned to their current project .NET Interactive, which seemed perfect for our needs, not only for interactive documentation but also as an IDE for writing, testing and running Reaqtor queries.
We soon realised that .NET Interactive was designed around a request/response (REPL) model, but we'd really need to support long running in-process and out of process reactive queries. We collaborated with the team on a design for a client-side command pattern that would allow us to define custom commands and send them between the kernels. Ian implemented the feature and the PR was merged into the .NET Interactive codebase, allowing us to create great documentation and interesting demos, that you can use to understand Reaqtor, Reaqtive & Nuqleon.
The final significant contributor to the project is Felix Corke who designed the branding and creative assets which we have made available under a Creative Commons Attribution Share Alike 4.0 International license, so that the community can use them in blog posts, videos and presentations about Reaqtor. We have also created a Community Presentations repository containing branded, pre-canned presentations and demos using the existing interactive notebooks, so that you can easily do a presentation or lightning talk at your local .NET User Group.
Our journey with Reaqtor only covers a short period in its overall history. In his (free) ebook, A Little History of Reaqtor, Bart De Smet tells the full story... starting in 2005. It is an absolutely fascinating account, and I hope by the time you finish reading it, you'll understand why we believe Reaqtor is the most exciting technology in the .NET ecosystem, that it's a game changer, and why we're so excited that you can get your hands on the framework, run the demos, read through the conceptual and API documentation, watch our growing collection of talks, read our blog (and don't forget to subscribe to our RSS feed) or chat to us via Slack. Please ⭐ the GitHub Repos, as we'd love to gauge community interest in the project.
There's currently a significant feature gap between the Reaqtor components that have been open sourced and what's required to implement a large scale production Reaqtor platform. That's why we've spent 5 years building reaqtor.cloud, a full, commercial cloud-native platform implementation of Reaqtor. We hope you'll join the waiting list for the upcoming private preview.