Sequences, LINQ, Rx, & Reaqtor Part 1: Sequences and IEnumerable<T>
To understand Reaqtor, it is necessary to understand Rx (the Reactive Extensions for .NET). And to understand Rx, it is necessary to understand how C# works with sequences of items. In this series I will outline the ideas at the heart of Reaqtor, and how they are handled in C#.
Sequences and IEnumerable<T>
At Reaqtor's heart lies an incredibly simple abstraction: a sequence of items. It's just one damn thing after another, so to speak.
C# has had integrated language support for sequences of items from day one. Perhaps the most obvious example is the array, but that's not the most important: there are many different ways in which one might want to make a sequence of items, and while arrays matter because they are an intrinsic feature of the .NET runtime, there are many reasons you might not always want to use one. For example, .NET arrays have a fixed size. For this reason, we might often instead choose something like List<T>
, a generic type that provides something strongly resembling a resizable array.
But there are often good reasons to use something that does not much resemble a dynamic array. Perhaps you need to support very long sequences with the ability to insert new items in the middle. Array-like structures can't do this efficiently, because if you want to insert an item in the middle (or at the start) of an array, you need to move all the items after the insertion point along by one position to make room for the new item. A linked list, on the other hand, is able to support insertion extremely cheaply. But it is absolutely terrible at finding the nth item, something arrays are extremely good at. This is why we have multiple data structures—there are many ways of representing things, each with its own strengths and weaknesses.
However, it is useful to be able to write code that doesn't need to concern itself with the details of the data structure. For example, if you want to calculate the sum of all the integers in a sequence, it's handy to be able to write code that works independently of the exact representation. And you can in C#—you can write this sort of thing:
int total = 0;
foreach (int i in sequence)
{
total += i;
}
This code can work across all sorts of different representations—it will work with native .NET arrays, or with List<T>
, but also with collections that use radically different data structures such as LinkedList<T>
, HashSet<T>
, or ImmutableList<T>
.
This is possible because the foreach
statement recognizes a pattern embodied by the IEnumerable<T>
interface. All of the aforementioned collection types implement IEnumerable<T>
and can therefore be consumed by a foreach
loop. (Language pedants might point out that IEnumerable<T>
is not strictly a requirement, because the compiler will in fact accept any type that offers the necessary members. I would respond by suggesting said pedants reread the carefully worded first sentence of this paragraph, which accommodates this fact without letting it get in the way of things. Experienced writers will then note that I've blown it by drawing attention to the issue in this parenthetical; I have no defence on that point. In practice, IEnumerable<T>
matters because as we will see, even though the C# foreach
keyword is pretty flexible, many language features insist on this particular interface. And in practice, it's more or less ubiquitous—anything you might want to iterate over with foreach
is likely to implement IEnumerable<T>
.)
An important aspect of IEnumerable<T>
is that it does not offer random access—it doesn't have an indexer enabling you to retrieve, say, the 1000th item. The way it works is that you call its GetEnumerator()
method, which returns an IEnumerator<T>
, which you then ask for items by repeatedly calling MoveNext()
. If MoveNext()
returns false, that means you've reached the end of the sequence. (Sometimes sequences are empty, in which case the very first call to MoveNext()
will return false.) If it returns true, that means a new item is available, which you can then retrieve from the enumerator's Result
property. Once you're done with an enumerator (either because you reached the end, or because you've decided not to fetch all the elements) you call Dispose()
on it. The foreach
keyword generates all this code for you.
This one-at-a-time behaviour can sometimes be a little frustrating for the consumer of an IEnumerable<T>
but it enables considerable variety of implementation. It makes it easy for a linked list to implement, for example.
A more subtle upshot of the nature of IEnumerable<T>
is that it means the sequence of objects doesn't necessarily have to exist in advance. For example, you could write an IEnumerable<int>
that returns numbers produced by the Random
class. Sequences that produce their items on demand like this are sometimes known as 'lazy' because they don't do work until they absolutely have to. Another interesting characteristic of this hypothetical random number generating IEnumerable<T>
is that it never needs to end—you could make its MoveNext()
return false after some number of calls, but you don't have to—you could just hardwire MoveNext()
to return true, because you can carry on producing random numbers indefinitely. This could be described as an 'infinite' sequence. Infinite sequences are necessarily lazy (because any attempt to produce all the items up front would run out of memory), although not all lazy sequences are infinite.
So in summary, IEnumerable<T>
is the normal way in .NET to represent sequences of items. Collection types (List<T>
, LinkedList<T>
etc.) almost invariably implement it, but it's also possible to write lazy implementations that produce items on demand, possibly without end.
In the second part of this series, we'll investigate LINQ.