Things to know before using LINQ Providers (LINQ to something) or PLINQ (Parallel LINQ to objects)
Today I want to continue my explanation of extension methods and why you should know how they work before trying to use LINQ providers or PLINQ. The thing to remember from my previous post is that, when it comes to extension methods, method resolution occurs at compile time and not at runtime. We will see today why you should care!
Imagine that you want to execute several CPU-bound, time-consuming actions, and you’ve spotted a parallelization opportunity… I’ve tried to keep the sample as simple as possible, so here is a delegate that spin-waits and returns the id of the thread on which it runs:
Func<int, int> spinWaitAndGetThreadId = i =>
{
    Thread.SpinWait(100000000);
    return Thread.CurrentThread.ManagedThreadId;
};
If we call that delegate through either a “normal” Select or a PLINQ Select, we can see the difference: several managed threads are used in the parallel case (which is of course why PLINQ was introduced!).
int[] sequential = Enumerable.Range(0, 16)
    .Select(spinWaitAndGetThreadId)
    .ToArray();

int[] parallel = Enumerable.Range(0, 16)
    .AsParallel()
    .Select(spinWaitAndGetThreadId)
    .ToArray();

Assert.IsTrue(sequential.Distinct().Count() == 1);
Assert.IsTrue(parallel.Distinct().Count() > 1);
The AsParallel extension method returns an instance of ParallelQuery<T>, which also implements IEnumerable<T>. Because the compile-time type of the expression is now ParallelQuery<T>, the next method calls no longer bind to the extension methods for IEnumerable<T> (defined in the Enumerable class), but to those for ParallelQuery<T> (defined in the ParallelEnumerable class).
If any call in the chain after AsParallel (whether an instance method, a static method or an extension method) returns an IEnumerable<T>, then the following extension method calls bind to the IEnumerable<T> versions again, and the rest of the computation becomes sequential.
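To make the compile-time binding visible, here is a minimal sketch. The Describe helpers are hypothetical (they are not part of LINQ or PLINQ); they exist only to report which overload the compiler picked for the very same object, depending on the declared type of the variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BindingDemo
{
    // Hypothetical helpers, for illustration only:
    // each one reports which overload the compiler bound the call to.
    public static string Describe<T>(this IEnumerable<T> source)
    {
        return "IEnumerable<T> overload";
    }

    public static string Describe<T>(this ParallelQuery<T> source)
    {
        return "ParallelQuery<T> overload";
    }

    public static void Main()
    {
        // One object, seen through two compile-time types.
        ParallelQuery<int> asParallelQuery = Enumerable.Range(0, 16).AsParallel();
        IEnumerable<int> asEnumerable = asParallelQuery;

        Console.WriteLine(asParallelQuery.Describe()); // ParallelQuery<T> overload
        Console.WriteLine(asEnumerable.Describe());    // IEnumerable<T> overload
    }
}
```

The runtime type is the same in both calls; only the static type of the expression differs, and that is all the compiler looks at when it resolves an extension method.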
For instance, if we have a method such as this one:
public static IEnumerable<T> NaiveCustomReverse<T>(
    this IEnumerable<T> source)
{
    Stack<T> stack = new Stack<T>(source);
    foreach (T item in stack)
    {
        yield return item;
    }
}
And we change the previous parallel query to:
int[] parallel = Enumerable.Range(0, 16)
    .AsParallel()
    .NaiveCustomReverse()
    .Select(spinWaitAndGetThreadId)
    .ToArray();
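Then the assertion on the parallel result (parallel.Distinct().Count() > 1) fails, because NaiveCustomReverse returns an IEnumerable<T>, so the Select that follows is the sequential one and everything has run on the same thread.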
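One possible way out, sketched here under assumptions rather than offered as the definitive fix, is to call AsParallel after the operator that returns IEnumerable<T>, so that the compile-time type is ParallelQuery<T> again when Select is resolved. The class name ReparallelizeDemo and the RunQuery method are hypothetical, and the spin count is reduced to keep the sample quick. How many distinct threads you actually get depends on the machine, so the sample prints the count instead of asserting it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ReparallelizeDemo
{
    // Same operator as in the post: its return type is IEnumerable<T>.
    public static IEnumerable<T> NaiveCustomReverse<T>(this IEnumerable<T> source)
    {
        Stack<T> stack = new Stack<T>(source);
        foreach (T item in stack)
        {
            yield return item;
        }
    }

    // Hypothetical helper: runs the query and returns the thread id used for each item.
    public static int[] RunQuery()
    {
        Func<int, int> spinWaitAndGetThreadId = i =>
        {
            Thread.SpinWait(10000000);
            return Thread.CurrentThread.ManagedThreadId;
        };

        return Enumerable.Range(0, 16)
            .NaiveCustomReverse()            // static type: IEnumerable<int>
            .AsParallel()                    // static type: ParallelQuery<int>
            .Select(spinWaitAndGetThreadId)  // binds to ParallelEnumerable.Select
            .ToArray();
    }

    public static void Main()
    {
        int[] threadIds = RunQuery();
        Console.WriteLine("Distinct threads used: " + threadIds.Distinct().Count());
    }
}
```

The reverse itself still runs sequentially; only the calls after the second AsParallel are parallel. Also note that PLINQ does not preserve the source order unless you ask for it with AsOrdered.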
Postscript:
Besides PLINQ, I also wanted to give a sample using LINQ to SQL. But while working on the code samples, I also finished reading the e-book built out of the Edulinq blog series by Jon Skeet. I recommend it to anyone; it has really been a pleasure to read!
The very last operator implemented in that series is AsEnumerable. It gives exactly the kind of explanation I was hoping to provide! So I’m not going to compete, and I’ll finish with a quote from his blog:
“it’s all about changing the compile-time type of the expression”