Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta
Tags

Monday
Mar232009

## High-perf SelectInParallel in 120 lines of C#

A few months ago at Lokad, we started working on 8-core machines. Multi-core machines need adequate algorithmic design to leverage their processing power; and such a design can be more or less complicated depending of the algorithm that you are trying to parallelize.

In our case, there were many situations where the parallelization was quite straightforward: large loops, all iterations being independents. At that time, PLinq, the parallelization library from Microsoft wasn't still available as a final product (it will be shipped with Visual Studio 2010). Thus, since we were quite in a hurry, we decided to code our own SelectInParallel method (code being provided below). Basically, it's just Select but with a parallel execution for each item being selected.

Although, being surprisingly simple, we found out that, at least for Lokad, SelectInParallel alone was fitting virtually 99% of our multi-core parallelization needs.

Yet, when we did start to try to speed-up algorithms with our first SelectInParallel implementation, we did end-up stuck with poor speed-up ratio at 3x or even 2x where I was expecting near 8x speed-up.

At first, I thought it was an illustration of the Amdahl's law. But a more detailed performance investigation did show I was just plain wrong. The harsh reality was: threads, when not (very) carefully managed, involve a (very) significant overhead.

Our last SelectInParallel implementation is now 120 lines long with a quasi-negligible overhead, i.e. bringing a near linear speed-up with the number of CPU cores on your machine. Yet, this performance wasn't easy to achieve. Let's review two key aspects of the implementation.

Avoid synchronization altogether: At first, we were using a Producer - Consumer threading pattern. Bad idea again: it produces a lot of locking contention, the work queue becoming the main bottleneck of the process. Instead, an arithmetic trick can be used to let the workers tackle disjoint workset right from the beginning and without any synchronization.

So far, we have been quite satisfied by our 120-lines ersatz to PLinq. Hope this piece of code can help a few other people to get the most of their many-core machines. If you have ideas to improve further the performance of this SelectInParallel implementation, just let me know.

using System;using System.Threading;namespace Lokad.Threading{    ///<summary>    /// Quick alternative to PLinq.    ///</summary>    public static class ParallelExtensions    {        static int _threadCount = Environment.ProcessorCount;        /// <summary>Get or sets the number of threads to be used in        /// the parallel extensions. </summary>        public static int ThreadCount        {            get { return _threadCount; }            set            {                _threadCount = value;            }        }        /// <summary>Fast parallelization of a function over an array.</summary>        /// <param name="input">Input array to processed in parallel.</param>        /// <param name="func">The action to perform (parameters and all the members should be immutable!!!).</param>        /// <remarks>Threads are recycled. Synchronization overhead is minimal.</remarks>        public static TResult[] SelectInParallel<TItem, TResult>(this TItem[] input, Func<TItem,TResult> func)        {            var results = new TResult[input.Length];            if (_threadCount == 1 || input.Length == 1)            {                for(int i = 0; i < input.Length; i++)                {                    results[i] = func(input[i]);                }                return results;            }            // perf: no more thread than items in collection            int threadCount = Math.Min(_threadCount, input.Length);            // perf: start by syncless process, then finish with light index-based sync            // to adjust varying execution time of the various threads.            int threshold = Math.Max(0, input.Length - (int) Math.Sqrt(input.Length) - 2*threadCount);            int workingIndex = threshold - 1;            var sync = new object();            Exception exception = null;            int completedCount = 0;            WaitCallback worker = index =>            {                try                {                    // no need for lock - disjoint processing                    for(var i = (int) index; i < threshold; i += threadCount)                    {                        results[i] = func(input[i]);                    }                    // joint processing                    int j;                    while((j = Interlocked.Increment(ref workingIndex)) < input.Length)                    {                        results[j] = func(input[j]);                    }                    var r = Interlocked.Increment(ref completedCount);                    // perf: only the terminating thread actually acquires a lock.                    if (r == threadCount && (int)index != 0)                    {                        lock (sync) Monitor.Pulse(sync);                    }                }                catch (Exception ex)                {                    exception = ex;                    lock (sync) Monitor.Pulse(sync);                }            };            for (int i = 1; i < threadCount; i++)            {                ThreadPool.QueueUserWorkItem(worker, i);            }            worker((object) 0); // perf: recycle current thread            // waiting until completion or failure            while(completedCount < threadCount && exception == null)            {                // CAUTION: limit on wait time is needed because if threads                // have terminated                 // - AFTER the test of the 'while' loop, and                // - BEFORE the inner 'lock'                 // then, there is no one left to call for 'Pulse'.                lock (sync) Monitor.Wait(sync, 10.Milliseconds());            }            if(exception != null)            {                throw exception;            }            return results;         }    }}
Tuesday
Jun122007

## Few traps for former C++ developers coding in C#

I have been proofreading a couple of time C# source code written by former C++ developers. Also C# and C++ have a lot in common at the syntax level, the conceptual framework underlying the two languages is really different. This post is a negligible attempt to point out the most common issues.

C# is garbage collected. You should not need any destructor. If you are using destructors, then it's probably wrong 99% of the time. If you have doubts, ignore destructors, it's the way to go with garbage collected languages.

Pre-processor statements are deprecated. Although C# does have statements like #if #else #endif, the pre-processor statements are de-facto a deprecated construct in C#. Like destructors, they are some very specific cases where those statements can be used. Yet, 99% of the time, using those statements would just be a design mistake.

C# comes with native guidelines. Microsoft has published very extensive guidelines about variable, method, class naming. Every single C# line you write is expected to be compliant with those guidelines.

Wednesday
Oct252006

## Let's get cosmetic on C# documentation

I am gathering here just a few C# documentation cosmetic tips that I apply in my own projects. I do not pretend to have achieved any absolute truth or any optimal practice with those tips. I just find them convenient when applied with consistency.

• #region #endregion directives can be great, but caution not to overuse them by putting a dozen of directives within a 200-lines long code file. Indeed, if region makes the code structure more apparent, region also makes the code less proofreadable because you have to open the regions to actually start reading. As a rule of thumb, region directives should only be used for at least 100 lines.

• XML documentation is great too. But again, Visual Studio makes it quite easy to overuse XML documentation. Basically, with 3 strokes you can generate up to a dozen of lines of empty comments. Yet, if those comments are not completed with insightful content then you're just polluting your code. Here, I define insightful as being any comment that goes beyond the information implicitly provided by the method signature itself. Example: if you have a method named Foo.DeleteUser(Guid userId) do not tell me, that userId is the user identifier. I know it because it's obvious. Tell me what happen when no such user exists. Documentation has to be maintained like the rest of the source code. Useless documentation only makes the code harder to maintain.

• In the case of the internal project, wrapper methods (methods that basically forward the call to another method) is the kind of objects where it's actually better not to document. Instead, I prefer using a <seealso/> XML documentation reference. Indeed, documenting a wrapper is pretty much like duplicating your documentation. If this documentation is exposed to your customers, then you do not have much choice, you have to duplicate the docs (MSDN are full of replicates), but for the internal development needs, I found such replication far from being cost-effective. Thus I avoid the docs replication through XML references.
Sunday
Feb052006

## Refactoring and logistics ("L'intendance suivra!")

With Eclipse and VS2005, refactoring is now a standard feature of modern IDEs. No more than few minutes are now sufficient to drastically change the internal structure of a software library. Yet, if software logistics cannot keep the pace then productivity bottlenecks of software evolution remain unchanged. De Gaulle said L'intendance suivra! (which could be poorly translated by "Logistics always keep up!"). Yet many european wars have been lost due to poor logistics, and, back to the discussion, I believe that logistics is no less important in software matters than it is in wars.

By software logistics I mean all processes required to keep the whole thing running when changing the structure of a component; and, in particular, the issues related to upstream and downstream dependencies (deployement issues are left to a later post).

Upstream dependencies include all the components that you are relying on. Well, changing your code does not impact the components you are relying on, right? Wrong. Just consider serialization as a simple counter-example (your code evolves and let all existing data unreadable). Upstream issues are not serialization-specific, the very same problem exists with structured database storage (yes, versioning SQL queries is a pain too). More the generally, I am not aware of any refactoring tool providing any support to tackle data persistence issues. Although version tolerance features do exists in .Net 2.0 (but the framework is still lacking very simple features such as field renaming). I am really looking forward the tools that will (one day) provide persistence-aware refactoring capabilities.

Downstream dependencies include all the components that rely on your's. It's clear that any change you do (at least for any publicly exposed method/object) can break the downstream dependencies. The don't break anything approach is a killing approach in terms of software evolution (the most interesting case being don't fix bugs, our customers rely on them). But on the other hand, assuming that L'intendance suivra! ("we don't care, RTFM!") is not realistic. IMO, the present best practice is to provide an awful lot of documentation to facilitate the upgrade (see Upgrading WordPress for a good example). Yet providing such documentation is expensive and the ROI is low (because contrary to the code that has been improved, such documentation will not be leveraged in future developments). The solution, IMO again, lies in better refactoring tools that include some downstream-awareness. It would be possible to generate some "API signature patch" (the generation being a process as automated as possible) that could be applied by the client on its own code. Well, I am also really looking forward for such kind of tools.

Saturday
Jan072006

## Hungarian notation and thread safety

Joel Spolsky had a very good Making Wrong Code Look Wrong article where he rehabilitates the hungarian notation for certain dedicated purposes (tips: no, hungarian is not about junking your code with variable prefix such as string, int and array). Joel Spolsky presents the idea of prefixing unsafe (i.e. user provided) strings in the context of web-based application with us, standing for unsafe string. Such practice makes a lot of sense in situations where things have to be right by design (security is a typical example because no security holes are going to pop-up against typical non-hostile users).

Beside security, thread safety, for multithreaded applications, is another area that has to be right by design otherwise you are going to hit Heisenbugs. In .Net/C#, the most simple way to deal with thread safety is to rely on lock statements (some people would object that locking is unsafe by design and that transactions/agents/whatever should be used instead, that might be true but this is certainly beyond the scope of this post). Actually, I have found that designing multithreaded applications is not that hard but it requires some strong methodology to avoid things to get wrong by design.

TS level 1 : dedicated locks

A very common beginner mistake at multithreading to re-use some random object to ensure the locking scheme, worst of all, using the this statement.public class Bar{public void FooThreadSafe(){lock(this) { } // bravo, you just shoot your feet}}

Although, it may looks like a 12bytes overhead, a lock requires a dedicated object field in the class.public class Bar{private object fooLock = new Lock();public void FooThreadSafe(){lock(fooLock) { } // much better }}

The majar advantage of the dedicated lock instance approach is documentation. The purpose of dedicated object documentation is to explain which objects should be protected and when (Intellisense-like IDE feature makes this approach even more practical). A secondary advantage is encapsulation. The library end-user (i.e. the intern next door) is not aware of the "dedicated lock instance" stuff and might re-use whatever lockYXZ variable available for his own dark purposes. Don't offer him the chance to do that, mark the field as private.

TS level 2 : hungarian locks

Beside unprotected (concurrent) accesses, the second most important issue in multithreaded applications is deadlocks. Against deadlocks, there is only one known solution (to my knowledge) : complete and permanent ordering of the locks. In other words, locks should always be taken in same order. You might think, "Ok, let's just add a line in the dedicated lock object documentation to specify that." Well, I have tried, it just does not work. Because, over time, your application evolves, the intern next-door don't care about your documentation, new locks are added to fit some other requirements and the initial complete ordering is no more (although it should, but developpers are just so-not-failproof humans). The problem with those deadlocks is that you have no way to detect them just by looking at a piece of code. You have to look at the whole application code to ensure consistency. Here comes the hungarian rescue : the hungarian dedicated lock naming convention.

The locks should be taken in the order specified by the hungarian prefixes.

Let's look at a small example, if you have two dedicated lock instances called lock1Bar and lock2Foo (lockX being the hungarian prefix), then you know that lock1Bar should be taken before lock2Foo. Any exception to this rule is a guaranteed-or-reimbursed deadlock. Additionally, refactoring tools make it so easy to rename all your variables as many time as you need that there is really no practical obstacle to implement such policy.