Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in dotnet (15)

Monday, March 23, 2009

High-perf SelectInParallel in 120 lines of C#

A few months ago at Lokad, we started working on 8-core machines. Multi-core machines need an adequate algorithmic design to leverage their processing power; and such a design can be more or less complicated depending on the algorithm you are trying to parallelize.

In our case, many situations called for a quite straightforward parallelization: large loops, with all iterations independent from one another. At that time, PLinq, the parallelization library from Microsoft, was not yet available as a final product (it will ship with Visual Studio 2010). Thus, since we were quite in a hurry, we decided to code our own SelectInParallel method (code provided below). Basically, it's just Select, but with a parallel execution of the function applied to each item.

Although surprisingly simple, SelectInParallel alone turned out to cover, at least at Lokad, virtually 99% of our multi-core parallelization needs.

Yet, when we started speeding up algorithms with our first SelectInParallel implementation, we ended up stuck with poor speed-up ratios of 3x or even 2x, where I was expecting a near 8x speed-up.

At first, I thought it was an illustration of Amdahl's law. But a more detailed performance investigation showed I was just plain wrong. The harsh reality was: threads, when not (very) carefully managed, involve a (very) significant overhead.

Our latest SelectInParallel implementation is now 120 lines long with a quasi-negligible overhead, i.e. it brings a near-linear speed-up with the number of CPU cores on your machine. Yet, this performance wasn't easy to achieve. Let's review two key aspects of the implementation.

Keep your main thread working: our first implementation followed the naive pattern: start N threads (N being the number of CPUs), wait for them to finish, collect the results and proceed. Bad idea: if the amount of work happens to be small, then simply waiting for your threads to start is a performance killer. Instead, you should start N-1 threads and get the calling thread working right away.

Avoid synchronization altogether: at first, we were using a producer-consumer threading pattern. Bad idea again: it produces a lot of locking contention, the work queue becoming the main bottleneck of the process. Instead, an arithmetic trick can be used to let the workers tackle disjoint worksets right from the beginning, without any synchronization.

So far, we have been quite satisfied with our 120-line ersatz for PLinq. I hope this piece of code can help a few other people get the most out of their many-core machines. If you have ideas to further improve the performance of this SelectInParallel implementation, just let me know.

using System;
using System.Threading;

namespace Lokad.Threading
{
    /// <summary>
    /// Quick alternative to PLinq.
    /// </summary>
    public static class ParallelExtensions
    {
        static int _threadCount = Environment.ProcessorCount;

        /// <summary>Gets or sets the number of threads to be used in
        /// the parallel extensions.</summary>
        public static int ThreadCount
        {
            get { return _threadCount; }
            set { _threadCount = value; }
        }

        /// <summary>Fast parallelization of a function over an array.</summary>
        /// <param name="input">Input array to be processed in parallel.</param>
        /// <param name="func">The function to apply (parameters and all the members should be immutable!!!).</param>
        /// <remarks>Threads are recycled. Synchronization overhead is minimal.</remarks>
        public static TResult[] SelectInParallel<TItem, TResult>(this TItem[] input, Func<TItem, TResult> func)
        {
            var results = new TResult[input.Length];

            if (_threadCount == 1 || input.Length == 1)
            {
                for (int i = 0; i < input.Length; i++)
                {
                    results[i] = func(input[i]);
                }

                return results;
            }

            // perf: no more threads than items in the collection
            int threadCount = Math.Min(_threadCount, input.Length);

            // perf: start with a sync-less process, then finish with a light index-based sync
            // to absorb the varying execution times of the various threads.
            int threshold = Math.Max(0, input.Length - (int)Math.Sqrt(input.Length) - 2 * threadCount);
            int workingIndex = threshold - 1;

            var sync = new object();

            Exception exception = null;

            int completedCount = 0;
            WaitCallback worker = index =>
            {
                try
                {
                    // no need for a lock - disjoint processing
                    for (var i = (int)index; i < threshold; i += threadCount)
                    {
                        results[i] = func(input[i]);
                    }

                    // joint processing
                    int j;
                    while ((j = Interlocked.Increment(ref workingIndex)) < input.Length)
                    {
                        results[j] = func(input[j]);
                    }

                    var r = Interlocked.Increment(ref completedCount);

                    // perf: only the terminating thread actually acquires a lock.
                    if (r == threadCount && (int)index != 0)
                    {
                        lock (sync) Monitor.Pulse(sync);
                    }
                }
                catch (Exception ex)
                {
                    exception = ex;
                    lock (sync) Monitor.Pulse(sync);
                }
            };

            for (int i = 1; i < threadCount; i++)
            {
                ThreadPool.QueueUserWorkItem(worker, i);
            }
            worker((object)0); // perf: recycle the current thread

            // waiting until completion or failure
            while (completedCount < threadCount && exception == null)
            {
                // CAUTION: a bound on the wait time is needed because if threads
                // have terminated
                // - AFTER the test of the 'while' loop, and
                // - BEFORE the inner 'lock'
                // then there is no one left to call 'Pulse'.
                lock (sync) Monitor.Wait(sync, TimeSpan.FromMilliseconds(10));
            }

            if (exception != null)
            {
                throw exception;
            }

            return results;
        }
    }
}
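
For illustration, here is a minimal usage sketch (the workload below is arbitrary; it simply assumes the extension method above is in scope):

using System;
using Lokad.Threading;

class Sample
{
    static void Main()
    {
        var inputs = new double[1000000];
        for (int i = 0; i < inputs.Length; i++) inputs[i] = i;

        // Each call to the lambda runs on one of the worker threads.
        double[] outputs = inputs.SelectInParallel(x => Math.Sqrt(x) * Math.Log(x + 1));

        Console.WriteLine(outputs[inputs.Length - 1]);
    }
}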
Tuesday, August 19, 2008

Creating an auto-update framework with WiX

WiX does its job of letting you create Microsoft Installer packages (known as .msi files), but it comes with its own set of weird peculiarities.

For my uISV, I have designed a minimal auto-update framework for WinForms applications. The spec was the following: the user can click Check for update, and will optionally be notified if a new version is available. In such an event, the user is offered an upgrade to the new version. Three clicks total, not bad (counting one click for the UAC prompt under Vista).

First, a word about minor revisions and major revisions, as defined by the Microsoft Installer.

  • A minor revision simply upgrades the existing piece of software in place.

  • A major revision installs another software version, side-by-side with the old version.

For a simple upgrade framework, we will obviously stick with minor revisions (see the msiexec invocation below).
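
As a reminder, a minor upgrade is typically applied by re-running msiexec with the standard reinstall properties set; a minimal sketch (Setup.msi being a placeholder name):

msiexec /i Setup.msi REINSTALL=ALL REINSTALLMODE=vomus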

So far, so good; this framework should be completely straightforward to build, yet some absurd MSI behavior makes the design much more difficult than it seems.

First, the MSI installer takes the MSI file name into account. Any upgrade attempt with a different MSI file name will be considered a major upgrade. Apparently, this behavior is based on some lame and deprecated considerations related to CD-ROM installations.

Thus, when your auto-update framework downloads the latest MSI package, the package must first be renamed to match the name of the MSI package at install time. Yet, web browsers just happen to frequently rename downloaded files. For example, when people download the very same file twice, your MSI package Setup.msi gets renamed Setup[1].msi. And that's not even considering the situation where users willingly rename the MSI file (why shouldn't they be permitted to do that?).

So you need to get the MSI name out of the Windows registry to figure out what the file name was at installation time. It happens that this information can be found at
HKEY_CURRENT_USER\Software\Microsoft\Installer\Products\PRODUCT-CODE\SourceList\PackageName, where PRODUCT-CODE is a token associated with your product.
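
For illustration, here is a minimal C# sketch of this lookup (the product-code token below is a placeholder, and error handling is omitted):

using System;
using Microsoft.Win32;

class PackageNameLookup
{
    static void Main()
    {
        // Placeholder: the 32-character PRODUCT-CODE token of your product.
        const string productCode = "0123456789ABCDEF0123456789ABCDEF";
        string keyPath = @"Software\Microsoft\Installer\Products\" + productCode + @"\SourceList";

        using (RegistryKey key = Registry.CurrentUser.OpenSubKey(keyPath))
        {
            // 'PackageName' holds the MSI file name as recorded at install time.
            Console.WriteLine(key.GetValue("PackageName"));
        }
    }
}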

One could have thought it would be natural to use the ProductId, as defined in the MSI package (and in your WiX project file), as the PRODUCT-CODE, but some developers, rightly sent from hell, decided that it wasn't fun enough.

So, they settled for an arithmetic permutation of the ProductId. That's right: you can actually infer the PRODUCT-CODE by applying a not-too-simple permutation scheme to your original ProductId. For those who do not wish to lose as much time as I did on this matter, here is a PowerShell function to perform the conversion (this code is based on this thread):

# Get-MsiGuid
# By Joannes Vermorel, 2008
# Converts a PackageId GUID (as provided in WiX) into a ProductCode GUID (as used by MSI).
# Usage: Get-MsiGuid 'FOO-GUID-1234'

function Get-MsiGuid
{
    param(
        [string] $guidToken = $(throw "Missing: provide the GUID token to be converted.")
    )

    process
    {
        $origGuid = new-object System.Guid $guidToken
        $raw = $origGuid.ToString('N')
        $aRaw = $raw.ToCharArray()

        # the compressed format reverses 11 byte sequences of the original guid
        $revs = 8, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2

        $pos = 0
        for ($i = 0; $i -lt $revs.Length; $i++)
        {
            [System.Array]::Reverse($aRaw, $pos, $revs[$i])
            $pos += $revs[$i]
        }

        # passing the char array as a single constructor argument
        $n = new-object System.String (,$aRaw)
        $newGuid = new-object System.Guid $n

        # GUIDs in the registry are all caps; output formats are N, D, B, P
        write-output ($newGuid.ToString('N').ToUpper())
    }
}
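
Since the auto-update framework itself is C#, here is a hedged C# transcription of the same permutation (the class and method names are mine, not part of any API):

using System;

static class MsiGuid
{
    // Converts a WiX PackageId GUID into the compressed ProductCode
    // token used under HKCU\Software\Microsoft\Installer\Products.
    public static string ToProductCodeToken(Guid packageId)
    {
        char[] raw = packageId.ToString("N").ToCharArray();

        // the compressed format reverses 11 character sequences of the GUID
        int[] revs = { 8, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2 };

        int pos = 0;
        foreach (int len in revs)
        {
            Array.Reverse(raw, pos, len);
            pos += len;
        }

        // registry tokens are all caps
        return new string(raw).ToUpperInvariant();
    }
}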

As a final note, this small auto-update framework is available as open source as part of Lokad Safety Stock Calculator on SourceForge. The update information is directly grabbed from a PAD file (Portable Application Description).

Friday, March 21, 2008

Custom error page in ASP.NET when database connectivity is lost.

A particularly annoying, yet frequent, issue for your ASP.NET application is the loss of database connectivity. Indeed, if your database is hosted on a separate machine (as is generally advised for performance), then your web application is subject to database downtime.

Database downtimes have several particularities:

  • They generate internal server errors.

  • They are not the type of error that can be fixed by the developer.

  • The problem tends to solve itself (think: reboot of the database server).

  • Errors don't get logged (well, assuming that you are logging errors in the database).

Thus, for my own ASP.NET application, I want to display an error page that invites people to try again later whenever a database downtime occurs. In comparison, if a "real" error is encountered, the error gets logged and the customer is invited to contact support (although support is also monitoring server-side error logs on its own).

Although ASP.NET makes it very easy to add a generic error page for all internal errors through the <customErrors/> section of the web.config, it's not that simple to have a dedicated page that is selectively displayed for database connectivity issues. Thus, I have decided to come up with my own HttpModule that catches database connectivity errors and performs a custom redirect.

using System;
using System.Data.SqlClient;
using System.Web;

namespace Lokad.Web.Modules
{
    public class DatabaseConnectionErrorModule : IHttpModule
    {
        public void Init(HttpApplication context)
        {
            context.Error += new EventHandler(OnError);
        }

        public void Dispose() { }

        protected virtual void OnError(object sender, EventArgs args)
        {
            HttpApplication application = (HttpApplication)sender;

            // The SQL exception might have been wrapped into other exceptions.
            Exception exception = application.Server.GetLastError();
            while (exception != null && exception as SqlException == null)
            {
                exception = exception.InnerException;
            }

            if (exception as SqlException != null)
            {
                try
                {
                    // HACK: there is no SqlConnection.TryOpen() method.
                    // Relying on error numbers seems risky because there are
                    // many different numbers that can reflect a connectivity problem.
                    // Replace "foobar" with your actual connection string.
                    using (SqlConnection connection = new SqlConnection("foobar"))
                    {
                        connection.Open();
                    }
                }
                catch (SqlException)
                {
                    application.Response.Redirect("~/MyError.aspx", true);
                }
            }
        }
    }
}

Finally, add an <httpModules/> section to your web.config, and you're done.
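
For reference, the registration looks like this (MyAssembly being a placeholder for the assembly hosting the module):

<httpModules>
  <add name="DatabaseConnectionErrorModule"
       type="Lokad.Web.Modules.DatabaseConnectionErrorModule, MyAssembly" />
</httpModules>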

PS: It has been suggested to me to use the Global.asax hook. I have discarded this approach because, no matter how I look at the problem, Global.asax just looks legacy to me (zero modularity, zero testability, etc.).

Wednesday, October 31, 2007

Crypt your config files with PowerShell 

ASP.NET 2.0 comes with convenient native support for configuration file encryption. Yet, things are still not that easy for WinForms, console applications or Windows services, since the aspnet_regiis.exe utility only supports web configuration files.

My own μISV has its share of distributed applications, which involves securing a few connection strings over several machines. Securing the connection strings through encryption is not an ultimate defense (if the attacker gains execution rights on the local machine, the connection strings will get disclosed anyway), but it can still save you a lot of trouble, such as involuntary disclosure.

Download crypt-config.zip

I have found a practical way to solve the issue through PowerShell (see the PowerShell team blog for regular tips), namely two functions, crypt-config and decrypt-config. The source code comes as a single PSH script containing the function definitions.

To get started, extract the PS1 file from the Zip archive, then

PS docs:\> . ($directory + "\crypt-config.ps1");

PS docs:\> crypt-config 'MyConsole.exe' 'section';

PS docs:\> decrypt-config 'MyConsole.exe' 'section';

Typically, section will be replaced by connectionStrings. Note that you do not need to add the .config at the end of the configuration file path.
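
Under the hood, this boils down to the protected-section API of System.Configuration; here is a hedged C# sketch of the encryption step, assuming the DPAPI provider (the script itself may differ):

using System.Configuration;

class CryptConfig
{
    static void Main()
    {
        // Opens MyConsole.exe.config; note that the .config suffix is implied.
        Configuration config = ConfigurationManager.OpenExeConfiguration("MyConsole.exe");
        ConfigurationSection section = config.GetSection("connectionStrings");

        if (!section.SectionInformation.IsProtected)
        {
            // DPAPI: the encrypted section can only be read back on this machine.
            section.SectionInformation.ProtectSection("DataProtectionConfigurationProvider");
            config.Save();
        }
    }
}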

Wednesday, August 8, 2007

Securing CruiseControl.Net integration server

CruiseControl.NET is a great open source tool for continuous integration (CI). Yet, the default settings are quite permissive, and unless you're working on an open source project as well, you might prefer to restrict access to your sole team. I have found that securing CruiseControl.NET while keeping a developer-friendly environment is not such an easy task. This post is a summary of the various steps needed to secure your CI server. It should work against CCNET 1.2 and 1.3.

Create a dedicated Windows User for CCNET


There is (probably) no reason for your integration process to run as an administrator on your CI server. Running the CI as an administrator is just asking for more trouble if something goes wrong in the build process. First, create a dedicated Windows account; I suggest naming it integration for the sake of simplicity. Then, from Start » Administrative Tools » Services, you can change the properties of the service named CruiseControl.NET Server in the Log On tab. Just set the newly created account to be used by the CCNET service. You will also probably need to grant some Windows directory permissions on the root integration directory.
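
The same change can also be scripted through sc.exe; a minimal sketch (the service name CCService is an assumption, check the actual name in the Services panel; note the mandatory space after obj= and password=):

sc config CCService obj= ".\integration" password= "PASSWORD-HERE"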

Restrict CCNET remoting access to localhost


Unless you're running a farm of build servers with a web server dedicated to reporting the various build statuses, the CCNET remoting endpoint should not be remotely accessible (yet, that's the default CCNET setting). This behavior can be adjusted by changing the ccnet.exe.config file. Replace the line <channel ref="tcp" port="21234"> with <channel ref="tcp" port="21234" rejectRemoteRequests="true">. Now, only a local CCNET dashboard instance is able to connect to the CCNET remoting endpoint.

Restrict CCNET Dashboard access to logged-in users


By default, no access restrictions are put on the CCNET Dashboard. The simplest way of restricting access to the dashboard is to add a Windows authentication layer within the ASP.NET application. You can add the following lines to the webdashboard\web.config configuration file to do that:

<authentication mode="Windows" />
<authorization>
  <deny users="?" />
  <allow users="*" />
</authorization>

Re-opening the CCTray status


CCTray does not support any kind of authentication, thus both the Remoting and Via Web Dashboard connection methods will fail now that we have purposely put access restrictions in place. The trick consists in changing the webdashboard\web.config again to allow anonymous access to XmlServerReport.aspx with

<location path="XmlServerReport.aspx">
  <system.web>
    <authorization>
      <allow users="?" />
    </authorization>
  </system.web>
</location>

Then, configure CCTray with Via CruiseControl.Net Dashboard to connect to the URL http://myserver/XmlServerReport.aspx. Note that your build statuses (i.e. "Success" or "Failure") will be publicly available to anybody, yet disclosing such limited information is hardly an issue.