Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in design (15)

Tuesday, April 5, 2011

A few design tips for your NoSQL app

Since the migration of Lokad toward Windows Azure about 18 months ago, we have been relying almost exclusively on NoSQL - namely Blob Storage, Table Storage and Queue Storage. Similar cloud storage abstractions exist for all major cloud providers; you can think of them as NoSQL as a Service.

It took us a significant effort to redesign our apps around NoSQL. Indeed, cloud storage isn't a new flavor of SQL; it's a radically different paradigm, and it required an in-depth adjustment of the core architecture of our apps.

In this post, I will try to summarize the gotchas we picked up while (re)designing Lokad.com.

You need an O/C (object to cloud) mapper

NoSQL services are orders of magnitude simpler than your typical SQL service. As a consequence, the impedance mismatch between your object oriented code and the storage dialect is also much lower compared to SQL; this is a direct consequence of the relative lack of expressiveness of NoSQL.

Nevertheless, introducing an O/C mapper was a major complexity buster. At present, we no longer access cloud storage directly, and the O/C mapper layer is a major bonus that abstracts away many subtleties such as retry policies, MD5 checks, queue message overflow, ...
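
To make the idea concrete, here is a minimal sketch of the kind of abstraction an O/C mapper provides. The interface, type names and container names below are purely illustrative, not the actual Lokad O/C mapper API.

// Illustrative only: not the actual Lokad O/C mapper API.
public interface IBlobStorageProvider
{
    // Serializes 'item' and writes it to the given container/blob,
    // dealing with retry policies and MD5 checks behind the scenes.
    void PutBlob<T>(string containerName, string blobName, T item);

    // Reads and deserializes a blob; returns false if the blob is missing.
    bool TryGetBlob<T>(string containerName, string blobName, out T item);
}

public class CustomerProfile
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public class CustomerProfileRepository
{
    private readonly IBlobStorageProvider _storage;

    public CustomerProfileRepository(IBlobStorageProvider storage)
    {
        _storage = storage;
    }

    public void Save(string customerId, CustomerProfile profile)
    {
        // Retries, checksums and size limits are the mapper's concern,
        // not the caller's.
        _storage.PutBlob("customer-profiles", customerId, profile);
    }
}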

Performance is obtained mostly by design

NoSQL is not only simpler but also more predictable when it comes to performance. However, that does not mean a solution built on top of NoSQL automatically benefits from scalability and performance - quite the opposite, actually. NoSQL comes with strict built-in limitations. For example, you can't expect more than 20 updates / second on a single blob, which is almost ridiculously low compared to its SQL counterpart.

Your design needs to embrace the strengths of NoSQL and be really cautious about not hitting bottlenecks. Good news: those are much easier to spot. Indeed, no later optimization will save your app from abysmal performance if the storage architecture doesn't match the dominant I/O patterns of your app (see the Table Storage or the 100x cost factor).
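
As a purely hypothetical illustration of designing around the per-blob update ceiling (the counter and shard names are made up, not something from Lokad.com), a frequently updated counter can be spread across many blobs so that no single blob absorbs all the writes; the total is then recomputed by summing the shards.

using System;

// Hypothetical sketch: spread a hot counter over many blobs so that
// no single blob has to absorb all the writes.
public static class ShardedCounter
{
    private const int ShardCount = 64;

    // Deterministically picks one of 64 blob names for a given request,
    // so concurrent writers mostly touch different blobs.
    public static string GetShardBlobName(string counterName, string requestId)
    {
        int shard = Math.Abs(requestId.GetHashCode() % ShardCount);
        return counterName + "-" + shard.ToString("D3");
    }
}

The read side sums the shard values, which is cheap compared to funneling every single write through one blob.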

Go for contract-based serializer

A serializer, that is, a component that lets you turn an arbitrary object graph into a serialized byte stream, is extremely convenient for NoSQL. In particular, it provides a near-seamless way to let your object-oriented code interact with the storage. In many ways, the objects vs NoSQL impedance mismatch is much lower than it was for objects vs SQL.

That said, serializers are sometimes almost too powerful. In particular, it's easy to serialize objects that are part of the runtime, which can prove brittle over time. Indeed, upgrading the runtime might end up breaking your serialization patterns. That's why I advise going for simple yet explicit contract-based serialization schemes.
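
For instance, with the .NET DataContractSerializer (one possible choice among many; the OrderLine entity below is a made-up example), only the explicitly marked members are part of the contract, so nothing from the runtime leaks into the byte stream.

using System.IO;
using System.Runtime.Serialization;

// Hypothetical entity: only the members explicitly marked below are part
// of the storage contract.
[DataContract]
public class OrderLine
{
    [DataMember(Order = 1)]
    public string ProductCode { get; set; }

    [DataMember(Order = 2)]
    public int Quantity { get; set; }
}

public static class OrderLineSerializer
{
    public static byte[] Serialize(OrderLine line)
    {
        var serializer = new DataContractSerializer(typeof(OrderLine));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, line);
            return stream.ToArray();
        }
    }

    public static OrderLine Deserialize(byte[] buffer)
    {
        var serializer = new DataContractSerializer(typeof(OrderLine));
        using (var stream = new MemoryStream(buffer))
        {
            return (OrderLine) serializer.ReadObject(stream);
        }
    }
}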

Although we did use a lot of XML in our early days on the cloud, we are now migrating away from XML in favor of JSON, Protocol Buffers or ad hoc high-density binary encodings, which in our experience provide a better readability vs flexibility vs performance tradeoff.

Entity isolation is the easiest path to versioning

One early mistake of Lokad in our early NoSQL days was to apply the DRY principle (Don't Repeat Yourself) too liberally. Sharing the same class between entities is a sure way to end up with painful versioning issues later on. Indeed, touching entities once data has been serialized with them is always somewhat risky, because you can end up with data that you can't deserialize any more.

Since the schema evolution required for one entity doesn't necessarily match the evolution of the other entities, you ought to keep them apart upfront. Hence, I suggest giving up on DRY early - when it comes to entities - to ease later evolutions.
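
As a hypothetical illustration (the entities below are invented for the sake of the example), the shipping and billing entities each carry their own copy of the address fields instead of sharing a common Address class, so that one schema can evolve without endangering blobs already serialized for the other.

using System.Runtime.Serialization;

// Each entity owns its own (duplicated) address fields on purpose.
[DataContract]
public class ShippingNotice
{
    [DataMember] public string OrderId { get; set; }
    [DataMember] public string StreetAddress { get; set; }
    [DataMember] public string CountryCode { get; set; }
    // A later version can add, say, a delivery time-slot here without any
    // impact on BillingStatement blobs already in storage.
}

[DataContract]
public class BillingStatement
{
    [DataMember] public string OrderId { get; set; }
    [DataMember] public string StreetAddress { get; set; }
    [DataMember] public string CountryCode { get; set; }
}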

With proper design, aka CQRS, the need for SQL drops to near zero

Over the last two decades, SQL has been king. As a consequence, nearly all apps embed SQL structural assumptions very deep into their architecture, making a relational database an irreplaceable component - by design.

Yet, we have found out that when the app deeply embraces concepts such as CQRS, event sourcing, domain-driven design and task-based UI, then there is no more need for SQL databases.

This aspect was a surprise to us, as we initiated our cloud migration while extensively leveraging SQL databases. Now, as we gain maturity at developing cloudy apps, we are gradually phasing those databases out: not because of performance or capabilities, but simply because they aren't needed anymore.
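
To give a rough idea of what this looks like in practice (a deliberately simplified sketch with made-up event and view names, not Lokad's actual code), state changes are persisted as an append-only stream of events - which fits blob or table storage naturally - and read models are mere projections replayed from that stream.

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical event: an immutable fact appended to storage, never updated.
[DataContract]
public class ForecastRequested
{
    [DataMember(Order = 1)]
    public Guid AccountId { get; set; }

    [DataMember(Order = 2)]
    public DateTime RequestedOn { get; set; }
}

// A read model rebuilt by replaying the event stream; it can be thrown
// away and recomputed at any time, so no relational store is required.
public class AccountActivityView
{
    public int RequestCount { get; private set; }
    public DateTime? LastRequestOn { get; private set; }

    public void Replay(IEnumerable<ForecastRequested> events)
    {
        foreach (var e in events)
        {
            RequestCount++;
            LastRequestOn = e.RequestedOn;
        }
    }
}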

Friday, March 21, 2008

Custom error page in ASP.NET when database connectivity is lost.

A particularly annoying, yet frequent, issue for your ASP.NET application is the loss of database connectivity. Indeed, if your database is hosted on a separate machine (as is generally advised for performance), then your web application is subject to database downtime.

Database downtime has several particularities:

  • It generates internal server errors.

  • It's not the type of error that can be fixed by the developer.

  • The problem tends to get solved by itself (think: reboot of the database server).

  • Errors don't get logged (well, assuming that you are logging errors in the database).

Thus, for my own ASP.NET application, I want to display an error page that invites people to try again at a later time whenever a database downtime occurs. In comparison, if a "real" error is encountered, the error gets logged and the customer is invited to contact support (although support also monitors server-side error logs on its own).

Although ASP.NET makes it very easy to add a generic error page for all internal errors through the <customErrors/> section in the web.config, it's not that simple to have a dedicated page that is selectively displayed for database connectivity issues. Thus, I have decided to come up with my own HttpModule that catches database connectivity errors and performs a custom redirect.

using System;
using System.Data.SqlClient;
using System.Web;

namespace Lokad.Web.Modules
{
    public class DatabaseConnectionErrorModule : IHttpModule
    {
        public void Init(HttpApplication context)
        {
            context.Error += new EventHandler(OnError);
        }

        public void Dispose() { }

        protected virtual void OnError(object sender, EventArgs args)
        {
            HttpApplication application = (HttpApplication) sender;

            // The SQL exception might have been wrapped into other exceptions.
            Exception exception = application.Server.GetLastError();
            while (exception != null && exception as SqlException == null)
            {
                exception = exception.InnerException;
            }

            if (exception as SqlException != null)
            {
                try
                {
                    // HACK: no SqlConnection.TryOpen() method.
                    // Relying on error numbers seems risky because there are
                    // different numbers that can reflect a connectivity problem.
                    // "foobar" stands for the application's actual connection string.
                    using (SqlConnection connection = new SqlConnection("foobar"))
                    {
                        connection.Open();
                    }
                }
                catch (SqlException)
                {
                    application.Response.Redirect("~/MyError.aspx", true);
                }
            }
        }
    }
}

Finally, add a <httpModules/> section to your web.config, and you're done.
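
For instance, assuming the module above is compiled into an assembly named Lokad.Web (the type and assembly names below are assumptions; match them to whatever your project actually uses), the registration would look along these lines:

<system.web>
  <httpModules>
    <!-- Type and assembly names are assumptions; adjust to your project. -->
    <add name="DatabaseConnectionErrorModule"
         type="Lokad.Web.Modules.DatabaseConnectionErrorModule, Lokad.Web" />
  </httpModules>
</system.web>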

PS: It has been suggested that I use the Global.asax hook instead. I have discarded this approach because, no matter how I look at the problem, Global.asax just looks legacy to me (zero modularity, zero testability, etc.).

Saturday, May 12, 2007

Continuous migration in software development

New (and soon-to-be-deprecated) technologies just keep flowing into the software industry. Some people have pointed out that you can't stop improving your product just to keep pace with the release flow (that's the fire and motion theory). Yet, as an ISV, your options are quite limited. You have to rely on the latest (yet stable) technologies in order to maintain highly competitive productivity.

Rewriting your application from scratch to support the latest Foo.NET release is a bad idea; no questions asked. Yet, it must be taken into account that

  • getting people interested in (worse, training them on) deprecated technologies (let's say Classic ASP) is both hard and depressing.

  • not benefiting from the latest tools means lower productivity (e.g. Classic ASP => ASP.Net 1.1 => ASP.Net 2.0, each new version being a huge time-saver compared to the previous one).

Lokad.com has existed for less than a year, and we have already performed quite a lot of migrations.

  • SQL Server 2000 => SQL Server 2005

  • ASP.Net website => ASP.Net web application

  • No AJAX => ATLAS (betas) => ASP.Net AJAX Extensions

  • NAnt => MsBuild (when the MsBuild Community Tasks have been released)

  • VS 2005 Setup Project => WiX 2.0

  • Command Line => PowerShell (for our command-line tools)

  • IE6 => IE7 and FF1.5 => FF2.0 (for JavaScript and CSS)

Among the next planned migrations

  • Visual Studio 2005 => Orcas

  • WiX 2.0 => WiX 3.0

  • Inline SQL in C# => LINQ

  • NDoc => SandCastle

  • NuSoap => PHP5 Web Services

  • osCommerce 2.2 => osC 3.0 (currently alpha) => osC 3.1 (for the plugin framework)

Our processes at Lokad involve continuous migrations toward new technologies. Upgrading takes time and effort, yet this process seems quite necessary to maintain optimal development conditions.

Monday, February 12, 2007

What's wrong with PAD files

There are quite a lot of things that are just plain wrong in the IT industry nowadays. I have already discussed the case of Google AdWords; let's move on to the subject of PAD files.

PAD stands for Portable Application Description; it's an XML format designed by the shareware industry to facilitate the submission of software products to software directories. The idea is pretty simple and pretty nice. As a software manufacturer, you create a PAD file for each one of your products; then you publish this PAD file directly on your website. For example, when Lokad released its first open source product, named Lokad Sales Forecasting for ASP.Net, I created (and submitted) a PAD file for this application.

Submitting through cut-and-paste


Before PAD, you manually submitted your product description to every software directory on the web. Now with PAD, you're still submitting your product description to every single software directory on the web, but the submission is now (usually) reduced to a single operation: cut-and-pasting the URL of your PAD file. The support for PAD among the shareware/freeware distribution industry is really impressive. I would guess that over 95% of the freeware/shareware industry now supports PAD files.

But the only thing really impressive about PAD is its absolute lack of design.

When the XML design makes no sense


As a software producer, you don't need to manually generate your PAD file; you get a free editor for that. Yet, I don't think I have ever seen an XML schema that is so massively adopted while being so poorly designed.

There are so many issues with PAD that it's actually hard to even summarize the topic. In quasi-random order, the main PAD issues would be

  • You need to specify the size of your software in bytes, kilo-bytes AND mega-bytes (File_Size_Byte, File_Size_K AND File_Size_MB). Don't you think that this information is somehow redundant?

  • The requirement description is restricted to OS version. What about required 3rd party software like DirectX or .Net?

  • Open source (or source availability) is not part of the fields; furthermore it is not really possible to use PAD to describe open source software.

  • Software components / libraries cannot be described; they don't really "fit" the PAD template.

  • The software category field makes no sense; a tag-based system (think swik.net) would have been so much simpler AND so much more efficient.

  • No (X)HTML support for your description fields. Your software description ends up as plain text. As a result, big lumps of text (like the 2000-character description) are almost totally unreadable.

  • No consistency in XML tag naming
    • some tags are UPPER_CASED

    • some tags are Camel_Cased

    • some tags are explicit (Program_System_Requirements)

    • some tags are abbreviated (Char_Desc_45)
  • The localization makes no sense (localizing software ~ translating the software + adapting it to the regional settings)
    • only the Description tag can be localized.

    • it is not possible to localize the other fields, like the contact or support emails, or the screenshot.

    • no encoding is specified upfront in the XML file.
  • The company address fields make no sense for non-US locations (State_Province only applies to the USA/Canada).

  • Why hard-code the cost in US Dollars (Program_Cost_Dollars)? There are a lot of currencies out there. And why not support a price list (a list of currency/value pairs)?

  • The Download URL section is just moronic. You can specify up to 4 download URLs (why 4?) and each URL gets its own special tag with no naming consistency
    • Primary_Download_URL / Secondary_Download_URL / Additional_Download_1 / Additional_Download_2

    • why not simply provide a URL list?
  • The screenshot section is restricted to a single image URL. Why not a list?

  • No extensible mechanism for affiliate programs (because the list of affiliate programs is hard-coded).

Note that all those suggestions would have made PAD easier to document, to produce and to consume.

Then some high-level criticisms could also be made

  • no mechanism to link to other PAD files (especially useful to support software versions).

  • no persistent mechanism using Global Identifiers (especially useful to detect replicated PADs).

  • no mechanism to retrieve the PAD files by simply crawling the web (think XML feed links in HTML pages).

Summary: PAD has been designed by junior high kids (probably)


Based on the previous elements, we could say that the PAD authors had no clue about

  • XML design: tag naming is random, data structures like lists are ignored.

  • Web design: text readability is not a concern, screenshots are unimportant.

  • The world outside of the USA: utterly naive attempt to support internationalization.

  • Software industry: operating systems are the only components worth mentioning.

Still, I think that a Portable Application Description is based on a good idea, but it would really need to be re-designed from scratch.

Monday, January 29, 2007

Weird consequences of full transaction logs

Let's say that you have an ASP.Net 2.0 web application running on top of MSSQL Server 2005. Guess what happens if your database transaction log gets full? Well, you will get a large number of weird side effects, most of them seemingly totally unrelated to the saturation of the transaction log.

Among the problems that I have encountered

  • The web services of your website start to return totally misleading error messages like "authentication failed".

  • You can no longer log in through the web form of your ASP.Net application. You will not get any error message, but the login control just tells you that your password is wrong.

  • You look at your error logs (like ELMAH), but nothing gets recorded.

  • You decide to go through the "recover password" process (because you're still not suspecting the transaction logs), but it actually fails and no email is sent.

For the record, the following SQL statement clears your transaction log:
DUMP TRANSACTION mydatabase WITH NO_LOG