Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta
Tags

Entries in Lokad.Cloud (4)

Wednesday
Nov232011

Lokad.Cloud vs Lokad.CQRS, tiny insights about the future

Among the (small) community interested by the software practices of Lokad to develop entreprise software over Windows Azure, Lokad.Cloud vs Lokad.CQRS comes as a recurring question.

It's a good question, and to be entirely honest, the case is not 100% solved even at Lokad

One of the core difficulty to address this question is that Lokad.Cloud and Lokad.CQRS come:

  • from different backgrounds:
    • Lokad.Cloud orginates from the hard-core data analytics back-end.
    • Lokad.CQRS originates from our behavioral apps.
  • with different intents:
    • Lokad.Cloud wants to simplify hard-core distributed algorithmics.
    • Lokad.CQRS wants to provide flexibililty, auditability, extensibility (*).
  • and different philosophies:
    • Lokad.Cloud is a sticky framework, it defines pretty much how your app is architected.
    • Lokad.CQRS is more a NoFramework, precisely designed to minimally impact the app.

(*) without compromising scalability, however scalability is not the primary purpose.

Then, historically, Lokad.Cloud has been developed first (which is a mixed blessing), and, as we have been moving forward, we have started to partition into standalone sub-projects:

  • Lokad.Cloud.Storage, the O/C mapper (object to cloud), dedicated to the interactions with the Azure Storage.
  • Lokad.Cloud.AppHost, an AppDomain isolation layer to enable dynamic assembly loading within Azure Worker roles (aka reboot a VM with new assemblies in 5s instead of 5min). (**)
  • Lokad.Cloud.Provisioning, a toolkit for the Windows Azure Management API.

(**) Lokad.Cloud does not leverage Lokad.Cloud.AppHost yet, it still relyies on a very similar component (which was developed first, and, as such, is not as properly decoupled than AppHost)

Those sub-projects end-up combined into Lokad.Cloud but they can be used independently. Both Lokad.Cloud.AppHost and Lokad.Cloud.Provisioning are fully compatible with Lokad.CQRS.

The case of Lokad.Cloud.Storage is a bit more complicated because Lokad.CQRS because Lokad.CQRS already has its own Azure Storage layer which focuses on CQRS-style storage abstractions. In particular, Lokad.CQRS emphasizes interoperable storage abstractions where the local file storage can be used in place of the cloud storage.

The Future

As far I can speak for Lokad.CQRS (see the projet boss), the project will keep evolving focusing on enterprise software practices, aka not so much what the framework delivers, but rather how it's intended to structure the app. Then, Lokad.CQRS might be completed by:

  • tools at some point such as a maintenance console.
  • refined storage abstractions (probably event-centric ones).

In constrast, Lokad.Cloud will continue its partitioning process to become decoupled and more flexible. In particular,

  • the cloud runtime
  • the service execution strategy

are still very heavily coupled to other concepts within the execution framework, and likely candidates for sub-projects of their own.

Combining Lokad.Cloud and Lokad.CQRS?

I would not advise to combine Lokad.Cloud (execution framework) with Lokad.CQRS within the same app. At Lokad, we don't have any project that adopts this pattern, and the resulting architcture seems fuzzy.

However, if we consider the sub-projects of Lokad.Cloud, then the combination Lokad.CQRS + Lokad.Cloud.AppHost + Lokad.Cloud.Provisioning does make a lot of sense.

Then, it's possible to adopt a SOA architecture where some heavy-duty functional logic gets isolated, behind an API, into the Lokad.Cloud execution framework, while the bulk of the app adopt CQRS patterns through Lokad.CQRS. This pattern has been adopted to some extent at Lokad.

Thursday
Dec032009

O/C mapper for TableStorage 

The Table Service API is the most subtle service provided among the cloud storage services offered by Windows Azure (with also include Blob and Queue Series for now). I did struggle a while to eventually figure out what was the unique specificity of Table Storage from a scalability perspective or rather from a cost-to-scale perspective as the cloud charges you according to your consumption.

Since the scope of the Table Storage remained a fuzzy element for me for a long time, the beta version of Lokad.Cloud does not include (yet) support for Table Storage although. Rest assured that this is definitively part of our roadmap.

TableStorage vs. others

Let's start by identifying the specifics of TableStorage compared to other storage options:

  • Compared to Blob Storage,
    • Table Storage provides a much cheaper fine-grained access to individual bits of information. In terms of I/O costs, Table Storage is up to 100x cheaper than Blob Storage through Entity Group Transaction.
    • Table Storage will (in a near feature) provides secondary indexes while the Blob Storage only provide 1 single hierarchical access to blobs.
  • Compared to SQL Azure,
    • Table Storage lacks about everything you would expect from a relational database. You cannot perform any Join operation or establish a Foreign key relationship and this is very unlikely to be ever available.
    • yet, while SQL Azure is limited to 10GB (this value might increase in the future, this is really not the way to go), Table Storage is expected to be nearly infinitely scalable for its own limited set of operations.

The StorageClient library shipped with Azure SDK is nice as it provides a first layer of abstraction against the raw REST API.  Nevertheless, coding your app directly against the ADO.NET client library seems painful due to the many implementation contraints that comes with the REST API. Further separation of concerns is needed here.

The Fluent NHibernate inspiration

TableStorage has way much less expressivity than relational databases, nonetheless, classical O/R mappers are great source of inspiration, especially nicely designed ones such as NHibernate and its must-have addon Fluent NHibernate.

Although, the mapping entity-to-object isn't that complex in the case of TableStorage, I firmly believe that a proper mapping abstraction ala Fluent NH could considerably ease the implementation of cloud apps.

Among key scenarios that I would like to see addressed by Lokad.Cloud:

  • A seamless management of large entity batches when no atomicity is involved: let's say you want to update 1M entities in your Table Storage. Entity Group can actually reduce I/O costs by 100x. Yet, Entity Group comes with various constraints such as no more than 100 entities per batch, no more than 4MB by operation, ... Fine-tuning I/O from the client app would have to be replicated for every table, it really makes sense to abstract that away.
  • A seamless overflowing management toward the Blob Storage. Indeed, Lokad.Cloud already natively push overflowing queued items toward the Blob Storage. In particular, Table Storage assume than no properties should weight more than 64kb, but manually handling the overflow from the client app seems very tedious (actually a similar feature is already considered for blogs in SQL Azure).
  • A more customizable mapping from .NET type to native property types. The amount of property types supported by the Table Storage is very limited. Although a few more types might be added in the future, Table Storage won't (ever?) be handling native .NET type. Yet, if you have a serializer at hand, problem is no more.
  • A better versioning management as .NET properties may or may not match the entity properties. Fluent NH has an exemplary approach here: by default, match with default rule, otherwise override matching. In particular, I do not want the .NET client code to be carved in stone because of some legacy entity that lies in my Table Storage.
  • Entity access has to be made through indexed properties (ok, for now, there isn't many). With the native ADO.NET, it's easy to write Linq queries that give a false sense of efficiency as if entities can be accessed and filtered against any property. Yet, as data grow, performance is expected to be abysmal (beware of timeouts) unless entities are accessed through their indexes. If data is not expected to grow, then you go for SQL Azure instead, as it's way more convenient anyway.

Any further aspects that should be managed by the O/C mapper? Any suggestion? I will be coming back soon with some more implementation details.

Thursday
Dec032009

Serialization in the cloud: SharedContract vs. SharedType

Every time developers decide not to go for relational databases in cloud apps, they end-up with custom storage formats. In my (limited) experience, that one of the inescapable law of cloud computing.

Hence, serialization plays a very important role in cloud apps either for persistence or for transient computations where input data need to be distributed among several computing nodes.

In the case of Lokad.Cloud, our O/C mapper (object to cloud), our blob storage abstraction relies on seamless serialization. Looking for a serialization solution, we did initially go the quick & dirty way through the BinaryFormatter that has been available since .NET 1.1, that is to say forever in the .NET world.

Binary formatter is easy to setup, but pain lies ahead:

  1. No support for versioning, i.e. what will happen to your data if your code happen to change?
  2. Since it embeds all .NET type info, it's not really compact, even for small datastructure (if you just want to serialize a 1M double array, it's OK though, but that's not the typical situation).
  3. It offers little hope for interoperability of any kind. Even interactions with other distinct .NET Framework versions can be subject to problems.

Robust serialization approach is needed

With the advent of WCF (Windows Communication Foundation), Microsoft teams came up with a much improved vision for serialization. In particular, they introduced two distinct serialization behaviors:

Both serializers produce XML streams but there is a major design gap between the two.

Shared contract assumes that the contract (the schema in the XML terminology) will be available at deserialization time. In essence, it's a static spec while implementation is subject to evolution. Benefits are that versioning, and even performance to some extend, can be expected to be great as the schema is both static and closed.

Shared type, in the other hand, assumes that the concrete .NET implementation will be available at deserialization time. The main benefit of the shared type approach is its expressivity, as basically any .NET object graph can be serialized (object just need to be marked as [Serializable]). Yet, as price to pay for this expressiveness, versioning does suffer.

Serialization and O/C mapper

Our O/C mapper is designed not only to enable persistence (and performance), but also to ease the setup of transient computations to be run over the cloud.

As far persistence is concerned, you really want to go for a SharedContract approach, otherwise data migration from old .NET types to new .NET types is going to heavily mess-up your design through the massive violation of the DRY principle (you would typically need to have old and new types side by side).

Then, for transient computations, SharedType is a much friendlier approach. Indeed, why should you care about data schema and versioning, if you can just discard old data, and re-generate them as part of your migration? That's going to be a lot easier, but outdated data are considered as expendable here.

As a final concern for O/C mapper, it should be noted that CPU is really cheap compared to storage. Hence, you don't want to store raw XML in the cloud, but rather GZipped XML (which comes as a tradeoff CPU vs Storage in the cloud pricing).

The case of Lokad.Cloud

For Lokad.Cloud, we will provide a GZipped XML serializer based on a combination of both the DataContractSerializer and the NetDataContractSerializer to get the best of both worlds. DataContractSerializer will be used by default, but it will be possible switch to NetDataContractSerializer through a simple attribute (idea has been borrowed to Aaron Skonnard).

Thursday
Jul302009

Lokad.Cloud - alpha version released

One of the major little-known weakness of cloud computing is development productivity. Indeed, developing over the cloud ain't easy, and as complexity goes, the management of a complex, fully-distributed app may become a nightmare. At Lokad, as we started migrating a fairly complex technology, we did get the feeling that we were needing strong patterns and practices - tailored for the cloud - so that we don't get lost half-way in the migration process.

That's how Lokad.Cloud was born.

In short, Lokad.Cloud is a framework that can be used to rationalize and speed-up development of back-end apps over Windows Azure. Read more on the announcement made directly on the Windows Azure Forums.