I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in azure (26)


Serialization in the cloud: SharedContract vs. SharedType

Every time developers decide not to go for relational databases in cloud apps, they end up with custom storage formats. In my (limited) experience, that is one of the inescapable laws of cloud computing.

Hence, serialization plays a very important role in cloud apps either for persistence or for transient computations where input data need to be distributed among several computing nodes.

In the case of Lokad.Cloud, our O/C mapper (object-to-cloud), the blob storage abstraction relies on seamless serialization. Looking for a serialization solution, we initially went the quick & dirty way with the BinaryFormatter, which has been available since .NET 1.1, that is to say forever in the .NET world.

The BinaryFormatter is easy to set up, but pain lies ahead (a minimal sketch follows the list below):

  1. No support for versioning, i.e. what will happen to your data if your code happens to change?
  2. Since it embeds all the .NET type info, it's not really compact, even for small data structures (it's OK if you just want to serialize an array of 1M doubles, but that's not the typical situation).
  3. It offers little hope for interoperability of any kind. Even interactions between distinct .NET Framework versions can be subject to problems.
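For reference, here is a minimal sketch of the kind of BinaryFormatter round-trip we started with (the type and member names are made up for illustration):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Illustrative payload: any [Serializable] type will do.
[Serializable]
public class ForecastBlob
{
    public string SerieName { get; set; }
    public double[] Values { get; set; }
}

public static class BinaryFormatterDemo
{
    public static byte[] Serialize(ForecastBlob blob)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            // The full .NET type info is embedded in the stream:
            // trivial to set up, but brittle as soon as the type evolves.
            formatter.Serialize(stream, blob);
            return stream.ToArray();
        }
    }

    public static ForecastBlob Deserialize(byte[] data)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream(data))
        {
            // Fails if the type has been renamed, moved or reshaped
            // since the data was written.
            return (ForecastBlob)formatter.Deserialize(stream);
        }
    }
}
```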

A robust serialization approach is needed

With the advent of WCF (Windows Communication Foundation), the Microsoft teams came up with a much improved vision for serialization. In particular, they introduced two distinct serialization behaviors:

  • Shared contract, embodied by the DataContractSerializer.
  • Shared type, embodied by the NetDataContractSerializer.

Both serializers produce XML streams, but there is a major design gap between the two.

Shared contract assumes that the contract (the schema, in XML terminology) will be available at deserialization time. In essence, it's a static spec while the implementation is subject to evolution. The benefits are that versioning, and even performance to some extent, can be expected to be great, as the schema is both static and closed.

Shared type, on the other hand, assumes that the concrete .NET implementation will be available at deserialization time. The main benefit of the shared type approach is its expressiveness, as basically any .NET object graph can be serialized (objects just need to be marked as [Serializable]). Yet, as the price to pay for this expressiveness, versioning does suffer.
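To make the distinction concrete, here is a minimal sketch contrasting the two serializers (the type and member names are made up for illustration):

```csharp
using System.IO;
using System.Runtime.Serialization;

// Shared contract: only the [DataContract] schema matters at deserialization time.
[DataContract]
public class AccountV1
{
    [DataMember] public string Name { get; set; }
    [DataMember] public double Balance { get; set; }
}

public static class ContractVsType
{
    public static byte[] WithSharedContract(AccountV1 account)
    {
        // Produces XML carrying only the contract, no CLR type names.
        var serializer = new DataContractSerializer(typeof(AccountV1));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, account);
            return stream.ToArray();
        }
    }

    public static byte[] WithSharedType(AccountV1 account)
    {
        // Produces XML embedding assembly-qualified .NET type names, so the
        // exact implementation must be available at deserialization time.
        var serializer = new NetDataContractSerializer();
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, account);
            return stream.ToArray();
        }
    }
}
```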

Serialization and O/C mapper

Our O/C mapper is designed not only to enable persistence (and performance), but also to ease the setup of transient computations to be run over the cloud.

As far as persistence is concerned, you really want to go for a SharedContract approach, otherwise data migration from old .NET types to new .NET types is going to heavily mess up your design through massive violations of the DRY principle (you would typically need to have the old and new types side by side).

Then, for transient computations, SharedType is a much friendlier approach. Indeed, why should you care about data schema and versioning if you can just discard old data and regenerate it as part of your migration? That's going to be a lot easier, but outdated data is considered expendable here.

As a final concern for the O/C mapper, it should be noted that CPU is really cheap compared to storage. Hence, you don't want to store raw XML in the cloud, but rather GZipped XML (which comes as a CPU vs. storage tradeoff in the cloud pricing).
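Here is a minimal sketch of the idea, assuming the DataContractSerializer wrapped in a GZipStream (the class name is illustrative):

```csharp
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization;

public static class GzipXmlSerializer
{
    // Serialize to XML, then compress: a CPU-for-storage tradeoff.
    public static byte[] Serialize<T>(T instance)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                serializer.WriteObject(gzip, instance);
            }
            return buffer.ToArray();
        }
    }

    public static T Deserialize<T>(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var buffer = new MemoryStream(data))
        using (var gzip = new GZipStream(buffer, CompressionMode.Decompress))
        {
            return (T)serializer.ReadObject(gzip);
        }
    }
}
```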

The case of Lokad.Cloud

For Lokad.Cloud, we will provide a GZipped XML serializer based on a combination of both the DataContractSerializer and the NetDataContractSerializer to get the best of both worlds. The DataContractSerializer will be used by default, but it will be possible to switch to the NetDataContractSerializer through a simple attribute (an idea borrowed from Aaron Skonnard).
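Here is a rough sketch of how such a switch could look; the [TransientData] attribute name below is purely hypothetical and not the actual Lokad.Cloud API:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical marker: types flagged as transient get the shared type
// treatment, everything else defaults to shared contract.
[AttributeUsage(AttributeTargets.Class)]
public sealed class TransientDataAttribute : Attribute { }

public static class CloudSerializer
{
    public static void Serialize(object instance, Stream destination)
    {
        var type = instance.GetType();
        var isTransient = Attribute.IsDefined(type, typeof(TransientDataAttribute));

        // DataContractSerializer by default, NetDataContractSerializer on demand.
        XmlObjectSerializer serializer = isTransient
            ? (XmlObjectSerializer)new NetDataContractSerializer()
            : new DataContractSerializer(type);

        serializer.WriteObject(destination, instance);
    }
}
```

In a real implementation, the output would also go through a GZipStream, as discussed above.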


Live from PDC'09 - it's quite cloudy out there

The first keynote of PDC'09 ended just minutes ago. Although it was not such a surprise, Azure was pervasive in virtually every talk made this morning. Azure is definitely a top priority for Microsoft, much like Windows or Office.

For my small company, it's very good news because we are banking a lot on this technology. Also, I really like Microsoft's vision, which includes tooling as a core part of the cloud experience.

Key news in the session:

  • Windows Azure won't be in production before Jan 1st.
  • .NET Services finally renamed / federated as AppFabric.
  • A caching layer is now part of AppFabric.
  • VMs with various sizes are now available.
  • Configurable VM images in Azure will come in 2010.

Windows Azure deserves a public roadmap

Last week, I had the chance to meet in person with Steve Marx and Doug Hauger, two key people on the Windows Azure team at Microsoft.

First of all, I was really pleased: those folks are brilliant. My own little company is betting a lot on Windows Azure. When I tell people (partners, investors, customers) about the amount of work involved in migrating Lokad toward the cloud, the most frequent feedback is that I am expecting way too much from Microsoft, that Lokad is taking way too much risk by relying on unproven Microsoft products, that Microsoft has failed many times before, and so on.

My own belief in that matter is that Microsoft is a large company, with loads of talented people and loads of not so talented people too. Yet it seems clear to me now that Microsoft has gathered a top notch team on Windows Azure, and this alone is a very healthy sign concerning the future of Windows Azure.

In particular, Doug Hauger spent a lot of time explaining to me his vision of the future of Windows Azure. Again, it was brilliant. Unfortunately, due to NDA, I won't be able to discuss here the most salient aspects of this roadmap. It's a bit sad, because I am pretty sure that most of the Azure community would be thrilled - like I am - if this vision were openly shared.

Among all the projects going on at Microsoft, one team that I like a lot is the C# team. In my humble opinion, C# is one of the finest products ever released by Microsoft; and one thing that I appreciate a lot about the C# team is that they openly discuss their roadmap. C# 4.0 is not even released, and they have already started to discuss features that lie further ahead. If C# is such a good product, I believe it's precisely because every feature gets so openly discussed.

Back to Windows Azure, I think everybody would agree that cloud computing is, as a technology, several orders of magnitude more complex than any programming language (even C#). My own experience - reading questions asked on the Windows Azure Forums - is that many developers still fail to understand the cloud, and keep asking for the wrong features (e.g. Remote Desktop). A roadmap would help people avoid such pitfalls, as it would make it much more obvious where Azure is heading.

Then, when we started migrating Lokad toward Azure about 6 months ago, we built our architecture upon a lot of guesses about the features that were most likely to be shipped with Windows Azure. So far, we have been really lucky, and Doug Hauger confirmed last week loads of things that we had only been guesstimating so far. Yet, I would have been 10x more confident had the roadmap been available from the start. You can't expect people to be that lucky at forecasting as a line of business.

The world is vast, and no matter how dedicated the Azure team is, it does not seem reasonable to expect them to spend hours with every partner to enlighten them with their secret roadmap. Private roadmaps just don't scale. Considering that Microsoft is a late entrant in the cloud computing market (Amazon EC2 has been in production for more than 2 years), a public disclosure of their roadmap seems unlikely to benefit any competitor (or rather, the benefit would be very marginal).

On the other hand, an Azure roadmap would clearly benefit all the partners already investing in Windows Azure; plus, it would also help convince other partners that Azure is here to stay, not just covering fire.


Azure Management API concerns

Disclaimer: this post is based on my (limited) understanding of the Azure Management API; I only started reading the docs a few hours ago.

Microsoft has just released the first preview of their Management API for Windows Azure.

As far as I understand the content of the newly released API (check the MSDN reference), it just lets you automate what was done manually through the Windows Azure Console so far.

At this point, I have two concerns:

  1. No way to adjust your instance count for a given role.

  2. Auto-management (*) involves loads of quirks.

(*) Auto-Management: the ability for a cloud app to scale itself up and down depending on the workload.

I am not really satisfied by this Management API, as it does not seem to address the basic requirement of easily scaling my (future) cloud app up or down.

Being able to deploy a new Azure package programmatically is nice, but we were already doing that in Lokad.Cloud. Thanks to the AppDomain restart trick, I suspect we will keep deploying that way, as deployment through Lokad.Cloud is likely to still be 100x faster.

That being said, the Management API is powerful, but it does not seem to address auto-management, at least not in a simple fashion.

The single feature I was looking forward to was being able to adjust the number of instances on demand through a very simple API that would let me do three things (a hypothetical sketch follows below):

  1. Create a new instance for the current role.

  2. Shut down the current instance.

  3. Get the status of the instances attached to the current role.

That's it!
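Here is a purely hypothetical sketch of the kind of client interface I have in mind; none of these types exist in the Azure SDK, they only mirror the three operations listed above:

```csharp
using System.Collections.Generic;

// Hypothetical status record for one instance of the current role.
public sealed class RoleInstanceInfo
{
    public string InstanceId { get; set; }
    public string Status { get; set; }
}

// Hypothetical management facade: this API does not exist today,
// it only illustrates the three operations wished for above.
public interface IRoleInstanceManagement
{
    // Spin up one more instance for the role the caller belongs to.
    void CreateInstance();

    // Shut down the instance the caller is currently running on.
    void ShutDownCurrentInstance();

    // Enumerate the instances attached to the current role.
    IList<RoleInstanceInfo> GetInstances();
}
```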

Notice that I am not asking here to deploy a new package, or to change the production/staging status. I just need to be able to tweak the instance count.

In particular, I would expect a non-SSL REST API for those limited operations, much like the other REST APIs available for the cloud storage.

Indeed, the security concerns related to instance count management are nearly identical to the ones related to the cloud storage. Well, not quite: in practice, securing your storage is far more sensitive.


Table Storage or the 100x cost factor

Until very recently, I was a bit puzzled by the Table Storage. I couldn't manage to get a clear understanding of how the Table Storage could be a killer option compared to the Blob Storage.

I get it now: Table Storage can cut your storage costs by 100x.

As outlined by other folks already, I/O costs typically represent more than 10x the storage costs if your objects weigh less than 6kb (the computation has been done for the Amazon S3 pricing, but the Windows Azure pricing happens to be nearly identical).

Thus, if you happen to have loads of fine-grained objects to store in your cloud, say less-than-140-character tweets for example, you're likely to end up with an insane I/O bill if you store those fine-grained items in the Blob Storage.

But don't lower your hopes: that's precisely the sort of situation the Table Storage has been designed for, as this service lets you insert/update/delete entities in batches of 100 through Entity Group Transactions.
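Here is a minimal sketch of such a batched insert, assuming the StorageClient library shipped with the Windows Azure SDK (the table and entity names are made up):

```csharp
using System.Data.Services.Client;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Illustrative entity; callers are expected to set PartitionKey and RowKey,
// and all entities of a single batch must share the same PartitionKey.
public class TweetEntity : TableServiceEntity
{
    public string Text { get; set; }
}

public static class TweetStore
{
    public static void InsertBatch(CloudStorageAccount account, TweetEntity[] tweets)
    {
        var tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist("Tweets");

        var context = tableClient.GetDataServiceContext();
        foreach (var tweet in tweets) // at most 100 entities per batch
        {
            context.AddObject("Tweets", tweet);
        }

        // SaveChangesOptions.Batch turns the round-trip into a single
        // Entity Group Transaction, i.e. a single storage transaction.
        context.SaveChanges(SaveChangesOptions.Batch);
    }
}
```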

This fine-grained item orientation is reflected in the limitations that apply to entities:

  • A single entity should not weigh more than 1MB.

  • A single group transaction should not weigh more than 4MB.

  • A single entity property should not weigh more than 64kb.

Situations where loads of small items - the threshold being at 60kb - end up being processed by your cloud apps are likely to be good candidates for the Table Storage.

We will definitely try to reflect this in our favorite O/C mapper.