I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in cloudcomputing (30)


Cloud questions from Syracuse University, NY

A few days ago, I received a couple of questions from a student at Syracuse University, NY, who is writing a paper about cloud computing and virtualization. The questions are relatively broad, so I am taking the opportunity to post the answers directly here.

What was the actual technical and business impact of adopting cloud technology?

The technical impact was a complete rewrite of our codebase. It has been the largest upgrade ever undertaken by Lokad, and it spanned over 18 months, mobilizing more or less the entire dev workforce during the transition.

As far as business is concerned, it meant that most of Lokad's business during 2010 (the peak of our cloud migration) was stalled for a year or so. For a young company, 1 year of delay is a very long time.

On the upside, before the migration to the cloud, Lokad was stuck with SMBs. Serving any mid-to-large retail network was beyond our technical reach. With the cloud, processing super-large retail networks became feasible.

What, if any, negative experience did Lokad encounter in the course of migrating to the cloud?

Back in 2009, when we started to ramp up our cloud migration efforts, the primary problem was that none of us at Lokad had any in-depth experience of what the cloud implies as far as software architecture is concerned. Cloud computing is not just another kind of distributed computing; it comes with a rather specific mindset.

Hence, the first obstacle was to figure out by ourselves the patterns and practices for enterprise software on the cloud. It has been a tedious journey to end up with Lokad.CQRS, which is roughly our 2nd generation of native cloud apps. We rewrote everything for the cloud once, and then we did it again to get something simpler, leaner, more maintainable, etc.

Then, at present time, most of our recurring cloud problems come from integrations with legacy pre-Web enterprise software. For example, operating through VPNs from the cloud tends to be a huge pain. In contrast, modern apps that offer a REST API are a much more natural fit for cloud apps, but those are still rare in the enterprise.

From your current perspective, what, if anything, would you have done differently?

Tough question, especially for a data analytics company such as Lokad, where it can take 1 year to figure out the 100 magic lines of code that will let you outperform the competition. Obviously, if we had to rewrite Lokad from scratch again, it would take us much less time. However, that would be dismissing the fact that the bulk of the effort has been the R&D that made our forecasting technology cloud native.

The two technical aspects where I feel we have been hesitating for too long were SQL and SOAP.

  • It took us too long to decide to ditch SQL entirely in favor of some native cloud storage (basically the Blob Storage offered by Windows Azure).
  • SOAP was a somewhat similar case. It took us a long time to give up on SOAP in favor of REST.

In both cases, the problem was that we (or maybe it was just me) had not fully accepted the extent of the implications of a migration toward the cloud. We remained stuck for months with older paradigms that caused a lot of unneeded friction. Giving up on those from Day 1 would have saved a lot of effort.


3 features to make Azure developers say "wow"

Wheels that big aren't technically required.

The power of the "wow" effect seems frequently under-estimated by analytical minds. Nearly a decade ago, I remember analysts predicting that the adoption of color screens on mobile phones would take a while to take off, as color served no practical purpose.

Indeed, color screens arrived several years before cameras were widely bundled into cell phones. Even at present day, there are still close to zero mobile phone features that actually require a color screen to work smoothly.

Yet, I also remember that the first reaction of practically every single person holding a phone with a color screen for the first time was simply: wow, I want one too. Within 18 months or so, the world had upgraded from grayscale screens to color screens, with nearly no practical use for color justifying the upgrade (at the time, mobile games were nonexistent too).

Windows Azure is a tremendous public cloud, probably one of the finest products released by Microsoft, but I frequently feel Azure is undermined by a few items that trigger something close to an anti-wow effect in the mind of the developer discovering the platform. In those situations, I believe Windows Azure is failing at winning the heart of the developer, and at fostering adoption out of sheer enthusiasm.

No instant VM kick-off

With Azure, you can compile your .NET cloud app as an Azure package - weighing only a few MB - and drag & drop the package as a live app on the cloud. Indeed, on Azure, you don't deploy a bulky OS image, you deploy an app, which is about 100x smaller than a typical OS.

Yet, booting your cloud app takes at least 7 minutes (according to my own unreliable measurements) on the Windows Azure Fabric, even if your app requires no more than a single VM to start with.

Here, I believe Windows Azure is missing a big opportunity to impress developers by bringing their app live within seconds. After all - assuming that a VM is ready on standby somewhere in the cloud - starting an average .NET app does not take more than a few seconds anyway.

Granted, there is no business case that absolutely requires instant app kick-off, and yet, I am pretty sure that if Azure were capable of that, every single Windows Azure 101 session would start by demoing a cloud deployment. Currently, the 7-minute delay is simply killing any attempt at a public demonstration of a Windows Azure deployment. Do you really want to keep your audience waiting for 7 minutes? No way.

Worse, I typically avoid demoing Azure to fellow developers out of fear of looking stupid while waiting 7 minutes for my "Hello World" app to get deployed...

Queues limited to 500 messages / sec

One of the most exciting aspects of the cloud is scalability: your app will not need a complete rewrite every time usage increases by a factor of 10. Granted, most apps ever written will never need to scale, for lack of market adoption. From a rational viewpoint, scalability is irrelevant for about 99% of apps.

Yet, nearly every single developer putting an app in the cloud dreams of being the next Twitter, and thinks (or rather dreams) about the vast scalability challenges that lie ahead.

The Queue Storage offers a tremendous abstraction to scale out cloud apps, spreading the workload over an arbitrarily large number of machines. Yet, when looking at the fine print, the hope of the developer is crushed upon discovering that the supposedly vastly scalable cloud queues can only process 500 messages per second, which is about 1/10th of what MSMQ was offering out of the box on a server in 1997!

Yes, queues can be partitioned to spread the workload. Yes, most apps will never reach 500 msg / sec. Yet, as far as I can observe from the community questions raised by adopters of Lokad.Cloud and Lokad.CQRS (open source libraries targeting Windows Azure), queue throughput is a concern raised by nearly every single developer tackling Windows Azure. This limitation is killing enthusiasm.

Again, Windows Azure is missing a cheap opportunity to impress the community. I would suggest shooting for no less than 1 million messages / second. For the record, Sun was already achieving 3 million messages / sec on a single quasi-regular server a year ago, under insane latency constraints. So 1 million is clearly not beyond the reach of the cloud.
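To make the partitioning workaround concrete, here is a minimal Python sketch (our actual stack is .NET; `PartitionedQueue` and the list-backed shards are purely illustrative stand-ins for real queue clients) of hashing messages across several physical queues so the effective throughput cap is multiplied by the number of shards:

```python
import hashlib

class PartitionedQueue:
    """Spreads messages over N physical queues to exceed the
    per-queue throughput cap (e.g. ~500 msg/s per Azure queue)."""

    def __init__(self, queues):
        self.queues = queues  # underlying queue clients

    def _pick(self, key: str):
        # A stable hash keeps messages sharing a key on the same
        # queue, preserving rough per-key ordering.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.queues[h % len(self.queues)]

    def put(self, key: str, message: str):
        self._pick(key).append(message)

# Usage, with plain lists standing in for real queue clients:
shards = [[] for _ in range(4)]
q = PartitionedQueue(shards)
for i in range(1000):
    q.put(f"order-{i}", f"payload-{i}")
```

The consumer side symmetrically polls all shards; the trade-off is that global FIFO ordering is lost, which most scale-out workloads tolerate anyway.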

Instant cloud metrics visualization

One frequent worry about on-demand pricing is: what if my cloud consumption gets out of control? In Lokad's experience, cloud computing consumption is very predictable and thus a non-issue in practice. Nevertheless, the fear remains, and is probably dragging down adoption rates as well.

What does it take to turn this circumstance into a marketing weapon? Not that much. It takes a cloud dashboard that reports your cloud consumption live, the key metrics being:

  • VM hours consumed for the last day / week / month.
  • GB stored on average for the last day / week / month.
  • GB transferred In and Out ...
  • ...

As it stands, Windows Azure offers a bulky Silverlight console that takes about 20s to load on a broadband connection. Performance is a feature; not having a lightweight dashboard page is a costly mistake. Just think of developers at BUILD discussing their respective Windows Azure consumption on their WP7 phones. With the current setup, it cannot happen.

Those 3 features can be dismissed as anecdotal and irrational, and yet I believe that capturing these (relatively) cheap "wow" effects would give a tremendous boost to the Windows Azure adoption rate.


A few design tips for your NoSQL app

Since the migration of Lokad toward Windows Azure about 18 months ago, we have been relying nearly exclusively on NoSQL - namely Blob Storage, Table Storage and Queue Storage. Similar cloud storage abstractions exist for all major cloud providers; you can think of them as NoSQL as a Service.

It took us a significant effort to redesign our apps around NoSQL. Indeed, cloud storage isn't a new flavor of SQL; it's a radically different paradigm, and it required in-depth adjustments to the core architecture of our apps.

In this post, I will try to summarize the gotchas we picked up while (re)designing our apps around NoSQL.

You need an O/C (object to cloud) mapper

NoSQL services are orders of magnitude simpler than your typical SQL service. As a consequence, the impedance mismatch between your object-oriented code and the storage dialect is also much lower than with SQL; this is a direct consequence of the relative lack of expressiveness of NoSQL.

Nevertheless, introducing an O/C mapper was a major complexity buster. At present time, we no longer access the cloud storage directly, and the O/C mapper layer is a major bonus for abstracting away many subtleties such as retry policies, MD5 checks, queue message overflow, ...
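As a rough illustration of what an O/C mapper buys you, here is a minimal Python sketch. `FakeBlobStore`, `BlobMapper` and their methods are hypothetical stand-ins, not the actual Lokad.Cloud implementation; the point is that serialization and retries live in one layer, out of sight of the calling code:

```python
import json
import time

class FakeBlobStore:
    """In-memory stand-in for a cloud blob container."""
    def __init__(self):
        self._blobs = {}
    def put(self, name, data: bytes):
        self._blobs[name] = data
    def get(self, name) -> bytes:
        return self._blobs[name]

class BlobMapper:
    """Minimal O/C mapper: serializes plain objects to blobs and
    retries transient failures so calling code never sees them."""
    def __init__(self, store, retries=3):
        self.store, self.retries = store, retries

    def _with_retry(self, fn):
        for attempt in range(self.retries):
            try:
                return fn()
            except IOError:
                if attempt == self.retries - 1:
                    raise
                time.sleep(0.01 * 2 ** attempt)  # exponential back-off

    def save(self, name, obj: dict):
        payload = json.dumps(obj).encode("utf-8")
        self._with_retry(lambda: self.store.put(name, payload))

    def load(self, name) -> dict:
        data = self._with_retry(lambda: self.store.get(name))
        return json.loads(data.decode("utf-8"))

store = FakeBlobStore()
mapper = BlobMapper(store)
mapper.save("customers/42", {"name": "Contoso", "tier": "gold"})
```

Client code only ever sees `save` and `load`; retry policies, encoding and (in a real mapper) MD5 checks are dealt with once, in the mapper.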

Performance is obtained mostly by design

NoSQL is not only simpler but also more predictable when it comes to performance. However, it does not mean that a solution built on top of NoSQL automatically benefits from scalability and performance - quite the opposite actually. NoSQL comes with strict built-in limitations. For example, you can't expect more than 20 updates / second on a single blob, which is ridiculously low compared to its SQL counterpart.

Your design needs to embrace the strengths of NoSQL and be really cautious about not hitting bottlenecks. Good news: those are much easier to spot. Indeed, no later optimization will save your app from abysmal performance if the storage architecture doesn't match the dominant I/O patterns of your app (see the Table Storage or the 100x cost factor).
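For example, the 20 updates / second cap on a single blob can be designed around by sharding hot data across many blobs. A minimal Python sketch, where the `ShardedCounter` is hypothetical and each list slot stands in for one blob:

```python
import random

class ShardedCounter:
    """Fans a hot counter out over N blobs so no single blob sees
    more than its update-rate cap; reads aggregate all shards."""
    def __init__(self, n_shards=16):
        self.shards = [0] * n_shards  # each slot stands for one blob

    def increment(self, amount=1):
        # Each write lands on a random shard, dividing the per-blob
        # write rate by the number of shards.
        i = random.randrange(len(self.shards))
        self.shards[i] += amount

    def value(self):
        # Reading is a scatter-gather over all shards.
        return sum(self.shards)

c = ShardedCounter()
for _ in range(1000):
    c.increment()
```

The design decision, made upfront, is to trade a cheap aggregated read for write throughput; no amount of later micro-optimization on a single-blob counter would achieve the same.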

Go for contract-based serializer

A serializer, i.e. a component that lets you turn an arbitrary object graph into a serialized byte stream, is extremely convenient for NoSQL. In particular, it provides a near-seamless way to let your object-oriented code interact with the storage. In many ways, the objects-vs-NoSQL impedance mismatch is much lower than it was for objects vs SQL.

Sometimes, though, serializers are almost too powerful. In particular, it's easy to serialize objects that are part of the runtime, which can prove brittle over time. Indeed, upgrading the runtime might end up breaking your serialization patterns. That's why I advise going for simple yet explicit contract-based serialization schemes.

Although we used a lot of XML in our early days on the cloud, we are now migrating away from XML in favor of JSON, Protocol Buffers or ad hoc high-density binary encodings, which provide a better readability vs flexibility vs performance tradeoff in our experience.
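A contract-based scheme can be as simple as whitelisting, field by field, what goes to storage. A minimal Python sketch, with `OrderV1` as a hypothetical entity; nothing from the runtime can leak into the payload because the contract names every field explicitly:

```python
import json
from dataclasses import dataclass

@dataclass
class OrderV1:
    """Explicit serialization contract: only these named fields
    ever reach storage."""
    order_id: str
    quantity: int

def serialize(order: OrderV1) -> bytes:
    # The field names *are* the contract; adding optional fields
    # later keeps old payloads deserializable.
    return json.dumps({"order_id": order.order_id,
                       "quantity": order.quantity}).encode("utf-8")

def deserialize(data: bytes) -> OrderV1:
    d = json.loads(data)
    return OrderV1(order_id=d["order_id"], quantity=d["quantity"])

blob = serialize(OrderV1("A-17", 3))
```

Contrast this with reflection-based serializers that walk the whole object graph: those happily persist runtime types, and a framework upgrade can then leave you with blobs you can no longer read.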

Entity isolation is the easiest path to versioning

One early mistake of Lokad in our early NoSQL days was to apply too much of the DRY principle (Don't Repeat Yourself). Sharing the same class between entities is a sure way to end up with painful versioning issues later on. Indeed, touching entities once data has been serialized with them is always somewhat risky, because you can end up with data that you can't deserialize any more.

Since the schema evolution required for one entity doesn't necessarily match the evolution of the other entities, you ought to keep them apart upfront. Hence, I suggest giving up on DRY early - when it comes to entities - to ease later evolutions.
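To illustrate, here is the non-DRY layout in a Python sketch (both entity classes are hypothetical): rather than sharing one `Address` class, each serialized entity owns its own copy of the fields, so one schema can evolve without touching the other:

```python
from dataclasses import dataclass

# Tempting (DRY): one shared Address class reused by both entities.
# Safer for versioning: each entity owns its fields outright.

@dataclass
class CustomerEntity:
    customer_id: str
    street: str          # flattened address, owned by this entity
    city: str

@dataclass
class WarehouseEntity:
    warehouse_id: str
    street: str          # deliberate duplication of the address fields
    city: str
    dock_count: int      # can evolve without touching CustomerEntity
```

If warehouses later need a loading-dock schedule or a geocoded address, `WarehouseEntity` evolves alone; no customer blob already sitting in storage is put at risk.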

With proper design, aka CQRS, needs for SQL drop to near-zero

Over the last two decades, SQL has been king. As a consequence, nearly all apps embed SQL structural assumptions very deep into their architecture, making a relational database an irreplaceable component - by design.

Yet, we have found out that when the app deeply embraces concepts such as CQRS, event sourcing, domain driven design and task-based UI, then there is no more need for SQL databases.

This aspect came as a surprise to us, as we initiated our cloud migration leveraging SQL databases extensively. Now, as we gain maturity at developing cloudy apps, we are gradually phasing those databases out: not because of performance or capabilities, but simply because they aren't needed anymore.


Telling the difference between cloud and smoke

I returned a few days ago from NRF11. As expected, there were many companies advertising cloud computing, and yet, how disappointing when investigating the claims a tiny bit further: it seems that fewer than 10% of the companies advertising themselves as cloudy are actually leveraging the cloud.

For 2011, I am predicting there will be a lot of companies disappointed by cloud computing - now apparently widely used as a pure marketing buzzword without technological substance to support the claims.

For those of you who might not be too familiar with cloud computing, here is a 3-minute sanity test to check whether an app is cloud-powered or not. Obviously, you could also go for a very rigorous in-depth audit, but with this test, you should be able to uncover the vast majority of smoky apps.

1. Is there any specific reason why this app is in the cloud?

Bad answer: we strive to deliver next-generation outstanding software solutions, exceeding customer expectations, blah blah blah, insert here more corporate talk ...
A pair of regular servers - typically a web server plus a database server - can handle thousands of concurrent users for non-intensive webapps. This is already far more users than most apps on the market will ever face (remember, with high probability you don't need to scale). So there has to be a compelling reason justifying the cloud, besides the very hypothetical scenario of growing faster than Facebook.

2. Is the underlying infrastructure larger than 100k machines?

Bad answer: well, in fact we are just having our own dedicated servers at DediHost Corp Inc (put here the name of regular hoster).
A key aspect of cloud computing is cost reduction through massification. As of 2011, there are still only a handful of cloud providers available, namely: Amazon WS, Google App Engine, Rackspace Cloud, Salesforce and Windows Azure. Make sure to ask which cloud infrastructure is being used. Also, private clouds are no exception: it's not because it's "private" that massification is suddenly achieved with 100 servers. It takes more, a lot more, to build a cloud.

3. Can you open an account and get started right from the web, no setup cost?

Bad answer: let's meet and evaluate your requirements first.
Multitenancy is a key aspect of reducing admin costs. In particular, with any reasonable cloud-based architecture there is no reason to have mandatory setup costs (which does not mean that the company may not charge for an optional onboarding package, eventually providing training, dedicated support, etc). Setup costs are typically a sign of non-cloud software, where each extra deployment takes some amount of gruntwork.

4. Is there a public pricing? Typically indexed on usage or user metrics.

Bad answer: pricing really depends on your company.
For cloud-based apps, there are about zero compelling reasons not to have public pricing. Indeed, cloud costs are highly predictable and strictly based on usage; hence, it makes little sense from a market perspective to go for customized pricing for each client, as it increases sales friction while providing no added value for the client.

5. Can two machines failing bring down the app along with them?

Bad answer: we have backups, don't worry.
In the cloud, the app layer should be properly decoupled from the hardware layer. In particular, hardware failures are accounted for and primarily handled by the cloud fabric, which reallocates VMs when facing hardware issues. The cloud does not offer better hardware, just a more resilient way to deal with failures. In this respect, setting up a backup server for every single production server is a very non-cloud approach. First, it doubles the hardware cost, keeping half of the machines idle about 99% of the time; and second, it proves brittle in the face of Murphy's law, aka 2 machines failing at the same time.


As a final note, it's rather hard to tell the difference between a well-designed SaaS app powered by a regular hoster and the same app powered by a cloud. Although, back to point 1: unless there is a reason to need the cloud, it won't make much difference anyway.


A few tips for Web API design

During the iteration in spring 2010 that led Lokad to release its Forecasting API v3, I thought a lot about how to design proper Web APIs in this age of cloud computing.

Designing a good Web API is surprisingly hard. Because Remote Procedure Call has been around forever, one might think that designing APIs is a well-known, established practice; and yet, suffering the defects of Forecasting API v1 and v2, we learned the hard way that it isn't really the case.

In this post, I will try to cover some of the gotchas about Web API design that we learned the hard way.

1. Full API stack ownership

When you expose an API on the web, you tend to rely on building blocks such as XML, SOAP, HTTP, ... You must be prepared to accept and support full ownership of this stack: in other words, problems are likely to happen at all levels, and you must be ready to address them even if the problem arises in one of those building blocks.

For example, at Lokad, we initially opted for SOAP, and I now believe it was a mistake from Day 1. There is nothing really wrong with SOAP itself (*); the problem was that the Lokad team (myself included) was not ready to support SOAP in full. Whenever a client raised a subtle question about some obscure XML namespace or SOAP versioning matter, we were very far from our comfort zone. In short, we were relying on a complex SOAP toolkit which proved to be a leaky abstraction.

When it comes to a public API, you have to be ready to support all abstraction leaks, from TCP to HTTP, XML, SOAP... That's why, in the end, for our Forecasting API v3, we settled for POX (Plain Old XML) over HTTP with a basic REST philosophy. Now, when an issue arises, we can truly own the problem along with its solution, instead of being helpless in the face of a leaky abstraction that we don't have the resources to embrace.

(*) Although there are definitely many gray areas with SOAP (just look at SOAP Exceptions for example).

2. Be minimalistic

Dealing with all the software layers that sit between the client code and the Web API itself is bad enough; if the API adds complexity of its own, the task quickly becomes maddening.

While refactoring our Forecasting API v2 into the Forecasting API v3, I reduced the number of web methods from 20+ to 8. Looking back, it was probably one of the best insights I had.

When developing a software library, it is usually convenient for the client developer to benefit from many syntactic sugars, that is to say, method variants that achieve the same purpose following various coding styles. For a Web API, the reverse is true: there should be one and only one way to achieve each task.

Redundancy within methods only causes confusion and extra friction when developing against the API, and will cause an endless stream of questions about whether method X should be favored over method X*, while both achieve essentially the same thing.

3. Idempotence

The network is unreliable. At a low frequency, API calls will fail because of network glitches. The easiest way to deal with those glitches is simply to have retry policies in place: if the call fails, try again. If your API semantics are idempotent, retry policies will integrate seamlessly with your API. If not, each network glitch is likely to wreak havoc within the client app.

For example, coming from a SQL background, most developers might consider having both an INSERT (only for new items) and an UPDATE (only for existing items) method. INSERT is a bad choice, because if the method is attempted twice due to a retry, the second call will fail, probably triggering some unexpected exception on the client side. Instead, you should adopt UPSERT semantics, i.e. UPDATE or INSERT, which play much nicer with retry policies.
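A minimal Python sketch of the difference (the in-memory `store` and both functions are illustrative, not an actual API):

```python
store = {}

def upsert(key, value):
    """Idempotent write: calling it twice leaves the same end state,
    so a blind retry is always safe."""
    store[key] = value

def insert(key, value):
    """Non-idempotent write: a retry of an already-applied call
    fails on its own earlier success."""
    if key in store:
        raise KeyError(f"{key} already exists")
    store[key] = value

# Simulate a retry after a network glitch: the first call succeeded
# server-side, but the client never saw the response and tries again.
upsert("sku-1", 10)
upsert("sku-1", 10)   # harmless second attempt, same end state
```

With `insert`, that same second attempt would raise, and the client would surface a spurious "already exists" error for a write that actually went through.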

4. Explicit input/output limitations

No web request or response should be allowed to be arbitrarily large, as it would cause a lot of problems server-side in maintaining decent performance across concurrent web requests. It means that from Day 1, every message that goes in or out of your API should be explicitly capped: don't let your users discover your API limitations through reverse engineering. In practice, it means that no array, list or string should be left unbounded: a max length should always be specified.

Then, unless you precisely want to move plain text around, I suggest keeping tight limitations on any string being passed to your API. In the Forecasting API, we have opted for the regex pattern ^[a-zA-Z0-9]{1,32}$ for all string tokens. Obviously, this is rather inflexible, but again, unless your API is intended for plain text / binary data storage, there is no point in supporting the whole UTF-8 range, which can prove very hard to debug and can trigger SQL-injection-like problems.
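A minimal Python sketch of such token validation, using the exact pattern above (`validate_token` is an illustrative helper, not part of the actual API):

```python
import re

# The token pattern quoted above: 1 to 32 ASCII alphanumerics.
TOKEN = re.compile(r"^[a-zA-Z0-9]{1,32}$")

def validate_token(s: str) -> str:
    """Rejects anything outside the whitelisted token format at the
    API boundary, keeping free-form text (and injection-style
    payloads) out of the rest of the system."""
    if not TOKEN.match(s):
        raise ValueError(f"invalid token: {s!r}")
    return s
```

Note that the pattern enforces both restrictions at once: the character whitelist and the 32-character cap, so the unbounded-string problem never reaches the storage layer.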