Author

Portrait of Joannes Vermorel

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta
Tags

Entries in patterns (10)

Wednesday
Mar102010

You don't know how much you'd miss an O/C mapper till you get one

When we started moving our enterprise app toward Windows Azure, we quickly realized that scalable enterprise cloud apps were tough to develop, real tough.

Windows Azure wasn't at fault here, quite the opposite actually, but the cloud computing paradigm itself is tough to develop enterprise apps. Indeed, scalability in enterprise apps can't be solved by just pilling up tons of memcached servers.

Enterprise apps aren't about scaling out some super-simplistic webapp to a billion users who will be performing reads 99.9% of the time, but rather scaling out complex business logic and accordingly complex business data along.

This lead us to implement Lokad.Cloud, an open source .NET O/C mapper (object-to-cloud) much similar in the spirit to O/R mapper such as NHibernate but tailored for NoSQL storage.

I am proud to announce that Lokad.Cloud has reached its v1.0 milestone.

As a matter of fact, you've probably never heard of O/C mappers, so I will explain why relying a decent O/C mapper should be a primary concern for any ambitious cloud app developer.

To illustrate the point, I am going to list a few subtleties that arise as soon you start using the Queue Storage. As far cloud apps are concerned, Queue Storage is one of the most powerful and most handy abstraction to achieve true scale out behaviors.

Microsoft provides the StorageClient which is basically a .NET wrapper around the REST API offered by the Queue Storage. Let see how an O/C mapper implemented on top of the StorageClient can make queues even better:

  • Strong typed messages: Queue Storage deals with binary messages, not with objects. Obviously, you don't want to entangle your business logic with serialization/deserialization logic. Business logic only cares about the semantic of the processing, not about the underlying data format used for persistence while transiting data over the cloud. The O/C mapper is here to provide a strong typed view of the Queue Storage.
  • Overflowing messages: Queue Storage upper bounds messages to 8kB. This limitation is fine as the Blob Storage is available to deal with large (even gigantic) blobs. Yet again, you don't want to mix storage contingencies (8kB message limit) with your business logic. The O/C mapper lets large message overflow into the Blob Storage.
  • Garbage collection: you might think that manually handling overflowing messages is just fine. Not quite so. What will happen to your overflowing messages, conveniently stored in the Blob Storage, if the queue (for good or ill reasons) happens to be cleared? Simple, you end up with a cloud storage leak: dead piece of data start to pill-up into your storage, and you get charge for it. In order to avoid such situation, you need a cloud garbage collector that makes sure that expired data are automatically collected. The O/C mapper embeds a storage garbage collector.
  • Auto-deletion of messages: Messages should not only be retrieved from the Queue, but also deleted once processed. Following the GC idea, developers should not be expected to delete queue messages when the message processing goes OK, much like you don't have to care about destroying objects getting out of reach. The O/C mapper auto-deletes queue messages upon process completion.
  • Delayed messages: Queue Storage does not offer any simple way to schedule a message to reappear in the queue at a specified time. You can come up with your own custom logic, but again, why should the business logic even bother about such details. The O/C mapper supports delayed messages so that you don't have to think about it.
  • Poisoned queues: that one is deadly subtle one. A poisoned queue message refers to a message that leads to a faulty processing, typically an uncaught exception being thrown by the business logic while trying to process the message. The problem is intricately coupled to the good behavior of the Queue, indeed, if a retrieved message fails to be deleted within a certain amount of time, the message will reappear in the Queue. This behavior is excellent for building robust cloud apps. but deadly if not properly handled. Indeed, faulty messages are going to fail and to reappear over and over, consuming ever increasing cloud resources for no good reason. In a way, poisoned messages represents processing leaks. The O/C mapper detects poisoned messages and isolate them for further investigation and eventual reprocessing once the code is fixed.
  • Abandoning messages: In the clouds, you should not expect VM instances to stay up forever.  In addition to hardware faults, the fabric might decide anytime to shutdown one of your instance.  If a worker instance gets shut down while processing a message, then the processing will be lost until the message reappears in the Queue. Nevertheless, such extra delay might negatively impact your business service level, as an operation that was supposed to take only half a minute might suddenly take 1h (the expiration delay of your message). If the VM gets the chance to be notified of the upcoming shutdown, the O/C mapper abandons in-process messages, making them available for processing again without waiting for expiration.

I have only illustrated here a few point about Queue Storage, but Blob Storage, Table Storage, Management API, Performance Monitoring, ...  also need to rely on higher level abstractions as offered by an O/C mapper such as Lokad.Cloud to become fluently usable.

Don't waste any more time crippling your business logic with cloud contingencies, and start using some O/C mapper. I suggest Lokad.Cloud, but I admit this is biased viewpoint.

Monday
Feb222010

Paging indices vs Continuation tokens

Developers coming from the world of relational databases are well familiar with indexed paging. Paging is rather straightforward:

  • Each row gets assigned a unique integer, starting from 0, and going with +1 increment for each additional row.
  • The query is said to be paged, because constraints specifies that only the row assigned an index greater or equal to N and lower than N+PageSize are retrieved.

I call such as pattern a chunked enumeration: instead of trying to retrieve all the data at once, client app is retrieving chunks of data, potentially splitting a very large enumeration is into a very large number of much small chunks.

Indexed paging is a client-driven process. Indeed, it is the client code (aka the code retrieving the enumerated content) that decides how big each chunk is supposed to be. Indeed, it's the client code that is responsible for incrementally updating indices from one request to the next. In particular, the client code might decide to make a fast forward read on the data.

Although indexed paging is well established pattern, I have found that it's not such a good fit for cloud computing. Indeed, client-driven enumeration is causing several issues:

  • Chunks may be highly heterogeneous in size.
  • Retrieval latency on the server-side might be erratic too.
  • If a chunk retrieval fails (chunk too big), client code has no option but to initiate a tedious trial-and-error process to gradually go for smaller chunks.
  • Chunking optimization is done on the client side, injecting replicated logic into every single client implementation.
  • Fast forward may be completely impractical to implement on the server side.

For those reasons, on the continuation tokens are usually favored in a cloud computing situation. This pattern is simple too:

  1. Request a chunk, passing a continuation token if you have one (not for the first call)
  2. Server returns an arbitrarily sized chunk plus an eventual continuation token.
  3. If no continuation token is retrieved, then the enumeration is finished.
  4. If a token is returned, then go back to 1, passing the token in the call.

Although, this pattern looks similar to the indexed paging, constraints are very different. Continuation tokens are a server-driven process. It's up to the server to decide how much data should be send at each request, which yield many benefits:

  • Chunk size, and retrieval latency can be made much more homogeneous.
  • Server has (potentially) much more local info to wisely choose appropriate chunk sizes.
  • Clients hold no more complex optimization logic for data retrieval.
  • Fast-forward is disabled which leads to a simpler server-side implementation.

Then, there are even more subtle benefits:

  • Better resilience against denial of service. If the server suffer an overload, then, it can optionally delay the retrieval by returning nothing but the current continuation token (a proper server-driven way of saying busy, try again later to the client).
  • Better client resilience to evolution. Indeed, the logic that optimize the chunking process might evolve on the server-side over time, but client code is not impacted and implicitly benefits from those improvements.

Bottom line: unless you specifically want to offer support for fast-forward, you are nearly always better off relying continuation tokens in your distributed computing patterns.

Friday
Sep182009

Azure Management API concerns

Disclaimer: this post is based on my (limited) understanding of the Azure Management API, I did start reading the docs only a few hours ago.

Microsoft has just released the first preview of their Management API for Windows Azure.

As far I understand the content of the newly released API (check the MSDN reference), this API just let you automates what was done manually through the Windows Azure Console so far.

At this point, I have two concerns:

  1. No way to adjust your instance count for a given role.

  2. Auto-management (*) involves loads of quirks.

(*) Auto-Management: the ability for a cloud app to scale itself up and down depending on the workload.

I am not really satisfied by this Management API as it does not seem to address basic requirements to easily scale up or down my (future) cloud app.

Being able to deploy a new azure package programmatically is nice, but we were already doing that in Lokad.Cloud. Thanks to the AppDomain restart trick, I suspect we will keep deploying that way, as the deployment through Lokad.Cloud is likely to be still 100x faster.

That being said, the Management API is powerful, but it does not seem to address auto-management, at least not in a simple fashion.

The single feature I was looking forward was being able to adjust the number of instances on-demand through a very very simple API that would have let me do three things:

  1. Create new instance for the current role.

  2. Shut down current instance.

  3. Get the status of instances attached to the current role.

That's it!

Notice that I am not asking here to deploy a new package, or to change the production/staging status. I just need to be able tweak the instance count.

In particular, I would expect a Non-SSL REST API to do those limited operations, much like the other REST API available for the cloud storage.

Indeed, security concerns related to the instance count management are nearly identical to the ones related to the cloud storage. Well, not really, as in practice securing your storage is way much more sensitive.

Monday
Sep142009

Thinking the Table Storage of Windows Azure

Disclaimer: I am not exactly a Table Storage expert. In this post, I am just trying to sort out my own thoughts about this service offered with Windows Azure. Check my follow-up post.

Soon after the release announcement of the release of our new O/C mapper (object to cloud) named Lokad.Cloud, folks on the Azure Forums raised the question of the Table Storage.

Although it might be surprising, Lokad.Cloud does not provide - yet - any support for Table Storage.

At this point, I feel very uncertain about Table Storage, not in the sense that I do not trust Microsoft to end-up with finely tuned product, but rather at the patterns and practices level.

Basically, the Table Storage is an entity storage that features three special system properties:

  • PartitionKey: a grouping criterion - data having the same PartitionKey being kept close.

  • RowKey: the unique identifier for the entity.

  • Timestamp: the equivalent of Blob Storage ETag.

So far, I got the feeling that many developers feel attracted toward the Table Storage for the wrong reasons. In particular, Table Storage is not a substitute of your old plain SQL tables:

  • No support for transactions.

  • No support for keys (let alone foreign keys).

  • No possible refactoring (properties are frozen at setup).

If you are looking for those features, you're most likely betting on the wrong horse. You should be considering SQL Azure instead.

Then, some might argue that SQL Azure won't scale above 10GB (at least considering the current pricing plans offered by Microsoft). Well, the trick is Table Storage won't scale either, at least not unless you're not very cautious with your queries.

AFAIK, the only indexed column of the Table Storage is the RowKey. Thus, any filtering criterion based on custom entity properties is likely to get abyssal performance as soon your Table Storage get large.

Well, sort of, the most probable scenario is like to to be worse as your queries are just going to timeout after exceeding 60s.

Again, my goal here is not to bash the Table Storage, but it must be understood that the Table Storage is clearly not a magically scalable equivalent of the plain old SQL tables.

Back to Lokad.Cloud, we did not consider adding Table Storage because we did not feel the need either although our forecasting back-end is probably very high in the currently complexity spectrum of the cloud apps.

Indeed, the Blob Storage is surprisingly powerful with very predicable performance too:

  • Storing complex objects is a non-issue with a serializer at hand.

  • A blob name prefix is a very efficient substitute to the PartitionKey.

Basically, it seems to me that any Table Storage operation can be executed with the same performance with the Blob Storage for now. Later on, when the Table Storage will start supporting secondary indexes, this situation is likely to evolve, but meantime I still cannot think a single situation that would definitively support Table Storage over Blob Storage.

Monday
Sep142009

O/C mapper - object to cloud

When we started to port our forecasting technology toward the cloud, we decided to create a new open source project called Lokad.Cloud that would isolate all the pieces of our cloud infrastructure that weren't specific of Lokad.

The project has been initially subtitled Lokad.Cloud - .NET execution framework for Windows Azure, as the primary goal of this project was to provide some cloud equivalent of the plain old Windows Services. We did quickly end-up with QueueServices which happens to be quite handy to design horizontally scalable apps.

But more recently, the project has taken a new orientation, becoming more and more an O/C mapper (object to cloud) inspired by the terminology used by O/R mappers. When it comes to horizontal scaling, a key idea is that data and data processing cannot be considered in isolation anymore.

With classic client-server apps, persistence logic is not supposed to invade your core business logic. Yet, when your business logic happens to become so intensive that it must be distributed, you end-up in a very cloudy situation where data and data processing becomes closely coupled in order to achieve horizontal scalability.

That, being said, close coupling between data and data processing isn't doomed to be an ugly mess. We have found that obsessively object-oriented patterns applied to Blob Storage can made the code both elegant and readable.

Lokad.Cloud is entering its beta stage with the release of the 0.2.x series, check it out.