Author

Portrait of Joannes Vermorel

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades,

Meta

Entries in pattern (2)

Wednesday
Mar102010

You don't know how much you'd miss an O/C mapper till you get one

When we started moving our enterprise app toward Windows Azure, we quickly realized that scalable enterprise cloud apps were tough to develop, real tough.

Windows Azure wasn't at fault here, quite the opposite actually, but the cloud computing paradigm itself is tough to develop enterprise apps. Indeed, scalability in enterprise apps can't be solved by just pilling up tons of memcached servers.

Enterprise apps aren't about scaling out some super-simplistic webapp to a billion users who will be performing reads 99.9% of the time, but rather scaling out complex business logic and accordingly complex business data along.

This lead us to implement Lokad.Cloud, an open source .NET O/C mapper (object-to-cloud) much similar in the spirit to O/R mapper such as NHibernate but tailored for NoSQL storage.

I am proud to announce that Lokad.Cloud has reached its v1.0 milestone.

As a matter of fact, you've probably never heard of O/C mappers, so I will explain why relying a decent O/C mapper should be a primary concern for any ambitious cloud app developer.

To illustrate the point, I am going to list a few subtleties that arise as soon you start using the Queue Storage. As far cloud apps are concerned, Queue Storage is one of the most powerful and most handy abstraction to achieve true scale out behaviors.

Microsoft provides the StorageClient which is basically a .NET wrapper around the REST API offered by the Queue Storage. Let see how an O/C mapper implemented on top of the StorageClient can make queues even better:

  • Strong typed messages: Queue Storage deals with binary messages, not with objects. Obviously, you don't want to entangle your business logic with serialization/deserialization logic. Business logic only cares about the semantic of the processing, not about the underlying data format used for persistence while transiting data over the cloud. The O/C mapper is here to provide a strong typed view of the Queue Storage.
  • Overflowing messages: Queue Storage upper bounds messages to 8kB. This limitation is fine as the Blob Storage is available to deal with large (even gigantic) blobs. Yet again, you don't want to mix storage contingencies (8kB message limit) with your business logic. The O/C mapper lets large message overflow into the Blob Storage.
  • Garbage collection: you might think that manually handling overflowing messages is just fine. Not quite so. What will happen to your overflowing messages, conveniently stored in the Blob Storage, if the queue (for good or ill reasons) happens to be cleared? Simple, you end up with a cloud storage leak: dead piece of data start to pill-up into your storage, and you get charge for it. In order to avoid such situation, you need a cloud garbage collector that makes sure that expired data are automatically collected. The O/C mapper embeds a storage garbage collector.
  • Auto-deletion of messages: Messages should not only be retrieved from the Queue, but also deleted once processed. Following the GC idea, developers should not be expected to delete queue messages when the message processing goes OK, much like you don't have to care about destroying objects getting out of reach. The O/C mapper auto-deletes queue messages upon process completion.
  • Delayed messages: Queue Storage does not offer any simple way to schedule a message to reappear in the queue at a specified time. You can come up with your own custom logic, but again, why should the business logic even bother about such details. The O/C mapper supports delayed messages so that you don't have to think about it.
  • Poisoned queues: that one is deadly subtle one. A poisoned queue message refers to a message that leads to a faulty processing, typically an uncaught exception being thrown by the business logic while trying to process the message. The problem is intricately coupled to the good behavior of the Queue, indeed, if a retrieved message fails to be deleted within a certain amount of time, the message will reappear in the Queue. This behavior is excellent for building robust cloud apps. but deadly if not properly handled. Indeed, faulty messages are going to fail and to reappear over and over, consuming ever increasing cloud resources for no good reason. In a way, poisoned messages represents processing leaks. The O/C mapper detects poisoned messages and isolate them for further investigation and eventual reprocessing once the code is fixed.
  • Abandoning messages: In the clouds, you should not expect VM instances to stay up forever.  In addition to hardware faults, the fabric might decide anytime to shutdown one of your instance.  If a worker instance gets shut down while processing a message, then the processing will be lost until the message reappears in the Queue. Nevertheless, such extra delay might negatively impact your business service level, as an operation that was supposed to take only half a minute might suddenly take 1h (the expiration delay of your message). If the VM gets the chance to be notified of the upcoming shutdown, the O/C mapper abandons in-process messages, making them available for processing again without waiting for expiration.

I have only illustrated here a few point about Queue Storage, but Blob Storage, Table Storage, Management API, Performance Monitoring, ...  also need to rely on higher level abstractions as offered by an O/C mapper such as Lokad.Cloud to become fluently usable.

Don't waste any more time crippling your business logic with cloud contingencies, and start using some O/C mapper. I suggest Lokad.Cloud, but I admit this is biased viewpoint.

Monday
Feb222010

Paging indices vs Continuation tokens

Developers coming from the world of relational databases are well familiar with indexed paging. Paging is rather straightforward:

  • Each row gets assigned a unique integer, starting from 0, and going with +1 increment for each additional row.
  • The query is said to be paged, because constraints specifies that only the row assigned an index greater or equal to N and lower than N+PageSize are retrieved.

I call such as pattern a chunked enumeration: instead of trying to retrieve all the data at once, client app is retrieving chunks of data, potentially splitting a very large enumeration is into a very large number of much small chunks.

Indexed paging is a client-driven process. Indeed, it is the client code (aka the code retrieving the enumerated content) that decides how big each chunk is supposed to be. Indeed, it's the client code that is responsible for incrementally updating indices from one request to the next. In particular, the client code might decide to make a fast forward read on the data.

Although indexed paging is well established pattern, I have found that it's not such a good fit for cloud computing. Indeed, client-driven enumeration is causing several issues:

  • Chunks may be highly heterogeneous in size.
  • Retrieval latency on the server-side might be erratic too.
  • If a chunk retrieval fails (chunk too big), client code has no option but to initiate a tedious trial-and-error process to gradually go for smaller chunks.
  • Chunking optimization is done on the client side, injecting replicated logic into every single client implementation.
  • Fast forward may be completely impractical to implement on the server side.

For those reasons, on the continuation tokens are usually favored in a cloud computing situation. This pattern is simple too:

  1. Request a chunk, passing a continuation token if you have one (not for the first call)
  2. Server returns an arbitrarily sized chunk plus an eventual continuation token.
  3. If no continuation token is retrieved, then the enumeration is finished.
  4. If a token is returned, then go back to 1, passing the token in the call.

Although, this pattern looks similar to the indexed paging, constraints are very different. Continuation tokens are a server-driven process. It's up to the server to decide how much data should be send at each request, which yield many benefits:

  • Chunk size, and retrieval latency can be made much more homogeneous.
  • Server has (potentially) much more local info to wisely choose appropriate chunk sizes.
  • Clients hold no more complex optimization logic for data retrieval.
  • Fast-forward is disabled which leads to a simpler server-side implementation.

Then, there are even more subtle benefits:

  • Better resilience against denial of service. If the server suffer an overload, then, it can optionally delay the retrieval by returning nothing but the current continuation token (a proper server-driven way of saying busy, try again later to the client).
  • Better client resilience to evolution. Indeed, the logic that optimize the chunking process might evolve on the server-side over time, but client code is not impacted and implicitly benefits from those improvements.

Bottom line: unless you specifically want to offer support for fast-forward, you are nearly always better off relying continuation tokens in your distributed computing patterns.