Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Wednesday, May 11, 2011

Google App Engine becoming much more like Windows Azure?

The latest Google App Engine 1.5 release announcement has some puzzling edges (see also its 1-year ahead roadmap). For once, it seems that Google is making its public cloud evolve to become much more like another public cloud, namely Windows Azure.

Google App Engine (GAE) was PAAS (Platform as a Service) right from the beginning, much like Windows Azure. However, GAE had a very distinctive edge: apps were charged against strict CPU usage, whereas most other public clouds charge per instance, aka per allocated Virtual Machine. With the latest announcement, it's clear that GAE is transitioning toward much finer control over VMs, which brings GAE very close to Azure as far as client app architecture is concerned.

Then, GAE features the Go programming language and runtime, which looks to me like an attempt to get the best of both Java and Python while giving Google a lot of freedom to push its own innovations. Indeed, Java is controlled by Oracle while Python is controlled by its own foundation. Nothing wrong here, except that Google can't make those languages and runtimes evolve to better fit its cloud platform. It's notable that Microsoft initiated such a change nearly a decade ago with C#, as an effort, at the time, to combine the best of Java and C++. The evolution of GAE as a Go-friendly PAAS would make it extremely similar to Azure in its C#-friendly ways.

If GAE truly follows the Azure path, here are a few items that we can expect from GAE in the future:

  • Native IDE (Integrated Development Environment) for Go à la Visual Studio.
  • SQL as a Service (probably some MySQL variant) à la SQL Azure.
  • Caching as a Service à la AppFabric Caching.

Wait and see.

Tuesday, April 5, 2011

A few design tips for your NoSQL app

Since the migration of Lokad toward Windows Azure about 18 months ago, we have been relying almost exclusively on NoSQL - namely Blob Storage, Table Storage and Queue Storage. Similar cloud storage abstractions exist for all major cloud providers; you can think of them as NoSQL as a Service.

It took us significant effort to redesign our apps around NoSQL. Indeed, cloud storage isn't a new flavor of SQL; it's a radically different paradigm, and it required in-depth adjustments to the core architecture of our apps.

In this post, I will try to summarize the gotchas we picked up while (re)designing Lokad.com.

You need an O/C (object to cloud) mapper

NoSQL services are orders of magnitude simpler than your typical SQL service. As a consequence, the impedance mismatch between your object oriented code and the storage dialect is also much lower compared to SQL; this is a direct consequence of the relative lack of expressiveness of NoSQL.

Nevertheless, introducing an O/C mapper was a major complexity buster. At present, we no longer access cloud storage directly, and the O/C mapper layer is a major bonus to abstract away many subtleties such as retry policies, MD5, queue message overflow, ...
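
To give an idea of what such a layer can hide from the rest of the app, here is a minimal Python sketch of a retry policy wrapped around a storage call. The names `with_retry`, `TransientStorageError` and `FlakyBlobStore` are hypothetical, and the in-memory store merely stands in for a real cloud storage client; this is an illustration under those assumptions, not the actual O/C mapper used at Lokad.

```python
import time


class TransientStorageError(Exception):
    """Hypothetical error type for a storage call that is worth retrying."""


def with_retry(operation, attempts=3, base_delay=0.0):
    """Run operation(), retrying on transient errors with a linear back-off."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientStorageError:
            if attempt == attempts:
                raise  # retries exhausted: surface the error to the caller
            time.sleep(base_delay * attempt)


class FlakyBlobStore:
    """In-memory stand-in for a blob store that fails its first N calls."""

    def __init__(self, failures):
        self.failures = failures
        self.blobs = {}

    def put(self, name, data):
        if self.failures > 0:
            self.failures -= 1
            raise TransientStorageError("simulated glitch")
        self.blobs[name] = data
        return len(data)


# Two simulated glitches are absorbed by the retry policy.
store = FlakyBlobStore(failures=2)
written = with_retry(lambda: store.put("forecasts/1", b"payload"))
```

The calling code only sees `with_retry(...)`; how many attempts are made and how long to wait between them stays a concern of the mapper layer.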

Performance is obtained mostly by design

NoSQL is not only simpler but more predictable as well when it comes to performance. However, it does not mean that a solution built on top of NoSQL automatically benefits from scalability and performance - quite the opposite actually. NoSQL comes with strict built-in limitations. For example, you can't expect more than 20 updates / second on a single blob, which is almost ridiculously low compared to its SQL counterpart.

Your design needs to embrace the strengths of NoSQL and be really cautious about not hitting bottlenecks. Good news, those are much easier to spot. Indeed, no later optimization will save your app from abysmal performance if the storage architecture doesn't match dominant I/O patterns of your app (see the Table Storage or the 100x cost factor).

Go for contract-based serializer

A serializer, aka a component that lets you turn an arbitrary object graph into a serialized byte stream, is extremely convenient for NoSQL. In particular, it provides a near-seamless way to let your object-oriented code interact with the storage. In many ways, the impedance mismatch of objects vs NoSQL is much lower than it was for objects vs SQL.

Sometimes, however, serializers are nearly too powerful. In particular, it's easy to serialize objects that are part of the runtime, which can prove brittle over time. Indeed, upgrading the runtime might end up breaking your serialization patterns. That's why I advise going for simple yet explicit contract-based serialization schemes.

Although we did use a lot of XML in our early days on the cloud, we are now migrating away from XML in favor of JSON, Protocol Buffers or ad hoc high-density binary encodings, which in our experience provide a better readability vs flexibility vs performance tradeoff.
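
As a rough illustration of what "explicit contract" can mean, here is a minimal Python sketch using JSON. The entity `ForecastEntity`, its fields and the `"v"` version key are all hypothetical names chosen for this example; the point is only that the stored shape is spelled out field by field instead of being whatever the runtime object happens to contain.

```python
import json
from dataclasses import dataclass


@dataclass
class ForecastEntity:
    """Hypothetical entity; only the fields named in the contract are stored."""

    series_id: str
    horizon: int

    # Not annotated, hence a plain class attribute and not a dataclass field.
    CONTRACT_VERSION = 1

    def to_json(self):
        # The contract is written out explicitly, one field at a time.
        return json.dumps(
            {
                "v": self.CONTRACT_VERSION,
                "series_id": self.series_id,
                "horizon": self.horizon,
            },
            sort_keys=True,
        )

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        # Refuse payloads written under a contract this code does not know.
        if raw.get("v") != cls.CONTRACT_VERSION:
            raise ValueError("unsupported contract version: %r" % raw.get("v"))
        return cls(series_id=raw["series_id"], horizon=raw["horizon"])


payload = ForecastEntity("sku42", 12).to_json()
restored = ForecastEntity.from_json(payload)
```

Because the payload only depends on the named fields, renaming a private member or upgrading the runtime does not silently change what ends up in storage.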

Entity isolation is the easiest path to versioning

One mistake of Lokad in our early NoSQL days was to apply too much of the DRY principle (Don't Repeat Yourself). Indeed, sharing the same class between entities is a sure way to end up with painful versioning issues later on. Touching entities once data has been serialized with them is always somewhat risky, because you can end up with data that you can't deserialize any more.

Since the schema evolution required for one entity doesn't necessarily match the evolution of the other entities, you ought to keep them apart upfront. Hence, I suggest giving up on DRY early - when it comes to entities - to ease later evolutions.

With proper design, aka CQRS, needs for SQL drop to near-zero

Over the last two decades, SQL has been king. As a consequence, nearly all apps embed SQL structural assumptions very deep into their architecture, making a relational database an irreplaceable component - by design.

Yet, we have found out that when the app deeply embraces concepts such as CQRS, event sourcing, domain driven design and task-based UI, then there is no more need for SQL databases.

This aspect was a surprise to us, as we initiated our cloud migration extensively leveraging SQL databases. Now, as we are gaining maturity at developing cloudy apps, we are gradually phasing those databases out: not because of performance or capabilities, simply because they aren't needed anymore.

Saturday, January 22, 2011

Telling the difference between cloud and smoke

I returned a few days ago from NRF11. As expected, there were many companies advertising cloud computing, and yet, how disappointing when investigating the case a tiny bit further: it seems that fewer than 10% of the companies advertising themselves as cloudy are actually leveraging the cloud.

For 2011, I am predicting there will be a lot of companies disappointed by cloud computing - now apparently widely used as a pure marketing buzzword without technological substance to support the claims.
For those of you who might not be too familiar with cloud computing, here is a 3min sanity test to check if an app is cloud-powered or not. Obviously, you can also go for a very rigorous in-depth audit, but with this test, you should be able to uncover the vast majority of smoky apps.
 

1. Is there any specific reason why this app is in the cloud?

Bad answer: we strive to deliver next-generation outstanding software solutions, exceeding customer expectations, blah blah blah, insert here more corporate talk ...
A pair of regular servers - typically a web server plus a database server - can handle thousands of concurrent users for non-intensive webapps. This is already a lot more users than what most apps on the market will ever face (remember, with a high probability you don't need to scale). So there has to be a compelling reason that justifies the cloud besides the very hypothetical scenario of growing faster than Facebook.
 

2. Is the underlying infrastructure larger than 100k machines?

Bad answer: well, in fact we are just having our own dedicated servers at DediHost Corp Inc (put here the name of a regular hoster).
A key aspect of cloud computing is cost reduction through massification. As of 2011, there are still only a handful of cloud providers available, namely: Amazon WS, Google App Engine, Rackspace Cloud, Salesforce and Windows Azure. Make sure to ask which cloud infrastructure is being used. Also, private clouds are no exception: being "private" does not mean that massification is suddenly achieved with 100 servers. It takes more, a lot more, to build a cloud.
 

3. Can you open an account and get started right from the web, no setup cost?

Bad answer: let's meet and evaluate your requirements first.
Multitenancy is a key aspect to reduce admin costs. In particular, with any reasonable cloud-based architecture there is no reason to have mandatory setup costs (which does not mean that the company may not charge for some optional onboarding package, possibly providing training, dedicated support, etc.). Setup costs are typically a sign of non-cloud software where each extra deployment takes some amount of gruntwork.
 

4. Is there public pricing? Typically indexed on usage or user metrics.

Bad answer: pricing really depends on your company.
For cloud-based apps, there is just about zero compelling reason not to have public pricing. Indeed, cloud costs are highly predictable and strictly based on usage; hence, it makes little sense from a market perspective to go for customized pricing for each client, as it increases sales friction while providing no added value for the client.
 

5. Can two machines failing bring down the app along with them?

Bad answer: we have backups, don't worry.
In the cloud, the app layer should be properly decoupled from the hardware layer. In particular, hardware failures are accounted for and primarily handled by the cloud fabric, which reallocates VMs when facing hardware issues. The cloud does not offer better hardware, just a more resilient way to deal with failures. In this respect, setting up a backup server for every single production server is a very non-cloud approach. First, it doubles the hardware cost, keeping half of the machines idle about 99% of the time, and second, it proves brittle facing Murphy's law, aka 2 machines failing at the same time.

 

As a final note, it's rather hard to tell the difference between a well-designed SaaS powered by a regular hoster and the same but powered by a cloud. Although, back to point 1, unless there is a reason to need the cloud, it won't make much difference anyway.

Wednesday, December 22, 2010

A few tips for Web API design

During the iteration in spring 2010 that led Lokad to release its Forecasting API v3, I have been thinking a lot about how to design proper Web APIs in this age of cloud computing.

Designing a good Web API is surprisingly hard. Because Remote Procedure Call has been around forever, one might think that designing an API is a well-established practice, and yet, suffering from the defects of Forecasting API v1 and v2, we learned the hard way that it wasn't really the case.

In this post, I will try to cover some of the gotchas that we learned the hard way about Web API design.

1. Full API stack ownership

When you expose an API on the web, you tend to rely on building blocks such as XML, SOAP, HTTP, ... You must be prepared to accept and to support full ownership of this stack: in other words, problems are likely to happen at all levels, and you must be ready to address them even if the problem arises in one of those building blocks.

For example, at Lokad, we initially opted for SOAP, and I now believe it was a mistake from Day 1. There is nothing really wrong with SOAP itself (*); the problem was that the Lokad team (myself included) was not ready to support SOAP in full. Whenever a client was raising a subtle question about some obscure XML namespace or SOAP version matters, we were very far from our comfort zone. In short, we were relying on a complex SOAP toolkit which proved to be a leaky abstraction.

When it comes to a public API, you have to be ready to support all abstraction leaks from TCP, HTTP, XML, SOAP... That's why, in the end, for our Forecasting API v3, we settled for POX (Plain Old Xml) over HTTP with a basic REST philosophy. Now, when an issue arises, we can truly own the problem along with its solution, instead of being helpless facing a leaky abstraction that we don't have the resources to embrace.

(*) Although there are definitely many gray areas with SOAP (just look at SOAP Exceptions for example).

2. Be minimalistic

Dealing with all the software layers that sit between the client code and the Web API itself is bad enough; if the API itself adds complexity of its own, the task quickly becomes maddening.

While refactoring our Forecasting API v2 into the Forecasting API v3, I reduced the number of web methods from 20+ to 8. Looking back, it was probably one of the best insights I had.

When developing a software library, it is usually convenient for the client developer to benefit from a lot of syntactic sugar, that is to say method variants that achieve the same purpose following various coding styles. For a Web API, the reverse is true: there should be one and only one way to achieve each task.

Redundancy within methods only causes confusion and extra friction when developing against the API, and it will cause an endless stream of questions about whether method X should be favored over method X* while both achieve essentially the same thing.

3. Idempotence

The network is unreliable. With a low frequency, API calls will fail because of network glitches. The easiest way to deal with those glitches is simply to have retry policies in place: if the call fails, try again. If your API semantics are idempotent, retry policies will integrate seamlessly with your API. If not, each network glitch is likely to wreak havoc within the client app.

For example, coming from a SQL background, most developers might consider having both INSERT (only for new items) and UPDATE (only for existing items) methods. INSERT is a bad choice: if the method is attempted twice because of a retry, the second call will fail, probably triggering some unexpected exception on the client side. Instead, you should adopt UPSERT (i.e. UPDATE or INSERT) semantics, which play much nicer with retry policies.
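
The difference can be shown with a small Python sketch. `ItemStore` is a hypothetical in-memory stand-in for the server side of a Web API, and `call_twice` simulates a retry after a lost response, where the server ends up processing the same call two times. Neither name comes from the Forecasting API.

```python
class DuplicateKeyError(Exception):
    """Raised by insert() when the key already exists."""


class ItemStore:
    """Hypothetical in-memory stand-in for the server side of a Web API."""

    def __init__(self):
        self.items = {}

    def insert(self, key, value):
        # Not idempotent: the second identical call fails.
        if key in self.items:
            raise DuplicateKeyError(key)
        self.items[key] = value

    def upsert(self, key, value):
        # Idempotent: same outcome whether called once or many times.
        self.items[key] = value


def call_twice(method, key, value):
    """Simulate a retry after a lost response: the server sees the call twice."""
    method(key, value)
    method(key, value)


store = ItemStore()
call_twice(store.upsert, "sku42", 10)  # the retry is harmless

try:
    call_twice(store.insert, "sku43", 10)
    insert_retry_failed = False
except DuplicateKeyError:
    # The item was stored by the first call, yet the client sees an error.
    insert_retry_failed = True
```

With `insert`, the client cannot tell a genuine conflict apart from its own retry; with `upsert`, the question never arises.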

4. Explicit input/output limitations

No web request or response should be allowed to be arbitrarily large, as it would cause a lot of problems server side to maintain decent performance across concurrent web requests. It means that from Day 1 every message that goes in or out of your API is explicitly capped: don't let your users discover your API limitations through reverse engineering. In practice, it means that no array, list or string should be left unbounded: max length should always be specified.

Then, unless you specifically want to move plain text around, I suggest keeping tight limitations on any string passed to your API. In the Forecasting API, we have opted for the Regex pattern ^[a-zA-Z0-9]{1,32}$ for all string tokens. Obviously, this is rather inflexible, but again, unless your API is intended for plain text / binary data storage, there is no point in supporting the whole UTF-8 range, which can prove very hard to debug and trigger SQL-injection-like problems.
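
A minimal Python sketch of such explicit limits could look as follows. The regex is the one quoted above; the cap `MAX_TOKENS_PER_REQUEST` and the function `validate_tokens` are hypothetical, chosen only for illustration.

```python
import re

# Pattern quoted in the post for all string tokens.
TOKEN_PATTERN = re.compile(r"^[a-zA-Z0-9]{1,32}$")

# Hypothetical cap on list length, chosen for illustration only.
MAX_TOKENS_PER_REQUEST = 100


def validate_tokens(tokens):
    """Reject oversized lists and malformed tokens before doing any work."""
    if len(tokens) > MAX_TOKENS_PER_REQUEST:
        raise ValueError(
            "too many tokens: %d > %d" % (len(tokens), MAX_TOKENS_PER_REQUEST)
        )
    for token in tokens:
        # fullmatch() also rejects a trailing newline, which '$' alone
        # would tolerate in Python.
        if not TOKEN_PATTERN.fullmatch(token):
            raise ValueError("invalid token: %r" % token)
    return tokens


accepted = validate_tokens(["sku42", "ABC"])
```

Running this check at the very edge of the API means the documented limits and the enforced limits are one and the same.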

 

Friday, November 5, 2010

Big Wish List for Windows Azure - PDC10 update

At Lokad, we have been working with Windows Azure for more than 2 years. We received the 1st Windows Azure Award, and we have been serving large and small companies through technology 100% powered by Windows Azure since its commercial availability in Q1 2010.

In my previous Big Wish List for Windows Azure, I was stating that Microsoft was a late entrant in the cloud computing arena. Considering the tremendous efforts that Microsoft has pushed around cloud technologies in 2010, I believe this aspect is no longer relevant.

With all the PDC10 announcements and all the improvements delivered in 2010 in Windows Azure, it's time to revisit this list.

Windows Azure

Top priority:

  • (Nice improvement) Faster CPU burst: Compared to the 20min VM burst observed at the very beginning of 2010, it now takes about 8min to get the first extra requested VMs. That's a major speed-up already, and I am really looking forward to an equivalent improvement in 2011. Near real-time VM instantiation would open tons of new possibilities, but it's beyond the strict requirements of Lokad.
  • (Done!) Smaller VMs: Quarter VMs have been announced, which is going to be very handy for tactical apps.
  • (No update) Per minute CPU billing (but no cloud provider delivers this feature either).
  • (No update) Per-VM termination control.

 Nice to have:

  • (Downgraded) Bandwidth and storage quota: we now have about 20 cloud apps running at Lokad, and the cloud consumption proves to be very predictable. Hence the need for quotas is not nearly as pressing as I was expecting almost 1 year ago.
  • (No update) Instance count management through RoleEnvironment.
  • (Downgraded) Geo-relocation of services: After extensive use of Windows Azure, you just get used to choosing the right service location from the start.

Overall feedback: even if occasional glitches have been observed after 1 year of running services in production on Windows Azure, it clearly proved to be the most stable hosting environment we have ever experienced.

SQL Azure

Top priority:

  • (Nice improvement) DB snapshot & restore toward the Blob Storage: the copy feature of SQL Azure is a big step forward. It costs more than a Blob Storage dump, but in practice, VM costs are dwarfing SQL Azure costs anyway.
  • (Interested & surprised) Smaller DB (starting at 100MB for $1 / month): No update on that one, but the announcement of federations for SQL Azure might bring a solution from a fresh angle for multi-tenant apps.
  • (Nice Improvement) Size auto-migration: Still no auto-scaling, but changing the database size can now be done with a tiny SQL command which is really nice.

Nice to have:

  • (Downgraded) Geo-relocation of service: Idem.

Overall feedback: a really distinctive feature of the Azure platform. Its near seamless integration with SQL Server proved to be very handy for a couple of Lokad's clients to expose their data to Salescast while avoiding overloading their on-premise databases with intensive read operations.

Table Storage

Top priority:

  • (NEW) Upsert operation: group entity transactions are provided for insert or update, but neither of those operations is idempotent, an important aspect of large scale computations. Hence, it would be really nice if Table Storage supported upsert (update or insert) entity transactions, as it would facilitate the design of large scale data crunching apps.
  • (NEW) Indexed get-many entity retrieval: while it is possible to update up to 100 entities in a single request, it is not possible to efficiently retrieve 100 entities at once from the same partition while explicitly specifying entity identifiers. Indeed, the get-many request triggers a linear scan of the table partition.
  • (No update) REST level .NET client library.

Nice to have:

  • (No update) Secondary indexes.

Overall feedback: very powerful storage service, still lacking the .NET client library it deserves. Work-arounds exist for the lack of upsert and get-many operations, but they complicate the client code.

Queue Storage

Nice to have:

  • (NEW) Increase scalability beyond 500 messages / sec: Queues have been announced to be capped at 500 messages / sec, which is definitely not large scale. Yet, I am impressed by the attitude of Microsoft in this area, Azurescope being an excellent initiative. In comparison, the SLAs offered by the other cloud providers are rather fuzzy.
  • (No update) Push multiple messages at once.

Overall feedback: Just exactly what you would expect from a FIPFO. The throughput cap at 500 msg / sec is annoying, but not a show stopper. The work-around consists of sharding a single logical queue over multiple Azure queues. It's not too complicated to implement, but it adds extra layers of code.
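
The sharding work-around can be sketched in a few lines of Python. `ShardedQueue` is a hypothetical name, and the physical queues are plain in-memory deques here; with the real service, each shard would be a separate Azure queue with its own throughput cap. This is a sketch under those assumptions, not the code running at Lokad.

```python
from collections import deque


class ShardedQueue:
    """One logical queue spread over several physical queues.

    The physical queues are in-memory deques in this sketch; with a real
    service each one would be a separate cloud queue.
    """

    def __init__(self, shard_count):
        self.shards = [deque() for _ in range(shard_count)]
        self._next_put = 0
        self._next_get = 0

    def put(self, message):
        # Round-robin on writes spreads the load evenly across shards.
        self.shards[self._next_put].append(message)
        self._next_put = (self._next_put + 1) % len(self.shards)

    def get(self):
        # Poll the shards in turn, starting after the last one served.
        for offset in range(len(self.shards)):
            index = (self._next_get + offset) % len(self.shards)
            if self.shards[index]:
                self._next_get = (index + 1) % len(self.shards)
                return self.shards[index].popleft()
        return None  # every shard is empty


queue = ShardedQueue(shard_count=4)
for i in range(10):
    queue.put("msg-%d" % i)

drained = []
message = queue.get()
while message is not None:
    drained.append(message)
    message = queue.get()
```

With N shards, the aggregate throughput cap is roughly N times the per-queue cap, at the price of the extra layer shown here. Ordering across shards should not be relied upon once several producers and consumers run concurrently.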

Blob Storage

Nice to have:

  • (No update) Reverse Blob enumeration.

Overall feedback: Although the most common usage pattern consists of using this storage as a substitute for a classical file system, Lokad mostly uses the Blob Storage for pre-aggregated data chunks - in order to make our data accesses more coarse (handy to improve overall process latencies). It works just fine.

Windows Azure Console

(Big updates coming) At PDC10, Microsoft unveiled an entirely redesigned web console for Windows Azure. I have not had the chance to try it yet, but I believe big changes (for the better) are coming soon in this area.

New services

Although they were not part of my initial wish list, the Windows Azure AppFabric Caching (distributed cache) and Windows Azure Virtual Network (IP management) are impressive additions that I am very eager to see in production.

Concerning other services:

  • (Interested & surprised) .NET Role Profiler: Not there yet, but it could come as a later extension of IntelliTrace, an impressive addition to .NET that I wasn't even close to expecting from Microsoft. I don't think any cloud offers a similar feature at present.
  • (No update) Map Reduce.

I am eager to see how Windows Azure unfolds in 2011. This upcoming year is likely to be a turning point in terms of widespread adoption of the cloud among traditional software companies (not just Californian startups :-).
