Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in azure (26)

Monday, Aug 29, 2016

The sad state of .NET deployments on Azure

One of the core benefits of cloud computing should be ease of deployment. At Lokad, we have been using Azure in production since 2010. We love the platform, and we depend on it absolutely. Yet, it remains very frustrating that .NET deployments have not made nearly as much progress as could have been expected 6 years ago.

The situation of .NET deployments is reminiscent of the data access re-inventions which were driving Joel Spolsky nuts 14 years ago. Now, Microsoft has shifted its attention to app deployments, and waves of stuff keep rolling in, without really addressing core concerns, leaving the whole community disoriented.

At present, there are six major ways of deploying a .NET app on Azure:

  • ASM, WebRole and WorkerRole
  • ASM, Classic VM
  • ARM, WebApp
  • ARM, Azure Batch
  • ARM, Azure Functions
  • ARM, VM scale set

ASM stands for Azure Service Manager, while ARM stands for Azure Resource Manager. ASM covers the first generation of cloud services on Azure; ARM covers the second generation.

ASM to ARM transition is a mess

In Azure, pretty much everything comes in two flavors: the ASM one and the ARM one; even the Blob Storage accounts (the equivalent of S3 on AWS). Yet, no migration is possible. Once you create a resource - a storage account, a VM, etc. - on one side, it cannot be migrated to the other side. It's such a headache. Why is it the responsibility of the clients to deal with this mess? Most resources should be accessible from both sides, or be "migratable" in a few clicks. Then, whenever the ASM/ARM distinction is not even relevant (e.g. storage accounts), the distinction should not even be visible.

So many ways to do the same thing

It's maddening to think that pretty much every service handles .NET deployments in a different way:

  • With WebRole and WorkerRole, you locally produce a package of assemblies (think of it as a Zip archive containing a list of DLLs), and you push this package to Azure.
  • With the Classic VM, you get a fresh barebone OS, and you do your own cooking to deploy.
  • With WebApp, you push the source code through Git, and Azure takes care of compiling and deploying.
  • With Azure Batch, you push your DLLs to the blob storage, and script how those files should be injected/executed in the target VM.
  • With Azure Functions, you push the source code through Git, except that unlike WebApp, this is not-quite-exactly-C#.
  • With the VM scale set, you end up cooking your own OS image that you push to deploy.

Unfortunately, the sanest option, the package option as used for WebRole and WorkerRole, is not even available in the ARM world.
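
For reference, the package model is also the simplest one from the app's perspective. Below is a minimal sketch of a WorkerRole entry point, assuming the classic Microsoft.WindowsAzure.ServiceRuntime assembly shipped with the Azure SDK; everything compiled alongside this class ends up in the package pushed to Azure.

    using System.Net;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    // Minimal worker role: the whole app ships as a package of DLLs,
    // and Azure injects it into a pre-provisioned Windows VM.
    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Typical tuning performed before the role starts processing.
            ServicePointManager.DefaultConnectionLimit = 48;
            return base.OnStart();
        }

        public override void Run()
        {
            while (true)
            {
                // Dequeue and process work items here.
                Thread.Sleep(10000);
            }
        }
    }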

The problem with Git pushes

Many companies - Facebook or Google for example - leverage a single source code repository. Lokad does too now (we transitioned to a single repository 2 years ago, and it's much better now). While having a large repository creates some challenges, it also makes tons of things easier. Deploying through Git looks super cool in a demo, but as soon as your repository reaches hundreds of megabytes, problems arise. As a matter of fact, our own deployments on Azure routinely crash while our Git repository "only" weighs 370MB. By the time our repository reaches 1GB, we will probably have entirely given up on using Git pushes to deploy.

In hindsight, it was expected. The size of the VM needed to compile the app has no relevance to the size of the VM needed to run the app. Plus, compiling the app may require many software pieces that are not required afterward (do you need your JS minifier to be shipped along with your webapp?). Thus, all in all, deployment through Git push only gets you so far.

The problem with OS management

Computer security is tough. For the average software company, or rather for about 99% of the (software) companies, the only way to ensure decent security for their apps consists of not managing the OS layer. Dealing with the OS is only asking for trouble. Delegating the OS to a trusted party who knows what she is doing is about the only way not to mess it up, unless you are fairly good yourself; which, in practice, eliminates 99% of the software practitioners (myself included).

From this perspective, the Classic VM and the VM scale set feel wrong for a .NET app. Managing the OS has no upside: the app will not be faster, the app will not be more reliable, the app will not have more capabilities. OS management only offers dramatic downsides if you get something wrong at the OS level.

Packages should have solved it all

In retrospect, the earliest deployment method introduced in Azure - the packages used for WebRole and WorkerRole - was really the right approach. Packages scale well and remain uncluttered by the original size of the source code repository. Yet, for some reason this approach was abandoned on the ARM side. Moreover, the old ASM design does not even offer the most obvious benefits that this approach should have delivered:

  • The packages could have been made even more secure: signing and validating packages is straightforward (see the sketch right after this list).
  • Deployment could have been super fast: injecting a .NET app into a pre-booted VM is also straightforward.
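
As an illustration of the first point, here is a minimal sketch of what package signing and validation could look like with plain .NET crypto. This is not an actual Azure feature; the package path and key handling are placeholders.

    using System.IO;
    using System.Security.Cryptography;

    // Hypothetical sketch: sign a deployment package with RSA over SHA-256,
    // and verify the signature before injecting the app into a VM.
    static class PackageSigning
    {
        public static byte[] Sign(string packagePath, RSACryptoServiceProvider privateKey)
        {
            var bytes = File.ReadAllBytes(packagePath);
            return privateKey.SignData(bytes, "SHA256");
        }

        public static bool Verify(string packagePath, byte[] signature, RSACryptoServiceProvider publicKey)
        {
            var bytes = File.ReadAllBytes(packagePath);
            return publicKey.VerifyData(bytes, "SHA256", signature);
        }
    }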

For demo purposes, it would have been simple enough to have a Git-to-package utility service running on Azure to offer Heroku-like swiftness to small projects, with the possibility of transitioning naturally to package deployments afterward.

Almost reinventing the packages

Azure Batch is kinda like the package, but without the packaging. It's more like x-copy deployment with files hosted in the Blob Storage. Yet, because it's x-copy, it will be tricky to support any signing mechanism. Then, looking further ahead, the 1-blob-per-file pattern is nearly guaranteed to become a performance bottleneck for large apps. Indeed, the Blob Storage offers much better performance at retrieving a single 40MB package rather than 10,000 blobs of 4KB each. Thus, for large apps, batch deployments will be somewhat slow. Then, somebody somewhere will start re-inventing the notion of "package" to reduce the number of files...
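
To give a sense of the gap, here is a minimal sketch of bundling a build output into a single package archive before pushing it to the Blob Storage; the folder and file names are made up for illustration, and the upload itself is left out.

    using System;
    using System.IO;
    using System.IO.Compression; // requires a reference to System.IO.Compression.FileSystem

    static class PackageBuilder
    {
        // Bundle the build output into one archive: a single 40MB blob is
        // retrieved far faster than thousands of 4KB blobs.
        public static string BuildPackage(string buildOutputDir)
        {
            var packagePath = Path.Combine(Path.GetTempPath(),
                "myapp-" + DateTime.UtcNow.ToString("yyyyMMddHHmmss") + ".zip");
            ZipFile.CreateFromDirectory(buildOutputDir, packagePath);
            return packagePath; // upload this single file to the Blob Storage
        }
    }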

With the move toward .NET Core, .NET has never been more awesome, and yet, it could be so much more with a clarified vision and technology around deployments.

Wednesday, Jun 22, 2011

3 features to make Azure developers say "wow"

Wheels that big aren't technically required.

The power of the "wow" effect seems frequently under-estimated by analytical minds. Nearly a decade ago, I remember a time when analysts were predicting that the adoption of color screens on mobile phones would take a while to take off, as color served no practical purpose.

Indeed, color screens arrived several years before the widespread bundling of cameras within cell phones. Even at the present day, there are still close to zero mobile phone features that actually require a color screen to work smoothly.

Yet, I also remember that the first reaction of practically every single person holding a phone with a color screen for the first time was simply: wow, I want one too; and within 18 months or so, the world had upgraded from grayscale screens to color screens, nearly without any practical use for color justifying the upgrade (at the time, mobile games were nonexistent too).

Windows Azure is a tremendous public cloud, probably one of the finest products released by Microsoft, but frequently I feel Azure is let down by a few items that trigger something close to an anti-wow effect in the mind of the developer discovering the platform. In those situations, I believe Windows Azure is failing at winning the heart of the developer, and at fostering adoption out of sheer enthusiasm.

No instant VM kick-off

With Azure, you can compile your .NET cloud app as an Azure package - weighing only a few MB - and drag & drop the package as a live app on the cloud. Indeed, on Azure, you don't deploy a bulky OS image, you deploy an app, which is about 100x smaller than a typical OS.

Yet, booting your cloud app takes at minimum 7 minutes on the Windows Azure Fabric (according to my own unreliable measurements), even if your app requires no more than a single VM to start with.

Here, I believe Windows Azure is missing a big opportunity to impress developers by bringing their app live within seconds. After all - assuming that a VM is ready on standby somewhere in the cloud - starting an average .NET app does not take more than a few seconds anyway.

Granted, there is no business case that absolutely requires instant app kick-off, and yet, I am pretty sure that if Azure were capable of that, every single 101 Windows Azure session would start by demoing a cloud deployment. Currently, the 7 minute delay is simply killing any attempt at a public demonstration of a Windows Azure deployment. Do you really want to keep your audience waiting for 7 minutes? No way.

Worse, I typically avoid demoing Azure to fellow developers out of fear of looking stupid while waiting 7 minutes until my "Hello World" app gets deployed...

Queues limited to 500 messages / sec

One of the most exciting aspects of the cloud is scalability: your app will not need a complete rewrite every time usage increases by a 10x factor. Granted, most apps ever written will never need to scale, for lack of market adoption. From a rational viewpoint, scalability is irrelevant for about 99% of the apps.

Yet, nearly every single developer putting an app in the cloud dreams of being the next Twitter, and thinks (or rather dreams) about the vast scalability challenges that lie ahead.

The Queue Storage offers a tremendous abstraction to scale out cloud apps, sharing the workload over an arbitrarily large number of machines. Yet, when looking at the fine print, the hope of the developer is crushed when discovering that the supposedly vastly scalable cloud queues can only process 500 messages per second, which is about 1/10th of what MSMQ was offering out of the box on a single server in 1997!

Yes, queues can be partitioned to spread the workload. Yes, most apps will never reach 500 msg / sec. Yet, as far as I can observe from community questions raised by adopters of Lokad.Cloud and Lokad.CQRS (open source libraries targeting Windows Azure), queue throughput is a concern raised by nearly every single developer tackling Windows Azure. This limitation is killing enthusiasm.

Again, Windows Azure is missing a cheap opportunity to impress the community. I would suggest to shoot for no less than 1 million messages / second. For the record, Sun was already achieving 3 million messages / sec on a single quasi-regular server a year ago, with insane latency constraints. So 1 million is clearly not beyond the reach of the cloud.

Instant cloud metrics visualization

One frequent worry about on-demand pricing is: what if my cloud consumption gets out of control? In the Lokad experience, cloud computing consumption is very predictable and thus a non-issue in practice. Nevertheless, the fear remains, and is probably dragging down adoption rates as well.

What does it take to transform this "circumstance" into a marketing weapon? Not that much. It takes a cloud dashboard that reports your cloud consumption live, the key metrics being:

  • VM hours consumed for the last day / week / month.
  • GB stored on average for the last day / week / month.
  • GB transferred In and Out ...
  • ...

As it stands, Windows Azure offers a bulky Silverlight console that takes about 20s to load on a broadband network connection. Performance is a feature: not having a lightweight dashboard page is a costly mistake. Just think of developers at BUILD discussing their respective Windows Azure consumption over their WP7 phones. With the current setup, it cannot happen.

Those 3 features can be dismissed as anecdotal and irrational, and yet I believe that capturing these (relatively) cheap "wow" effects would give a tremendous boost to the Windows Azure adoption rate.

Wednesday, May 11, 2011

Google App Engine becoming much more like Windows Azure?

The latest Google App Engine 1.5 release announcement has some puzzling edges (see also its 1-year-ahead roadmap). For once, it seems that Google is making its public cloud evolve to become much more like another public cloud, namely Windows Azure.

Google App Engine (GAE) was PAAS (Platform as a Service) right from the beginning, much like Windows Azure. However, GAE had a very distinctive edge where apps were charged against strict CPU usage, whereas most other public clouds charge per instance, aka per allocated Virtual Machine. With the latest announcement, it's clear that GAE is transitioning toward a much finer control over VMs, which brings GAE very close to Azure as far as client app architecture is concerned.

Then, GAE features the Go programming language and runtime which looks to me like an attempt to get the best of both Java and Python, while giving Google a lot of freedom to push its own innovations. Indeed, Java is controlled by Oracle while Python is controlled by its own foundation. Nothing wrong here, except that Google can't make those languages and runtimes evolve to better fit its cloud platform. It's noticeable that Microsoft initiated such a change nearly a decade ago with C#, as an effort, at the time, to combine the best of Java and C++. The evolution of GAE as a Go-friendly PAAS would make it extremely similar to Azure in its C#-friendly ways.

If GAE truly follows the Azure path, here are a few items that we can expect from GAE in the future:

  • Native IDE (Integrated Development Environment) for Go, à la Visual Studio.
  • SQL as a Service (probably some MySQL variant), à la SQL Azure.
  • Caching as a Service, à la AppFabric Caching.

Wait and see.

Friday, Nov 5, 2010

Big Wish List for Windows Azure - PDC10 update

At Lokad, we have been working with Windows Azure for more than 2 years, received the 1st Windows Azure Award, and have been serving large and small companies through a technology 100% powered by Windows Azure since its commercial availability in Q1 2010.

In my previous Big Wish List for Windows Azure, I was stating that Microsoft was a late entrant in the cloud computing arena. Considering the tremendous efforts that Microsoft has pushed around cloud technologies in 2010, I believe this aspect is no longer relevant.

With all the PDC10 announcements and all the improvements delivered in 2010 in Windows Azure, it's time to revisit this list.

Windows Azure

Top priority:

  • (Nice improvement) Faster CPU burst: Compared to the 20min VM burst observed at the very beginning of 2010, it now takes about 8min to get the first extra requested VMs. That's a major speed-up already, and I am really looking forward to an equivalent improvement in 2011. Then, near real-time VM instantiation would open tons of new possibilities, but it's beyond the strict requirements of Lokad.
  • (Done!) Smaller VMs: Quarter VMs have been announced, which is going to be very handy for tactical apps.
  • (No update) Per minute CPU billing (but no cloud provider delivers this feature either).
  • (No update) Per-VM termination control.

 Nice to have:

  • (Downgraded) Bandwidth and storage quota: we now have about ~20 cloud apps running at Lokad, and the cloud consumption proves to be very predictable. Hence the need for quotas is not nearly as bad as I was expecting almost 1 year ago.
  • (No update) Instance count management through RoleEnvironment.
  • (Downgraded) Geo-relocation of services: After extensive use of Windows Azure, you just get used to choosing the right service location from the start.

Overall feedback: even if occasional glitches have been observed after 1 year of Windows Azure services in production, it has clearly proved to be the most stable hosting environment we have ever experienced.

SQL Azure

Top priority:

  • (Nice improvement) DB snapshot & restore toward the Blob Storage: the copy feature of SQL Azure is a big step forward. It costs more than a Blob Storage dump, but in practice, VM costs are dwarfing SQL Azure costs anyway.
  • (Interested & surprised) Smaller DB (starting at 100MB for $1 / month): No update on that one, but the announcement of federations for SQL Azure might bring a solution from a fresh angle for multi-tenant apps.
  • (Nice Improvement) Size auto-migration: Still no auto-scaling, but changing the database size can now be done with a tiny SQL command, which is really nice (a sketch follows this list).
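
For illustration only, here is a rough sketch of what such a resize can look like from C#; the database name and target size are placeholders, and the exact ALTER DATABASE options and connection requirements may differ.

    using System.Data.SqlClient;

    static class SqlAzureResize
    {
        // Grow a SQL Azure database with a single T-SQL statement.
        // Assumption: the connection string targets the logical server
        // (typically the master database) with sufficient privileges.
        public static void Resize(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandText = "ALTER DATABASE [MyAppDb] MODIFY (MAXSIZE = 10 GB)";
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }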

Nice to have:

  • (Downgraded) Geo-relocation of service: Idem.

Overall feedback: a really distinctive feature of the Azure platform. Its near seamless integration with SQL Server proved to be very handy for a couple of Lokad clients to expose their data to Salescast, in order to avoid overloading their on-premise databases with intensive read operations.

Table Storage

Top priority:

  • (NEW) Upsert operation: group entity transactions are provided for insert or update, but neither of those operations is idempotent, an important aspect of large scale computations. Hence, it would be really nice if Table Storage supported upsert (update or insert) entity transactions, as it would facilitate the design of large scale data crunching apps.
  • (NEW) Indexed get-many entity retrieval: while it is possible to update up to 100 entities in a single request, it is not possible to efficiently retrieve 100 entities at once from the same partition while explicitly specifying their entity identifiers. Indeed, the get-many request triggers a linear scan of the table partition.
  • (No update) REST level .NET client library.

Nice to have:

  • (No update) Secondary indexes.

Overall feedback: very powerful storage service, still lacking the .NET client library it deserves. Work-arounds exist for the lack of upsert and get-many operations, but they complicate the client code (see the sketch below).
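
To make the complication concrete, here is a minimal sketch of the upsert workaround, written against a hypothetical ITableStorage wrapper rather than the actual client library; the exception type is made up as well.

    using System;
    using System.Collections.Generic;

    // Hypothetical abstraction over the Table Storage client, for illustration only.
    public interface ITableStorage
    {
        void Insert(string table, IEnumerable<object> entities); // fails if an entity already exists
        void Update(string table, IEnumerable<object> entities); // fails if an entity is missing
    }

    // Made-up exception type standing in for the "entity already exists" error.
    public class EntityAlreadyExistsException : Exception { }

    public static class TableStorageExtensions
    {
        // Emulated upsert: try the insert first, fall back to an update on conflict.
        // Two round-trips in the worst case, and not idempotent under concurrency,
        // which is precisely why a native upsert would help.
        public static void Upsert(this ITableStorage storage, string table, IEnumerable<object> entities)
        {
            try
            {
                storage.Insert(table, entities);
            }
            catch (EntityAlreadyExistsException)
            {
                storage.Update(table, entities);
            }
        }
    }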

Queue Storage

Nice to have:

  • (NEW) Increase scalability beyond 500 messages / sec: Queues have been announced to be capped at 500 messages / sec, which is definitely not large scale. Yet, I am impressed by the attitude of Microsoft in this area, Azurescope being an excellent initiative. In comparison, the SLAs offered by the other cloud providers are rather fuzzy.
  • (No update) Push multiple messages at once.

Overall feedback: Just exactly what you would expect from a FIPFO. The throughput cap at 500 msg / sec is annoying, but not a show stopper. The work-around consists of sharding a single logical queue over multiple Azure queues (see the sketch below). It's not too complicated to implement, but it adds extra layers of code.
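
For illustration, here is a minimal sketch of such sharding; it assumes the classic CloudQueue client shipped with the Azure SDK of the time, and the shard count and naming scheme are arbitrary.

    using Microsoft.WindowsAzure.StorageClient; // Azure SDK 1.x storage client

    public class ShardedQueue
    {
        readonly CloudQueue[] _shards;

        // Spread one logical queue over several physical Azure queues to go
        // beyond the ~500 msg/sec cap of a single queue.
        public ShardedQueue(CloudQueueClient client, string name, int shardCount)
        {
            _shards = new CloudQueue[shardCount];
            for (var i = 0; i < shardCount; i++)
            {
                _shards[i] = client.GetQueueReference(name + "-" + i);
                _shards[i].CreateIfNotExist();
            }
        }

        public void Put(string message)
        {
            // Place the message on a shard picked from its hash.
            var shard = (uint)message.GetHashCode() % (uint)_shards.Length;
            _shards[shard].AddMessage(new CloudQueueMessage(message));
        }

        public CloudQueueMessage TryGet()
        {
            // Poll the shards until a message is found (or all shards are empty).
            foreach (var queue in _shards)
            {
                var message = queue.GetMessage();
                if (message != null) return message;
            }
            return null;
        }
    }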

Blob Storage

Nice to have:

  • (No update) Reverse Blob enumeration.

Overall feedback: Although the most common usage pattern consists of using this storage as a substitute for a classical file system, Lokad mostly uses the Blob Storage for pre-aggregated data chunks - in order to make our data accesses more coarse (handy to improve overall process latencies). It works just fine.

Windows Azure Console

(Big updates coming) At PDC10, Microsoft unveiled an entirely redesigned web console for Windows Azure. I have not had the chance to try it yet, but I believe big changes (for the better) are coming soon in this area.

New services

Although they were not part of my initial wish list, the Windows Azure AppFabric Caching (distributed cache) and Windows Azure Virtual Network (IP management) are impressive additions that I am very eager to see in production.

Concerning other services:

  • (Interested & surprised) .NET Role Profiler: Not there yet, but it could come as a later extension of IntelliTrace, an impressive addition to .NET that I wasn't even close to expecting from Microsoft. I don't think any cloud offers a similar feature at present.
  • (No update) Map Reduce.

I am eager to see how Windows Azure unfolds in 2011. This upcoming year is likely to be a turning point in terms of widespread adoption of the cloud among traditional software companies (not just Californian startups :-).

Wednesday, Aug 11, 2010

Why perfectly reliable storage is not enough

Cloud computing now offers near perfectly reliable storage. Amazon S3 is announcing a 99.999999999% durability, and the Windows Azure storage is in the same league.

Yet, perfectly reliable data storage does not prevent data loss - not by a long shot. It only prevents data loss caused by hardware failure, which nowadays is no longer the most frequent cause of losing data.

The primary danger threatening your data is just plain accidental deletion. Yes, it's possible to set up administrative rights and so on to minimize the surface area of potential trouble. But at the end of the road, someone wields sysadmin powers over the data, and this person is just a few clicks away from causing a lot of trouble.

A long-established pattern to avoid this kind of trouble is automated data snapshots, taken on a daily or weekly basis, that can be restored when something goes utterly wrong. In the SQL world, snapshots are a given, as any serious RDBMS provides snapshotting as a basic feature at present day.

Yet, in the NoSQL world, things aren't that bright, and at Lokad, we realized that such an obvious feature was still missing from the Windows Azure Storage.
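
To make the pattern concrete, here is a rough, hypothetical sketch of the general idea of rolling your own blob snapshots; the IBlobStorage wrapper is made up for illustration, and this is not the actual Lokad.Snapshot code.

    using System;

    // Hypothetical wrapper over the Blob Storage client, for illustration only.
    public interface IBlobStorage
    {
        string[] ListBlobNames(string container);
        byte[] GetBlob(string container, string blobName);
        void PutBlob(string container, string blobName, byte[] content);
    }

    public static class BlobSnapshotter
    {
        // Copy every blob of a container into a dated snapshot container,
        // so that an accidental deletion can be rolled back later.
        public static void Snapshot(IBlobStorage storage, string container)
        {
            var snapshotContainer = container + "-snapshot-" + DateTime.UtcNow.ToString("yyyyMMdd");
            foreach (var name in storage.ListBlobNames(container))
            {
                storage.PutBlob(snapshotContainer, name, storage.GetBlob(container, name));
            }
        }
    }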

Thus, today, we are releasing Lokad.Snapshot, an open source C#/.NET app targeting the Windows Azure Storage and running on Windows Azure itself. Kudos to Christoph Rüegg, the primary architect of this app. Lokad.Snapshot offers automated snapshots for tables and blobs. In addition, Lokad.Snapshot exposes a Really Simple Monitoring endpoint to be consumed by Lokad.Monitoring.

The Lokad.Snapshot codebase should still be considered beta, although the app is already in production for our own internal needs at Lokad. If your Azure data isn't snapshotted yet, make sure to have a look at Lokad.Snapshot; it might be a life-saver sooner than expected.