Yesterday, I had the chance to meet Eric Rudder, Senior Vice President at Microsoft, for nearly an hour and a half, along with three of my students who actively contributed to the Sqwarea project, an open source C# game designed for Windows Azure.
After about 45 minutes of discussion, Eric offered internships at MS Research (really nice, since CS students are expected to do a research internship in the US in their 2nd year at the ENS).
(From left to right: Joannes Vermorel, Ludovic Patey, Robin Morisset, Eric Rudder, Fabrice Ben Hamouda)
Special thanks to Thomas Serval and Pierre-Louis Xech who made this meeting possible in the first place.
Beyond running a small software company, I am also responsible for the Software Engineering and Distributed Computing course at the ENS Paris. For the fourth year in a row, Microsoft offered gracious support for this course (including some Windows Azure resources).
Every year, about a dozen 1st year Computer Science students take on a software project. Last year, my students produced Clouster, a scalable clustering algorithm on top of Windows Azure. It was already a significant achievement considering the beta status of Windows Azure at the time (the students had to upgrade twice from one SDK version to another during the course).
This year, my students went (*) for an online massively multiplayer strategy game named Sqwarea (heavy contraction of square+war+area).
You are a King battling over a gigantic map to conquer the world. Train soldiers, conquer new territories, and resist the assault of other kingdoms. The world is flat, see for yourself.
Despite my teaching methods, the students did really well (especially considering that we are only two thirds of the way through the project at this point), so let's review a few salient facts about this project:
- Open source, see sqwarea.codeplex.com
- ASP.NET MVC, C#, jQuery, OpenID for the front-end.
- Lokad.Cloud as the persistence and back-end execution framework.
- Windows Azure as the hosting platform.
- Table Storage for the persistence (1 entity per map square).
- Queue Storage to spread the workload among VMs.
Then, to make sure the project wasn't going to be easy, I included a game rule that is really hard to implement:
People and soldiers have to be constantly reminded who is the King; otherwise, they just do it their own way. If, after a conquest, a part of your kingdom is no longer connected to your King through a path of controlled land squares, then the disconnected area reverts to neutral.
Apparently, the students managed to implement a good (and, as expected, complicated) scheme to get this connectivity rule working in a very scalable way.
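To make the rule concrete, here is a minimal single-machine sketch of the connectivity check. It is not the students' scalable scheme: it simply floods outward from the King's square and neutralizes whatever the flood does not reach. The names `revert_disconnected`, `owner_of` and `NEUTRAL`, the in-memory dictionary, and the choice of a 4-neighbourhood are all assumptions made for illustration.

```python
from collections import deque

NEUTRAL = None  # hypothetical marker for squares that nobody owns


def revert_disconnected(owner_of, king_square, player):
    """Neutralize every square of `player` that has no path of
    player-owned squares leading back to `king_square`.

    `owner_of` maps (x, y) -> owner and is modified in place.
    Returns the set of squares that were reverted.
    """
    connected = set()
    if owner_of.get(king_square) == player:
        connected.add(king_square)
        frontier = deque([king_square])
        while frontier:
            x, y = frontier.popleft()
            # Assumption: squares are adjacent along edges only.
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt not in connected and owner_of.get(nxt) == player:
                    connected.add(nxt)
                    frontier.append(nxt)

    # Everything the flood did not reach is cut off from the King.
    lost = {sq for sq, owner in owner_of.items()
            if owner == player and sq not in connected}
    for sq in lost:
        owner_of[sq] = NEUTRAL
    return lost
```

This scans the whole kingdom after each conquest, which is fine for a toy map but is exactly the kind of cost a scalable implementation on Table Storage would need to avoid.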
(*) Actually, every year, I choose the project to be carried out by my students. Hence, if you think the project idea is lame, blame me.
About two months ago, when Mike Wickstrand set up a UserVoice instance for Windows Azure, I immediately posted my own suggestion concerning MapReduce. MapReduce is a distributed computing concept initially published by Google in late 2004.
Against all odds, my suggestion, driven by the needs of Lokad, made it into the Top 10 most requested features for Windows Azure (well, 9th rank, with about 20x fewer votes than the No. 1 request for scaled down hosting).
Lately, I had the opportunity to discuss this item further with folks at Microsoft who gather market feedback. In the software business, there is a frequent tendency for users to ask for features they don't want in the end. The difficulty is that the proposed features may or may not correctly address the initial problems.
While preparing for the interview, I realized that, to some extent, I had fallen into the same trap when asking for MapReduce. Actually, we have already reimplemented our own MapReduce equivalent, which is not that hard thanks to the Queue Storage.
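As a rough illustration of why a queue makes this tractable, here is a minimal sketch of a queue-driven map/reduce. It is not Lokad's implementation: Python's in-process `queue.Queue` stands in for a cloud queue, threads stand in for worker VMs, and the `map_reduce` function and its parameters are hypothetical names.

```python
import queue
import threading
from functools import reduce


def map_reduce(items, mapper, reducer, initial, workers=4):
    """Push work items to a queue, let workers map them, then fold the results."""
    tasks = queue.Queue()    # stand-in for a cloud queue of work items
    results = queue.Queue()  # stand-in for a queue of partial results

    # All work is enqueued before any worker starts.
    for item in items:
        tasks.put(item)

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, this worker stops
            results.put(mapper(item))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Partial results arrive in no particular order, so the reducer
    # should be associative and commutative.
    partials = []
    while not results.empty():
        partials.append(results.get())
    return reduce(reducer, partials, initial)
```

For instance, `map_reduce(range(10), lambda x: x * x, lambda a, b: a + b, 0)` sums the squares of the first ten integers. A real deployment would also have to deal with message visibility timeouts and retries, which this sketch ignores.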
I care very little about framework specifics, be it MapReduce, Hadoop, DryadLinq or something not invented yet. Lokad has no cloud legacy calling for a specific implementation.
What I do care about is much simpler. In order to deliver truckloads of forecasts, Lokad needs:
- large scale CPU
- burstable CPU
- low cost CPU
Windows Azure is already doing a great job addressing Point 1. Thanks to the massive Microsoft investments in Azure datacenters, thousands of VMs can already be instantiated if needed.
When asking for MapReduce, I was instead expressing my concern for Point 2 and Point 3. Indeed,
- Amazon MapReduce offers 5x cheaper CPU compared to classical VM-based CPU.
- VM-based CPU is not very burstable: it takes minutes to spawn a new VM, not seconds.
Then, low-cost CPU is somewhat at odds with burstable CPU, as illustrated by the Reserved Instances pricing of Amazon.
As far as low-level cloud computing components are concerned, lowering costs usually means giving up on expressiveness:
- Relational DB at $10/GB too expensive? Go for NoSQL storage at $0.1/GB, much cheaper, but much weaker as far as querying capabilities are concerned.
- Guaranteed VMs too expensive? Go for Spot VMs: the price is lower on average, but you no longer have any certainty about either the price or the availability of VMs.
- Latency of cloud storage too high? Go for CDN, latency is much better for reads, yet, much worse for writes.
Seeking large scale burstable CPU, here is the list of items that we would be very willing to surrender in order to lower the CPU pricing:
- No need for local storage. A VM comes with a 250GB hard drive, which we typically don't need.
- No need for 2GB of memory. Obviously, we still need a bit of memory but 512MB would be fine.
- No need for any level of access to the OS
- Runtime could be made .NET only, and restricted to safe IL (which would facilitate code sandboxing).
- No need for generic network I/O. Constrained access to specific Tables / Queues / Containers would be fine. This would facilitate colocation of storage and CPU.
- No need for geolocalized resources. The cloud can push the work wherever CPU is available. Yet, we would expect not to be charged for the bandwidth consumed between cloud data centers (if the transfer is caused by offsite computations).
- No need for fixed pricing. Prioritization of requests based on variable pricing would be fine (considering that the CPU price could be lowered on average).
Obviously, options are plenty to drag the price down in exchange for a more constrained framework. Since Azure has the unique opportunity to deliver some very .NET oriented features, I am especially interested in approaches that would leverage sandboxed code execution - giving up entirely on the OS itself to purely focus on the .NET Runtime.
I am very eager to see how Microsoft will be moving forward on this request. Stay tuned.
At Lokad, we have been working with Windows Azure for more than a year now. Although Microsoft is a late entrant in the cloud computing arena, I am so far extremely satisfied with this choice, as Microsoft is definitely moving forward in the right direction.
Here is my Big Wish List for Windows Azure. These are the features that would turn Azure into a killer product, deserving a lion-sized market share in the cloud computing marketplace.
My wishes are ordered by component:
- Windows Azure
- SQL Azure
- Table Storage
- Queue Storage
- Blob Storage
- Windows Azure Console
- New services
Windows Azure
- Faster CPU burst: The total time between the initial VM request (through the Azure Console or the Management API) and the start of the client code execution is long, typically 20min, and in my (limited) experience above 1h for any larger number of VMs (say 10+ VMs). Obviously, we are nowhere near real-time elastic scalability. In comparison, SQL Azure needs no more than a few seconds to instantiate a new DB. I would really like to see such an excellent behavior on the Windows Azure side.
- Smaller VMs: for now, the smallest VMs come with 2GB of memory and cost $90/month, which brings the cost of a modest web app to 200 USD/month (considering a web role and a worker role). Competitors (such as Rackspace) are already offering much smaller VMs, down to 256MB per instance, priced about 10x cheaper. I would really like to see that on Azure as well. Otherwise, scaled down apps are just not possible.
- Per minute charge: for now Azure is charging by the hour, which means that any hour that you start consuming will be charged fully. Charging by the minute would be a great incentive to improve performance, as developers could really fine tune their cloud usage to meet the demand without wasting resources. Obviously, such a feature makes little sense if your VMs take 1h to get started.
- Per-VM termination control: Currently, it is not possible to tell the Azure Fabric which VM should be terminated, which is rather annoying. For example, long running computations can be interrupted at any moment (they will have to be performed again) while idle VMs might be kept alive.
- Bandwidth and storage quota: most apps are never supposed to require truckloads of bandwidth or storage. If they do, it just means that something is going really wrong. Think of a loop endlessly polling some data from a remote data source. With pay-as-you-go, a single VM can easily generate 10x its own monthly costs through a faulty behavior. To prevent such situations, it would be much nicer to assign quotas to roles.
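To put numbers on the per-minute charge item above, here is a small arithmetic sketch comparing billing granularities. The function `billed_cost` and its parameters are hypothetical, and any hourly rate passed to it is an illustrative figure rather than quoted pricing.

```python
import math


def billed_cost(run_minutes, hourly_rate, granularity_minutes):
    """Cost of a run when every started billing unit is charged in full.

    granularity_minutes=60 models per-hour billing,
    granularity_minutes=1 models per-minute billing.
    """
    units = math.ceil(run_minutes / granularity_minutes)
    return units * granularity_minutes * hourly_rate / 60


# A hypothetical 61-minute job at an assumed rate of 0.12 per hour.
per_hour = billed_cost(61, 0.12, 60)    # two full hours are charged
per_minute = billed_cost(61, 0.12, 1)   # only 61 minutes are charged
```

Under these assumptions the hourly scheme charges for 120 minutes against 61 for the per-minute scheme, so short or bursty workloads pay close to twice as much.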
Nice to have:
- Instance count management through RoleEnvironment: The .NET class RoleEnvironment provides basic access to the properties of the current Azure instance. It would be really nice to provide here native .NET access to instance termination (as outlined above) and to instance allocation requests - considering that each role should be handling its own scalability.
- Geo-relocation of services: Currently, the geolocation of a service is set at setup-time, and cannot be changed afterward. Yet, the default location is "Asia" (that's the first item of the list), which makes the process quite error-prone (any manual process should be considered as error-prone anyway). It would be nicer if it was possible to relocate a service - possibly with limited downtime, as it's only a corrective measure, not a production imperative.
SQL Azure
- DB snapshot & restore toward the Blob Storage: even if the cloud is perfectly reliable, cloud app developers are not. The data of a cloud app (like any other app btw) can be corrupted by a faulty app behavior. Hence, frequent snapshots should be taken to make sure that data could be restored after a critical failure. The ideal solution for SQL Azure would be to dump DB instances directly into the Blob Storage. Since DB instances are kept small (10GB max), SQL Azure would be really nicely suited for this sort of behavior.
- Smaller DBs (starting at 100MB for $1 / month): 100MB is already a lot of data. SQL Azure is a very powerful tool to support scaled-down approaches, possibly isolating the data of every single customer (in case of a multi-tenant app) into an isolated DB. At $10/month, the overhead is typically too large to go for such a strong isolation; but at $1/month, it would become the de facto pattern, leading to smaller and more maintainable DB instances (as opposed to desperately trying to scale up monolithic SQL instances).
- Size auto-migration: Currently, a 1GB DB cannot be upgraded to a 10GB instance. The data has to be manually copied first, and the original DB deleted later on (and vice versa). It would be much nicer if SQL Azure was taking care of auto-scaling up or down the size of the DB instances (within the 10GB limit obviously).
Nice to have:
- Geo-relocation of service: Same as above. Downtime is OK too, as it's just a corrective measure.
Table Storage
- REST level .NET client library: at present, Table Storage can only be accessed through an ADO.NET implementation that proves to be rather troublesome. ADO.NET gets in the way if you really want to get the most out of Table Storage. Instead, it would be much nicer if a .NET wrapper around the REST API was provided as low-level access.
Nice to have:
- Secondary indexes: this one has already been announced; but I am re-posting it here as it would be a really nice feature nonetheless. In particular, this would be very handy to reduce the number of I/O operations in many situations.
Queue Storage
Nice to have:
- Push multiple messages at once: the Queue offers the possibility to dequeue multiple messages at once; but messages can only be queued one by one. Symmetrizing the queue behavior by offering batch writes too would be really nice.
Blob Storage
Nice to have:
- Reverse Blob enumeration: prefixed Blob enumeration is probably one of the most powerful features of the Blob Storage. Yet, items can only be enumerated in increasing order of their blob names. In many situations, the "canonical" order is exactly the opposite of what you want (e.g. retrieving blob names prefixed by dates, starting with the most recent ones). It would be really nice if it were possible to enumerate the other way around too.
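In the absence of reverse enumeration, one possible workaround is to encode an inverted timestamp in the blob name, so that the ordinary increasing enumeration returns the most recent items first. The sketch below only illustrates the naming trick: `reverse_chrono_name`, `MAX_SECONDS` and the name layout are hypothetical, and no storage API is called.

```python
MAX_SECONDS = 10**10 - 1  # assumed upper bound: 10 digits of epoch seconds


def reverse_chrono_name(prefix, epoch_seconds, suffix):
    """Build a blob name that sorts newest-first under lexicographic order."""
    inverted = MAX_SECONDS - epoch_seconds
    # Zero-padding keeps lexicographic order aligned with numeric order.
    return f"{prefix}{inverted:010d}-{suffix}"


# Three hypothetical blobs written at increasing times.
names = [reverse_chrono_name("logs/", t, f"run{t}") for t in (100, 200, 300)]
newest_first = sorted(names)  # mimics an increasing-order enumeration
```

The price of this trick is that names become unreadable for humans and the ordering is fixed at write time, which is why a native reverse enumeration would still be preferable.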
Windows Azure Console
The Windows Azure Console is probably the weakest component of Windows Azure. In many ways, it's a real shame to see such a good piece of technology so much dragged down by the abysmal usability of its administrative web client.
- 100x speed-up: when I say 100x, I really mean it; and even with a 100x factor, it will still be rather slow by most web standards, as a refresh latency of 20min is not uncommon after updating the configuration of a role.
- Basic multi-user admin features: for now, the console is a single-user app, which is quite a pain in any enterprise environment (what happens when Joe, the sysadmin, goes on vacation?). It would be much nicer if several Live IDs could be granted access rights to an Azure project.
- Billing is a mess, really: besides the fact that about 10 counter-intuitive clicks are required to navigate from the console to your consumption records, the consumption reporting is still of substandard quality. Billing cries out for massive look & feel improvements.
Nice to have:
- Project rename: once named, projects cannot be renamed. This is rather annoying, as there are many situations that would call for a naming correction. At present, if you are not satisfied with your project name, you've got no choice but to open a new Azure account and start all over again.
- Better handling of large projects: the design of the console is OK if you happen to have a few services to manage, but beyond 10 services, the design starts being messy. Clearly, the console has not been designed to handle dozens of services. It would be way nicer to have a compact tabular display to deal with the service list.
- Aggregated dashboard: Azure services are spread among many panels. With the introduction of new services (Dallas, ...), getting a big picture of your cloud resources is getting more and more complex. Hence, it would be really nice to have a dashboard aggregating all resources being used by your services.
- OpenID access: Live ID is nice, but OpenID is nice too. OpenID is gaining momentum, and it would be really nice to see Microsoft supporting OpenID here. Note that there is no issue with supporting Live ID and OpenID side by side.
New services
Finally, there are a couple of new services that I would be thrilled to see featured by Windows Azure:
- .NET Role Profiler: in a cloud environment, optimizing has a very tangible ROI, as each performance gain will be reflected through a lower consumption bill. Hence, a .NET profiler would be a killer service for cloud apps based on .NET. Even better, low overhead sampling profilers could be used to collect data even for systems in production.
- Map Reduce: already featured by Amazon WS, it is massively useful for those of us (like Lokad) who perform intensive computations on the cloud. Microsoft has already been moving forward with DryadLinq in this direction, but I am eager to see how Azure will be impacted.
This is a rather long list already. Did I forget anything? Just let me know.