Author

I am Joannes Vermorel, founder of Lokad. I am also an engineer from the Corps des Mines, and a graduate of the ENS.

I have been passionate about computer science, software and data mining for almost two decades.

Monday, December 15, 2014

A few lessons about pricing B2B apps

My own SaaS company has always struggled with its own pricing. For a company now selling pricing optimization technology for commerce, this is a bit ironic. Unfortunately, pricing software is very unlike pricing goods in a store, and the experience we acquired helping our retail clients improve their prices provided little insight into the pricing of Lokad.

Since the creation of the company, Lokad has offered metered pricing, charging according to the number of forecasts consumed. In practice, however, over the last two years we signed only a handful of contracts where the pay-as-you-go pricing was actually preserved. Typically, the usage observed during the trial period served as the starting point of the negotiation, and the negotiation invariably converged toward a flat monthly fee.

Starting today, we have extensively revised the pricing of Lokad toward a very simple list of packages, differentiated only by the maximum size of the client company.

For SaaS companies selling to businesses, the (almost) ubiquitous pricing pattern consists of charging per user; that's the approach of Salesforce, Google Apps, Office 365, Zoho and many more. Sometimes, however, charging per user doesn't make sense, because the number of users can be made arbitrarily low and does not reflect the usage of the service at all. All cloud computing platforms fall into this category.

Metered pricing only works with Über-geek clients

The cloud computing example is misleading because it gives the false impression that metered pricing is just fine. Metered pricing works for cloud computing platforms because their clients are very technical, and can digest pricing schemes 100x more complex than anything a "non-tech" business would accept.

At Lokad, we have observed many times that the fear of making a mistake and increasing the invoice tenfold was generally considered a deal-breaker. Most companies don't trust their employees nearly as much as software companies trust their software developers. Metered pricing places an implicitly high level of trust in the employees operating the metered service.

Flat monthly / quarterly / yearly fees are the way to go

Through dozens of negotiations with clients, some large, some small, and across many countries, we have always converged toward periodic fees paid every month, quarter or year. Sometimes we added a setup fee to reflect the extra effort required from Lokad to set up the solution, but in 7 years of business, only a handful of contracts were more complex than a flat setup fee followed by a flat periodic fee.

The lesson here is that anything more complex than a setup fee plus a periodic fee is very prone to accidental complexity, providing little or no business value for the software company or its client.
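To make the point concrete, the whole pricing model fits in one line of code. A minimal sketch in Python (the function and its parameters are illustrative, not Lokad's actual billing logic):

```python
def invoice_total(setup_fee, periodic_fee, periods):
    """Flat setup fee followed by a flat periodic fee: the entire pricing model."""
    return setup_fee + periodic_fee * periods

# One setup fee of 1000, then 12 monthly fees of 500.
total = invoice_total(1000, 500, 12)  # 7000
```

Anything beyond this - usage tiers, caps, proration - multiplies the surface for billing disputes without adding business value for either side.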

Don't cripple your software by restricting access to features

The "freemium" vision consists of offering a free version with limited features, and restricting access to the more advanced features to paying clients. Again, if you consider software where it's natural to charge per user, this approach might work; however, when the software is not user-driven, withholding features just drags down your small clients - who have mostly the same needs as your bigger clients.

We learned that crippling our own apps was just bad. At the end of most negotiations, we nearly always ended up granting access to all features - as in the highest-paying plan - for most companies. Naturally, the price point was adjusted accordingly, but we observed many times that crippling the software was a lose-lose approach.

It's fine to trust your clients by default

For years, at Lokad, we had relied on the implicit assumption that whatever metric was going to be used to define the boundaries between subscription plans, this metric had to be tracked by the software itself. However, by narrowing our vision to the sole metrics our software could track, we had eliminated the one metric that truly made sense: charging according to company turnover.

Our new plans are differentiated based on turnover, and yet we have no automated way to measure turnover. However, is it really a problem? I don't think so. Over the years, we have seen very (very) few companies try to game our terms. Moreover, my observations indicate that the larger the company, the less likely it is to even consider the possibility of cheating.

The logical conclusion is then to grant access to everything by default, and then to gently remind companies of your pricing terms when the opportunity arises. B2B isn't B2C: for the vast majority of B2B software, even if you don't put any protection in place, the service isn't going to be swarmed by corporate freeloaders.

If it is, well, that's a rich man's problem.

Tuesday, October 21, 2014

How we ended up writing our own programming language

About one year ago, my company had the opportunity to expand into an area that was very new for us at the time: pricing optimization for commerce. Pricing optimization is quite different from demand forecasting, the latter being the original focus of Lokad at the beginning of the company’s existence. While demand forecasting fits rather nicely into quantitative frameworks that let you decide which forecasting methods are most suitable for a given task, pricing is a much more elusive problem. The idea that profits can be maximized by carrying out a simple analysis of demand elasticity is deceptive. Indeed, pricing is a signal sent to the market; and as with any marketing ingredient, there is no single valid answer to the problem.

One year ago, most of the companies helping merchants manage their pricing were consulting companies, but I wanted to build a pricing webapp that would go beyond the classic consulting services on pricing. I quickly ruled out the idea of offering a template list of “pricing recipes”. Some competitors were already offering such “pricing recipe” services, and they were desperately inflexible. Merchants needed to be able to “tweak” their pricing logic in many ways. Thus, I started to consider more elaborate user interfaces that would allow merchants to compose their own pricing strategies. After several attempts at mockups, I ended up with something oddly similar to the Microsoft Access “visual” query designer.

This was not a good sign. My limited interactions with this query designer, a decade prior, had left me with the lasting impression that it was just about the worst user experience I had ever had with the “normal” behavior of a product released by Microsoft. It was supposedly a visual query editor with plenty of very visual buttons, but unless you had some knowledge of SQL or experience in programming, you weren’t going very far with this tool. In the end, anyone using Access fell back on the non-visual query editor, which, quite unfortunately, was a second-class citizen.

Gradually, I came to consider the possibility of going for a programming language instead. With a programming language, we could provide the desired expressiveness, along with a powerful data environment. Lokad would host the data and offer a cloud-based execution environment for the pricing logic. This environment would not be constrained by the client’s desktop setup, which might be too old, too weak, too messy or downright corrupted.

At first, I considered reusing an existing programming language such as JavaScript or Python. However, this presented two very specific challenges. The first challenge was security. Running client code server-side seemed like a giant vector for entire classes of injection attacks. In theory, it should be possible to sandbox the execution of any program, but my instincts told me that the attack surface was so great that we would never be confident enough about not having leaks in our sandbox. So we would have to use disposable VMs for every execution, and it seemed that an endless stream of technical problems was heading our way if we implemented this.

The second, and in fact bigger, problem is that JavaScript and Python, being full-fledged programming languages, are also complex languages, which include truckloads of features downright irrelevant to the pricing use cases I was considering: loops, objects, exceptions, null references. No matter how much we tried to steer the usage of our future product away from these elements, I felt they would invariably resurface again and again, because some of our future users would be familiar with just these languages, and as a result, they would do things the way they were used to doing them. It is tough to debug generic source code, so the tooling would necessarily end up being complex as well.

This left me with the prospect of inventing a new programming language, and yet this idea was accompanied by every possible red flag in my mind. Usually, for a software company, inventing its own programming language is a terrible idea. I had witnessed quite closely three companies that had rolled out their own programming languages, and for each one of them, the experience was very poor, to say the least. Two of them managed to achieve a certain level of success nonetheless, but the ad hoc nature of the language had been a huge hindrance in the process. Moreover, just about every experience I ever had with niche programming languages (hello X++) confirmed that an ad hoc language was a terrible idea.

However, as far as pricing and commerce were concerned, a generic programming language was not required, and we were hopeful that through extreme specialization we could produce a highly specialized language that would, all being well, compare favorably with mainstream languages, at least for the narrow scope of commerce.

And thus, the Envision programming language was born at Lokad.

Unlike a generic programming language, Envision is designed around a relatively rigid and relatively simple data model that we knew worked well for commerce. My intent was to be able to reproduce all the domain-specific calculations that nearly all merchants do in Microsoft Excel, while putting some distance between the logic and the data - but not too much distance either. Indeed, Envision, as the name suggests, is also heavily geared toward data visualization; again, not generic data visualization, but the things that matter for commerce.

Envision has no loops, no branches, no nulls, no exceptions, no objects … and it does just fine without them. It is not Turing-complete either, so we never end up with indefinite execution delays.
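To give a feel for what loop-free, declarative pricing logic looks like, here is an illustrative sketch in Python - not actual Envision syntax - where a pricing rule is expressed as a whole-table operation rather than explicit iteration. The product table and the rule itself are made up for the example:

```python
# A hypothetical product table; in practice this data would be hosted by the platform.
products = [
    {"sku": "A1", "cost": 10.0, "competitor_price": 14.0},
    {"sku": "B2", "cost": 25.0, "competitor_price": 29.0},
]

# Declarative pricing rule: target cost * 1.5, capped at the competitor price.
# No loops exposed to the user, no branches, no nulls - just one rule over the whole table.
prices = {p["sku"]: min(p["cost"] * 1.5, p["competitor_price"]) for p in products}
```

The point is not the comprehension itself, but that the user writes one rule and the runtime applies it to every row; the iteration, and the opportunities for bugs that come with it, stay out of the user's hands.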

Less than one year after writing the first line of code of the Envision compiler, we have now secured over a million euros through multi-year contracts that are to be (almost) completely implemented in Envision. To be transparent, it is not the language that clients are buying, but the service built with it. Nevertheless, over the last couple of months, we have been able to deliver all kinds of quantitative optimizations with Envision - not just pricing, actually - and within timeframes that we would never have achieved in the past.

There are two takeaway lessons from this initiative. First, ultra-specialized languages are still a valid option for vertical niches. While it is a very rough estimate, I would say that with Envision, when dealing with a suitable challenge, we end up with about 50 times fewer lines of code than when the same logic is implemented in C#, the primary programming language used at Lokad. Yes, a functional language like F# would already make the code more compact than C#, but it would still be far from this compact. Also, with Envision, we get more concise code not because we leverage highly abstract operators, but merely because the language itself is geared toward the exact problem to be addressed.

Second, when introducing a programming language, it should not be half-baked. Not only does the language itself need the good ingredients known to computer science - a context-free grammar written in Backus-Naur form, for example - but a good integrated programming environment with code auto-completion and meaningful error messages is also needed. The language is only the first ingredient; it is the tooling around the language that makes the difference in terms of productivity. From the very beginning, we invested as much in the environment as in the programming language itself.

Also, having a webapp as the primary execution environment for your language opens up a lot of new possibilities. For example, unless you spend years polishing your syntax before releasing anything, you are bound to make design mistakes. We certainly did. But as all Envision scripts were hosted by Lokad, it was possible for us to rectify those mistakes, first by fixing the language, and second by upgrading all the impacted scripts across the entire user base. Sure, it takes time, but it is better to spend a few days on this early on than to end up with a broken language forever.

I have not delved much into the details of the Envision language itself in this post, but in case you are interested, I have just published a book about it. The preview gives you access to the entire book too.

PS: while I credit myself for initiating the Envision project at Lokad, it is actually a colleague of mine, Victor Nicollet, presently the CTO of Lokad, who came up with nearly all the good ideas for the design of this language and who carried out about 90% of the implementation effort.

Thursday, March 6, 2014

Dismal IT usually starts with recruitment agencies

In most companies, especially non-tech companies, the IT landscape is dismal: the old ERP (vastly unmaintainable) coexists with the new ERP (mostly unmaintainable). Sales figures from the ERP diverge from those of the Business Intelligence system, which in turn diverge from those of the CRM. Systems are slow and unreliable. Major evolutions take years, etc. The technical debt has not been paid down for years, and the interest is usurious.

The root causes of dismal IT are many. Enterprise software vendors have their share of ownership in the mess, but usually they are more a symptom of the problem than its cause. Indeed, had the company been better prepared, no big check would have been signed to such troublesome vendors in the first place.

Such a situation calls for a 5 Whys analysis, as popularized by Toyota.

  • Why is there so much technical debt? Because IT relies on a couple of poorly managed enterprise vendors, with plenty of functional overlaps. Those overlaps are intensified by the respective business models of the vendors, which drive each of them to spread into other functional areas: both the CRM and the web platform want to track audience, etc.
  • Why are vendors so poorly chosen and managed? The road to hell is paved with good intentions. A few carefully crafted Requests for Quotes (RFQs) typically end up in a fully dysfunctional selection process where one of the worst vendors gets selected. The situation could have been salvaged by heavy hands-on management capable of driving the software vendor toward more reasonable designs and architectures; but middle management is clueless about IT and keeps a low profile.
  • Why is middle management so clueless? Middle management staff never had to bother gaining IT skills in the past. Moreover, most of them managed to climb the company hierarchy just fine without them. Why would they bother now with tedious software stuff? Since outrageous fees are paid to software vendors, surely the vendors should be able to manage themselves.
  • Why did top management make such hiring mistakes? Top management had correctly anticipated that IT was a critical ingredient for practically every single aspect of the business. However, recruiting talent takes a lot of time. Furthermore, recruiting technical talent requires becoming a bit of a technical expert yourself in order to assess technical skills. Thus, it's easier to just delegate the problem to a recruitment agency.
  • Why do (almost all?) recruitment agencies fail in IT? Recruiting in IT is tough. The job market has been excellent for 4 decades now, and there is about zero talent unemployed. Moreover, talented people don't just expect good pay: they expect a project that makes sense. Unfortunately, there are few meetings less inspiring than a job interview with a recruitment agency. As a result, candidates flowing through agencies are invariably poor.

If you want better IT, the journey starts by re-internalizing every single recruitment.

Sunday, November 24, 2013

Bitcoin, more thoughts on an emerging currency

Two years ago, I published some first thoughts on Bitcoin. In the meantime, Bitcoin has grown tremendously, and I remain an enthusiastic observer of those developments. I had originally proposed a vision in 5 stages for the development of Bitcoin:

  1. Mining stage
  2. Trading stage
  3. End-user stage
  4. Merchant stage
  5. Enterprise stage

Back in 2011, I wrote that mining was taken care of. Since that time, Bitcoin has witnessed an explosion of hashing power through the development of ASICs, that is, hardware dedicated to the sole purpose of mining Bitcoins. Mining has definitively emerged as an extremely specialized niche.

Bitcoin is now halfway through its trading stage. Two years ago, MtGox was so dominant that it was the closest thing to a single point of failure for Bitcoin. In the meantime, many other exchanges have emerged: Bitstamp, Kraken, BTCChina … I suspect that MtGox now holds no more than 20% of the exchange market share. Are we done with exchanges yet? Not yet: Bitcoins remain convoluted to acquire - I will get back to this point.

Fading interest, a fading danger but still the main one

Price volatility, malevolent uses and adverse regulations are usually cited as dangers faced by the emerging currency. I think those threats have become non-issues for Bitcoin. Indeed, the very same criticisms can be made about most currencies and commodities anyway, and Bitcoin is now beyond the point where a roadblock could wipe out the initiative.

No, the one major risk for Bitcoin remains a fading of interest from the community. High-tech is a fast-paced environment and few technologies survive a decade. However, considering the steady growth of Bitcoin in public awareness, I am inclined to think that this risk, the one true danger for Bitcoin, is itself fading away.

Bitcoin, a poster child for antifragility

Over the last two years, Antifragile by Nassim Nicholas Taleb is the most remarkable book I have had the chance to read. In particular, I realized that antifragility is probably one of the greatest and most misunderstood qualities of Bitcoin. Bitcoin might seem complex, but it’s nothing but a protocol sitting on top of a shared ledger. Thanks to Bitcoin’s present reach, the ledger itself - technically the blockchain - is probably the dataset that benefits from the greatest number of backups worldwide. That part is safe, arguably orders of magnitude safer than the ledger of any bank.

What about the protocol then? Well, the protocol can fail, like any piece of software. It certainly did in the past, and most likely, it will fail again in the future. Let’s push the case further and imagine that, instead of a simple glitch, someone manages to crack the protocol tomorrow. What would happen? Well, as this is exactly what happened to Namecoin not too long ago, it’s not too hard to make a good guess. First, a corrupted blockchain would spread, wreaking havoc in the Bitcoin ecosystem. Exchange rates would drop by 90% overnight, and then exchanges would simply stop operating. Meanwhile, within hours of the emergence of the problem, community developers, possibly members of the Bitcoin Foundation, would start working on a fix.

Depending on the nature of the weakness found in Bitcoin, fixing the problem would take from a few hours to a few weeks. Considering the number of people involved, I fail to see why it would take much more than that. Indeed, Bitcoin is complex, but in the end, it’s not that complex, especially compared to other popular open source projects such as Linux, Firefox or OpenOffice.

In the case of Namecoin, the terminal protocol bug was resolved in about 24 hours - and that’s Namecoin, an alt-coin with about 0.1% of Bitcoin’s community traction.

Then, once a solution is found, a new blockchain would be restarted from one of the many non-corrupted copies of the old blockchain still available. Depending on the depth of the problem, multiple and incompatible solutions might be proposed more or less at the same time by distinct developers. The market might even support a few competing solutions for a while, but then a “winner takes all” effect would quickly push into oblivion all solutions but the leading one. Within a few months (maybe less), exchange rates would have returned to their previous levels.

It’s Bitcoin as a ledger that is truly antifragile. The other part, Bitcoin as a protocol, is fragile, and it is likely to be modified dozens of times over the next decade, each new version annihilating the previous one if the community consents to it.

If a massive protocol breach were to happen, many companies in the Bitcoin ecosystem could go bust overnight: some exchanges might accumulate instant but terminal losses, a revised protocol could make former hardware designs incompatible, etc. The Bitcoin ledger itself is the only entity within the ecosystem to be antifragile, simply because many developers are personally vested in its preservation.

Moreover, shocks benefit Bitcoin:

  • Blockchain spam forced the community to make the protocol more resilient,
  • Major thefts, and the rise and fall of Silk Road, helped Bitcoin make the headlines,
  • The Cyprus crisis undermined trust in the euro a bit, again in favor of Bitcoin,
  • Etc.

The next country printing its money into oblivion, the next bank failing with or without a bail-out, the next country failing to honor its debts … any of those events will further boost Bitcoin: not because Bitcoin will have succeeded at doing or preventing anything, but merely because Bitcoin will have remained unaffected.

In a way, betting on Bitcoin is betting on a degree of economic chaos for the years to come. A world of perfectly stable economies offering frictionless currencies does not need Bitcoin.

When an Unstoppable Force meets an Immovable Object

While many trading options have emerged for Bitcoin, exchanging national currencies for Bitcoin remains a convoluted exercise; and I suspect that it will remain non-trivial for a while, possibly a long while.

Indeed, pretty much everything in the banking system has been built around the notion of reversible transactions: the money in the bank is yours, but only from a legal viewpoint. If a court decides that one of the transactions that originally funded your account was not legitimate, then the transaction can be reversed, and the money can change owner based on third-party interventions. With Bitcoin, ownership is a matter of knowledge. If you know the private key of a Bitcoin address, and if nobody else knows it, then you are the true owner of whatever Bitcoins this address has accumulated. It’s a very physical process, deeply uncaring of any legal considerations: no court order can recover a transaction made toward an address whose keys have been lost.

This aspect explains why it remains almost impossible to use a credit card to buy Bitcoins, and why considerable delays tend to be introduced by all parties even when wire transfers are involved. Exchanging cash for Bitcoins feels like a more natural option, though. A Bitcoin-to-cash ATM is already available in Vancouver. However, I suspect that ATM owners are heading for friction. For any ATM model that takes off, bad guys will start buying ATMs for the sole purpose of reverse-engineering them, producing ad hoc counterfeit money designed to fool this specific type of ATM. Indeed, bad guys don’t need to produce quasi-perfect counterfeit bank notes, merely counterfeit notes good enough to fool this one machine - a much easier task.

Again, with regular ATMs, this is a non-issue. If someone manages to stuff an ATM with counterfeit money, the bank will simply cancel the corresponding transaction later on when the misdeed is uncovered. The bank has full control over its ledger.

A store of value

When I discovered Bitcoin, I was inclined to think it would succeed because it made worldwide payments frictionless. Well, that’s certainly still part of the picture, but the more I observe the community, the more I believe it’s a positive but relatively marginal driving force.

Few people would dispute that the growth of Bitcoin has been essentially driven by speculative investments. Then, according to Bitcoin community wisdom, many would also argue that the ecosystem will gradually transition from pure speculation to more mundane uses, hence justifying the high anticipated conversion rates. However,

  • what if speculation stayed the dominant force, never to be replaced by any other?
  • what if Bitcoin did not need any alternative force to maintain its value?

Indeed, a shared yet incorruptible ledger may offer fantastic intrinsic value on its own, as it gives people the possibility to store value without trusting any designated third party - trusting instead the community as a whole.

Gold arguably offers the same benefit, but in practice, gold is an impractical medium for making payments; as a result, any gold transaction starts by converting the gold back to a local currency.

Then, why should trusting a designated third party be a problem, one might ask? Well, most currencies are simply not managed in the interest of the currency holders. China, Brazil, Russia and Argentina probably come top of the list here because of their respective sizes, but they are far from the worst offenders. Even dollars, euros and yen are hardly managed in the best interest of currency holders.

Here, Bitcoin benefits from an ancient social pattern known as Gresham's law. According to Wikipedia:

Gresham's law is an economic principle that states: "When a government overvalues one type of money and undervalues another, the undervalued money will leave the country or disappear from circulation into hoards, while the overvalued money will flood into circulation." It is commonly stated as: "Bad money drives out good".

This law has been quoted many times about Bitcoin, but its consequences are usually misunderstood. Many detractors argue: People are just hoarding Bitcoins instead of spending them, which will be the downfall of Bitcoin. This observation is partial, and I believe the conclusion is incorrect too. Note that Bitcoin can still fail, but not because of this (see above).

A more accurate observation would be: Many, if not most, are hoarding Bitcoins until they have an actual need to spend them. Meanwhile, those people just keep spending whatever non-Bitcoin currency they have. This behavior exactly fits Gresham’s law, but what does it imply for Bitcoin?

First, merchants should not expect too many people rushing to spend their Bitcoins. Most people will keep spending their non-Bitcoin currency as long as they can. However, as accepting Bitcoins is an inexpensive option, there are few downsides to accepting them - especially if Bitcoins are immediately converted into the local currency. Second, the more people keep their coins, the more the exchange rate will rise, through simple market mechanics; thus actually preserving the value-storage property of Bitcoin.

At this point, detractors would argue that if there are few exchanges through Bitcoin, and if it’s only about hoarding something with no real value, how could this something be worth anything? This brings me back to the ledger (i.e. the blockchain). The one distinctive innovation brought by Satoshi Nakamoto was to make the world realize that a fully decentralized and yet incorruptible ledger was possible. The Bitcoin ledger is unique, and it is what gives Bitcoin its value.

What people really own when owning Bitcoins is a quantified amount of favors that can be called back from any member of the community, as long as community interest has not faded; and that can be a valuable privilege - hence no further benefits are needed to justify the value.

Alt-coins will drive the evolution of Bitcoin

As an asset, what is the value of the Bitcoin protocol? Well, zero. Anybody can fork the source code; almost 2000 already have. Anybody can start an alt-coin variant; dozens already have. While Bitcoin itself can arguably be considered invaluable to mankind, the protocol has zero market value: nobody makes money by selling the protocol.

The market value is in the ledger, and only in the ledger; and this is why alt-coins are unlikely to gain any significant market value: they recycle the bulk of the Bitcoin protocol (the value-less part) while ditching the blockchain (the valuable part).

Namecoin is barely an alt-coin, because it addresses a very different problem; and it is precisely because it does not compete with Bitcoin that it managed to gain traction.

Nevertheless, alt-coins represent an incredible opportunity for Bitcoin. Through experiments with alternative approaches, alt-coins are producing the knowledge that will make Bitcoin more secure, more usable, leaner, etc. Alt-coins, by being fragile experiments, directly help Bitcoin become more antifragile.

For example, Zerocoin brings an unprecedented level of anonymity to transactions by introducing rocket-science zero-knowledge cryptography into the protocol. From the Bitcoin perspective, there is absolutely no need to rush to import Zerocoin into the protocol. After all, Bitcoin has been thriving without it so far. It’s much more reasonable to remain a passive observer for a (long) while, to let Zerocoin take all the bullets as bugs and flaws are uncovered, to let the Zerocoin community patiently address performance issues; and then, once Zerocoin has fully matured, to upgrade the Bitcoin protocol leveraging all this hard-won knowledge.

Thus, from a currency holder’s perspective, alt-coins are doomed with high probability, because they won’t be able to preserve any technological advantage over time, bringing the competition back to a competition between ledgers, where Bitcoin will only grow stronger.

Preserving Bitcoins

Since Bitcoin is about storing value, foolproof ways to secure Bitcoins are a critical ingredient. Two years ago, I already pointed out that this challenge was not specific to Bitcoin: it’s just incredibly convoluted to operate a computing environment that you can fully trust. Long story short: you need air gaps, but it’s harder than it looks.

Furthermore, the overall amount of trust that people should place in their computing devices - notebooks, phones, servers in the cloud - has rather gone downhill since the Snowden revelations. Thus, I am inclined to think that many successful ventures of the end-user stage will be Bitcoin appliances, that is, hardware devices designed for the sole purpose of dealing with Bitcoins. The Bitcoin Card and Trezor are both promising appliances, and I suspect there is room for a lot more contenders in this market.

Indeed, as more people invest in Bitcoins, it’s fairly reasonable to assume that most of them will be inclined to spend a bit more to secure their investment.

The widespread availability of Bitcoin appliances that have gained the trust of the community will be the sign that the end-user stage of Bitcoin is taken care of.

Annex: More technical considerations

Instant transactions are coming without much effort. It takes half a dozen blocks to gain absolute confidence in a Bitcoin transaction, which means about 1h of delay. Many people see this aspect as a design failure, one that prevents most live payment scenarios. However, if one is OK with relaxing the constraint from absolute confidence to quasi-absolute confidence, then instant transactions can be made very secure, arguably a lot more secure than credit card transactions (because of chargebacks). All it takes is an online service that aggressively spreads the transaction over the network while, at the same time, aggressively monitoring any double-spend attempt. Such a service does not exist yet, but it’s not the most pressing issue for Bitcoin either.
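The core of such a monitoring service can be sketched in a few lines. This is a hypothetical simplification, not a real Bitcoin library API: each broadcast transaction is modeled as an id plus the list of outpoints (previous txid, output index) it spends.

```python
class DoubleSpendMonitor:
    """Toy sketch: remember which outpoint was spent by which
    transaction, and flag any later transaction spending it again."""

    def __init__(self):
        self.spent = {}  # outpoint -> txid that first spent it

    def observe(self, txid, inputs):
        """Record a broadcast transaction; return the set of
        conflicting (double-spending) txids, empty if none."""
        conflicts = set()
        for outpoint in inputs:
            first = self.spent.setdefault(outpoint, txid)
            if first != txid:
                conflicts.add(first)  # same coin spent twice
        return conflicts
```

A real service would watch the peer-to-peer network and alert the merchant whenever `observe` returns a non-empty set during the confirmation window.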

Scalability is a very addressable concern. Scalability is frequently presented as a core design flaw: if Bitcoin starts gaining traction, it will fail because it won’t be scalable enough. (Disclaimer: argument from authority) My own experience teaching distributed computing and tackling Big Data projects for years is that scalability is never a terminal problem. Scalability problems are straightforward problems, merely needing patience and dedication to be solved. Furthermore, many developers just love tackling scalability challenges well beyond market needs. That part of Bitcoin is probably very safe.

Tuesday
Oct152013

Thinking Big Data for commerce

As of October 2013, Google Trends indicates that the buzz around Big Data is still growing. Based on my observations of many service companies (mostly retailers though), I believe that it's not all hype, and that Big Data is indeed going to deeply transform those businesses. However, I also believe that Big Data is wildly misunderstood by most, and that most Big Data vendors should not be trusted.

Search volume on the term "Big Data" as reported by Google Trends

A mechanization of the mind based on data

Big Data, like many technological viewpoints, is both quite new and very ancient. It's a movement deeply rooted in the perspective of the mechanization of the human mind, which started decades ago. The Big Data viewpoint states that data can be used as the source of knowledge to produce automated decisions.

This viewpoint is very different from Business Intelligence or Data Mining, because humans get kicked out of the loop altogether. Indeed, producing millions of numbers a day costs almost nothing; however, if you need people to read those numbers, the costs are fantastic. Big Data is mechanization: the numbers produced are decisions, and nobody is needed at the lowest level to get things done.

The archetype of the Big Data application is the spam filter. First, it's ambient software, taking important decisions all the time: deciding on my behalf which messages are not worth my time to read certainly is an important decision. Second, it has been built based on the analysis of lots of data, mostly by establishing databases of messages labeled as spam or not-spam. Finally, it requires almost zero contribution from its end-user to deliver its benefits.
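The pattern just described - a classifier bootstrapped from a database of labeled messages, then taking decisions on its own - can be sketched with a toy naive Bayes model. This is a minimal illustration of the principle, not the algorithm of any particular spam filter:

```python
from collections import Counter
import math

def train(messages):
    """messages: list of (text, label) pairs, label being 'spam' or 'ham'."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in messages:
        totals[label] += 1
        counts[label].update(text.lower().split())
    return counts, totals

def classify(text, counts, totals):
    """Pick the label maximizing log prior + log likelihood,
    with add-one smoothing over the shared vocabulary."""
    vocab = set(counts["spam"]) | set(counts["ham"])

    def score(label):
        s = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            s += math.log((counts[label][word] + 1) / denom)
        return s

    return max(("spam", "ham"), key=score)
```

Once trained, the filter takes every decision with zero end-user contribution, which is exactly the Big Data pattern at work here.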

Not every decision is eligible for Big Data

Decisions that are eligible for Big Data processing are everywhere:

  • Choosing the next most profitable prospect to send the paper catalog to.
  • Choosing the quantity of goods to replenish from the supplier.
  • Choosing the price of an item in a given store.

Yet, such decisions need two key ingredients:

  1. A large number of very similar decisions are taken all the time.
  2. Relevant data exist to bootstrap an automated decision process.

Without No. 1, it's not cost-efficient to even tackle the problem from a Big Data viewpoint; Business Intelligence is better suited. Without No. 2, it's only rule-based automation.

Choosing the problem first

The worst mistake that a company can make when starting a Big Data project consists of choosing the solution before choosing the problem. As of fall 2013, Hadoop is the worst offender. I am not saying that there is anything wrong with Hadoop per se; however, choosing Hadoop without even checking that it's a relevant solution is a costly mistake. My own experience with service companies (commerce, hospitality, healthcare ...) indicates that those companies hardly ever need any kind of distributed framework.

The one thing you need to start a Big Data project is a business problem that would be highly profitable to solve:

  • Suffering from too high or too low stock levels.
  • Not offering the right deal to the right person.
  • Not having the right price for the right location.

The ROI is driven by the manpower saved through automation, and by getting better decisions. Better decisions can be achieved because the machine can be made smarter than a human who has less than 5s of brain-time per individual decision. Indeed, in service companies, productivity targets imply that decisions have to be made fast.

Your average supermarket typically carries about 20,000 references, that is, over 330 hours of work if employees spend 1 minute per reordered quantity. In practice, when reorders are made manually, employees can only afford a few seconds per reference.
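The back-of-envelope arithmetic above is worth making explicit:

```python
references = 20_000       # distinct references in a typical supermarket
minutes_per_decision = 1  # one minute of attention per reorder decision

hours_per_cycle = references * minutes_per_decision / 60
# roughly 333 hours of work for a single pass over the whole store

seconds_available = 5     # the brain-time productivity targets actually allow
hours_realistic = references * seconds_available / 3600
# under 28 hours - hence only a few seconds per reference in practice
```

The gap between the two figures is the manpower that automation recovers, before even counting the gain from better decisions.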

Employees are not going to like Big Data

Big Data is a mechanization process, and it's usually easy to spot the real thing just by looking at the internal changes caused by the project. Big Data is not BI (Business Intelligence), where employees/managers are offered a new gizmo that changes nothing about the status quo. The first effect of a Big Data project is typically to reduce the number of employees (sometimes massively) required to process a certain type of decision.

Don't expect teams soon to be replaced by machines to be overjoyed by such a perspective. At best, upper management can expect passive resistance from their organization. Across dozens of companies, I don't think I have observed, among service companies, any Big Data project succeeding without direct involvement from the CEO herself. That's unfortunately the steepest cost of Big Data.

The roots of Big Data

The concept of Big Data only emerged in mainstream media in 2012, but its roots are much older. Big Data results from 3 vectors of innovation:

  • Better computing infrastructures to move data around, to process data, to store data. The latest instance of this trend is cloud computing.
  • Algorithms and statistics. Over the last 15 years, the domain of statistical learning has exploded (driverless cars are a testament to that).
  • Enterprise digitalization. Most (large) companies have completed their digitalization process, where each key business operation has its digital counterpart record.

Big Data is, first of all, the result of mixing those 3 ingredients.

Big Data = Big Budget?

Enterprise vendors are now chanting their new motto, big data = big budget, and you can't afford not to take the Big Data train, right? Hadoop, SAP Hana, Oracle Exalytics, to name a few, are all going to cost you a small fortune when looking at the TCO (total cost of ownership).

For example, when I ask my clients how much it costs to store 24TB of data on disk, most of the answers come in above 10,000€ per month. Well, OVH (a hosting company) offers 24TB servers from 110€ / month. Granted, this is not highly redundant storage, but at this price, you can afford a few spares.

Then, the situation looks even more absurd when one realizes that storing 1 year of receipts of Walmart - the largest retailer world-wide - can fit on a USB key. Unless you want to process images or videos, very few datasets cannot be made to fit on a USB key.

Here, there is a media-induced problem: there are about 50 companies world-wide that have web-scale computing requirements, such as Google, Microsoft, Facebook, Apple and Amazon, to name a few. Most Big Data frameworks originate from those companies: Hadoop comes from Yahoo, Cassandra comes from Facebook, Storm now comes from Twitter, etc. However, all those companies have in common that they process roughly 1000x more data than the largest retailers.

The primary cost of a Big Data project is the focus that top management needs to invest. Indeed, focus comes with a strong opportunity cost: while the CEO is busy thinking about how to transform her company with Big Data, fewer decisions can be made on other pressing matters.

Iterations and productivity

A big data solution does not survive its first contact with data.

Big Data is a very iterative process. First, the qualification of the data (see below) is an extremely iterative process. Second, tuning the logic to obtain acceptable results (i.e. the quality of the decisions) is also iterative. Third, tuning the performance of the system is, again, iterative. Expecting success at the first try is heading for failure, unless you tackle a commoditized problem, like spam filtering, with a commoditized solution, like Akismet.

Since many iterations are unavoidable, the cost of a Big Data project is strongly impacted by the productivity of the people executing the project. In my experience teaching distributed computing at the Computer Science Department of the Ecole normale supérieure, if the data can be kept on a regular 1000€ workstation, the productivity of any developer is tenfold higher compared to a situation where the data has to be distributed - no matter how many frameworks and toolkits are thrown at the problem.

That's why it's critical to keep things lean and simple as long as possible. Doing so gives your company an edge against all the competitors who will tar pit themselves with Big Stuff.

"Premature optimization is the root of all evil." - Donald Knuth, 1974

Qualifying the data is hard

The most widely underestimated challenge in Big Data is the qualification of the data. Enterprise data has never been created with the intent of feeding some statistical decision-making process. Data exists only as the by-product of the software operating the company. Thus, enterprise data is (nearly) always full of subtle artifacts that need to be carefully addressed or mitigated.

The primary purpose of a point of sale is to let people pay, NOT to produce historical sales records. If a barcode has become unreadable, many cashiers might just scan twice another item that happens to have the same price as the non-scannable one. Such a practice is horrifying from a data analysis perspective, but considering the primary goal (i.e. letting people pay), it's not unreasonable business-wise.
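A hypothetical qualification check for the artifact just described: flag receipt lines that repeat the previous line exactly, since a double scan of a same-priced item is one plausible signature of an unreadable barcode. The field layout is illustrative, and genuine multi-buys will trigger false positives - which is precisely why data qualification is iterative:

```python
def flag_double_scans(receipt):
    """receipt: list of (sku, unit_price) lines in scan order.
    Returns the indices of lines identical to the previous line,
    i.e. candidate barcode-substitution artifacts to review."""
    return [i for i in range(1, len(receipt))
            if receipt[i] == receipt[i - 1]]
```

Running such checks is how non-IT teams discover, hands-on, what the data actually records.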

One of the most frequent antipatterns observed in Big Data projects is the lack of involvement of the non-IT teams in all the technicalities involved. This is a critical mistake. Non-IT teams need to tackle data problems hands-on, because solutions only come from a deep understanding of the business.

Conclusion

Big Data is too important to be dismissed as an IT problem; it's first and foremost a business modernization challenge.