Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Entries in Lokad (18)

Tuesday, Mar 08, 2016

Cloud-first programming languages

The art of crafting programming languages is probably one of the most mature fields of software, and yet it’s surprising to realize how much potential there is in rethinking programming from a cloud-first [0] perspective. At my company Lokad, we ended up writing our own programming language – a narrow domain-specific language geared toward commerce analytics – and we keep stumbling on elements that would have been hard to achieve from a more traditional perspective.

Our language – Envision – lives within the walled garden of its parent company: Lokad provides the tools to author the code as well as the platform to execute the scripts. While this approach has limitations of its own, it offers some rather unique upsides as well.

1. Automated language upgrade

Designing a programming language is like any other design challenge: even the most brilliant designer makes mistakes. Then, assuming that the language gains some traction, a myriad of programs get written leveraging what has now become an unintended feature. At this point, rolling back any bad design decision takes a monumental effort, because every single piece of code ever written needs to be upgraded separately. All major programming languages (C++, JavaScript, Python, C#) are struggling with this problem. Overall, change is very slow, measured in decades [1].

However, if the parent company happens to be in control of all the code in existence, then it becomes possible to refactor automatically, through static code analysis, all code ever written, and through refactoring to undo the original design mistake. This does not mean that making mistakes becomes cheap but only that it becomes possible to fix those mistakes within days [2], while regular programming languages mostly have to carry on forever with their past mistakes.

From a cloud-first perspective, it’s OK to take some degree of risk with language features as long as the features being introduced are simple enough to be refactored away later on. The language evolution speed-up is massive.
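As a rough illustration of what such an automated upgrade could look like, here is a minimal Python sketch inspired by the `==` change described in footnote [2]. It is not Lokad's actual tooling: the replacement operator `~~` and the function names are hypothetical, and the sketch rewrites every `==` found outside string literals, whereas a real upgrade would rely on the compiler's type information to touch only string comparisons.

```python
import re

# Hypothetical name for the new case-insensitive operator (an assumption,
# not real Envision syntax).
CASE_INSENSITIVE_EQ = "~~"

# Group 1: a double-quoted string literal. Group 2: a bare "==".
_TOKEN = re.compile(r'("(?:[^"\\]|\\.)*")|(==)')


def upgrade_script(source: str) -> str:
    """Rewrite every '==' outside string literals to the new operator."""

    def swap(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(1)  # leave string literals untouched
        return CASE_INSENSITIVE_EQ

    return _TOKEN.sub(swap, source)


def upgrade_all(scripts: dict[str, str]) -> dict[str, str]:
    """Apply the upgrade to every hosted script in one pass."""
    return {name: upgrade_script(text) for name, text in scripts.items()}
```

With this sketch, `upgrade_script('keep = name == "a == b"')` leaves the string literal alone and only rewrites the operator. Once every hosted script has been migrated this way, the old `==` symbol is free to take on its new meaning.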

2. Identifying and fixing programming antipatterns

Programming languages are for humans and humans make mistakes. Some mistakes can be identified automatically through static code analysis, and many more can be identified through dynamic code analysis. Within its walled garden, the company has direct access not only to all the source code, but to all past executions and all the input data as well. In this context, it becomes considerably easier to identify programming antipatterns.

Once an antipattern is identified, it becomes possible to selectively warn impacted programmers with a high degree of accuracy. However, it also becomes possible to think of the deep-fix: the programming alternative that should resolve the antipattern.

For example, at Lokad, we realized a few months ago that lines of code dealing with minimal ordering quantities were frequently buggy. The deep fix was to get rid of this logic entirely through a dedicated numerical solver. The challenge was not so much implementing the solver – although it happened to be a non-trivial algorithm – but realizing that such a solver was needed in the first place.

3. Out-of-band calculations

As soon as your logic needs to process a lot of data, computation delays creep in. Calculation delays are typically not an issue in production: results should be served fast, but refreshing the results [3] can typically take minutes without any impact. As long as nobody is waiting for the newer results, latency matters little.

However, there is one point of time when calculation latency is critical: design time, when the programmer is slowly iterating over hundreds of versions of the same code to incrementally craft the intended calculation. At design time, calculation delays are a real hindrance. Data scientists know the pattern too well: add 2 lines to your code, execute, and go grab a coffee while the calculation completes.

But what if the platform was compiling and running your code in the background? What if the platform was even planning things ahead of you, and pre-computing many elements before you actually need them? It turns out that if the language has been designed upfront with this sort of perspective, it’s very feasible; not all the time, just frequently enough. With Envision, we are already doing this, and it’s not even that hard [4].

A careful cloud-first design of the programming language can be used to increase the amount of calculations that can be performed out-of-band. Those calculations could be performed on local machines, but in practice, relying on a cloud makes everything easier.
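The flat-file pre-parsing mentioned in footnote [4] can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lokad implements it: the `PreparsingStore` class and its methods are hypothetical names, and a thread pool stands in for whatever background workers a real platform would use.

```python
import csv
import io
from concurrent.futures import Future, ThreadPoolExecutor


class PreparsingStore:
    """Starts parsing a CSV as soon as it is uploaded, before anyone asks."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._parsed: dict[str, Future] = {}

    def upload(self, name: str, content: str) -> None:
        # Kick off parsing out-of-band; the caller returns immediately.
        self._parsed[name] = self._pool.submit(self._parse, content)

    @staticmethod
    def _parse(content: str) -> list[dict[str, str]]:
        # Parse the whole file into a list of row dictionaries.
        return list(csv.DictReader(io.StringIO(content)))

    def rows(self, name: str) -> list[dict[str, str]]:
        # Blocks only if the background parse has not finished yet.
        return self._parsed[name].result()
```

By the time a script asks for `rows("sales.csv")`, the parsing has usually completed already, so the programmer never sees that delay.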

4. Data-rich environment

From a classic programming perspective, the programming language – or the framework – is supposed to be decoupled from data. Indeed, why would anyone ship a compiler with datasets in the first place? Except for edge cases, e.g. Unicode ranges or timezones, it’s not clear that it would even make sense to bundle any data with the programming language or the development environment.

Yet, from a cloud-first perspective, it does make sense. For example, in Envision, we provide native access to currency rates, both present and historical. Then, even within the narrow focus of Lokad, there are many more potentially worthy additions: national tax rates, ZIP code geolocation, manufacturer identification through UPC... Other fields would probably have their own domain-specific datasets ranging from the properties of chemical compounds to trademark registrations.

Embedding terabytes of external data along with the programming environment is a non-issue from a cloud-first perspective; and it offers the possibility to make vast datasets readily available with zero hassle for the programmer.
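To make the idea of built-in data more concrete, here is a small Python sketch of a historical currency lookup as a platform could expose it. Everything in it is an assumption for illustration: the function name `eur_to_usd` is hypothetical, and the rates in the table are placeholder values, not real market data.

```python
from bisect import bisect_right
from datetime import date

# Placeholder rates, NOT real market data: (day, USD per 1 EUR), sorted by day.
_EUR_USD = [
    (date(2016, 1, 1), 1.00),
    (date(2016, 2, 1), 1.25),
    (date(2016, 3, 1), 1.50),
]


def eur_to_usd(amount: float, on: date) -> float:
    """Convert EUR to USD using the latest rate at or before 'on'."""
    days = [day for day, _ in _EUR_USD]
    i = bisect_right(days, on)  # index of the first rate strictly after 'on'
    if i == 0:
        raise ValueError(f"no rate available on or before {on}")
    return amount * _EUR_USD[i - 1][1]
```

The point is that the programmer only writes the call; sourcing, updating and storing the rate history is the platform's job.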

In conclusion, the transition toward a cloud-first programming language represents an evolution similar to the one that happens when transitioning from desktop software to SaaS. From afar, both options look similar, but the closer you get, the more differences you notice.

[0] I am not entirely satisfied with this terminology; it could have been LaaS for “Language as a Service”, or maybe IDEE for “Integrated Development and Execution Environment”.

[1] The upgrade from Python 2 to Python 3 will have cost this community roughly a decade. Improving the way null values are handled in C# is also a process that will most likely span over a decade; the end-game being to make those null values unnecessary in C#.

[2] In the initial version of Envision, we decided that the operator == when applied to strings would perform a case-insensitive equality test. In hindsight, this was a plain bad idea. The operator == should perform a case-sensitive equality test. Recently, we rolled out a major upgrade where all Envision scripts got upgraded toward the new case-insensitive operators, effectively freeing the operator == for the revised intended semantics.

[3] Most people would favor a spam filter introducing 10 seconds of processing delay per message if the filtering accuracy is at 99.99% versus a spam filter needing 0.1 seconds but offering only 99% accuracy. Similarly, when Lokad computes demand forecasts to optimize containers shipped from China to the USA, speeding up the calculation by a few minutes is irrelevant compared to any extra forecasting accuracy to be gained through a better forecasting model.

[4] If somebody uploads a flat file – say a CSV file – to your data processing platform, what comes next? You can safely assume that loading and parsing the file will come next; and Lokad does just that. Envision has fancier tricks under the hood than flat file pre-parsing, but it's the same sort of idea.

Monday, Dec 15, 2014

A few lessons about pricing B2B apps

My own SaaS company has always been struggling with its own pricing. For a company now selling its own pricing optimization technology for commerce, this was a bit ironic. Well, pricing software is unfortunately very unlike pricing goods in a store, and the experience we acquired working with our retail clients improving their own prices provided little insight into the pricing of Lokad.

Since the creation of the company, Lokad has been offering metered pricing, charging according to the amount of forecasts consumed. However, in practice, over the last two years, we signed only a handful of contracts where the pay-as-you-go pricing was actually preserved. The usage consumption observed during the trial period was used as the starting point of the negotiation; and then the negotiation invariably converged toward a flat monthly fee.

Starting from today, we have extensively revised the pricing of Lokad toward a very simple list of packages only differentiated by the maximal size of the client companies.

For SaaS companies selling to businesses, the (almost) ubiquitous pricing pattern consists of charging per user; that's the approach of Salesforce, Google Apps, Office 365, Zoho and many more. However, sometimes, charging per user doesn't make sense, because the number of users can be made arbitrarily low, and does not reflect at all the usage of the service. All cloud computing platforms fall into this category.

Metered pricing only works with Über-geek clients

The cloud computing example is misleading because it gives the false impression that metered pricing is just fine. Metered pricing works for cloud computing platforms because their clients are very technical and can digest pricing logics 100x more complex than logics acceptable by "non-tech" businesses.

At Lokad, we have observed many times that the fear of making a mistake and increasing the invoice tenfold was generally considered a deal-breaker. Most companies don't trust their employees nearly as much as software companies trust their software developers. Metered pricing implicitly puts a high level of trust in the employees operating the metered service.

Flat monthly / quarterly / yearly fees are the way to go

Through dozens of negotiations with clients, some large, some small, and across many countries, we have always converged toward periodic fees to be paid every month, quarter or year. Sometimes, we did add a setup fee to reflect some extra effort to be delivered by Lokad to set up the solution, but in 7 years of business, we had only a handful of contracts more complex than a flat setup fee followed by a flat periodic fee.

The lesson here is that anything more complex than setup fee + periodic fee is very prone to accidental complexity providing little or no business value for the software company and its client.

Don't cripple your software by restricting access to features

The "freemium" vision consists of offering a free version with limited features, and restricting the access to the more advanced features to paying clients. Again, if you consider software where it's natural to charge per user, this approach might work; however, when the software is not user-driven, not granting access to all features just drags down your small clients - who have mostly the same needs as your bigger clients.

We learned that crippling our own apps was just bad. At the end of most negotiations with clients, we were nearly always ending up granting access to all features - like the highest paying plan - for most companies. Naturally, the price point was adjusted accordingly, but nevertheless, we observed many times that crippling the software was just a lose-lose approach.

It's fine to trust your clients by default

For years, at Lokad, we had relied on the implicit assumption that whatever metric was going to be used to define the boundaries between the subscription plans, this metric had to be tracked by the software itself. However, by narrowing our vision to the sole metrics that our software could track, we had eliminated the one metric which truly made sense: charging according to the company turnover.

Our new plans are differentiated based on turnover, and yet, we have no automated way to measure the turnover. However, is it really a problem? I don't think so. Over the years, we have seen very (very) few companies trying to game our terms. Moreover, my observations indicate that the larger the company, the less likely they are to even consider the possibility of cheating.

The logical conclusion is then to grant access to everything by default, and then to gently remind companies of your pricing terms when the opportunity arises. B2B isn't B2C: for the vast majority of B2B software, even if you don't put any protection in place, the service isn't going to be swarmed by corporate freeloaders.

If it does, well that's a rich man's problem.

Tuesday, Oct 21, 2014

How we ended up writing our own programming language

About one year ago, my company had the opportunity to expand into an area which was very new for us at the time: pricing optimization for commerce. Pricing optimization is quite different from demand forecasting, the latter being the original focus of Lokad at the beginning of the company’s existence. While demand forecasting fits rather nicely into quantitative frameworks that allow you to decide which forecasting methods are the most suitable for any given task, pricing is a much more elusive problem. The idea that profits can be maximized by carrying out a simple analysis of demand elasticity is deceptive. Indeed, pricing is a signal sent to the market; and like with any marketing ingredient, there is not one valid answer to the problem.

One year ago, most of the companies that helped merchants manage their pricing were consulting companies, but I wanted to build a pricing webapp that would go beyond the classic consulting services on pricing. I quickly ruled out the idea of offering a template list of “pricing recipes”. Some competitors were already offering such “pricing recipe” services, and they were desperately inflexible. Merchants needed to be able to “tweak” their pricing logic in many ways. Thus, I started to consider more elaborate user interfaces that would allow merchants to compose their own pricing strategies. After putting some effort into mockups, I ended up with something oddly similar to the Microsoft Access “visual” query designer.

This was not a good thing. My limited interactions with this query designer, a decade prior, had left me with the lasting impression of this being just about the worst user experience I have ever had with the “normal” behavior of a product released by Microsoft. It was supposedly a visual query editor with plenty of very visual buttons, but unless you had some knowledge of SQL or experience in programming, you weren’t going very far with this tool. In the end, anyone using Access was falling back on the non-visual query editor, which, quite unfortunately, was a second-class citizen.

Gradually, I came to consider the possibility of going for a programming language instead. With a programming language, we could provide the desirable expressiveness, but also a powerful data environment. Lokad would be hosting the data along with offering a cloud-based execution environment for the pricing logic. This environment would not be constrained by the client’s desktop setup which can be too old, too weak, too messy or downright corrupted.

At first, I considered reusing an existing programming language such as JavaScript or Python. However, this presented two very specific challenges. The first challenge was security. Running client code server-side seemed like a giant vector for entire classes of injection attacks. In theory, it should be possible to sandbox the execution of any program, but my instincts were telling me that the attack surface was so great we would never be confident enough about not having leaks in our sandbox. So, we would have to leverage disposable VMs for every execution, and it seemed that an endless stream of technical problems was heading our way if we were to implement this.

The second, and in fact bigger, problem is that JavaScript and Python, being full-fledged programming languages, are also complex languages, which include truckloads of features downright irrelevant for the pricing use cases that I was considering: loops, objects, exceptions, nil references. No matter how much we would try to steer the usage of our future product away from these elements, I felt that they would invariably resurface again and again because some of our future users would be familiar with just these languages, and as a result, they would do things the way they were used to doing them before. It is tough to debug generic source code, so the tooling would necessarily end up being complex as well.

This left me with the prospect of inventing a new programming language, and yet this idea was accompanied by all the red flags possible in my mind. Usually, for a software company, inventing its own programming language is a terrible idea. Actually, I had witnessed quite closely three companies who had rolled out their own respective programming languages, and for each one of these companies, the experience was very poor to say the least. Two of them managed to achieve a certain level of success nonetheless, but the ad hoc nature of the language had been a huge hindrance to the process. Moreover, just about every single experience I ever had with niche programming languages (hello X++) confirmed that an ad hoc language was a terrible idea.

However, as far as pricing and commerce were concerned, a generic programming language was not required, and we were hopeful that through extreme specialization, we could produce a highly specialized language that would, all being well, compare favorably with the mainstream languages, at least for the narrow scope of commerce.

And thus, the Envision programming language was born at Lokad.

Unlike a generic programming language, Envision is designed around a relatively rigid and relatively simple data model that we knew to work well for commerce. My intent was to be able to reproduce all the domain-specific calculations that nearly all merchants were doing in Microsoft Excel, while putting some distance between the logic and the data - though not too much distance either. Indeed, Envision, as the name suggests, is also heavily geared toward data visualization; again, not generic data visualization, but things that matter for commerce.

Envision has no loops, no branches, no nulls, no exceptions, no objects … and it does just fine without them. It is not Turing-complete either, so we do not end up with indefinite execution delays.
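To give a feel for why a loop-free script always terminates, here is a toy Python sketch. It is neither Envision syntax nor its implementation: the `Step` format and the `run` function are hypothetical, and the Python callables merely stand in for the fixed set of built-in operators such a language would provide. A script is a flat list of column assignments, each computed row by row from columns that already exist, so the work is bounded by the number of steps times the number of rows.

```python
from typing import Callable

Column = list[float]
# A step is (new column name, row-wise operator, names of input columns).
Step = tuple[str, Callable[..., float], tuple[str, ...]]


def run(table: dict[str, Column], steps: list[Step]) -> dict[str, Column]:
    """Evaluate each step once, in order; the script itself cannot loop."""
    out = dict(table)
    for name, op, inputs in steps:
        cols = [out[c] for c in inputs]  # KeyError if a column is undefined
        out[name] = [op(*row) for row in zip(*cols)]
    return out


# Usage: a unit margin computed over the whole table in one statement.
result = run(
    {"price": [10.0, 20.0], "cost": [6.0, 15.0]},
    [("margin", lambda p, c: p - c, ("price", "cost"))],
)
```

The interpreter does loop over rows, but the person writing the script never gets to write a loop or a branch, which is what keeps execution time predictable.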

Less than one year after starting to write the first line of code for the Envision compiler, we have now secured over a million Euros through multi-year contracts that are to be (almost) completely implemented in Envision. To be transparent, it is not the language that clients are buying, but the service to be built with it. Nevertheless, over the last couple of months, we have been able to deliver all kinds of quantitative optimizations with Envision - not just pricing actually – and within timeframes that we would never have achieved in the past.

There are two takeaways from this initiative. First, ultra-specialized languages are still a valid option for vertical niches. While it is a very rough estimate, I would say that with Envision, when dealing with a suitable challenge, we end up with about 50 times fewer lines of code than when the same logic is implemented in C#, the primary programming language used at Lokad. Yes, using a functional language like F# would already make the code more compact than C#, but it would still be far from being that compact. Also, with Envision, we get more concise code not because we leverage highly abstract operators, but merely because the language itself is geared towards the exact problem to be addressed.

Second, when introducing a programming language, it should not be half-baked. Not only does the language itself need the good ingredients known to computer science – a context-free grammar written in Backus-Naur form for example; but a good integrated programming environment with code auto-completion and meaningful error messages is also needed. The language is only the first ingredient; it is the tooling around the language that makes the difference in terms of productivity. From the very beginning, we invested in the environment as much as we invested in the programming language.

Also, having a webapp as the primary execution environment for your language opens up a lot of new possibilities. For example, unless you spend years polishing your syntax before releasing anything, you are bound to make design mistakes. We certainly did. Then, as all the Envision scripts were also hosted by Lokad, it was possible for us to rectify those mistakes, first by fixing the language, and second, by upgrading all the impacted scripts from the entire user base. Sure, it takes time, but it is better to spend a few days on this early on than to end up with a broken language forever.

I have not delved much into the details of the Envision language itself in this post, but in case you would be interested, I have just published a book about it. The preview gives you access to the entire book too.

PS: while I credit myself for initiating the Envision project at Lokad, it is actually a colleague of mine, Victor Nicollet, presently the CTO of Lokad, who came up with nearly all the good ideas for the design of this language and who carried out about 90% of the implementation effort.

Monday, Mar 19, 2012

Bizarre pricing, does it matter? (B2B)

My company has just released a quantile forecasts upgrade. It's no less than a small revolution for us; however, unless you've got some inventory to manage, it's probably not too relevant to your business.

Another salient aspect is our new pricing for quantiles (the old pricing for classic forecasts remains untouched). Lokad is selling a monthly subscription, and if $q_i$ represents one of the actual quantile values retrieved by the client during the month, then the monthly cost $C$ is given by:

$$C = \$0.15 \times \left(\sum_{i=0}^n q_i^{2/3} \right)^{2/3}$$

We hesitated to round 0.15 as $\frac{\pi}{2}$ because formulas look better with Greek letters. Obviously, it's not simple, and most people would go as far as saying it's downright obscure, but is it really a good pricing, or just plain insanity?
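For readers who want to try the formula, here is a minimal Python sketch. The function name `monthly_cost` is made up for this illustration, and the sketch assumes that the quantile values are non-negative, since fractional powers of negative numbers are not meaningful here.

```python
def monthly_cost(quantiles: list[float], rate: float = 0.15) -> float:
    """Monthly cost C = rate * (sum of q_i^(2/3))^(2/3), in dollars."""
    if any(q < 0 for q in quantiles):
        raise ValueError("quantile values are assumed to be non-negative")
    inner = sum(q ** (2 / 3) for q in quantiles)
    return rate * inner ** (2 / 3)
```

For instance, two quantile values of 8 each give an inner sum of 4 + 4 = 8, and 8 to the power 2/3 is 4, so the monthly cost is about $0.60. Because of the outer exponent, doubling the number of retrieved quantiles less than doubles the bill.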

To understand a bit where Lokad is coming from, let's start with the fact that we are a B2B software company. About 95% of competitors don't have any kind of public pricing: you can only ask for a quote, and then a talented sales guy will contact you to figure out your maximum budget, only to get back to you with a quote at 120% of the figure you gave him.

However, I strongly favor public pricing, not because it's more transparent, honest, fair, whatever, but because it's a massive time saver. At Lokad, we don't enter into time-consuming pricing negotiations except for the largest clients, where it does make sense to spend time negotiating.

The cardinal rule of software pricing is that it should capture the willingness to pay of the client, which, in B2B, is typically related to the economic gains generated by the usage of the product. In the case of demand forecasting, benefits can be accurately computed. However, turning this forecasting benefits formula into a pricing formula is insanely complex in the general case.

Hence, we decided to settle for heuristics that somehow mimic this theoretical willingness to pay, ran many simulations over our existing customer base, and finally figured out the formula. I do not claim that this pricing formula is optimal in any way: it is not. However, it does bring a very reasonable pricing for clients ranging from 1-man companies to companies with 100,000+ employees.

Pros:

  • (As far as we can judge) It's aligned with the value Lokad creates for clients.
  • It's still simple enough to be memorized in 20s.
  • It creates no incentive to game the pricing by excluding slow movers (i.e. products with low sales) from the forecasting process.
  • There is no threshold effect, where the pricing jumps to a much larger number just because the company has 1 more product than what the license would support.

Cons:

  • It certainly falls into the category of bizarre pricing.
  • The only way to know the real monthly cost for sure is to give it a try (1).
  • Some prospects try the pricing formula on their own, and get it wrong (2).

(1) This statement applies to most metered SaaS, even if the pricing is linear. For example, at Lokad we had very little clue about our exact bandwidth consumption until we migrated toward the cloud (with dedicated servers, bandwidth was part of the package).

(2) I believe this partly explains why 95% of our competitors don't put any public price on display. That, and the fact that very expensive pricing is likely to scare away prospects before there is a chance to corner them into the sales process.

I would be interested to see if other B2B niches have designed their own bizarre pricing formulas. Don't hesitate to submit them in comments.

Wednesday, Feb 22, 2012

Cloud questions from Syracuse University, NY

A few days ago, I received a couple of questions from a student of Syracuse University, NY who is writing a paper about cloud computing and virtualization. The questions are relatively broad, so I am taking the opportunity to post the answers here directly.

What was the actual technical and business impact of adopting cloud technology?

The technical impact was a complete rewrite of our codebase. It has been the largest upgrade ever undertaken by Lokad, and it spanned over 18 months, more or less mobilizing the entire dev workforce during the transition.

As far as business is concerned, it implied that most of the business of Lokad during 2010 (the peak of our cloud migration) was stalled for a year or so. For a young company, 1 year of delay is a very long time.

On the upside, before the migration to the cloud, Lokad was stuck with SMBs. Serving any mid-large retail network was beyond our technical reach. With the cloud, processing super-large retail networks became feasible.

What, if any, negative experience did Lokad encounter in the course of migrating to the cloud?

Back in 2009, when we started to ramp up our cloud migration efforts, the primary problem was that none of us at Lokad had any in-depth experience of what the cloud implies as far as software architecture is concerned. Cloud computing is not just any kind of distributed computing; it comes with a rather specific mindset.

Hence, the first obstacle was to figure out by ourselves patterns and practices for enterprise software on the cloud. It has been a tedious journey to end up with Lokad.CQRS, which is roughly the 2nd generation of native cloud apps. We rewrote everything for the cloud once, and then we did it again to get something simpler, leaner, more maintainable, etc.

Then, at the present time, most of our recurring cloud problems come from integrations with legacy pre-Web enterprise software. For example, operating through VPNs from the cloud tends to be a huge pain. In contrast, modern apps that offer REST APIs are a much more natural fit for cloud apps, but those are still rare in the enterprise.

From your current perspective, what, if anything, would you have done differently?

Tough question, especially for a data analytics company such as Lokad where it can take 1 year to figure out the 100 magic lines of code that will let you outperform the competition. Obviously, if we had to rewrite Lokad from scratch again, it would take us much less time. However, that would dismiss the fact that the bulk of the effort has been the R&D that made our forecasting technology cloud native.

The two technical aspects where I feel we have been hesitating for too long were SQL and SOAP.

  • It took us too long to decide to ditch SQL entirely in favor of some native cloud storage (basically the Blob Storage offered by Windows Azure).
  • SOAP was a somewhat similar case. It took us a long time to give up on SOAP in favor of REST.

In both cases, the problem was that we had (or maybe it was just me) not fully accepted the extent of the implications of a migration toward the cloud. We remained stuck for months with older paradigms that caused a lot of unneeded friction. Giving up on those from Day 1 would have saved a lot of effort.