I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.


Entries in spam (9)


Squarespace and blog spam filtering: epic fail

Yesterday, for the 10th time or so, I sent a ticket to the support of Squarespace (the company hosting this very blog) asking them to improve their abysmal, actually nonexistent, spam filter for blog comments. It is a rather frustrating experience to delete about 10 spam comments on a daily basis just because Squarespace can't manage to do things right in this area. Worse, people have been quitting Squarespace for years for this very reason, comment spam being the No. 1 reason quoted for the change.

The issue is even more infuriating when you consider that:

  • It is common knowledge that, when designing software for the web, you have to design for evil. Even if 99.9% of the worldwide population is perfectly harmless, the remaining 0.1% can be extremely painful, and serious measures should be taken in this area. Squarespace, despite all the good stuff they keep delivering (such as their dedicated iPad app), seems to be simply blind to this issue.
  • Squarespace raised $38.5M from Accel and Index Ventures. How is it possible that the VC company that also funded Facebook is not able to provide a hint of feedback to the management of Squarespace concerning a burning issue that is likely to endanger their own investment?

The feedback from the Squarespace support always has two properties:

  • Extremely fast: my tickets are addressed within minutes.
  • Extremely useless: canned answers constantly suggest trivial but vastly unsatisfying solutions.

In a way, this is not very different from the blog spam content I am trying to get rid of. Hence, I am wondering whether the support replies would actually be reported as spam by a decent spam filter; but I digress.

When it comes to customer support KPIs, speed of answer isn't everything. What really matters is to make sure that every problem gets addressed at multiple levels. Solving the immediate problem is only the tip of the iceberg; you have to go for the root cause. In the present case, suggesting to disable comments is not an acceptable solution.

Also, the support staff has been claiming for several years that Squarespace is investing a lot of effort in fixing the spam problem. The worst part is that it might actually be true.

Indeed, spam filtering is a machine learning problem. The fundamental issue with machine learning problems is that unless your company is 100% dedicated to the problem, it can't be solved. Period. (*)

As far as spam filtering goes, Akismet has been around for years. Last time I checked their technology, it was downright excellent; and their pricing is so aggressive that it's a non-issue (about $0.001 per comment for the enterprise package). Squarespace does not even have the excuse that no good dedicated tech is readily available.

At this point, the only reasonable explanation for this situation is either carelessness or ego, the latter being more likely. Since dealing with support is useless, let's see if I get some non-zombie feedback from Squarespace here.

(*) For large companies, very compartmentalized branches work too, a good example being the Kinect software by Microsoft.


Copied by the Chinese government

Apparently, my company website has been copied by an official branch of the Chinese government. Although imitation is said to be the sincerest form of flattery, I am not sure how I should handle such a blatant ripoff of Lokad's copyrights.

Key interesting facts:

  • Plenty of leftovers from the original website remain on the Chinese one.

  • Imaginative ways of recycling irrelevant illustrations.

  • The website belongs to an official Department of the Government of China.

The Business of Software folks already have quite a few ideas on the subject. I will probably ponder the case for a few days to decide what to do next.

Co-worker suggestion:
Rinat suggests that I contact them again to say that, since they appear to like our website that much, they might want to try our forecasting technology too.

A few screenshots in case the Chinese website gets updated: 1, 2 and 3.


Spam 2.0 or the spammers reloaded

Spammers are legion and, unfortunately, most recent systems are just very weak against adversarial behavior (see my previous discussion on the Google case).

In the last few months, I have noticed no fewer than 4 new kinds of spam.

Spam 2.0 released, buy now!
  • P2P spam targeting file-sharing applications such as eMule. The basic idea is the following: spread, through the P2P application, a virus that breaks into the P2P application itself. Once the P2P application is infected, all the incoming requests will return the virus wrapped under the name of the incoming query. For example, if the incoming request is "some illegal song", then the infected P2P application will claim to have the file "some-illegal-song.mp3.exe". Nasty but effective.

  • SMS spam with an incentive for the recipients to call a very expensive phone number. Indeed, sending SMS is not free (as far as I know); thus you need a strong incentive like "To the owner of 0123456789, you've won a Nintendo Wii, call 987654321 to claim your prize". Needless to say, 987654321 is anything but a toll-free number.

  • Instant Messaging spam targeting applications such as Skype. Actually, I suspect that some black hat guys managed to get through the "usual" white-listing systems, because I end up, once or twice a day, forcefully connected into huge conference calls (with roughly 200 people), the spam being sent through the conference channel.

  • Virtual Worlds spam targeting popular MMORPGs such as World of Warcraft. Basically, spammers just start flooding the main discussion channels with commercial links. So far, it has been mostly Warcraft-related (like buying Warcraft gold coins with US dollars), but I suspect that pretty soon, spammers will realize that they are able to sell fake drugs and fake watches on Warcraft too.
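
As a side note on the P2P case, the "some-illegal-song.mp3.exe" trick relies on a double extension, which is easy to check for. Below is a minimal sketch in Python, not a real antivirus: the function name looks_disguised and the two extension lists are assumptions made up for the illustration, and the lists are nowhere near exhaustive.

```python
from pathlib import PurePath

# Hypothetical, non-exhaustive extension lists, chosen for illustration only.
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".bat", ".com", ".pif"}
MEDIA_EXTENSIONS = {".mp3", ".avi", ".mpg", ".jpg", ".wav"}


def looks_disguised(filename: str) -> bool:
    """Return True if a media-looking name actually ends in an executable extension."""
    # PurePath.suffixes splits "song.mp3.exe" into [".mp3", ".exe"].
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    # Flag only the pattern "<media extension><executable extension>".
    return suffixes[-1] in EXECUTABLE_EXTENSIONS and suffixes[-2] in MEDIA_EXTENSIONS


if __name__ == "__main__":
    for name in ("some-illegal-song.mp3.exe", "some-illegal-song.mp3", "setup.exe"):
        print(name, looks_disguised(name))
```

Such a check only catches this one naming pattern; an infected client could just as well return an archive or a renamed file, so it is no substitute for proper filtering on the network side.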

Spam has already been upgraded to version 2.0, but I am still waiting for the delayed release of Cybercop 2.0.


Homework going freelance

I am a regular customer of freelance services. They are especially useful when you need to translate your website or when you need open source development (because confidentiality becomes irrelevant).

I usually browse the freelance websites on the buyer side; only recently did I give the provider side a try. Most freelance websites include tons of jobs like:

  • Simple sort in Java, NEED HELP


  • Solving a puzzle in C

  • ...

Those jobs are clearly student homework, and I have been stunned by the fact that they may represent one job out of two on most freelance websites (yet those jobs are tiny on average; thus the business impact is probably much smaller than 50%). Western IT companies might not yet exploit outsourcing to its full potential, but US IT students seem to be really good at it.


WetPaint is far too expensive, migrate or die

WetPaint is a hosted wiki solution. Although WetPaint still lacks "professional" wiki features like being able to insert custom HTML or scripts, it's a nice, simple wiki application with a great look and feel. I have tried WetPaint, and I even used it for a while; but looking back, it turned out to be a really BAD move. I have finally manually removed all the content from my WetPaint wiki (because there is no "Remove my wiki" feature available), and I have migrated all the content to a wiki powered by ScrewTurn.

It can be argued that moving to ScrewTurn may not be the optimal solution (there are tons of open source wiki software packages available out there, see the wiki matrix); but moving away from WetPaint is the only solution that can be considered if you fall into the WetPaint trap. Yes, to all the people using WetPaint, I strongly suggest that you reconsider and start thinking about moving your wiki somewhere else.

WetPaint owns your traffic

For those who think that WetPaint is a "free" wiki, it's not. It's neither free as in "libre" nor free as in "free beer". WetPaint inserts Google Ads in your wiki without providing a paid subscription plan to remove them. Let's look at the problem this way: if the traffic of your wiki stays low, then any $2/month hosting solution will be sufficient to host your website; but if your web traffic starts to go up, then WetPaint takes 100% of the advertising profits generated by your content.

No matter how you look at it, the content of your wiki is worth more than $10/month, the price of a hosted wiki. As Jakob Nielsen pointed out long before me, free solutions have steep costs. If you think that the wiki that you are starting is not even worth $10/month, then starting a wiki might not be a good idea in the first place.

You end up advertising for your competitors

The worst part of the WetPaint ads is actually the accuracy of the Google Ads system. Indeed, Google's algorithms are really good at optimizing the ads displayed on your wiki. But which ads are the most relevant to your wiki content? Well, simply the ads of your competitors. Choosing WetPaint means that you are granting your competitors the right to advertise on your website.

This is the WetPaint trap, and I have been foolish enough to fall for it for almost two weeks. The WetPaint people are smart: as long as they are able to track you (through cookies), WetPaint doesn't display any ads. The longer it takes you to realize the situation, the more painful the migration. Migrate now or you will soon experience the true cost of this free application.