Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades. (RSS - ATOM)

Meta
Tags
Friday
Oct062006

Typo hunting: get some weapons first!

It's just too hard to spell check your text by hand (or it requires an unreasonable amount of time). Before you actually get a spell checker, you might not even realize how many typos you produce. I have just downloaded today the FireFox RC1 that includes a web form spell checker. My first target for typo hunting was an intranet wiki I am working on. I feel that I won't ever go back to a browser that does not provide such a feature natively.

My second target for typo hunting was the source code of a Asp.Net application I am currently developing. Since Visual Studio 2005 does not provide a native spell checker (shame! such feature should have been implemented years ago), I have used the MailFrame CodeSpell product. CodeSpell has a really nice VS 2005 integration. On the minus side, the dictionary is a bit light at the present time (nchar or aspx not being part of the default dictionary for example) and I did experience of a few crashes while playing with the custom user dictionary.

A funny aspect of applying this source code spell-checker was to discover that 3rd party [*] XML documentations present in my .Net solution were no less typo-crippled than my own code. If automated spell-checking is not part of your development process, well, it should.

[*] Elmah and Telerik are both advised to get some spell-checking tools :-)

Tuesday
Oct032006

The unteachable parts of software engineering

Having the responsibility to handle the software engineering course at the ENS in spring 2007, I have started to think about the desirable qualities that make the difference between an average developer and a brilliant one. Indeed, I can't think of a better goal for this course but to actually try to develop such qualities.

What makes a "good" software developer?

It's an obvious fact that many qualities and skills are required to make a good developer. Smart and Get things done are often cited as the top criterions to decide whether a candidate should be recruited or not.

... a passionate curiosity for software related matter ...

I would complete those two criterions with a third one that I consider to be no less important: a passionate curiosity for software-related matters. Most well-known antipatterns such as programming by permutation, golden hammering, re-inventing the wheel, ... are caused by a lack of curiosity. There is far too much to know of the subject of software engineering to trust any particular school diploma (or certification) to be sufficient to produce even a "passable" developer.

Additionally, the software world is fast-paced. Hardware and software get obsolete alike. Development methods evolve. Better tools are released continuously. The sheer (intellectual) complexity of those evolutions is beyond what a single individual can possibly handle. Curiosity when leveraged through teamwork is a strong driver to actually maintain the development practices and tools as close as possible to a state-of-the-art level.

Finally, on the long run, I do not see how it is possible to stay motivated (and therefore productive) if there is no eager interest in mastering this constant flow of evolutions.

How to train such a developer?

I have listed smart, get things done and passionate curiosity as being the top qualities for a software engineer. But this list is more than troublesome for a teacher: it's not even clear whether any of those qualities can be actually taught

Concerning the first criterion, i.e. smart, I have simply surrendered all "academic" ambitions. The course schedule is far too short; beside I have the chance of having ENS students who are already very smart (though entrance exams have already filtered out students who were not so smart). Yet, the course can them the opportunity to apply their intelligence to a large variety of fuzzy problems that are inherent to software development.

The second criterion get things done is probably the most actionable element of the three. The French education system (highly selective and highly individualistic) usually produces people that are relatively weak against this criterion, the ENS students being no exception (quite the opposite in fact). I have planned to incorporate a student software project within the course mostly to push student develop a get things done mentality .

As for the first criterion, I haven't much ambition to actually transmit a passionate curiosity the students. In this respect, my first ambition will be to avoid the issue that usually plagues software engineering courses: an overwhelming boredom. I do not think it is really possible to create curiosity ex-nihilo if people have no interest beforehand. But, assuming that students are at least somehow interested (well, if not, it's going to be tough for the students and me alike), an "expand your horizons" strategy for the course might give them materials to apply and develop their curiosities.

There are two kinds of students: those who are too weak to be taught anything and those who are so strong that the teacher is totally unnecessary.

Bottom-line: out of three main qualities to make a good developer, the ambitions associated with the software engineering course I am thinking of are pretty weak. Well, considering the importance of the subject, I still believe it's a worthy attempt (stay tuned, more on later posts ... )

Sunday
Sep242006

Your e-mail is your username

The 30-years-old unix pattern for user management involves a UserName, Password pair associated to a unicity constraint on the usernames. On most web applications, this pattern has been enriched as a triplet UserName, Password, Email with, usually, a two-fold unicity constraint both on usernames and on emails. But is there any good reason to distinguish e-mail from username?

By looking at the major web players (Google, PayPal, Amazon ...), the answer is negative without any doubt. This pattern has already been widely adopted. Yet, it's still not stated a being an obvious web design best practice. For example, there is still no simple ways in ASP.Net 2.0 to enforce such a policy through the CreateUserWizard. Let's review the arguments in favor of this pattern.

  • UserName and Email are redundant because both are used as user identification keys. For the user, it means more information to provide at the registration step, and then more information to remember.

  • Choosing your UserName is often painful because most of the time your favorite username is already taken (unless you got the chance having a pretty unfrequent first name like joannes).

  • UserName is not idiot-proof because most non-web traditional services (your bank, your mobile operator) do not rely on your name to get you authenticated.

A mighty, but somehow recursive, argument is that since big players (Google, PayPal, Amazon) are doing that then doing anything else is only going to hurt a large number of users.

The only real drawback of this approach is the lack-of-privacy concern i.e. you may prefer your e-mail address to stay invisible to the other users of the website where you are registering. Yet IMO, the Display Name is a much more appealing and user-friendly concept compared to the username approach.

Ps: PeopleWords.com does not follow this pattern. I have no good excuse for this situation. I was just young and inexperienced at the time.

Saturday
Sep232006

Antipatterns of Software engineering courses

I am very honored to be in charge of the Sofware engineering and distributed applications course at the Ecole normale superieure (ENS). This will be my first official teaching assignment, and I will be affected to brilliant Licence 3 students (it's pretty tough to get through the entrance exam of the ENS).

Software engineering is a difficult topic to teach. I have been browsing the web just to have an outlook at what people are usually doing in a software engineering course, and I must confess that I haven't found anything very satisfying so far (although the MIT experience is definitively worth reading). In a nutshell, I would say that software engineering is the art of producing great software with limited resources. In respect to this definition, it seems that there are many ways to spend hours teaching things that totally miss the point. I have listed in this post what I think to be the 3 most common ways of disgusting students from all software engineering matters.

Coding antipattern

Here is the 300-pages API. This API will be the subject of your exam. Documents will not be authorized.

The second fastest way on earth to get your 30h of teaching ready is simply to teach some particular programming language and its associated technologies (think Apache/PHP/MySql, stay tuned the fastest way will be detailed next). Just take any reference documentation, and you get enough material to talk for 30h. But your course is both deadly boring and super-short-lived. Some people assume that their students have no prior knowledge about any language. I won't (it's stated in the course-prerequisite btw). My objective is not to teach the syntax of any programming language, because I believe students are smart enough to learn that by themselves. If not then, I would say that it was a bad move to take the software engineering course in the first place.

ISO-certified development antipattern

If you don't assess your customer requirements through a 17-phase process, then you're not ISO-171717.

Just by looking at Learning Tree - Software Engineering course (one of the top result of Google for "software engineering"), I think Wow, those people are attempting to kill their attendants with boredom. Look at the table of content: Life cycle phase contains (sic) 1) Understanding the problem 2) Developing the solution 3) Verifying the product 4) Maintaining the system. I can't remember of any software project that ever happened that way but who cares because the content is going to be nothing more than obvious statements anyway.

I call this kind of teaching ISO-certified development, the worst part being certainly the enumeration of software life-cycle models (which are, according to Learning Tree: Waterfall, V, Phased, Evolutionary and Spiral). This approach is basically the extreme opposite of the coding course. You're not teaching anything technical, instead you end-up with long, super-detailed, over-boring descriptions of business practices that do not even exists as such in the real world anyway.

Green-field projects antipattern

Our project was to develop a video game. Alice wrote the scenario, Bob took care of the rules and I did the graphics. The rest has been left unfinished.

The fastest way on earth to get your 30h software teaching course ready is the green-field project: just say Decide among yourselves what you're gonna do, I will be the judge of the software you produced and then, spend the next 30h doing groupwork (groupwork is like teamwork but instead of having a team, you have a random bunch of people, i.e. a group). Don't take me wrong, I think that projects do have a huge pedagogic value, yet the probability of re-discovering good software engineering ideas just by doing random groupwork is low.

There are also other criticisms to the green-field projects as usually practiced. The first poor practice is to let the students come up with their own project ideas. For various reasons, students have a tendency to favor projects that are quite ill-adapted (such as video games); then it's really hard to scale the project in such a way that it matches the time to be invested in the course. The second poor practice is to let the students come up with their own internal organization. I have never encountered any company where all employees are equal, I do not see why it should be the case in a student software project (more on the subject in a subsequent post).

Friday
Sep222006

From RAD to test driven ASP.NET website

Both unit testing and R.A.D. (Rapid Application Development) impacted quite deeply my insights over software development. Yet, I have found that combining those two approaches within a single ASP.NET project is not that easy especially if you want to keep the best of both worlds. There are at least א (alef zero) methods to get the problem solved. Since my blog host does not provide yet that much disk storage, I will only describe here 2.5 design patterns that I have found to be useful while developing ASP.NET websites.

  • R.A.D: all your 3-tier layers (presentation, business logic and data access) get compacted into a single ASPX page. ASP.Net 2.0 makes this approach very practical thanks to a combination of GridView, DetailsView and SqlDataSource.

  • CRUD layer: The business logic and the data access are still kept together but removed from the ASPX page itself. This design pattern (detailed below) replaces the SqlDataSource by a combination of ObjectDataSource and library classes.

  • Full blown 3-tier: Like, above but the business logic and the data access gets separated. Compared to the CRUD layer approach, you get extra classes reflecting the database objects.

R.A.D.


Using combinations of GridView, DetailsView and SqlDataSource, you can fit your 3-layers a single ASPX page, the business logic being implemented in code-behind whenever required. This approach does not enable unit-testing but if your project is fairly simple, then R.A.D. works with a dramatic productivity. For example, PeopleWords.com has been completely developed through R.A.D (with a single method added to the Global.asax file that logs all thrown exceptions). I would maybe not defend R.A.D. for large/complex projects, but I do think its very well suited for small projects and/or drafting more complicated ones.

The forces behind R.A.D. :

  • (+) Super-fast feature delivery: a whole website can designed very quickly.

  • (+) The code is very readable (not kidding!): having your 3-layers in the same place makes the interactions quite simple to follow.

  • (+) Facilitate the trial & error web page design process: It's hard to get a web page very usable at once, R.A.D. let you re-organize web page very easily.

  • (-) If the same SQL tables get reused between ASPX pages, then data access code tends to be replicated each time (idem for business logic).

  • (-) No simple way to perform unit testing.

  • (-) No structured way to document your code.

"CRUD layer" design pattern


The main purpose of the "CRUD layer" is to remove the business logic and the data access from the ASPX page to push it into an isolated .Net library. Yet, there are several major constraints associated to this process. The first constraint is to ensure an easy migration from R.A.D. to "CRUD Layer". Indeed, R.A.D. is usually the best prototyping solution. Therefore the main issue is not to implement from scratch but to improve the quality of the R.A.D. draft implementation. The second constraint is to maintain business logic and data access design as plain as possible (verifying the YAGNI principle among others). The CRUD layer is an attempt to address those two issues.

The CRUD layer consists in implementing a class

public class FooCrudLayer
{
public FooCrudLayer() { ... } // empty constructor must exists
public DataTable GetAllBar( ... ) // SELECT method
{
DataTable table = new DataTable();

using (SqlConnection conn = new SqlConnection( ... ))
{
conn.Open();
SqlCommand command = new SqlCommand( ... , conn);

using (SqlDataReader reader = command.ExecuteReader())
{
table.Load(reader);
}
}

return table;
}

public void Insert( ... ) { ... } // INSERT method
public void Delete( ... ) { ... } // DELETE method
public void Update( ... ) { ... } // UPDATE method
}

Notice that the select method returns a DataTable whereas most ObjectDataSource tutorials would return a strongly-typed collection List<Bar>. The presented approach has two benefits. First, the migration from the SqlDataSource is totally transparent (including the sorting and paging capabilities of the GridView).

<ObjectDataSource Id="FooSource" runat="server"
TypeName="FooCrudLayer"
SelectMethod="GetAllBar"
InsertMethod="Insert"
DeleteMethod="Delete"
UpdateMethod="Update" >

... (parameters definitions)

</ObjectDataSource>

Second, database objects are not replicated in the .Net code. Indeed, replicating the database objects at the .Net level involves some design overhead because both sides (DB & .Net) must be kept synchronized.

The forces behind the CRUD layers

  • (+) Very small design overhead compared to R.A.D.

  • (+) Easy migration from R.A.D.

  • (+) Unit testable and documentable.

  • (-) No dedicated abstraction for business logic.

  • (-) Limited design scalability (because it's not always possible to avoid code replication).

Full blown 3-tiers


I won't say much about this last option (this post is already far too long), but basically, the 3-tier design involves to strong type the database objects in order to isolate business logic from data access. This approach comes as a large overhead compared from the original RAD approach. Yet, if your business logic gets complex (for example workflow-like processes) then there is no escape, layers have to be separated.