Author

I am Joannes Vermorel, founder at Lokad. I am also an engineer from the Corps des Mines who initially graduated from the ENS.

I have been passionate about computer science, software matters and data mining for almost two decades.

Saturday, Sep 23, 2006

Antipatterns of Software engineering courses

I am very honored to be in charge of the Software engineering and distributed applications course at the École normale supérieure (ENS). This will be my first official teaching assignment, and I will be teaching brilliant Licence 3 students (it's pretty tough to get through the entrance exam of the ENS).

Software engineering is a difficult topic to teach. I have been browsing the web to get an overview of what people usually do in a software engineering course, and I must confess that I haven't found anything very satisfying so far (although the MIT experience is definitely worth reading). In a nutshell, I would say that software engineering is the art of producing great software with limited resources. With respect to this definition, it seems that there are many ways to spend hours teaching things that totally miss the point. I have listed in this post what I think to be the 3 most common ways of putting students off software engineering matters altogether.

Coding antipattern

Here is the 300-page API. This API will be the subject of your exam. Documents will not be authorized.

The second fastest way on earth to get your 30h of teaching ready is simply to teach some particular programming language and its associated technologies (think Apache/PHP/MySQL; stay tuned, the fastest way will be detailed below). Just take any reference documentation, and you get enough material to talk for 30h. But your course is both deadly boring and super-short-lived. Some people assume that their students have no prior knowledge of any language. I won't (it's stated in the course prerequisites, btw). My objective is not to teach the syntax of any programming language, because I believe students are smart enough to learn that by themselves. If not, then I would say that taking the software engineering course was a bad move in the first place.

ISO-certified development antipattern

If you don't assess your customer requirements through a 17-phase process, then you're not ISO-171717.

Just by looking at the Learning Tree - Software Engineering course (one of the top results on Google for "software engineering"), I think: wow, those people are attempting to kill their attendees with boredom. Look at the table of contents: Life cycle phase contains (sic) 1) Understanding the problem 2) Developing the solution 3) Verifying the product 4) Maintaining the system. I can't remember any software project that ever happened that way, but who cares, since the content is going to be nothing more than obvious statements anyway.

I call this kind of teaching ISO-certified development, the worst part certainly being the enumeration of software life-cycle models (which are, according to Learning Tree: Waterfall, V, Phased, Evolutionary and Spiral). This approach is basically the extreme opposite of the coding course. You're not teaching anything technical; instead, you end up with long, super-detailed, over-boring descriptions of business practices that do not even exist as such in the real world anyway.

Green-field projects antipattern

Our project was to develop a video game. Alice wrote the scenario, Bob took care of the rules and I did the graphics. The rest has been left unfinished.

The fastest way on earth to get your 30h software teaching course ready is the green-field project: just say Decide among yourselves what you're gonna do, I will be the judge of the software you produce, and then spend the next 30h doing groupwork (groupwork is like teamwork but instead of having a team, you have a random bunch of people, i.e. a group). Don't get me wrong, I think that projects do have a huge pedagogical value, yet the probability of re-discovering good software engineering ideas just by doing random groupwork is low.

There are also other criticisms of green-field projects as usually practiced. The first poor practice is to let the students come up with their own project ideas. For various reasons, students have a tendency to favor projects that are quite ill-adapted (such as video games); it is then really hard to scale the project so that it matches the time to be invested in the course. The second poor practice is to let the students come up with their own internal organization. I have never encountered any company where all employees are equal, and I do not see why it should be the case in a student software project (more on the subject in a subsequent post).

Friday, Sep 22, 2006

From RAD to test-driven ASP.NET websites

Both unit testing and R.A.D. (Rapid Application Development) have quite deeply shaped my views on software development. Yet, I have found that combining those two approaches within a single ASP.NET project is not that easy, especially if you want to keep the best of both worlds. There are at least ℵ₀ (aleph-zero) methods to get the problem solved. Since my blog host does not yet provide that much disk storage, I will only describe here 2.5 design patterns that I have found useful while developing ASP.NET websites.

  • R.A.D.: all 3 layers (presentation, business logic and data access) get compacted into a single ASPX page. ASP.Net 2.0 makes this approach very practical thanks to a combination of GridView, DetailsView and SqlDataSource.

  • CRUD layer: the business logic and the data access are still kept together but removed from the ASPX page itself. This design pattern (detailed below) replaces the SqlDataSource with a combination of ObjectDataSource and library classes.

  • Full-blown 3-tier: like above, but the business logic and the data access get separated. Compared to the CRUD-layer approach, you get extra classes reflecting the database objects.

R.A.D.


Using combinations of GridView, DetailsView and SqlDataSource, you can fit your 3 layers into a single ASPX page, the business logic being implemented in the code-behind whenever required. This approach does not enable unit testing, but if your project is fairly simple, then R.A.D. delivers dramatic productivity. For example, PeopleWords.com has been completely developed through R.A.D. (with a single method added to the Global.asax file that logs all thrown exceptions). I would not defend R.A.D. for large/complex projects, but I do think it's very well suited to small projects and/or to drafting more complicated ones.
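To make this concrete, here is a minimal sketch of such an all-in-one page (the Bar table and the Main connection string are hypothetical, not from the original post): the SqlDataSource carries the data access, the GridView the presentation, and any business logic goes into the code-behind.

<%-- Minimal R.A.D. page: data access and presentation in one place.
     Sorting and paging come for free from the GridView/SqlDataSource pair. --%>
<asp:GridView ID="BarGrid" runat="server"
    DataSourceID="BarSource"
    AutoGenerateColumns="true"
    AllowSorting="true" AllowPaging="true" />

<asp:SqlDataSource ID="BarSource" runat="server"
    ConnectionString="<%$ ConnectionStrings:Main %>"
    SelectCommand="SELECT * FROM Bar" />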

The forces behind R.A.D.:

  • (+) Super-fast feature delivery: a whole website can be designed very quickly.

  • (+) The code is very readable (not kidding!): having your 3 layers in the same place makes the interactions quite simple to follow.

  • (+) Facilitates the trial-and-error web page design process: it's hard to get a web page very usable at once; R.A.D. lets you reorganize web pages very easily.

  • (-) If the same SQL tables get reused between ASPX pages, then the data access code tends to be replicated each time (likewise for the business logic).

  • (-) No simple way to perform unit testing.

  • (-) No structured way to document your code.

"CRUD layer" design pattern


The main purpose of the "CRUD layer" is to remove the business logic and the data access from the ASPX page and push them into an isolated .Net library. Yet, there are several major constraints associated with this process. The first constraint is to ensure an easy migration from R.A.D. to "CRUD layer". Indeed, R.A.D. is usually the best prototyping solution; therefore the main issue is not to implement from scratch but to improve the quality of the R.A.D. draft implementation. The second constraint is to keep the business logic and data access design as plain as possible (honoring the YAGNI principle, among others). The CRUD layer is an attempt to address those two issues.

The CRUD layer consists of implementing a class along the following lines:

using System.Data;
using System.Data.SqlClient;

public class FooCrudLayer
{
    // An empty constructor must exist (the ObjectDataSource instantiates the class).
    public FooCrudLayer() { ... }

    // SELECT method
    public DataTable GetAllBar( ... )
    {
        DataTable table = new DataTable();

        using (SqlConnection conn = new SqlConnection( ... ))
        {
            conn.Open();
            SqlCommand command = new SqlCommand( ... , conn);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                table.Load(reader);
            }
        }

        return table;
    }

    public void Insert( ... ) { ... } // INSERT method
    public void Delete( ... ) { ... } // DELETE method
    public void Update( ... ) { ... } // UPDATE method
}

Notice that the SELECT method returns a DataTable, whereas most ObjectDataSource tutorials would return a strongly typed collection such as List<Bar>. The presented approach has two benefits. First, the migration from the SqlDataSource is totally transparent (including the sorting and paging capabilities of the GridView).

<asp:ObjectDataSource ID="FooSource" runat="server"
    TypeName="FooCrudLayer"
    SelectMethod="GetAllBar"
    InsertMethod="Insert"
    DeleteMethod="Delete"
    UpdateMethod="Update" >

    ... (parameter definitions)

</asp:ObjectDataSource>

Second, database objects are not replicated in the .Net code. Indeed, replicating the database objects at the .Net level involves some design overhead because both sides (DB & .Net) must be kept synchronized.

The forces behind the CRUD layer:

  • (+) Very small design overhead compared to R.A.D.

  • (+) Easy migration from R.A.D.

  • (+) Unit testable and documentable (see the test sketch after this list).

  • (-) No dedicated abstraction for business logic.

  • (-) Limited design scalability (because it's not always possible to avoid code replication).
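Since the CRUD layer is now a plain library class, it can be exercised outside of any web context. Below is a minimal test sketch using NUnit; the test database setup and the Bar values are hypothetical, and the elided arguments mirror the placeholders of FooCrudLayer above.

using System.Data;
using NUnit.Framework;

[TestFixture]
public class FooCrudLayerTests
{
    [Test]
    public void InsertThenSelect_ReturnsAtLeastOneRow()
    {
        // The empty constructor makes the class trivial to instantiate in a test.
        FooCrudLayer layer = new FooCrudLayer();

        // Insert a (hypothetical) Bar row, then read everything back.
        layer.Insert( ... );
        DataTable table = layer.GetAllBar( ... );

        Assert.Greater(table.Rows.Count, 0);
    }
}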

Full-blown 3-tier


I won't say much about this last option (this post is already far too long), but basically, the 3-tier design involves strongly typing the database objects in order to isolate the business logic from the data access. This approach comes with a large overhead compared to the original R.A.D. approach. Yet, if your business logic gets complex (for example, workflow-like processes), then there is no escape: the layers have to be separated.
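For the record, here is a hedged sketch of what that separation looks like (the Bar table and its two columns are hypothetical): the entity class mirrors the database object, and the data access layer returns strongly typed collections for the business logic to consume. The Bar class is exactly the synchronization overhead mentioned above.

using System.Collections.Generic;
using System.Data.SqlClient;

// Entity class mirroring the hypothetical Bar table; it must be kept
// synchronized with the database schema (the overhead discussed above).
public class Bar
{
    private int id;
    private string name;

    public int Id { get { return id; } set { id = value; } }
    public string Name { get { return name; } set { name = value; } }
}

// Data access layer: returns strongly typed objects instead of a DataTable.
public class BarDataAccess
{
    private readonly string connectionString;

    public BarDataAccess(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public List<Bar> GetAll()
    {
        List<Bar> bars = new List<Bar>();

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            SqlCommand command = new SqlCommand("SELECT Id, Name FROM Bar", conn);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Bar bar = new Bar();
                    bar.Id = reader.GetInt32(0);
                    bar.Name = reader.GetString(1);
                    bars.Add(bar);
                }
            }
        }

        return bars;
    }
}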

Monday, Sep 18, 2006

Over the Internet, your name is your personal trademark

I have been dealing with freelancers for various tasks (translations, graphic design, development), and it's still unbelievable that most freelancers do not pay any attention to maintaining a consistent name in their communications. Let me clarify this point: I do not care to know the exact legal name of any freelancer I am dealing with. But how can I even recognize the person if messages never get signed twice with the same name?

Over the Internet, your name is your personal trademark. If you're not careful, people will simply not remember who you are, and this rule isn't restricted to freelancers. The most usual consequence of poor name branding is that people will filter out your communication attempts (e-mail, instant messenger and the like) as spam. A clear naming policy means that your name must be explicit and obvious in all your communications, ranging from Skype to regular postal mail.

In my experience, the most common inconsistent-naming cases are the following:

  • The e-mail must completely match the person's name. If your name is John Smith, then your e-mail must be john.smith@yourcompany.com, not joe@yourcompany.com or jsmi@yourcompany.com. If you have a lengthy name, so will your e-mail address.

  • The Skype/MSN/whatever username must completely match the person's name too. I have found instant messaging practices to be even worse. Fantasy names like batman4ever are not uncommon. A good practice is to use your e-mail address as an instant messenger ID.

  • Lack of personal home page: Google is the de facto yellow pages. When somebody types your name, your home page must come up. If your name is really John Smith, then it's going to be tough. Well, in such a case, just add some random nickname between the first and the last name to become distinguishable.

It may appear obvious to some of us, but it seems that many people do not realize that the lack of consistency in their communications has a strong (negative) impact on the people they are communicating with, especially if the name is the only concrete reference to the person.

Sunday, Sep 17, 2006

Don't booby-trap your ASP.Net Session state

The ASP.Net Session state is a pretty nice way to store a (limited) amount of data on the server side. I am not going to introduce the ASP.Net session itself in this post; you can refer to the previous link if you have no idea what I am talking about.

Although many articles can be found over the web arguing that the major issue with the session state is scalability, don't believe them! Well, as long as you're not doing crazy things like storing your whole DB in the session state, which is not too hard to avoid. Additionally, 3rd-party providers (see ScaleOut for example) seem to offer pretty much plug&play solutions if you're in desperate need of more session space (disclaimer: I haven't tested any of those products myself).

In my (limited) experience, I have found that the main issue with the Session state is the lack of strong typing. Indeed, the lack of strong typing has a lot of painful consequences, such as:

  • Hard collisions: The same session key gets used to store 2 distinct objects of different .Net types. This situation is not too hard to debug because the web application will crash in a sufficiently brutal way to be quickly detected.

  • Smooth session collisions: the same session key gets used to store 2 distinct objects of the same type. This situation is far worse than the previous one because, if you're unlucky, your web app will not crash. Instead, your users will just experience very weird app behaviors from time to time. Multi-tab browsing users will be among the first victims (a concrete sketch follows this list).

  • No documentation: static fields get documented, right? There is no good reason to discard Session state documentation.

  • No refactoring: Errare Humanum Est; session keys do get poorly or inconsistently named, like any other class/method/whatever. No automated refactoring means that manual refactoring attempts will introduce bugs with high probability.
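To make the smooth collision concrete, here is a sketch (the page names and the "current" key are made up for the illustration) of two pages silently stepping on each other through the same untyped key:

using System;

public partial class Checkout : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // "current" holds the ID of the order being paid.
        Session["current"] = Request.QueryString["order"];
    }
}

public partial class Browse : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Same key, same type (string): nothing crashes, but a user who
        // browses in a second tab silently overwrites the order ID of
        // his ongoing checkout.
        Session["current"] = Request.QueryString["product"];
    }
}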

A simple approach to solve most of those issues consists of strongly typing your session keys. This can be easily achieved with the following pattern:

partial class MyPage : System.Web.UI.Page
{
    // Ns = the namespace, SKey = SessionKey
    const string Foo1SKey = "Ns.MyPage.Foo1";
    const string Foo2SKey = "Ns.MyPage.Foo2";

    // ...
}

Instead of explicitly typing the session key, the const string fields get used instead. If you need to share session keys between several pages, then a dedicated class (gathering the session keys) can be used along the same lines. This pattern basically solves all the issues evoked here above.

  • Collisions get avoided because of the combination of namespace and class prefixes in the session key definitions (*).

  • Documentation is straightforward, just add a <summary/> documentation tag to the const string definition.

  • Refactoring is easy, just refactor the field or change its value (depending on the refactoring intent).

(*) It would have been possible to prefix all session keys directly in the code with the very same combination of namespace and class name, but it becomes a real pain if you start using the session frequently in your code.
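Finally, for illustration, here is how those constants get used from within a MyPage method (Foo1 stands for whatever type the page actually stores; a hedged sketch, not part of the original pattern):

// Reading: a single cast, localized next to the strongly typed key.
Foo1 foo1 = (Foo1)Session[Foo1SKey];

// Writing: the key is never retyped by hand, so a typo cannot silently
// fork the session state into two entries.
Session[Foo1SKey] = foo1;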

Saturday, Jul 22, 2006

Resx2Word, when simplistic is not enough

RESX files are great (and simple) containers of textual resources for your .Net/Asp.Net applications. They are especially useful if you're planning to translate your application into multiple languages (PeopleWords has been translated into 13 languages, with all textual content put into RESX files). Yet, using Microsoft Visual Studio as a RESX file editor is quite an overkill solution for translators (whose programming skills often equate to zero, since it's not their job anyway). In a previous post, I was discussing ResxEditor, a simplistic and stand-alone RESX file editor.

Où que tu sois je te retrouverai, car si tu ne viens pas à Lagardère, Lagardère ira à toi! (Wherever you are, I will find you: if you do not come to RESX, RESX will come to you)

Yet, I am still not entirely satisfied with ResxEditor. Indeed, during the translation process of PeopleWords, half of the translators (smart and educated, btw) were surprised by the sheer existence of text editors other than MS Word. I imagine that this kind of thing can happen if you have been working your entire life with MS Word.

As a result, no matter how many times you tell those translators not to use Word, they can't resist the urge: the RESX file gets opened and translated through Word ... and then funny things happen. For example, I now have several translations of the Microsoft RESX instructions (Microsoft ResX Schema, Version 2.0, The primary goals of this format is to allow a simple XML format that is mostly human readable. ...), the large XML comment created by VS by default at the beginning of each RESX file. This XML comment is going to be one of the most translated pieces of MS literature (I do not think that the VS engineers were expecting this when they wrote those RESX instructions).

In order to escape the curse of the RESX instructions translation, I have just released Resx2Word, a RESX to MS Word converter (and vice versa). Naturally, it's not possible to translate a generic MS Word document to RESX; only MS Word documents generated by Resx2Word can be converted back into RESX by Word2Resx. Any feedback?