Giles rulez

Giles Bowkett is one of the bloggers whose blog I regularly read and enjoy. This September he gave a presentation on Archaeopteryx at the RubyFringe conference. Yesterday I watched his talk. Awesome! This guy rocks, and I can only say that if you are interested in Ruby, lambdas, music, or just like to supercharge your senses, this talk is for you!

Hats off. And don't go back to your gas station Giles, we need you here!


CI and NCoverage

On our current project the continuous integration process demands that the code base has a minimum test coverage of 85%. If this condition is not met – guess what – the build fails. Initially we started with a limit of 95%. We all knew that this was not quite realistic, but we thought there could be nothing wrong with starting from a noble attitude. As expected, the effective coverage of the code kept slowly dropping toward the configured limit. So far I am still satisfied with the result, but some issues have come up during the daily stand-up meetings:

Who’s in charge of fixing the build if the limit is not met and the build fails?

From my point of view this is a no-brainer: the person who actually checks in is responsible. But one member of my team had a different view. After some days off the project he joined us again, created a new class, checked it in, and the build failed. From his point of view the coverage before his check-in was already insufficient, because there should always be enough reserve for an empty class with a constructor to be checked in without the build failing. So he suggested we should all take some time to collectively increase the overall coverage.
I replied that if you consistently work test-first, there should not even be such a class without a corresponding test. Unfortunately, not every member of our team has yet reached the point of consistently working test-first. Also, we usually did not run a coverage analysis on our local machines before checking in. This means that if the person who checks the code in is not a test hardliner, the build might fail. It further means that when you are close to the coverage limit, the build will fail more often the more team members lack testing experience. This is really not a good option, because a failing build means not only that subsequent check-ins will also fail until the build has been fixed, but also more turbulence in the team.
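
To make the "reserve" argument concrete, here is a minimal sketch of the arithmetic behind such a coverage gate. The class name `CoverageGate` and the line counts are invented for illustration; they are not taken from our build or from any coverage tool.

```csharp
using System;

// Hypothetical helper: models only the arithmetic of a coverage limit,
// not how a real coverage tool measures or reports coverage.
public static class CoverageGate
{
    // Percentage of covered lines.
    public static double Percent(int coveredLines, int totalLines)
    {
        return 100.0 * coveredLines / totalLines;
    }

    // True if the measured coverage meets the configured limit.
    public static bool Passes(int coveredLines, int totalLines, double limitPercent)
    {
        return Percent(coveredLines, totalLines) >= limitPercent;
    }

    public static void Main()
    {
        const double limit = 85.0;

        // Made-up numbers: 850 of 1000 lines covered, exactly at the limit.
        Console.WriteLine(Passes(850, 1000, limit)); // True

        // A new class adds 5 untested lines: 850 of 1005 lines covered.
        Console.WriteLine(Passes(850, 1005, limit)); // False
    }
}
```

With no reserve above the limit, even a handful of untested lines is enough to break the build, which is exactly the situation my colleague ran into.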

For me this means that even if I manage to convince the team that if the build fails it is the one checking in who's in charge, I have to live with the fact that there’s some flow getting lost if we are near the limit.

Do we set the limit low enough to leave room for code which the team agrees does not have to be tested? (Like declarative code, where a test would only duplicate the declaration.) Or do we tag that code with the appropriate attributes to exclude it from the analysis, and are therefore able to set the limit at a higher percentage?

From my point of view I do not like my code to be cluttered with ignore attributes just to be able to meet a self-imposed limit in our build process. But there were other opinions. One team member argued that if we all agree that this code does not need to be tested, why should we not tag it with such an attribute? I think this is a legitimate argument, although I would not like the additional typing if we could avoid it simply by lowering the coverage limit accordingly. After all, this is why we are configuring such a limit: because there is code which we all agree cannot be efficiently or meaningfully tested.

Bottom line: we are still gathering experience on how to properly handle the additional metrics of a coverage analysis in our development process.


IQueryable<T> vs. IEnumerable<T>

In one of our current projects we're using LINQ to SQL to conquer the object-relational impedance mismatch. We all had some experience with LINQ and deferred execution. But it became obvious that we all needed to deeply internalize the difference between IEnumerable and IQueryable.

EntitySet<T> and Table<T> both implement the IEnumerable<T> interface. However, if there were no IQueryable<T>, all querying functionality, including filtering and sorting, would be executed on the client. To optimize a query for a specific data source we need a way to analyze a query and its definition. That’s where expression trees come in. As we know, an expression tree represents the logical definition of a method call, which can be inspected and transformed.
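
A minimal sketch of the difference, assuming an in-memory array wrapped with AsQueryable() as a stand-in for a real LINQ to SQL table (so no database is involved). The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryableVsEnumerable
{
    // On IQueryable<T> the operators bind to Queryable.Where/OrderBy:
    // the lambdas are captured as expression trees that a provider can analyze.
    public static string DescribeQuery(IQueryable<int> source)
    {
        IQueryable<int> query = source.Where(n => n > 2).OrderBy(n => n);
        return query.Expression.ToString();
    }

    // After AsEnumerable() the operators bind to Enumerable.Where:
    // the lambda is compiled to a delegate and the filter runs in memory.
    public static List<int> FilterOnClient(IQueryable<int> source)
    {
        IEnumerable<int> local = source.AsEnumerable().Where(n => n > 2);
        return local.ToList();
    }

    public static void Main()
    {
        // Stand-in data source; a Table<T> would take this place in LINQ to SQL.
        IQueryable<int> numbers = new[] { 5, 1, 4, 2, 3 }.AsQueryable();

        // Prints a textual form of the expression tree (Where, then OrderBy).
        Console.WriteLine(DescribeQuery(numbers));

        // Prints "5, 4, 3": filtered in memory, source order preserved.
        Console.WriteLine(string.Join(", ", FilterOnClient(numbers)));
    }
}
```

The lambda text is identical in both methods; only the static type of the source decides whether the compiler emits a delegate or an expression tree.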

So we have a logical definition of our query that can be analyzed and rewritten on one side and a queryable data source on the other. The Provider property on IQueryable now returns an IQueryProvider, which is exactly what we need here.

public interface IQueryProvider {
  IQueryable CreateQuery(Expression expression);
  IQueryable<TElement> CreateQuery<TElement>(Expression expression);
  object Execute(Expression expression);
  TResult Execute<TResult>(Expression expression);
}

There are two interesting operations and their generic counterparts. The generic versions are used most of the time, and they perform better because we can avoid using reflection for object instantiation. CreateQuery() does precisely what we are looking for. It takes an expression tree as its argument and returns another IQueryable based on the logical definition of the tree. When the returned IQueryable is enumerated, it invokes the query provider, which then processes this specific query expression.
The Execute() method is used to actually execute your query expression. This explicit entry point – instead of just relying on IEnumerable.GetEnumerator() – allows executing expression trees which do not necessarily yield sequences of elements. (For example, aggregate functions like count or sum.)
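
A small sketch of both entry points, called by hand instead of through the usual extension methods. It assumes the built-in in-memory provider you get from AsQueryable(), not the LINQ to SQL provider. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ProviderDemo
{
    // Builds Queryable.Where(source, n => n % 2 == 0) as an expression tree
    // and lets the provider wrap it in a new, not yet executed IQueryable<int>.
    public static IQueryable<int> BuildEvens(IQueryable<int> source)
    {
        Expression<Func<int, bool>> isEven = n => n % 2 == 0;
        MethodCallExpression whereCall = Expression.Call(
            typeof(Queryable), "Where", new[] { typeof(int) },
            source.Expression, isEven);
        return source.Provider.CreateQuery<int>(whereCall);
    }

    // Builds Queryable.Count(query) and executes it through the provider.
    // The result is a scalar, so enumeration is not the entry point here.
    public static int CountThroughProvider(IQueryable<int> query)
    {
        MethodCallExpression countCall = Expression.Call(
            typeof(Queryable), "Count", new[] { typeof(int) },
            query.Expression);
        return query.Provider.Execute<int>(countCall);
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4, 5, 6 }.AsQueryable();
        IQueryable<int> evens = BuildEvens(source);

        Console.WriteLine(string.Join(", ", evens));      // 2, 4, 6
        Console.WriteLine(CountThroughProvider(evens));   // 3
    }
}
```

This is roughly what the Where() and Count() extension methods on Queryable do for you: the first goes through CreateQuery(), the second through Execute().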

We finally have our two worlds nicely connected. The transformability of expression trees and the deferred execution of IEnumerable combine into a construct that can analyze an arbitrarily modified and extended query at the last possible moment and execute an optimized query against its data source. It’s not even too hard to implement your own IQueryProvider for your own data source. Maybe I’ll cover that in a later post. This is really nice work that Erik Meijer and his team have done here.


Release It! (Book Review)

Today I finished the book 'Release It! Design and Deploy Production-Ready Software' by Michael T. Nygard. Coincidentally, just a few days ago InfoQ published an article about this book as well. But this did not stop me from blogging my own personal two cents about Nygard's work.

The book is the fifth title published by the Pragmatic Programmers on my bookshelf. All the titles were very interesting, of convincing quality and beautifully put together - don't underestimate the attraction of a beautiful book to a reader and aesthetically oriented guy like me. Now to this specific title.

The author presents a series of patterns and anti-patterns for creating software which not only survives development but also survives in production - the environment of the real world. The book provides general guidance as well as some very specific tips and tricks for development as well as operations. Although some parts are closely related to the Java world, there is almost always a matching example in the .NET world. The writer comes up with loads of interesting examples and case studies to support his statements, and it's apparent that he brings a lot of experience from the field.

Generally the book is about large-scale (web) projects, but there are plenty of interesting points worth delving into, even if one's current projects are not (yet) in need of sophisticated load balancing technologies.

Some catchwords from my personal things-to-remember-list of this title:

  • Circuit Breaker (together with Timeouts) is a very interesting pattern for dealing with (and even monitoring) the integration points of a system.

  • Beware of Chain Reactions in a system. Prevent with Bulkheads if necessary.

  • Communicate transparently and often. "Good marketing can kill you at any time." (Paul Lord)

  • Keep your SLA in mind while dealing with dependencies.

  • "Data purging never makes it into the first release, but it should."

  • Conway's Law: "Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations."

  • Keep the session timeouts (and the dead time before that) in mind. (Not only when speaking of 'concurrent users'). Ideally it should be set to one standard deviation past the average think time.

  • Consider involved threads when using resource pools - use a resource pool size equal to the number of threads to guarantee that every thread immediately gets the resource it needs.

  • AJAX should be called AJAS because the JavaScript Object Notation (JSON) format is much better suited (easier to parse and less chatty) to transfer data. (Yes, we knew this one before. Honestly.) However, even JSON has a dark side. Its gates are eval().

  • There is another reason behind using Cascading Style Sheets instead of HTML tables besides conciseness and code organization: bandwidth.

  • Only starting a new thread is more expensive than creating a database connection.

  • Make your application cache configurable.

  • Use soft references. (Java has it. Does .NET?)

  • Load-balanced clusters do not scale linearly due to overhead in heartbeats and state synchronization. Fully load-balanced farms almost do.

  • The ZOI paradigm of software design should be considered in QA.

  • Don't mix production configuration with basic plumbing (like DI) in the same configuration file.

  • There is a project called Recovery-Oriented Computing (ROC).

  • The 3 phases of Zero Downtime Deployment: Expansion, rollout and cleanup.

  • The difference between white- and black-box (monitoring) technologies: The former run within the system under observation, while the latter reside on the outside and are not known to the system itself. Naturally, black-box technologies are less coupled to the system.

  • Colonel John Boyd and the OODA-Loop can be mapped to an iteration in agile development.

  • The author brings up a nice definition of cohesive in software design: "...In other words, if a set of methods touches only a subset of the object's state but is unaffected by other aspects of the object's state, the object is not cohesive."

  • ...and many many more...
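
To give an idea of the Circuit Breaker item from the list above, here is a minimal sketch of how I understand the pattern. It is not code from the book: the class shape, the parameter names, and the injected clock are my own assumptions, and the sketch is deliberately not thread-safe.

```csharp
using System;

// Hypothetical, single-threaded circuit breaker sketch.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _resetTimeout;
    private readonly Func<DateTime> _clock; // injected so time can be controlled
    private int _failures;
    private DateTime? _openedAt;

    public CircuitBreaker(int failureThreshold, TimeSpan resetTimeout, Func<DateTime> clock)
    {
        _failureThreshold = failureThreshold;
        _resetTimeout = resetTimeout;
        _clock = clock;
    }

    // Open means: calls are rejected without touching the integration point.
    public bool IsOpen
    {
        get { return _openedAt.HasValue && _clock() - _openedAt.Value < _resetTimeout; }
    }

    public T Call<T>(Func<T> operation)
    {
        if (IsOpen)
            throw new InvalidOperationException("Circuit is open; call rejected.");

        try
        {
            // Once the reset timeout has elapsed, this acts as the trial call.
            T result = operation();
            _failures = 0;      // success closes the circuit again
            _openedAt = null;
            return result;
        }
        catch (Exception)
        {
            _failures++;
            if (_failures >= _failureThreshold)
                _openedAt = _clock(); // trip, or re-trip after a failed trial call
            throw;
        }
    }
}
```

The IsOpen property is also where the monitoring aspect comes in: operations can watch it to see which integration points are currently failing.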

I have read the book cover to cover and can recommend it to anyone interested in delving a little deeper into the subject.