The order of things – Contracts and Typeclasses

It's only fair to share...Share on Google+0Share on Facebook0Tweet about this on TwitterShare on LinkedIn0Email this to someone

This post is about a topic that I have discussed with fellow engineers over and over again: ordering of objects.

The underlying idea sounds simple at first, then gets more complicated when you dive in, then you start to see the light again, before your understanding plunges you back into the darkness. In this post, I’ll take you along for the journey and even give you a glimpse of the light at the end of this tunnel.

A few definitons up front, so you don’t get too confused. You may not fully grasp these yet, so feel free to come back and revisit them. Note that these are not formal definitions, but simply ensure that we share the same basic idea for each of these terms:

A comparison is used to compare two objects and find out if one is greater than, equal to, or smaller than the other.

An ordering is the result of sorting objects based on a comparison. To order objects denotes the process of arriving at an ordering.

An order is a typeclass defined by a comparison. In contrast to an ordering, an order represents the general concepts of objects being comparable, being able to get an ordering for objects, find the greater or smaller of two objects, etc.

Simple idea: Order your stuff

It all starts with a simple requirement like this:

The tasks shall be displayed in the list ordered by their execution time.

We assume a software here that in some way keeps tracks of tasks, which can have some execution time. Nothing out of the ordinary. You can think of work items on production machines or anything else. Let’s write up a simple class for the Task. I’ll provide the example in Scala, but this is a straightforward class with two attributes that could be written in any other language as well. I’ll go into Java specifics lateron anyways, so if you want to think Java, then just imagine your setters and getters for the two attributes (they’re in the Scala code anyways, in case you don’t know Scala and are wondering about that).

case class Task(startTime : Long, endTime : Long)

For the time we just take a simple long value to store the epoch milliseconds. As we care about ordering this’ll do for the example.

Now, we somehow ended up with a list of these tasks that should be ordered. Simple really:

val tasks = List(Task(3, 4), Task(5, 9), Task(1, 4), Task(3, 5), Task(4, 6))
tasks.sortWith((t1,t2) => t1.startTime <= t2.startTime)

Ok, Java needs a little more code for that. You’d use Collections.sort with a Comparator that compares the startTime of two tasks. But so far it’s pretty similar. However, it’s not the only option available of course. Another quite typical option is to implement the Comparable interface:

case class Task(startTime : Long, endTime : Long) extends Comparable[Task] {
    override def compareTo(o: Task): Int = java.lang.Long.compare(startTime, o.startTime)
}

As you can see from the usage of java.lang.Long, this isn’t exactly the preferred way to do things in Scala. Nevertheless, in the Java world, this is still pretty standard. However, let’s get down the rabbit hole now …

Not so simple: Adhere to contracts

What’s the difference between the above two approaches? Apart from their differing implementation, can you list pros and cons for each approach?

Let’s start with the Comparator that is defined at the usage site:

  • Pro: locality – if we want to order Tasks in a different way somewhere else, we can just provide a different Comparator
  • Pro: evolvability – see next section.
  • Con: correctness – no safety net against contract violations
  • Con: locality – if we want to reuse this Comparator somewhere else, we have to extract it into some attribute first and take it for a walk

We can compare that to implementing Comparable:

  • Pro: generality – the order defined by Comparable applies to tasks everywhere throughout the project
  • Con: generality – as the order applies everywhere, we have to take special care if we want to order tasks in another way elsewhere
  • Con: correctness – no safety net against contract violations

Generality and locality, respectively, can be seen as pro or con each. What’s interesting though is that I declare correctness to be a problem in both cases. Why is that? I’m not even addressing the actual method implementation of the comparison, as those are pretty much equal. (If you used Collections.sort in Java the implementations of the comparison methods would actually be literally equal.)

The problem lies in contracts. If you haven’t spotted the problem in the above Comparable implementation yet, then be wary, be very wary. I’ll give you another hint:

val tasks = List(Task(3, 4), Task(5, 9), Task(1, 4), Task(3, 5), Task(4, 6))
val treeSet = new java.util.TreeSet[Task]()
treeSet.addAll(tasks)
println(tasks.size + " -- " + treeSet.size) // prints: 5 -- 4

Yes, all of the tasks are different, but somehow the addAll call seems to have lost one. This is a bug caused by a contract violation. Unfortunately, these bugs are really tough to crack. Unless you get things spoon-fed like here, you might take a while to realize that the Comparable implementation is at fault here. If you’re lucky, you get to see this funny little Exception:

java.lang.IllegalArgumentException: Comparison method violates its general contract!

So what is that contract I’m talking about all the time? You are probably aware of the contract between equals and hashCode overrides. When you overwrite equals, then in all cases where equals returns true, the hashCode of the compared objects should be the same as well. Note that this contract is one way only, and for a reason. Objects with the same hashCode need not be equal, but equal objects must have the same hashCode. Trivia: This is why a hash conflict is resolved on HashMap/HashSet by executing equals to see if it’s just a hash collision or actually an equal object.

What a lot of programmers miss is that the contract extends to Comparable as well. If your compareTo method returns 0, then equality is implied. Hence, a compareTo result of 0 means your equals method is supposed to return true for the two compared objects. The TreeSet above is making use of that fact by actually not calling equals at all. In the particular case, Task(3,4) was inserted first and Task(3,5) was compared to it afterwards. Since the Comparable implementation above only considers the startTime attribute, the result is 0 and the TreeSet makes use of the contract to conclude that the two tasks are equal and it already has one of them stored. So the Task(3,5) is thought to be contained in the set already.

The problem with this contract is not the TreeSet implementation. Once the contract is defined and agreed upon, its very purpose is to be relied upon by implementations like the TreeSet after all. However, as I stated above, there is no safety net. If a developer doesn’t know about the contract, or simply forgets it once, there is no warning or error. Faults will arrive at different places lateron and it’ll take time to connect the dots all the way back to this contract (root-cause-analysis should lead you there though).

So if your preferred way of implementation is the Comparable interface, then you will just have to make sure that you think about this contract every single time you implement or change your compareTo method. Even better if you can add a few unit tests to actually verify the contract holds. Property-based testing is a promising way here, as this contract is a very easy to express as a property. The underlying problem remains though: if you forget to write these tests, you have no other safety net.

Back to Simple: Default methods for Comparators

Since Java 8 the Comparator is even more preferred, which you can see from the mass of default methods that have been added to the interface. If you don’t know yet about default methods, then go read up on it! Your Comparator interface now has a load more methods available than it used to.

Ever needed to order things in just the other way around? No need to add a separate Comparator anymore – just take the old one and call .reverse(). Forgot to add null treatment to your Comparator? Just call Comparator.nullsFirst or nullsLast with your existing Comparator and you get a null-safe version for free. Tired of writing SomeNumberClass.compare(attribute, other.attribute) ? Just use the Comparator.comparing* methods. In our example, this would be Comparator.comparingLong(Task::getStartTime). And last, but not least, compositionality! Ever had requirements to order things by a, then if the a thing is equal by another attribute b? Yeah, I thought so. Since Java 8, we finally have Comparator#thenComparing to compose individual Comparators.

So to solve our buggy contract-violating implementations with Java 8, we could write something like this:

tasks.stream().sorted(
    Comparator.comparingLong(Task::getStartTime)
        .thenComparing(Comparator.comparingLong(Task::getEndTime)))

Of course, you’d want to add some sort of terminal operation to that. It still quite a bunch of boilerplate code, but that is the way of Java. Still, it’s much better to read and maintain than pre Java 8.

It seems the world has been saved. We adhere to the contract now and have a nice succinct way to declare composable Comparators. Things are fine, aren’t they?

Still not so simple: (Partial) Orders

Time to get back to mathematics and think about what “ordering things” does. We are really defining a binary relation that is transitive, antisymmetric and total. Actually, sometimes we are defining a binary relation that is reflexive, transitive, and antisymmetric instead. These are better known as total and partial orders. Reflexivity, transitivity and antisymmetry are actually already part of the Comparator contract, and note that reflexivity is implied by totality for total orders.

Interesting conceptual idea, but can we find that in our code? Can we even have a partial order? It turns out this is not something Java 8 can help you with. So let’s stick to total orders for now. At least for those, Java 8 brings us some support, but it’s really Comparators that are used and mixed up with orders. For example, the Stream API offers max and min methods, which allow you to use a comparator to find the maximum or minimum element within a stream. Do not be fooled by the return type being Optional – this has nothing to do with partial orderings, but simply applies to empty streams returning empty.

The actual concept of an order is missing though. Why does this matter? Since we already looked at the definition of the total order, let’s dive into some mathematics. Both, partially ordered sets as well as ordered sets form a category. If you’re not into category theory, then this won’t convince you yet.  Let’s take a step back and look at other things based on partial orders: what about a partially ordered group? No way to express that in a direct way in Java – although all the standard Comparators on numeric types form an ordered group. Would you prefer a lattice? This is again built on top of a partially ordered set by requiring a unique supremum and infimum for any two values. Mathematics has a rich history of these hierarchical definitions, which has proven itself time and again. Unfortunately, we programmers are falling short here, as all we do is re-inventing the wheel over and over again. We write more Comparators and we add “helper” or “utility” methods for min and max, although every Comparator is actually required to define a total order, which in turn guarantees that min and max elements are available. But unless you’re holding a stream, where conveniently the API has been extended with this utility method, you have no access to this total order in any other way. Let alone higher-level concepts …

Back to Simple Again: Order Typeclasses

Typeclasses are an old concept to support ad hoc polymorphism via type parameters. We don’t get that in Java, but with generics we can come pretty close. What is more interesting though is the idea to rely on typeclasses for explicit encoding of mathematical structures, like orders. In Java, the google guava library provides a class Ordering (yes, that class name clashes with the definitions used in this article), which allows you to simply lift any Comparator into an order, for which you can derive min and max calculations (no support for partial orders though).

Recently though, awareness of this topic has lead to more and more libraries for providing these typeclass implementations in composable ways much closer to mathematics than in traditional programming languages. Well “recently” may be a bit off here, since languages like Haskell have nailed this down long ago, but it is just now coming into the mainstream world.

How is that different to what we did before, namely defining Comparators and using them? In the following example, I’m using the Order typeclass from cats, but it doesn’t really matter which library/implementational variant is used. All the example does is computing the maximum task based on the comparison that defines the Order.

implicit val timeOrder = Order.from[Task]((x, y) => x.startTime compare y.startTime)

def max(tasks : List[Task])(implicit ord : Order[Task]) : Option[Task] = {

 @tailrec
 def rec_max(tasks : List[Task], currentMax : Task) : Task = tasks match {
 case t :: ts =>
 if (ord.compare(t, currentMax) < 0) rec_max(ts, currentMax)
 else rec_max(ts, t)
 case _ => currentMax
 }

 if (tasks.isEmpty) None
 else Some(rec_max(tasks.tail, tasks.head))
}

Yes, I am aware that the compare call could simply be exchanged with Order#max as well, but we’ll get to that. In terms of the Clean Code value system, let us examine why it makes sense to step up to a typeclass for your orders.

Evolvability

Since the typeclass is derived from the Comparator (or comparison function), we can derive different orders together with all implied implementations. Consider the Comparable we wrote for tasks. If we want to determine the maximum of two tasks (according to that order), then we would end up writing something like the above example. Of course, we could refactor that into a separate max method like this:

def max(task1 : Task, task2 : Task)(implicit ord : Order[Task]) : Task =
    if (ord.compare(task1, task2) < 0) task2
    else task1

When you look at that closely, you realiz there is nothing at all specific to a Task in there. If we had a Order for a different type, then the exact same code would work. So we can generalize it like this:

def max[A](a1 : A, a2 : A)(implicit ord : Order[A]) : A =
    if (ord.compare(a1, a2) < 0) a2
    else a1

In the case of cats, the Order class itself provides a definition for max. What is important in terms of evolvability though is that the calculation of the maximum happens in exactly the same way for all types. Similarly, our example could be generalized to any Order of any type. How often have you written code that computes a maximum of two things? Next time, think about the order first, because you would no longer need to implement max or min for that type, if you just defined the required order. Evolving your codebase becomes much simpler, since the typeclass allows you to apply the same control flow logic to various types and cleanly extracts the relevant difference – the actual comparison to be used.

Correctness

As we could see above, a lot of the actual calculation logic can be reused in vastly different contexts. It all comes down to a single compare method that needs to be correct. You are still bound to the contract of the compare method which limits your implementation options (in a good way), but all the derived functionality is guaranteed to work correctly. Since the comparison method is directly tied to the actual order semantics you want to implement, it is clear that we cannot get that for free, but you get everything else correctly derived from that!

Production Efficiency

If you have not yet introduced your own typeclasses for this sort of work, or included a library for it, then you will of course incur a slight delay. Note however, that we are talking about something quite small, since the code samples on this page are basically the sort of complexity you could expect from implementing your own Order class. On the other hand, you will not have to write any control logic anymore (for things related to the order only of course). Just like you stopped caring about the sort algorithm in your sort calls, you can stop caring about how to compare objects, how to calculate a min or max, how to reverse an order, etc. You have more time available to spend on the actual meaning of what you’re working on and less code that needs to be written, tested, managed, and just generally would have slowed you down.

Reflection

Thinking about what you’re doing is immensely important, yet often neglected. Abstracting all your comparisons into proper orders forces you to rethink your design. Since an order is now widely applicable, it is of utter importance, that the ordering of objects is exactly what you think it is. When you no longer write your own maximum calculation, you need to be certain to understand the order you are using – unless you are in a situation in which you don’t have to care (like sorting objects).

Summary

If you ever get one of these nifty IllegalArgumentExceptions complaining that your code violates its general contract, then it’s high time to take a step back and rethink your orders. When you implement Comparable, be wary of its limits due to locality and only use it when you have a single clear (also called natural) ordering for objects of your type. If you expect other developers to be interested in ordering the objects of a type in different ways, then a comparator is a more suitable approach. In either case, you can derive an order for your type, which allows you to derive all related functionality. As a rule of thumb, every time you call a Comparable’s or a Comparator’s method (outside of your generalized order-derived implementation) you should wonder whether the corresponding algorithm isn’t one that can be (or hasn’t already been) derived automatically for all orders of any type.

[The feature image is CC BY-NC 3.0 AT by Andreas Schamanek]

It's only fair to share...Share on Google+0Share on Facebook0Tweet about this on TwitterShare on LinkedIn0Email this to someone