Programming

On the harm of simple types

Recently, I encountered a lot of occurrences, where I found problematic code that was essentially caused by the usage of primitive types where they simply were the wrong tool for the job. The following article will hopefully make you stop and rethink what you’re doing when you’re using one of these types the next time.

What is a simple type?

First of all, let’s get an agreement of what I’m talking about here. Clearly, primitive types like Java’s int, boolean, or float are considered simple. But for the purpose of this post, let’s also include their boxed variants Integer, Boolean, and Float. In terms of what these types stand for, there is little difference. The latter may have a nice API and more closely resemble a unified OO view, but really a Boolean is all about being true or false.

One more common and primitive type is the string type. Whether it’s a char[] or a full-blown String class, I will consider it simple for the purpose of this post.

How do you distinguish a simple type from one that isn’t though? Let’s take a clue from the cover image of this post and consider domain-driven design. A type like Ship or Car may have a clear meaning in your domain. However, a simple type will never have a correspondence within the domain as such. If in doubt, imagine a business expert who has no clue about programming whatsoever. Will she understand the term “boolean”? No, certainly not. Will she understand the term “Car”? Yes, though maybe in a different way. Will she understand what a string is? Maybe, but she’d rather prefer you’d call that the car model, or the shipping destination of a ship.

As far as a definition goes, we can consider a simple type as a type that has no clear correspondence within the domain, yet is used to represent something of this domain. (The latter part is necessary to still give us the freedom to have all sorts of classes like “Button”, “Widget”, “…Listener”, etc. without confusing them with simple types.) Also, if you know of a better or well-established term for this, let me know in the comments.

But that’s what the variable name is for, isn’t it?

What’s wrong with these?

String destination;
double distance;
boolean isShowOnlyAvailable;

One might think that this destination variable is clearly named. But what you ended up with is mapping a domain concept to the name of the variable only, not its type. Why does that even matter?

Source: Flickr user pchow98 CC BY-NC-ND 2.0
Source: Flickr user pchow98 CC BY-NC-ND 2.0

“Banana” – That’s why! Why does “Rome” make for a better destination than “Banana”? It really isn’t in terms of your type. Why is 3.14159 a distance, yet at the same time it is pi? And of course it’s a temperature just as well. Understandably, everyone who is confronted with this thought, at first answered to me that no sane person would mistake “Banana” as a destination. But think again, how often have you tracked down a bug just to find out that some string contained a typo, or that a double variable was entering a calculation at the wrong scale (is distance in meter or kilometer)? It’s not that someone is routing your ships to banana for a particularly Gru-ish reason – what’s really happening is that your type system is not being used to help you avoid such an error, and programmers being humans will eventually introduce it one way or another.

Is this really just about safety?

Not at all. There is a reason we continuously encountered the term “domain”. Once you enrich your types and more closely relate them to the domain, or at least closer to their intended semantic usage you will also increase everyone’s understanding.

Consider again the above example of the isShowOnlyAvailable variable. Let’s say you are supposed to display a set of cars with your software, but it’s all so big and involved, that you also have access to cars that are still being manufactored, or cars that are already sold, or leased out, etc. In short, there are a lot of reasons why a car could be considered “available” or not. Nevertheless, in your GUI your users may get a simple checkbox to limit the displaying to available cars. You find that the developer of this feature just stored the result of this checkbox in the above variable.

As usual, we’re looking around that place, because there is a bug hidden somewhere within that feature. Sometimes cars show up that aren’t really available, despite the checkbox being selected to only show available cars. So you dig in and find this:

public Collection<Car> getCarsForDisplaying(boolean isShowOnlyAvailable) {
    return getAllCars().stream()
            .filter(car -> !isSoldOut(car, isShowOnlyAvailable))
            .collect(Collectors.toList());
}

private boolean isSoldOut(Car car, boolean isShowOnlyAvailable) {
    return isShowOnlyAvailable && car.isSoldOut();
}

Now that looks totally strange. Hopefully you wouldn’t consider that a good solution. If you’re really lucky you may not yet have seen one like that, but unfortunately this sort of code is around. But we can clearly apply our clean code knowledge and do much better than that. Since being sold out has nothing to do with showing only available cars, let’s try to not mix these two together and directly use the car’s method:

public Collection<Car> getCarsForDisplaying(boolean isShowOnlyAvailable) {
    return getAllCars().stream()
            .filter(car -> isShowOnlyAvailable ? !car.isSoldOut() : true)
            .collect(Collectors.toList());
}

Okay okay.. the ternary operator isn’t doing much good for readability either. Let’s have another go and extract the determination of the correct predicate:

public Collection<Car> getCarsForDisplaying(boolean isShowOnlyAvailable) {
    return getAllCars().stream()
            .filter(carAvailability(isShowOnlyAvailable))
            .collect(Collectors.toList());
}

private Predicate<Car> carAvailability(boolean isShowOnlyAvailable) {
    if (isShowOnlyAvailable) {
        return car -> car.isSoldOut();
    } else {
        return car -> true;
    }
}

Ok, now we at least have the domain concept of the “availability” of a car represented in the code. That’s a small gain at least, though, the method signature in line 7 must be quite confusing to the domain expert. What does the availability of a car have to do with what’s being shown on the screen?

Another thing we notice is the inevitable result of every boolean variable – it makes a decision. Essentially, we have now defined “car availability” as being “this” or “that”, depending on some obscure boolean variable. That’s not really what we wanted though. We could go back to the first method and duplicate the code there with the filter call only being applied if the variable is true, but that would be even worse.

How can we improve on this?

During the above refactorings we have recognized two key points:

  1. Availability is an important domain concept, but apparently we haven’t given it too much thought yet, as there is no clear representation of it in our code (let alone the types).
  2. All the checkbox effectively does is applying a different filter predicate to the stream of cars being displayed.

The second point immediately reveals a better representation in our type system than “boolean”. While the checkbox may be checked or unchecked, true or false, what we really care about is a predicate for filtering cars. While you may be forced to keep a boolean variable around instead of a Predicate, for example because checkboxes can’t be set to a predicate and a boolean is much simpler to persist, there is no excuse for putting that burden on someone else by letting that type escape any further. There is absolutely nothing wrong with your GUI class providing a predicate for filtering your cars instead of just a true/false value. Keep the boolean variable private and only use it when unavoidable – for everything else, get closer to your problem and use the predicate instead.

Funny enough, creating the predicate requires exactly the if-condition we have been dreading above. But that’s the price for having the boolean variable around and it’s much better to get that over with as soon as possible.

The first point is harder to deal with in general. Introducing a new domain concept is always involving a lot of discussions and clarifications. For the sake of keeping this post readable, let’s just say we have agreed to introduce a class AvailabilityCriterion that implements Predicate<Car>. Then we can combine our conclusions to something like this:

private boolean isShowOnlyAvailable;
public Predicate<Car> getVisibilityFilter() {
    if (isShowOnlyAvailable) {
        return new AvailabilityCriterion();
    } else {
        return car -> true;
    }
}

// ...

public Collection<Car> getCarsForDisplaying(final Predicate<Car> visibilityFilter) {
    return getAllCars().stream()
            .filter(visibilityFilter)
            .collect(Collectors.toList());
}

How much is all that effort worth?

Since all of the above code samples have essentially been refactorings, we haven’t actually added any value generating feature. So was it really worth it? There is some worth in refactoring things to become cleaner and more readable, but let’s just concentrate on what we gained by changing the type from boolean to a Predicate<Car>.

First of all, when we look at the code above, we realize that the bug we were looking for cannot even be hiding in it. It must be somewhere in the AvailabilityCriterion instead. Since we have the checkbox checked, and hence, are applying this class as a visibility filter, it must be the availability check within that class that accepts too many cars. In other words, if you wanted to fix that bug, all the time spent looking at this code is limited to at most a few minutes to realize you’re not going to find it in there. It’s also quite fitting that a problem with availability is found in the class that directly corresponds to your domain’s concept of availability. You won’t get that clarity from a boolean.

It’s also remarkable that the selection of cars to be shown is clearly the result of applying a visibility filter now. In fact, we have ended up with a method for selecting cars for the GUI that does not even know about the concept of availability. Why is that worth something? Simply because we can re-use that method unchanged when we want to only show cars that are available and red! All we need to do is extend the code for generating the visibility filter by and()-ing another Predicate<Car> that checks for the red color. This is just another gain of the better type: predicates are composable in a way that helps us. It is clear what happens if we want to limit our car findings to those satisfying “this” and “that” criterion. Contrast that with the other extreme of two boolean variables isShowOnlyAvailable && isShowOnlyRed. That doesn’t make any sense at all.

Given that you believe in the value of refactoring and domain-driven design, I hope I have convinced you that staying clear of simple types has a lot of worth to it as well. If I did, feel free to leave me a comment below, and if you disagree feel obliged to leave a comment and state your doubts.

 

[The feature image is the copyrighted cover of Eric Evans’ excellent book: Domain-Driven Design.]