Sergio Gutiérrez Mota

How not to become a public enemy - Part 2

There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.

Tony Hoare, creator of the quicksort algorithm (and way more)

In the last blog post we started building a new Java library called CARumi to drive RC cars in a homogeneous way. We focused on designing a public API with as few ambiguities as possible. Not only did we make the library more readable and more intuitive to use, we also reduced the number of potential bugs and issues by using types to our advantage.

In this post, we will continue working on the same library and its public API, and look at how some decisions we make in the implementation can leak information to the outside. If you are familiar with the concept of coupling, then you already know the potential issues of letting others know too much about your implementation. If you are not, then you should know that such leaks make it much harder for our library to evolve and change while keeping backward compatibility.

CARumi v0.2.0

This is the API as we left it at the end of the first blog post:

public final class CARumi {
    public CARumi(CarModel carModel) {/*...*/}
    public void throttle(Power power) {/*...*/}
    public void brake(Power power) {/*...*/}
    public void turnLeft(Rotation rotation) {/*...*/}
    public void turnRight(Rotation rotation) {/*...*/}
}

This is how a client of our library would then use our API:

CARumi carumi = new CARumi(CarModel.FORD_MUSTANG_2016);
carumi.throttle(Power.inHorsePower(45));
carumi.turnLeft(Rotation.inDegrees(30));

As we explained, using types and smart constructors for our input parameters makes it harder for our users to make mistakes and, therefore, our library is safer to use (and way more readable if you ask me!).

Be graceful with your collections

There is something we need to think about when using classes that belong to some sort of hierarchy. Imagine we are now adding a couple of methods to execute actions in batches.

CARumi 0.3.0

public final class CARumi {
    // ...
    public void performActions(LinkedList<Action> actions) {/*...*/}
    public Collection<Action> getPendingActions() {/*...*/}
}

The first method is used to send actions so that the library processes them one by one. The second one just returns a collection of actions that have not been processed yet. Collections are just a common case of a big hierarchy of types that we use day to day when programming in Java.

If you are used to collections, you will notice something weird about our choice of types. The performActions method is too concrete, too specific. This is a common mistake when first working with Java: we are leaking information to the outside and telling others that we are probably using a LinkedList inside the CARumi class. The problem comes when we decide LinkedList is not a suitable implementation anymore and we want to start using an ArrayList. Either we keep asking for a LinkedList and do a transformation inside the method, or we start asking for ArrayList instances, breaking backward compatibility.

There is a similar problem with the getPendingActions method, where we are using the base Collection interface to represent the actions. The problem here is that if our users want to see what the 5th action is, they will have to write some additional code to get the nth element of a collection. Besides that, we are not declaring anywhere that the pending actions we are returning are ordered in any way, so what would be the point of getting the 5th element anyway? We know that actions are being processed in a very specific order, though.

The problem we are seeing is basically the two faces of the same coin: we are misusing the classes of a hierarchy. There is a golden rule when facing these situations:

Ask for the minimum, return the maximum

In other words, try to be as general as possible in your input parameters and as specific as possible when returning values. The only exception to this rule is that we should always be hiding implementation details as much as possible so that we can safely change our library in the future. This has a lot to do with the Dependency Inversion Principle (DIP) but applied to libraries. If you don't know what this principle is, it basically states that:

Abstractions should not depend on details. Details should depend on abstractions.

If we asked for specific lists, we would be making our users depend on our details, and any change on our side would force them to update their code.

In our library, the only suitable type for the input actions in the performActions method is List. We could go for a more general type like Collection, but we would be breaking one of our preconditions for that method: the actions must be ordered so that we can process them in a predictable way.

The very same thing happens in the getPendingActions method, but for different reasons. In this case, we try to return the most specific type we can. We can't forget our rule of not leaking implementation details, though, which means that the type we should use is also List.

CARumi 0.3.5

public final class CARumi {
    // ...
    public void performActions(List<Action> actions) {/*...*/}
    public List<Action> getPendingActions() {/*...*/}
}
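To see what this buys our clients, here is a minimal sketch. It is not the actual CARumi implementation: the Action enum, the BatchingCarSketch class and the internal ArrayDeque are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the library's Action type.
enum Action { THROTTLE, BRAKE, TURN_LEFT, TURN_RIGHT }

// Hypothetical class; package-private only so the sketch fits in one file.
final class BatchingCarSketch {
    // Implementation detail that never shows up in a public signature.
    private final Deque<Action> pending = new ArrayDeque<>();

    // Asks for the minimum that still guarantees an order: any List.
    public void performActions(List<Action> actions) {
        pending.addAll(actions);
    }

    // Returns an ordered, read-only snapshot without naming a concrete class.
    public List<Action> getPendingActions() {
        return Collections.unmodifiableList(new ArrayList<>(pending));
    }
}
```

Callers can now pass a LinkedList, an ArrayList or any other List, and they can read the 5th pending action with a plain get(4). We stay free to swap the internal ArrayDeque for something else without touching the public signatures.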

Hope for the best, prepare for the worst

There are a lot of preconditions we often take for granted when developing new libraries, mostly because we don't realize we already have knowledge of the library implementation. However, it is important to address them in some way to avoid frustration and endless sessions of debugging for the users that don't know how the library works.

We will discuss the three most common issues when implementing a library.

Null values

In Java, every object reference is optional: a variable of a class type can hold a reference to an instance of that class or the special value null. When we try to call a method on a null reference, the runtime throws the well-known NullPointerException, or NPE. This is a huge problem every Java developer faces at some point. It is way worse when the NPE is thrown from within a library we don't control. That's why we have to be clear about what our preconditions are regarding null values. If for some reason we are not expecting a value to be null, we should at least fail with a meaningful message.

This should always be accompanied by a @NotNull or @NonNull annotation when possible. These annotations don't enforce parameter verification at runtime, but they serve as documentation, and most IDEs understand them and highlight code that breaks said precondition.

Let's say we are implementing the turnLeft method of our library. If we are not expecting a null rotation value, we can always add a guard and the proper annotation.

public final class CARumi {
    // ...
    public void turnLeft(@NotNull Rotation rotation) {
        if (rotation == null) {
            throw new IllegalArgumentException("The rotation parameter in turnLeft can't be null");
        }
        
        // ...
    }
}
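The same guard can be written in a single line with Objects.requireNonNull from the standard library. Keep in mind that it throws a NullPointerException rather than an IllegalArgumentException, so it is an alternative convention and not a drop-in replacement. In this sketch the Rotation class, the NullGuardSketch class and the heading field are hypothetical stand-ins.

```java
import java.util.Objects;

// Hypothetical stand-in for the library's Rotation type.
final class Rotation {
    private final double degrees;

    private Rotation(double degrees) { this.degrees = degrees; }

    static Rotation inDegrees(double degrees) { return new Rotation(degrees); }

    double degrees() { return degrees; }
}

// Hypothetical class; package-private only so the sketch fits in one file.
final class NullGuardSketch {
    // Assumed internal state: turning left decreases the heading.
    private double heading = 0;

    public void turnLeft(Rotation rotation) {
        // Fails fast with a NullPointerException carrying our message.
        Objects.requireNonNull(rotation, "The rotation parameter in turnLeft can't be null");
        heading -= rotation.degrees();
    }

    public double heading() { return heading; }
}
```

Either way, the user gets an exception at the boundary of the library that names the offending parameter.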

Throwing an exception might look like an ugly thing to do, but keep in mind that in this example we are just following the standard way of handling errors in Java (for functional languages, there are more elegant solutions like the one presented here). The point is that we are now giving our users more information about the problem. This is way better than letting the null value pass and crash in some inner implementation with no further information.

Mutation

The next most common issue is mutability of parameters. In Java, all arguments are passed by value, but for objects the value that gets copied is the reference, not the object itself. That means a library method can mutate any object it receives, and the caller won't know until it's too late. The rule here is to never surprise your clients. Either declare explicitly that you are going to mutate that input value (with a meaningful method name like updateX, mutateX or setX) or avoid mutating it at all!

We might be used to calling a method and throwing away the input parameter afterwards, but imagine a user is saving that input parameter as a constant in one of their classes to reuse it.

Remember our turnLeft method implementation? What would happen if we managed to mutate the rotation parameter? What if the user keeps a couple of constant rotations to reuse on their side of the program?

The safest solution if we need to mutate an input parameter is to just copy it.

public final class CARumi {
    // ...
    public void turnLeft(Rotation rotation) {
        // ...
        
        Rotation copyOfRotation = rotation.copy();
        
        // ...
    }
}
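To make the risk concrete, here is a small sketch with a hypothetical mutable rotation type. The post does not show how Rotation is implemented, so MutableRotation, its normalize method and DefensiveCopySketch are assumptions made for illustration.

```java
// Hypothetical mutable rotation; the real Rotation class is not shown in the post.
final class MutableRotation {
    private double degrees;

    MutableRotation(double degrees) { this.degrees = degrees; }

    double degrees() { return degrees; }

    // Mutates this instance so that the angle falls in [0, 360).
    void normalize() { degrees = ((degrees % 360) + 360) % 360; }

    MutableRotation copy() { return new MutableRotation(degrees); }
}

// Hypothetical class; package-private only so the sketch fits in one file.
final class DefensiveCopySketch {
    private double lastAppliedDegrees;

    public void turnLeft(MutableRotation rotation) {
        // Work on a private copy so the caller's object is never modified.
        MutableRotation copyOfRotation = rotation.copy();
        copyOfRotation.normalize();
        lastAppliedDegrees = copyOfRotation.degrees();
    }

    public double lastAppliedDegrees() { return lastAppliedDegrees; }
}
```

A caller that keeps a rotation of 390 degrees as a constant still sees 390 after the call, even though the library applied 30 internally. If we control the Rotation class, making it immutable removes the need for the copy altogether.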

Concurrency

Finally, we are used to using a single thread when dealing with libraries as simple as CARumi, but we have to force ourselves to think about whether someone might use it in a multithreaded environment. The easiest solution, if we don't want to deal with those situations, is to document whether we support them or not. On mobile platforms, we also have to think about whether or not we block what is called the main thread (also called the UI thread).

If we decide to support concurrency, then we have to make sure there is no possibility of a race condition. In the turnLeft method we have created a lock to prevent multiple rotations from happening at the same time:

public final class CARumi {
    // ...
    public void turnLeft(Rotation rotation) {
        // ...
        
        synchronized(rotationLock) {
            // ...
        }
    }
}
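A runnable version of the same idea could look like the sketch below. The rotationLock name comes from the listing above; the class name, the int parameter and the totalDegreesLeft counter are assumptions that stand in for whatever state the real method protects.

```java
// Hypothetical class; package-private only so the sketch fits in one file.
final class SynchronizedTurnSketch {
    private final Object rotationLock = new Object();

    // Shared state that must only be touched while holding rotationLock.
    private long totalDegreesLeft = 0;

    public void turnLeft(int degrees) {
        synchronized (rotationLock) {
            // The read-modify-write below is atomic with respect to other callers.
            totalDegreesLeft += degrees;
        }
    }

    public long totalDegreesLeft() {
        // Reads take the same lock so they observe the latest write.
        synchronized (rotationLock) {
            return totalDegreesLeft;
        }
    }
}
```

Using a private lock object instead of synchronizing on this keeps the lock itself out of the public API, so clients cannot interfere with it by synchronizing on the CARumi instance.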

CARumi 0.5.0

This is how our turnLeft method would look after all the changes we made when implementing it.

public final class CARumi {
    // ...
    public void turnLeft(@NotNull Rotation rotation) {
        if (rotation == null) {
            throw new IllegalArgumentException("The rotation parameter in turnLeft can't be null");
        }
        Rotation copyOfRotation = rotation.copy();
        
        synchronized(rotationLock) {
            // ...
        }
    }
}

Beautifying your API

There is a certain idiom for designing APIs, used in a lot of libraries, that lets clients chain calls on the same object. It avoids some boilerplate, makes the library more readable, and is called a fluent interface.

A regular usage of our library as it is right now would be something like

CARumi carumi = new CARumi(CarModel.FORD_MUSTANG_2016);
carumi.throttle(Power.inHorsePower(45));
carumi.turnLeft(Rotation.inDegrees(30));

But we can do better. We can rethink our API a little bit so that clients can use it more directly, by returning the CARumi instance from every method. If we do that, we can transform the previous usage into this:

CARumi.withModel(CarModel.FORD_MUSTANG_2016)
      .throttle(Power.inHorsePower(45))
      .turnLeft(Rotation.inDegrees(30));

CARumi 1.0.0

public final class CARumi {
    private CARumi(CarModel carModel) {/*...*/}
    public static CARumi withModel(CarModel carModel) {/*...*/}
    public CARumi throttle(Power power) {/*...*/}
    public CARumi brake(Power power) {/*...*/}
    public CARumi turnLeft(Rotation rotation) {/*...*/}
    public CARumi turnRight(Rotation rotation) {/*...*/}
}
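The mechanics behind this are simple: every method ends with return this. Below is a minimal sketch, assuming plain String and int parameters instead of the CarModel, Power and Rotation types, and a hypothetical log list so that the effect of the chain can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; package-private only so the sketch fits in one file.
final class FluentCarSketch {
    private final String model;
    private final List<String> log = new ArrayList<>();

    // Private constructor: instances are created through the static factory.
    private FluentCarSketch(String model) { this.model = model; }

    public static FluentCarSketch withModel(String model) {
        return new FluentCarSketch(model);
    }

    public FluentCarSketch throttle(int horsePower) {
        log.add("throttle " + horsePower);
        return this; // returning the same instance is what enables chaining
    }

    public FluentCarSketch turnLeft(int degrees) {
        log.add("turnLeft " + degrees);
        return this;
    }

    public String model() { return model; }

    // Defensive copy so callers cannot modify the internal log.
    public List<String> log() { return new ArrayList<>(log); }
}
```

With this in place, FluentCarSketch.withModel("FORD_MUSTANG_2016").throttle(45).turnLeft(30) reads as a single sentence and needs no intermediate variable.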

Not only is the code more readable (arguably, I know), but it also makes it way easier for the developers of the library to make changes to it, especially when using the more sophisticated pattern called Step Builder.

Versioning

A lot has been written about library versioning. We will stick here to the popular Semantic Versioning system (semver from now on).

If you have followed the blog post so far, you might have noticed that we didn't follow any rule for the versions of the library. This is obviously not a good way to work, especially when publishing source code. That's why semver was created (by Tom Preston-Werner). Semver defines three numbers to represent a version, separated by dots: major, minor and patch.

  • The patch number is incremented when adding bugfixes or non-directly observable changes like improving performance.
  • The minor version is incremented when adding functionality while keeping backward compatibility. It is also used to deprecate methods that are soon to be removed/changed.
  • The major version is incremented when making incompatible changes in the API.

With that in mind, we can now say that our CARumi library version is 1.0.0.
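The three rules can be sketched as a tiny value class. SemVerSketch and its method names are hypothetical; the only behaviour taken from the semver specification is that bumping a number resets the less significant ones to zero.

```java
// Hypothetical value class; package-private only so the sketch fits in one file.
final class SemVerSketch {
    final int major;
    final int minor;
    final int patch;

    SemVerSketch(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // Bug fixes and changes that are not directly observable.
    SemVerSketch nextPatch() { return new SemVerSketch(major, minor, patch + 1); }

    // New backward compatible functionality or deprecations; patch resets to 0.
    SemVerSketch nextMinor() { return new SemVerSketch(major, minor + 1, 0); }

    // Incompatible API changes; minor and patch reset to 0.
    SemVerSketch nextMajor() { return new SemVerSketch(major + 1, 0, 0); }

    @Override
    public String toString() { return major + "." + minor + "." + patch; }
}
```

Starting from 1.0.0, a release that only deprecates methods would be nextMinor(), and a later release that removes them would be nextMajor().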

Changing our API

Now that we have defined how we are going to version our library, what would we do if we wanted to replace a method in the API? First of all, we have to think about the consequences of doing so: people using our library will be forced to change their code in order to keep their projects updated. Not everyone will be willing to do it, so we first have to decide whether the change is worth it and prioritize it accordingly.

If we are confident about making the change, we first need to deprecate the methods that we are going to change or remove, in order to give our users a time window in which they can update to the new library version.

Imagine we decided to merge both methods, turnLeft and turnRight, into one: turn. We would then have to release a new minor version with the following API:

CARumi 1.1.0

public final class CARumi {
    private CARumi(CarModel carModel) {/*...*/}
    public static CARumi withModel(CarModel carModel) {/*...*/}
    public CARumi throttle(Power power) {/*...*/}
    public CARumi brake(Power power) {/*...*/}
    public CARumi turn(Rotation rotation) {/*...*/}
    @Deprecated public CARumi turnLeft(Rotation rotation) {/*...*/}
    @Deprecated public CARumi turnRight(Rotation rotation) {/*...*/}
}
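During the deprecation window, the old methods usually just delegate to the new one so that both paths behave the same. The post does not say how turn tells left from right, so the sketch below assumes a signed angle in degrees (negative meaning left) and uses a hypothetical heading field to make the result observable.

```java
// Hypothetical class; package-private only so the sketch fits in one file.
final class DeprecationSketch {
    private int heading = 0;

    // Assumption: negative angles turn left, positive angles turn right.
    public DeprecationSketch turn(int signedDegrees) {
        heading += signedDegrees;
        return this;
    }

    /** @deprecated use {@link #turn(int)} with a negative angle instead. */
    @Deprecated
    public DeprecationSketch turnLeft(int degrees) {
        return turn(-degrees); // old behaviour expressed through the new method
    }

    /** @deprecated use {@link #turn(int)} with a positive angle instead. */
    @Deprecated
    public DeprecationSketch turnRight(int degrees) {
        return turn(degrees);
    }

    public int heading() { return heading; }
}
```

Pairing the @Deprecated annotation with a @deprecated Javadoc tag tells users what to call instead, which is the part they actually need in order to migrate.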

The most important changes are the new turn method and the @Deprecated annotation on turnLeft and turnRight. If people update to this version, they will be warned of future changes in those methods and will have more time to adapt.

After version 1.1.0, we would then release our incompatible version:

CARumi 2.0.0

public final class CARumi {
    private CARumi(CarModel carModel) {/*...*/}
    public static CARumi withModel(CarModel carModel) {/*...*/}
    public CARumi throttle(Power power) {/*...*/}
    public CARumi brake(Power power) {/*...*/}
    public CARumi turn(Rotation rotation) {/*...*/}
}

With this version, we remove the deprecated methods and leave our API as we first wanted.

Last thoughts

We might think that with the release of CARumi 2.0.0 we have finished our journey and there is nothing else to do, but we have only addressed the problems of releasing a new version. From that point on, the most time-consuming part of an open source project begins: its maintenance. You will have to deal with performance issues, bugs, people not understanding how to use your library and much more. This is just the beginning of the project!

There are a couple of things you should keep in mind when releasing your first version. First of all, the best documentation you can write for your library is a sample project covering, at least, the basics of how to use it. It is also very important to create a safety net of tests so that non-explicit requirements are written down somewhere. Tests are not only a way to keep your library bug free but also a document that tells collaborators what you expect of your library. Finally, you will have to deal with pull requests, merges, squashes and so on, so knowing how to work with Git (or Mercurial) will save you a lot of time when working with other people.

One last thing we should never forget is that the word public has a fuzzy meaning in the context of this blog post. When we think about public code, we are often thinking about an open source project on GitHub, but most of the ideas we have discussed here are also relevant in non-open source projects. Public can be applied to code your workmates will use, but it can also be code that you yourself will use in the future.