Joe's
Digital Garden

On Unit Testing

A unit test suite is a developer-written and maintained set of automated tests that cover atomic units of work, often isolated to a single layer, with external or system-level resources (database, external API requests, filesystem, etc.) mocked out.

Test Driven Development

Kent Beck, in Test Driven Development: By Example, outlines a method for driving the software development process through unit tests. He proposes that developers work in a red-green cycle of testing when adding new features or tackling an identified bug in the system.

This cycle runs like so:

  1. Write a unit test that covers the bug or expresses the expected behavior of the new feature. Run this test and watch it fail (Red, the bug still exists, the feature does not).
  2. Fix the bug or implement the feature.
  3. Run the test and see that it passes (Green).
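A minimal sketch of the cycle, using a hypothetical `Calculator` class (the class name, method, and plain `assert` standing in for a PHPUnit test case are all illustrative, not from Beck's book):

```php
<?php
declare(strict_types=1);

// Step 1 (Red): write the test first and watch it fail, since
// Calculator::add() does not exist yet.
function testAddShouldReturnTheSumOfTwoIntegers(): void
{
    assert((new Calculator())->add(2, 3) === 5);
}

// Step 2: implement the feature.
class Calculator
{
    public function add(int $a, int $b): int
    {
        return $a + $b;
    }
}

// Step 3 (Green): run the test again and see it pass.
testAddShouldReturnTheSumOfTwoIntegers();
```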

David Heinemeier Hansson, in a 2014 recorded discussion with Kent Beck and Martin Fowler titled Is TDD Dead?, discussed the downsides of the TDD approach. The largest complaint is against a strictly dogmatic application of TDD. It is one tool among many and is best deployed in situations where the business logic can be understood beforehand and no unorthodox architectural decisions will need to be built into the system. In cases where the work is purely exploratory or experimental, it may be better to invert the first and second steps, or in some cases to admit that meaningful automated test coverage of a particular problem is not possible.

The Peter Norvig vs Ron Jeffries sudoku solver problem is an illustration in which TDD fails. Ravi Mohan draws out the contrast between these two attempts. Jeffries attempts to solve Sudoku through a long series of unit tests that become bogged down in modeling the state of the Sudoku board, and he never completes the work. Norvig solves the problem in a single brief article outlining an elegant solution while forgoing the architectural requirements for testing.

Writing Good Unit Tests

Dan North in Introducing BDD discusses the difficulties of instructing the early TDD practitioner in how to write good tests -- what to test, what to call things, how to know that a test is testing the right thing. His response was to develop Behaviour-Driven Development, which has since grown to encompass an entire set of testing tools, methodologies and consultancies. However, the advice given in Introducing BDD is still valid and can be implemented using any unit testing framework.

But first, what is a good unit test?

A good unit test illustrates the behavior of the unit under test from the perspective of the consumer of the public API of the code. As such, a good unit test can be difficult to define, and attempts at developing hard rules for unit testing often result in enforcing poor architectural choices and brittle tests.

The first difficulty is in defining a unit. A pattern emerges in OOP languages where the unit test suite becomes a mirror of the classes being tested, with one test case per public method. This pattern should be seen as occurring by accident and not by intention. The class might be the entire unit, but the unit could also be a small set of highly cohesive classes that work together. It is best then to think of the unit of work from a behavioral standpoint. What behavior does this class, module, or small set of classes expose to the consuming client? If I were pulling these classes in as vendor code and needed to incorporate them into my own services, what set of tests would illustrate for me how to deploy and use these classes correctly? As such, deciding how big to make the unit, how to name the test, or what to test is more art than science.

This implies as well that a good unit test is one that is unaware of the internal implementation details of the classes, methods, or functions under test. It should be focused on testing the public interface such that the internal implementation can be changed without breaking the test itself. Vanity metrics, such as 100% line coverage, encourage the test-writer to write tests that are overly concerned with implementation details. Such tests will need frequent upkeep every time the implementation itself is modified. My experience is that around 80% line coverage begins to hit large diminishing returns, and pushing beyond it is a sign of potential test brittleness.

Methods under test that depend upon other classes (perhaps as parameters or constructor arguments) should rely upon the actual implementations of those classes rather than stubs or mocks, unless those classes are themselves dependent on or representative of external resources (e.g. a Guzzle client or database connection). Rarely should a test make use of mocks, where the test itself examines and evaluates mutations upon the dependency within the class or method under test.
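As a sketch of the stub-over-mock preference, assume a hypothetical `CustomerGateway` interface wrapping a database (all names here are invented for illustration). The stub returns canned data and sets no expectations about how it is called; the assertion targets only the unit's observable output:

```php
<?php
declare(strict_types=1);

// Hypothetical port over an external resource (a database).
interface CustomerGateway
{
    public function fetchName(int $id): string;
}

// The unit under test depends on the gateway, but the test asserts on
// the unit's output -- not on how the gateway was called.
class Greeter
{
    public function __construct(private CustomerGateway $gateway) {}

    public function greet(int $id): string
    {
        return 'Hello, ' . $this->gateway->fetchName($id);
    }
}

// Stub: canned responses, no call expectations (unlike a mock).
class StubCustomerGateway implements CustomerGateway
{
    public function fetchName(int $id): string
    {
        return 'James Kirk';
    }
}

assert((new Greeter(new StubCustomerGateway()))->greet(10) === 'Hello, James Kirk');
```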

Returning to Dan North, he adds a few finer details. First, each test case should be a sentence, no more and no less. This keeps the focus on the behavior being tested and helps us understand, when a test fails, why it failed and what unit of work was being tested. Furthermore, he suggests using the word "should" -- that is, the unit of work should do something. He avoids "must," as it is acceptable to delete or change a test that is no longer correct.

Last, from a business perspective, the test suite should cover the acceptance criteria passed to development. If the requirements can be formulated into a suite of tests, then development has a proof of work demonstrating that they implemented the desired behavior. As such, we should aim to inject as much of the ubiquitous language (from [[DDD]]) as we can into the test suite.

We can define a unit, then, as a given context, and compose a single unit test that sets up that context and executes a series of test cases against it, each representing a particular set of events and desired outcomes in that context. In this case, we would write more unit tests than we have classes -- one for each context that the class can be instantiated under, with a set of test cases for that context.
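To sketch the idea, consider a hypothetical `Account` class that behaves differently once frozen (the class and its methods are invented for illustration). Each context below would be its own PHPUnit test class, with `setUp()` establishing the context; plain assertions stand in for the test cases:

```php
<?php
declare(strict_types=1);

// Hypothetical unit: an account that behaves differently once frozen.
class Account
{
    private bool $frozen = false;
    private int $balance = 0;

    public function deposit(int $amount): void
    {
        if ($this->frozen) {
            throw new RuntimeException('account frozen');
        }
        $this->balance += $amount;
    }

    public function freeze(): void { $this->frozen = true; }
    public function balance(): int { return $this->balance; }
}

// Context 1: a new, open account. In PHPUnit, one test class whose
// setUp() creates the open account.
$open = new Account();
$open->deposit(100);
assert($open->balance() === 100);

// Context 2: a frozen account -- a second unit test for the same class.
$frozen = new Account();
$frozen->freeze();
try {
    $frozen->deposit(100);
    assert(false); // should not be reached
} catch (RuntimeException $e) {
    assert($frozen->balance() === 0);
}
```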

And of course, avoid testing obvious code like getters or setters. These methods should be called as a matter of course in the process of setting up and testing the actual unit of work. Classes that are pure containers of data with nothing but getters and setters are best tested at the module level.

Putting these principles together we have:

  1. Avoid hard rules in unit test design
  2. Define the unit of work being tested
  3. Define the unit by the behavior expected by the consumer
  4. Isolate the unit to a particular layer (module or class)
  5. Avoid testing the implementation details of the code under test
  6. Test the public interface of the unit of work
  7. Changing the internal implementation of the code under test should not break the test of that code.
  8. Use real dependencies unless they necessarily access external resources, in which case use stubs, avoid using mocks.
  9. Avoid vanity metrics like line coverage. Accept that some methods cannot be tested and avoid creating unnecessary abstractions or obtuse architectural patterns in order to make these methods testable.
  10. Write each test case as a single sentence, injecting as much of the ubiquitous language as possible.
  11. Define one unit test per context that a class or module can be used in, and execute the test cases against that context. This implies that there can be more than one unit test per class.
  12. Avoid testing obvious methods. Test classes that are pure data-containers at the module level.

Example Using PHPUnit

Using the Customer class from [[PHP Models]] as an example.

<?php

declare(strict_types=1);

use PHPUnit\Framework\TestCase;

class CustomerTest extends TestCase
{
    const
        ID         = 10,
        FIRST_NAME = 'James',
        LAST_NAME  = 'Kirk',
        STREET     = '100 St.',
        CITY       = 'Phoenix',
        STATE      = 'AZ',
        ZIP        = '80001'
    ;

    private $Customer;

    public function setUp(): void
    {
        $this->Customer = new Customer(
            self::ID,
            self::FIRST_NAME,
            self::LAST_NAME
        );
    }

    public function testFullNameShouldBeTheFirstAndLastName()
    {
        $this->assertEquals(
            self::FIRST_NAME . ' ' . self::LAST_NAME,
            $this->Customer->fullName()
        );
    }

    public function testCustomerShouldNotYetHaveAnAddress()
    {
        $this->assertEquals([], $this->Customer->addresses());
    }

    public function testAddingAnAddressShouldCreateANewIdenticalCustomerWithThatAddress()
    {
        $AddressedCustomer = $this->Customer->addAddress(
            self::STREET,
            self::CITY,
            self::STATE,
            self::ZIP
        );

        // Assert a new customer was returned.
        $this->assertNotSame(
            $AddressedCustomer,
            $this->Customer
        );

        // Assert the new customer has the same name and id as the old.
        $this->assertEquals(
            $AddressedCustomer->id(),
            $this->Customer->id()
        );
        $this->assertEquals(
            $AddressedCustomer->fullName(),
            $this->Customer->fullName()
        );

        // Assert the new customer has the address we added.
        $this->assertEquals([
            new CustomerAddress(
                self::STREET,
                self::CITY,
                self::STATE,
                self::ZIP
            )
        ], $AddressedCustomer->addresses());
    }
}

The Shape of Tests

Robin Schroer, in The Shape of Tests, outlines three different shapes that a unit test can take.

1. Matrix Shaped Test

Schroer describes this as taking the full matrix (the cross product) of all possible inputs in order to perform an exhaustive test of all results. The possible number of test cases is thus the product of each input's possible values. This is impractical, but we can achieve a similar effect by grouping together inputs with similar mathematical properties, reducing the inputs to a small set of classes.

An example of this is a function that divides a numerator by a denominator. If we were writing a test to examine whether the result is positive, zero or negative, we would have only nine cases to examine -- positive / positive, positive / zero, positive / negative, zero / positive, zero / zero, zero / negative, negative / positive, negative / zero, and negative / positive.

With the exception of the special case of a zero denominator, the above would still be an exhaustive test.
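A matrix-shaped test over these sign classes might look like the following sketch (the `quotientSign` function is hypothetical; in PHPUnit the table would be a data provider):

```php
<?php
declare(strict_types=1);

// Hypothetical function under test: the sign of a quotient
// (-1, 0, or 1, via the spaceship operator).
function quotientSign(float $n, float $d): int
{
    return $n / $d <=> 0;
}

// Each input collapses into three classes (negative, zero, positive),
// giving a 3 x 3 matrix of cases.
$cases = [
    // [numerator, denominator, expected sign]
    [ 6.0,  3.0,  1], [ 6.0, -3.0, -1],
    [-6.0,  3.0, -1], [-6.0, -3.0,  1],
    [ 0.0,  3.0,  0], [ 0.0, -3.0,  0],
    // The zero-denominator column is the special case excluded from
    // the matrix; it gets its own test.
];

foreach ($cases as [$n, $d, $expected]) {
    assert(quotientSign($n, $d) === $expected);
}
```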

2. Tree Shaped Tests

The tree shaped test follows a particular branch through the code and confirms that, excluding all other branches, it results in a correct value. If we test each branch of the code, we have performed an exhaustive test. The benefit of this route is that it scales well as inputs grow and handles special cases well. The downside is that this kind of test is coupled to the implementation; a change to the implementation may skip a particular branch.

An example of a tree-shaped test would be to notice that any value divided by zero results in an undefined result. We only need one test to cover this branch, not three.
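Continuing the division example (the `divide` function and its guard branch are hypothetical), a tree-shaped test exercises the zero-denominator branch with a single case:

```php
<?php
declare(strict_types=1);

// Hypothetical divide() with an explicit guard branch for a zero denominator.
function divide(float $n, float $d): float
{
    if ($d === 0.0) {
        throw new InvalidArgumentException('division by zero is undefined');
    }
    return $n / $d;
}

// A tree-shaped test follows one branch: any numerator with a zero
// denominator takes the guard, so a single case covers the branch.
try {
    divide(6.0, 0.0);
    assert(false); // should not be reached
} catch (InvalidArgumentException $e) {
    assert(true); // the guard branch was taken
}
```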

3. Definition Shaped Tests

In the final shape we work off the definition of the function. That is, we work from the principle of describing the expected outputs from a given set of inputs and the different cases that we expect the function to handle. The function is treated as a black box. In this case we are neither seeking an exhaustive set of inputs, as in the matrix-shaped test, nor seeking to exhaust all branches.
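One way to read this, sketched below with the same hypothetical division example, is to assert the defining property of the function (quotient times denominator equals numerator) over sampled inputs, ignoring how the function is implemented:

```php
<?php
declare(strict_types=1);

// Hypothetical function under test, treated as a black box.
function divide(float $n, float $d): float
{
    return $n / $d;
}

// Definition-shaped test: divide() must satisfy its definition,
// quotient * denominator == numerator, for sampled inputs. We neither
// enumerate all input classes nor trace any branches.
foreach ([[6.0, 3.0], [-8.0, 2.0], [0.0, 5.0], [7.5, -2.5]] as [$n, $d]) {
    assert(abs(divide($n, $d) * $d - $n) < 1e-9);
}
```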

External References

  1. Beck, Kent. Test Driven Development: By Example. Addison-Wesley, 2002.
  2. Fowler, Martin et al. Is TDD Dead?. martinfowler.com. Retrieved 2020-11-24.
  3. Mohan, Ravi. Learning from Sudoku Solvers. One Man Hacking. Retrieved 2020-11-24.
  4. North, Dan. Introducing BDD, Dan North & Associates. Retrieved 2020-11-21.
  5. Schroer, Robin. The Shape of Tests. Sulami's Blog. Retrieved 2021-03-09.
  6. Inozemtseva, Laura et al. Coverage is not strongly correlated with test suite effectiveness. ICSE 2014. Retrieved 2021-09-28.
