Improving your tests one mutant at a time

Imagine someone wrote a function that finds the largest odd number in a slice:

    
fn largest_odd(v: &[f32]) -> Option<f32> {
    let mut largest = None;
    for n in v {
        if largest.is_none() || largest.unwrap() > *n {
            largest = Some(v.len() as f32);
        }
    }
    largest
}
    
    

And the test:

    
#[test]
fn test() {
    let v = vec![1.0, 2.0];
    let r = largest_odd(&v);
    assert!(r.is_some())
}
    
    

Admittedly, the test isn't very good: it only checks that the result is not None. You may laugh, but I've seen something similar at a large UK bank.

If you look at the implementation, you'll see many places where the programmer could make a mistake (and in fact, there are some). And yet, the test passes. What's worse, the line coverage is 100%.

Mutation testing introduces small changes into the code (mistakes a programmer could plausibly make) and then checks whether the tests catch them. You could say that mutation testing checks how good your asserts are. It's not uncommon for tests to run through every line (so line coverage looks fine) while only ever exercising the case where an 'if' condition is true, with not a single test making it false.
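Here is the idea in miniature, done by hand on a hypothetical 'is_positive' function (every name in this sketch is made up for illustration). Both tests execute every line of 'is_positive', so line coverage can't tell them apart, but the hand-written mutants can: the weak test only ever makes the condition true. Real tools generate the mutants automatically instead of relying on hand-written copies.

```rust
// A function pointer type for the function under test and its mutants.
type Pred = fn(i32) -> bool;

// A hypothetical function under test.
fn is_positive(x: i32) -> bool {
    x > 0
}

// Hand-written mutants: each is one small change a programmer could make.
fn mutant_ge(x: i32) -> bool {
    x >= 0 // `>` replaced with `>=`
}
fn mutant_lt(x: i32) -> bool {
    x < 0 // `>` replaced with `<`
}
fn mutant_true(_x: i32) -> bool {
    true // whole body replaced with a constant
}

// Weak test: only ever makes the condition true.
fn weak_test(f: Pred) -> bool {
    f(5)
}

// Stronger test: also covers the boundary and the false branch.
fn strong_test(f: Pred) -> bool {
    f(5) && !f(0) && !f(-3)
}

// A mutant is "killed" when the test fails against it.
fn killed(test: fn(Pred) -> bool, mutants: &[Pred]) -> usize {
    mutants.iter().filter(|&&m| !test(m)).count()
}

fn main() {
    let mutants: [Pred; 3] = [mutant_ge, mutant_lt, mutant_true];
    // Both tests pass on the unmutated code...
    assert!(weak_test(is_positive) && strong_test(is_positive));
    // ...but only the stronger one notices the mutations.
    println!("weak test kills {}/3", killed(weak_test, &mutants)); // 1/3
    println!("strong test kills {}/3", killed(strong_test, &mutants)); // 3/3
}
```

The weak test lets 'x >= 0' and the constant 'true' survive, which is exactly the "condition true, never false" gap described above.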

In Rust, you can run mutation testing with 'cargo mutants', which you install with 'cargo install cargo-mutants'. Then you just run 'cargo mutants' (or 'cargo mutants -- --release' if you want to test the code in release mode).

Let's see what mutations it can find in the above function:

    
Found 9 mutants to test
ok       Unmutated baseline in 0.4s build + 0.1s test
Auto-set test timeout to 20s
MISSED   src/main.rs:7:50: replace > with == in largest_odd in 0.3s build + 0.1s test
MISSED   src/main.rs:7:50: replace > with < in largest_odd in 0.3s build + 0.1s test
MISSED   src/main.rs:5:5: replace largest_odd -> Option<f32> with Some(1.0) in 0.3s build + 0.1s test
MISSED   src/main.rs:5:5: replace largest_odd -> Option<f32> with Some(0.0) in 0.3s build + 0.1s test
MISSED   src/main.rs:5:5: replace largest_odd -> Option<f32> with Some(-1.0) in 0.3s build + 0.1s test
9 mutants tested in 3s: 5 missed, 2 caught, 2 unviable
    
    

So first of all, cargo mutants found 9 potential mutants (some tools are better at this than others; for Java I highly recommend pitest.org, it's absolutely fantastic!). Two of them are unviable. The exact meaning depends on the tool; in cargo mutants it means the mutated code didn't compile, so no test ran against it. That leaves 7 mutants that were actually tested, and only 2 of them were caught.

Knowing that the function should return the largest odd number, we can immediately see what's wrong with our test by looking at the mutation 'replace largest_odd -> Option<f32> with Some(0.0)'. The function should never return Some(0.0): zero isn't odd! But first, let's apply this mutation to the code by hand and confirm the test misses it:

    
fn largest_odd(v: &[f32]) -> Option<f32> {
    let mut largest = None;
    for n in v {
        if largest.is_none() || largest.unwrap() > *n {
            largest = Some(v.len() as f32);
        }
    }
    Some(0.0)
}
    
    

Cargo test output:

    
Compiling mt v0.1.0 (/tmp/mt)
Finished test [unoptimized + debuginfo] target(s) in 0.30s
Running unittests src/main.rs (target/debug/deps/mt-9bfb3f44be23891f)
running 1 test
test test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
    
    

Yep, the test passes.

Instead of writing another test (the fewer tests, the better), let's modify it a little:

    
#[test]
fn test() {
    let v = vec![1.0, 2.0];
    let r = largest_odd(&v).unwrap();
    assert_eq!(1.0, r);
}
    
    

And this fails, as it should. So we undo the mutation, add a check for oddity and store the number itself instead of the vector's length:

    
fn largest_odd(v: &[f32]) -> Option<f32> {
    let mut largest = None;
    for n in v {
        if *n % 2.0 == 1.0 && (largest.is_none() || largest.unwrap() > *n) {
            largest = Some(*n);
        }
    }
    largest
}
    
    

Now the test passes, but let's check the mutants:

    
Found 13 mutants to test
ok       Unmutated baseline in 0.4s build + 0.1s test
Auto-set test timeout to 20s
MISSED   src/main.rs:7:70: replace > with == in largest_odd in 0.3s build + 0.1s test
MISSED   src/main.rs:7:70: replace > with < in largest_odd in 0.3s build + 0.1s test
MISSED   src/main.rs:5:5: replace largest_odd -> Option<f32> with Some(1.0) in 0.3s build + 0.1s test
MISSED   src/main.rs:7:28: replace && with || in largest_odd in 0.3s build + 0.1s test
13 mutants tested in 3s: 4 missed, 5 caught, 4 unviable
    
    

We see that the number of possible mutants went up (it grows as the code becomes more complex). The test caught 5 mutants but missed 4. That's about 55% mutation coverage (5 of the 9 viable mutants), which is not very good. In my experience, we should aim for around 70-80% of mutants killed (much higher than that and we end up testing silly things, but as always, it depends).
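The summary line prints counts rather than a percentage, so the figure has to be worked out: caught mutants divided by tested ones (caught plus missed), leaving unviable mutants out because no test ever ran against them. A minimal sketch of that arithmetic; the 'Outcome' enum and 'mutation_score' function are made up for illustration and are not part of cargo mutants, and other outcomes such as timeouts are ignored here.

```rust
// Outcome of running the tests against one mutant, named after the
// categories in the cargo mutants output above (timeouts left out).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Caught,   // some test failed: the mutant was killed
    Missed,   // every test passed: the mutant survived
    Unviable, // the mutated code didn't build, so no test ran
}

// Share of tested (viable) mutants that were caught; None if nothing was tested.
fn mutation_score(outcomes: &[Outcome]) -> Option<f64> {
    let caught = outcomes.iter().filter(|&&o| o == Outcome::Caught).count();
    let missed = outcomes.iter().filter(|&&o| o == Outcome::Missed).count();
    let tested = caught + missed;
    if tested == 0 {
        None
    } else {
        Some(caught as f64 / tested as f64)
    }
}

fn main() {
    // The second run above: 5 caught, 4 missed, 4 unviable.
    let mut run = vec![Outcome::Caught; 5];
    run.extend([Outcome::Missed; 4]);
    run.extend([Outcome::Unviable; 4]);
    let score = mutation_score(&run).unwrap();
    println!("{:.1}%", score * 100.0); // 55.6%
}
```

By the same formula, the first run (2 caught, 5 missed) scores 2/7, or roughly 29%.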

Interestingly, if we swap (!) the comparison (from > to <), the test still passes. Getting the comparison wrong could be an expensive mistake! So let's test for this as well:

    
#[test]
fn test() {
    let v = vec![3.0, 2.0, 1.0];
    let r = largest_odd(&v).unwrap();
    assert_eq!(3.0, r);
}
    
    

And the test fails. Now we're getting to the bottom of this. Of course, the comparison should be '<': we want to replace the current largest value when it's smaller than n:

    
fn largest_odd(v: &[f32]) -> Option<f32> {
    let mut largest = None;
    for n in v {
        if *n % 2.0 == 1.0 && (largest.is_none() || largest.unwrap() < *n) {
            largest = Some(*n);
        }
    }
    largest
}
    
    

This gives 8 caught and 1 missed:

    
MISSED   src/main.rs:7:70: replace < with == in largest_odd in 0.3s build + 0.1s test
    
    

That's right: if we replace the comparison with '==', the test doesn't catch it! That's because in our test, the largest odd number is the first item in the vector. Put it at the end and the test (with '==') fails. So let's test with more numbers:

    
#[test]
fn test() {
    let v = vec![3.0, 2.0, 1.0, 5.0];
    let r = largest_odd(&v).unwrap();
    assert_eq!(5.0, r);
}
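
The same blind spot can be shown in code. This sketch is our own suggestion, not part of the workflow above: it copies the final 'largest_odd' and the surviving '==' mutant, then adds a hypothetical helper, 'holds_for_all_rotations', that repeats the check on every rotation of the input so the expected answer can't depend on where the largest odd number sits.

```rust
// The final version from this post (comparison `<`).
fn largest_odd(v: &[f32]) -> Option<f32> {
    let mut largest: Option<f32> = None;
    for n in v {
        if *n % 2.0 == 1.0 && (largest.is_none() || largest.unwrap() < *n) {
            largest = Some(*n);
        }
    }
    largest
}

// The mutant that survived the [3.0, 2.0, 1.0] test: `<` replaced with `==`.
fn largest_odd_eq(v: &[f32]) -> Option<f32> {
    let mut largest: Option<f32> = None;
    for n in v {
        if *n % 2.0 == 1.0 && (largest.is_none() || largest.unwrap() == *n) {
            largest = Some(*n);
        }
    }
    largest
}

// Hypothetical helper: runs `f` on every rotation of `v` and checks the
// result each time, so the answer can't depend on element position.
fn holds_for_all_rotations(f: fn(&[f32]) -> Option<f32>, v: &[f32], expected: f32) -> bool {
    let mut w = v.to_vec();
    for _ in 0..w.len() {
        if f(w.as_slice()) != Some(expected) {
            return false;
        }
        w.rotate_left(1);
    }
    true
}

fn main() {
    let v: [f32; 3] = [3.0, 2.0, 1.0];
    // In the original order, the mutant gives the same answer...
    assert_eq!(largest_odd(&v), Some(3.0));
    assert_eq!(largest_odd_eq(&v), Some(3.0));
    // ...but the rotation [2.0, 1.0, 3.0] exposes it.
    assert!(holds_for_all_rotations(largest_odd, &v, 3.0));
    assert!(!holds_for_all_rotations(largest_odd_eq, &v, 3.0));
    println!("the == mutant is caught once rotations are checked");
}
```

Checking rotations costs n calls rather than the n! a full permutation check would need. Whether that is worth it, compared with simply choosing one better input as done here, is a judgment call.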
    
    

But the updated test still leaves one mutant alive:

    
MISSED   src/main.rs:7:28: replace && with || in largest_odd in 0.3s build + 0.1s test
    
    

It's almost as if it doesn't matter whether we write:

    
if *n % 2.0 == 1.0 && (largest.is_none() || largest.unwrap() < *n) {
    
    

or:

    
if *n % 2.0 == 1.0 || (largest.is_none() || largest.unwrap() < *n) {
    
    

or skip the oddity check altogether:

    
if largest.is_none() || largest.unwrap() < *n {
    
    

That's right: the test passes with all three implementations. If the test says it doesn't matter, maybe we shouldn't be doing this check at all? But of course we know we need it, so it's the test we need to fix.
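To see the three variants agree, here is a sketch that takes the loop from 'largest_odd' and passes the 'if' condition in as a closure. The helper 'largest_with' is hypothetical, made up for this side-by-side comparison; it is not part of the code above.

```rust
// Hypothetical helper: the `largest_odd` loop with the `if` condition
// passed in. `keep(n, largest)` decides whether `n` becomes the new largest.
fn largest_with(v: &[f32], keep: impl Fn(f32, Option<f32>) -> bool) -> Option<f32> {
    let mut largest: Option<f32> = None;
    for &n in v {
        if keep(n, largest) {
            largest = Some(n);
        }
    }
    largest
}

fn main() {
    let v: [f32; 4] = [3.0, 2.0, 1.0, 5.0];

    // Intended: odd AND (first candidate OR larger than the current one).
    let with_and = largest_with(&v, |n, l| n % 2.0 == 1.0 && (l.is_none() || l.unwrap() < n));
    // Surviving mutant: `&&` replaced with `||`.
    let with_or = largest_with(&v, |n, l| n % 2.0 == 1.0 || (l.is_none() || l.unwrap() < n));
    // No oddity check at all.
    let no_check = largest_with(&v, |n, l| l.is_none() || l.unwrap() < n);

    // All three agree on this input, so the test cannot tell them apart.
    assert_eq!(with_and, Some(5.0));
    assert_eq!(with_or, Some(5.0));
    assert_eq!(no_check, Some(5.0));
    println!("{with_and:?} {with_or:?} {no_check:?}"); // Some(5.0) Some(5.0) Some(5.0)
}
```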

So wait. Why does the test pass without this check? Well, in our test, the largest number is odd! So let's add an even number that's larger than 5.0:

    
#[test]
fn test() {
    let v = vec![3.0, 2.0, 1.0, 5.0, 6.0];
    let r = largest_odd(&v).unwrap();
    assert_eq!(5.0, r);
}
    
    

Here's the output:

    
Found 13 mutants to test
ok       Unmutated baseline in 0.3s build + 0.1s test
Auto-set test timeout to 20s
13 mutants tested in 3s: 9 caught, 4 unviable
    
    

And so we got to 100% mutation coverage. (There is still a small problem: we don't test for a None result. It just shows that no tool alone is perfect.)
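A sketch of the missing None test, reusing the final 'largest_odd'. It also adds one case that is our own observation, not the author's: in Rust, '%' on floats keeps the sign of the dividend, so '-3.0 % 2.0' is '-1.0' and the '*n % 2.0 == 1.0' check skips negative odd numbers, even though no mutant survived. Comparing the absolute value of the remainder is one possible fix, shown as a hypothetical 'largest_odd_fixed'.

```rust
// The final version from this post.
fn largest_odd(v: &[f32]) -> Option<f32> {
    let mut largest: Option<f32> = None;
    for n in v {
        if *n % 2.0 == 1.0 && (largest.is_none() || largest.unwrap() < *n) {
            largest = Some(*n);
        }
    }
    largest
}

// Hypothetical fix: compare the remainder's absolute value, so negative
// odd numbers (whose remainder is -1.0) count as odd too.
fn largest_odd_fixed(v: &[f32]) -> Option<f32> {
    let mut largest: Option<f32> = None;
    for n in v {
        if (*n % 2.0).abs() == 1.0 && (largest.is_none() || largest.unwrap() < *n) {
            largest = Some(*n);
        }
    }
    largest
}

fn main() {
    // The missing None test: no odd numbers (or no numbers) at all.
    assert_eq!(largest_odd(&[2.0, 4.0]), None);
    assert_eq!(largest_odd(&[]), None);

    // Float `%` keeps the sign of the dividend in Rust.
    assert_eq!(-3.0_f32 % 2.0, -1.0);
    // So the final version misses negative odd numbers...
    assert_eq!(largest_odd(&[-3.0, -2.0]), None);
    // ...while the hypothetical fix finds them.
    assert_eq!(largest_odd_fixed(&[-3.0, -2.0]), Some(-3.0));
    assert_eq!(largest_odd_fixed(&[-5.0, -3.0, 4.0]), Some(-3.0));
}
```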

In real software, it's not so simple. Sometimes it's quite a challenge to get to 60% mutation coverage and... the software still works. Business (rightly!) doesn't care about your mutation coverage. At some point, the benefit of killing more mutants is smaller than the effort spent (especially when there are other tests that 'cargo mutants' doesn't run). But mutation testing is quite a good tool for discovering error-prone code (it has more places that can be mutated). I also see it as a way to get developers familiar with a new code base (they need to go deep, one feature at a time).

In Qpackt, I don't really practice TDD. Instead, I look at each missed mutant and ask: which feature wouldn't work had I made this mistake? Then I write a test checking that feature. This usually kills more mutants than I initially hoped for, and it tests real features instead of individual functions. I wrote more about Qpackt mutation testing here.
