Archive for the ‘Unit Testing’ Category.

Building Database Tests that Don’t Break

As I discussed in Unit Testing the Database, a difficulty you run into when using a shared dataset for unit tests is making sure you write your test in such a way that changes to the dataset will not break older tests.

Note: depending on the database access framework you use, you will actually access the database through a Connection, EntityManager, Session, PersistenceManager etc. I’ll use the term “Connection” generically for any of these access types because the same pattern applies to them all.

Don’t Rely on Particular Data

The following is an example of a bad test that will break when the dataset changes:

public void testFindActive() {     Collection returnSet = new PersonDAO(connection).findActive();     assertEquals(5, returnSet.size());}

When you first write the above test, it will work fine because you have know your data set has 5 person rows in it. However, when you get to the next test and realize you need more test data and add a new person row, the testFindActive() method will suddenly fail when the method being tested isn’t actually broken.

What I have found to be the best way of writing better tests is to create a framework that lets you describe what type of rows you expect your method to return. Here is a code example:

public void testFindActive() {assertDataCorrect(”Did not return active people correctly”,    new DataComparator(entityManager, Person.class) {            public boolean include(Person person)  {                    return person.isActive();            }    }}

There are two advantages to this method of testing:

  1. It doesn’t break if person rows are added or removed
  2. The test better documents the intent of the isActive() method
  3. Since you are describing your filter via normal object methods, the test will stay up to date as you refactor your code.

Of course, you need to have an implementation of assertDataCorrect(). Unfortunately, although it is fairly straightforward to implement, how you actually implement it varies depending on your database access framework. Here is a pseudo-code sample to get you started:

public abstract class DataTestCase extends TestCase {assertDataCorrect(String message,            DataComparator comparator, Collection            collectionToTest) {

    assertNotNull(”Collection to test was null, poor test”, collectionToTest);    assertTrue(”Collection to test was empty, poor test”, collectionToTest.size() > 1);

    List allObjects = comparator.getEntityManager        .createQuery(”from ” + comparator.getEntityClass().getName());

    int matchingObjects = 0;    List filteredObjectsFromDB = filterObjects(comparator, allObjects);    for (Object objectFromDB : filteredObjectsFromDB) {      matchingObjects++;      if (!collectionToTest.contains(object)) {            fail (”Expected object ‘”+object.toString()+”‘ not found in collection”);      }    }    assertTrue(”Expected “+matchingObjects            +” objects, passed collection was “+collectionToTest.size(),            matchingObjects == collectionToTest.size());}}

Some notes on the above implementation:

  1. The class extends TestCase so as long as all your database tests extends the new DatabaseTestCase you will always have access to the new assertions.
  2. If the passed collection is null or empty, the test fails. The reason for this is because if your query is returning nothing from the database, you don’t really know if it is filtered down correctly. It could be working correctly, or it could be you have a typo in your filter that will always make it return no results.
  3. The assert function selects out all rows from the database and compares them to the returned objects. If you have a lot of rows, this will be a very poor-performing test. In practice, your test dataset will normally not grow so large this will become a problem, but it is something to keep in mind if you ever think about running your test suite against a copy of production data.

Eventually, I hope to create a database unit testing library that better encapsulates this logic, but have not had the time yet. If there is anyone who would like to volunteer to help out with creating one, please contact me at nathan [at] sundog.net.

Start a transaction in your setUp() method and roll it back in your tearDown() method

By running your test in a transaction, you ensure that (normally) whatever your code under test does to the database will have no lasting affect and will not cause later tests to fail. This is a simple extension of the unit testing principal of not assuming your tests will run in a particular order and to have no side effect of your tests.

If your code under test attempts to start and commit transactions, you may need to create a wrapper connection around the actual connection that intercepts calls to start and end transactions and simply logs that they happened. That way you can test that the commit you expected actually happened and still actually roll back the database in your tearDown() method.

The thing to watch out for is auto-committing code in your class under test. For example, if your tested logic creates a new table, many databases will auto-commit the transaction. In these cases, you will need to manually undo the changes in a finally block in your test so you still follow the “no side-effect” rule.

Unit Testing the Database Access Layer

The Problem

Writing unit tests for code that accesses a database is a difficult problem, and it’s one I’ve struggled with for several years. The problems come down to a balancing act between several (often competing) requirements:

  1. The unit test must not not break due to changes outside the class under test.
  2. The unit test must not break due to code refactoring that doesn’t break functionality.
  3. The unit test must run very fast.
  4. The unit test must remain valid through database refactorings.
  5. The unit test must fail if the code doesn’t function correctly.

I’ve tried several approaches in the past and was unsatisfied with them:

Mock the Database Connection

This is the most “pure” unit testing approach because it completely isolates the class under test from all external dependencies.

Connection conn = createMock(Connection.class);
Statement statement = createMock(Statement.class);
expect(conn.createStatement()).andReturn(statement);
expect(statement.execute(”SELECT ID FROM PERSON;”)).andStubReturn(true);
statement.close();
expectLastCall();
replay(conn);
replay(statement);

ClassUnderTest classUnderTest = new ClassUnderTest ();
classUnderTest .findPersonIds();

verify(conn);
verify(statement);

The advantages to mocking the database connection is that the tests run fast and don’t change to due to creating of other unit tests.

In practice, however, I’ve found these tests to be more or less worthless. They (like most mock tests) really just test the implementation of the method, not the behavior of the method, and therefore often fail due to code refactoring. The biggest problem, however, is that the access to the database is really the whole point of the method and there are so many ways that can fail.

I would argue, that really the method is implemented as SQL with a Java wrapper around it and in the mock unit test, only the Java code is really tested–the SQL is often just cut and pasted from what is in the method.

Throw in changes to the database over the life of the project (the “person” table name changes to “employee”) and you end up with tests that pass, but don’t really tell you if your code will work.

Create Test Data Sets For Each Test

The obvious response to the problems with mock database tests is to have your tests access an actual database. This method will create tests that actually fail if your SQL is wrong or your database schema changes without a corresponding code update, but brings with it its own difficulties, primarily with keeping the database in a known state. There are tools available such as DbUnit and Unitils to help with the test-data management, but these have disadvantages.

The main problem is that the way the test data is stored, it is very dependent on the exact schema when it was created, and if your database structure changes your test data can no longer be inserted and your tests are now worthless. For example, if your test is created with test data that inserts rows into a “person” table, but later that table is renamed to “employee”, your insert statements will no longer execute. Depending on the schema changes, you may be able to recover your test data with a search and replace, but often the changes are too much and it has been long enough since you wrote the test that you don’t remember exactly what was supposed to be in the data.

Additionally, the data is inserted as part of the setup method for each test. Accessing databases in Unit tests slows them down considerably already, we should work to minimize any calls to the database that we have to do.

Manage Test Data With All Other Database Changes

To solve the problem of keeping test data definitions from getting out of sync with the schema, you need to have your test data built up along with your database so it will be modified by database refactoring that were made after it was initially created. For example, here is a database change script:

  1. create table person
  2. insert row into person [test data]
  3. insert row into person [test data]
  4. rename table person to employee

By including the test data with your database changes, the data is automatically kept up in the same way the production data would be. Using this technique over the dataset per method also has the advantage of performing better because the data is only inserted once, but it has the disadvantage that you need to handle all the test data that any method would want in one place.
To facilitate this technique, I built the idea of execution contexts into LiquiBase so you can mark the test data changes and only apply them in environments where you run unit tests. So far I have been happy with the results. The tests fail when there is a differences between the database schema and what the code is expecting or when there is a bug in my SQL and I haven’t lost any tests due to database refactorings.

The tests don’t run as fast as mock connection tests would, but they aren’t prohibitively slow. We have over 1000 unit tests in one project that all run in about a minute. It would be better if they would all run in 10 seconds, but what good are fast tests that don’t actually test what’s important? Just remember to mock the (already tested) data access layer in higher layers of your code so you don’t take the database performance hit any more than you need to.

With this technique, you need to always remember that your test data set will evolve over time and write your tests accordingly, but how to do that will be covered in a later blog posting.