Wednesday, 21 March 2012

Where the heck are we?

Your nightly build is red. Some test cases are failing, and you know this is the moment to stop and investigate. Your Product Owner runs into the team room and starts to rant about features that must be in before the release at the end of the week. He questions whether those faults are really so bad that we need to spend time fixing them, because we have promised some cool stuff to our customer.

Roadsign to Lost farm near Bellabeg. (Stanley Howe) / CC BY-SA 2.0
What to do? Every time we have failing test cases in our builds, it means that our location is unknown. A failing test case says that the behavior of our product is not what we expected it to be, and until those cases are green again, everything you create is just a hypothesis.

An abnormal situation is work that needs to be done, and it's work that does not get any better with age; actually, quite the opposite. It's work that generates more work. On the other hand, if we stop the belt and fix the problem immediately, the rest of the system sits idle, and the bigger the system is, the bigger the effect of that idling. So what to do? I think that an abnormality should always be removed as soon as it is noticed. We should always have the best possible understanding of where we are and get back on the map quickly.


  1. I think it is possible to find out where the problem is in your system when a test case turns red. For me the first clue is the commit which made the test case red, and the second clue is the coverage of the failing test case. But I understand that there are cases when they are useless.

    1. Hello Zsolt,

      And thanks for the comment.

      From what I have seen, the problem is usually easy to isolate to one commit, and the preferred fix is simply to correct it in the next commit. If the abnormality is hard to fix, just revert the code to the previous proven good state. I think these cover most of the cases.

      Beyond that, there might be other reasons why cases fail (the test environment, for example); those are usually the nasty ones. But they usually lead to improvements in the test and development environments.

  2. 2nd try to comment

    What about the PO? You now focus the discussion on the team, but I think that the PO is acting unprofessionally.

    Has the PO planned any buffers? Is your track record so good that you are able to deliver "all the cool stuff" by the end of the sprint? Do you have any hardening/stabilization phase?

    What does the PO want to have now and in the long term? Is the PO aware that a downgrade in quality might leave permanent scars on the team, that is, irreparable damage? Is s/he willing to take that risk?

    And then one would need to re-discuss the working agreements and/or the DoD. Why have a DoD if low quality can go out of the door? Is it becoming waste? What other items in the DoD can be sacrificed?

    And more thoughts along this line...

    1. Thanks Wolfi for your comment.

      I created this scenario myself; it's more inspired by a true story than based on one.

      You are asking absolutely the right questions. What I was trying to illustrate here is that there are basically two (actually three) different ways to react in this kind of situation. The first is to ignore it and continue developing new features; the second is to stop the line and remove the abnormality; and the third, which I did not cover in this blog post, is to continue developing new features and try to fix the odd behavior at the same time (also bad practice, IMHO).

      The PO might not be the only one who resists stopping to fix things; it can also be other stakeholders, like managers and even developers.