Sebastian's blog

Engineering, scientifically

This is an answer to Colin Breck's post Is the Scientific Method Valuable in Engineering?, so you might want to read his post first. I liked it very much because it set me thinking, even though I do not agree entirely with its conclusions. In it, Breck argues that the scientific method is not useful in software engineering because the goal of engineering is different from the goal of science.

In science, we form a hypothesis and design experiments to falsify that hypothesis. In engineering, we establish a process, and apply methods rooted in science, mathematics, and statistics, to ask questions of the system, run experiments, and improve the process.

He concludes that the scientific method is not applicable to software engineering because of those differences, and draws on examples from Bryan Cantrill's talk Things I Learned The Hard Way and on Dave Farley's book Modern Software Engineering to support his arguments.

I will use the same sources and offer a different lens through which to view the interplay between software engineering and the scientific method. Why interplay? Because I think the scientific method is a necessary, but insufficient ingredient to effective engineering. You cannot say, ‘I use the scientific method, therefore I am doing software engineering’, but you also – in my view – cannot claim to do software engineering without using the scientific method in places. You may, however, be using it implicitly much of the time, which can make it a little harder to see. It mainly shows up when someone is not doing it. If this sounds a little cryptic, bear with me; I will explain.

First, however, let's recap what the scientific method is. I will take the easy way and just use the same quote from Robert M. Pirsig's summary of the scientific method from Zen and the Art of Motorcycle Maintenance that Breck used in his post:

(1) statement of the problem, (2) hypotheses as to the cause of the problem, (3) experiments designed to test each hypothesis, (4) predicted results of the experiments, (5) observed results of the experiments, and (6) conclusions from the results of the experiments.

Note that this is a purely declarative definition. It presupposes the existence of a problem without caring where it came from. It assumes that there are hypotheses without saying anything about how they come to be. It demands experiments without giving guidance on how to design them. And so on. Breck's 3rd footnote contains some quotes to this effect. His – and Bryan Cantrill's – conclusion seems to be that the scientific method requires you to guess (because each hypothesis is nothing other than a guess), and guessing is bad because there are other methods out there that better support goal-oriented problem solving.

So let's take the examples in his post and see if we can discover the scientific method in there!

Debugging, scientifically

The first example is Cantrill talking about his approach to debugging, and how he hates working with ‘hypothesis-centric engineers’:

I know this! It’s this! Oh, it wasn’t that, it’s this!

I can relate to that. Working with someone like that drives me crazy, too.

Cantrill suggests asking questions instead of guessing, and making an effort to answer them by examining the state of the system. The answers to those questions allow you to narrow your bug search until you have found the bug. This is great advice, in my opinion, and a very good talk overall; I encourage you to watch the whole thing.

Anyway, if you do it his way, there's no guessing, no hypotheses, no scientific method, right?

Wrong! The way I see it, what he is doing is giving advice on how to do step 2 of the scientific method: coming up with a good hypothesis. Because at this point, it is still a hypothesis. He is going ‘I know it's this!’, but after a lot more analysis than the engineers he is complaining about, so the likelihood of his being right is much higher than with a random guess. But there is no fix yet, so the problem is not solved, is it? What's next? Fixing the problem. Assuming it is a simple bug, identifying the bug roughly equals fixing it[1]. What you end up with is the hypothesis, ‘This was the bug, and what I just did fixes it.’

Step 3 of the scientific method says you should design experiments to test the hypothesis. In my book, this means you should have at least one test that fails before the fix and succeeds after the fix. For reproducible bugs, this should be easy; for others it can be hard, but the scientific method does not require that it be easy.

The prediction of the result (step 4) is included in each test case in the form of assertions. If the prediction turns out to be incorrect, the test fails. Running the test, and observing the result, is step 5.
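To make this concrete, here is a minimal sketch of what such a test could look like. The parse_price function and its bug are entirely hypothetical, not taken from Cantrill's talk or Breck's post; they only serve to show how the assertion codifies the prediction.

```python
# Hypothetical example: the hypothesis is that parse_price() used to drop the
# cents part of the price. The test is the experiment (step 3), the assertion
# is the predicted result (step 4): it fails before the fix and passes after.

def parse_price(text: str) -> int:
    """Parse a price like '12.34' into cents (the fixed version)."""
    euros, cents = text.split(".")
    return int(euros) * 100 + int(cents)


def test_parse_price_keeps_the_cents():
    # Prediction: '12.34' parses to 1234 cents.
    assert parse_price("12.34") == 1234
```

Running it with pytest and looking at the outcome is step 5, and what you see there feeds step 6.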

Step 6: draw your conclusions. If the test succeeds, you will probably conclude that you were right and have fixed the bug. Breck mentions that in science, ‘a lot of effort often goes into explaining the errors in measurements.’ This is what you do if the test fails, too. You examine the test – maybe it contains an error? If not, you have invalidated your hypothesis. That is, you go, ‘Oh, it wasn't that!’ And you are back to step 2, coming up with a new hypothesis.

In conclusion, Cantrill's critique of using the scientific method for debugging seems to me to be really a critique of engineers who churn out ‘fixes’ without bothering to come up with a good hypothesis first.

I use the scientific method for debugging, and many good software engineers do, even if they do not think about it as such.

Improving performance, scientifically

The second example in Breck's post is performance optimization. Here, too, the critique seems to be mostly directed at people who jump to conclusions instead of following a systematic approach of coming up with well-grounded hypotheses, and my answer is the same.

Let's reduce it to bullet points this time:

  1. Statement of the problem: Where exactly is the performance insufficient? If you skip this step, Donald Knuth might want a word with you[2].
  2. Hypotheses as to the cause of the problem: I agree wholeheartedly that it is a bad idea to guess randomly. Use the profiler, measure where CPU time is spent, identify areas for improvement, come up with hypotheses of the form, ‘Unnecessary string copies in the flubber function cause 50% of the CPU load.’
  3. Experiments designed to test each hypothesis: Implement the fix, and carefully design performance tests (this is a rabbit hole in and of itself).
  4. Predicted results of the experiments: ‘The test will run twice as fast after the fix as before.’
  5. Observed results of the experiments: Run the test. Uh, oh, the test shows CPU load was reduced only by 10%.
  6. Conclusions from the results of the experiments: Examine the test (maybe we measured it wrong?), keep the fix (10% is better than nothing), put some more work in?

Again, I see the scientific method at work when I expand my view beyond the problem of coming up with a hypothesis.
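To make steps 3 to 5 above concrete, here is a minimal sketch of such a performance test in Python. The flubber implementations, the input size, and the factor of two are all made up for illustration; a real performance test would need far more care with warm-up, variance, and environment.

```python
# Sketch: predict that the fixed flubber() is at least twice as fast as the
# old one on the same input. If the assertion fails, the prediction was wrong
# and we are back to drawing conclusions (step 6).
import timeit


def flubber_old(words):
    # Baseline with unnecessary string copies: builds a new string each time.
    result = ""
    for word in words:
        result = result + word
    return result


def flubber_new(words):
    # Candidate fix: a single join.
    return "".join(words)


def test_flubber_is_at_least_twice_as_fast():
    words = ["x"] * 10_000
    old = timeit.timeit(lambda: flubber_old(words), number=50)
    new = timeit.timeit(lambda: flubber_new(words), number=50)
    assert new * 2 <= old  # predicted result (step 4); running this is step 5
```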

Improving continuously, scientifically?

Breck uses manipulation of process variables in a chemical reactor as an example. I'm not sure about this one. If the goal is simply to observe the system behavior under different conditions, this sounds more like fiddling with the knobs and seeing what happens. No scientific method there; I'll grant you that.

However, I think that as soon as you try to formulate a theory of how the system works and improve it on this basis, this sounds suspiciously like scientific territory again, and you might benefit from a more rigorous application of the scientific method. After all, is ‘If we lower the inlet temperature by 1 degree, this will lower our energy costs by 2% and not affect the output quality’ not a testable hypothesis?
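As a toy illustration of how such a hypothesis could be checked, here is a sketch. The measurements, the dictionary layout, and the tolerances are invented; a real evaluation would of course need proper statistics and many more samples.

```python
# Sketch: compare logged energy costs and output quality at the old and the
# lowered inlet temperature. All numbers here are made up for illustration.
from statistics import mean

baseline = {"energy_cost": [100.2, 99.8, 100.5], "quality": [0.97, 0.96, 0.97]}
lowered = {"energy_cost": [98.1, 97.9, 98.3], "quality": [0.97, 0.97, 0.96]}


def test_lower_inlet_temperature_saves_energy_without_hurting_quality():
    # Prediction 1: energy costs drop by at least 2%.
    assert mean(lowered["energy_cost"]) <= 0.98 * mean(baseline["energy_cost"])
    # Prediction 2: output quality stays the same within a small tolerance.
    assert abs(mean(lowered["quality"]) - mean(baseline["quality"])) < 0.01
```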

Changing software, scientifically

Colin Breck cites Dave Farley and comes to the conclusion that, while his book Modern Software Engineering promotes an ‘empirical, scientific approach’, it does not actually advocate for strict application of the scientific method. I have not read Dave Farley's book (sorry), but I have heard him speak live and on YouTube, and have thus gotten a slightly different picture.

About the tests in a continuous delivery pipeline, Breck writes:

But these tests are not being used to disprove a hypothesis or discover a fundamental truth of the system, they are being used to test the invariants of the system and provide the guardrails for continued experimentation, discovery, refinement, and iteration.

And in a footnote:

It could be argued that the fundamental truth is evaluating if the software is always in a releasable state. But tests never perfectly replicate production environments and we need to continue to use empirical engineering techniques to evaluate software once deployed to production systems.

On the first point, I remember Dave Farley saying that each commit codifies the hypotheses that

  1. the commit does what it is supposed to do, and
  2. the commit does not break any part of the system.

And, yes, the tests are used to disprove these hypotheses. Regression tests try to disprove the hypothesis that we did not break anything (another way of saying the invariants of the system are still intact), and newly added tests try to disprove that the commit does what it is supposed to do.
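Here is a minimal sketch of how the two hypotheses map onto tests, with a made-up discount function standing in for whatever the commit actually changes:

```python
# Hypothetical commit: it adds a percentage discount to prices (in cents).

def discount(price_cents: int, percent: int) -> int:
    return price_cents * (100 - percent) // 100


# Hypothesis 1: 'the commit does what it is supposed to do.'
# The newly added test tries to disprove it.
def test_discount_reduces_price():
    assert discount(1000, 10) == 900


# Hypothesis 2: 'the commit does not break any part of the system.'
# The pre-existing regression suite tries to disprove it; for example,
# undiscounted prices must remain unchanged.
def test_no_discount_leaves_price_unchanged():
    assert discount(1000, 0) == 1000
```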

The part about the tests not replicating production exactly is a digression. Other scientific experiments are not perfect either; see the observer effect. This is one reason why experiments/tests can only ever invalidate the hypothesis, not prove its correctness. ‘Tests are not perfect’ does not imply the scientific method is not in action.

Breck describes the difference between science and engineering as follows:

Engineering is different in that we are not working to prove or disprove a hypothesis, but rather iteratively and empirically improve a process by applying proven methods.

I agree with this, and also with science and engineering having different goals. We can still apply the scientific method in engineering, where appropriate. I see it not as an either/or, but rather as the scientific method supporting the engineering processes.

Necessary, but not sufficient

The scientific method is not all there is to software engineering. We need to do a lot more to deliver good software, as Colin Breck, Bryan Cantrill, and Dave Farley have pointed out.

Some of this plays out within certain steps of the scientific method[3]; some of it is outside the scope of the scientific method[4]. But there are places where the scientific method is applicable, and there, using it is indispensable.

Consider what leaving out individual steps of the scientific method would look like:

  1. No statement of the problem: you start changing code without knowing what you want to achieve.
  2. No hypotheses: you make random changes and hope one of them sticks.
  3. No experiments: you ship a ‘fix’ that has never been tested.
  4. No predicted results: your tests contain no assertions, so they can never fail.
  5. No observed results: you do not run the tests, or you do not look at their output.
  6. No conclusions: you never decide whether to keep the change or go back to step 2.

I do not see which part you can leave out and still claim to be doing software engineering.

Closing thoughts

The scientific method is more than guessing and testing randomly. It includes other aspects not mentioned in the simplified summary we used for our discussion, like inductive reasoning to come up with good hypotheses. But even if reduced to guessing and testing, it acts as an important safeguard against bad solutions and non-solutions. After all, even the engineers Cantrill complained about noticed, ‘Oh, it wasn't that!’ Presumably, they had run experiments to determine this.

Going about debugging in this way is not efficient[5], but it is certainly safer than doing it completely unscientifically. What do you think the result would have been, had those engineers not bothered to test their ‘fixes’?

At the same time, the scientific method alone is not enough to drive software development. We need those repeatable processes. We need to ask good questions, use architectural principles like coupling and cohesion, automate our testing, and work in small steps. But we need the scientific method, too.

And being strict about it is not necessarily slow. I can do the scientific method in a few minutes:

  1. Select a small ticket from the backlog (contains the statement of the problem).
  2. Write a little code and the first test. Since my test includes an assertion, this covers the hypothesis (the code), the experiment (the ‘given’ and ‘when’ parts of the test), and the prediction of the result (the ‘then’ part of the test); see the sketch after this list.
  3. Run the test, observe the result.
  4. On success, celebrate and commit (which triggers another round of experiments); on failure, go back to 2.
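Here is a minimal sketch of step 2; the ShoppingCart class and its API are invented for illustration, not taken from any real project.

```python
# The 'given' and 'when' parts are the experiment, the 'then' part (the
# assertion) is the predicted result.

class ShoppingCart:
    def __init__(self):
        self._items = []

    def add(self, name: str, price_cents: int) -> None:
        self._items.append((name, price_cents))

    def total(self) -> int:
        return sum(price for _, price in self._items)


def test_total_sums_all_item_prices():
    # Given: a cart with two items
    cart = ShoppingCart()
    cart.add("tea", 250)
    cart.add("mug", 900)
    # When: we ask for the total
    total = cart.total()
    # Then: it is the sum of the item prices
    assert total == 1150
```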

For me, the scientific method, translated into software development terms, means:

  1. You should know what you want to achieve before you start changing anything!
  2. Always write tests for your changes!
  3. No tests without assertions!
  4. Actually run the tests (new and old), look at the results, and act on them!

I, for one, would not want to have it any other way.


Footnotes

  1. If it is not a simple bug, the hypothesis ‘It's this!’ may expand into multiple hypotheses on how to fix it (which makes the process more difficult, but does not change anything substantial, so let's ignore this rabbit hole). ↩︎

  2. ‘Premature optimization is the root of all evil.’ — Donald Knuth ↩︎

  3. Like asking questions of the system for coming up with good hypotheses when debugging. ↩︎

  4. Like establishing a repeatable process, and the sense of quality Breck mentions in his post – determining what is good enough. ↩︎

  5. The scientific method does not require efficiency. ↩︎