Reviving The Scientific Method on Artisanal Bytes

Reviving The Scientific Method

Written on Monday, July 22, 2013

When I first learned the scientific method back in what seems like elementary school, I figured I would be using it for the rest of my life as I conducted my science. Then reality struck. Fast-forward 20-odd years to today, and the focus of my industry has, on the surface, moved away from science and toward just banging out code. I still think the scientific method is alive and well. It is just hidden behind Jira tickets, Kanban boards, and UI mockups, and sometimes it is disguised by a new name.

What is the scientific method?

The scientific method is a reliable process for acquiring knowledge, where knowledge means facts that are observed and measured. It consists of five steps, but it is not a strict process; perhaps it was the first agile practice.

Question: For any learning to occur, you must have a question to answer. In the first step, you set about formalizing your question.
Hypothesize: In order to narrow down your work and focus your experiment, you essentially take a guess as to what the answer is.
Predict: In this step, you attempt to explain how your hypothesis will materialize in the real, observed world. In a true scientific experiment, the prediction is what ensures that the test you run will be conclusive.
Test: Now you run your experiment, which should uphold or disprove your predictions. The testing phase is when you collect your data.
Analysis: Finally, you decide if the results from your test support your hypothesis or not.

The scientific method is so basic that everyone can do it, so let’s see it in action.

Putting the “science” back in comp sci

While I will show that the scientific method is in use today, you will quickly see that it does not need to be as purely scientific as if you were deriving the existence of subatomic particles. Sometimes the question is not very quantitative, and sometimes the prediction does not alter the experiment, but the rough process is always the same.

A/B testing

The clearest example of the scientific method being alive and well is the A/B test, used mostly for user experience design or ad targeting. Let’s look at the process of designing the landing page of a subscription service. First, you ask a question: what will convert the most visitors into subscribers? Then you put forward a hypothesis: people’s positive reaction to pictures of cats will drive up subscriptions. That is probably good enough for a prediction, too. The test is easy: you add a little code to your page that randomly decides between not showing a picture of a cat and showing one. You have to track whether the picture was shown and whether the visitor subscribed. Then you run the analysis to determine whether your hypothesis was correct. Congratulations, you just increased your subscriptions by 14% thanks to a cat picture.
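A minimal sketch of the two mechanical halves of that experiment, in Python: a deterministic, hash-based coin flip that decides whether a visitor sees the cat, and a two-proportion z-test for the analysis step. The function names, the SHA-256 bucketing, and the choice of a z-test are my assumptions for illustration, not what any particular A/B testing product does.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str = "cat-picture") -> str:
    """Bucket a visitor into "control" (no cat) or "treatment" (cat).

    Hashing instead of calling random() keeps a returning visitor in the same group.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "treatment" if digest[0] % 2 == 1 else "control"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-proportion z-test for the lift of B over A.

    Returns (z, two-sided p-value) under the null hypothesis that both groups
    convert at the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate if the cat makes no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    print(assign_variant("visitor-123"))
    z, p = two_proportion_z_test(500, 10_000, 570, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With hypothetical counts of 500 subscriptions from 10,000 control visitors and 570 from 10,000 treatment visitors (a 14% lift), the test gives a z of about 2.2 and a two-sided p-value of about 0.03, so the lift is unlikely to be chance at the common 0.05 threshold.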

A/B testing is the poster child of the scientific method, providing clear evidence of its importance in the world of software. I have several other examples that will highlight different steps of the method.

Diagnosing and fixing a bug

The work of diagnosing and fixing a bug runs the gamut from smoking out a defect to identifying a scalability bottleneck to pinpointing a performance issue. The scientific method applies in each of these cases.

A colleague went through an ad hoc application of the scientific method as he tried to figure out why database writes were taking upwards of a few seconds when the server was under high load. As he dug through log files and transaction logs, he was doing the initial investigation to formulate his question and hypothesis (though he probably did not think of it that way). Once he thought he knew what the issue was, he quickly posed a prediction: our setting of innodb_autoinc_lock_mode was hurting concurrency, and changing it would increase the concurrency of writes. Once he had developed a simple test that supported his hypothesis, we set about changing that setting.
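A “simple test” like that can be as small as a harness that hammers the table from many threads at once and reports latency, run once with the old setting and once with the new one. Since innodb_autoinc_lock_mode is a startup option in MySQL, each run needs a server restart with the new value. This is a sketch under assumptions: `fake_insert` is a hypothetical placeholder for a real INSERT through your MySQL driver, and the thread and write counts are arbitrary.

```python
import statistics
import threading
import time
from typing import Callable


def measure_write_latencies(write: Callable[[], None], workers: int, writes_per_worker: int) -> list[float]:
    """Call `write` from several threads at once and return each call's latency in seconds."""
    latencies: list[float] = []
    lock = threading.Lock()
    start_gate = threading.Barrier(workers)  # release all workers together to maximize contention

    def worker() -> None:
        start_gate.wait()
        local: list[float] = []
        for _ in range(writes_per_worker):
            t0 = time.perf_counter()
            write()
            local.append(time.perf_counter() - t0)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


def fake_insert() -> None:
    # Hypothetical stand-in for one INSERT into a table with an AUTO_INCREMENT key.
    time.sleep(0.001)


if __name__ == "__main__":
    samples = measure_write_latencies(fake_insert, workers=8, writes_per_worker=50)
    print(f"median={statistics.median(samples) * 1000:.2f} ms  max={max(samples) * 1000:.2f} ms")
```

The harness turns the prediction into something checkable: if the lock mode really is the bottleneck, the median and maximum latency should drop noticeably after the change at the same concurrency; if they do not, the hypothesis is wrong.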

Another colleague was trying to speed up a single database query that covered a lot of data. There were a few ways to rewrite the query to take better advantage of existing indexes, and in our discussion, one seemed like the best choice. My colleague was ready to just make that change, but I insisted that he run an experiment to observe the behavior we expected. I wanted him to prove or disprove our hypothesis. Of course, I didn’t ask him to run an experiment in so many words; I asked him to generate an explain plan to see what the query optimizer would do. With that and a simple test of the query on real data, he was able to feel confident in the change.
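An explain plan is the cheapest experiment available here: it shows what the optimizer intends to do before any real data moves. The post does not name the database, so this sketch swaps in SQLite through Python’s built-in `sqlite3` module, with a hypothetical `orders` table and two hypothetical rewrites of the same query. MySQL and PostgreSQL have their own `EXPLAIN` output formats.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the optimizer's plan for `sql` as a list of human-readable steps."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Two equivalent queries: wrapping the column in an expression hides it from the index.
original = "SELECT SUM(total) FROM orders WHERE customer_id + 0 = ?"
rewrite = "SELECT SUM(total) FROM orders WHERE customer_id = ?"

for label, sql in (("original", original), ("rewrite", rewrite)):
    print(label, query_plan(conn, sql, (42,)))
```

The first query should report a full scan of `orders` and the rewrite a search using `idx_orders_customer`; the exact wording varies between SQLite versions. The plan is still only the optimizer’s prediction, which is why timing the query on realistic data remains the actual test.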

When diagnosing and fixing a bug, the scientific method is not just nice to have; it is practically mandatory. By taking a systematic approach, you will work your way to the root cause of the bug quickly, and once you find the cause, having applied the scientific method will help prove your solution and keep the defect from coming back. Here is my favorite example.

A long time ago, a colleague was using Hibernate to access the database. Along with a slew of other changes going into a release, he upgraded Hibernate to the next minor version. He did not notice that the default configuration for the second-level cache had changed, but an odd bug started creeping into his app. After debugging for a while, he asked me for a second set of eyes on the problem. In this scenario, my typical line of questioning starts with, “What has changed since the bug showed up?”¹ After some hemming and hawing, he mentioned that he had upgraded Hibernate, so we looked at the changelog and identified a potential problem with the cache configuration. He coded up an integration test that exposed the issue, then fixed the configuration and saw the test pass.
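The integration test is the experiment in that story. The post does not show the Hibernate code or say what the new default was, so this is a sketch in Python with entirely hypothetical names and behavior: `Repository` plays the role of an ORM with an optional shared read cache, and the test reproduces a stale read that appears when caching is switched on underneath the application.

```python
# Everything here is hypothetical: a tiny stand-in for an ORM with an optional
# second-level (shared) cache, used to show the shape of the regression test.

APP_CACHE_ENABLED = False  # the fix: pin the setting explicitly instead of inheriting a default


class Repository:
    """Loads rows by key, optionally remembering them in a cache shared across reads."""

    def __init__(self, db: dict[int, str], cache_enabled: bool) -> None:
        self.db = db
        self.cache_enabled = cache_enabled
        self._cache: dict[int, str] = {}

    def load(self, key: int) -> str:
        if self.cache_enabled and key in self._cache:
            return self._cache[key]  # may be stale if someone else changed the row
        value = self.db[key]
        if self.cache_enabled:
            self._cache[key] = value
        return value


def reads_see_external_writes(cache_enabled: bool) -> bool:
    """Reproduce the bug: a row changes behind the cache's back, then is read again."""
    db = {1: "old"}
    repo = Repository(db, cache_enabled=cache_enabled)
    repo.load(1)   # first read; fills the cache when caching is on
    db[1] = "new"  # a write the cache never hears about (another process, a bulk update)
    return repo.load(1) == "new"


def test_reads_are_fresh() -> None:
    # Fails if the app silently picks up a cache-on default; passes once the setting is pinned.
    assert reads_see_external_writes(APP_CACHE_ENABLED)
```

Run it with pytest. Flipping `APP_CACHE_ENABLED` to `True`, standing in for a changed default, turns the test red, which is the failing-then-passing sequence a regression test should give you.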

There was a lot of unspoken scientific method in that process. First, we questioned what had changed. Then we hypothesized that it was the cache configuration. Next, we predicted that a change to the configuration would fix our bug, and we literally wrote a test to find out. Watching that test go from failing to passing was our analysis.

Performance testing

As with diagnosing a bug, applying scientific rigor to performance testing will focus your work and give it an appropriate end goal. The first step is to determine what you are going to measure; this is your questioning phase. Wrapped up in that is determining how much performance you need. You can likely skip the formal hypothesis and prediction steps, assuming you understand how best to test for the performance you need. But if you think about the outcome you expect, at least predicting some relative performance capabilities, you may have an easier time with the last and most crucial step: analysis.
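Writing the question and the required performance down as data, before running anything, makes the analysis step mechanical. A sketch, assuming the goal is a latency percentile; the class, its fields, and the nearest-rank percentile definition are my choices, not a standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyGoal:
    """The question, written down before the benchmark runs."""

    operation: str
    percentile: float  # e.g. 99.0 for p99
    max_ms: float      # how much performance we actually need


def nearest_rank_percentile(samples_ms: list[float], pct: float) -> float:
    """Smallest sample such that at least pct% of the samples are <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(goal: LatencyGoal, samples_ms: list[float]) -> str:
    """The analysis step: compare what we measured against what we said we needed."""
    observed = nearest_rank_percentile(samples_ms, goal.percentile)
    verdict = "meets" if observed <= goal.max_ms else "misses"
    return f"{goal.operation} p{goal.percentile:g} = {observed:.1f} ms, {verdict} the {goal.max_ms:g} ms target"
```

For ten hypothetical samples of 1 through 10 ms, `evaluate(LatencyGoal("read", 90.0, 10.0), samples)` returns `"read p90 = 9.0 ms, meets the 10 ms target"`. Fixing the target first keeps you from deciding after the fact that whatever number came out is good enough.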

Often, I see people run a benchmark and start quoting the results without really thinking about them. Recently I saw some numbers showing that read times from HBase were more than three times as long as write times. That struck me as odd, because writing to a database is usually slower than reading from it. After thinking about the scenario we tested, though, I felt confident in the numbers: our test was reading an order of magnitude more data at a time than it was writing. Perhaps we should have made that part of our prediction before we ran the tests. In a more dire case, an experiment showed a higher throughput of calculations than of writes, even though a single calculation involved several writes. On questioning, it turned out that the code measuring writes was wrong. Without that analysis, we would have trusted invalid results.
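Both stories come down to checking results against what you already know about the workload. A sketch under stated assumptions: each calculation is assumed to perform a fixed number of writes, so measured write throughput should keep pace with calculation throughput times that number; the function names, the 5% tolerance, and the sample numbers below are hypothetical.

```python
def check_benchmark(
    calcs_per_sec: float,
    writes_per_sec: float,
    writes_per_calc: int,
    tolerance: float = 0.05,
) -> list[str]:
    """Flag results that contradict what we know about the workload."""
    problems: list[str] = []
    # Assumption: every calculation performs `writes_per_calc` writes, so writes must keep pace.
    implied_writes = calcs_per_sec * writes_per_calc
    if writes_per_sec < implied_writes * (1 - tolerance):
        problems.append(
            f"{calcs_per_sec:g} calcs/s implies at least {implied_writes:g} writes/s, "
            f"but only {writes_per_sec:g} writes/s were measured"
        )
    return problems


def per_byte_ratio(read_ms: float, read_bytes: int, write_ms: float, write_bytes: int) -> float:
    """Read cost per byte divided by write cost per byte (below 1.0 means reads are cheaper)."""
    return (read_ms / read_bytes) / (write_ms / write_bytes)
```

With hypothetical numbers in the spirit of the HBase run, a read taking 3 ms for 10 KB against a write taking 1 ms for 1 KB gives `per_byte_ratio(3.0, 10_240, 1.0, 1_024)` of 0.3, so per byte the reads were the cheaper operation. Likewise, `check_benchmark(1000, 500, 3)` returns one complaint, which is the kind of result that should send you back to the measurement code.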

In conclusion

The scientific method is in use all around us, but we might have to look a little deeper to see the true science. As a side note, at first I lamented not having an example of using the scientific method for writing code, but I have an explanation: the original writing of code is more of a creative process than a scientific one. The scientific method is for learning and explaining; writing code is creating. Once that code is written and the software is running, the scientific method can be used to explain what it is doing.

¹ I firmly believe that computers are always deterministic: they will do exactly what the programmer has programmed them to do. Sometimes that determinism is difficult to figure out, but computers don’t have minds of their own. Yet.

artisanal bytes

“Hand-crafted in San Francisco from locally sourced bits.”