Why Software Fixes Can Create New Bugs and How We Deal With Them

Bugs exist in many different forms, even outside of software. Consider a structural engineer tasked with building a bridge. Calculations need to be made early on, before construction begins, to determine the bridge’s weight limits and to ensure it will not fall victim to aeroelastic flutter. Failure to identify these issues early can (and will) result in a “buggy” bridge that is prone to collapse. The engineer begins with blueprints that make it possible to assess potential problems before anything is built. It’s obviously much easier to tweak a drawing than a bridge that is already standing.

Avoid Haste to Prevent Waste

No matter what the scope of a given project is, major problems can and will occur, even with extensive testing and planning. O-ring problems in the Space Shuttle’s solid rocket boosters (SRBs) were known for years before the Challenger disaster occurred, and the Titanic might have been saved had budgetary concerns not favored watertight compartments, which proved inadequate, over a double hull. Building a product with minimal defects is always a delicate balance between quality, timely delivery, and cost. Software is no exception to this rule, largely because changes can be shipped so rapidly.

The upside of this is that software engineering has the unique capability to transform, almost magically, a product that is already in users’ hands. When under fire for allegedly exploding batteries, Tesla Motors pushed out a software update to cars already on the road that raised the ride height when a Model S was traveling at highway speeds. The change reduced the chance of underbody impact and helped prevent the fires that appeared to be battery explosions. The magic of software permitted this. No other industry is equally capable of quickly addressing issues or adapting to user needs, but the downside is that rapid development introduces more and more opportunities for software to break.

Understand Code Interdependency

Software is particularly susceptible to user-facing defects. As an industry, we’re always working on new ways to push updates to users faster and on a more regular basis. When building an application, there is no blueprint. The code itself becomes the blueprint, which means we lose the ability to tackle problems before they occur. Unfortunately, the more quickly we iterate and ship, the more likely it is that bugs are going to be introduced.

Even when an application’s codebase is well-designed, seemingly minor tweaks can cause sweeping changes throughout, drastically increasing the risk of introducing bugs. What’s worse is the amount of code changing under the hood that the typical developer has no insight into or power to change: OS updates, toolchain updates, changes to third-party libraries, device-specific nuances, and so on. Attempting to ship a truly bug-free software product is like trying to hit a moving target.

Test Changes Before Applying

There are plenty of things we do to mitigate software defects. Code review is a standard procedure at Vokal, and one that is deeply rooted in our engineering culture. It gives us an opportunity to review the “blueprints” for any obvious bugs before a change ships. Tests get added with every new feature so we can be confident that regressions haven’t been introduced into previously working code. Changes aren’t permitted to pass code review unless our Continuous Integration (CI) service indicates that the application still builds successfully and all of the automated tests pass. Several of our performance-centric applications run benchmarks in the CI process and will fail the build if a proposed change makes the application run slower. Some of our engineering teams won’t even allow code to ship if the automated tests cover less than 97% of the application.
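To make the idea of a merge gate concrete, here is a minimal sketch in Python. It is an illustration only, not a description of Vokal’s actual CI setup: the BuildReport shape, the function names, and the 5% slowdown tolerance are all assumptions made for the example. Only the 97% coverage figure comes from the paragraph above.

```python
from dataclasses import dataclass

# The 97% threshold is taken from the article. The slowdown tolerance is an
# assumption: real benchmarks are noisy, so this sketch allows 5% of drift.
MIN_COVERAGE_PERCENT = 97.0
MAX_SLOWDOWN_RATIO = 1.05


@dataclass
class BuildReport:
    """Results a CI job might collect for a proposed change (hypothetical shape)."""
    tests_passed: bool
    coverage_percent: float
    baseline_seconds: float   # benchmark time before the change
    candidate_seconds: float  # benchmark time with the change applied


def gate_failures(report: BuildReport) -> list[str]:
    """Return every reason to block the change; an empty list means it may pass."""
    failures = []
    if not report.tests_passed:
        failures.append("automated tests failed")
    if report.coverage_percent < MIN_COVERAGE_PERCENT:
        failures.append(
            f"coverage {report.coverage_percent:.1f}% is below {MIN_COVERAGE_PERCENT:.1f}%"
        )
    if report.candidate_seconds > report.baseline_seconds * MAX_SLOWDOWN_RATIO:
        failures.append(
            f"benchmark slowed from {report.baseline_seconds:.2f}s "
            f"to {report.candidate_seconds:.2f}s"
        )
    return failures


def exit_code(report: BuildReport) -> int:
    """Map the gate result to a process exit code: 0 passes the build, 1 fails it."""
    return 1 if gate_failures(report) else 0
```

A CI script could finish with sys.exit(exit_code(report)), since most CI services treat a non-zero exit code as a failed build. Collecting every failure, rather than stopping at the first, lets a developer see all of the problems with a change in a single run.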

We don’t stop at automation. Writing good automated tests requires a deep understanding of the product and its requirements, which other members of the team performing a code review won’t necessarily have. That leaves a lot of opportunity for automation to miss a particular scenario that may lead to a defect. Our QA Engineering team covers this gap using manual test plans, which are written as the product is developed and changed. Bugs reported by a QA engineer (QAE) are fixed, then backed with an automated test to prevent regressions. We’ll dedicate entire “hardening sprints” purely to executing test plans and fixing any bugs that are reported.
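As a rough sketch of what “backed with an automated test” can look like, consider a made-up bug report: a quantity field crashes when the user leaves it blank. Everything here is hypothetical, including the parse_quantity function and the bug itself. The tests are written as plain functions that a runner such as pytest would pick up.

```python
def parse_quantity(text: str) -> int:
    """Parse a quantity typed by a user (hypothetical function).

    The hypothetical bug: blank input used to reach int() and raise
    ValueError. The fix treats blank input as zero.
    """
    stripped = text.strip()
    if not stripped:
        return 0
    return int(stripped)


def test_blank_quantity_is_zero():
    # Regression test for the hypothetical QA report: blank input must not crash.
    assert parse_quantity("") == 0
    assert parse_quantity("   ") == 0


def test_valid_quantity_still_parses():
    # Guard the behavior that already worked before the fix.
    assert parse_quantity(" 12 ") == 12
```

Once a test like this is in the suite, any later change that reintroduces the crash fails the build instead of reaching users.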

Decrease Size and Frequency of Bugs

Software is complicated. A typical mobile app may seem simple on the surface, but it is built upon a complex stack of technologies that are individually being tweaked and updated, each introducing a potential point of failure where there wasn’t one before. Testing has to occur every time an OS or browser update is released, as well as when changes are made to the application itself. Allowing your application to evolve is a constant tradeoff between time, budget, and quality. Chasing down defects is an inevitable part of the development process, but it is the price paid for being able to tweak and adapt your product at the drop of a hat.