The Metrics Maze: Uncovering the Dark Side of KPIs
In the ever-evolving world of software development, the quest for the perfect Key Performance Indicators (KPIs) and metrics is akin to searching for the Holy Grail. Management eagerly seeks golden metrics to estimate, plan, and track development work and individual performance. Engineering teams often find themselves divided, debating the merits and pitfalls of each. From velocity, code churn and complexity to defect density and cycle time, these KPIs are anything but clear-cut. They come with their own set of controversies and unintended consequences that can ripple through like plot twists in thriller novels. Buckle up as we dive into the fascinating, and sometimes contentious, world of software development metrics!
The Desire
The goals of these metrics are multifaceted. Initially, they aim to accurately estimate development time and resources for feasibility, timelines, and workforce needs. Another critical goal is progress tracking. By monitoring metrics like story points and code changes, leaders can assess project status and identify bottlenecks or inefficiencies. These insights enable timely interventions to keep projects on track and aligned with external expectations.
Furthermore, KPIs are used to measure the performance of both teams and individual contributors. This allows for comparison and evaluation of their performance, facilitating process improvements, training provisions, and identification of varying performance levels.
Ultimately, the intention is that these measures will enhance productivity, quality, and timelines by clearly defining priorities and, consequently, motivating and rewarding all parties involved appropriately.
The Mismatch
Regardless of personal opinions, companies ultimately prioritize one primary metric: net profit. While there may be differing views, citing numerous examples of publicly declared philanthropic goals, it is important to note the following observations:
-
Bigger fish with bigger profits eat smaller fish regardless of how philanthropic they may be.
-
Bigger fish with bigger profits have the potential be more philanthropic. No profit = no company = no philanthropy.
-
Many are philanthropic only to increase profits by a combination of reducing taxes and leveraging philanthropy merely as a marketing ploy.
How do our KPIs translate into profits? Suppose that we have invented a perfect software development metric that reflects the amount of truly needed and desired work, excluding any waste and accounting for just the right amount of quality. We can now use this to estimate, plan, track progress, measure performance and motivate and reward everyone equally for truly equal work.
Consider this question: do two different pieces of code that require the same amount of work and skill to produce always generate the same profit? The answer is often no. As software developers, we may spend significant effort on elements that add minimal value rather than focusing on the core functionality. This does not only apply to user interfaces; in some cases, they are what generates revenue. It refers to any non-essential code implemented for additional polish or perceived completeness. Remember the Pareto Principle: the first 90% of the work takes 10% of the time, and the last 10% of the work takes 90% of the time. Numbers may vary but you’ll get the point.
Software development metrics tend to correlate more closely with expenditures than profits. We’re all about keeping things lean and mean! We aim for less code, fewer changes, cycles, and deployments while still hitting our main targets. Why? Because less code means less hassle: it’s easier to create, test, maintain, and get new developers up to speed with. Plus, more code is like that extra closet!
Think about what that means. We want teams and individuals to produce, work, and exert less effort. Ha! You got me! The objective is to accomplish the true goals more efficiently by optimizing the metrics achieved per unit of time spent. Remember when I said “Suppose that we have invented the perfect … metric”? It’s time to “unsuppose” that. We don’t have it.
Collateral Damage
Instituting KPIs and/or other metrics is meant to and will motivate people to improve their metrics. Rewarding developers’ code output will lead to increased code production. Some do it to help their careers, others genuinely believe it’s better. Ask a developer to write more or worse code than an example, and they can always do it, often enjoying the challenge. Do you really want more code?
Negative metrics aren’t immune to this either. Producing less of anything, defects included, is even easier – just slow your work down. If you’re not producing any code, you’re not producing defects either. I once worked in a company that cut developer bonuses for causing regressions. This lead to focusing only on superficial fixes, avoiding true code improvements or “perish the thought”, refactoring. We found a defect in how we handled how daylight saving time transitions, causing unintended consequences in a piece of downstream code. The fix? Modify that downstream code to recognize that exact defect situation and undo the damage, working backwards. Later, another piece of code needed to match the timestamps from those two but, of course, couldn’t due to the fix #1. Fix #2? Don’t touch the cause! Change the new code to recognize the situation and be able to match the mismatched timestamps. those anyway. Rinse and repeat five more time times, for a total of six “fixes” each trying to recognize and undo the previous ones.
Whether this is “gaming” or “honestly applying” the system doesn’t change the results, only the motivation. Leaders must choose, apply and interpret metrics wisely.
Twisted Thoughts
To help managers understand what they’re getting into - and to show developers how to boost their productivity without doing (much) more work - I’m wrapping this up with a list of commonly used metrics, how they can backfire, and how they can be “gamed”. Since it’s easy to find what these metrics are good for, let’s dive into their dark side instead:
-
Units of Code: Rewarding units of code encourages writing more code. But we don’t want more code. Developers can churn out lines of it without actually achieving anything extra.
-
Story Points: These usually reflect the effort or complexity of work, but neither is meant to be maximized. Developers can inflate estimates, insist on addressing obscure edge cases, or subdivide stories endlessly - each with its own overhead - while their “productivity” soars.
-
Velocity (Points per Sprint): Velocity isn’t the goal; accomplishing more with less is. It can be gamed by inflating point estimates or breaking stories into fragments. And if we factor in “true” available time, the clever among us might just show less available time altogether.
-
Cycle Time: This measures the time to complete a task from start to finish, but ambiguous “cycles” make comparisons murky, even within the same team. Worse, it’s unrelated to broader company goals. Developers can game it by shrinking task sizes, which increases overhead while artificially boosting apparent velocity.
-
Lead Time (LT): Measuring the time between task creation and completion discourages long-term vision. Big-picture goals inevitably involve dependencies and longer lead times. To game this, developers might refuse to create tasks until they’re ready to start and slice them into bite-sized chunks.
-
Throughput: The number of tasks completed per period is just the inverse of cycle time or lead time, so it suffers from the same pitfalls.
-
Commit Frequency: Encouraging frequent commits sounds good - backup, undo, and sync benefits abound. But it can create overhead, context switching, incomplete code, and automation-driven gaming. Plus, it detracts from meaningful refactoring and big, impactful changes.
-
Pull/Merge Request Time: Small, incremental changes are rewarded, while comprehensive, high-value changes are penalized. Authors of those big changes pay the price. Gaming? Break work into tiny tasks and keep reviews off the radar. This metric also discourages teamwork on shared branches, which usually involves more complex work.
-
CI/CD Build Success Rate: Yes, we want all builds to succeed, but this can be gamed by minimizing tests and subdividing changes to increase the success rate. Depending on what’s counted, it penalizes complex work and clashes with metrics like commit frequency or merge request time.
-
Deployment Frequency (DF): While rapid deployment helps customers, it comes at the cost of comprehensive testing-especially how changes interact. It’s like commit frequency all over again. How fast do we really need to go? Developers can game it by deploying non-functional code or slicing work into the tiniest of tasks.
-
Code Coverage: Sure, it motivates testing, but it should also encourage code removal - dead code doesn’t need tests! Developers can game this by writing generic, meaningless tests (e.g., no assertions) or removing untested code that might still be needed down the line. See my other post, Code coverage is an indicator, not a goal. Read my post on it.
-
Defect Density: Counting defects per unit of code ignores the complexity of that code. Developers can inflate code with trivial content to lower the density, making things worse overall by increasing the volume of code.
-
Change Failure Rate (CFR): This measures production defects but can be “improved” by producing less overall or focusing on superficial, low-risk changes. Remember those layers of DST fixes I mentioned?
-
Mean Time to Recover (MTTR): The time taken to fix critical production issues is a nice idea - unless developers create defects on purpose just so they can fix them quickly to improve this metric.
-
Code Churn: Frequent and voluminous code changes face the same problems as units of code, velocity, cycle time, lead time, commit frequency, and deployment frequency.
-
Technical Debt: The concept of technical debt focuses on existing issues and the effort required to fix them but ignores the value of doing so. That “debt” exists because its value is lower than other priorities. I don’t believe in technical debt. I believe in issues, necessary work, and the value they bring. If fixing something brings no value, I won’t bother - no matter how ugly it is. Left long enough, the problem might simply go away. This can be gamed by making debt subjective or expiring old issues.
Is there a way?
In my experience, there is a way! It doesn’t rely solely on individual, raw values - though those can be helpful in some cases. Instead, it’s about using a set of metrics that align well with effort. When combined, these metrics can reveal inconsistencies that stem either from areas needing improvement or from attempts to “game the system”.
While these metrics won’t automatically pinpoint the cause of an inconsistency, they’re enough to spark an investigation. And those investigations? They often lead to process improvements or even discourage further gaming attempts. While these metrics primarily reflect effort - and therefore cost - they can also contribute to feasibility analyses when paired with perceived ROI.
What are these magical metrics? That’s a topic worthy of its own blog post. But if you’re in a hurry, feel free to reach out!
Other posts dealing with similar topics