On the Cost of Code Reuse
- Increases Compile Times -> Lowers Programmer Productivity
- Harder to Change and doesn’t always create Higher Level Abstractions
- Harder to Understand/Debug
- Final Thoughts
Increases Compile Times -> Lowers Programmer Productivity
(Preface: This section will be in the context of C++, though the points will generally apply to other programming languages as well.)
Why Compile Time is Important
- You will start to batch changes together before rerunning tests. This makes debugging harder, because bugs are not caught early and instead cascade, so the cause of a unit test failure at a particular call site may actually lie earlier in the call stack.
- There will be less test coverage. If compiling my code and running the tests takes, say, over 20 minutes each time, I am significantly less likely to test it thoroughly, opting instead to cover just the “happy path” and a little more. This is because in the real world, there are deadlines to meet. Once again, bugs will be caught later (at demo time rather than test time), and in this situation the consequences are even worse: instead of localizing the error to your current git changes and the call paths of a few unit tests, you will need to wade through logs and do the localization yourself.
- The shorter the compile time, the longer you stay in a flow state. As humans, it is hard for us to completely suspend our train of thought while we wait for the compiler, so we typically look for some other task to fill the downtime. The problem is that once the compile finishes, we have to bring the original line of thinking back into working memory. When compile times are short, we never need to “flush our brain’s cache”.
- Hotfixes become more stressful, because long compiles stretch out the time you spend under pressure.
How Code Reuse affects Compile Time
(Preface: Again, this is in the context of C++, though the idea applies to other programming languages as well, even if the terminology and the size of the effect differ.)
Often we include an entire header just to call one or two functions from it. This forces the compiler to parse not only the unnecessary lines in that header, but also every header it includes, recursively. It gets even worse: if the header is not from a third-party library but internal to the codebase, then any change to it forces every file that depends on it to recompile, so the average incremental compile time goes up. If instead we define the functions ourselves, we parse exactly what we need and no more. So the next time you think about importing yet another library with npm, consider whether you can write the function yourself.
Harder to Change and doesn’t always create Higher Level Abstractions
An example of when code reuse gets you into trouble is the “Fragile Base Class” problem. To reuse the code in class A, we make class B inherit from it. The issue is that A no longer obeys the Single Responsibility Principle: class A must maintain not only the correctness of its own public interface, but also the correctness of every method and class invariant that B depends on or expects A to keep enforcing.
And this applies to every subsequent class that inherits from A or B. What eventually happens, once your inheritance hierarchy is deep enough, is that changing a method in a parent class breaks the functionality of a child class, the classic example being infinite recursion. One can argue this is an issue caused by virtual methods, but the idea is the same even without virtual methods, and even without classes. Every time you change a function, you need to check all of its call sites to make sure the function is still valid there. Abstracting code into a function reduces future maintenance by removing redundancy, because a change to the function propagates automatically to all of its call sites. For exactly the same reason, it can increase maintenance: a change made to satisfy the call site at location 1 inadvertently spreads to all the other call sites too. What was once the same piece of functionality has become different.
That said, there is a good fix for this: copy and paste the function into a new one, change the bit that you need to change, and call the new function at location 1 instead of the old one. Unfortunately, what people tend to do is modify the old function to fit in the “new” logic, which shows up as the code smell of additional “boolean flags” in the function signature. The problem is that the function quickly becomes littered with if statements, leading to “tangled logic”. And at a more philosophical level, the function becomes more about reusing code structure (like a macro) than about reusing or creating higher-level abstractions. For a concrete example of this, consider the “fold/reduce” function found in many programming languages. It is often claimed to make code higher level and more “declarative”. But does it? Consider the following example.
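// Implementation 1: fold with an explicit starting count of 0
// (Rust's reduce would instead seed the accumulator with the first element)
let sum1 = vec.clone().into_iter()
    .fold(0, |accum, item| {
        if pred(item) { accum + 1 } else { accum }
    });
// Implementation 2: Rust has no count_if, so filter followed by count plays that role
let sum2 = vec.clone().into_iter()
    .filter(|item| pred(*item))
    .count();
// Implementation 3: a raw loop
let mut accum: i32 = 0;
for item in vec {
    if pred(item) { accum += 1 }
}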
When I read the for loop implementation (Implementation 3), I have to read what is inside the loop to understand what the code is doing, which supposedly makes it “low level”. But when I read the fold implementation (Implementation 1), I still have to read the lambda to understand what is going on. While I agree with Sean Parent’s “no raw loops” guideline, I do not believe fold/reduce is the solution. Rather, one should use functions such as count_if (std::count_if in C++; filter followed by count in Rust), which are more restrictive than fold/reduce and thus better express intent. count_if is declarative. fold/reduce is not.
Harder to Understand/Debug
Suppose we are reading log files from a production server to debug an issue, and the relevant code looks like this:
void helper_function(Object& oranges){
    log_line("2nd log line", oranges.string());
    // ...
}
void func(Object& apples){
    log_line("first log line");
    // ...
    helper_function(apples);
    // ...
}
// In the logs, we see
// [INFO] first log line
// ... stuff
// [INFO] 2nd log line, apples(weight:10)
// ... stuff2
// [INFO] 2nd log line, apples(weight:20)
This can happen if, between func’s call to log_line and its call to helper_function, some other code path also calls helper_function. Now the question is: which “2nd log line” corresponds to the call made by func? Perhaps by examining the surrounding log lines you can disambiguate. But if there are many log lines in between (the stuff2 above), you may not even notice that there was a duplicate!
Another concern is the mapping of variable names through the call path. In the code above, I purposely created a naming mismatch between the function signature and the object passed in by func. Although it is usually not that extreme, helper functions tend to use different variable names from their callers, and it takes non-trivial effort to keep this mapping in mind, especially for multiple variables across multiple call depths. When trying to understand a codebase, you can forget your original purpose after 3-5 “jump to definition” calls in your IDE/language server, partly because of this, and partly because while you were trying to understand something “specific”, the call path pulls you out into something more “general”, just for the sake of “code reuse”.
Final Thoughts
Now, there are certainly benefits to code reuse. It saves the time it would take to reimplement something. Reusing code can also mean using code that someone has already tested. And it lets programmers who are not experts in a certain subject still write code for that domain. But as I hope this article shows, code reuse is not a “zero-cost” thing. So the next time you see duplicated code in your codebase and want to refactor it, be wary that you are not simply “seeing faces in the clouds”. As Sandi Metz has said: “Code duplication is far cheaper than the wrong abstraction”.[1]
[1] See https://news.ycombinator.com/item?id=12061453 for more discussion about this.