Today, Carlton Gibson posted a wonderful short programming question:
I love little questions like this, and instantly came up with a solution. However, before I shared what I came up with, I took a quick look at the answers, and saw a bevy of very creative solutions.
None of these felt like an awesome solution for me. I’ve had some similar thoughts on some related matters, and it felt like time for a blog post.
My problem with clever code
I feel like all of these solutions are trying to be very clever. This is by no means an attack on any of these tweeters; they are all very smart people and they are trying to write code that is “Pythonic”, which has come to mean using things like lambdas, list comprehensions, built-in functions, and concise code to express interesting ideas. This isn’t bad code either; it works and is passable.
So what’s my problem with it?
Coincidentally, I gave a talk at Citrix (where I work my day job) on C++17; I was presenting an old conference talk from 2017. One section talked about expressiveness and the things that modern C++ improved upon. In this morning’s presentation, I put up a slide about how I measure expressiveness, using the following ratio:
Expressiveness is defined as “Understandable Intent : # of characters”
Put another way, if you want to make your code more expressive, you need to drastically improve understandable intent without adding too much code, or reduce the number of characters without drastically decreasing understandable intent.
And herein lies my problem with the one-liners above: they sacrifice understandable intent in order to shorten the code.
But… but … I can understand those lines perfectly… I hear you say. And yes, we do understand those one liners, but we are also burdened with the curse of knowledge. We have been living and breathing Python for a while, and we may be comfortable with more advanced topics, but we also have to consider how accessible this code is. How easy is it to review? How easy is it to understand if you don’t know the problem it’s solving? How easy is it for new developers or developers new to Python to grok?
I don’t think any of the proposed solutions violate the principle of least surprise, but I don’t think they are high on understandable intent either. I have long been a proponent of looking at a function and seeing if the code is self-explanatory. Take one of the more favored solutions (my favorite, at least, of the options provided):
def are_bools_same(bools: list[bool]) -> bool:
    return all(bools) or not any(bools)
And again, I think this passes the sniff test, but I realized something today. It’s not enough to say, given a function, will the code seem self-explanatory. You also have to look at the code in isolation, and ask yourself if you could tell me an appropriate function name for it.
Forget the problem for a bit, and tell me what these functions should be named:
def ???(bools: list[bool]) -> bool:
    return len(set(bools)) == 1

def ???(bools: list[bool]) -> bool:
    return sorted(bools)[0] == sorted(bools)[-1]

def ???(bools: list[bool]) -> bool:
    return functools.reduce(lambda x,y: x==y, bools)
All of these are the same function, solving the same problem, but I would really have to think about things if I did not know the function name to figure out what it’s doing. Yet this is what every reader/maintainer/reviewer will do as they come across this code, trying to understand what this line of code does (definitely put code like this in a named function, to provide some documentation, but this is just a band-aid, and not a solution).
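One way to answer the naming question without guessing is to run the candidates against a few inputs. The sketch below is not from the original thread; candidate_set, candidate_sorted, and candidate_reduce are hypothetical labels I am attaching to the three snippets above.

```python
import functools


def candidate_set(bools: list[bool]) -> bool:
    return len(set(bools)) == 1


def candidate_sorted(bools: list[bool]) -> bool:
    return sorted(bools)[0] == sorted(bools)[-1]


def candidate_reduce(bools: list[bool]) -> bool:
    return functools.reduce(lambda x, y: x == y, bools)


# A handful of non-empty inputs; each candidate is run on every one of them.
cases = [[True, True, True], [False, False, False], [True, False, True]]
for case in cases:
    results = {
        "set": candidate_set(case),
        "sorted": candidate_sorted(case),
        "reduce": candidate_reduce(case),
    }
    print(case, results)
```

On the list of three False values, the reduce version returns False while the other two return True, because the accumulator carries the result of the previous comparison rather than the previous element. That kind of difference is easy to miss when reading a one-liner, which is rather the point.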
Ultimately, my eye caught this tweet in the replies, and I ended up agreeing with it:
Python One-Liners
I’m in the middle of reading Python One-Liners at the moment, and I have similar complaints. Don’t get me wrong, I like the book and I think there are a lot of really good techniques in it; my problem is with the message it sends. The author holds one-liners up as the goal of writing “Pythonic” code, but I feel like this is decreasing # of characters without focusing on understandable intent.
Take this example for a second, and try to tell me what name you would give it if it were in a function:
q = lambda l: q( [x for x in l[1:] if x <= l[0]]) + [l[0]] + q([x for x in l if x > l[0]]) if l else []
I’m going to write this my own way, and let’s see if you can figure out what this code was doing:
def quicksort(l: list) -> list:
    if not l:
        return []
    partition = l[0]
    # list comprehensions (not generators), so the recursive calls can test emptiness and index
    return quicksort([x for x in l[1:] if x <= partition]) + [partition] + quicksort([x for x in l[1:] if x > partition])
Yes, my way is more verbose. But which would you rather be maintaining, or reviewing, or trying to understand?
As an industry, I think we have an affinity for the shiny features of a language (and this is coming from me, who loves the shiny new features of a language), but we don’t often talk about why we use those shiny bits.
In the above example, the main change I made was to just create a named function. I know, I wasn’t one of the cool kids using lambda, but that’s because I didn’t have a use case that warrants it. Lambdas are for anonymous functions — they are used to avoid polluting scope. But in the one-liner, we are immediately saving the function to a variable named q, in our current scope. We’ve picked the wrong abstraction for the job (and those of you who have read my book know that I talk about this a bunch in the first chapter).
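A small illustration of how little is gained by binding a lambda to a name. This is only a sketch, and q_lambda and q_def are hypothetical names; it shows that a def statement produces a function that knows its own name, while a lambda reports a generic one wherever that name is displayed, such as in tracebacks.

```python
# A lambda bound to a variable still identifies itself as "<lambda>".
q_lambda = lambda items: sorted(items)


# A def statement binds the same kind of object, but it carries its own name.
def q_def(items: list) -> list:
    return sorted(items)


print(q_lambda.__name__)  # <lambda>
print(q_def.__name__)     # q_def
```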
And the root cause of this is (in my opinion) a drive to write “Pythonic” code.
“Pythonic” Code
When Python entered the programming scene, we were all wowed at how easy it was to write simple code that just worked. Over time, the sheer number of convenient built-ins, comprehension capabilities, and well-named operations on types made it really easy to identify “Pythonic” code.
I didn’t have to loop over the indices of a collection; I could just write for item in collection instead (or use enumerate if I needed the index).
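For readers newer to Python, the three loop styles mentioned here look roughly like this. The collection is a made-up example.

```python
collection = ["a", "b", "c"]  # hypothetical sample data

# Index-based loop, the style carried over from many other languages.
for i in range(len(collection)):
    print(collection[i])

# Direct iteration says "do this for each item" and nothing more.
for item in collection:
    print(item)

# enumerate supplies the index only when it is actually needed.
for index, item in enumerate(collection):
    print(index, item)
```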
But over time, I think we lost sight of what it means to be “Pythonic”. If we look at the “Zen of Python”, one line states “There should be one-- and preferably only one --obvious way to do it”. But as this post shows, there are multiple ways of doing things, all of them valid. How do we distinguish the most “Pythonic” way of doing something? Is it the one that uses the most features? The shortest amount of code? No. It’s the most expressive.
Every choice we make conveys an intention. Lambdas convey anonymous functions that don’t pollute scope. While loops convey looping until a condition happens, not iterating over a collection. Dictionaries represent key-value lookups, not collections of heterogeneous data. Every time you go against the intent of the abstraction you use, you add cognitive burden to the user, and you decrease understandable intent.
Really, I think we get so caught up in the “one way to do it” when, instead, I think the Zen of Python intends to say “one way to convey the specific concept you are intending”. Thus, I don’t think having multiple ways to solve a problem contradicts the Zen of Python, because there’s probably one of those ways that is the most representative of the intent, and that is what we should be writing for. Use the language features if they most accurately convey your intent, and don’t use them if they don’t.
(This is actually why I think map/filter are so contentious: they overlap so much with generator expressions that it is rarely clear when I would use one over the other. Maybe, just maybe, I could say to use map/filter if you have a named function to apply to a collection, but I also think that (f(x) for x in collection) is just as clear, in roughly the same amount of code.)
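To make the overlap concrete, here is a side-by-side sketch. The word list and variable names are invented for illustration; str.upper stands in for "a named function to apply to a collection".

```python
words = ["spam", "eggs", "ham"]  # hypothetical sample data

# map with an existing named function: nothing extra to read.
upper_with_map = list(map(str.upper, words))

# The equivalent generator expression, in roughly the same amount of code.
upper_with_genexp = list(w.upper() for w in words)

# filter needs a lambda here, while the generator expression states the condition inline.
long_with_filter = list(filter(lambda w: len(w) > 3, words))
long_with_genexp = list(w for w in words if len(w) > 3)

print(upper_with_map == upper_with_genexp)
print(long_with_filter == long_with_genexp)
```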
My solution
So, given my advice above, how did I go about proposing my solution? Well, first, I thought about ways to express the problem domain. If you were to ask me how to see if everything was the same, I would never think that we would need a set, or to sort data, or to rely on integer conversions from bool (the sum(bools) approach). Those are implementation details, and I feel like those are the wrong abstractions related to the domain. Instead, I want to check every element in the list for some predicate. This means that an all() function seems most appropriate, since I’m checking all elements in a list:
all(predicate(b) for b in bools)
Now what should that predicate be? Well, I want to make sure that every element matches every other element, which means every element should be equal to the first:
all(b == bools[0] for b in bools)
Finally, I want to handle an empty list (I chose False for an empty list, but if you want True, the snippet above works just fine):
all(b == bools[0] for b in bools) if bools else False
I feel this is the most clear, and also the most accessible. Yes, I’m using a built-in and a generator expression, but I feel like those are more likely to be understood than lambdas, functools, set operations, boolean truth tables, sorted properties, or slicing that the other solutions proposed.
Accessibility is a subjective measure, I will warn. If you (and your team and organization — you typically don’t code in a vacuum) are comfortable with functional programming, then I think reduce/functools/lambdas might be okay. If you are mostly junior or new to Python, then I would be okay with a for loop instead of the one-liner. However, in my experience mentoring, running a Python meet-up, and trawling through many a code base, I think this solution fits a nice middle ground with expressiveness. In other words, I think understandable intent is high, while not being too much code at all.
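For a team that is mostly new to Python, the explicit loop might look something like the sketch below. It keeps the same choice made earlier (an empty list counts as "not the same"); the variable name first is my own.

```python
def are_bools_same(bools: list[bool]) -> bool:
    # An empty list is treated as "not the same", matching the one-liner.
    if not bools:
        return False
    first = bools[0]
    for b in bools:
        # Stop at the first element that differs from the first one.
        if b != first:
            return False
    return True
```

It is longer, but every step can be read aloud, and it stops at the first mismatch just as all() does.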
(P. S. – some people might bring up performance concerns, and those are valid. If you are in a performance critical piece of code, AND you have demonstrably measured and identified the code as a problematic bottleneck – then you can sacrifice readability for performance, but you better have a really really good comment explaining why)
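If performance really is in question, measuring is cheap. Here is a minimal sketch using the standard library’s timeit; the function names, the sample list, and the call count are all assumptions for illustration, and real measurements should use data from the actual hot path.

```python
import timeit


def all_same_generator(bools: list[bool]) -> bool:
    return all(b == bools[0] for b in bools) if bools else False


def all_same_set(bools: list[bool]) -> bool:
    return len(set(bools)) == 1


# Hypothetical workload: ten thousand identical values.
sample = [True] * 10_000

for func in (all_same_generator, all_same_set):
    # Time 200 calls of each candidate on the same input.
    seconds = timeit.timeit(lambda: func(sample), number=200)
    print(f"{func.__name__}: {seconds:.4f}s for 200 calls")
```

The numbers will vary by machine and input shape, so they are only meaningful when compared on the workload you actually care about.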
(P.P.S – here’s the code in a non one-liner. I like this slightly less, but it’s down to personal preference at this point)
def are_bools_same(bools: list[bool]) -> bool:
    if not bools:
        return False
    # check every element against the first element to see if they match
    return all(b == bools[0] for b in bools)
Commence the Feedback
(Note – this section was added after I published the blog)
One of the nice things about having a blog (or book) is that you can espouse your opinions as much as you want. One of the bad things is that people might have cognitive bias and treat anything written down in text as “the one and only way” (Ask me how bad impostor syndrome was when writing Robust Python). As we learned with the Zen of Python, intent matters, and I intend to share some airwaves with responses to this blog post that I thought were insightful, even if I still disagree 🙂
This tweet by Ryan Hiebert did give me pause, especially since the all/not any approach was a close second favorite of mine.
def ???(bools: list[bool]) -> bool:
    return all(b == bools[0] for b in bools) if bools else False
You might call this function “are elements the same” or “do elements match first element”.
This is, admittedly, a different intent than “all true or all false”. I sacrificed semantics for genericity, but that might be the absolute wrong approach in a codebase that is very focused on boolean parameters and boolean logic. When thinking about accessibility of your code, keep in mind your collaborators and what they will intuit from your code. If they are comfortable with boolean truth tables, then the all/not any approach is great. If they are FP aficionados, sure, use a lambda and a reduce (or map). Remember that my solution was aimed at a common denominator (one that I picked based on mentoring experiences); you can absolutely stray from that denominator if it makes sense for your codebase and collaborators.
Ryan also brought up that if the function were named all_or_none, the all/not any approach would be more appropriate, and with this I wholeheartedly agree (it goes to show you how important a name is; I didn’t consider this name, and if the name had been in the tweet, I would have gone right along with it). But in the spirit of this article, I believe that both of us can be right; it all depends on the intent of the codebase. If you want all_or_none (as suggested by the prose of the original tweet), you should use the all/not any approach. If you want all_the_same (as suggested by the code in the original tweet), I still maintain that my approach better conveys that intent.
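The two intents can be put side by side to see where they part ways. This is a sketch under my own assumptions: the function names come from the discussion above, but the loosened list type hint and the sample inputs are mine.

```python
def all_or_none(values: list) -> bool:
    # "Everything is truthy, or nothing is."
    return all(values) or not any(values)


def all_the_same(values: list) -> bool:
    # "Every element equals the first"; an empty list counts as False.
    return all(v == values[0] for v in values) if values else False


for values in ([True, True], [False, False], [True, False], [], [1, 2]):
    print(values, all_or_none(values), all_the_same(values))
```

For lists of real booleans the two agree whenever the list is non-empty. They diverge on the empty list (all_or_none returns True, all_the_same returns False) and on truthy values that differ from one another, such as [1, 2].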
Ultimately, I’ll pass the buck and call this a specification problem (with my tongue in my cheek): the customer gave us slightly contradictory intents, which illustrates that you really can’t trust written requirements and should always go talk to your customer. After all, requirements are the hardest part of software engineering.
Solving this without using any() or all() is like coming up with your own way to count a string instead of using len().
The only reason you would do it that way is if you were not aware of the built-in function.