Entropy in Engineering Processes
Entropy is a simple concept. Yet every explanation available on the Internet seems unnecessarily complicated. As a result, the concept is poorly understood outside specialized circles, and the word is often used in less than rigorous ways. Perhaps the root of the problem is that entropy was first discovered in the context of Thermodynamics. Thermodynamics, however, is not required to introduce entropy.
Let’s see.
Every process known to humanity has room for uncertainty. The entropy of a process is the size of that room. For a perfectly reproducible process, the size of the room is zero – given the same initial conditions the outcome is the same every time. This is all we need to know from a conceptual perspective.
We will now proceed with a more detailed analysis, but to secure my position on the conceptual front, I will first introduce an argument from authority by quoting a smarter person who has studied the subject extensively:
"… hencerforth we will consider entropy and uncertainty as synonymous”
That person is Edwin T. Jaynes.
Now that I'm safe from undeserved prejudice, let me list a few conceivable processes:
* configuring a Linux base image for a specific server role
* rebuilding a Docker base image for a specific container role
* setting up a complex cloud environment from scratch
* committing this post to its git repository and letting CI/CD publish it
* baking a cake
* preparing a coffee
* parking a car
* landing a plane
* approaching a stranger in a bar
* having a shower
* painting a picture
* delivering a baby
* flipping a coin
All these processes contain deterministic elements and elements that include significant uncertainty. All of them could go wrong – to different extents, for different reasons, with different consequences.
In the artistic world, processes should not be reproducible as that would eliminate creativity. In the engineering world, creativity is undesired in the execution of processes. In fact, reproducibility is essential, as irreproducibility could cost lives or even money.
The formal language of statistics would say that every process is represented by a random variable with a certain number of possible outcomes N. If we order those outcomes with an index i, we can define the entropy of a process by:
\( S = - \sum\limits_{i=1}^{N} P_i \ln(P_i) \) (1)
where \( P_i \) is the probability of the i-th outcome. It follows from this definition that if there is only one possible outcome (which will have probability one) the entropy of the process is zero. It also follows that entropy is maximized when all the outcomes have equal probability \( 1/N \), in which case we have:
\( S = S_{max} = \ln(N) \) (2)
If an outcome of index h is much more likely than all the others, we have:
\( S \simeq - P_h \ln(P_h) \) (3)
and, since \( P_h \) is close to one, entropy is nearly zero. In that case, the “effective” number of outcomes, \( e^S \), will be nearly one:
\( N \simeq e^{- P_h \ln(P_h)} \) (4)
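For readers who prefer code to symbols, here is a minimal Python sketch of expressions (1), (2) and (4), assuming natural logarithms so that entropy is measured in nats; the function names are mine, not taken from any library:

```python
import math

def entropy(probs):
    """Expression (1): S = -sum(P_i * ln(P_i)), skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def effective_outcomes(probs):
    """The 'effective' number of outcomes, e**S; equals N for a uniform distribution."""
    return math.exp(entropy(probs))

# Four equally likely outcomes: entropy is at its maximum, ln(4), as in expression (2).
print(entropy([0.25] * 4))              # ~1.386, i.e. math.log(4)
print(effective_outcomes([0.25] * 4))   # ~4.0

# One outcome dominates: entropy is nearly zero and the effective N is nearly one.
print(entropy([0.97, 0.01, 0.01, 0.01]))             # ~0.17
print(effective_outcomes([0.97, 0.01, 0.01, 0.01]))  # ~1.18
```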
Whenever we don’t know the probabilities of the different outcomes we should prepare for the worst, and therefore expect entropy to be at its maximum, as given by (2). The way to lower that maximum is to bring N as close as possible to the number of successful outcomes – for a reproducible process that number is one.
In addition to reducing the number of possible outcomes, we can further reduce entropy in any given process by reducing the probability of every undesired outcome. For a reproducible process, this means that expressions (3) and (4) hold, with index h representing the successful outcome.
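As a toy illustration of both moves (the numbers are made up): a process with four equally likely outcomes sits at the maximum \( S = \ln(4) \simeq 1.39 \). Eliminating two of those outcomes lowers the ceiling to \( \ln(2) \simeq 0.69 \), and further concentrating probability on the successful outcome – say the successful outcome ends up with probability 0.98 and the remaining undesired one with 0.02 – gives \( S = -0.98 \ln(0.98) - 0.02 \ln(0.02) \simeq 0.10 \), well below that ceiling.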
Obviously, every skilled engineer knows how important it is to reduce the number of undesired outcomes in a process and to reduce the probability of the ones that can’t be fully eliminated. But perhaps it's not so obvious that both actions are precisely aiming at reducing entropy.
What do those actions require in practice?
That’s where we find a slightly thermodynamical law: in order to reduce entropy we need to spend energy, but energy expenditure does not guarantee a reduction of entropy. Lowering entropy requires energy to be spent in the very particular way that reduces uncertainty. Unfocused energy expenditure will likely increase entropy.
But why is that?
Well, there are many more ways in which a process can have high entropy than there are ways for it to have low entropy, so any changes to a process without an entropy-lowering focus will likely increase it. In short, high entropy is generic whereas low entropy is specific.
Take the coin flipping process as an example. What are the actual possible outcomes?
* heads, no issues
* tails, no issues
* coin accidentally fell out of a window
* heads, but further flips were blocked for 10 minutes because the coin slipped under a sofa
* tails, but further flips were blocked for 10 minutes because the coin slipped under a sofa
* heads, but something got broken
* tails, but something got broken
* heads, but someone got hurt
* tails, but someone got hurt
The coin flipping process allows two successful outcomes: "heads, no issues" and "tails, no issues". The minimum entropy is, therefore, \( \ln(2) \). This process is not reproducible because it allows more than one successful outcome, but we'd like to make it as reproducible as possible by excluding every other outcome.
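To put rough numbers on this (the probabilities below are entirely made up for illustration), a quick sketch using the same entropy function as before:

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical probabilities for the nine outcomes listed above, in the same order.
careless_flipping = [0.40, 0.40, 0.05, 0.04, 0.04, 0.03, 0.02, 0.01, 0.01]

# After clearing the area, closing the windows and rehearsing the maneuver,
# the undesired outcomes are nearly (but not completely) eliminated.
prepared_flipping = [0.499, 0.499, 0.0005, 0.0005, 0.0005, 0.0005, 0.0, 0.0, 0.0]

print(entropy(careless_flipping))  # ~1.42 nats, well above the ln(2) floor
print(entropy(prepared_flipping))  # ~0.71 nats, close to ln(2) ~ 0.693
```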
When designing the coin flipping process, you could spend energy trying to get the prettiest or trendiest coin in the market, but that would do nothing to lower entropy in the process. In fact, it could even increase it if, for instance, the chosen coin was heavier than necessary, which would cause an increase in the probability of the last 4 undesired outcomes. You could also spend energy in decorating the execution environment with porcelain objects to suit your taste, unwittingly increasing entropy without adding any business value. Or you could leave the room as it is – by default, it's likely to have more porcelain than desired already.
On the other hand, if you're aware of every undesired outcome, you could spend energy preparing the coin flipping environment so as to avoid them: documenting and exercising the flipping maneuver, clearing the area, closing windows, buying replacement coins and so on. That is, you could perform the very specific actions that lower entropy, increasing performance and the probability of successful execution.
This description of the coin flipping process is a reductio ad absurdum exercise which illustrates my point in a way that's easy to follow. We could, of course, redo the exercise with a software engineering process, where porcelain objects are easy to find. That would come at the risk of discovering that Pip is a significant source of entropy compared to the apt python packages that are part of Ubuntu and Debian LTS distributions; or that Kubernetes results in unsustainable entropy levels below a certain business scale. To avoid offending my readers, I’ll go back to the coin flipping example.
I would not be surprised if, rather than lowering the entropy, certain coin flipping developers opted to increase it.
"You see, the coin flipping process has performance problems: coins are often lost out of a window requiring the purchase of replacements, coins get stuck under a sofa, coins break objects in the surroundings… all of this causes slowness or even downtimes, leaving the consumers of the coin flipping service waiting for too long, some times just to receive a timeout error".
The obvious solution, some would say, is having multiple coin flippers serving the coin flipping requests, to increase the probability that at least one is available when necessary. Since this requires coordination, a coin flipping load balancer would be needed, and since the balancer himself could fail, we would need a synchronized pair of coin flipping load balancers.
Assuming 4 parallel coin flippers are enough, we would have increased the staff cost by a factor of 6 without optimizing the individual flipping process, which is prone to failure. But ... hey! ... we have an Elastic Coin Flipping service! As for entropy, calculating the possible undesired outcomes of this Elastic Coin Flipping service is homework for the reader.