Bugs, humans and the P!=NP problem

Liviu Coconu
6 min read · Jun 11, 2024


It works on my machine.
Nothing has changed in our code!
We never tested this use case, who does that?!

If you’re working in software engineering, chances are you’ve heard some or all of the above sentences in the context of software malfunctions, aka bugs. Chances are you’ve also heard of the most (in)famous open problem in computing, the P!=NP conjecture, a notoriously hard problem that remains without proof in spite of decades of research (if you want to try your hand at it and win the $1M Millennium Prize, here it is: https://www.claymath.org/millennium/p-vs-np/ ).

Is there a connection between software bugs and P!=NP? I think so. At the most superficial level: today, after decades of software engineering evolution (new languages, methodologies, automated testing tools, test-driven development), we still can’t write bug-free software, nor can we prove P!=NP. Indeed, I am not at all sure which one is harder.

But if that were all, I wouldn’t have written this article; it goes deeper than that. At the core of writing bug-free software lies the question: how can one tell, or prove, that a given program does what it’s supposed to do in all possible cases? In other words, how can one prove that a program is correct?

In practice, the (simplified) workflow looks like this: experts with domain knowledge write specifications, or specs, which describe what a program should do, and a bunch of smart programmers design, write, and test the code that implements that behavior. So why do we still have bugs? Apart from human error, mainly because the level of detail of specs cannot be arbitrarily high, or writing them will quickly take unreasonable amounts of time and effort.

Here’s where theoretical computer science enters the picture: a complete formal specification of what a program is supposed to do, in all detail, is equivalent to actually writing the program. This follows from the well-understood notion of Turing equivalence: any computer program is equivalent to a very simple, formally well-defined construction called a Turing machine (after Alan Turing, a pioneer of computing), and describing the behavior of the program in full detail is equivalent to fully describing a Turing machine implementing that program.

Writing perfect specs for a program is not impossible, but it poses huge practical difficulties: how should we actually collect all the boundary conditions (a term borrowed from physics for the environment in which a system exists) of a program? Fortunately, unlike physicists, we don’t need to go down to particles, fields and forces. We work further up, in the digital realm, so we have more control: some boundary conditions we can control (input interfaces, APIs, libraries, etc.), but some we can’t, namely user behavior and input data. For the latter, our specs need to go into enough detail to cover all possible use cases. Here’s where we hit a computational complexity wall: hand-crafting all the different combinations of inputs and outputs runs into a combinatorial explosion and can take an astronomical amount of time.
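To get a feel for that combinatorial wall, here is a minimal sketch. The field names and their value sets are hypothetical, invented purely for illustration; the point is only how fast the case count multiplies.

```python
# Hypothetical sketch: even a tiny form with a handful of fields
# explodes combinatorially once we try to enumerate every input case.
from itertools import product

# Assumed toy input space: each field's possible values (illustrative only).
fields = {
    "account_type": ["checking", "savings", "credit"],
    "currency": ["EUR", "USD", "GBP", "JPY"],
    "amount_bucket": ["zero", "small", "large", "overflow", "negative"],
    "user_locale": ["de", "en", "fr", "ro"],
}

# Every combination of field values is a distinct use case to spec out.
combinations = list(product(*fields.values()))
print(len(combinations))  # 3 * 4 * 5 * 4 = 240 cases for just four fields
```

Four small fields already give 240 cases; real systems have far more fields, each with far more values, and the product grows multiplicatively with every one added.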

Gimme perfect specs

But let’s assume, for the sake of argument, that we have perfect, complete specs. By Turing equivalence, this is equivalent to writing the actual program. But how can we convert it into a real-world program that runs on a computer? And how can we ensure that we end up with a program that also runs efficiently? Efficiency is a term that stems from computational complexity theory: in simple terms, a theory that tries to put estimates, in terms of time and memory, on the execution of a program. Computational complexity fills a crucial gap that Turing equivalence ignores: programs that are equivalent in terms of their behavior may differ dramatically in the time and resources needed to run and complete a task (a classic example is sorting algorithms). Clearly, a user expects their bank balance to be displayed in sub-seconds, not in a couple of years.
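The sorting example can be made concrete with a small sketch: two programs that are behaviorally identical (both sort correctly) but differ sharply in running cost as the input grows.

```python
# Sketch: behaviorally equivalent programs with very different costs.
import random
import time

def bubble_sort(a):
    # Quadratic-time sorting: compares every pair, roughly n**2 steps.
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

data = [random.random() for _ in range(2000)]

# Same behavior: both produce the identical sorted list...
assert bubble_sort(data) == sorted(data)

# ...but very different cost: built-in sort is O(n log n).
t0 = time.perf_counter(); bubble_sort(data); t1 = time.perf_counter()
sorted(data);                                 t2 = time.perf_counter()
print((t1 - t0) > (t2 - t1))  # the quadratic version is far slower
```

At 2000 elements the gap is already large; Turing equivalence sees no difference between the two, while complexity theory is precisely about that difference.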

With this, let’s contemplate two scenarios:

1. If we have complete specs, formulated in terms of inputs and outputs, how can we write the corresponding program?

There is one relatively straightforward way to do this: we can store all inputs and outputs as entries in a table. However, such a “program” (essentially a giant lookup table) is very unlikely to be efficient in terms of space: it can be enormous and not human-readable, hence almost impossible to use in practice.
But converting such a “program” into a useful, efficient one is, in general, a “hard” (*) problem that can take an amount of time and resources growing exponentially with the size of the input program. This is because telling apart stuff that only looks random but can be reduced to a much simpler form (abstractions) from stuff that is really random is known to be a hard problem (the technical details are beyond the scope of this post, but a very informative article can be found here: https://www.quantamagazine.org/complexity-theorys-50-year-journey-to-the-limits-of-knowledge-20230817/)
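The contrast between the two kinds of “program” can be sketched in a few lines; the squaring function here is just a stand-in for any behavior the specs describe.

```python
# Sketch: the same behavior as a giant lookup table vs. a compact rule.
# The table grows with the input space; the rule stays a few bytes.

# Lookup-table "program": every input/output pair stored explicitly.
table = {n: n * n for n in range(1000)}

def square_by_table(n):
    return table[n]   # only works for the enumerated inputs

def square_by_rule(n):
    return n * n      # the "abstracted" program: works for any n

print(square_by_table(12), square_by_rule(12))  # 144 144
```

Both agree on every covered input, but going from the table to the rule means discovering the underlying pattern, and that discovery step is the hard part.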

2. If we have the specs and the program, can we decide whether the program complies with the specs, in other words whether it is bug-free relative to the (assumed perfect) specs?

This is far easier: it’s enough to run the program on each and every input/output pair in the spec and check the results (**).
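Scenario 2 can be sketched directly; the spec and program below are hypothetical toy stand-ins.

```python
# Sketch of Scenario 2: checking a program against a complete
# input/output spec is just running it on every entry.
spec = {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}  # hypothetical "perfect" spec

def program(n):
    return n * n

def is_compliant(program, spec):
    # Linear in the size of the spec: the "easy to verify" direction.
    return all(program(inp) == out for inp, out in spec.items())

print(is_compliant(program, spec))  # True
```

The check runs in time proportional to the spec size, which is exactly why verification is the easy half of the story.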

The above two scenarios are nothing more (and nothing less) than P!=NP in (quite thin) disguise. P!=NP essentially conjectures that there are tasks, or problems, for which it’s computationally hard (slow) to find a solution, but easy (quick) to verify that a solution is correct. So, if it turns out that P!=NP, it is hard — or indeed, practically impossible — to write reasonably small, bug-free, efficient programs even if we had perfect specs. So what is left to do then?
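The find-versus-verify asymmetry is easiest to see on a classic NP problem. Subset-sum is a standard textbook example (not one mentioned above, chosen here for illustration): finding a subset with a given sum naively takes time exponential in the list length, while checking a proposed subset is a single pass.

```python
# Sketch of the P!=NP asymmetry using subset-sum.
from itertools import combinations

def find_subset(nums, target):
    # Brute force: tries up to 2**len(nums) subsets (the "hard" direction).
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

def verify_subset(combo, target):
    # The "easy" direction: one sum over the candidate.
    return sum(combo) == target

solution = find_subset([3, 34, 4, 12, 5, 2], 9)
print(verify_subset(solution, 9))  # True
```

Finding is expensive, checking is cheap: the same shape as "writing a correct program is hard, checking it against the spec is easy."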

The ways of Deep Learning

Scenario 1 talks about giant lookup-table programs. As it happens, a particular flavor of such a program is encountered in modern deep learning: a neural network is essentially a lookup table encoded in a smarter way (OK, in a much smarter way…). The whole point of deep learning is to end up with lookup tables that encode a humongous amount of input/output data into a much smaller structure — though not quite a program in the common sense. Unlike a lookup table, not all possible input/output combinations are stored; rather, the algorithm “learns” patterns from a (large, but not exhaustive) training set. So there we go, problem solved! Or not?
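A neural network is far beyond a blog snippet, but the table-versus-learned-model contrast can be sketched with the simplest possible “learner”: a least-squares line fit on sparse training data. This is an assumed toy, not deep learning, but it shows the same compression idea: a handful of parameters replace the table and generalize to unseen inputs.

```python
# Minimal sketch: a lookup table only "knows" its training pairs,
# while a fitted model compresses them into two parameters and
# interpolates (approximately) to inputs it never saw.
train = [(x, 2 * x + 1) for x in range(0, 100, 10)]  # sparse data

table = dict(train)

# "Learn" slope and intercept with a closed-form least-squares fit.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def model(x):
    return slope * x + intercept

print(7 in table)       # False: the table has no entry for unseen input
print(round(model(7)))  # 15: the model generalizes to x=7
```

Ten stored pairs collapse into two numbers, and those two numbers answer queries the table cannot — the same trade a neural network makes at vastly larger scale.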

Well, not quite. Deep learning always provides an approximate solution to any given task. In many cases, particularly very complex ones, this is what we want, because the input-output mapping has some tolerance (cat classification may reach, say, 95% or 99% accuracy, but not 100%). But we wouldn’t want, for instance, our banking app to do money transactions approximately, with occasional errors.

So we still need regular programs with well-defined input-output behavior. One idea would be to use deep learning models not as the program itself, but as (approximate) specs against which to develop and test the regular program. In doing so, we can hope to create a simplified model of reality that is, at the same time, much more detailed than any specs humans could come up with in reasonable time. We then write the program and apply the P part of P!=NP (Scenario 2), checking the human-written program against the DL model — possibly in an automated way. The result will be as bug-free as the degree to which the simplified DL model mimics reality. That’s all the mighty P!=NP lets us get away with…
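The model-as-approximate-spec idea might look something like the following sketch. Everything here is hypothetical: `learned_spec` stands in for a trained model (simulated with bounded noise), and the tolerance and sampling range are arbitrary illustrative choices.

```python
# Sketch: test a hand-written program against a (hypothetical) learned
# model, within a tolerance, over far more inputs than any human-written
# spec would enumerate.
import random

def learned_spec(x):
    # Stand-in for a DL model: approximates the "true" behavior, with noise.
    return 2 * x + 1 + random.uniform(-0.05, 0.05)

def program(x):
    # The human-written program under test.
    return 2 * x + 1

def check_against_model(program, model, trials=1000, tolerance=0.1):
    random.seed(0)  # reproducible sampling
    for _ in range(trials):
        x = random.uniform(-1000, 1000)
        if abs(program(x) - model(x)) > tolerance:
            return False
    return True

print(check_against_model(program, learned_spec))  # True
```

This is just Scenario 2 with a fuzzy oracle: the check stays cheap (linear in the number of sampled inputs), and its guarantee is only as good as the model and the tolerance.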

(*) What computer scientists mean by “hard” here is (as far as we know) exponential time in terms of input size. For example, running the program for input size N=10 could take one second, but running it for N=100 would take an hour, for N=1000 a few years, and for larger sizes more than the age of the Universe. So we can’t just throw faster computers at NP-hard problems: no matter how fast they are, they will still take an astronomical time for some input size.
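A quick back-of-the-envelope run makes the footnote concrete: assuming (hypothetically) one step per nanosecond, 2**N steps outgrow any hardware almost immediately.

```python
# If each step takes a nanosecond, how long does 2**N steps take?
for n in [10, 30, 60, 90]:
    seconds = (2 ** n) * 1e-9
    print(n, f"{seconds:.3g} s")
```

N=10 is about a microsecond, N=30 about a second, N=60 already decades, and N=90 is past the age of the Universe — a thousandfold faster machine only pushes each threshold up by ten.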

(**) By this point in the argument, I’ve seen people quote Turing decidability and the halting problem to (incorrectly) claim that it’s impossible to determine whether a given program conforms to its specs. But the halting problem is formulated in terms of arbitrary programs: it states that there can be no program that decides, for every other program, whether it will halt. It does not apply to a single specific program: for any specific, finite program and spec, the question has a definite answer.
