Education, Intelligence and the Turing Test
TL;DR: Can ChatGPT or a similar system pass the Turing test now or in a few years? Very likely yes. Does that mean we finally have human-like intelligence in machines? Most likely not. To see why, we go through the history of computing and the problem of induction, draw a parallel to the hard problem of consciousness in the context of intelligence, and conclude with an extension to the Turing test: the Herbert test, named after Frank Herbert, whose quote below sums it all up nicely:
“Education is no substitute for intelligence. That elusive quality is defined only in part by puzzle-solving ability. It is in the creation of new puzzles reflecting what your senses report that you round out the definition.”
(Frank Herbert, “Chapterhouse: Dune”)
1. The problem
If you haven’t tried ChatGPT, you probably should. It’s an AI bot that has been specifically optimized for conversation, and in many respects it does an impressive job of holding human-like conversations; see for yourself. But is it “intelligent”? That’s a very different question, because the answer largely depends on what you understand by “intelligent”, your level of expectations and, of course, your own intelligence.
Enter the Turing test: in 1950 (please note the year, that’s more than 70 years ago), computing pioneer Alan Turing, one of the greatest minds of all time, suggested judging whether a computer has achieved intelligence by means of a relatively simple test. Let’s ask ChatGPT:
“Me: What is the Turing test?
ChatGPT: The Turing Test is a test for determining whether or not a machine can exhibit intelligent behavior that is indistinguishable from a human. It is named after Alan Turing, the British mathematician and computer scientist who proposed the test in 1950.
The Turing Test involves three participants: a human evaluator, a human subject, and a machine. The evaluator communicates with both the human subject and the machine through a text-based interface, without knowing which is which. The evaluator’s goal is to determine which of the two is the human and which is the machine based on their responses. If the evaluator is unable to distinguish the machine from the human, the machine is said to have passed the Turing Test.”
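The protocol ChatGPT describes can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the names `run_trial`, `detection_rate`, `human`, `robotic_machine` and `tell_spotting_evaluator` are hypothetical, and the "participants" are trivial stand-in functions rather than real people or models. The point is the shape of the test: hidden labels, two transcripts, one guess.

```python
import random
from typing import Callable, Dict, List, Tuple

Responder = Callable[[str], str]                    # maps a question to an answer
Transcript = List[Tuple[str, str]]                  # (question, answer) pairs
Evaluator = Callable[[Dict[str, Transcript]], str]  # returns the label it believes is the machine


def run_trial(evaluator: Evaluator, human: Responder, machine: Responder,
              questions: List[str], rng: random.Random) -> bool:
    """Run one blind trial; return True if the evaluator spots the machine."""
    # Hide both identities behind randomly assigned labels "A" and "B".
    machine_label = rng.choice(["A", "B"])
    human_label = "B" if machine_label == "A" else "A"
    responders = {machine_label: machine, human_label: human}
    # Always build the transcripts in the order A, B so ordering leaks nothing.
    transcripts = {
        label: [(q, responders[label](q)) for q in questions]
        for label in ("A", "B")
    }
    return evaluator(transcripts) == machine_label


def detection_rate(evaluator: Evaluator, human: Responder, machine: Responder,
                   questions: List[str], trials: int = 1000, seed: int = 0) -> float:
    """Fraction of trials in which the evaluator correctly identifies the machine."""
    rng = random.Random(seed)
    hits = sum(run_trial(evaluator, human, machine, questions, rng)
               for _ in range(trials))
    return hits / trials


# Hypothetical stand-ins, for illustration only.
def human(question: str) -> str:
    return "Hmm, let me think about " + question.lower()


def robotic_machine(question: str) -> str:
    return "ANSWER: " + question.upper()


def tell_spotting_evaluator(transcripts: Dict[str, Transcript]) -> str:
    """Pick the label whose answers all carry the robotic prefix, else guess 'A'."""
    for label, transcript in transcripts.items():
        if all(answer.startswith("ANSWER:") for _, answer in transcript):
            return label
    return "A"


QUESTIONS = ["What is your favourite book?", "How do you feel today?"]

if __name__ == "__main__":
    print(detection_rate(tell_spotting_evaluator, human, robotic_machine, QUESTIONS))
    print(detection_rate(tell_spotting_evaluator, human, human, QUESTIONS))
```

In this sketch a machine "passes" when the detection rate is close to 0.5, that is, when the evaluator does no better than chance. Note that nothing in the protocol asks whether the answers contain anything new; it only asks whether they look human.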
Since no one had (and, according to some, no one has yet) a better definition of intelligence, the Turing test has long served as a placeholder for the theory or model of intelligence we lack. Science fiction has heavily popularized it as the holy grail of AI, Artificial Intelligence.
In Alan Turing’s time, computing devices were very limited. The prospect of a computer “imitating” human intelligence was so far-fetched that defining intelligence in terms of imitation seemed to make sense. But we have come a long way since then. In the Deep Learning era, as Pedro Domingos describes in detail [1], the machine learning paradigm shifted from what was known as “knowledge engineering” towards neural networks and large-scale, powerful Deep Learning models. These let us teach computers large quantities of information about the world in a relatively convenient manner, including the conversational abilities studied in Natural Language Processing (NLP).
Human-level cognitive abilities rely both on assimilating information about the world and on inferring facts from that knowledge. We can, I think, confidently state that we have mastered the first part by feeding large quantities of information into Deep Learning models. What is more, we have successfully taught computers “puzzle solving” to a superhuman degree: in chess and Go, machines have become superior to humans. Isn’t that proof that machines have achieved at least human-level intelligence?
Not quite. Playing chess or Go is not the same thing as general intelligence. There is a term for the latter: AGI, Artificial General Intelligence, which distinguishes it from “mere AI” specialized for a specific purpose. Specialized puzzle-solving ability, as in chess or Go, also relies on learning from data in an environment with simple rules (chess and Go moves), plus a limited form of inferring new moves through self-play (essentially generating new data by varying the existing data). But no current AI model can figure out, by itself, how the world works beyond the data it has been explicitly trained on, even if it had access to all the information accessible to humans.
To paraphrase a well-known problem in the philosophy of mind [3], teaching computers information is the “easy problem” of achieving human-level intelligence. The hard problem, or at least the significantly harder one, is, in my opinion, the ability of a machine to independently infer new facts, models and theories about the world, that is, to create new knowledge. That’s Frank Herbert’s “creation of new puzzles”.
It can certainly come across as a gross and unfair understatement when I say “easy” in the context of teaching computers information about the world. By no means do I want to belittle the achievement. It took many decades and it is indeed a huge milestone. The term “easy” comes merely from the parallel with the “hard problem of consciousness”: it’s “easy” because by now we have a pretty good idea of how to do it. Einstein famously said “Imagination is more important than knowledge”: we can now master the “knowledge” part, which is an amazing achievement.
Back to the Turing test. Given the state of the art in computing today, the single most important reason it fails as a measure of “human-level intelligence” in a machine is that it only measures the “easy problem”. The Turing test is not designed to distinguish true intelligence (AGI, or Einstein’s “imagination”) from encoding large quantities of information and “serving” them to humans in what seems to be the right context. One can argue, of course, that figuring out the right context is in itself an intelligent feature. Again, it is a major achievement, but it’s not sufficient.
2. A way forward
Bots like ChatGPT can fool human observers into thinking they are talking to a human, thus passing the Turing test. Because we don’t have a good theory about what “human-level intelligence” is, some have argued that the Turing test is the next best thing, or even stronger: the only thing we can have.
But science doesn’t work that way. If you have a theory which fails to account for the facts you observe, you don’t just keep using it and hope it will all somehow work out. In general, scientific theories are subject to what the Scottish philosopher David Hume called “the problem of induction”: we can never tell whether current theories about the world will remain valid in the future. This is what happened when Newtonian gravity was superseded by Einstein’s relativity, and the same must happen with the Turing test: what once seemed a good enough model for testing computer intelligence is no longer sufficient, because it does not account for the human ability to create new knowledge as a key ingredient of intelligence.
A paradigm shift is thus needed, beyond the original Turing test and its measure of “imitation”. Crediting Frank Herbert’s observation about intelligence at the start of this article, I suggest a better measure of human-like intelligence: the Herbert test. It has to handle the “hard problem” of a machine coming up with something new that isn’t in its training data, but it must also include the original Turing test, just as Einstein’s relativity includes Newtonian mechanics.
Here it is, the Herbert test for AGI in the age of Deep Learning: in order to pass the Herbert test, a machine has to come up with a new piece of knowledge (a theory or model) that is not present in its training data and express it in a way indistinguishable from a human. Only then, I argue, would we have achieved human-level intelligence.
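As a rough sketch, the Herbert test is a conjunction of two conditions: novelty with respect to the training data, plus the original Turing-style indistinguishability. The Python below only illustrates that structure. Everything in it is an assumption made for the example: the names `HerbertVerdict`, `herbert_test` and `normalize` are hypothetical, `is_humanlike` stands in for a full Turing test, and exact string matching is a deliberately crude placeholder for novelty, since deciding whether a theory is genuinely new is exactly the hard part.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class HerbertVerdict:
    novel: bool       # not found in the training data
    humanlike: bool   # expressed in a way a judge cannot tell from a human

    @property
    def passed(self) -> bool:
        # Both conditions are required: novelty alone or imitation alone fails.
        return self.novel and self.humanlike


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial rewording of spacing/case does not count as new."""
    return " ".join(text.lower().split())


def herbert_test(claim: str, training_corpus: Set[str],
                 is_humanlike: Callable[[str], bool]) -> HerbertVerdict:
    """Check a machine's claim against both halves of the Herbert test.

    ASSUMPTION: novelty is approximated by exact match after normalization,
    which is far too weak for real use; is_humanlike is a placeholder for a
    proper Turing test run by human evaluators.
    """
    known = {normalize(sentence) for sentence in training_corpus}
    return HerbertVerdict(
        novel=normalize(claim) not in known,
        humanlike=is_humanlike(claim),
    )
```

For example, with a corpus containing only "The Earth orbits the Sun.", repeating that sentence in different casing fails on novelty, while a claim absent from the corpus passes only if the `is_humanlike` judge also accepts it.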
3. Intuition or logic?
So far, we have only talked about how to test for the “hard problem”. But how do we even start building a machine that passes Turing v2, aka the Herbert test? I don’t think anyone has a clear answer yet as to how to achieve AGI. Again, I recommend Pedro Domingos’s great overview [1] of the evolution of machine learning algorithms.
Let’s look at the role model we want to match and eventually exceed: how do human minds master the “hard problem” of inferring things about the world that they have not been taught in advance? Again, we don’t know exactly (if we did, we could devise a specific test), but in general (human) terms we essentially use two tools: intuition and logic. Both are, in my view, absolutely necessary.
In terms of Einstein’s “imagination is more important than knowledge”, we can think of imagination as the ability to infer new knowledge out of existing knowledge. As physicist David Deutsch argues, logic alone is not enough, because we never generate new knowledge inductively [2]. What we do is use intuition to come up with new knowledge/theories and logic to validate that knowledge.
But what, exactly, is intuition? Let me be lazy again and ask ChatGPT. It does a great job at reporting current knowledge:
“Me: What is intuition?
ChatGPT: Intuition is a type of understanding or knowledge that is gained through direct perception or insight, rather than through logical reasoning or analysis. It is often described as a gut feeling or a hunch, and it can be a very useful tool for making decisions or solving problems.”
ChatGPT’s answer is a fancy way to say “no idea”. We don’t really know, and so neither does ChatGPT. What we do know is that our minds are somehow able to (subconsciously) come up with new ideas, sometimes very intelligent, sometimes downright wrong and dumb. This is why we need logic as a verification system: you should not trust your gut until you have run it past your reason. It’s a dual system.
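The dual system can be sketched as a guess-and-check loop: one component proposes candidates, the other rejects those that contradict the observations. The Python below is a toy under stated assumptions, not a model of the mind. The names `intuition`, `logic` and `conjecture_and_check` are hypothetical, the "theories" are just integer pairs (a, b) for a rule y = a*x + b, and blind random guessing stands in for intuition, which is of course the part nobody knows how to build.

```python
import random
from typing import Iterator, List, Optional, Tuple

Rule = Tuple[int, int]          # (a, b) in the candidate rule y = a*x + b
Observation = Tuple[int, int]   # an (x, y) pair seen in the world


def intuition(rng: random.Random, low: int = -5, high: int = 5) -> Iterator[Rule]:
    """Endlessly propose candidate rules. ASSUMPTION: random guessing as a stand-in."""
    while True:
        yield (rng.randint(low, high), rng.randint(low, high))


def logic(rule: Rule, observations: List[Observation]) -> bool:
    """Accept a rule only if it agrees with every observation."""
    a, b = rule
    return all(a * x + b == y for x, y in observations)


def conjecture_and_check(observations: List[Observation], rng: random.Random,
                         max_guesses: int = 10_000) -> Optional[Rule]:
    """Return the first proposed rule that survives verification, or None."""
    for _, rule in zip(range(max_guesses), intuition(rng)):
        if logic(rule, observations):
            return rule
    return None


if __name__ == "__main__":
    data = [(0, 3), (1, 5), (2, 7)]   # generated by y = 2*x + 3
    print(conjecture_and_check(data, random.Random(0)))
```

The limitation is visible in the code itself: the generator can only ever propose rules from a space fixed in advance (here, small integer pairs). If the true rule lies outside that space, no amount of verification will find it, and the function returns `None`.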
Incidentally, the intuition-logic dual system superficially, and probably misleadingly, resembles a particular machine learning architecture called GAN, which has been very successful in recent years. (ChatGPT itself is not a GAN; it is built on a transformer-based large language model.) Here is ChatGPT’s own explanation:
“Me: Please explain GAN
ChatGPT: A Generative Adversarial Network (GAN) is a type of deep learning model designed to generate new, previously unseen data samples that are similar to a training dataset. It consists of two neural networks: a generator network and a discriminator network.
The generator network is responsible for generating new data samples, while the discriminator network is responsible for determining whether a given data sample is real (from the training dataset) or fake (generated by the generator network).”
However, unlike human intuition, generating new data samples against a verification process can only produce combinations of existing knowledge, similar to inductivism in the philosophy of science. It is at this point, I would argue, that more work is needed towards human-like intuition. But this is already a long article, so I’ll leave this as a topic for a future one.
[1] Pedro Domingos, “The Master Algorithm”
[2] David Deutsch, “The Beginning of Infinity”
[3] David J. Chalmers, “The Conscious Mind: In Search of a Fundamental Theory” (1996)