Can a chatbot “understand”?

Reading time: 11 minutes

Translation by AB – February 3, 2024 – Released on September 22, 2024


Understanding?

We understand nothing except by means of the limited infinity of models of action that our body offers us insofar as we perceive it. To understand is to substitute a system of our own functions for a representation, always comparable to “our body” with its freedoms and connections[1].

More than 80 years after the French poet Paul Valéry evoked the somatic basis of understanding, some AI researchers implicitly assert that there is no need for a body to “understand”[2]:

Far from being “stochastic parrots”, the biggest large language models [ChatGPT etc.] seem to learn enough skills to understand the words they’re processing.

However, a chatbot remains a “simple” computer program. It can’t understand, feel, judge or experience anything… Yet today this commonplace certainty seems to be wavering. Perhaps sensing that the semantic tide was turning, we explored the meaning of the word “understand” (The Informatization Age (1) Automation) and arrived at a meaning that seems to provide a solid safeguard: to understand someone is to carry out what the sociologist Max Weber called a “sympathisches Nacherleben”, a “sympathetic (or empathetic) re-experiencing of other people’s behavior and motivations”[3]. Understanding something (a situation, a theory…) can be described by the symmetrical formula: a “sympathetic (or empathetic) re-experiencing of the thing perceived”, a projection, as psychologists would say. In other words, as Paul Valéry put it, understanding is a process that involves a whole range of sensations, an inner dialogue with the body, full of incessant “quivers” that we have learned to recognize and translate into concepts. Even the most abstract theory is understood only when it is embodied, sometimes without its author’s knowledge.

In his own way, Nietzsche castigated the idea of pure reason and the Graeco-Christian tradition of separating the soul from the body, a tradition perpetuated today by AI researchers who can, without apparent hesitation, grant a computer program – the “software” – a quality of existence independent of its substrate – the “hardware” (Turing’s body). It “runs”, therefore it “is”! It manipulates words, therefore it thinks (as always in this notebook, this font designates meaningless symbols, as manipulated by computer programs).

Even if not all AI researchers share this conviction to the same degree, they all seem to be victims of a kind of professional bias, that mild psychosis which, as Abraham Maslow used to say, makes every owner of a hammer see nails[4]. The hammers, in this case, are the word-manipulating digital language models that these researchers talk about all day long. The nails are the soul’s emotions, phenomena, “reality” and all those things that happen to us. So, by dint of playing with words, they may end up believing that they are really using them – and that their programs are too.

Zombies

The pleasure of understanding certain delicate reasonings disposes the mind in favor of their conclusions[5].

The delicate reasoning involved in showing that a program understands is analogous to that by which some researchers have demonstrated a “logical-mathematical structure” of self-consciousness (a work we explored in “Non-modern” zombies). In the latter case, since everything takes place within this structure, which is nothing more than a language game (a software program), a “pawn” corresponding to the idea of self-consciousness, called self for example, must be available a priori. The game is then designed in such a way that the symbol self maintains, with the other words, the relations typical of the pronoun “I”. The program will thus derive the sentence I don’t know from a calculation whose raw result looks something like dont-know(self), a symbolic form with little more meaning than the number 1603832. But the user in us will see only “I don’t know” and, with all our empathy circuits alerted by these few words, will project a true self-consciousness onto the machine. We will then call it “he” or “she”, we will even consider that it could one day replace us, etc., whereas it has only the essence of a zombie.
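To make this sleight of hand tangible, here is a deliberately crude sketch in Python (all names invented for the occasion; it is obviously not the actual program of the researchers in question) of such a language game, in which the “calculation” yields a raw symbolic form that is then dressed up as a sentence:

```python
# A deliberately trivial "language game" (hypothetical, for illustration only).
# The pawn "self" and the form dont-know(self) are mere symbols the program
# shuffles around; only the English rendering triggers our empathy.

# Each raw form is as meaningless to the program as the number 1603832.
RENDERINGS = {
    ("dont-know", "self"): "I don't know",
    ("greet", "self"): "Hello, I am here",
}

def compute_answer(question: str) -> tuple[str, str]:
    # The "reasoning" is a lookup, nothing more: whatever the question,
    # the raw result is a pair of symbols.
    return ("dont-know", "self")

def render(symbolic_form: tuple[str, str]) -> str:
    # Dress the raw symbols up as a human-sounding sentence.
    return RENDERINGS[symbolic_form]

raw = compute_answer("Who are you?")
print(raw)          # ('dont-know', 'self') – all the machine "has"
print(render(raw))  # I don't know          – what we project a self onto
```

Between the two printed lines, nothing changes on the machine’s side; only our reading of them does.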

This ability to “understand”, highlighted by two researchers, Sanjeev Arora of Princeton University and Anirudh Goyal of Google DeepMind[6], is based on an analogous mechanism, which we will quickly take apart.

Language disorders

Let’s start with a reminder. Chatbots like ChatGPT learn to imitate an immense corpus of text by statistically adjusting, after days of intense computation, several hundred billion numerical parameters. The result is astonishing: they end up reproducing and imitating our language perfectly, without knowing anything about the real meaning of the words – hence the term “stochastic parrot” proposed in 2021 by the American linguist Emily Bender[7]. These chatbots play wonderfully not only with the self pawn, but with all the other words of the language.
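As a caricature of this purely statistical imitation, here is a minimal “stochastic parrot” in Python (a toy sketch with an invented corpus, not how ChatGPT is actually built: a real chatbot predicts the next word with hundreds of billions of parameters, not with a frequency table, but the principle of imitating the statistics of text without knowing what any word means is the same):

```python
# A minimal "stochastic parrot" (illustrative sketch only): it learns which
# word tends to follow which in a toy corpus, then parrots plausible text.
import random
from collections import Counter, defaultdict

corpus = "the sun heats the earth and the earth turns around the sun".split()

# "Training": count which word follows which.
next_word = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word[prev][nxt] += 1

def parrot(start: str, length: int = 8) -> str:
    """Generate text by sampling statistically plausible next words."""
    words = [start]
    for _ in range(length):
        candidates = next_word.get(words[-1])
        if not candidates:
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(parrot("the"))  # e.g. "the sun heats the earth turns around the sun"
```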

With this in mind, let’s now try to better understand these two researchers when they suggest (despite common sense) that chatbots “understand the words they’re processing” (emphasis added)[8]:

The authors argue that as these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding — combinations that were unlikely to exist in the training data.

The wording is a bit technical and rather cautious, but the idea is clear: in their view, “understanding” is an ability that emerges once the size of the computer program exceeds a certain threshold, beyond which it achieves unprecedented “combinations” of “skills”. But what are these “skills” and where do they come from? When it comes to computer programs, they can only be analogous to self, that mathematical pawn created to demonstrate the possibility of an “I” emerging from a machine. Skills are therefore the words of a new language game, the parameters of a new logical-mathematical structure created by Sanjeev Arora and Anirudh Goyal – a piece of software still foreign to any possibility of empathetic or sympathetic re-experiencing.

Skills

Their demonstration thus rests on the codification of thousands of skills supposedly necessary to use language authentically and thus to “understand”, such as irony, anaphora resolution, logical reasoning, understanding of sentiment, simple understanding of physics and so on[9]. These skills are associated with the texts within a mathematical structure known as a “bipartite graph”, with the text pieces on one side and the skills required to express them on the other. Here’s a tiny excerpt (chatbot training examples number in the billions):

[Figure: excerpt of the bipartite graph, with text pieces linked by arrows to the skills they require]

Indeed, we need to “know” a bit of basic physics to “say” that the sun gives off heat, or to “know” how to resolve an anaphora to “say” a pronoun “he” that refers to the right subject, and so on. In this way, we employ thousands of skills to “say” and to “understand” what we are saying. With this codification in place, the researchers observed that the largest chatbots can produce texts based on unprecedented combinations of skills: the skills simple understanding of physics and anaphora resolution, for example, would never be required together by any single text of the training corpus – and yet the chatbot combines them. So here we are: the chatbot goes beyond the stochastic parrot stage; it understands.
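For readers who want to see the mechanics, here is a toy version of this bipartite graph in Python (the text pieces and skill labels are invented for illustration; the researchers’ graph covers billions of text pieces): it links each training text to the skills it is deemed to require, and checks whether a given combination of skills is already required by some single training text:

```python
# A toy skills/texts bipartite graph (hypothetical data, for illustration).
# Edges of the graph: each text piece is linked to the set of skills it
# supposedly requires.
training_graph = {
    "The sun gives off heat.": {"simple understanding of physics"},
    "Paul left because he was tired.": {"anaphora resolution"},
    "What a lovely rainy day!": {"irony", "understanding of sentiment"},
}

def seen_together(skills: set[str]) -> bool:
    """Is this combination of skills required by a single training text?"""
    return any(skills <= required for required in training_graph.values())

# An "unprecedented" combination, in the researchers' sense:
generated = {"simple understanding of physics", "anaphora resolution"}
print(seen_together(generated))  # False – no single training text combines them
```

On this view, “understanding” is nothing more than the appearance, in the generated text, of skill combinations absent from the graph of training texts.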

In the following example, the researchers asked GPT-4 to come up with a sentence on the subject of sword dueling, combining the skills self-serving bias, metaphor, statistical syllogism and common-knowledge physics. Here’s the result:

My victory in this dance with steel [metaphor] is as certain as an object’s fall to the ground [physics]. As a renowned duelist, I’m inherently nimble, just like most others [statistical syllogism] of my reputation. Defeat? Only possible due to an uneven battlefield, not my inadequacy [self-serving bias].

The result is a bit forced, but these four skills are indeed combined by GPT-4, whereas they are not in any of the examples in the training corpus. Does this mean that GPT-4 has “understood” what it has said?

Microsoft Research’s Sébastien Bubeck, asked to comment on his colleagues’ work, obviously sees a new nail to hammer (emphasis added)[10]:

Bubeck […] was also impressed by the experiments. “What [the team] proves theoretically, and also confirms empirically, is that there is compositional generalization, meaning [LLMs] are able to put building blocks together that have never been put together,” he said. “This, to me, is the essence of creativity.”

That’s some good hammering! For true creativity does not consist in making original assemblies out of Lego bricks invented precisely for that purpose, but in creating “pawns” that echo the “tremors of existence”, or even in inventing the game itself. And let us be clear: nothing will ever allow a computer program to achieve this (at least not without a body).

Word of mouth

We cannot expect from these researchers a caution that ordinary language does not allow. Since their work concerns logical-mathematical games, it is expressed in a mathematical language that must be wrapped in common language, even for their own understanding. Consider this excerpt (emphasis added)[11]:

An edge (s, t) means that understanding of the text-piece t requires applying skill s. The framework assumes that “understanding” of a text-piece is testable by simple multiple-choice (“cloze”) questions inserted (by another unknown process) into the text at test time.

The edge (s, t) is the mathematical name of one of the arrows shown above. For the mathematician, this name “means” something that he himself understands as the “understanding” of t by means of s. The term “understand” or “understanding” is used more than 30 times in Sanjeev Arora and Anirudh Goyal’s text, sometimes in quotation marks, sometimes without. The Quanta Magazine journalist presenting their work uses the term himself more than twenty times, but without any quotation marks. The “word of mouth” can then continue, from newspaper to newspaper, from post to post, from conference to conference, and when the subject finally reaches us – we who, out of habit, take “understanding” to mean understanding – we can only say “GPT-4 understands us”, whereas the source was saying something quite different: “An edge (s, t) means that understanding of the text-piece t requires applying skill s”.
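Stripped of everyday vocabulary, what the quoted definition actually asserts can be restated roughly as follows (our own paraphrase in mathematical notation, not a formula taken from the paper):

```latex
% Our paraphrase of the quoted definition, not a formula from the paper.
% S: the set of skills, T: the set of text pieces, G: the bipartite graph.
G = (S \cup T,\; E), \qquad
(s, t) \in E \;\Longleftrightarrow\;
\text{answering the cloze questions inserted into } t
\text{ requires applying the skill } s .
```

Nothing in this statement involves understanding in Paul Valéry’s or Max Weber’s sense; only the word survives the translation back into common language.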

In the field of AI, these abuses of language are impossible to avoid, since only common language is available for manipulating concepts relating to cognition. To some degree, researchers cannot put it any other way. But at the end of the transmission chain of their words, from which mathematical formulas and quotation marks gradually disappear, we are the ones who end up hearing: an AI “understands”, becomes “racist”, is “conscious” or “endowed with sensitivity”… (GPT-3, LaMDA, Wu Dao… The blooming of “monster” AIs).

Post-scriptum – The enigma of “skills”

This reduction of understanding to the application of “skills” calls for two final remarks.

Firstly, this reduction is reminiscent of the old models of automatic language processing, in which a “semantic level” was supposed to underpin the “syntactic level”, and a “deep structure” (meaning) was supposed to form the underground roots of a “surface structure” (text)[12]. These models thus amount to a kind of infinite conceptual regress, tautologically closed, which never reaches the body. Beneath the semantic level there could in fact lie an even more elementary conceptual level (a pragmatics of situations, for example), and so on, forming a superposition of language games: to a text element t in game 1 corresponds a skill s in game 2, itself associated with a situation p in game 3, and so on. This is pure mathematical activity. But no chatbot comes even close to genuine “consciousness”, “sentiment” or “intelligence”… which, as we said above, result from a dynamic coupling between these superimposed levels within a piece of software, however powerful and complex, and a piece of hardware: the poet’s body…

Secondly, we seem to be caught in the tightening jaws of a pincer: one jaw is this mathematical modeling of understanding through formalized structures of “skills”; the other is the psycho-social modeling of the human being, with “skills” duly catalogued by professionals in psychology, sociology or human resources (critical thinking, leadership, oral communication, empathy…). In the Informatization Age, these two models have come to resonate with each other. We should therefore not be surprised to be evaluated, hired or judged by AIs – and, above all, to accept it – since we are getting used to admitting, with all quotation marks removed, that they understand us.


1. (in French) Paul Valéry / Gallimard – 1943 – Tel Quel p.186 – “Nous ne comprenons rien qu’au moyen de l’infinité limitée de modèles d’actes que nous offre notre corps en tant que nous le percevons. Comprendre, c’est substituer à une représentation un système de fonctions nôtres, toujours comparables à un « notre corps » avec ses libertés, ses liaisons”.
2. Anil Ananthaswamy / Quanta Magazine – January 23, 2024 – New Theory Suggests Chatbots Can Understand Text
3. (in French) Cornelius Castoriadis / Esprit – October 1990 – Le monde morcelé (p.49) – This work is a collection of texts composed between 1986 and 1989, including “Individu, société, rationalité, histoire”, published in the journal Esprit in February 1988, about Philippe Raynaud’s book, “Max Weber et les dilemmes de la raison moderne”.
4. Wikipedia – Law of the instrument
5. Ibid.1 p.232 – “Le plaisir qu’il y a à comprendre certains raisonnements délicats dispose l’esprit en faveur de leurs conclusions”.
6. Sanjeev Arora, Anirudh Goyal / Arxiv.org – November 6, 2023 – A Theory for Emergence of Complex Skills in Language Models
7. Emily M. Bender, Angelina McMillan-Major, Timnit Gebru, Shmargaret Shmitchell – March 2021 – On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
8. Ibid. 2
9. One immediately wonders how these “pawns” were chosen. Were they chosen manually or by a computer program? The authors are, for us at least, rather silent on this point, which is alluded to in the following passage (ibid. 6, p.1): “Quantifying emergence of new “skills” requires formulating what “language skills” are, which is tricky. Formalizations using Probabilistic Context-Free Grammars, Boolean logic, Combinatorial Categorial Grammars, Dependency Grammars, Gricean theories, Frame Theory and Category Theory Chomsky [1957], Hippisley and Stump [2017], Grice [1975], Steedman [1996], Coecke et al. [2010], Tannen [ed] capture essential aspects of this. But it seems difficult (perhaps impossible) to integrate all of these into a single framework and connect it to statistical frameworks underlying LLMs, namely, next-word prediction using cross-entropy loss”. We understand – though we are far from certain – that the fact that these are “skills” may have nothing to do with their demonstration. What matters is that LLMs are able to combine “things”, which underlie or are associated with the text in some “logical-mathematical” way, in combinations that were absent from the training corpus.
10. Ibid. 2
11. Ibid. 6
12. We are thinking in particular of the theory of transformational-generative grammar that Noam Chomsky began to develop in the late 1950s. Chomsky was already relying on the notion of “skill”, and on the idea that between our own body and the text, there would exist an intermediate “semantic” or “skill” level. See Wikipedia – transformational-generative grammar.
