The Productive Pigeonhole Principle: An Answer to Searle’s Chinese Room
- Adam Timlett

- Dec 26, 2025
Updated: Dec 28, 2025
PDF with timestamp certificate for download and citation.
Introduction
Searle's Chinese Room is a thought experiment which is supposed to prime our intuition against the possibility that the brain is simply some type of computer. It has a long history. First published in 1980, it came to my awareness only in the 1990s, when it was mainly being discussed in the context of progress in things like early neural networks, and whether such 'sub-symbolic' architectures made Searle's argument irrelevant.
Neural networks, and even generative AI and large language models like ChatGPT, don't make Searle's argument irrelevant. Indeed, they demonstrate just how trapped we still are inside the computationalist approach that Searle and his thought experiment sought to show us was actually a deeply flawed way to think about, and to study, minds.
The Chinese Room Argument is along the following lines:
Searle doesn’t speak or understand Chinese. But we imagine that he is placed in a room with no windows or contact with the outside world except for a kind of letter box or pigeonhole through which objects can be passed back and forth between the room and the outside world. Inside, the room is filled with filing cabinets with Chinese characters and there is a book full of rules, and paper and pencils and erasers for carrying out instructions.
There is also a fluent Chinese speaker outside the room. The native Chinese speaker writes down Chinese symbols on paper, as questions or comments in Chinese that she pushes through the letterbox.
Searle then receives these pieces of paper through the letterbox and follows the rules written in the book, which require him to look up other Chinese characters in the filing cabinets and to write things down, strictly according to those rules, on other pieces of paper.
Eventually, by strictly following these rules, he ends up writing down a longer sequence of Chinese characters, also unintelligible to him, which he is then instructed by the rule book to push back through the letterbox to the outside.
The native Chinese speaker outside then reads these symbols which make perfect sense to her as an answer to the question or comments that she originally pushed through the letterbox.
In this way the Chinese Room conducts a conversation fluently in Chinese with the Chinese person outside the room.
Searle then asks us: In what sense, if any, can Searle be said to understand Chinese?

Discussion
The original paper has been revised several times in response to a huge amount of discussion online and in journals. I have an entire book of essays about the Chinese Room Argument which I read as a Research Masters student in Philosophy while at the University of East Anglia:
Preston, John, and Mark Bishop, eds. Views into the Chinese Room: New Essays on Searle and Artificial Intelligence. Oxford: Oxford University Press, 2002.
What is remarkable is both the breadth of arguments and discussions, and the huge amount of confusion as to what constitutes a useful analysis of this thought experiment. What was debated was the veracity of the intuition that it gives us about the nature of minds, in contrast to the nature of computation.
Despite the volume of work, the only essay in this collection that conforms in any way closely to my own view is by Stevan Harnad:
Harnad, Stevan. “Minds, Machines and Searle 2.” Views Into the Chinese Room: New Essays on Searle and Artificial Intelligence 294 (2002).
(Also available online https://eprints.soton.ac.uk/255942/2/searlbook.htm )
Even if you are familiar with the arguments made for and against the intuition that Searle is providing, I still highly recommend Stevan Harnad’s article to cut through the noise and many misplaced arguments. In fact, I would say Harnad’s article is essential, and up to now, basically the last word on the Chinese Room Argument.
To summarise Harnad's arguments: Searle didn't frame the thought experiment in the best way that he could have, which generated extra noise and specious arguments against the Chinese Room. These are easily shown to be without substance once we make small, inconsequential modifications to the thought experiment and sharpen the precise target of the attack on computationalism.
Also, there are lots of ad hoc arguments against the Chinese Room Argument which we can safely ignore. However, when the aim of Searle’s attack is suitably sharpened via Harnad’s clarifications and analysis, I would agree with Harnad that what Searle shows us is that computation cannot be all there is to having a mind.
What comes next
Harnad quotes Hexter on the value of ripping up a sign for London that points towards the cliffs of Dover: a productive act, even if we still don't know the way to London. Searle's achievement was to show that we are not heading where we want to go by just making more and more complex computational systems that mimic minds, if where we want to go is creating something that actually has a mind in the sense that a human being or other animals do. But it is not at all clear where we would go next, even if we were only interested in understanding what it really means, in scientific terms, to have a mind. That is, we don't really have anything outside of the computationalist and functionalist research program, which is based on the argument that the mind is actually just a particular type of computer that we can study just like any other kind of computational system.
Harnad himself went on to research the subject known as 'embodied cognition'. This is the idea, or hope, that by embedding computational systems in specific kinds of 'substrate' we may solve the problems raised by Searle. In one way this is logical, given Harnad's argument that the substrate independence of computer theory, which is the theory's strength, is also its greatest weakness. It is this substrate independence that lends Searle's argument against computationalism its force. (In subsequent discussions we had the idea of the entire Chinese telephone network, and whether, if it could be wired appropriately, it would then be considered to understand Chinese, etc., to demonstrate how indifferent the concept of computation is to its substrate.) However, while Searle's attack on the implementation independence of computational states, when clarified by Harnad, is correct, the consequences of what we should do about it are far from clear. It is not, in my opinion, as simple as looking at how we 'embed' computations, because that is also to misunderstand the nature of computer theory, and how it can be modified or extended in a mathematically meaningful way. What's to stop us considering the substrate as part of the computational system? In fact, I believe this demands a rigorous approach to the problem of how we define relations between separate computational systems. Further, it needs to be motivated by a theory of risk that predicts why relations between separate computational systems will arise, and why they work the way they do. In other words, the task we have is also to explain what 'drives' biological systems to evolve minds which are not merely computational.
Meta-modular systems
I have looked at exploring computer theory, and extensions to it, via a theory of meta-modular systems. I wrote about this in my book On the Origin of Risk. The main argument I make is that we should aim to learn from biology, and also use computer theory to study biology and develop a new theory of risk. This approach makes sense for theories of cognition too, because, in my opinion, Searle showed that we need new models of risk that relate to computer theory in order to develop a more successful scientific theory of cognition. We need to do this precisely along the lines of understanding productive relationships between distinct computational systems, as an answer to the Chinese Room Argument.
Modifying computer theory to construct a theory of risk
The main motivation, or intuition, is that a mind is a consequence of solving problems in the environment, and therefore the distinctive features of a mind are the distinctive features of a strategy to manage risk. Many researchers have looked at the obvious relationship between evolutionary challenges in an environment and the value of larger brains, or of certain specialisations such as social cognition, but this is all at the gross level of strategy. What we want is a much sharper delineation between different risk strategies that actually relates to the structure of the computations themselves. Only this way can we possibly provide a coherent answer to Searle. Ultimately, because we need an answer that follows the direction set by Searle's argument, we need a theory of risk written in computer theory, not in any other kind of mathematics. Many computationalist researchers treat the brain as a utiliser of Bayesian statistics, but this would ultimately just be an algorithm running on a computer, and so is not any kind of relevant answer to Searle's Chinese Room Argument. Harnad's article should suffice to rebut the idea that understanding Chinese is just a matter of executing the correct statistical learning algorithm or inductive procedure.
The main insight
The main insight, that computer theory can be the basis of a mathematical theory for how risk is managed by minds, comes from my previous article, which was framed in terms of the relation between computer theory and economic theory when we consider options for different extremes of adaptation. The Substack article is 'Options Beyond Growth or Failure' (also available on my website, https://www.turingmeta.org/blog ).
What I show is that there are 2 default options when considering extremes of adaptation: the 1st option is to execute an existent learning algorithm (regardless of how sophisticated), and the 2nd, more extreme option is to liquidate that learning algorithm if it is failing to provide good enough results, so that the resources can be used for other purposes that might be more productive. This 2nd choice provides very high adaptive potential, but is also economically very costly due to the loss in the liquid phase. A liquidation phase means we lose almost all the information and invested energy that we had before, as we use the memory required by the original program for a potentially completely different purpose.
This basic choice, between pursuing a learning goal or abandoning it to free resources, is also the basis of mainstream capitalist thought about the efficient allocation of resources, and occurs in lots of companies and at lots of levels. For example, a company can make some investment in some product run by a department in the company that is supposed to earn revenue. If it is not working as well as hoped, as the owner of the company we have a choice whether to continue to fund the loss-making department and product, or whether to 'pull the plug' and liquidate that department in order to free resources to be used more productively elsewhere in the company.
Some analogue of this is presumed to occur in the brain, in the same sense of the ‘use it or lose it’ idea when it comes to memories.
The structure of this basic choice between the two options is essential to capitalist society, and also to how economists recommend we work and manage incentives for workers. The threat of liquidation, in the form of the threat of losing your job, is also believed to be a necessary moral hazard, which then incentivises workers to put maximum effort into making a success of the 1st option, the one we are trying to generate revenue from.
In my article, I argued that there is a 3rd option, which is to pivot from the existing learning algorithm to some other learning algorithm that reuses most of the bits from the prior algorithm, and so doesn't go through a costly liquid phase to reallocate all resources. It's a more extreme case of plasticity than just continuing to learn as we were learning before, no matter how flexible the original learning algorithm we've invested in, but it isn't as extreme as liquidation. The motivation for this is that any liquid phase is costly and involves a lot of loss of energy and information. Also, the option is predicated on identifying that the current learning algorithm is not performing well in some area; if the current learning algorithm is doing great, we don't need it. But by adopting this 3rd option when a learning process is not performing well, reusing existing information in a new task, we avoid the costly losses of a liquid phase, while also avoiding being trapped in the original learning algorithm with diminishing returns. In my article, I framed these pivots, using game theory, as a type of adaptation game played at multiple levels.
However, I also show in that article that the only way to coherently define this 3rd option is to also combine game theory (so that we can understand the moral hazard and different payoffs to the company versus the department) with computer theory to describe the relationship between the first algorithm and the second one that we pivot to, which reuses some of the bits from the 1st algorithm. (We can also use computer theory to define the 2nd option where we also escape the program and reallocate resources). The reason to use computer theory to define both the pivot and the liquid phase, is that, otherwise, we are trapped in a world of ad hoc processes and endless ‘levels’ of adaptation with arguments about extremes of adaptation that remain incoherent and confused. This is very similar to what happened with the arguments around Searle’s Chinese Room. A lot of the arguments against Searle were based on misunderstandings of what computer theory means, and how it works.
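To make the three options concrete, here is a minimal sketch in Python. It is purely illustrative: the class, the numbers, and the reuse rates are my own assumptions, not results from the article or the book; the point is only the shape of the trade-off between continuing, liquidating, and pivoting.

```python
# A toy model of the three adaptive options discussed above.
# The class, numbers and reuse rates are hypothetical illustrations, not from the article.

from dataclasses import dataclass

@dataclass
class LearningProcess:
    invested_bits: int      # information accumulated so far by the current algorithm
    marginal_return: float  # current payoff per further step (diminishing over time)

def option_1_continue(p: LearningProcess) -> float:
    """Keep executing the existing learning algorithm: cheap, but returns may keep shrinking."""
    return p.marginal_return

def option_2_liquidate(p: LearningProcess, residue: float = 0.05) -> float:
    """Abandon the algorithm and free its resources: almost all invested bits are lost."""
    return p.invested_bits * residue   # only a small residue survives the liquid phase

def option_3_pivot(p: LearningProcess, overlap: float = 0.7) -> float:
    """Escape the algorithm but reuse the bits that overlap with a new task (type redundancy)."""
    return p.invested_bits * overlap   # most of the prior investment is conserved

p = LearningProcess(invested_bits=1000, marginal_return=2.0)
print("continue :", option_1_continue(p))
print("liquidate:", option_2_liquidate(p))
print("pivot    :", option_3_pivot(p))
```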
Understanding computer theory
The key idea from computer theory is that we can escape the first 'program', whatever it is, and however sophisticated it is, like pressing an escape key on a computer, or selecting 'end task' in Task Manager in Windows. Due to the Halting Problem/Theorem (I use these two terms interchangeably), we also know that sufficiently complex computations have a weakness which means we may always need the option to escape them, without waiting for them to complete, because we may never know in advance whether they will ever halt. This is the fundamental dilemma and weakness of sufficiently complex computation, as shown by Turing and Church. As an adaptation game, we therefore frame the dilemma from the point of view of an agent trying to obtain maximum value for their resources. They have different options which we know, in general, have no algorithmic solution by which we can guarantee the value of the results of our actions.
The Halting Problem states that we will always have a dilemma with no algorithmic solution that we can deploy, about when to abandon an investment in some sufficiently complex algorithm, in favour of doing something better with the resources it is utilising.
Hence, we can use computer theory to give a hard edge to what we mean by the 2nd and 3rd options, regardless of the sophistication of the original program that we are escaping (given that we are seeing evidence of diminishing returns from it). So, the way we manage risk in an adaptation game can only be defined using computer theory.
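For readers who want to see why no algorithm can settle this dilemma, here is a minimal sketch of the standard diagonal argument behind the Halting Theorem. The function names are mine, and the decider is deliberately left unimplemented, since the point of the argument is that no such total decider can exist.

```python
# Sketch of the diagonal argument behind the Halting Theorem.
# Suppose, for contradiction, that a total decider `halts` exists.

def halts(program, data) -> bool:
    """Hypothetical: returns True iff program(data) would eventually halt."""
    raise NotImplementedError  # the argument below shows no such total decider can exist

def diagonal(program):
    """Do the opposite of whatever `halts` predicts for `program` run on itself."""
    if halts(program, program):
        while True:   # predicted to halt, so loop forever
            pass
    return            # predicted to loop, so halt immediately

# Now ask: does diagonal(diagonal) halt?
# If halts(diagonal, diagonal) were True, diagonal(diagonal) would loop forever;
# if it were False, diagonal(diagonal) would halt. Either answer contradicts the
# decider, so no total `halts` exists. Note the surplus ingredient: the program
# is fed a description of itself, information the running program never 'sees'.
```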
One of the outstanding problems that I glossed over in this was the idea that there is no 'outer' program that then passes control to another program once we escape. That is, if we want to do something better with the resources, what is the program that decides this? Aren't we continuing to invest in this program, even though some other part of it is already failing? Or we could put it like this: if we do 'escape' to an outer program and reuse some of the bits, rather than just liquidise the prior program, how can we ensure that this is actually a useful 'pivot'? This is similar to the first objection. Can we ever really escape from some program that we are invested in? Isn't there always another program that we need, however imperfect, in order to control this step, which may also be giving us diminishing returns? We already know that the program can't have guaranteed value for its results, but to formally study the options, it seems we need to define this imperfect program anyway. These are all excellent questions, and I will show that we need new mathematics to combine with computer theory, which I have now developed, to answer them.
In my book, I already went into more detail about the concept of distributed meta-modular systems, by which we can look at multiple systems that interact without an upper-level, outer program higher up the hierarchy. Nevertheless, it is hard to understand how the choices can be made after the point that we escape and control passes to another program, especially one that reuses prior bits instead of merely liquidising the old program and using it for an entirely different purpose. These are the things that I will address in this article and relate back to Searle's Chinese Room Argument. Harnad argued that Searle shows that computation cannot be all there is to minds. I wholeheartedly agree, but I think that pivoting, and understanding how these gross changes relate to computer theory, is the key idea that we need in order to show that no algorithm is a mind on its own.
Pivoting is not simply escaping
My argument for how we can escape a program but reuse the bits allocated to that original program is based on an idea I've developed called type redundancy. I gave a talk on type redundancy in Vienna at the end of 2025 (slides available from my website blog here). The idea can be seen very clearly when we make jokes, such as simple puns: 'If mediums can contact the dead, imagine what a large can do!' (maybe first seen on Reddit in 2022). The idea depends on taking information that is parsed with one grammar and changing the way that it is parsed, so that some of the information switches from one type (i.e. category) of information to another. In the pun's case, a noun type, 'medium', turns into an adjective type. We can do this because the ambiguity is already in the 'data'. We end up with a kind of cross-talk possibility due to the ambiguity. This is not how computer programs work: they rely on unambiguous grammars. But natural language is full of potential ambiguity. This seems inefficient, but is likely related to how minds actually work, and to why I believe they, like our bodies, are not merely computer programs or mechanisms in the ordinary sense, alone.
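As a minimal sketch of this re-parsing idea (the tiny lexicon, the glosses, and the function are my own illustrative assumptions, not a serious parser):

```python
# Toy illustration of type redundancy: the same token carries two possible
# lexical types, and re-parsing switches which one is used.
# The lexicon and glosses are purely illustrative.

LEXICON = {
    "medium": {
        "noun": "a spiritualist who contacts the dead",
        "adjective": "a size between small and large",
    }
}

def parse(word: str, as_type: str) -> str:
    """Read `word` under one of its available types; the other reading stays surplus."""
    readings = LEXICON[word]
    assert as_type in readings, f"'{word}' has no {as_type} reading"
    return f"'{word}' as {as_type}: {readings[as_type]}"

print(parse("medium", "noun"))       # the intended parse
print(parse("medium", "adjective"))  # the pivot: the surplus reading already in the data
```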
Cross-talk
This kind of ambiguity leads to cross-talk, which we now know, due to research in biology in the 21st century, is ubiquitous in signalling systems and other processes within the human body. However, the very name we use, 'cross-talk', betrays the mechanical lens that we are using to describe a biological system. In a biological system this cross-talk is functional, but from a mechanical engineering mindset cross-talk is literally 'getting your wires crossed'. It is a 'bug' in the process, a flaw by which information sent to one destination ends up in the wrong place, or at least disrupts another signal path it is supposed to have nothing to do with. It often comes about when two wires are simply too close together: the signal one wire transmits then generates a signal in the other. The transmission is therefore unintentional, certainly not planned by the circuit designer in the human engineering case. The consequences are often complex and unpredictable, because we are talking about two parts of a system that are supposed to be separated and to do separate tasks.
As I already said, this 'bug' type of cross-talk also happens in biology: we now think that people who sneeze when they see a bright light (like myself) have a gene which means that the optic nerve, which partly runs down the face, interferes with the nearby nerve that controls the impulse to sneeze. Cross-talk in our body's signalling systems like this is indeed a 'bug' (in the software/mechanical engineering meaning of the word), but lots of other examples in our bodies are not merely 'bugs' at all.
The whole idea I have is that biological systems are exploiting emergent cross-talk to manage risk, as a way of pivoting with no overall master control. It's not just a 'bug', but an essential adaptive feature. One way that it can be used is if we have sufficient type redundancy. Then problems at one level can be corrected at another level using this special kind of redundancy. For example, if you manipulate genes so that the wrong gene or a mutation of a gene is read, then adjustments can be made at the level of the cells to correct for this, by making adjustments to their signalling and regulation. This is an example of type redundancy, and it is a process that has been confirmed by analysis by biology researchers. This type of thing means that we can switch something at one level and filter out many of the negative consequences downstream, because of this special type of redundancy across systems at different levels.
Rather than switching from one program to another using a master program, we 'escape' the original program (the genetic 'instruction') but reuse the bits in some modified program (the adjusted behaviour of the cells), originally by 'accident'. This can be simplified and seen as a kind of cross-talk leveraging type redundancy to make the effect meaningful. This solves a real problem of risk when the first program is failing, and isn't the same as relying on a master program to switch control, just as in the cellular example I alluded to, where the genes mutate in an undesirable way but the cells can adjust their behaviour to compensate. Such management is now distributed in the system rather than coming from the top. One of the reasons for thinking this is important is to understand how genes are actually regulated. Each cell has the full repertoire of genes in the genome it carries, but only uses certain ones at certain times. Whereas in the late 20th century we tried to find just such 'master control switches' in gene regulation to identify how gene regulation works, we now know that, for the most part, these types of 'master control switches' just don't exist. The idea of them was an artefact of thinking in mechanical terms about the solutions that Nature would find, and more can be read on this in the excellent recent book How Life Works by Philip Ball. This book is also good on the complexity of these types of cross-talk mechanisms and how they are functional. So, for a full treatment of the evidence for the value of cross-talk and similar processes, it's recommended.
Type redundancy in software engineering
Despite not really being a mechanical idea, this same concept of utilising type redundancy is actually already central to the discipline of software engineering and analytics. We have systems that read programs and data in a mechanistic way, so that the name of a function can be anything, like a string of meaningless characters and numbers like 48gd8438%GR, and the computer will happily read it and reliably execute it if we write in the program that the 48gd8438%GR function should be executed.
But, as human beings developing the software, while we can simulate in our minds how the program is going to execute the function, we also want to name functions as if they are adjectives, so that we have a clue before we read and try to edit the program's flow of control. When the computer program errors, it has a bug, and we must 'escape' it, exiting the program, so that as developers we can now fix it. When we do that, our minds read the function names and then choose to read them as descriptions/adjectives. If we've named them appropriately, like OpenFile instead of 48gd8438%GR, then in this current mode, as human 'computers', we can now use this 'extra' information to carry out various function-modification tasks to investigate and change the software so that it works again. As human 'computers' reading the code, we pivot to reuse the function names in a different way than the original program uses them. The meaningful descriptions of the functions, such as OpenFile, help us to fix it.
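A minimal sketch of these two modes, execution by handle versus re-reading the handle as a description, is below; the registry, the names, and the describe helper are hypothetical illustrations, not any real API:

```python
# To the interpreter, a function name is only a handle; to the developer, who has
# escaped normal execution, the same name can be re-read as a description.
# The registry, names and helper below are hypothetical illustrations.

def open_file(path: str) -> str:
    return f"opened {path}"

REGISTRY = {
    "48gd8438%GR": open_file,  # opaque handle: executes fine, tells a human nothing
    "OpenFile": open_file,     # descriptive handle: same behaviour, plus surplus information
}

def run(handle: str, arg: str) -> str:
    """Execution mode: the name is just a key; its descriptive content is ignored."""
    return REGISTRY[handle](arg)

def describe(handle: str) -> str:
    """Developer mode: re-read the same name as a description of what the function does."""
    words = []
    for ch in handle:
        if ch.isupper() and words:
            words.append(" ")
        words.append(ch.lower())
    return "".join(words)

print(run("48gd8438%GR", "notes.txt"))  # the machine doesn't care which handle we use
print(run("OpenFile", "notes.txt"))
print(describe("OpenFile"))             # 'open file': surplus information for the human fixer
```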
Mathematically, this is essentially the same type redundancy as interpreting ‘medium’ as a noun and then pivoting to regard ‘medium’ as an adjective in the same sentence. But instead of doing this for laughs, we do this to manage a risk problem, that of a program that is not working properly. In some sense, we can imagine biological cells also performing similar processes, using surplus information to correct errors in genes that get activated, and then compensating by fixing the problem with downstream changes. Again, this is predicated on type redundancy, and pivoting in the use of overlapping information which is surplus to one system but not to another.
Surplus information
It's really important to realise that this type change in the use of the function names, from just 'handles' for what to execute to descriptive information, means that from the perspective of the original program itself, this descriptive information is surplus to requirements. Just as if I use the same noun from the pun about 'mediums' who can contact the dead, but with no ambiguity: e.g. 'Mediums are such weird people'. Without the shift in perspective, the fact that 'medium' could also be an adjective is surplus to parsing that sentence. It is information I didn't use. What we need to understand, then, is that from the perspective of computer theory, we can never discount that what is surplus information to us right now might become essential information for some other program in the near future. This is the key to understanding why such pivots are just a productive version of the Halting Theorem.
The actual proof of the Halting Theorem (and also of Gödel's Incompleteness Theorem) also relies on such surplus information. We can then 'step outside' of the program execution to see that the surplus information is not surplus to the proof, but the 'program' cannot 'see' or utilise this surplus information. (Later, I will show that, in just the same way, Searle in the room can't 'see' Chinese.) In the proof, the surplus information is a self-reference to the program itself, e.g. a program that we know is being fed a description of itself as an input. But, in general, from a risk point of view, different programs can usefully utilise surplus information to reuse bits. In other words, we are extending computer theory to understand that surplus information is everywhere, and can always be utilised by programs exogenous to the program that it is currently surplus to. And this information can be useful to the overall stated objective of the original program.
What that also means, as we shall see, is that from the perspective of any algorithm, due to the Halting Theorem, some information is always 'missing' or not utilised by the program, information that we can see from outside might actually be useful. In other words, the Halting Theorem is a proof of the evergreen scarcity of information, just as Gödel's theorem also showed this, via the evergreen incompleteness of mathematics.
Let's now look at the consequences of being able to utilise such pivot options due to type redundancy, or 'surplus' information, and at how, where they are not available, this restricts the adaptive capacity of the system.
Searle the software developer
We again imagine Searle in his room. If something goes wrong with the program Searle is following via his rule book, he has no way to fix it. All the symbols in the room are in Chinese, and Searle doesn’t understand Chinese, so there is no way for him to access the surplus information that he might use to correct the error.
'Fixing' is, we argue, for now, reliant on a type of pivot as defined in an adaptation game using computer theory. In other words, Searle must stop following the rules, i.e. 'escape' the program. Then he must re-read what he has and analyse the rules, using information surplus to the original program to guide him. This is equivalent to reading the Halting Theorem proof and utilising the surplus 'self-reference' information in the original program to follow the proof, information which the program itself can't see.
But here Searle, or 'the room' if you like, is unable to pivot, because he can't read Chinese. This, I would argue, is the biggest clue as to why Searle's intuition, when he constructed this argument, was right. Without sometimes being able to pivot, you don't have a mind; you don't understand what you are doing. Understanding here is now reframed not merely as executing instructions, but as being able to fix something when those instructions are broken.
The solidity of the ‘fixing things’ intuition
This is a very solid intuition. For example, anyone can follow prescribed steps to solve an equation. But what if one of the steps is wrong? Only people who actually understand the equation, and the concepts behind it, have the best chance of recognising the error and fixing it. One way to define this mathematically is that if the equation has a name, we understand why it is called that, and the descriptive element of the name can be useful information for fixing an error in the equation. The new program we 'escape' to utilises information surplus to the first program in order to fix it, i.e. the descriptive information in the equation's name that we didn't need in order to follow the rules of application. Type redundancy is therefore essential, and pivoting of this kind is a rigorous description of how we manage the risk of program failure. Note that even if the surplus information wasn't in the equation's name, we would have other information that overlaps with the procedure we were following by rote. We are reusing bits, but we do this by using surplus bits that overlap with the original bits; in other words, we rely on type redundancy for what we call understanding, in the sense of knowing how to fix something. And this type redundancy is of the very same kind used to define and construct the proof of the Halting Theorem, and also of Gödel's theorem of the incompleteness of mathematics. We have arrived at the idea of understanding as involving a very specific form of strategy for the robustness of a system.
Generalising to diminishing returns
We can now generalise: any program needs work outside it if it needs information surplus to it in order to provide more value. So we can go beyond gross, simple error to diminishing returns as a model of the need to pivot, based on diminishing utility rather than outright failure. This is essentially the model of an adaptation game that I used in my article 'Options Beyond Growth or Failure'. So, recall, it is when our original learning algorithm is seeing diminishing returns that we want the option to pivot and 'fix' it. But now such fixing need not be remedial; it could also be an improvement, to make the algorithm better adapted to the current situation it finds itself in.
As a simple example, in the context of a conversation, we can also analyse conversations in terms of the diminishing returns when one person dominates. This requires that she leaves pauses and options for the conversation to pivot, to let someone else speak in the conversation before the returns from the first speaker approach zero and the conversation runs out of value completely. The need to be able to exit the topic and let someone else speak in a conversation can be framed as an organisation game, another analysis of game theory that I explored in my book On the Origin of Risk. (I also have articles on my blog on organisation games and the need to pivot strategies in real conversations, see here).
We can now see a connection to the original idea of the Turing Test: the idea that a computer algorithm like ChatGPT could be said to have a mind if we are, with unlimited time, still unable to tell the difference between the algorithm and a real interlocutor. Current large language models have problems exiting, or handing over to the interlocutor at the right time, or declining to offer advice when they actually have no good information of value. Since some diminishing returns are invisible to LLMs (by the Halting Theorem), I would argue that they are unable to effectively pivot to ask for more information, or to withhold low-quality information in order to keep a conversation going with high value to all parties. They certainly show signs of diminishing utility in many areas. But they cannot pivot.
This is likely to be a fundamental weakness by my analysis and not trivial. It ultimately relates to the nature of understanding itself. It is part of the Halting Problem: No single algorithm knows, and it is in general not knowable, when the program should stop ‘talking’, and when it should continue. Yet, other relatively simple algorithms can know very well that this one should stop talking. It’s about using information that is surplus to the current program that is currently holding the conversation.
Risk & understanding
Without this type redundancy, and the possibility of exiting the original program, there is a fragility to the way that the Chinese Room 'understands', because of this lack of an ability to pivot. It is a prisoner of diminishing returns on the value of the program. It's a risk problem, not just a computational problem. Further, it is a complex risk problem, because the original simple program is potentially very efficient. It can be faster to execute a set of instructions than to understand the context and motivation behind them such that they could be easily and safely modified. This itself is a statement of risk, about the inefficiency of deep understanding: the pressure on us due to the opportunity cost of understanding. This is another problem that LLMs like ChatGPT have, which is that they are not simple enough to execute deterministically and efficiently. They are too mono-modular, and so provide a very inefficient way to do lots of tasks like automation and workflows.
It can be quicker and more efficient to follow instructions, than to understand them. This can be far more productive, at least for a while. For example, simple programs scale efficiently. This creates the dilemma or the essential risk problem, and is ultimately why it makes sense to have simpler programs that we can escape from when we need to. We need to use the more complex program more sparingly, rather than just having a mono-modular program using all the information surplus to simpler programs that it can find. We need to execute programs, and also check what they are doing as they go, etc. In the end, from a risk perspective, we need both simple and complex programs. So, we see in the economics of computation understood through computer theory and game theory that we need relationships between distinct programs or algorithms to have a mind that can manage risk effectively.
So, we have now arrived at a reason to use computer theory to explain why from a risk point of view, a computation cannot be all there is to a mind. Just as Harnad says.
Back in the room leveraging type redundancy
For there to be type redundancy that Searle can use, the symbols would have to be in English. Then Searle could still follow all the instructions as before, by rote, as if merely a computer. Then, if there was an error, he could escape the 'program' he was 'running' by following those rules, and actually look at the English characters written down, using them to understand what needs to be fixed. They have gone from being nouns to being adjectives. Hence, from a risk point of view, utilising the same information and bits, but in a different program, by leveraging such type redundancy, is useful so that one program can fix another by switching modes like this. This is a solution to a risk problem that seems to benefit from minds in this very specific sense, but it also doesn't have the disbenefit of being mindful all the time.
It is the pressure from below not to need to understand, to simplify where possible, that generates this meta-modular solution, just as much as the pressure to understand when things go wrong and need changing.
This all suggests that 'understanding' is contingent on certain things, and is a solution to risk; there are also essential solutions where understanding is unnecessary. And of course, it is true that we have the concept of automatic processes which happen in the brain unconsciously, versus processes where the mind becomes conscious and intervenes. This distinction maps perfectly onto the concept of mind as a risk management solution that I'm developing here. It is just that, to be rigorous, the risk problem is defined in computer theory, and the solution of a mind relies on surplus information being available to other programs outside the one that currently has control. In other words, type redundancy and pivoting are essential to a mathematical description of the risk problems being solved by a mind, versus either a single simple or a single complex algorithm.
So, when we add this capacity for Searle to utilise type redundancy, we also take away the key part of the argument that gives the Chinese Room its force: if Searle understands Chinese, the problem doesn't exist. Notice, though, that my argument is not that Searle needs to understand what he writes in order to understand what he is doing. That would be tautological. My argument is that meta-modularity, the ability to pivot and reuse information from one program in another, different program, is what matters here, as the basic plank that we can use to construct an argument that, in the end, Searle, and all other humans, are not just computers, because they manage risk differently to any mono-modular system. While such systems cannot pivot, as defined using computer theory, minds can. This 3rd risk management option is crucial to understanding what minds are 'for'.
The computer doesn't care about such descriptions. When it is executing, the precise name is just a handle to the function, a way to call it. But we care as programmers: if we are to modify the program easily, we must name the functions things that describe to us what they do, so that the name also contains information, as an adjective, for us to do our jobs.
The function of naming things in ways that are useful to us later is probably also essential to a minimal model of a mind with limited computational sophistication; an emergent function that has the minimal ingredients we need to add on top of simpler algorithms. We need this minimal activity before we can say that the processes it is part of actually represent something we could call a mind. I will now focus on defining the minimal requirements for the sophistication of this pivoting process, and show that such pivoting is also probably happening in the cross-talk of our bodies' cellular and other signalling and regulatory systems. This will get us free of the objection that my argument is circular, but it will also mean that we start to step away from the idea that our bodies are just complex machines, as well.
Reducing minds to a minimal model
It is not necessary that we are superior, or more complex, or know more than the computer program in order to pivot. We don't have to be clever to get the joke. First of all, it is conceivable that by some measures of complexity large language models like ChatGPT or Claude are already superior to us in certain respects. The point is that the pivot from one use of the information to another is essential to managing risk, regardless of relative complexity. But this pivot is defined by computer theory, so we only look at the extreme case of adaptation: no matter how sophisticated the current program is, computer theory mandates that we look at the case where we exit that program to access information that is surplus to it. And we know, due to computer theory and the Halting Problem, that there is always surplus information that might be relevant. So far, we have looked at the risk that the program needs to be modified, e.g. due to an error or just diminishing returns. Only by reusing information, by utilising type redundancy, can we efficiently achieve this.
My argument will be that this pivoting requiring meta-modularity is not any sort of specific algorithm. Also, the algorithm that pivots doesn't initially need to be sophisticated, because that would be no answer to the problem of what a mind is. So far I have talked about software developers' minds being the 'program' we 'pivot' to in order to fix a program. The programmer uses type redundancy, but she also has a complex mind already. So that can't be the final answer, as it would be a circular argument. What we need is a minimal model of such useful pivoting, and of the useful exploitation of type redundancy.
Internalising the process
This pivoting of data types only happens because of type redundancy: types such as function names are simply nouns to one program (the computer) but adjectives to other programs (the human developers). This type redundancy is essential for reusing the bits that the prior program has in the new program, e.g. the fixed program that we then execute after fixing the original program. This, again, is a feature of natural languages, and a feature of how meta-modular systems like minds use them to manage risk.
And, so, to make the argument clearer, instead of thinking of the program and developer as separate, we can now imagine that the program is running inside the human developer. A similar move was made by others to show that we can’t say that the whole room understands Chinese, while Searle doesn’t. (We can imagine that Searle memorises all the rules and everything in the filing cabinets. He then executes those rules in his head, but does he now understand Chinese any better than before?).
Sometimes the human developer uses internal information in their brain as if it is just a name in order to access some function. But, if this is not paying off, the human developer can switch to re-reading the name as an adjective, and can then pivot, escaping the original program in their own heads and modifying it using a different program in their heads. This is meta-modularity.
We can now consider the effect of reading things one way, versus reading them another. In one reading, the human reads the sentence of the pun with ‘medium’ as a noun. But then they re-read the sentence with ‘medium’ as an adjective.
Rather than being supplied with jokes by others, we can also do this internally ourselves. We construct our own jokes. They need not be jokes, actually; they can just be sentences that we say with one grammatical parsing, but after we say them, we realise that they can be understood with another grammatical parsing. If someone is listening, they might interpret it in the way that we didn't intend, but we may decide that we prefer their interpretation to the intended one. We may even do that internally, and decide that our second interpretation of our own sentence is better than our original intended interpretation.
These are all examples of micro-pivots. We construct some information with one 'program' or intention, and then accidentally exit and pivot to a 2nd one, where the 2nd program is now in control. There is a great example of one kind of such pivot in the film The Royal Tenenbaums (again, it's a joke), where the main character Royal, who is the formerly estranged patriarch of the family and also a pathological liar, says something to the effect that his time with his family, recently, has been the happiest of his entire life.
Then the voiceover says:
“Immediately after he said this, Royal realised that it was true”.
The question then becomes how can we understand outside of any ‘greater program or master logic’ how these productive pivots happen, or if they can be generated without a master control. The answer I believe, is that they can happen due to a combination of type redundancy, and a theory I’ve developed called label theory, when augmented by the pigeonhole principle.
Label theory is something that I wrote about a while ago and published on my own blog, at least as a set of slides: 'Label Theory and the Value of Cross Talk'. The key innovation of this article is to add the use of the pigeonhole principle.
The pigeonhole principle
The pigeonhole principle is deceptively simple, because it is also very powerful. It merely says that if you have n things and m pigeonholes, and n > m, then if you put the n things in the m pigeonholes, at least one pigeonhole must have more than one thing in it. It leads to counter-intuitive results: given that London has n people in it, and that people have a maximum of m hairs on their head, and given that n > m, we know that in London there must be at least two people who have exactly the same number of hairs on their heads (see the Wikipedia entry on the pigeonhole principle for more information).
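A minimal sketch of the principle in Python (the London and hair-count figures in the comments are rough illustrative bounds, not measurements):

```python
# Pigeonhole principle: put n items into m holes with n > m and some hole must hold >= 2.
# The London population and hair-count bounds in the comment below are rough illustrative figures.

from collections import Counter
import random

def fullest_pigeonhole(n_items: int, m_holes: int) -> int:
    assert n_items > m_holes
    holes = Counter(random.randrange(m_holes) for _ in range(n_items))
    fullest = max(holes.values())
    assert fullest >= 2   # guaranteed by the principle, not by luck
    return fullest

# e.g. ~9,000,000 Londoners and at most ~150,000 hairs on a head means at least two
# Londoners share an exact hair count; a smaller run makes the same point:
print(fullest_pigeonhole(n_items=10_000, m_holes=366))
```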
Applying the pigeonhole principle to label theory
Label theory is just a different way of expressing the ideas already discussed about type redundancy. It states that there is surplus information/type redundancy. But we now explore the case, specifically, when this results in cross-talk. An example is that we might expect to achieve some objective, such as agreeing with a friend to go to ‘The French Café’. Due to this cross-talk, when we go there together, we might end up at a different café, Boulangerie Jade, which our friend mistakenly thought we were describing. This is because our friend interpreted the statement ‘Let’s go to The French Café’, as referring to their local French café when in fact there is another French café, called The French Café. (I know that some businesses now also take advantage of this cross-talk potential when people search for places on Google Maps, by naming their business Café Near Me, and so on).
If we imagine a namespace or a codespace of n options, e.g. of short names of cafés that are also descriptive in some way, then this is constrained by the options for faithfully describing some aspect of the café and by the limited length of the description. The options are also constrained by the variety of types of café that there are; similar cafés, e.g. French cafés, have fewer possible distinct descriptions. We can then say that if there are n different cafés and they sit in a space of a smaller number m of possible useful descriptions, given these constraints and given that n > m, then some cafés must have exactly the same description, like the French cafés. Notice how the productive pigeonhole principle shows up when we move from the written word to speech: the number of distinct descriptions (m) drops, because we can no longer hear the difference between 'The French Café' and 'the French café'. They end up in the same pigeonhole.
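A minimal sketch of that collapse in Python; the 'spoken' normalisation is a crude stand-in (here just case-folding), and the café names are the examples used above:

```python
# When names are spoken, written distinctions such as capitalisation disappear:
# the number of available pigeonholes m drops, so distinct cafés collide.
from collections import defaultdict

written_names = ["The French Café", "the French café", "Boulangerie Jade"]

def as_spoken(name: str) -> str:
    """Crude stand-in for speech: case distinctions are lost."""
    return name.lower()

pigeonholes = defaultdict(list)
for name in written_names:
    pigeonholes[as_spoken(name)].append(name)

for hole, occupants in pigeonholes.items():
    if len(occupants) > 1:
        # Cross-talk: asking for one occupant may deliver the other. Tolerable here,
        # because both really are French cafés, so the description still holds.
        print(f"collision in pigeonhole '{hole}': {occupants}")
```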
Now we can also say that we can logically map some naming system such that it behaves like a descriptive system, in which similar names in the naming system have similar positional properties. We can say that, for our purposes, that's all that a 'description' mathematically really needs to mean. This is almost the opposite of assigning a random unique string of characters to each thing. But if, for example, you had a unique numerical ID assigned to each café, starting at 1 and incremented by 1 in the order in which each café originally opened, then there would be a 'descriptive' element to the number ID, because cafés with a similar number, and hence a similar position in the index, also potentially have a similar age. You see this in membership databases, where the original members of some organisation all have a very low member ID number, etc. So any naming rules that we use might have some degree of similarity that the productive pigeonhole principle can potentially exploit. What this means is that many naming systems also have descriptive qualities. And so we expect that the loss of information from cross-talk that occurs due to this principle is limited, because the descriptive value creates type redundancy. Just as we can safely assume it could not possibly be that bad for me to accidentally end up at Boulangerie Jade instead of The French Café.
If we make these mistakes between names of things that have no relation but end up in the same pigeonhole, that can be very bad. Imagine getting not your prescription medicine oxylogin for treating a nasty rash, but instead a blood-thinning treatment called oclyogin. The loss of utility from cross-talk is only limited dramatically if both the things that end up in the same pigeonhole can legitimately be described in a similar way, in the way that matters to you. Of course, this still assumes that the description contains the type of information I actually care about, but it is still better than purely random cross-talk like the prescription example. If a French café is what I was aiming for, I am still going to get a croissant either way when I end up at some other French café. (Also, the less complete the description, the more positive surprise is possible, given the right sort of alignment. I might find that the French café I've ended up at also plays French House music, which I love, while the other doesn't. I discuss this positive surprise idea in the slides on my blog.)
In fact, any spatial or other dimension along which we arrange a signalling system may contain 'descriptive' elements or information in the spatial and temporal location of the signal, especially when combined with other shared things like states. Another example, in the temporal case as we zoom out, is autocorrelation. Another example is where we send letters to a location and get the address slightly wrong. If the content of the letter correlates with content about the location, then delivering to the wrong address still delivers relevant information. That's why, for local news content, we often hand-deliver the same letter to everyone in the local area without worrying about the exact address. The addressing system reflects the content we care about.
We can even try to exploit this productive cross-talk effect by more deliberately exploiting the productive pigeonhole principle, and reducing the uniqueness of descriptions of entities so that the pigeonhole effect is bound to occur, and there is bound to be cross-talk somewhere. This is actually how I am creative in my research; I spend a lot of time re-describing things, and waiting for this productive pigeonhole principle effect to occur somewhere. I will then realise I’ve described what I thought were two very different things in the same terms, suggesting productive cross-talk and a creative insight. Often this happens where I’m writing and just try putting something else as part of the subject of the sentence, and see if it actually makes more sense than what I was originally writing about.
Reminder that productive cross-talk is a form of escape, a pivot
Just as going to a different French café may bring extra utility that was not expected (such as French House music being played), while the expected utility is also conserved (because croissants are still available), so such cross-talk can be net positive versus no cross-talk. But, because we did not execute the action that was expected, we cannot say that the original program was followed. We 'pivoted', and we did so emergently, by accident. Yet it is not a random change; it's not just exploiting noise, it is exploiting information surplus to the original program. At the same time, there is no master program. Only the productive pigeonhole principle, and some background process of organisation that happens to 'describe' things in a way that affects how they are accessed.
When I was aiming for The French Café, I had no idea that another French café existed. Hence, the productive pigeonhole principle, as applied to a codespace or a namespace, and to any other ordering or indexing principles, such as the arrangement of nerves in the human body, is potentially useful if such positioning contains information that relates the things arranged that way. It can be used in the arrangement of cellular signalling and regulation systems, which can exploit productive cross-talk. It can be used in the brain. And I can argue cogently, and with rigour, that this is a minimal model for pivoting from one program to another. Rather than pivoting when an error happens in order to fix a problem, as in the software development case, here the cross-talk pivots opportunistically, as an emergent 'error' that can actually have bonus utility. In other words, it is more about innovation than just remediation or rescue. And so, I have explained, using mathematics, how we can escape and pivot, without making a circular argument about needing a mind already in order to do so.
An important point about exploiting this principle is that it doesn't mean we should rigidly organise everything by some fixed schema. In fact, the opposite. Different forms of organisation should overlap, so that there is diversity, and things should be reorganised continually to exploit potentially new productive pigeonhole events. The greater the unexpected correlations between how things are organised, the greater the potential productivity of such pivots. Whereas the more predictably things are 'named' and organised, the less can be discovered emergently this way, because the thing pivoted to will be almost identical to the thing that we started with. This is why hierarchies of knowledge kill creativity.
Productive cross-talk in natural language and thought
In the context of Searle's room, we can imagine such productive cross-talk only happening if Searle himself mixes up one symbol for another, such that the coding system and his propensity for error lead to such pivoting. Then let's say that such mix-ups are productive, and more interesting for the Chinese interlocutor outside the room. This would only happen given an arrangement where the indexing and arrangement of Chinese symbols is such that these mistakes are likely to contain bonus information, due to the descriptive component in the indexing system. But we can also say that in a stricter program such productive mistakes are impossible: just as a computer database with a unique constraint on its ID index never 'accidentally' retrieves a different ID row to the one that was requested, so Searle will never make mistakes of any kind. This is the other side of pivoting and how it relates to understanding. The pivoting is a kind of openness to novelty. It is 'the crack in everything that lets the light in'.
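To make the contrast concrete, here is a minimal sketch of a strict, mistake-proof lookup versus a coarser, description-based lookup that allows productive collisions; the keys, descriptions, and replies are hypothetical illustrations:

```python
# A strict keyed lookup can never 'accidentally' return a neighbouring entry,
# so it can never stumble on bonus information; a coarser, descriptive lookup can.
# The keys, descriptions and replies are hypothetical illustrations.

STRICT_INDEX = {101: "reply A", 102: "reply B", 103: "reply C"}

def strict_lookup(key: int) -> str:
    """Unique constraint: return the exact row or fail; no productive mistakes possible."""
    return STRICT_INDEX[key]

COARSE_INDEX = {
    "greeting/formal": "reply A",
    "greeting/casual": "reply B",
    "farewell/formal": "reply C",
}

def coarse_lookup(description: str) -> list:
    """Return every entry whose description shares a leading category with the request."""
    category = description.split("/")[0]
    return [reply for key, reply in COARSE_INDEX.items() if key.startswith(category)]

print(strict_lookup(102))                # never surprising, never productive
print(coarse_lookup("greeting/formal"))  # may surface a near neighbour: a 'productive mistake'
```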
I argue that phenomenologically, due to the relationship to descriptive elements and labelling, our ability to understand something new often comes from, and indeed depends on, productive cross-talk. This is another side to the Halting Theorem’s proof that some information is always surplus, but potentially relevant to the program that is running. And we know that for sufficient plasticity, we can only access that information by escaping from that initial program.
This is the scarcity of information: no single program can access it all. In my earlier discussion I was relying on errors, so it might seem that programs without error don't need to pivot. Now we can see that, due to the possibility of productive errors, i.e. cross-talk, no program that never errs, i.e. never pivots, can understand all there is to understand.
Phenomenology and an empirical hypothesis
This analysis of Searle's Chinese Room argument leads me to present the idea that having a mind results in a certain phenomenology of experience related closely to understanding, just as Searle intuited. That is, when we understand something new, independently of others merely supplying us with the information, the phenomenology is due to the productive pigeonhole principle and label theory. That is, we label things in such a way that, like Royal, we say it, think it, or write it, and then, afterwards, we realise that there was another way to interpret it, read it, or parse it, which is more true, and so a qualitatively new thought.
Creativity has a very specific sort of conscious experience (phenomenology), where the production of new ideas relies on a certain sort of pivoting, whereby we are momentarily confused, and pivot, and realise we are now somewhere better and different than where we were just a moment before. And we can feel very out of place in the new world we just entered. We might need to play along a bit and pretend we meant what we said. We might feel very mechanical and stumble as we say the new thing. And such realising is not the result of an intention in the present moment, so the conscious realisation that we produced something new, like with Royal, always comes immediately after the fact.
This suggests that it might be possible to identify the neural correlates of such productive pivoting, and so catch ourselves in the act of creating. And this means certain experiments could help to substantiate this discussion with empirical support of a meta-modular model of the mind, and help to illuminate the origin of novel, endogenously produced, understanding. This may also have implications for learning in general, and also for certain mental illness symptoms which have a unique phenomenology also related to learning and creativity, such as the experience of psychotic symptoms, where the pivoting may be out of control.
The scarcity of information
The question then arises: at some point, do we just get to stop worrying about computer theory, and about escaping programs and modifying them? That is, despite this article, and my last one, can't we just grow out of the need to play adaptation games and the need to pivot? Why does the risk that creates the need to pivot not disappear?
The answer, I suspect, is that we will never develop a technology that grows out of the need to play adaptation games, because what we are actually expressing here is the scarcity of information, and computer theory. Gödel's Incompleteness Theorem and Turing's Halting Theorem can be read as proofs of the evergreen scarcity of information. What I mean by that is: whatever computer program you have, there will always be useful information that is inaccessible to that current computer program. That is what Gödel's theorem of the incompleteness of mathematics, and Turing's Halting Problem, showed. What adaptation games show is that, rather than just trashing the existing program, we can potentially develop ways to relatively routinely modify it, by pivoting using type redundancy to do so.
What I have shown is that there is a way to relatively routinely pivot, and to succeed due to exogenous effects via the productive pigeonhole principle, but this relies on us labelling things in new and useful ways; when we fail to say what things are, or what they could be, we likely fail to generate new understanding. If there was only one way to label things, that would be fine. But, in fact, many entities have very many 'affordances': they can be used in many different ways, and so can be usefully described in many different ways, and that means there will always be a scarcity of information. It is also hard to understand things because of the pressure to execute and scale our use of things; this makes understanding come at a premium. If there is always going to be a scarcity of understanding, maybe that's ok.
