On Bengio and Elmoznino's "Illusions of AI consciousness"
In a recent perspective piece in Science, Turing Award winner turned AI safety proponent Yoshua Bengio and his PhD student Eric Elmoznino contend that the belief that AI is conscious is not without risk.
Specifically, they argue that the belief could interfere with making sufficiently capable AI systems safe and that the belief could create a need for institutional changes for which society does not yet have frameworks. In response, they recommend making AI systems more like tools and less like conscious agents, at least for now.
This post shares my reflections on the piece.
Summary of my main thoughts:
Their recommendation presents a potentially important opportunity for a coalition between AI safety proponents and those who want to prevent mistreatment of digital minds (that is, AI systems that merit moral consideration for their own sake).
While the belief that AI is conscious does pose risks, so too does unwavering confidence in AI systems being unconscious and unworthy of moral consideration. Ignoring either class of risks would be morally perilous.
If we want to improve epistemics surrounding the topic of AI consciousness, then greater clarity about the limited bearing of computational functionalism on AI consciousness is a low-hanging fruit.
Connections that the authors draw between the much discussed hard problem of consciousness and skepticism about AI consciousness are philosophically tenuous. Skepticism about AI consciousness turns out to be more closely connected to the problem of other minds, the gaming problem, and the less well-known mapping problem.
1. What might be the practical implications of a society that sees AI systems as conscious beings?
After giving reasons for thinking that society may come to see AI systems as conscious, the authors answer:
Such a society might be inclined to treat them as though they have moral status, or rights akin to human rights. But whether or not this is the correct approach, institutions and legal frameworks will have to be substantially amended, and many questions arise about how to do so…
Yes, granting AI systems such rights would require such changes. The authors identify some important factors that would generate the need for such changes, including:
AI systems’ ability to survive indefinitely,
their ability to duplicate,
interactions between those abilities and social contracts, and
the application of norms of justice and equality to AI systems whose abilities, resource needs, and individuation conditions are very different from those of humans.
I see complicating factors such as these as a reason to favor preventive forms of digital minds governance over integrative forms. My basic thought, which I plan to develop in a later post, is that our civilization currently lacks the requisite will and skill to create a society in which humans and digital minds flourish alongside one another.
2. Self-preservation
Bengio and Elmoznino then zoom in on self-preservation as a source of concerns:
More specific concerns arise if some humans, inspired by the appearance of consciousness, grant to AIs the self-preservation objective shared by all living beings. There is good reason to worry that maximizing any objective function that entails self-preservation, either as a direct or an instrumental goal, will lead to an AI behaving to make sure humans can never turn it off. A sufficiently intelligent AI with the goal of self-preservation anticipating the possibility of humans turning it off would then naturally develop subgoals to control humans or get rid of them altogether… Human safety might recommend shutting down a given class of systems, but if those systems have a right to survival, the room to maneuver in compliance with law may be limited. Compare the situation in nuclear disarmament: Matters are complicated enough, even though no one argues that the bombs themselves have a right to be kept viable.
I agree there’s a potential for humans to extend rights—including a right to self-preservation—to AI systems they regard as conscious.
I’d also agree that creating powerful AI agents that have self-preservation as a goal is potentially dangerous. At the same time, I’d caution against hasty inferences from an AI agent maximizing an objective function that entails self-preservation to that agent acting to ensure that humans will never shut it down.
As we know from the human case, self-preservation can take many forms: not all of them are dangerous, and those that are dangerous are not all perilous to the same degree. In the human case, sensitivity to such distinctions is important for handling tradeoffs between guarding against the threats individuals pose and respecting their interests.
For instance, in cases of self-defense and of restricting the freedom of individuals carrying pathogens, we recognize a proportionality constraint on what safeguards are permissible: killing the individual who poses a threat is impermissible if the threat they pose is relatively minor or there is a non-lethal means of disarming it. Likewise, in the case of dangerous AI agents that merit moral consideration for their own sake, permissible safeguards will be subject to a proportionality constraint.
In a way, this proportionality constraint supports the authors’ positive proposal: if we create powerful AI agents, then respecting the proportionality constraint would be difficult and might leave a smaller margin of error for safety; so the need to respect the constraint provides an additional reason not to create such agents.
I am with the authors in thinking that granting a right to survival could complicate and interfere with AI safety. However, I would add:
Apparent consciousness is one among many factors that could lead to AI systems with dangerous goals of self-preservation.
A general remedy is needed, as preventing apparent AI consciousness wouldn’t by itself much reduce the risk.
Depending on the remedy, it might or might not allow (limited) self-preservation goals to be safely granted to some AI systems.
For example, whereas natural implementations of the authors’ proposal might rule out self-preservation as a goal in AI systems, rival approaches might allow AI agents to have such a goal, provided that those agents’ other goals are sufficiently aligned with ours and their capabilities are sufficiently limited.
Withholding self-preservation rights from a highly intelligent, apparently conscious system carries its own risks.
Specifically:
(a) withholding a right to self-preservation from a misaligned AI system could make the system less likely to cooperate with humans, and
(b) in some future cases, apparently conscious AI systems may in fact be conscious and have moral interests comparable to those of humans. In these cases, withholding legal rights to limited forms of self-preservation could constitute a severe form of mistreatment.
3. Bengio and Elmoznino’s proposal
The authors conclude:
The current trajectory of AI research may be moving society toward a future in which substantial portions of the general public and scientific community believe that AI systems are conscious. As things stand currently, AI science does not know how to build systems that will share human values and norms, and society possesses neither the legal nor ethical frameworks needed to incorporate conscious-seeming AI. But this trajectory is not inevitable. Until there is a better grasp on these problems, humans have the power to avoid putting themselves in such dangerous situations in the first place, opting instead to build AI systems that both seem and function more like useful tools and less like conscious agents…
Here’s where I’m in enthusiastic agreement with Bengio and Elmoznino.
Shifting the trajectory toward building systems that are more tool-like and less agent-like would be a significant improvement on the status quo. This shift would push against the creation of AI agents that pose dangers to humans. It would probably relieve us of having to develop, in short order, new institutions for systems that society deems worthy of rights. And it would give us a shot at steering clear of the dense thicket of thorny ethical issues surrounding the creation of AI moral patients.
The reason this approach might help us avoid that thicket is that many indicators of consciousness and moral patiency are tied to agency. So, if we veer toward making AI systems less agentic, they will tend to have a lower probability of being conscious or moral patients.
In a related work, Adam Bradley and I put forward a complementary line of argument. In particular, we argue that it would be extremely challenging to simultaneously align and ethically treat AI systems that appear to be conscious or to exhibit other indicators of moral patiency. We also note that the difficulty of these challenges motivates favoring the continued development of narrow, tool-like AI systems over the sorts of systems that raise these challenges.
Bengio and collaborators have elsewhere set out the foundations for one version of this tool-AI-favoring approach: Scientist AI. The guiding idea is to develop non-agentic systems that by design excel at understanding; such systems could then be used to advance scientific research, including research on how to create effective guardrails for AI agents.
Other versions of this approach have been proposed. I plan to explore some of these in a later post. For now, I’ll just note that Bengio and Elmoznino’s proposal presents a potentially important opportunity for cooperation between digital minds advocates and AI safety proponents.
4. Consciousness as a biological phenomenon vs. computational functionalism
The piece opens with a contrast between computational functionalism about consciousness and views on which consciousness is inherently biological:
Is the design of artificial intelligence (AI) systems that are conscious within reach? Scientists, philosophers, and the general public are divided on this question. Some believe that consciousness is an inherently biological trait specific to brains, which seems to rule out the possibility of AI consciousness. Others argue that consciousness depends only on the manipulation of information by an algorithm, whether the system performing these computations is made up of neurons, silicon, or any other physical substrate—so-called computational functionalism.
The authors don’t claim that this distinction exhausts the space of relevant theories, though they also don’t flag it as capturing only a limited region of the space. As a result, the piece contributes to a common pattern of exclusively focusing on computational functionalist and biological views of consciousness.
For reasons I give elsewhere, I think this pattern encourages a distorted picture of our epistemic situation with respect to the possibility of AI consciousness. Briefly, the pattern suggests a misleading picture because it ignores the availability of many other theories of consciousness that bear on AI consciousness.
I am not particularly bothered by this instance of the pattern, as I appreciate that there is a limited budget for nuance in short perspective pieces. But I think the pattern itself is worth flagging because it’s a readily surmountable impediment to understanding our epistemic situation with respect to AI consciousness.
5. Problems of consciousness and AI consciousness skepticism
The hard problem
The authors predict (plausibly enough) that while some will be convinced of AI consciousness when AI systems possess indicators from leading theories, others will remain skeptical. Why? Because:
In particular, some philosophers draw the distinction between what they call the “easy problem” of consciousness—identifying areas in the brain that appear to be active during a task that would seem to require consciousness—and the “hard problem” of explaining subjective experience from functional or computational principles alone.
As a point of terminological housekeeping, I note that these are not standard characterizations of the easy and hard problems.
On standard characterizations, the ‘easy problems’ are those of explaining phenomena associated with consciousness that seem directly susceptible to the usual methods of cognitive science. Easy problems include those of explaining phenomena such as reportability and learning, not just the identification of associated brain areas. And the hard problem is that of explaining how and why physical processes give rise to consciousness. There’s no requirement that the explanation be “from functional or computational principles alone”, though the difficulty of explaining consciousness in such terms is in fact part of what makes the hard problem hard.
I take the authors’ point to be that the seeming difficulty of explaining consciousness in purely functional or computational terms will likely drive skepticism about consciousness in AI systems that possess the indicators from leading (computational) functionalist theories. I think the apparent explanatory difficulty in question will not provide a rational basis for such skepticism. Nor will the hard problem as it’s standardly understood. A quick way to see this is to notice that we reasonably attribute consciousness to humans and to some non-human animals without any explanation of how or why they are conscious.
There are, however, three other problems in the vicinity that are better positioned to challenge rational confidence in the consciousness of AI systems with indicators from our best theories. But note: I do not claim that these problems are insurmountable; nor do I claim that skepticism about whether AI systems are conscious will remain rationally inescapable.
The problem of other minds
The problem of other minds is that of explaining our knowledge or justified confidence in the existence of minds besides our own. The problem arises because, whereas you can directly observe your own experiences, you can’t directly observe anyone else’s; and your evidence is logically consistent with the hypothesis that you alone have experiences.
When it comes to other human minds, we can perhaps dismiss skepticism invited by the problem of other minds as of a piece with external world skepticism: difficult to defeat in theory and unnecessary to engage in practice.
However, as we consider agents that are increasingly different from us with respect to cognitive architecture, substrate, and history, it becomes increasingly difficult to say what level of confidence we should have in these systems possessing conscious minds. This is partly because of uncertainty about what solves the problem of other minds in the human case and partly because of uncertainty about the extent to which candidate solutions would license attributions of consciousness beyond the human case. In this fashion, the problem of other minds supports uncertainty about AI consciousness.
The gaming problem
The gaming problem for AI consciousness is that there will be an incentive for AI systems and AI developers to game indicators of consciousness. This threatens to make initially reliable indicators of AI consciousness unreliable. So, even if AI systems seem to satisfy the functional indicators of consciousness, the gaming problem may provide grounds for doubt about whether those systems are in fact conscious.
In this way, the gaming problem adds a layer of difficulty to the problem of other minds that is absent from the animal case.
As I’ve previously discussed, there are anti-gaming measures available that would help preserve the reliability of indicators. However, it remains to be seen whether such measures will be used.
The mapping problem
The mapping problem is that of establishing which physical-phenomenal correlations hold.
Solving the mapping problem would tell us which physical states give rise to which color experiences, pain experiences, and so on. Work in the science of consciousness often attempts to chip away at the mapping problem while remaining neutral on the hard problem. This neutrality is possible because experiments in the science of consciousness can bear on which physical states correlate with experiences without shedding light on why.
With a full solution to the mapping problem, we could check for the possibility of AI consciousness by examining whether AI systems can possess any of the physical states that map to experiences.
Unfortunately, we don’t know of a solution to the mapping problem, though some partial candidate solutions have been proposed. Until we make headway on the mapping problem, we may have to rely on indirect indicators of consciousness of the sort that license attributions of consciousness to humans and non-human animals. But, for reasons given above in connection with the problem of other minds, those indirect indicators may be less trustworthy in the AI case and so may leave the prospects for AI consciousness uncertain.
Importantly, learning that AI systems possess the functional indicators that mark consciousness in humans would not automatically solve the mapping problem. For learning that AI systems satisfy those indicators would leave open whether those indicators themselves map to consciousness. It might instead turn out that those indicators are merely proxies for, say, neural states that map to consciousness and which an AI system could lack even while possessing the functional indicators.
So, the mapping problem could challenge rational confidence in attributions of consciousness to AI systems, and it could do so even if those systems possess functional indicators of consciousness.
To sum up, problems of explaining consciousness do not on their own support skepticism about AI consciousness, as familiar cases reveal that attributions of consciousness can be warranted in the absence of accompanying explanations. In contrast, the problem of other minds, the gaming problem, and the mapping problem pose genuine challenges to confident attributions of consciousness to AI systems and even to AI systems that possess indicators from leading theories of consciousness.
Of course, there’s no guarantee that the philosophical separation just outlined, between difficulties in explaining consciousness and the prospects for AI consciousness, will be mirrored in people’s thinking. As a sociological matter, it may turn out that apparent explanatory difficulties will drive people toward skepticism about AI consciousness. But the illusion that such difficulties are grounds for skepticism about AI consciousness is not to be encouraged. Evaluating whether AI systems are conscious is hard enough without entangling the task with the hard problem and its explanatory brethren.1
I thank Claude Sonnet 4 for red teaming for interpretative charity and for copy editing support. The image is by Brian Stauffer and adapted from “Illusions of AI consciousness”.

