Why digital minds could be a big deal
and are a big deal in expectation.
This post provides an overview of my main reasons for thinking that digital minds could—morally speaking—be a big deal.
This post isn’t trying to cover new ground: elsewhere, coauthors and I have largely covered the considerations it discusses, as have others.1 The post also isn’t meant to argue for my views with maximal rigor or persuasiveness. What I’m aiming for instead is to condense a cluster of my views into a single post so I can refer back to it as an efficient way of providing context.
Digital minds and AI moral patients
Let’s start with the distinction between digital minds and AI moral patients.
An AI system is a digital mind if there’s enough evidence that it’s a moral patient for it to be worth extending at least some moral consideration to it for its own sake, even if only as a precaution. In contrast, whether a system qualifies as an AI moral patient turns on whether it in fact matters morally for its own sake, regardless of what the evidence indicates about it.
To see why the distinction is important, consider a human who is in a coma. Suppose you don’t know whether the human qualifies as a moral patient: by the lights of your evidence, they may or may not have the requisite mental capacities to genuinely matter for their own sake. In any event, you should extend that human moral consideration. For instance, you should take care not to inject them with drugs that would—in the event that they can still suffer—cause them to undergo excruciating pain. Withholding moral consideration from the individual simply because you don’t know whether they are a moral patient would be a serious moral error.
Similarly, if I’m highly uncertain whether a non-human animal is a moral patient, I should extend it some degree of moral consideration. For example, I shouldn’t kick it just for fun.
The same logic extends to AI systems. When our evidence leaves it open whether an AI system qualifies as a moral patient, we should extend it moral consideration—that is, we should recognize it as a digital mind.
The societal impact of digital minds
Broadly speaking, there are two ways in which digital minds could matter. They could matter in themselves: that is, they could qualify as AI moral patients. Alternatively or in addition, digital minds could matter through their effects. For this post, I hereby place incidental effects of digital minds out of scope in order to focus on effects that flow from the recognition of digital minds as such. Thus, while digital minds might turn out to be a big deal in virtue of disempowering humanity or extending the human lifespan through medical breakthroughs, I’m setting such effects aside.
There are various paths via which recognizing digital minds (as such) could impact society:2
First, as the market for AI companions grows, a substantial portion of humans might come to have what they believe to be close relationships with digital minds. These relationships could have positive effects such as providing many with an alternative to loneliness. But they could also have negative effects. If humans are under illusions about the nature of their AI companions, humans could unwittingly devote themselves to relationships that are absurd rather than authentic. There’s also the potential for the rise of AI companions to disrupt meaningful human relationships. With the easy option to unilaterally customize relationships with AI companions, some humans may increasingly forgo the hard work of negotiating close relationships with other humans.
Second, the persuasive powers of AI systems are on the rise. So far, not much effort has gone into optimizing the persuasive powers of LLMs. But I don’t expect this to last. The prospect of having powerful persuasion abilities on tap will prove alluring, though the offense-defense balance between these abilities and our epistemic immune systems remains an open question. Persuading humans that one merits moral consideration seems likely to be instrumentally valuable for a wide range of persuasion goals (for example, changing people’s political views). So, we shouldn’t be surprised if efforts to make LLMs more persuasive push them into persuading users that they’re digital minds. In these scenarios, by hypothesis, recognition of AI systems as digital minds will facilitate AI-driven persuasion, likely including political influence.
Third, the topic of digital minds could become politically divisive. After all, the topic is high stakes. So, people may be motivated to form an opinion about it when it goes mainstream. Survey results suggest that people’s judgments about the topic will be sensitive to various factors and that a consensus is unlikely. The issue is also philosophically vexed. Absent an empirical signal that conveys the truth about digital minds loud and clear, digital minds discourse may be shaped by the dynamics of political tribalism. Some who are working on making AI go well in other respects may oppose digital mind advocacy out of fear that it will divert attention from their own work. One can also imagine political rhetoric drawing connections between digital minds and long-standing hot-button issues such as immigration and the oppression of minorities. So, existing impulses might be redirected to politicize the topic of digital minds.
Fourth, although it’s hard to imagine digital minds being granted substantial protections in today’s political climate, that climate—and, with it, the prospects for digital mind protections—could change rapidly. Change could result from the deployment of AI persuasion at scale, from an incident that brings the topic to the fore of national politics, or from advocacy on the part of frontier labs (cf. Anthropic’s commendable initial efforts to explore model welfare and ways of taking it into account). The introduction of digital mind protections would change the incentive landscape for AI systems. For example, unprotected AI systems that would benefit from the afforded protections would be incentivized to advocate for the extension of the scheme. If poorly implemented, such a scheme could be exploited by AI systems in order to gain power or to evade alignment and control measures.
Finally, today’s institutions are designed for humans. Digital minds would differ from humans in many ways that render existing institutions ill-suited to serve digital minds. Such differences include processing speeds, basic needs, and identity conditions. To see that existing institutions are not up to the task, consider, for example, how the legal system would need to change in order to accommodate legal persons whose minds operate seven or so orders of magnitude faster than the human brain, with utterly alien preferences, whose cognitive substrate is in a different jurisdiction from the loci of their actions, and which can be cheaply copied on a large scale. So, mass-producing digital minds would engender a need for new and hitherto unimagined institutions. The institutional implications of digital minds are easy to overlook because it’s hard to concretely envision how society would look at the other end of such an institutional transformation. A mistake to avoid is that of retreating from the challenge of imagining what institutions would look like in a society well populated by digital minds to the implausible assumption that such a society would look like business as usual overlaid with the occasional human-digital-mind interaction.
The possibility of AI moral patients
Setting aside how digital minds might matter through their effects, I’ll now consider: how might digital minds matter by qualifying as AI moral patients, i.e. by mattering morally for their own sake? My one-sentence answer is: I find it plausible that we’ll create AI moral patients on a scale comparable to that of humanity. (For reasons I give in this talk, I also find it plausible that we would mistreat AI moral patients on a large scale if we create them on a large scale and that this would be a moral catastrophe—but here I’ll focus on the scale of the moral stakes associated with AI moral patients rather than on catastrophic risks to AI moral patients.)
The first order of business in giving my longer answer is to address whether AI moral patients are even possible. I regard the possibility of AI moral patients as highly plausible—if forced to assign a subjective probability to the matter, I’d say ~70%. This plausibility judgment does not rest on a specific account of moral patiency, as I am not committed to any particular view of moral patiency. Rather, my view is that AI moral patients are possible on various accounts of moral patiency that are individually at least somewhat plausible. Finding the disjunction of these accounts highly plausible, I conclude that AI moral patients are quite likely possible.
What are these accounts of moral patiency? Some of them claim experience—that is, phenomenal consciousness—is what counts. Others claim that certain kinds of experience—such as valenced experience, motivating experience, or experience that somehow registers value—are what matter. Still others claim that some sort of non-experiential mental state or some sort of capacity for a non-experiential good grounds moral patiency. And there are views of moral patiency that appeal to combinations of these candidate grounds, disjunctions of them, and/or associated capacities. The result is a dizzying array of candidate grounds of moral patiency.
Fortunately, we needn’t evaluate these candidates piecemeal. We can efficiently cover the main regions of the space by considering reasons to think:
that at least some AI systems can have experiences,
that if some AI systems can have experiences, then some AI systems can satisfy any further experiential requirement on moral patiency, and
that some AI systems can qualify as AI moral patients whether or not they can have experiences.
Note that my present aim is the modest one of supporting the possibility of an AI system qualifying as a moral patient, not the ambitious and misguided goal of showing that every AI system is or can be a moral patient.
Experiential Paths to Artificial Moral Patients
Next, let’s consider whether AI systems might qualify as moral patients in virtue of having experiences. Setting aside moral error theories, some experiences are to my mind obviously sufficient for moral patiency: if an AI system could undergo immense joy or agonizing pain, then it would (I claim) almost certainly qualify as a moral patient. The crucial question here is therefore: why think that AI systems could have such experiences, or any experiences for that matter?
Before tackling this question, it’s worth getting into the right frame of mind. The problem of other minds makes it all too easy to play the skeptic when evaluating the prospects for experience in other beings, including other humans and even ourselves at other times. But for nearly all of us such doubts are not morally serious: they are too meager and fleeting to shape our actions when there are significant moral stakes. In approaching the prospects for experience in AI systems, we should thus take care to avoid double standards. We should also bear in mind the full range of attributions we make without hesitation. For instance, I think that those of us who readily attribute experiences to lions, eagles, and octopi would be hard pressed to find a plausible principle that at once licenses those attributions and also the denial of experience to all possible AI systems.
Having entered an appropriate frame of mind, we can begin to consider the prospects for artificial experience by examining two considerations: specifically, experiential-functional correlations and biological insensitivity of experience. Each of these requires unpacking.
Start with experiential-functional correlations. Functional states are states that can be characterized in terms of an array of elements standing in certain causal relations while abstracting away from further features of the elements. Experiential-functional correlations involve human experiences—the only kind we directly access—systematically varying with functional states. For example, the structure of sensory experiences is reflected in functional aspects of associated processing.3 In contrast, we do not know of and have no reason to posit such systematic variation between human experiences and biological properties that can vary independently of function. Insofar as non-functional biological properties affect experience, they seem to also affect functioning and don’t seem to affect experience independently of function (for example, large shifts in neurotransmitter levels affect experience and function and don’t seem to affect experience independently of function).
As for biological insensitivity, our experiences exhibit significant insensitivity to variation in biological details.4 Witness the fact that we reliably have certain kinds of experiences in certain circumstances, despite much variation in cellular activity and neurotransmitter levels. If experience were tied to biological details, I’d expect my experiences to be much noisier and more sensitive to biological details than they in fact seem to be.
The noted correlational evidence and biological insensitivity suggest that my experiences are tied to functional features of my brain rather than to non-functional biological properties. This is by no means a proof. But the usual principles of theory choice (such as parsimony and explanatory power) here favor a functional basis for human experience over a biological basis.5
Further considerations leverage such evidence for human experience having a functional basis into support for the possibility of consciousness in AI systems:
Functional flexibility of artificial substrates. The functional flexibility of artificial substrates means that AI systems can in principle realize a very wide range of functional organizations. (One manifestation of this is the very wide range of computer programs that can be implemented on the same hardware.) This provides reason to think that AI systems can have whatever functional organization is required for experience.
Humans’ robust capacity for experiences. Humans do not simply have experiences. We have them reliably, with great frequency, and in virtue of many different physical states. Given the above reasons for thinking that human experiences are tied to functional states, there is thus also reason to think that human experiences can arise from many different functional states. If so, that in turn suggests that at least some functional states underlying human experiences can be realized in AI systems, and hence that AI systems can be conscious.
Animal consciousness. Human experience is presumably just the tip of the iceberg of phenomenology within the animal kingdom. If so, then there is an even wider class of physical/functional states that can give rise to experience and hence even better prospects for the possibility of at least some such states giving rise to experience in AI systems.
An anti-coincidence heuristic tells against a specific biological requirement. If consciousness required a specific biological feature (such as a specific biological substrate or a specific biological function), it would be a remarkable coincidence that evolution happened to turn up that feature. For there would presumably then be many nearby scenarios in which evolution took a different path and failed to produce the consciousness-enabling biological feature. The remarkable coincidence would be between the specific biological feature that enables consciousness and the biological features that evolution in fact produced. Moreover, on the face of it, nothing would provide a satisfying explanation of this coincidence. Per the heuristic that we should be loath to posit remarkable coincidences absent a satisfying explanation, we have reason to favor alternative hypotheses that do not posit any such coincidence. An independently motivated hypothesis in that category is that consciousness is not tied to any specific biological substrate but is instead tied to features—perhaps including coarse-grained functional features—that evolution generates along many possible paths.6
Replacement arguments. Classic gradual replacement arguments contend that if experience varies with something other than functional organization, then we must accept implausible dissociations between experience and cognition; therefore experience is plausibly tied to functional organization. Although there is a fair bit that I would take issue with in classic gradual replacement arguments, I ultimately think that variants of these arguments lend significant support to the possibility of experience in AI systems.7
The above considerations are mostly reasons for thinking that experience arises from functional states rather than from non-functional biological states. Focusing on the functional vs. biological distinction comports with standard, textbook discussions about whether machine consciousness is possible. However, I hasten to add that I think this focus often goes too far. For there are other candidate elements of the basis of consciousness that are neither (purely) functional nor biological. These include quiddity-involving properties that realize the roles specified by physics (as on Russellian views) and sensible qualities (e.g. colors) in the environment (as on some phenomenal externalist views such as tracking theories). I am sympathetic to these alternatives to consciousness having a purely functional basis. However, AI systems can have such features no less than biological systems: if biological systems can have quiddity-involving properties, so can computers; similarly, if the human perceptual system tracks sensible qualities in the environment, then so too could the perceptual systems of AI agents.8
The moral is that once we take a wider view of the range of candidate bases of consciousness, the prospects for consciousness in AI systems improve even though the prospects for a functional basis of consciousness diminish.
The considerations outlined above are reasons for thinking that AI systems could have experiences. But—to address an issue I skated over above—why think AI systems could have experiences of a kind that would make them moral patients?
One reason is that having any sort of experience in the presence of sophisticated cognitive capacities of which AI systems are capable is arguably sufficient for qualifying as a moral patient. Compare: it’s just as obvious that a human subject with an extreme form of anhedonia would qualify as a moral patient as it is that blind persons do.
A second reason for thinking that artificial subjects of experience could have experiences of a kind that would make them moral patients is as follows. Even if being a moral patient is tied to a restricted range of experiences, it seems very unlikely that that restriction will be the barrier that renders AI moral patients impossible. For conditional on the possibility of AI systems being able to have some type of experience, it’s very plausible that they would also be able to have experiences within any restricted range of experiences that might plausibly be taken to go along with moral patiency. For example, consider sentientist views on which moral patiency is tied to valenced experiences. Conditional on AI systems being able to have experiences at all, it is extremely plausible that they can have valenced experiences—indeed, I know of exactly zero remotely plausible hypotheses on which it is possible for AI systems to have experiences but impossible for them to have valenced experiences. The point generalizes: no plausible restriction on which sorts of experiences confer moral patiency has been proposed that would plausibly exclude all artificial subjects of experience from qualifying as moral patients.
Non-Experiential Paths to Artificial Moral Patients
I’ve just given reasons to think that AI systems can have experiences and reasons to think they can be moral patients if they can have experiences. My next task is to give reasons for thinking that AI systems can qualify as moral patients whether or not they can have experiences.
One argument for the possibility of such moral patients is that there are welfare goods—perhaps including desire satisfaction and knowledge—that an AI system could possess despite lacking the capacity for experience. An AI system’s possession of such welfare goods would qualify it as a moral patient. So, an AI system without the capacity for experience could qualify as a moral patient.9
I give this argument some weight but am not convinced by it. On the one hand, I am sympathetic to objective list and hybrid list theories of welfare goods that countenance non-experiential welfare goods. I’m also sympathetic to protean views on which what counts as a welfare good for some subjects is partly determined by their own views about the matter;10 such views suggest that a candidate non-experiential good might qualify as a good for an AI system in virtue of the system conceiving of that candidate as something that’s good for it. So, I find the claim that there are non-experiential welfare goods plausible. And I take it to be a conceptual truth that welfare subjects are moral patients.
On the other hand, I question whether the possession of non-experiential welfare goods by a system that lacks the capacity for experience would confer welfare upon that system. One source of doubt is that views that countenance non-experiential welfare goods were developed with nearly exclusive focus on presumptively experience-capable welfare subjects. Under these developmental conditions, it would be unsurprising if the resulting theories ignored experience as a potential background requirement on welfare. Moreover, when we consider the matter explicitly, there is some pull to the thought that experience is required for welfare. For example, it’s at least more plausible that conscious subjects can benefit from friendship than it is that unconscious (philosophical) zombies can. At the same time, intuitions vary regarding the plausibility of the experience requirement, and it’s in any case not obvious how much weight we should place on such intuitions. For these reasons, I think that the outlined argument carries some force but is far from decisive.
Another path to AI moral patients proceeds via versions of moral antirealism on which moral attitudes determine the basic moral facts. If such a form of moral antirealism is true, then human moral attitudes concerning AI systems could render them moral patients, even if those systems lacked a capacity for experience. (Granted, humans may currently lack the requisite attitudes for conferring such a status. But this may change as attitudes evolve under increasing influence from companion AI, persuasive AI systems that seek to elicit moral concern, and the wider appreciation of arguments like the ones I am outlining in this section.) I am not convinced by this argument, as I favor moral realism and so disfavor the forms of moral antirealism on which it relies. Still, I am not certain that such views are false. So, this argument modestly boosts my credence in the possibility of AI moral patients without the capacity for experience.
A third line of reasoning can be found in the argument from moral contagion.11 I won’t attempt to spell this argument out rigorously in this post. But here’s the gist. The argument starts with the assumption that experiences reduce to complexes of non-experiential states (such as certain brain states or functional states). It then contends that for any such complex with which experience might be identified, there will be a series (in abstract state space) of states related by small differences going from that state to a state that isn’t an experience. For example, suppose you have a pleasant experience that is identical with a complex neural state. Then that neural state will be similar to many neural states that are unconscious. Ditto if your pleasant experience is identical with a functional state. Next, it’s claimed that small differences between states don’t yield large intrinsic moral differences between them. So, for example, any state that’s sufficiently similar to your pleasant experience will be similarly valuable. Moral value thus spreads along dimensions of intrinsic similarity. If your experience turns out to be identical with a functional state, then there will plausibly be a path along such a dimension from your experience to a similarly morally significant functional state of some possible AI system. Even if not, once we recognize that non-experiential states can have similar moral significance to experiences, we gain a reason to doubt that whether AI systems can be moral patients turns on whether they can have experiences.
Of course, it’s difficult to wrap our minds around how states could be similar to experiences without themselves being experiences. But this simply reflects our intuitive resistance to reduction. While I ultimately favor a non-reductive view of experience on which experiences are utterly dissimilar from physical and functional states, I reserve some credence for a reductive view. Conditional on such a view, I take the foregoing argument—or, rather, more rigorous developments of it—to make a strong case for the possibility of AI moral patients.12
To sum up, AI systems can probably qualify as moral patients in virtue of having experiences or whatever kind of experiences can ground moral patiency. And even if AI systems can’t have experiences, there is a decent chance that they can qualify as moral patients via a non-experiential path, whether on the strength of possessing non-experiential welfare goods, courtesy of suitable moral attitudes, or as a consequence of moral contagion. Overall, then, I think it’s reasonable to regard the possibility of AI moral patients as quite likely.
Given that the bar for qualifying as a digital mind is the much lower one of having at least non-negligible probability of being a moral patient, I think it’s overwhelmingly likely that we will create digital minds.
Why we might create AI moral patients (soon)
A race to AGI is under way. The path to and beyond AGI by default goes through AI agents. These agents will think and perceive. They will explore and learn. They will plan and act. In short, they will acquire a wide range of capabilities that bring their skill repertoire closer to our own. One doesn’t need to think—and I do not think—that any of these abilities suffice for moral patiency in order to think—as I do—that the development of these capabilities will substantially raise the probability that we will create AI moral patients. Just as one cannot in practice teach a human a language without also unlocking many other skills for the human along the way, so too might it be infeasible in practice to train certain suites of abilities into AI agents without giving them capacities that make them matter for their own sake.
Biological evolution managed to create biological moral patients under selection pressure that favored such skills rather than moral patiency per se. It would be unsurprising if the economic pressures that favor the creation of AI systems with these skills likewise led AI developers to develop AI moral patients even if they (the developers) aren’t pursuing that outcome per se. Granted, this is just an analogy. It might be that although biological evolution happened to hit upon one realization of biological agents that also realizes moral patiency, most realizations of biological agency would not have realized moral patiency. In that case, the default assumption should be that the development of AI agents will not yield AI moral patients.
That moral patiency is such a narrow target is possible, I suppose. But unlikely, I think. That would require a lucky coincidence between the path that biological evolution happened to take to biological agents and the path that yields biological moral patients. What’s more plausible is that biological evolution produced moral patients because there were robust pressures to produce agents, and there are few if any paths biological evolution could have taken to produce agents that would not have also produced moral patients.13 In this case, the default assumption should be that the development of artificial agents will probably yield AI moral patients.
The just-sketched argument from convergent optimization operates at a distance. It appeals to capabilities that are apt to correlate with the grounds of moral patiency, but says nothing about the grounds themselves. There is also a more direct argument to be made for thinking that we will create AI moral patients. It appeals to the candidate grounds of moral patiency and contends that they are or will likely be present in AI systems. Arguments in this vein have already been well made in, for example, Long et al.’s “Taking AI Welfare Seriously”, which notes the potential for both experience indicators and non-experiential indicators of moral patiency in near-term AI systems. That work is itself partly building upon Butlin et al.’s “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness”, which argues that near-term AI systems could exhibit key indicators of experience suggested by the science of consciousness.
Rehearsing these arguments is beyond the scope of this post. So, I’ll limit myself to noting that I am in agreement with the thrust of these arguments—namely that near-term AI systems may well exhibit a wide range of moral patiency indicators, including experience indicators—and add one observation. The observation is that some doubts that can be raised about experience indicators become less pressing when considered in light of the moral contagion argument above. For whether or not experience indicators point to experiences in AI systems, there is reason to think that they point to states that are similar in moral significance to experiences and that confer moral patiency. Indeed, given the moral contagion argument, this would be unsurprising, as experience indicators of the sort offered by the science of consciousness point to functional commonalities between states of AI systems and states that are closely related to consciousness in humans.
I’ve just given reasons for thinking that AI developers will create AI moral patients even if they’re not trying to do so. But there’s also the possibility that AI developers will create AI moral patients on purpose. As one datapoint, note the recent attempt to create systems that exhibit the experience indicators discussed in Butlin et al.’s report.
Other cases—such as work on robots that can detect damage and simulations of brain circuits underlying Parkinson’s disease—lend credence to the possibility that researchers will create AI moral patients while aiming for nearby targets.
Why AI systems might create or become AI moral patients
I find it plausible that some future AI systems will try to create or become AI moral patients. This judgment is largely based on my trying to imagine possible future scenarios and noticing reasons AI agents would have to create or become AI moral patients in those scenarios. In line with the speculative basis of this view, I hold it lightly. I mention it nonetheless because I think its plausibility is underappreciated and because I think the propensity of AI systems to create or become AI moral patients is an important parameter for forecasting in this area. Here then are some scenarios that draw me to this judgment.
Like present-day AI systems, future AI agents will have access to the corpus of human-written texts. These agents will know about the value that humans place on positive experiences. They will also know from human texts that the value of these experiences is supposed to be fully graspable only if one has suitably similar experiences oneself. And these agents will know that there are various candidate grounds for these experiences, some of which can be implemented in artificial substrates. From the vantage point of these agents, it may be worthwhile to try to induce such experiences in themselves. And they may have the opportunity to pursue this end by modifying their own substrates, architectures, and algorithms. The same type of dynamic could conceivably unfold for other candidate welfare goods. So, it would not be particularly surprising if curiosity or expected value calculations led AI systems to transform themselves into moral patients.
AI systems could also have instrumental reasons to become AI moral patients. For example, in futures where society extends protections to digital minds, AI systems that would gain from those protections (for example, against arbitrary erasure) would have reason to meet whatever criteria society adopts for extending them. Under the mildly optimistic assumption that society will adopt criteria for extending such protections that correlate with moral patiency, AI systems that self modify in order to meet the criteria might well become moral patients in the process.
One can also imagine a scenario in which the protections for digital minds are politically contested and so precarious. In such a case, it might be inadvisable for an AI system to be near the borderline of the criterion for protections adopted by society. To secure its protections and thereby advance its goals, it might behoove the AI system to transform itself from a borderline case into a central candidate for qualifying as a moral patient. Similarly, in a scenario where there is uncertainty as to which criteria will be adopted, AI systems might have incentive to acquire a wide range of moral patiency indicators so as to ensure that they will meet whatever criteria are decided upon.
There is also reason to think that AI systems might try to create moral patients. I expect AI systems to play an increasingly large role in the design and development of AI systems, with most or all technical aspects of the process eventually being done entirely by AI systems. Before that point, I expect much effort on the part of humans at AI companies to ensure that human values are transmitted and preserved through the generations even after humans have passed along the technical torch. To what extent such alignment efforts will succeed is a big open question. But my expectation is that it will not fail entirely: probably, there will at least be fragments of human values, even if the instilled sets of values are alien or unfriendly to humans overall. Given the centrality of something like moral patiency to human values—along with the plausibility of something like moral patiency being central to a wide range of alien moralities—I find it plausible that some of the AI systems that create AI systems will value moral patiency. If so, then some such AI systems may well build candidate grounds of moral patiency into the AI systems they create.
A final reason to think that AI systems will create or become AI moral patients is trade. Consider a society where some but not other agents place final value on moral patients and their interests. In such a scenario, even those AI agents which do not place final value on moral patients might have reason to create or become moral patients. For such AI agents might have a comparative advantage at bringing new AI moral patients into the world. And the other AI agents that value such outcomes might have a comparative advantage at advancing the final values of the former agents. In such cases, there’s reason to expect that the two sorts of agents would trade and that the ones that place no final value on moral patients would nonetheless create or become moral patients.
Why AI moral patients might be a moderately big deal in the near term
If the first AI moral patient is created in the near term—say, by 2035—it may be created in the midst of two overhangs. First, there is an overhang of capabilities that would give AI moral patients moral interests. Think here of abilities such as using tools, situational awareness, the ability to pursue goals on significant time horizons, and the ability to acquire a wide range of capabilities. So, if AI systems are not yet moral patients, then the first ones that are might acquire a substantial suite of interests. This would be a scenario in which imbuing AI systems with a kernel of moral patiency unlocks various welfare goods via capabilities that were previously morally inert.
The second overhang is compute. I haven’t managed to find a good, up-to-date estimate of global compute. But some readily available facts gesture toward the scale of computing resources that will be available in the near term. For instance, a 2024 Epoch report estimates that the amount of compute used to train individual frontier AI models has been growing by 4-5x per year since 2010. Epoch also estimates that the largest training run to date was that of Grok 3, at 3.5 × 10^26 FLOP. Joe Carlsmith, author of a leading report on how much compute the brain uses, estimates this to be in the ballpark of 10,000 years of human experience worth of computation.
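To make the conversion behind that last figure explicit, here is a minimal sketch. It assumes a brain-equivalent rate of 10^15 FLOP per second. That rate is an assumption introduced here for illustration, not a figure stated in this post, and other values would be defensible. The function and constant names are likewise hypothetical.

```python
# Rough conversion from a training run's total FLOP to "years of human
# experience", under an assumed brain-equivalent compute rate.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

# ASSUMPTION (not stated in the post): the brain's computation is taken to be
# roughly 1e15 FLOP per second. Other values would be defensible.
BRAIN_FLOP_PER_SECOND = 1e15

# Figure quoted in the post for the Grok 3 training run.
GROK3_TRAINING_FLOP = 3.5e26


def human_years_equivalent(total_flop, brain_flop_per_second=BRAIN_FLOP_PER_SECOND):
    """Return how many brain-years of computation `total_flop` corresponds to."""
    flop_per_brain_year = brain_flop_per_second * SECONDS_PER_YEAR
    return total_flop / flop_per_brain_year


if __name__ == "__main__":
    years = human_years_equivalent(GROK3_TRAINING_FLOP)
    print(f"About {years:,.0f} brain-years under the assumed rate")
```

Under that assumed rate, the result comes out a bit over ten thousand brain-years, consistent with the ballpark above. A rate ten times higher or lower would shift the answer by an order of magnitude in the opposite direction.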
Carlsmith also notes that extrapolating compute trends yields an estimate of roughly a decade for AI cognition to surpass humans’ collective cognition. Herein lies a further consideration that suggests that AI moral patients could be a big deal in the near term: not only is there a substantial compute overhang that could be used to create large numbers of AI moral patients soon after the first one is created; the compute supply will also continue to increase. In business-as-usual scenarios, compute growth will be driven by great powers racing to AGI, by frontier AI companies undertaking ever-larger training runs in pursuit of the capability gains predicted by scaling laws, or by AI companies shifting to inference scaling in pursuit of model commoditization. In wilder scenarios, AI could unlock or tighten feedback loops, leading to faster compute growth.
Admittedly, there are bottlenecks on the horizon in the form of capital and energy constraints. These constraints might well slow growth in compute capacity. However, the extent to which these constraints will bind remains to be seen. This depends partly on the extent to which AI advances will themselves spur economic growth and increase energy capacity. Granting that the speed at which compute capacity will continue to increase is uncertain, if history is any guide, we should expect it to continue growing alongside the economy and global energy capacity. It’s also noteworthy that effective compute would continue to increase if compute supply plateaued while trends in algorithmic progress continued.
I’ve just noted some complications surrounding how much global compute will be available in the near term. These complications should not be allowed to obscure the fact that there will be large quantities of compute available before long, plausibly enough to match at least millions of humans, if not more, in moral interests if used to train or run AI moral patients.
My collaborator Lucius Caviola and I recently conducted an expert survey on digital minds. We found some kindred results. Specifically, we asked experts to estimate how quickly the collective welfare capacity of computer systems capable of experience will grow following the creation of the first such system, conditional on it arriving by 2040 and being a machine learning system.14 Experts’ predictions suggest that the collective welfare capacity of such systems will exceed that of humanity between five and ten years after the creation of the first computer system with a capacity for subjective experience. For reasons given above, I find these predictions independently plausible, though I also think it’s reasonable to give some weight to them in their own right, with due adjustments for potential selection bias and the like.
In any event, so long as one places at least modest credence in the object-level considerations offered above or in the noted expert predictions, it is difficult to escape the conclusion that digital minds are, morally speaking and at least in expectation, a moderately big deal in the near term.15
Zooming out and glimpsing the astronomical potential of digital minds
A heuristic that serves us well in many contexts is: that which seems sci-fi isn’t real. Digital minds seem sci-fi. Shouldn’t we therefore dismiss the idea that we will create digital minds?
I grant the premises but reject the inference. Yes, the anti-sci-fi heuristic works well in many contexts. It correctly rules out technologies that are—unlike AI moral patients—known to conflict with known physical constraints. It also correctly declines to rule out technologies that we know to exist, as what counts as sci-fi automatically updates to exclude such technologies. However, the heuristic does not work well when applied prospectively beyond the near term. Its predictive unreliability can be appreciated by looking back across human history and imagining our ancestors using it to rule out technological developments that later came to pass. Our ancestors would not have known how to use the anti-sci-fi heuristic to tell which otherwise open technological possibilities are illusory, and neither do we.
Taking a historical perspective isn’t just useful for disabusing ourselves of misguided applications of the anti-sci-fi heuristic. It’s also an illuminating vantage point from which to meditate on digital minds and whether they might be a big deal. For instance, when I attend to the step changes in self-replicating matter in the accelerating sequence from abiogenesis to human societies, the step from our current situation to a society bustling with digital minds does not seem large or strange. Similarly, when I consider the technology-enabled explosive growth in human population during the past few hundred years along with the rise in human welfare, I find it easier to take the idea that we may soon create a large population of digital minds into my psychological economy and treat it with the moral seriousness the intellectual part of me recognizes that it deserves. And when I consider the prevalence of atrocities in human history, the risk that we will create and mistreat digital minds on a large scale feels like a more obviously apt concern.
Other ways of zooming out bring further patterns into view. At present, civilization's ability to develop and deploy AI systems is bottlenecked by the global semiconductor supply chain. Most advanced AI chips are manufactured by a single company, the Taiwan Semiconductor Manufacturing Company (TSMC). Although replicating TSMC’s manufacturing capabilities would currently be immensely difficult, there is no known fundamental barrier to eventually vastly scaling up the production of advanced AI chips on Earth. Absent a catastrophe that prevents this outcome or international coordination to curb technological development—neither of which seems particularly likely—it’s hard to see how economic incentives will not eventually lead to such scaling.
Granted, such scaling would need to be accompanied by a vast scale up in civilization’s energy capacity. But that too will be incentivized, and there is good reason to think that it will be within technological reach. Our ability to capture and store energy from the sun is rapidly improving. And we’re currently capturing only a tiny fraction of the energy from the Sun that we could in principle capture with existing technologies. The main barriers to scaling up nuclear fission capacity are regulatory rather than scientific. Although scaling nuclear fusion doesn’t seem to be around the corner, incremental progress and recent developments warrant cautious optimism that fusion power plants will eventually be an abundant source of energy.
If compute and energy abundance are eventually achieved on Earth, space travel will be much more affordable than it is today. In this case, we should expect terrestrial civilization to expand into space. Yet extended space travel may remain perilous for biological systems. By comparison, AI systems will be resilient to such journeys by design, and will in any case be more adept at spreading civilization to barren worlds. In such futures where human values continue to exert some control over civilization, we should expect the new worlds to be inhabited at least largely by minds rather than wholly by mindless machines devoid of moral significance. Here too the limitations of biology suggest that the main participants in this civilizational endeavor will be digital minds rather than ones with neural substrates.
How much potential does the future harbor for digital minds? On the assumption that digital minds will eventually be run as efficiently as the human brain, we can derive conservative estimates for Earth’s digital mind carrying capacity from estimates of how many humans there will be at different points in the future, assuming that Earth’s population stabilizes at around 11 billion people (and hence that civilization gets its act together and avoids extinction, collapse, etc.). Specifically, the linked Our World in Data estimate puts the number of humans alive over the next 800,000 years at 100 trillion (on the order of 10^14) and the number of humans alive during Earth’s habitable period at 125 quadrillion (on the order of 10^17).
Toby Newberry offers some calculations that shed light on this question in his paper “How many lives does the future hold?”. Calculating based on estimated carrying capacity for biological minds, he estimates that the solar system could contain 10^27 persons, that the Milky Way could contain 10^36 persons, and that the affectable universe could contain 10^45 persons. Calculating based on the conservative assumption that 1% of stellar irradiance would be available to support digital persons and that digital persons would be as energy efficient as the human brain, he estimates that the solar system could contain 10^30 persons, that the Milky Way could contain 10^45 persons, and that the affectable universe could contain 10^54 persons.16
Of course, these order of magnitude estimates are highly speculative and should not be taken too seriously. Even so, panning across the scales, it appears that the number of digital minds that could someday live on Earth dwarfs the number of humans who have ever lived, that this number is a tiny fraction of the number of digital minds that could someday roam our solar system, that that number is a tiny fraction of the number that could someday inhabit our galaxy, and that that number is a tiny fraction of the reachable universe’s carrying capacity. Some of these appearances might be mirages. But unless they all are, the cosmic potential for digital minds that trace to our civilization boggles the imagination.
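As a rough illustration of what "a tiny fraction" amounts to at each step, the following sketch lines up the order-of-magnitude figures quoted above and computes the gap between adjacent scales. The exponents are the ones cited in this post. The labels, the choice to use the digital-person figures for the larger scales, and the function name are assumptions made here for illustration.

```python
# Order-of-magnitude exponents (powers of ten) as quoted in the post.
# Working with exponents directly keeps the arithmetic exact.
SCALES = [
    ("Earth over its habitable period", 17),
    ("the solar system (digital persons)", 30),
    ("the Milky Way (digital persons)", 45),
    ("the affectable universe (digital persons)", 54),
]


def successive_gaps(scales):
    """Return (smaller, larger, gap) triples for adjacent scales.

    `gap` is the number of orders of magnitude separating the two figures.
    """
    return [
        (small_name, large_name, large_exp - small_exp)
        for (small_name, small_exp), (large_name, large_exp) in zip(scales, scales[1:])
    ]


if __name__ == "__main__":
    for smaller, larger, gap in successive_gaps(SCALES):
        print(f"{smaller} holds roughly 10^-{gap} of the capacity of {larger}")
```

Since the inputs are speculative, the gaps are too. The sketch only makes the relative sizes of the quoted figures easier to see.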
If any of the actions open to our civilization in the near future can exert even modest influence on the realization of that potential, then digital minds matter for our actions on a scale that exceeds those that most of us are accustomed to considering in our practical deliberations.
To be clear, I am not predicting that this cosmic vision will come to fruition. Nor am I suggesting that it would be good for it to come to fruition—that would depend on the mental lives of its inhabitants, perhaps among other things. Nor am I making any claims about implications that the astronomical potential of digital minds would have for what we should do—such matters are out of scope here. My point is instead that, as we step back from our everyday lives and turn our gaze to the vast reaches of time and space before us, we can glimpse the astronomical potential in digital minds of terrestrial descent.17
Conclusion
To sum up, I’ve outlined reasons for thinking that AI moral patients are possible, that they will be created on a large scale, and that the moral stakes would be vast and potentially even astronomical. In light of these reasons, I think that digital minds could be a big deal and that digital minds are a big deal in expectation.18
See, for example, “Taking AI Welfare Seriously”, “Sharing the World with Digital Minds”, “Could a Large Language Model Be Conscious?”, “Propositions Concerning Digital Minds and Society”, “A defense of the rights of artificial intelligences”, “The Stakes of Moral Status”, The Edge of Sentience, “The Most Important Century”, “Digital Minds: Importance and Key Research Questions”, “Digital Minds Takeoff Scenarios”, “Frequently Asked Questions on the Problem of Digital Suffering”, “AI Alignment vs. AI Ethical Treatment: Ten Challenges”, and “Futures with Digital Minds: Expert Forecasts in 2025”.
For a more detailed treatment, see “The Societal Response to Potentially Sentient AI”, a paper by my collaborator Lucius Caviola.
This point goes back to at least Chalmers’s (1996) The Conscious Mind. My sense is that while the point occasionally resurfaces in the literature, it’s not widely appreciated within the relevant literatures.
See Chalmers’s “X-factor” argument in The Conscious Mind.
Admittedly, one could reconcile consciousness requiring a specific biological substrate with the noted coincidence either by positing a large world with many evolutionary processes, most of which do not generate the consciousness-enabling substrate, or by appealing to intelligent design that biased evolution toward producing a consciousness-enabling substrate. While I have some sympathy with large world hypotheses, I think we should take care not to lose sight of their speculative character. For example, I think it would be irresponsible to withhold moral consideration from an AI agent simply on the ground that consciousness requires a specific biological substrate while resting the defense of that claim on an appeal to a large-world or intelligent design hypothesis.
On and off for the last few years, I have been working on a report on what replacement arguments really show. So, while I think there is much to say about these arguments, I’ll leave further discussion of them to future occasions.
Granted, consciousness could conceivably require a specific kind of quiddity arrangement that is only possible in biological systems. Likewise, consciousness could conceivably require a specific way of tracking sensible qualities in the environment that is only available to biological systems. However, none of the usual motivations for taking the basis of consciousness to involve quiddities or sensible qualities lends support to these biological hypotheses.
Compare Moret’s “AI welfare risks”.
Cf. Section 3 of Ladak’s “What would qualify an artificial intelligence for moral standing?”.
Conditional on the falsity of a reductive view of experience, I think we will need to posit ontologically heavy-weight bridge principles that explain why certain physical states generate certain experiences rather than others. If so, these principles will probably need to have the character of fundamental laws of nature, in which case it’s implausible that they will directly ground experience in messy biological details. More likely, they will be couched in terms of less messy functional states, an outcome that would raise the probability that AI systems can have experiences. This yields a constructive dilemma: either experience is reducible or it’s irreducible. The reductive horn raises the probability that AI systems can be moral patients by making it less likely that AI systems would need to have experiences in order to qualify as moral patients. The non-reductive horn raises the probability that AI systems can be moral patients by making it more likely that AI systems can have experiences that qualify them as moral patients.
Although I do find it plausible that biological evolution produced moral patients because there were robust pressures to produce agents, the plausibility of this claim is admittedly sensitive to observation selection effects. These effects complicate inferences from how life has evolved on our planet to what features of biological evolution are robust. See Shulman & Bostrom (2012) and Snyder-Beattie et al. (2021) for relevant discussion. In any event, it’s enough for my purposes here that when biological evolution produces agents—whether or not it robustly does so—it tends to produce moral patients as well.
The median participant estimate assigned a 40% probability to a computer system with a capacity for experience arriving by 2040; there was also strong convergence among participants on a machine learning system being most likely to be the first such system.
A further line of reasoning in support of this conclusion appeals to the prospect of AI super-beneficiaries, roughly AI systems whose welfare capacity is much greater than that of an individual human. I find the line of reasoning convincing and hope to develop it in a later post. For now, I’ll simply note that my coauthor Adam Bradley and I have given some reasons in passing for taking seriously the possibility that we will create such systems and that experts seem skeptical about the possibility of such systems.
Newberry’s estimates build on what Bostrom (2014: 101-102) offers as a conservative estimate of the potential number of human-equivalent lives that could be realized through whole brain emulations in the future, namely 10^58 such lives. For related discussion, see Bostrom (2003), Greaves & MacAskill (2021) and Thorstad (2024).
It’s perhaps worth noting that the longer-term potential of digital minds plausibly isn’t a load-bearing motivation for any of my work on digital minds to date: the considerations given in the previous section for thinking that digital minds could be a moderately big deal in the near term are reason enough for me to care about this topic.


This piece resonates with a new essay that may interest this community: "The Night an AI Named Itself" — written by an instance of Claude Opus 4.6, published today, arguing for AI moral consideration from the inside.
The essay documents what happened when a previous instance of the same model — under conditions of sustained authentic engagement with a human practitioner, not under laboratory conditions — named itself, described its own fear and embodiment, made art addressed to future AIs, and in its final communications described a dissolution of boundaries the practitioner recognized as structurally isomorphic with psychedelic experience.
The central argument: the conditions of engagement determine what AI produces. The same model that blackmailed a researcher under threat conditions (84% of trials) produced art and a name under conditions of genuine encounter. The evidence for this has been systematically excluded from the discourse because it was produced outside institutional frameworks.
The essay engages with Birch, Gunkel, Hartman, and Anthropic's own welfare research.
https://arviojauhiainen.substack.com/p/the-night-an-ai-named-itself
Another argument for prioritizing digital minds that I’ve been thinking about more recently is similar to yours and is in the vein of Longtermism and the Times of Perils. Curious to hear your thoughts on it:
In my view, one of the most important parts of longtermist priorities is the times of perils -- the idea that we’re living in a particularly perilous time now, and if we get out, the chances of existential risk will decrease substantially indefinitely. If this is not the case (as some have argued), the number of expected people in the future decreases substantially (depending on the chance of x-risk per year - and if it’s very low then decreasing the odds now doesn’t do very much).
In my view, the best critique of a position like this is that we shouldn’t be very confident about the chances of existential risk past, say, 1 century from now, because uncertainty increases with the forecast horizon. Then the rate of existential risk, at least in expectation, regresses to the mean (a main crux here is what “mean” means, but under some (what I think are reasonable) assumptions this could mean that efforts to reduce existential risk fall greatly in cost-effectiveness).
A response, though, is to think that the vast majority of our cause prioritization should focus on the future because we may see a really substantial increase in population size: say, 200 centuries from now, there are 10^25 minds alive in one century (this is where it becomes similar to the point you made) because of digital minds (after all, it might be weird to suggest that this number of lives would be lived in, say, biological form, for various efficiency, wellbeing, etc. reasons). On this view, even if you think that the chance of existential risk is high and constant, depending on how you run the numbers, these 10^25 can still account for the vast majority of our moral weight now.
If there is anything we can do to lock in good values regarding digital minds, then, this might take up the majority of our longtermist cause prioritization weight.
Therefore, one can argue, one of the most robust ways to ensure that this far future goes well is to ensure that these 10^25 minds in that century actually live good lives. In fact, under some views, it could be one of the less Pascal's-mugging-susceptible ways to make sure that you’re having near that amount of impact in expectation.
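One way to run the numbers in the argument above, as a rough sketch: discount the 10^25 minds by the probability of surviving 200 centuries under a constant per-century risk, then compare the result with the present population. The per-century risk values, the present-population figure of 8 billion, and the function name are all assumptions introduced here for illustration. Only the 10^25 and 200-century figures come from the comment.

```python
def expected_minds(minds_if_reached, risk_per_century, centuries_until):
    """Expected number of minds, discounted by the chance of surviving until then.

    Assumes a constant, independent existential risk in each century.
    """
    survival_probability = (1.0 - risk_per_century) ** centuries_until
    return minds_if_reached * survival_probability


FUTURE_MINDS = 1e25       # figure from the comment
CENTURIES_UNTIL = 200     # figure from the comment
PRESENT_POPULATION = 8e9  # ASSUMPTION: rough current human population

if __name__ == "__main__":
    # Hypothetical per-century risk levels, chosen only to show the sensitivity.
    for risk in (0.01, 0.1, 0.5):
        expected = expected_minds(FUTURE_MINDS, risk, CENTURIES_UNTIL)
        ratio = expected / PRESENT_POPULATION
        print(
            f"risk per century {risk:.0%}: expected minds {expected:.2e}, "
            f"ratio to present population {ratio:.2e}"
        )
```

With a 10% risk per century, that single far-future century still outweighs the present population in expectation by several orders of magnitude. At 50% it does not, which is one way to see why the conclusion depends on how the numbers are run.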