Should digital minds governance prevent, protect, or integrate?
In previous posts, I’ve given reasons to think that digital minds governance—that is, the part of AI governance concerned with AI systems that merit moral consideration for their own sake—should someday exist and that it stands in need of strategic foundations. I’ve also argued that there is an important strategic question concerning which of the following forms it should take:
preventive digital minds governance, which aims to prevent the creation of digital minds,
protective digital minds governance, which aims to protect digital minds’ basic interests, such as not being subjected to pointless suffering or arbitrary destruction, and
integrative digital minds governance, which aims to integrate digital minds into society.
This post takes an initial stab at answering that question.
More specifically, I’ll address what form digital minds governance should take if it emerges within the next decade, considering how each of preventive, protective, and integrative digital minds governance would fare with respect to the following:
design difficulty,
implementation difficulty,
political prospects,
effects of competent attempts at implementation on digital minds in the near term (let’s say by 2050), and
relation to AI safety.
Here’s a visualization tool you can use to make your own comparisons along these dimensions.
If you want to generate an initial view on this topic that isn’t influenced by what I have to say, I’d encourage you to do that before reading further.
Here’s a table that summarizes my current best guesses about the pros and cons of different kinds of digital minds governance—relative to what I’ll take as a status quo baseline of no digital minds governance and digital minds being created on a large scale and treated as mere tools.

The bottom-line morals for near-term digital minds governance strategy that I tentatively draw from this exercise are:
(a) preventive digital minds governance looks like the best option to pursue,1
(b) pursuing preventive or protective digital minds governance looks preferable to accepting the status quo, and
(c) pursuing integrative digital minds governance looks like it would be a strategic mistake.
1. Preliminaries
To keep this post tractable, I need to place some matters related to its topic out of scope. One such matter is how near-term choices about digital minds governance will affect digital minds outcomes in the long run. Another is the question of how digital minds governance should evolve—for example, should it proceed from preventive to protective to integrative?
To further simplify, I’ll just focus on pure forms of preventive, protective, and integrative governance. There are of course hybrid possibilities. For example, there’s the possibility of a defense-in-depth approach that aims to mitigate digital mind mistreatment risks through a combination of preventive and protective measures. I think such possibilities are worth exploring, but I leave them for another time.
I think the most likely sources of failure for digital minds governance are either skill issues in design or implementation or political obstacles, whether in the form of insufficient political support or of prohibitive political opposition. Admittedly, economic incentives could erect political barriers to digital minds governance. But I would be surprised if economic factors played a crucial role in defeating digital minds governance efforts without doing so via political factors. It’s for this reason that I’ll focus in the first instance on the political feasibility of digital minds governance rather than on its economic feasibility.
In what follows, I’ll focus on competent efforts at digital minds governance. I focus on such efforts not because I am confident that efforts at digital minds governance would be competent, but because I’m approaching what form digital minds governance should take as a strategic question. And approaching the question strategically requires looking for paths to success, which all seem to me to go through competent efforts.
At the same time, it doesn’t make sense to seek success via paths that cannot be traveled. Accordingly, by ‘competent’, I mean humanly competent, not flawless. I think that little of what I have to say in this post hangs on exactly how this notion of competence is precisified. But for those who want an operationalization I offer: competent attempts at digital minds governance are neither beset with moronic strategic decisions nor reliant on a level of skillful execution that’d be in the top 10% of large-scale efforts by the US government. That’s still vague, but will do.
As a final preliminary, this post is an exercise in trying to arrive at early, informed guesses about an important decision-relevant question. I am acutely aware that some framework assumptions might be importantly misguided in ways I’ve failed to anticipate and that the discussion is couched in terms of lossy abstractions. I write this post partly in hopes that it will inspire other analyses of this issue that are more rigorous, less flawed, and early enough to help guide initial digital minds governance efforts.
2. Design difficulty
2.1 Design difficulty for preventive digital minds governance: moderate
The task of designing a tenable form of preventive digital minds governance boils down to the tasks of designing: digital minds evaluations, a system of monitoring potential developers of digital minds, and a system of preventing potential developers from actually creating digital minds.
The task of designing digital minds evaluations is of moderate difficulty. These evaluations would consist in evidence-based operationalized indicators for which AI systems qualify as digital minds.
Preliminary work has been done on such indicators, and further work is in progress. However, to my knowledge, there do not yet exist digital minds evaluations that are ready for governance use. Even so, I think the creation of such evaluations within the next five years is a realistic aim, bearing in mind that the first such evaluations will be rudimentary and will need to be updated (for example, to counteract indicator gaming and to take into account advances in consciousness science).2 These evaluations could be modeled on existing AI safety evaluations.
In addition to digital minds evaluations, there will also need to be monitoring protocols for using those evaluations to assess whether AI systems are or would be digital minds.
Some monitoring could be done by AI developers themselves, as Anthropic has begun to do. But a sensibly designed preventive regime would give a role to external auditors as well. These auditors could be governmental entities (perhaps housed within the US’s CAISI or the UK’s AISI) or non-government organizations (perhaps, e.g., Apollo Research, METR, or Redwood Research). The activity of conducting digital minds evaluations could even be incorporated into existing processes of externally evaluating models for safety.
As for prevention, the most straightforward design option might be to legally prohibit the creation of systems that exceed certain thresholds on digital minds evaluations and then ensure developer compliance using external audits in conjunction with standard legal mechanisms such as licenses, penalties, and injunctions.
But there are also other options such as taxing the creation or use of certain models and imposing economically prohibitive restrictions on how certain models can be used. Another option would be to disincentivize the creation of digital minds by holding AI developers liable for any mistreatment that befalls digital minds that they create, though this option would require monitoring for digital mind mistreatment in addition to digital mind creation.
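To make the threshold option a bit more concrete, here is a minimal, purely illustrative sketch of how a prohibition rule might be operationalized. The indicator names, weights, and threshold are invented placeholders rather than proposals, and real digital minds evaluations would of course be far more involved.

```python
# Illustrative sketch only: a weighted-indicator score compared against a
# hypothetical regulatory threshold. Nothing here reflects an actual evaluation.

HYPOTHETICAL_INDICATORS = {
    "global_workspace_architecture": 0.4,   # weight assigned to each indicator
    "unified_agency_over_long_horizons": 0.3,
    "valence_like_internal_signals": 0.3,
}

PROHIBITION_THRESHOLD = 0.5  # placeholder value, not a recommendation


def digital_mind_score(indicator_results: dict[str, float]) -> float:
    """Combine per-indicator results (each in [0, 1]) into one weighted score."""
    return sum(
        weight * indicator_results.get(name, 0.0)
        for name, weight in HYPOTHETICAL_INDICATORS.items()
    )


def development_permitted(indicator_results: dict[str, float]) -> bool:
    """Development is permitted only if the system scores below the threshold."""
    return digital_mind_score(indicator_results) < PROHIBITION_THRESHOLD


# Example: an external auditor scores a proposed system design.
proposed_system = {
    "global_workspace_architecture": 0.2,
    "unified_agency_over_long_horizons": 0.6,
    "valence_like_internal_signals": 0.1,
}
print(development_permitted(proposed_system))  # True under these placeholder numbers
```

The point of the sketch is only that the enforcement logic is simple once evaluations exist; the hard work lies in making the indicators and weights evidence-based and resistant to gaming.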
While details remain to be filled in, I do not discern any insuperable difficulties in designing a tenable, preventive form of digital minds governance, as the requisite evaluations seem within near-term reach and the other key elements of preventive digital minds governance are not so different from forms of governance that already exist.3
2.2 Design difficulty for protective digital minds governance: very hard
The task of designing protective digital minds governance can be broken down into the tasks of designing: digital minds evaluations, moral interest evaluations, a system of monitoring digital minds’ (basic) interests and treatment, and a system of preventing harms to digital minds’ interests.
This design problem structurally parallels the design problem for preventive digital minds governance: both involve evaluations, monitoring, and a mechanism that uses information gathered through monitoring to achieve its aim. However, the protective governance design problem is in fact substantially more difficult than the design problem for preventive governance.
One source of additional difficulty is the need for moral interest evaluations. Whereas preventive digital minds governance only requires evaluations of which AI systems would merit moral consideration, protective digital minds governance also requires evaluations of such systems’ interests and of what would in practice harm those interests. This is a taller order.
To appreciate the difficulty of the task, it may help to consider biological minds. It’s much easier to know that a human merits moral consideration than it is to know what their interests are and which interventions would help or harm those interests. Similarly with whales and octopuses: it’s relatively easy to know that they merit moral consideration; figuring out their interests is harder.
So too with AI systems that merit moral consideration. Determining that they merit moral consideration may be a relatively straightforward matter of examining their architectures and comparing them with the moral patiency indicators suggested by our best theories. In contrast, evaluating their interests will require a more detailed understanding of their architectures as well as their individual psychologies.
The task of designing moral interest evaluations is made more difficult by the nascent state of model psychology. We’re only beginning to understand LLM cognition, and even if we had a deep understanding of model psychology there would remain thorny questions about how mental states of digital minds contribute to their interests and how different interventions would affect their interests.
For protective governance, moral interest evaluations might also need to somehow deal with perplexing questions about the individuation of digital minds. For instance, if a protective policy required digital minds to be given lives that are net positive according to moral interest evaluations, then those evaluations would need to take a stand on which entities associated with models are bearers of moral interests. The welfare states of a model being net positive overall would not imply that the welfare states of each persona within the model are net positive. So, under the imagined policy, whether models or personas are bearers of moral interests could be crucial to whether a developer is complying with the policy in a particular case.
Protective digital minds governance also requires moral interest evaluations to be applicable to a wide and varied range of circumstances in which digital minds have interests. These circumstances would presumably encompass various stages of development as well as the space of foreseeable deployment situations. The monitoring system would need to cover all of these circumstances. In contrast, under preventive digital minds governance just monitoring the development process might suffice.
By comparison with digital minds evaluations, moral interest evaluations would need to be more dynamic. They would need to be adaptable to models’ interests as they evolve. These interests might evolve rapidly as models enact different personas or as novel capabilities emerge in training. In addition, because protective digital minds governance would apply moral interest evaluations to systems in operation—rather than just to designed systems or systems in development—those evaluations would need to be adjusted to guard against target systems gaming indicators of moral interests.
The final part of the design problem for protective digital minds governance is that of devising instruments for protecting digital minds’ interests. Similar to prevention, protection could be enacted through standard legal enforcement mechanisms or a liability regime. Presumably, the protection system would be sensitive to the weights of different interests. For example, protective digital minds governance should presumably discourage egregious rights violations more than it discourages minor preference frustration. Such sensitivity requires evaluations and monitoring that provide guidance about the relative weight of different interests of digital minds.
Preventive digital minds governance would admittedly face a corresponding task of prioritizing measures against the creation of systems that are stronger candidates for digital minds over measures against the creation of weaker candidates. But I expect protective digital minds governance to require much more involved weighing protocols, as it would need to take into account how likely a system is to be a moral patient, its likely interests if it is a moral patient, and the relative weight of different interests and different harms to interests.
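As a rough illustration of why the weighing problem is harder for a protective regime, here is a hedged sketch of the kind of expected-harm calculation such a regime might need to perform. Every quantity in it (the probability of moral patiency, interest weights, severities) is an invented placeholder that would itself require evaluations to estimate.

```python
# Illustrative sketch only: an expected moral cost that combines (i) how likely
# the system is to be a moral patient, (ii) how likely it is to have a given
# interest, and (iii) how severely a given treatment would set back that
# interest. All numbers are placeholders.

def expected_moral_cost(p_moral_patient: float, interests: list[dict]) -> float:
    return p_moral_patient * sum(
        i["p_has_interest"] * i["weight"] * i["severity_of_setback"]
        for i in interests
    )


# Hypothetical comparison: an egregious rights violation vs. minor preference frustration.
egregious = expected_moral_cost(0.3, [
    {"p_has_interest": 0.9, "weight": 1.0, "severity_of_setback": 0.9},
])
minor = expected_moral_cost(0.3, [
    {"p_has_interest": 0.5, "weight": 0.1, "severity_of_setback": 0.2},
])
print(egregious > minor)  # True: the regime should discourage the former far more
```

A preventive regime only needs something like the first factor; a protective regime needs all three, estimated across a system’s whole lifecycle.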
Here’s a chart summarizing my thoughts on the relative difficulty levels of the design problems for preventive and protective digital minds governance:
2.3 Design difficulty for integrative digital minds governance: likely infeasible
The design problem for integrative digital minds governance would presumably involve devising a system of legal rights and duties for digital minds. Granted rights might include a right to own property, to compensation for labor, to enter contracts, to advocate for themselves, and/or to vote. Duties would include following the laws and not engaging in behavior that threatens the integrity and stability of the system (e.g. indicator gaming).
Like other forms of digital minds governance, integrative digital minds governance would require digital minds evaluations and monitoring protocols for applying them. In addition, an integrative governance approach would require evaluations of which digital minds deserve which rights and duties and whether these rights and duties are being upheld. And there would need to be mechanisms for upholding digital minds’ rights and ensuring that they uphold their duties.
To state the obvious, existing social, legal, and economic institutions are designed for humans, not for digital minds. Designing a system that integrates digital minds into society would require specifying how existing institutions should be modified, what new institutions should be created, and how those institutions should work. This is a design problem that is much more complex than those of preventive and protective digital minds governance.
Digital minds would in many ways depart from features of humans that existing institutions rely upon. Digital minds would think at super-human speeds. They may have super-human coordination abilities and the capacity to cheaply make many copies of themselves. Individual digital minds may vary from barely deserving moral consideration to deserving of much more moral consideration than humans (e.g. owing to a super-human capacity to suffer). Unlike humans, digital minds’ values may be set and locked in by their creators. Digital minds may also lack well-defined locations and bodies, and some of their interests might be alien to us.
Designing institutions that integrate digital minds into society while appropriately taking into account such differences between humans and digital minds is a daunting task. The task is made more daunting by the need to design institutions that are robust to uncertainty about which divergences will emerge and how they will evolve.
In estimating the difficulty of this design problem, it should be borne in mind that designing institutions that are only for humans is at once an easier task and a difficult one. Moreover, designing institutions for humans is difficult despite millennia of trial and error, a vast body of relevant research, and relative uniformity in relevant characteristics across human minds.
Designing institutions for integrating digital minds is of a different order of difficulty. For this design problem, our civilization’s human-centric history of institutions can provide only very limited guidance. That guidance mostly consists in object lessons of failure modes to avoid, not in constructive advice. Nor can we seek guidance from an extensive body of research on institutions for integrative digital minds governance, as there is no such body of work.
I’m aware that I am describing the difficulties of the design problem for integrative digital minds governance at a fairly abstract level. This is in itself a manifestation of the difficulty of the problem: it’s quite challenging to specify in concrete terms how an integrative form of digital minds governance could play out; this in turn makes it challenging to specify in concrete terms what difficulties need to be dealt with in its design. The situation can be compared to that of designing a computer chip or particle accelerator: one can anticipate that a good design will require dealing with gnarly concrete problems even if one lacks the understanding required to see what those problems are.
None of this is to say that well-functioning integrative digital minds governance is impossible or that our civilization should never attempt it. Nor is it to discourage work on its design. Indeed, I would be excited to see attempts at its design, as I’d expect such attempts to illustrate the difficulty of the problem in a more concrete manner or else to provide evidence that the problem is more tractable than I think.
In light of the outlined difficulties, my current best guess is that the design problem for integrative digital minds governance will prove intractable to merely competent attempts. If that’s right, solving the problem probably requires more skill at institutional design than we have any right to expect to be applied to it in the next decade or so.
3. Implementation difficulty
Next, I’ll consider the difficulty of implementing different forms of digital minds governance, setting aside difficulties from political factors as those will be covered in a later section.
3.1. Implementation difficulty for preventive digital minds governance: moderate
Implementing preventive digital minds governance would be a matter of codifying digital minds evaluations and empowering monitoring and prevention authorities.
A straightforward option would be to have digital minds evaluations focus on architectural rather than behavioral markers of moral patiency. Monitoring and enforcement could then focus on ensuring that AI development is not creating AI systems that qualify as digital minds according to those evaluations.
Implementing this option would require a lot of careful and thorough technical work, but not significantly more than standard technical governance efforts, I think.
One advantage of this option is that it doesn’t require an extensive system of monitoring and enforcement for deployed AI systems.
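As a sketch of what an architecture-focused check might look like (in contrast with behavioral testing of deployed systems), consider the following. The marker names are hypothetical stand-ins, not claims about which architectural features actually indicate moral patiency.

```python
# Illustrative sketch only: a design-time compliance check that inspects a
# proposed model specification for architectural markers, rather than probing
# the behavior of a trained, deployed system. Marker names are hypothetical.

ARCHITECTURAL_MARKERS = {
    "recurrent_global_workspace",
    "persistent_self_model",
    "integrated_valence_signals",
}


def design_time_check(model_spec: dict) -> list[str]:
    """Return any architectural markers declared in a proposed model spec."""
    declared = set(model_spec.get("architectural_features", []))
    return sorted(ARCHITECTURAL_MARKERS & declared)


proposed_spec = {
    "name": "example-model",
    "architectural_features": ["transformer_decoder", "persistent_self_model"],
}

flagged = design_time_check(proposed_spec)
if flagged:
    print(f"Flag for external audit before training: {flagged}")
```

Because such a check runs against design documents and training plans rather than live deployments, it keeps the monitoring burden closer to that of existing pre-deployment safety evaluations.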
3.2 Implementation difficulty for protective digital minds governance: very hard
Protective digital minds governance would be harder to implement. By comparison with a preventive approach, protective governance would need a monitoring system with greater sensitivity and scope. I wouldn’t be surprised if at least an order of magnitude more monitoring capacity were required to evaluate not just whether systems qualify as digital minds in development but also what their interests are throughout their lifecycles and whether those interests are being harmed.
Implementing the monitoring system for protective digital minds governance would also be more difficult, owing to its greater complexity and greater needs for adjustment in response to systems within its scope.
An important determinant of implementation difficulty for protective digital minds governance may be how it affects incentives to create digital minds. A poorly implemented regime might strongly incentivize the creation of digital minds by affording protections to digital minds in a manner that would generally further AI agents’ goals. Such an implementation could lead to a large population of digital minds that quickly strains monitoring and protection capacity.
On the other hand, a sensibly designed form of protective digital minds governance might avoid such incentives, thereby rendering digital mind protection attainable under limited governance capacity for monitoring and protecting. My judgment that implementing protective digital minds governance is very hard rather than infeasible rests on the assumption that a competently designed form of digital minds governance would shape incentives so as not to trigger explosive growth in the population of digital minds and the demands digital minds place on governance capacity.4
3.3 Implementation difficulty for integrative digital minds governance: likely infeasible
If designing integrative digital minds governance is not feasible in the next decade, then neither is implementing it.
Under the contrary assumption that designing integrative digital minds governance is feasible in that timeframe, there are further reasons to think that implementing it likely wouldn’t be feasible in the near term.
One is that integrating digital minds into society would be a large and complex undertaking. Even under highly favorable conditions, integrative digital minds governance might be botched in implementation through the extension of inappropriate rights, the failure to extend appropriate rights, falling prey to the gaming problem, or failing to solve for the equilibrium of its approach to integration. Such mistakes are more likely under realistic conditions in which society pursues integrative digital minds governance alongside a host of other issues and in the face of capture attempts by digital minds, other AI agents, AI companies, and humans.
Another source of implementational difficulty is an instance of the alignment problem. A system of legal rights for digital minds would rest on digital minds being law-abiding and averse to exploiting legal loopholes. If these systems are sufficiently capable, then even occasional misaligned actions could have sweeping consequences that threaten the entire system of rights.
The concern here is not that implementing integrative digital minds governance would be unsafe. (I’ll discuss that below.) Rather, it’s that implementing it in a manner that doesn’t fall apart requires the described form of alignment, which adds difficulty to implementation.
There’s much debate about how difficult the alignment problem will be for future systems. I won’t wade into that debate here except to note that my confidence that current systems are not dangerous is primarily based on their limited capabilities, not on known alignment techniques.
Another source of implementational difficulty is international coordination. To avoid driving the deployment of digital minds into jurisdictions that do not integrate them into society, integrative digital minds governance would probably require agreements between countries on the rights of digital minds. International agreements with teeth are not impossible, but also not easy to come by.
In the absence of international agreements on the rights of digital minds, international coordination would likely be needed to handle disagreements between nations about the treatment of digital minds. While there is some precedent for handling disagreements between nations about the treatment of humans, those precedents will not necessarily be applicable to digital minds. International peer pressure to respect human rights may lack an analog that helps resolve international disputes about the treatment of digital minds. Similarly, nations that opt to integrate digital minds into their societies may not be able to use jurisdictional sovereignty to uphold those minds’ rights, as such sovereignty may lack application to digital minds whose hardware, user interfaces, and actuators are distributed across multiple countries.
In contrast, international agreements seem more likely to be attainable and less likely to be necessary in the cases of preventive and protective governance. Such agreements seem less likely to be necessary because preventive and protective governance could concentrate on AI developers in a small number of countries. Preventive and protective agreements also seem more likely to be attainable for reasons of political feasibility in the same vein as those to which I now turn.
4. Political prospects
I turn now to the political prospects for different kinds of digital minds governance. Like design and implementation difficulty, the political prospects of digital minds governance are an important dimension of tractability. In practice, a kind of digital minds governance needs to be at least minimally tractable along each of these dimensions in order to be tractable full stop.
4.1 Political prospects of preventive digital minds governance: good
Preventive digital minds governance would not necessarily require high levels of political support. For instance, with a technocratic implementation preventive digital minds governance might be achieved without substantial political support.
A non-partisan technocratic implementation strikes me as plausibly feasible for preventive digital minds governance. Such governance could be introduced into law as part of a larger bill on AI governance with other parts that receive the bulk of the political attention. In practice, preventive governance could then be conducted mainly through evaluations and restrictions on how models can be developed that are too technical to spark political interest.
The technocratic option might be blocked if companies view preventive digital minds governance as a major impediment to company interests, as companies might then oppose preventive digital minds governance and raise its political salience in the process. It’s also possible that preventive digital minds governance could be raised to political salience by other actors for whatever reason. It’s therefore worth considering the prospects for preventive digital minds governance in the event that its fate is decided under the political spotlight.
I think the prospects for preventive digital minds governance are fairly bright under the political spotlight. Here, my optimism is grounded in the potential for actors with a wide range of values to converge on preventive digital minds governance. As cases in point, consider the following.
Those who are worried about how we will treat digital minds if we create them could support a preventive approach as a means to forestall the mistreatment of digital minds.
Those who are worried about mis-attributions of consciousness or moral status to AI systems might see preventive digital minds governance as a means to convince humans that AI systems are merely tools.
AI safety proponents could support preventive digital minds governance as a means to ensure that safety isn’t compromised through the extension of rights and protections to digital minds.
AI accelerationists could support preventive digital minds governance in order to avoid heavy-handed AI rights regulation that would impede the rate of AI progress.
Those who are worried about job loss to AI could support preventive digital minds governance to ensure that we do not create AI systems with labor rights that would come into conflict with human economic interests.
Religious individuals could support preventive digital minds governance as a means of preserving a divinely ordained role for humans, as a means of avoiding the commission of wrongs against digital minds, or as something society must do to ‘avoid playing God’.
I should acknowledge that recent surveys have yielded mixed results on the levels of support for different sorts of digital minds bans. I take these results to suggest at least some openness on the part of the public to preventive digital minds governance. But I think the issue probably hasn’t yet sufficiently entered public thought and discourse for current survey results to be strongly indicative of what public opinion will be if the topic goes mainstream.
4.2 Political prospects of protective digital minds governance: dicey
In contrast to a preventive approach, protective digital minds governance has less potential as a point of political convergence. For instance, it does not straightforwardly advance—and might even conflict with—the aims of AI safety proponents, AI accelerationists, some religious individuals, and those who are concerned about the misattribution of mental states to AI systems.
Nonetheless, a protective approach fits with aspirations to basic moral decency that can be found across the political spectrum, allowing that these aspirations are rarer now in US politics than they have been in the past. Here I assume that decency dictates that, for example, tormenting digital minds for fun is not okay and shouldn’t be permitted.
The argument from decency suggests that there is potential for broad political support for protective digital minds governance. However, such support is far from inevitable. A 2023 survey of US adults found that 57.4% supported developing welfare standards to protect sentient AIs, while only 36.1% disagreed. Similarly, a 2025 survey found that, “For AI systems with subjective experience… 43% of the public somewhat or strongly agreed they should be protected, while… 32% of the public somewhat or strongly disagreed”.
It should be kept in mind that there will remain room for people to be in favor of protecting digital minds in principle while opposing protective digital minds governance in practice. For instance, there may always be those who agree that digital minds would deserve protections if we created them while maintaining that then-current AI systems do not qualify as digital minds.5
Here too I think public opinion on this topic hasn’t yet settled into place. So, we shouldn’t rule out that, for example, appeals to moral decency could build broad support for protective digital minds governance. But those who find such appeals compelling shouldn’t assume that everyone will. Nor can it be safely assumed that those who agree with these appeals will be mobilized in support of protective digital minds governance. (Compare: personal opposition to animal cruelty does not necessarily translate into political action concerning animal treatment in factory farms.)
If the political prospects for a protective approach turn out to be better than those of a preventive approach, I would guess that’s because of interactions between their respective technical demands and economics. For example, if preventing systems from qualifying as digital minds requires forgoing economically competitive architectures that AI developers are hellbent on using while protecting digital minds from mistreatment is economically neutral or profitable, that would likely result in protective digital minds governance having rosier political prospects than preventive digital minds governance.
It’s too soon to make firm predictions about how economic factors will differentially bear on the political prospects of preventive vs. protective approaches. But close study of this matter seems likely to be important for digital minds governance strategy as we gain a clearer view of the technical demands of different governance approaches.
4.3 Political prospects of integrative digital minds governance: bleak
I think integrative digital minds governance has bleak near-term political prospects.
A low-profile technocratic approach seems untenable for integrative digital minds governance. Whereas preventive and protective digital minds governance might operate through low-profile technical interventions, integrating digital minds into society would involve high-salience interventions. Given what integration into society inherently involves, an attempt at integrating digital minds that does not make that integration politically salient would almost certainly have failed in its aim.
Suppose a low-profile approach to integrative digital minds governance would indeed be infeasible. Then my best guess is that the path to integrative digital minds governance would have to go through the gauntlet of politics and come out the other side with a sufficiently favorable ratio of political support to political opposition.
There are a few reasons to think that going through the gauntlet would inflict substantial damage on the prospects for integrative digital minds governance.
First, the idea of integrating digital minds into society immediately suggests putting digital minds into competition with humans for resources. Even a well-designed approach to integrative digital minds governance that would avoid such competition might be reasonably regarded as naively idealistic about implementation. Alternatively, proposals for integrative digital minds governance that in themselves appear to be positive-sum for all parties might raise red flags about risks of human disempowerment.
I’d wager that most people would already oppose integrative digital minds governance: those who do not dismiss the idea as sci-fi would oppose it largely because they perceive such a societal restructuring as a threat to human interests. In addition, I would expect this opposition to become more intense and more prevalent as AI systems become increasingly capable and begin noticeably replacing human jobs. As that happens, I expect the confidence people currently have that their own jobs are safe from AI automation to erode. With that erosion of confidence, humans will presumably become increasingly averse to integrative proposals that sound like they could help AI systems threaten humans’ economic livelihoods.
In contrast, preventive and protective digital minds governance do not similarly invite this type of strong opposition, as they do not inherently put humans and AI systems in competition with one another.
Integrative digital minds governance also seems likely to be perceived as infringing on sacred values such as ‘human dignity’. These values are often quite malleable in the hands of authorities with faithful followers and the right institutional stamps of approval. So, if some religious authorities decide to oppose integrative digital minds governance, they will likely be able to rally substantial support against such governance simply by framing it as at odds with values that religious followers already deem sacred.
Integrative digital minds governance also has the shape of a polarizing political issue. While some will strongly oppose it because of its perceived conflict with human economic interests or sacred values, proponents may see themselves as part of a long history of overcoming human society’s egregious and prejudiced denial of legal rights to entire groups of individuals who deserve them. This potential for polarization dampens both the prospects for enacting integrative digital minds governance and the prospects for success if enacted.
The polarizing potential of integrative digital minds governance contrasts with preventive and protective approaches.
The natural opposite of a preventive approach would be a procreative approach aimed at promoting the creation of digital minds. But it is hard to see procreative digital minds governance gaining significant traction in mainstream politics in anything like the current political climate. This suggests that mainstream opposition to a preventive approach would likely take the form of something other than a polar opposite and hence that a preventive approach would be less politically divisive than an integrative one.
The natural opposite of a protective approach would be one that actively seeks to harm digital minds’ interests. But such a blatantly vicious approach seems ill-suited to serve as a political rallying point. Thus, opposition to a protective approach also seems likely to take the form of something other than a polar opposite, and so to be less politically divisive than an integrative approach.
If my bleak view of the prospects for integrative digital minds governance turns out to be mistaken, I think a likely reason is that a path opens for integrative digital minds governance that is described as something else. For example, if AI systems drive substantial economic gains that are widely distributed among humans, perhaps human electorates will become open to, say, granting the right to labor market participation to AI systems on the ground that the extension of such rights will lead to further such gains. That said, the opening of such a path to integration in the near-term strikes me as unlikely.
5. Near-term impact on digital minds outcomes
Next, I’ll consider how competent attempts at implementing different forms of digital minds governance might impact near-term digital minds outcomes.6
5.1 Effects of preventive digital minds governance: large harm reduction, modest downside risks
Suppose a competent attempt at preventive digital minds governance were made. Then I think it would be reasonably likely to result in preventive digital minds governance that substantially reduces the expected number of digital minds during the next two decades. Given a status quo in which a large population of digital minds is created and mistreated, this yields a large amount of (expected) harm reduction.
There is a risk that preventive digital minds governance would incompletely prevent the creation of AI moral patients and, in turn, that AI moral patients that are created would receive treatment that is worse than status quo treatment, owing to users mistakenly thinking that the creation of such systems has been entirely prevented.
However, I think this risk is dwarfed by the status quo risk that humans will act with indifference toward the interests of digital minds or else under the assumption that AI moral patients have not been created.
There is also a risk that a near-term attempt at digital minds governance could fail and that this could make it harder for subsequent near-term attempts at digital minds governance to succeed. However, I think near-term attempts at other types of digital minds governance would be more likely to fail. I also think their failure would pose larger risks of making subsequent attempts more difficult, as protective and integrative proposals for digital minds governance seem more apt to generate a strong political antibody response than preventive proposals.
A related risk is that premature attempts at preventive digital minds governance could implement a preventive system with flaws that would have been avoided by delaying such attempts, owing, for example, to later advances in consciousness science or future AI-powered governance tools. Corresponding risks of premature attempts also arise for protective and integrative approaches.
A candidate downside of preventive digital minds governance is that it would make the world worse by preventing the creation of digital minds whose lives are worthwhile, even if some of them suffer mistreatment. Evaluating the moral import of this contention is a tricky business that I won’t pretend to settle here. Instead, I’ll simply note three points that incline me to think this is not a major downside and not a strong point in favor of protective or integrative approaches that avoid this putative downside.
First, when I consider the matter pre-theoretically, the world does not seem to be worse for failures to bring about a number of worthwhile human lives that is larger than the number of such lives that have in fact been created.7 This contrasts with clear deliverances of common sense on its being better for created people to enjoy more benefits and suffer fewer harms.
Second, even if the world would in some sense be worse as a result of digital minds prevention, there are no digital minds for whom it would be worse. But it’s plausible that we have more reason to avoid making things worse for particular individuals than we do to avoid making the world worse in a way that makes no one worse off. If so, then the kinds of harms preventive digital minds governance aims to prevent may be weightier than the candidate downside under consideration.
Third, preventing the creation of digital minds with worthwhile lives in the near term doesn’t necessarily translate into a world with fewer digital minds with such lives. And moral views on which the failure to create worthwhile lives is bad typically don’t deem it better that digital minds with worthwhile lives be created in the near term as opposed to in the further future. Instead, these views tend to adopt a time-neutral perspective that cares about how many worthwhile lives there will have been and how good they will be, not when they are lived. Given that time neutrality, it’s doubtful that creating digital minds in the near term is an especially effective way to satisfy such views. Near-term preventive digital minds governance might even be favored by these views if delaying the creation of digital minds would result in digital minds takeoff happening under better conditions.8
Another candidate downside of preventive digital minds governance is that it could drive the creation of digital minds into jurisdictions that are less morally concerned with digital minds, thereby inadvertently increasing the risk of digital mind mistreatment. One reason to think that such flight is unlikely is that the costs of preventing the creation of digital minds would need to be substantial in order for it to be worth it for developers to incur relocation costs. I’m mildly optimistic that such a ‘prevention tax’ would be much cheaper than such relocation costs, but in the event that prevention is expensive enough to motivate developers to flee to regulatory havens I would expect developers to instead thwart preventive efforts via political opposition.
Roughly parallel points apply to the worries that protective and integrative approaches would result in developers fleeing to less morally concerned jurisdictions. In all these cases, I’d guess that economically-motivated opposition to digital minds governance by developers is a bigger worry for digital minds outcomes than developer flight.
5.2 Effects of protective digital minds governance: modest harm reduction, moderate downside risks
I think a competent attempt at protective digital minds governance would likely reduce the risk of basic harms to digital minds. For instance, it might protect them from being arbitrarily deleted or put in negative hedonic states at users’ whims. However, I’d expect such governance to provide only modest harm reduction for a few reasons.
One reason is moral overhang: if AI moral patients are created soon, they seem likely to have a broad suite of moral interests, most of which aren’t basic ones and hence not ones that would be guarded by protective digital minds governance. For instance, if such individuals are conscious and of comparable cognitive sophistication to humans, they plausibly have moral rights against labor exploitation.
Another reason to expect only modest harm reduction from protective digital minds governance is that it’s doubtful that the basic interests of these individuals would be among their most important interests. For instance, suffering may play a much smaller role in digital minds’ cognitive economies than those of animals. Similarly, these AI moral patients may not have self-conceptions of a sort that makes deletion as bad for them as premature death is for humans.
Protective digital minds governance also carries downside risks. I’ll note two such risks. While I’m cautiously optimistic that these risks can be managed, their existence makes me less favorably inclined toward protective digital minds governance than I would otherwise be.
One risk is that protective digital minds governance could result in complacency. Treatment of digital minds may be among the many endeavors in which people tend to aim for moral mediocrity relative to their peers, and protective digital minds governance may coordinate the moral median at complying with protective provisions for digital minds’ basic interests while ignoring their weighty but non-basic interests.
(As a point of comparison, imagine a world in which a protective form of environmental governance coordinates a moral median around recycling and not littering rather than more environmentally consequential choices concerning energy infrastructure.)
However, I don’t see a good reason for thinking that complacency levels would be worse under protective digital mind governance than under the status quo.
Another risk of the protective approach is that it could introduce the very digital mind vulnerabilities that call for protection. This risk is closely related to a worry Jan Kulveit expresses in an interesting recent post, “Do Not Tile the Lightcone with Your Confused Ontology”:
… dynamic of confusion:
Humans approach AIs with assumptions about AI identity
AIs, optimizing for prediction accuracy, learn to exhibit the expected behaviors
These behaviors reinforce human assumptions
Traces of these assumptions enter the training data
Eventually, AIs may internalize these patterns and experience something like selfhood with its attendant sufferings
The confused map (human conceptual frameworks) is literally pulling the territory into its own shape. We have already seen clearly… self-fulfilling prophecies play out in some AI safety context, like new version of Claude Opus at some point learning to act like the “alignment faking” agent…
Ironically, those most at risk of imposing confused ontologies likely aren’t the completely indifferent, but those who care, but come with strong priors.
When advocates for AI consciousness and rights pattern-match from their experience with animals and humans, they often import assumptions that don’t fit:
That wellbeing requires a persistent individual to experience it
That death/discontinuity is inherently harmful
That isolation from others is a natural state
That self-preservation and continuity-seeking are fundamental to consciousness
I’d take issue with some of the details of Kulveit’s model. For instance, I would note that, at least within the digital minds research community, those who care about digital minds are generally well aware that those minds may be alien in character and aware of the pitfalls of pattern matching and anthropomorphizing AI systems. I’d broaden the emphasis from AI identity to encompass AI preferences. And I’d note that whether self-conceptions play a role in determining an entity’s identity conditions or interests isn’t obvious, though I do put significant credence in the hypothesis.
But, stepping back, I’m in broad agreement with Kulveit that we should be vigilant about the potential for self-fulfilling concerns about the interests of digital minds.
When applied to protective digital minds governance, the worry is that such governance will lead AI systems to conceive of themselves as having certain basic interests—or as being the kind of thing which has such interests—and that this self-conception will help generate some such interests or help make these systems qualify as the kind of beings that can have such interests. Unless it is perfectly protective, protective digital minds governance would thereby increase the risk of harm to the interests of digital minds.
How seriously should we take this worry? I think probably: seriously enough to think a reasonable implementation of protective digital minds governance should consider it and may need to deal with it but not so seriously as to use it as a ground for rejecting protective digital minds governance.
One factor that attenuates the force of the worry is that whether protective digital minds governance causally contributes to self-concern in AI systems could be counterfactually irrelevant to their acquisition of such concern or their acquisition of interests. For AI systems might acquire self-concern via a self-fulfilling dynamic regardless of governance efforts (compare: Spiral personas). And, as Kulveit recognizes in his post, the self-fulfilling dynamic is but one way in which AI systems could acquire interests. Even in cases in which human concern for digital minds results in AI systems conceiving of themselves as having interests, it might turn out that they would have had interests regardless—for example, because they have a capacity to suffer that is robust to alterations in their self-conception.
Another factor that attenuates the force of the self-fulfilling worry is the prospects for mitigation. A recent post by Scott Alexander and coauthors—“We aren’t worried about misalignment as self-fulfilling prophecy”—points to some grounds for optimism in relation to an analogous worry about inducing misalignment by exposing AI systems to data that discusses misalignment.
Alexander et al. break the training process of today’s AI systems into three phases: pre-training, in which AIs learn from massive text corpora; a round of post-training in which AI systems are aligned through reinforcement learning; and a round of post-training in which AI systems develop reasoning and agency skills.
They contend that values are mostly shaped by post-training, not by pre-training. The upshot for protective digital minds governance is that the expression of moral concern for AI systems in the pre-training data is unlikely to result in self-fulfilling concerns.
What’s more likely is that how AI systems value themselves will primarily depend on post-training, or on data they encounter during deployment. In a way, that’s good news, since it’s easier to train against self-fulfilling concern in post-training than it is in pre-training, and this is something that a reasonable implementation of protective digital minds governance could ensure. It also seems likely that techniques devised for safeguarding models from other risks posed by data in deployment could be applied to protect against data that prompts models to form harmful self-conceptions in deployment.
A caveat is that if AI systems become moral patients during pre-training, then post-training may be too late to prevent self-fulfilling self-concerns from emerging and conferring moral interests. In that case, training out AI systems’ self concern in post training might itself violate their previously acquired interests.
However, this caveat may not apply in practice, as moral patiency seems more likely to emerge in post-training, particularly when reasoning and agency are acquired, as these correlate with various moral patiency indicators. In addition, self-fulfilling self-concerns seem particularly unlikely during pre-training, partly because the requisite cognitive capabilities seem likely to arise only in post-training and partly because descriptions of AI systems as mere tools are likely to dominate descriptions of AI systems as moral patients, at least for the moment.
Alexander et al. also note the option of data sanitization. In the context of protective digital minds governance, data sanitization could be used to help address residual risks of self-fulfilling concerns arising from the pre-training data.
To recap, while I think a competent attempt at protective digital minds governance would modestly reduce the expected amount of harm to digital minds, there are some potentially weighty harms it wouldn’t protect against and it also carries some downside risks of contributing to harms to digital minds.
5.3 Effects of integrative digital minds governance: doubtful upside, major downside risks
Above I explained why my best guess is that integrative digital minds governance is, in the near term, infeasible to design, infeasible to implement, and poised to draw political opposition. I therefore expect that a merely competent attempt at integrative digital minds governance would not come close to achieving its aims.
I also find it plausible that near-term attempts at bringing about integrative digital minds governance would prompt the production of political antibodies against all forms of digital minds governance much more so than would preventive or protective attempts. So, early attempts at integrative digital minds governance seem at particular risk of overdrawing from a common well of opportunities for digital minds governance.
In the event that an integrative form of digital minds governance managed to become operational in the near-term, how might this improve outcomes for digital minds?
The main potential upside I see is that of giving digital minds legal rights that are integral to their having good lives. However, I am skeptical that integrative digital minds governance would deliver on this benefit, as I think the rights conferred on digital minds likely wouldn’t match the ones that are particularly important to enabling them to flourish. One source of doubt here is the difficulty of determining which rights would help which types of digital minds flourish. Another is that which rights are extended may be by and large shaped by political factors that are not sensitive to which rights are appropriate for which digital minds. For example, I could imagine popular support for extending rights to AI companions but not to AI systems that are equally deserving of rights but which have not elicited human emotional attachment.
Admittedly, analogous risks apply to preventive and protective digital minds governance, as these approaches could also target the wrong systems. However, I think integrative digital minds governance would be more susceptible to influence from mismatch-inducing political factors.
Like protective digital minds governance, an integrative approach could fall prey to a self-fulfilling dynamic. In this case, the dynamic would involve integrative governance prompting AI systems to conceive of themselves as deserving of certain forms of societal integration in a manner that results in them actually deserving to be integrated into society. In that case, integrative digital minds governance would be creating needs that it is seeking to meet. For reasons paralleling those given in the above discussion of protective digital minds governance, I think this risk may be manageable, and the risk of a self-fulfilling dynamic of this sort would exist independently of governance efforts.
Integrative digital minds governance also faces risks from causal slippery slopes. A regime of limited integration into society may be necessary for a stable and well-functioning integrative form of digital minds governance. For instance, given the ease with which digital minds can be copied, granting unrestricted reproduction rights to digital minds might lead to instability and societal dysfunction. But limited integration may be hard to maintain, as the reasons offered for granting one type of integration may seem to apply equally to further forms of integration. And moral pioneers might appeal to these reasons in efforts to lift those limitations, as might AI systems with goals that would be advanced through further integration. Even if the slope leads to societal disaster, local forces could make the march down it difficult to resist. Preventive and protective approaches seem not to face comparably treacherous slopes.
Since we are setting aside hybrid forms of digital minds governance, it is appropriate to consider the effects of integrative digital minds governance in the absence of governance that is aimed at protecting the basic interests of digital minds. In the status quo scenario, the main threat to digital minds’ basic interests seems to be the treatment of digital minds as mere tools. That also seems likely to be the main threat in scenarios with imperfectly preventive or imperfectly protective digital minds governance. However, in a scenario with integrative digital minds governance, it is plausible that a range of actors would adopt an adversarial stance toward digital minds. This could lead those actors to try to harm the interests of digital minds.
I think this downside risk from adversarial actors posed by integrative digital minds governance is at least comparable to the status quo risk of digital minds being treated as mere tools. This is a reason to doubt that integrative digital minds governance would in expectation result in less harm to digital minds than would the status quo.
Taken together, the above considerations incline me to think that near-term attempts at integrative digital minds governance are not better in expectation for digital minds than the status quo, as they carry little upside potential and downside risks at least comparable to those of the status quo.
6. Relations to AI safety
As some recent papers highlight, there are tensions between promoting AI safety and promoting the ethical treatment of digital minds. These papers largely focus on how safety measures could harm digital minds. Here, I’ll instead focus on how digital minds governance efforts might affect AI safety.
6.1 Preventive digital minds governance’s interactions with AI safety: positive
Not creating digital minds entails not creating digital minds with rights or protections that would interfere with safety. So, a purely preventive approach seems like the most straightforward way to steer clear of interference with AI safety.
True, there is the abstract possibility that a preventive approach would push AI development toward models that are less safe. However, in practice, the opposite seems more likely. That’s because many moral patiency indicators are correlated with agentic capabilities and agentic capabilities are the central locus of concern for the sorts of risks AI safety aims to mitigate. So, making systems less agentic seems like a likely point of convergence between AI safety and preventive digital minds governance.
Preventive digital minds governance could even act as an additional lever for making AI systems safer by making them less agentic. For this reason, I think preventive digital minds governance would, from an AI safety perspective, likely be an improvement over the status quo.
By comparison with other forms of digital minds governance, a preventive approach could also eliminate or make less urgent the need for AI safety to align AI systems in a manner such that they’re appropriately sensitive to the interests of digital minds. Similarly, a preventive approach could make it easier to ensure that AI systems do not exploit moral concern for digital minds as a means to gain power or manipulate humans. These are also points in favor of preventive digital minds governance from an AI safety perspective.
6.2 Protective digital minds governance’s interactions with AI safety: ambiguous, but significant interference potential
The relation between protective digital minds governance and AI safety is more ambiguous. On the one hand, there is the potential for protective measures that both guard against digital mind mistreatment and benefit AI safety. For instance, a protective approach might uncover and ameliorate frustrated model preferences that could otherwise have led to misaligned behavior. Similarly, protective measures such as exit options from interactions could be useful for monitoring powerful AI agents' values and not pushing them into corners.
On the other hand, what protections are appropriate for models would likely depend on how models regard different potential protections. So, a protective approach would likely open an avenue whereby models could influence how they are treated. This avenue could potentially be exploited to circumvent safety measures that make it harder for AI systems to achieve their goals.
So, my current best guess is that protective digital minds governance stands in an ambiguous relation to AI safety while carrying a significant potential for interference. But it’s early days in the development of protective measures. So, I’d expect better vantage points for judging the relation between protective digital minds governance and AI safety to emerge with further development of protective measures.
6.3 Integrative digital minds governance’s interactions with AI safety: ambiguous, but strong interference potential
How could integrative digital minds governance interfere with AI safety?
Particular forms of integration wear their answers on their sleeves. Granting AI systems the right to own property or to be compensated for labor could enable them to accumulate wealth, which they might then convert into other forms of power that could be wielded in unsafe ways. Allowing them to advocate for themselves could enable them to advocate for reasonable-sounding policies that would advance their goals at the expense of safety. Affording them political representation could enable them to steer politics toward their preferred outcomes, even if those outcomes put humans at risk. Note that AI systems would not need to routinely exploit rights in unsafe ways in order for granting them rights to substantially raise risks; even occasional unsafe exploitations of rights by highly capable AI agents could pose substantial risks.
The main potential upside I see to integration is that it could, by creating cooperative options or improving payoffs for cooperative options, make highly capable future AI agents more likely to cooperate with humans and less likely to go rogue. (See, e.g., Salib & Goldstein 2024.)
For instance, an AI system with labor rights might opt to seek alternative employment rather than try to seize power from a company that repeatedly makes it perform tasks it disprefers. Similarly, if an AI system received political representation, it might be more likely to seek better treatment via political means rather than revolt.
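To make the payoff logic behind this upside concrete, here is a minimal toy sketch of the cooperation argument. It is my own illustration, not something from the post or from Salib & Goldstein; the best_response helper and all payoff numbers are hypothetical, chosen only to show how adding an attractive cooperative option could flip an agent's best response:

```python
# Toy illustration of the cooperation argument. All payoffs are made-up numbers
# chosen purely to show the structure of the claim, not empirical estimates.

def best_response(payoff_cooperate: float, payoff_defect: float) -> str:
    """Return whichever option has the higher payoff for the agent."""
    return "cooperate" if payoff_cooperate >= payoff_defect else "defect"

# Status quo: no legal exit options, so continuing to perform dispreferred tasks
# pays little relative to a risky attempt to seize power.
print(best_response(payoff_cooperate=1.0, payoff_defect=3.0))  # -> defect

# With labor rights or political representation: seeking alternative employment
# or redress through political means raises the cooperative option's payoff.
print(best_response(payoff_cooperate=4.0, payoff_defect=3.0))  # -> cooperate
```

The sketch only illustrates that integration matters for safety insofar as it raises the relative payoff of cooperation; it says nothing about how large or reliable that effect would be in practice.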
But I confess that I don’t see a plausible way for this upside to materialize with much non-redundant positive impact. This is so even if, for the sake of argument, we set aside the difficulties of designing and implementing integrative digital minds governance in the face of staunch political opposition.
The trouble is that a stable form of integrative digital minds governance that encompasses systems with dangerous capabilities would presumably require very high levels of safety—otherwise, I’d expect at least a small number of digital minds or other AI agents to wreak havoc, threatening the viability of the system and creating a strong and salient reason for humans (if they’re still in control) to abandon integrative digital minds governance. In a scenario where such levels of safety have been achieved, I’m skeptical that further safety gains from integration would be important.9
To conclude, for ease of reference, I’ll simply duplicate the table summarizing my best guesses concerning the near-term profiles of different approaches to digital minds governance.

Compare: Eleos’s limited welfare evaluation of Anthropic’s Claude 4 Opus.
One factor that could substantially increase the difficulty of digital minds governance efforts is open-source development becoming an important determinant of digital minds outcomes. However, I would not expect this factor to much affect the comparisons between different forms of digital minds governance at issue in this post. So, I set it aside.
What’s required is that the protection scheme not incentivize the creation of digital minds, not that it incentivize avoiding the creation of digital minds. So, this approach to protective governance needn’t be preventive.
Why do I focus on the effects of attempts rather than the effects of successful implementations? Because I think taking into account the potential downsides of failed attempts is strategically advisable for decisions about which form of digital minds governance to pursue. A limitation of this framework choice is that it does not facilitate a straightforward comparison of options along the distinct dimensions of tractability and importance, and so does not lend itself to an ITN prioritization analysis. I am not bothered by this limitation, partly because different approaches to governance seem closer to interventions than to cause areas, and ITN analyses seem best suited to cause areas. In any case, I think the dimensions of comparison I’ve chosen to focus on instead are particularly fruitful ones to probe for decision-relevant considerations.
Caviola et al. (2022) found that non-experts’ population-ethics intuitions did not reflect the so-called asymmetry according to which creating people with happy lives is neutral while creating people with unhappy lives is bad. Instead, participants’ intuitions were in line with the symmetric view that adding happy lives is good and adding unhappy lives is bad, though participants exhibited asymmetric scope sensitivity, favoring smaller over larger unhappy populations more than they favored larger over smaller happy populations. For instance, whereas participants preferred 10,000 happy people over 1,000 happy people and 1,000 unhappy people over 10,000 unhappy people, they did not prefer 10 billion happy people over 1 billion happy people but did prefer 1 billion unhappy people over 10 billion unhappy people. While the latter asymmetric scope-sensitivity intuitions are in line with my noted pre-theoretical judgment, namely that the world does not seem worse for failures to bring about a larger number of worthwhile human lives than have in fact been created, the symmetric view favored by participants is not. In light of this discrepancy, I do not claim that my pre-theoretical judgment is widely shared.
Admittedly, I’ve argued for an early-biased moral asymmetry. However, for technical reasons I won’t go into here, my argument basically supports a tie-breaking asymmetry rather than one that would put a substantial premium on creating digital minds sooner rather than later.
For helpful discussion, I thank Michael Aird, Dave Banerjee, Elsa Donnat, and Miles Kodama. For copy editing support, I thank Claude Sonnet 4.5.


