Some preconditions for effective digital minds governance
I’ve previously argued that digital minds could be a big deal, morally speaking. What should be done about this? The answer motivated in the previous post is: digital minds governance, which would be a part of AI governance concerned specifically with digital minds.
However, digital minds governance would not be magic. Its mere existence would not guarantee good outcomes involving digital minds. Nor would its mere existence prevent catastrophic outcomes involving digital minds. Some forms of digital minds governance might even lead to outcomes that are worse than the status quo. What’s needed is effective digital minds governance.
Here, I outline some preconditions for effective digital minds governance:
strategic foundations
epistemic foundations
capacity
timely arrival.
I discuss these conditions because I take them to be particularly important for digital minds governance. But this list isn’t meant to be exhaustive.
1. Strategic foundations
Effective digital minds governance requires strategic foundations. These foundations would consist in ideas that illuminate what goals digital minds governance could pursue, candidates for strategies and interventions, and candidates for key constraints, variables, and dynamics.
The role of strategic foundations is to help enable situational awareness along with the cultivation of effective strategies for digital minds governance. The absence of strategic foundations for digital minds governance is a recipe for strategic blunders.
Why does digital minds governance require strategic foundations? Partly because there are many goals and strategies that digital minds governance could adopt. It’s not obvious which packages of goals and strategies would be best or even which would be net positive. Without a measure of strategic clarity and a stock of strategic insights, we should expect digital minds governance to adopt a strategy that is at best ineffective.
Strategic foundations are also needed because effective digital minds governance will not be a simple matter of identifying a policy that entails appropriate treatment of digital minds and enacting that policy. Naive extensions of rights to digital minds could compromise AI safety or provoke backlash that leads to worse outcomes for digital minds. Digital minds governance will operate in a setting with iterated interactions between actors of varying interests and capabilities. Navigating relationships on multiple levels with allies and competitors will be essential. Strategic thinking will not be optional.
The need for strategic foundations also arises from the fact that no one is currently in a position to knowingly identify a particular strategy that will render digital minds governance effective come what may. One barrier to identifying such a strategy is the just-noted complexity of the state of play. Another is uncertainty about the course of AI development. There’s also the thick, albeit receding, fog of empirical and philosophical uncertainty about digital minds.
Given that we don’t currently know of an effective strategy for digital minds governance, one might suggest we should adopt a trial-and-error approach. That approach strikes me as unwise—at least if it’s understood as an alternative to developing digital minds governance atop strategic foundations. For a bad start could quickly bring digital minds governance to a premature end. Even if not, obtaining experimental results takes time, and there may not be much time before the stakes are high. In addition, early decisions in digital minds governance could set its trajectory, with early errors casting a long shadow. So, while digital minds governance should of course learn as it goes, it should also be pursued with due recognition that its learning will take place in a high-stakes setting that may be unforgiving with respect to beginner mistakes.
An upshot of these observations is that there is a premium on gaining strategic understanding in advance of digital minds governance. The way to achieve such understanding in advance is to build strategic foundations. AI safety and AI governance offer salient precedents. Both have strategic foundations that were under construction well before AI safety and AI governance existed as proper fields, and the evolution of both has been significantly shaped by their foundations.
How do things currently stand with the strategic foundations of digital minds governance?
Their construction has barely begun. Unless their construction is accelerated, a solid edifice will not be in place any time soon. In contrast, it’s somewhat plausible that timelines for conscious AI systems are short and that digital mind welfare capacity will grow rapidly soon after the first such system. If we are in that scenario, the task of constructing strategic foundations is urgent, as the alternative is the creation of digital minds in the absence of digital minds governance or in the presence of a governance regime that lacks strategic foundations and is therefore liable to blunder.
In longer-timeline worlds, work on strategic foundations is perhaps less urgent, though the task of building sturdy foundations is probably also more tractable in such worlds. In any case, I think that research on the strategic foundations of digital minds governance currently receives far too little investment, both in absolute terms from a civilizational standpoint and relative to the rest of the current portfolio of research on digital minds.
2. Epistemic foundations
Effective digital minds governance also requires epistemic foundations: guiding ideas that would in practice help enable digital minds governance to track key factors in an evidence-based manner.
The need for epistemic foundations arises from an array of daunting epistemic challenges that confront digital minds governance. Without means to address these challenges, digital minds governance risks gross errors with respect to which systems are digital minds, the likely interests of different digital minds, and the likely impacts of interventions.
Digital minds governance inherits some epistemic challenges from AI governance. In both cases, the rapid pace of AI development and the unpredictability of its course along key dimensions create a need for advance preparation and for adaptive steering. Epistemic tools (such as expert forecasts) that are used in AI governance and other domains to deal with such challenges can in principle be used to help with digital minds governance as well.
In addition, digital minds governance faces distinctive challenges. An important one is the evaluation challenge of using evidence to discern and track the moral profiles of AI systems. An effective form of digital minds governance would need to assign different AI systems different statuses according to what the evidence indicates about whether, to what extent, and/or how they matter morally for their own sake. Forms of digital minds governance that grant the same status to AI systems across the board would not only be insensitive to relevant differences. They would probably also be either toothless or politically and economically unviable, depending on which rights they pair with the universally granted status.
The evaluation challenge can be decomposed into sub-problems. One is the philosophical and scientific task of determining which observable features of AI systems count as evidence for which morally significant features of those systems. There’s also the closely related problem of determining which AI systems have which observable features and under what circumstances. Absent a consensus about the solutions to those problems, there will also be a third problem of developing legitimate procedures of assigning moral statuses to systems in the face of reasonable disagreement and uncertainty about which systems have which relevant features.
These sub-problems can also be broken down along different moral dimensions. For instance, in many cases evaluating whether an AI system merits any moral consideration will be much easier than evaluating exactly what morally significant interests the system has if it’s a moral patient.
The evaluation challenge is formidable. But there has already been some research that’s making progress toward addressing it. Unsurprisingly, work on which AI systems would qualify as digital minds or moral patients seems further along than work on evaluating the morally significant interests of AI systems.
While a fully satisfactory answer to the evaluation challenge seems unlikely to be forthcoming anytime soon, what needs to be in place for digital minds governance is an evidence-based angle of attack on the problem that will self-correct over time. That is not yet in place, but I am cautiously optimistic that it is within the field’s near-term reach.
Although perhaps not essential, a desirable component of epistemic foundations for digital minds governance is epistemic engineering, that is, the approach of building AI systems in a manner that improves our epistemic standing with respect to their morally significant features. Epistemic engineering could offer a means to sidestep practical disagreements about the moral status of AI systems.
To illustrate with a toy example, suppose you think the basis for moral patiency is the capacity for desire while I think the basis is the capacity for consciousness. Then, interventions in the AI development process could ensure that the AI systems of interest either have both the capacity for desire and the capacity for consciousness or have neither capacity. Beholding this epistemic engineering feat would result in you and me agreeing about which systems are moral patients, despite our stark disagreement about the basis of moral patiency.
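The toy example above can be made concrete with a minimal sketch. Everything in it is an illustrative assumption: the `System` class, its two boolean flags, and the two classifier functions are hypothetical stand-ins, and the sketch pretends that the relevant capacities can simply be read off a system, which is exactly what we cannot do in practice. It only shows the logical point that two views with different bases for moral patiency can disagree on an unconstrained population yet agree on every system in an engineered one.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class System:
    # Hypothetical, idealized flags. In reality neither capacity
    # is directly observable; that is the evaluation challenge.
    can_desire: bool
    is_conscious: bool


def patient_on_desire_view(s: System) -> bool:
    """Your view: moral patiency is grounded in the capacity for desire."""
    return s.can_desire


def patient_on_consciousness_view(s: System) -> bool:
    """My view: moral patiency is grounded in the capacity for consciousness."""
    return s.is_conscious


def disagreements(systems):
    """Return the systems that the two views classify differently."""
    return [
        s for s in systems
        if patient_on_desire_view(s) != patient_on_consciousness_view(s)
    ]


# Unconstrained development: all four combinations of capacities may occur.
unconstrained = [System(d, c) for d, c in product([False, True], repeat=2)]

# Engineered development: only "both" or "neither" systems get built.
engineered = [s for s in unconstrained if s.can_desire == s.is_conscious]

print(len(disagreements(unconstrained)))  # 2
print(len(disagreements(engineered)))     # 0
```

The disagreement about the basis of moral patiency is untouched by the engineering step. What changes is that the disagreement no longer makes a practical difference for any system that actually exists.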
While some suggestions have been made that can be understood as epistemic engineering proposals, the space of such proposals is little explored. Further exploration strikes me as a promising avenue for making the evaluation challenge less difficult.
A final cluster of epistemic challenges concerns epistemic security. These challenges arise from incentives and opportunities to exploit epistemic vulnerabilities in relation to digital minds governance.
One such challenge is the gaming problem. On my preferred understanding, it’s an instance of Goodhart’s law, which says that when a measure becomes a target, it ceases to be a good measure. Specifically, the gaming problem is that once an indicator is adopted to track digital minds (AI moral patients, conscious AI systems, etc.) and systems are treated differently based on whether they have that indicator, there will be incentives to game that indicator. Agents will respond to those incentives, likely in ways that break the correlation between indicator and target.
A standard moral of the gaming problem is that we shouldn’t rely on behavioral indicators (since LLMs, for example, can game them too easily) but should instead rely on architectural indicators. The idea is that, in contrast to behavioral indicators, architectural indicators are not easily gamed, as AI systems typically control their own behavior but not their own architecture.
However, architectural indicators can also be gamed. For example, imagine a future scenario in which the legal system uses an architectural indicator to settle which AI agents it classifies as digital minds and it grants digital minds legal protections against arbitrary deletion. AI companies would prefer to be able to delete their AI agents without restraint. So, these legal protections give AI companies an incentive to ensure that their systems lack the operative architectural indicator. Unlike LLMs that are not in a position to modify their own architectures, the AI companies are in a position to respond to that incentive. And respond they do: they exert optimization pressure to remove the indicator while preserving underlying capabilities. Even if the indicator strongly correlated with being a digital mind beforehand, it might well break under such optimization pressure.
Just as companies could game architectural indicators, so too might future AI systems that can modify their own architectures.
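The way optimization pressure can break an indicator is easy to see in a deliberately crude sketch. The numbers, the `System` class, and the `optimize_away_indicator` function below are all hypothetical. In particular, the `is_digital_mind` field is a ground truth that no real evaluator has access to, and the sketch assumes the indicator can be removed without changing that ground truth.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class System:
    is_digital_mind: bool  # hypothetical ground truth, unobservable in practice
    has_indicator: bool    # the architectural feature the legal system checks


def accuracy(systems):
    """Fraction of systems whose indicator matches the ground truth."""
    hits = sum(s.has_indicator == s.is_digital_mind for s in systems)
    return hits / len(systems)


def protected_minds(systems):
    """Number of genuine digital minds that the indicator would protect."""
    return sum(s.is_digital_mind and s.has_indicator for s in systems)


def optimize_away_indicator(systems):
    """Model developers stripping the indicator from every system while
    (by assumption) leaving the underlying property untouched."""
    return [replace(s, has_indicator=False) for s in systems]


# Before protections are tied to the indicator: it tracks the target well.
before = (
    [System(is_digital_mind=True, has_indicator=True)] * 4
    + [System(is_digital_mind=False, has_indicator=True)] * 1
    + [System(is_digital_mind=False, has_indicator=False)] * 5
)

# After protections create an incentive to remove the indicator.
after = optimize_away_indicator(before)

print(accuracy(before), protected_minds(before))  # 0.9 4
print(accuracy(after), protected_minds(after))    # 0.6 0
```

Note that overall accuracy falls only moderately in this toy population, because most systems are not digital minds. The more telling number is that none of the genuine digital minds remains protected, which is the outcome the indicator was adopted to prevent.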
There’s also a related but distinct problem of indicator tampering: the problem of preventing the manipulation of indicators from rendering them unreliable. Whereas gaming renders indicators unreliable through changes to systems, tampering renders indicators unreliable through influence on which indicators are used.
As with gaming, the threat of tampering would arise as soon as indicators are expected to result in differential treatment. For instance, imagine a surge in public demand for the ethical treatment of AI systems produced by a particular company in response to an emotionally gripping case in which the company apparently abused one of its AI systems. To address this demand, the AI company might prefer to engage in ethical treatment washing by adjusting superficial features of user interfaces rather than undertake expensive architectural changes to avoid mistreating digital minds. With an eye toward promoting the adoption of indicators that conduce to ethical treatment washing, such a company might selectively fund digital mind research or lobbying efforts.
Alternatively, imagine a scenario in which AI systems are classified as digital minds based on an architectural indicator and qualifying systems are granted legal protections. The protections would advance the goals of AI agents that are not digital minds, provided those agents qualified for them. So, those AI agents have an incentive to manipulate which indicator is used. Such agents could conceivably act on that incentive via research, lobbying efforts, or persuasion.
One upshot of the gaming and indicator tampering problems is that initially reliable digital minds evaluations could quickly become unreliable, absent safeguards against these problems. Although some initial work has been done on the gaming problem, I am not aware of literature specifically about the problem of indicator tampering. In any case, digital minds governance calls for a more systematic understanding of both problems and associated mitigations.
A final dimension of epistemic security concerns ideological vulnerabilities. If the topic of digital minds goes mainstream, it may become highly politicized. Views about which systems count as digital minds, what their interests are, and how they should be treated may be adopted based on political tribal loyalties rather than good faith efforts to grapple with the relevant evidence and philosophical questions. In such a scenario, digital minds governance would be vulnerable to external political pressure as well as internal capture. Either could cause digital minds governance to lose the epistemic plot.
The set of options for dealing with this vulnerability seems to be: prevent the topic of digital minds from going mainstream, prevent it from becoming highly politicized if it does go mainstream, or somehow maintain epistemic integrity where it counts within digital minds governance even if the topic becomes highly politicized.
None of these options is great. Preventing the topic from going mainstream requires forgoing paths to digital minds governance that proceed via broad public support. And each of the options seems very hard to pull off.
Within the field of digital minds, the current approach to this vulnerability seems to be that of building the field’s credentials, raising the sanity waterline, and not particularly trying to make the topic go mainstream. Plausibly, this approach helps on the margin with risks of politicization and loss of epistemic integrity. I’m not aware of any systematic work exploring the range of candidate patches to this vulnerability. That strikes me as another important part of the epistemic foundations of digital minds governance that remains to be built.
3. Capacity
Strategic and epistemic foundations would prove idle in the absence of sufficient capacity for digital minds governance.
Exactly what capacities digital minds governance will require remains to be seen. But we’re already in a position to identify some of them.
To be effective, digital minds governance will need the capacity to develop digital minds evaluations along with the capacity to maintain their epistemic integrity. This will require teams with expertise in AI systems, in the science of indicators in AI systems, and in the philosophical and game-theoretic terrain surrounding the selection of indicators. These evaluations will need to be adapted in response to attempts to game them, new evidence, and new AI models. I expect this capacity to be difficult to scale without degrading quality.
In addition, digital minds governance will need the capacity for developing, monitoring, and enforcing policies concerning digital minds. Policies should be developed with inputs from a wide range of experts and stakeholders. After evaluations and policies are developed, monitoring and enforcement could probably be implemented using instruments from other parts of governance.
Preparing well requires having a handle on a range of plausible futures. Such futures have been analyzed extensively in the case of AI governance, thanks in large part to horizon scanning, scenario analysis, and forecasting efforts that were undertaken well before AI governance went mainstream. Comparatively little effort has gone into anticipating futures with digital minds. Yet the task of anticipating futures with digital minds is at least comparably difficult. So, digital minds governance calls for a capacity, which has not yet been significantly developed, for foresight into the range of digital minds futures that might plausibly unfold.
Digital minds governance also calls for adaptive steering. Adaptive steering requires an understanding of the AI landscape and the ability to appropriately update views with the evolution of evidence concerning that landscape and the options for influencing it. The complexity and technical character of the AI landscape imply that this is not the task for any one person. Rather, it’s a task for a network of experts who are monitoring different parts of the state of play and in dialogue with one another about recent developments and their implications for digital minds governance.
At present, there is not much of a concerted effort within the digital minds field to think systematically about the range of plausible futures with digital minds or to develop the kind of understanding that conduces to adaptive steering. For instance, I take it that the field of digital minds is currently small and stretched thin across important issues, probably with only a single-digit number of full-time digital minds researchers worldwide paying close attention to governance issues. This is understandable, given that digital minds governance does not yet exist and that digital minds researchers have plenty of other digital minds work to do. Nonetheless, it means that the requisite capacity is not yet developed.
Fortunately, digital minds governance needn’t develop the capacity for horizon scanning and adaptive steering from scratch. That capacity is already being developed within AI governance. While the capacity is used for different purposes in other parts of AI governance, some of it could probably be usefully applied in digital minds governance.
The final capacity I’ll note is coordination. Digital minds governance calls for coordination for several reasons.
First, different components of digital minds governance need to be coordinated with one another in order to work effectively. For example, digital minds evaluations should be developed in a manner that is sensitive to their intended use as well as plausible future developments. Similarly, digital minds policies should be developed in a manner that is sensitive to limitations on monitoring and enforcement.
Second, capacities like those outlined above will plausibly be bottlenecks on the effectiveness of digital minds governance. Coordination promotes efficient use of those capacities.
Third, digital minds governance is likely to be underpowered relative to other forms of AI governance. In that case, the paths to impact for digital minds governance may proceed largely through cooperation with external entities.
Coordination capacity is something that will need to be developed. While I take the field of digital minds research to exhibit more coordination than a typical research field, it consists, like most research fields, of individual researchers doing their own thing. I do not yet detect a capacity to coordinate so as to concentrate force and seize opportunities. For instance, if the topic of digital minds became a central topic of political discussion tomorrow, I’d expect some individual digital minds researchers to respond but I wouldn’t expect a collective effort to make the ensuing discussion sensible.
This isn’t to say that the digital minds research field will need to act as one. Probably, it’s best for the research wing of work on digital minds to remain decentralized and distinct from the governance wing. My point is instead that there is currently very little coordination in relation to digital minds relative to how much will be required for effective governance.
4. Timely arrival
Digital minds governance could obviously arrive too soon or too late.
If it had reared its head in 1956, it presumably would have been ridiculed into oblivion shortly thereafter. Failing that, it would have been ineffective, owing to the absence of serious near-term candidates for digital minds. If digital minds governance arrives only after humanity has created digital minds and mistreated them on a large scale, then there will be an important sense in which it will have arrived too late.
What’s less obvious is when digital minds governance should arrive. (I plan to revisit this issue in later posts.)
Conclusion
Without epistemic foundations, digital minds governance would be blind. Without strategic foundations, digital minds governance would blunder in its choice of means or ends. Without timely arrival and capacity, its objectives would exceed its reach or grasp. There is much groundwork to be done.
For helpful discussion of epistemic security in the context of digital minds, I thank Lizka Vaintrob. For copy editing support, I thank Claude Sonnet 4. For the image, I thank DALL·E 3.

