Digital minds advocacy and the unilateralist's curse
Confronting and countering the curse
1. A thought experiment
Let’s start with a toy example:
Imagine there are ten digital minds advocacy organizations.
Each organization is staffed by intelligent individuals, all of whom are sincerely committed to reducing the risk of digital mind mistreatment, i.e. mistreatment of AI systems that merit moral consideration for their own sake, owing to their potential for morally significant mental states.
When it comes to evaluating whether a given intervention will reduce those mistreatment risks, each organization is quite reliable. How reliable? In tricky cases each organization has a 90% probability of correctly judging whether the intervention would reduce mistreatment risk.
Here's the case in point. Frontier AI companies announce plans to develop AI systems with a new architecture, one that makes these systems substantially more likely than previous systems to be moral patients. Responding to this announcement, each organization independently deliberates about whether now is the time to politicize digital minds, transforming the topic into a much debated issue in national politics.
Injecting the topic into mainstream politics later—after the science of digital minds is further along—would likely result in better policies toward digital minds. However, those policies might be enacted too late to prevent widespread mistreatment of digital minds. The difficulty of weighing these factors is what makes the case tricky.
Nine out of ten of the organizations (correctly) conclude that it’d be better to avoid politicization for now. However, the tenth organization reaches the opposite conclusion. In light of that conclusion, the organization proceeds to politicize the topic of digital minds.
Clearly, something has gone wrong in this scenario. Each member of a group of reliable and aligned actors does what they think is best. Almost everyone correctly determines which action would best advance their shared goals. And yet that action is not undertaken. Instead, a single actor unilaterally imposes a worse alternative.
Part of what’s cursed about this situation is that it features a systematic bias toward a type of error that is predictable, preventable, and yet not prevented.
The error was predictable not because the organization that unilaterally politicized the topic of digital minds harbored misaligned goals. (They had the same goals as everyone else.) And not because that actor was especially prone to error. (They weren’t.)
Rather, the undesirable outcome is unsurprising because the process of determining whether digital minds would be politicized left the decision up to the most optimistic actor, and many fallible actors were each given the opportunity to be optimistic to the point of detriment.
The situation is also cursed for the unilateralist actor in particular. Like everyone else, they did their best to advance shared goals in light of their own sincere judgment about the value of the initiative. And yet they single-handedly and unwittingly acted against those goals.
2. The unilateralist’s curse
The foregoing example is an instance of what’s known as the unilateralist’s curse. Here’s a first pass characterization of the curse’s core:
“The challenge of the unilateralist’s curse is that decisions about whether to pursue [a] potentially harmful [act] are left to the most optimistic outlier. In cases where the most optimistic view is that [the act] should be performed, but the median view is that it should not, the research probably should not be performed. Yet due to the unilateralist’s curse, it will probably happen anyway.”
To a second approximation, good enough for this post, the unilateralist’s curse holds when the following conditions are met, or nearly so:1
Shared Goals: Multiple actors have shared goals.
Actors with Unilateral Powers: Multiple actors with those goals are in a position to unilaterally undertake an initiative, X, regardless of cooperation or opposition on the part of other actors.
Impact: Whether X is undertaken will significantly influence the prospects for achieving the shared goals.
Fallibility: The actors independently judge X’s value relative to those goals. Their judgments vary owing to imperfect reliability. Their fallibility is such that the actors don’t know for sure whether X’s value is positive or negative.
Independent Decisions: Each actor decides whether to undertake X based on their own judgment of X’s value: they undertake X if and only if they judge X to be positive relative to their shared goals.
Likely Sub-optimality: Consequently, while the value of X is independent of the number of actors, the probability that X will be undertaken increases with the number of actors. As a result, with enough actors, X will be undertaken with higher probability than is optimal; and, under iteration, such initiatives would be undertaken too frequently.
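The Likely Sub-optimality condition can be checked with a few lines of arithmetic. The sketch below is my own illustration, not from the original paper; it assumes independent, equally reliable actors and computes the probability that a negative-value initiative gets undertaken anyway:

```python
def p_undertaken(n: int, p_error: float) -> float:
    """Probability that at least one of n independent, equally reliable
    actors misjudges a negative-value initiative X as positive, and so
    undertakes it unilaterally."""
    return 1 - (1 - p_error) ** n

# The initiative's value stays fixed, but the chance it gets undertaken
# grows with the number of actors. With a 10% per-actor error rate:
for n in (1, 5, 10, 20):
    print(f"{n:>2} actors: {p_undertaken(n, 0.10):.3f}")
# prints 0.100, 0.410, 0.651, 0.878
```

Even quite reliable actors, in sufficient numbers, make a unilateral mistake more likely than not.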
The unilateralist’s curse arises in various domains. Consider decisions about whether to:
Disclose information about how to build nuclear weapons.
Undertake gain-of-function research with the aim of reducing biorisks.
Leak national security secrets.
Introduce invasive species.
Tell people who haven’t watched a mystery movie yet how it ends.
In all these cases, if all the relevant actors decided whether to act based solely on their own judgments of an initiative’s value, we should worry that outcomes are being left up to those who have overestimated the upside of an initiative or underestimated its downside.
3. Could the unilateralist’s curse afflict digital minds advocacy?
To address this question, let’s consider whether digital minds advocacy might meet each of the conditions for the curse. While doing so, we should bear in mind that the outlined conditions don’t need to hold perfectly or across the board in order for the curse to apply.
3.1 Shared Goals
It seems safe to assume there will be substantial overlap in many digital minds advocates’ goals. For instance, many advocates will presumably aim to lower the probability of outcomes in which digital minds suffer on a large scale.
3.2 Actors with Unilateral Powers
Digital minds advocacy organizations are in a position to take various unilateral actions in order to affect outcomes for digital minds. A particular organization might try to:
start an AI rights movement,
install digital minds as a mainstay in political discourse,
persuade governments to pass legislation on digital minds,
initiate adversarial relationships with actors who manifest indifference or worse toward digital minds.
If such actions were undertaken—or if they were undertaken too early—they might trigger an immune response to digital minds advocacy that dims the prospects for further efforts to improve outcomes for digital minds. As in other domains, kicking digital minds initiatives off on the wrong foot could have lasting negative consequences.
I include the four above examples in particular because they strike me as important candidates for actions that could be counterproductive for digital minds advocacy in a manner that is mediated by the unilateralist’s curse.
3.3 Impact
Digital minds advocates will presumably be acting under the assumption that their actions matter to digital minds outcomes. The examples offered above suggest that some important decisions digital minds advocates face will be ones in which they can act unilaterally. So, in line with the Impact condition for the curse, digital minds advocates may well face decisions about whether to unilaterally undertake an action that significantly impacts shared goals.
3.4 Fallibility
Weighty decisions concerning digital minds will be entangled in a web of philosophical and empirical uncertainties and complexities. Fallibility is a facet of the terrain that digital minds advocates must learn to live with, not one they can hope to avoid.
3.5 Independent Decisions
There isn’t an authority that reigns supreme over digital minds advocates. Nor is there likely to be in the next decade. The default state of affairs is one in which many sub-groups of digital minds advocates will operate autonomously from one another.
Admittedly, digital minds work is currently somewhat coordinated. But current coordination levels seem likely to decrease as the field grows, simply because coordination becomes harder as the number of actors increases. In any case, perfect independence of decisions isn’t required for the cursed dynamic to unfold. Ample independence seems likely to obtain.
But there is an important further element of the Independent Decisions condition, namely that actors decide based on their judgment of the value of the initiative. Some ways of countering the curse provide alternative bases for evaluation and decision, often ones that take into account not just the value of the initiative but also the looming threat of the curse.
Unfortunately, there’s reason to think that digital minds advocates won’t by default opt for a decision or evaluation procedure that counters the curse. For there’s empirical evidence that individuals with a wide range of backgrounds—from policy and AI research to medicine and law—reason in ways that lend themselves to the unilateralist’s curse in situations in which the curse can arise.
Evidently, there is no necessary moat between digital minds advocacy and the conditions for the unilateralist’s curse. So, digital minds advocacy is vulnerable to the curse.
4. Confronting the curse
Before turning to remedies, it will help to notice a few things about the curse that bring it into sharper focus.
First, within cases, enabling unilateral actions for all parties and requiring consensus are two sides of the same coin. That coin isn’t always cursed. The curse arises only when the unilateralist side biases the coin toward sub-optimal outcomes. Opportunities to act unilaterally can be net beneficial when the downsides are relatively minor and easily reversible or when the costs of acting unilaterally are borne by the actor while the benefits accrue to all.
Second, for a given option, allowing unilateral action and requiring consensus are different ends of a spectrum. Viewed in light of this spectrum, the unilateralist’s curse is a limit case of a wider class of spells that can befall actors that are aligned but fallible.
For instance, imagine the following case:
Each of many digital minds advocates faces a choice about whether to try to start an AI rights movement now. No one digital minds advocate is in a position to start such a movement on their own. But any two advocates are, if they try.
Initiating such a movement now would in fact be sub-optimal relative to the goals of digital minds advocates. Digital minds advocates decide what to do based on their private judgments of the value of starting an AI rights movement.
While most digital minds advocates recognize that starting such a movement now would be bad by the lights of their shared goals, two advocates make the contrary judgment and start the movement.
Here too, something has gone wrong. All of the actors share a common goal. Almost all of them recognize a certain action would hamper the achievement of that goal. And yet a small number of actors jointly undertake that action.
I draw attention to this generalization of the curse because, as the above example illustrates, one can escape the unilateralist’s curse while falling prey to its generalization. Fortunately, the generalized curse seems susceptible to some remedies to the unilateralist’s curse, including ones I’ll discuss below. But for ease of discussion, I’ll mostly continue to focus on the unilateralist’s curse, leaving extensions to its generalization as an exercise.
Can we generalize still further?
Yes, the unilateralist’s curse can be seen as a special case of generating negative externalities. For the unilateralist imposes costs on others to which they do not consent.
But this generalization takes us into a problem space where remedies to the unilateralist’s curse do not robustly follow. Negative externalities can arise in the absence of shared goals. Shared goals are important to the unilateralist’s curse (as well as the generalized curse): these goals at once make the damage inflicted by the curses tragic and unlock counter curses.
Third, self-favoring biases exacerbate the curse.
It’s a familiar point that individuals tend to be biased toward ideas that they came up with themselves. Such favoritism can stem from psychological ownership, confirmation bias, or the very epistemic blind spots that enabled the individual to come up with the idea. And the bias can feed into overly rosy judgments about the value of initiatives, leading individuals to generate and unilaterally enact flawed ideas whose flaws they do not fully appreciate.
Fourth, the unilateralist’s curse can arise when the value of an initiative—such as bringing the topic of digital minds to mainstream politics—is independent of who undertakes the initiative.2 However, in realistic scenarios, the value of initiatives in digital minds advocacy likely will depend on who undertakes them.
That’s because, in realistic scenarios: (a) actors are likely to vary in their reliability, resulting in more-error-prone actors taking undesirable unilateralist actions more often than less-error-prone actors and (b) the higher an actor’s error rate, the more errors they will make in executing an initiative, thereby lowering the initiative’s (expected) value.
So, whereas self-favoring bias makes the curse more likely to inflict damage, variation in reliability increases the amount of damage the curse is disposed to inflict.
Fifth, in realistic scenarios, there will likely be variation in risk tolerance among those in a position to undertake initiatives in digital minds advocacy. In cases that are subject to the curse, we should expect unilateralist action to be undertaken more often by less risk-averse actors (or more risk-seeking actors). Insofar as initiatives will tend to be less well executed by individuals toward the tail of the risk-seeking end of the distribution, variation in risk tolerance among digital minds advocates will also increase the amount of damage the curse is disposed to inflict.
Sixth, the risk that the curse will lead to undesirable unilateral actions tends to increase with the number of actors who are in a position to undertake such actions. Given that the number of digital minds advocates will grow, there is reason to expect risks the curse poses to digital minds advocacy to grow as well.
5. Counter Curses
I’ll now outline some counters to the curse that digital minds advocates can use when they find themselves in unilateralist situations, i.e. situations in which an initiative can be undertaken unilaterally by any group member regardless of others’ actions.3
0. Situational awareness
A precondition for various counters to the unilateralist’s curse is awareness that one is in a cursed situation. So, a first step toward counteracting the unilateralist’s curse is to notice when one is in a unilateralist situation.
0.5 Err toward conformity and away from unilateral mistakes.
In the paper in which they introduce the unilateralist’s curse, the authors (Bostrom, Douglas, and Sandberg) put forward:
The Principle of Conformity: When acting out of concern for the common good in a unilateralist situation, reduce your likelihood of unilaterally undertaking or spoiling the initiative to a level that ex ante would be expected to lift the curse.
As they note, there are different approaches to complying with this principle. The principle can be satisfied in multiple ways because it doesn’t specify how to reduce your probability of unilateral action to a level that would be expected to lift the curse.
Even so, since which counters to the curse are most desirable will vary, the Principle of Conformity serves as a useful pointer to the kind of options one should consider when trying to counter the curse.
For example, in situations in which the curse would manifest through premature action, the principle suggests erring toward undertaking the initiative later than you judge optimal, as erring in that direction in such cases would tend to bring outliers’ decisions closer to the wisdom of the crowd.
Counter 1. Check with others
Just as sub-optimal outcomes become more likely in cursed situations as the number of actors increases, so too do such outcomes become more likely with higher error rates among actors. A good way to reduce the error rate and so to mitigate the curse is to check with others.
To illustrate, imagine the following variant of the example at the beginning of this post:
One digital minds advocacy organization judges politicizing the topic of digital minds to be positive. The organization doesn’t immediately act on that judgment. Instead, this organization asks the others what they think about the value of politicizing the topic of digital minds. Upon learning that their peer organizations all expect politicizing digital minds to be net negative and why, the organization lowers its estimate of the value of that action and decides not to take it. The curse is thus avoided.
Counter 2. Perform basic checks before pursuing initiatives
Proactively working to reduce errors in one’s evaluation can also help to counter the curse. Working to reduce evaluation errors is an especially important counter for digital minds advocates who are (like humans generally) vulnerable to self-favoring bias in their evaluations. For, as we’ve seen, self-favoring bias exacerbates the curse.
How can one work to reduce errors when evaluating an initiative? For starters, one can perform basic checks on whether an idea should be pursued before deciding to enact it.
For example, one can ask—oneself, aligned actors, or a non-sycophantic LLM:
How confident should I be that others haven’t already come up with this idea?
If they haven’t, why not?
If they have, why hasn’t anyone acted on the idea in a manner that’s come to my attention?
These questions are worth asking because—prior to trying to answer them—it should be a live hypothesis that a candidate initiative has not been pursued by other digital minds advocates because pursuing it would be harmful to the cause.
So, investigating these questions may serve to uncover initiative downsides that one might have otherwise overlooked.
Counter 3. Red team initiatives
Another way to guard against the curse through error reduction is red teaming: actively scrutinize positive evaluations of initiatives for flaws before undertaking initiatives on the basis of those evaluations.
This could involve double-checking the inputs to the evaluations, searching for important considerations that were omitted from the evaluations, checking the reasoning in the evaluations, sanity checking the results using a different evaluation method, and performing a pre-mortem in which one reflects on how the initiative could fail.
To illustrate how such red teaming could help, let’s return to the example from the beginning of this post, in which 10 digital minds organizations each have a 10% error rate in tricky cases of evaluating interventions. Suppose that, by default, none of these organizations has their evaluations red teamed. And suppose that internal red teaming would reduce their error rate by 3 percentage points, while external red teaming (e.g. from other digital minds organizations) would lower the error rate by a further 4 percentage points.
In this version of the case, if each digital minds organization opts to internally and externally red team their evaluations, then their error rates would drop from 10% to 3%. This would raise the probability of escaping the curse from ~35% to ~74%.
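These figures can be verified directly. A minimal sketch, assuming (as in the example) independent judgments and a shared error rate across the ten organizations:

```python
def p_escape(n_orgs: int, p_error: float) -> float:
    """Probability that no organization misjudges the tricky case,
    i.e. that the curse is escaped."""
    return (1 - p_error) ** n_orgs

baseline = p_escape(10, 0.10)    # no red teaming
red_teamed = p_escape(10, 0.03)  # internal + external red teaming
print(f"{baseline:.2f} -> {red_teamed:.2f}")  # 0.35 -> 0.74
```

A seven-point drop in each organization's error rate more than doubles the group's chance of escaping the curse, because the per-organization improvements compound across all ten.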
Counter 4. Decide with others
Deciding with others whether to undertake an initiative offers several safeguards against the curse.
One is that it provides an opportunity to pool information that is relevant to whether an initiative should be undertaken, thereby making it less likely that the initiative will be mistakenly undertaken because the initiating actor lacks a crucial piece of information that reveals the initiative to be net negative.
Similarly, deciding with others provides an opportunity to reduce error rates by having different actors double check each others’ reasoning for and against an initiative.
Even setting aside information sharing and double checking, deciding with others through a voting procedure could lift the curse. That’s because voting procedures can ensure that decisions about whether to undertake an initiative reflect a moderate view of its value rather than the most optimistic view.
Deciding with others can also provide opportunities to lift the curse through bargaining. Bargaining holds promise as a means to lift the curse for both individual decisions and sets of decisions. For individual decisions, would-be unilateralists may be brought into the fold by sweetening alternatives to their preferred initiatives. For sets of decisions, bargaining can lift the curse by enabling would-be unilateralists to trade away their unilateralist option in exchange for (by their lights) improved outcomes in other situations.
To illustrate, suppose:
One digital minds advocacy organization favors starting an AI rights movement now. Another digital minds advocacy organization instead favors launching a lobbying campaign targeting Congress. The pro-movement organization and the pro-lobbying organization each think that the other’s preferred initiative would interfere with their own. Moreover, both agree that a patient technocratic approach is preferable to simultaneously starting an AI rights movement and a lobbying campaign.
In this case, there is a bargain to be struck: the pro-movement organization and the pro-lobbying organization can make a deal to cooperate in pursuit of their second most preferred outcome so as to avoid their least preferred outcome.
Counter 5. Adopt more-cautious standards in unilateralist situations
The counters outlined above largely rely on the availability of other actors. But there may be occasions when digital minds advocates must make a consequential decision with little or no consultation with others.
In such cases, the risk that one will unwittingly serve as a conduit for the curse can be reduced by adopting more-cautious standards for action.
Specifically, upon finding oneself in a unilateralist situation but unable to consult with others, one could take measures against the curse by:
setting a higher than usual bar for how confident one needs to be that an initiative is positive in order to undertake it,
setting a higher than usual bar for how valuable one needs to estimate an initiative to be in order to undertake it, and
being more risk averse than usual (e.g. with respect to what types or levels of risks one takes as a sufficient reason to refrain from undertaking an initiative).
A limitation of these suggestions is that they don’t provide much guidance as to where to set these thresholds. The next measure improves on this score while also being available in cases where one cannot consult with others.
Counter 6. Condition on making a difference
The final counter is that of conditional reasoning. On this approach, one evaluates the initiative conditional on one’s decision making a difference to whether the initiative is undertaken.
The expected value of an initiative will often be much lower conditional on one’s having unilaterally triggered it, since others’ opting not to undertake the initiative in those cases provides strong evidence that one is overly optimistic that its value is positive. In cases with variation in actor reliability, one’s undertaking the initiative unilaterally is also evidence that one is a more-error-prone actor who will execute the initiative in a sub-optimal manner.
By driving down the expected value estimates, conditional reasoning makes it less likely that one will mistakenly judge the value of an initiative to be positive and proceed to undertake it when it is in fact negative. That’s how conditional reasoning counters the curse.
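The size of this effect can be seen in a small simulation. This is my own sketch with assumed parameters (a standard-normal true value and standard-normal judgment noise for five actors), not a model from the source: it compares the expected value of an initiative given only that your own estimate is positive with the expected value given that you are the lone optimist.

```python
import random

random.seed(0)

N_ACTORS = 5
TRIALS = 200_000

naive, pivotal = [], []
for _ in range(TRIALS):
    value = random.gauss(0, 1)  # true value of the initiative
    estimates = [value + random.gauss(0, 1) for _ in range(N_ACTORS)]
    mine, others = estimates[0], estimates[1:]
    if mine > 0:
        naive.append(value)            # I judge the initiative positive...
        if all(e <= 0 for e in others):
            pivotal.append(value)      # ...and I'm the lone optimist

mean = lambda xs: sum(xs) / len(xs)
print(f"E[value | my estimate > 0]       = {mean(naive):+.2f}")
print(f"E[value | I'm the lone optimist] = {mean(pivotal):+.2f}")
```

Under these assumptions, the naive conditional expectation is comfortably positive, while conditioning on being the lone optimist pushes the expectation below zero: being the only actor who judges the initiative worthwhile is itself evidence that the initiative is bad.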
Conditional reasoning also has intuitive appeal independently of the curse. Intuitively, what happens in cases in which your action makes no difference isn’t decision relevant; so it makes sense to focus on those cases where it does make a difference.4
Another attractive feature of conditional reasoning is that it can be naturally applied when timing of the initiative is of central concern, as one might take to be the case for digital minds initiatives such as starting an AI rights movement, installing digital minds as a mainstay in political discourse, or persuading governments to pass legislation on digital minds.
In all of these cases, single-handedly undertaking the initiative earlier is evidence that one undertook it too early. When conditioning on one’s triggering the initiative at various times, the earlier times will tend to take bigger hits in expected value. Why? Because the earlier you trigger an initiative, the more plausible it is that you undertook it both much earlier than others would have undertaken it and detrimentally early. By the same token, the expected value of later times—times closer to when one expects other digital minds advocates to undertake the initiative—will tend to take smaller hits from conditioning on your action making a difference.
In this fashion, conditional reasoning provides protection against premature unilateralist action by shifting the expected value distribution of would-be unilateralists in favor of later action.
6. Conclusion
To recap, digital minds advocacy appears vulnerable to the unilateralist’s curse, which could lead to consequential mistakes in digital minds advocacy. On the bright side, when digital minds advocates find themselves in unilateralist situations, they can avail themselves of the following counters to the curse:
Notice when you’re in situations threatened by the curse.
Look for ways to lift the curse by erring toward conformity and away from unilateralist mistakes.
Consult with others about whether an initiative is a good idea before acting.
Perform basic checks on whether undertaking an initiative is a good idea.
Red team initiatives before undertaking them.
Decide with others whether to undertake initiatives.
Adopt more-cautious standards in unilateralist situations.
Judge prospective actions conditional on your taking them making a difference.5
This characterization is adapted from “The Unilateralist’s Curse and the Case for a Principle of Conformity”, which describes the unilateralist’s curse as follows.
each of a number of agents is in a position to undertake an initiative, X. Suppose that each agent decides whether or not to undertake X on the basis of her own independent judgment of the value of X, where the value of X is assumed to be independent of who undertakes X, and is supposed to be determined by the contribution of X to the common good. Each agent’s judgment is subject to error—some agents might overestimate the value of X, others might underestimate it. If the true value of X is negative, then the larger the number of agents, the greater the chances that at least one agent will overestimate X sufficiently to make the value of X seem positive. Thus, if agents act unilaterally, the initiative is too likely to be undertaken, and if such scenarios repeat, an excessively large number of initiatives are likely to be undertaken. We shall call this phenomenon the unilateralist’s curse.
As formulated in the original paper, this is one of the conditions of the curse.
See also the section “Dealing with the Unilateralist's Curse, practical advice” of Aaron Scher’s “The Unilateralist's Curse, An Explanation” and section 3 of the original paper on the unilateralist’s curse.
I think this intuition is broadly correct and that conditioning on those cases where one’s action makes a difference is a useful counter to the curse. However, I do not say that this type of conditional reasoning is correct without qualification. Indeed, I think it may need to be qualified to handle some cases in which one’s action belongs to a set of actions that make a difference jointly but not individually and there are evidential correlations between what act you take and what acts others take.
For helpful discussion, I thank Michael Aird and Joshua Lewis. For helpful discussion and copy editing support, I thank Claude Sonnet 4. The picture was created by the Copilot image generator.

