What might we infer about optimized futures?

It is plausible to assume that technology will keep on advancing along various dimensions until it hits fundamental physical limits. We may refer to futures that involve such maxed-out technological development as “optimized futures”.

My aim in this post is to explore what we might be able to infer about optimized futures. Most of all, my aim is to advance this as an important question that is worth exploring further.

Contents

Optimized futures: End-state technologies in key domains

The defining feature of optimized futures is that they entail end-state technologies that cannot be further improved in various key domains. Some examples of these domains include computing power, data storage, speed of travel, maneuverability, materials technology, precision manufacturing, and so on.

Of course, there may be significant tradeoffs between optimization across these respective domains. Likewise, there could be forms of “ultimate optimization” that are only feasible at an impractical cost — say, at extreme energy levels. Yet these complications are not crucial in this context. What I mean by “optimized futures” are futures that involve practically optimal technologies within key domains (such as those listed above).

Why optimized futures are plausible

There are both theoretical and empirical reasons to think that optimized futures are plausible (by which I here mean that they are at least somewhat probable — perhaps more than 10 percent likely).

Theoretically, if the future contains advanced goal-driven agents, we should generally expect those agents to want to achieve their goals in the most efficient ways possible. This in turn predicts continual progress toward ever more efficient technologies, at least as long as such progress is cost-effective.

Empirically, we have an extensive record of goal-oriented agents trying to improve their technology so as to better achieve their aims. Humanity has gone from having virtually no technology to creating a modern society surrounded by advanced technologies of various kinds. And even in our modern age of advanced technology, we still observe persistent incentives and trends toward further improvements in many domains of technology — toward better computers, robots, energy technology, and so on.

It is worth noting that the technological progress we have observed throughout human history has generally not been the product of some overarching collective plan that was deliberately aimed at technological progress. Instead, technological progress has in some sense been more robust than that, since even in the absence of any overarching plan, progress has happened as the result of ordinary demands and desires — for faster computers, faster and safer transportation, cheaper energy, etc.

This robustness is a further reason to think that optimized futures are plausible: even without any overarching plan aimed toward such a future, and even without any individual human necessarily wanting continued technological development leading to an optimized future, we might still be pulled in that direction all the same. And, of course, this point about plausibility applies to more than just humans: it applies to any set of agents who will be — or have been — structuring themselves in a sufficiently similar way so as to allow their everyday demands to push them toward continued technological development.

An objection against the plausibility of optimized futures is that there might be a lot of hidden potential for progress far beyond what our current understanding of physics seems to allow. However, such hidden potential would presumably be discovered eventually, and it seems probable that such hidden potential would likewise be exhausted at some point, even if it may happen later and at more extreme limits than we currently envision. That is, the broad claim that there will ultimately be some fundamental limits to technological development is not predicated on the more narrow claim that our current understanding of those limits is necessarily correct; the broader claim is robust to quite substantial extensions of currently envisioned limits. Indeed, the claim that there will be no fundamental limits to future technological development overall seems a stronger and less empirically grounded claim than does the claim that there will be such limits (cf. Lloyd, 2000; Krauss & Starkman, 2004).

Why optimized futures are worth exploring

The plausibility of optimized futures is one reason to explore them further, and arguably a sufficient reason in itself. Another reason is the scope of such futures: the futures that contain the largest numbers of sentient beings will most likely be optimized futures, suggesting that we have good reason to pay disproportionate attention to such futures, beyond what their degree of plausibility might suggest.

Optimized futures are also worth exploring given that they seem to be a likely point of convergence for many different kinds of technological civilizations. For example, an optimized future seems a plausible outcome of both human-controlled and AI-controlled Earth-originating civilizations, and it likewise seems a plausible outcome of advanced alien civilizations. Thus, a better understanding of optimized futures can potentially apply robustly to many different kinds of future scenarios.

An additional reason it is worth exploring optimized futures is that they overall seem quite neglected, especially given how plausible and consequential such futures appear to be. While some efforts have been made to clarify the physical limits of technology (see e.g. Sandberg, 1999; Lloyd, 2000; Krauss & Starkman, 2004), almost no work has been done on the likely trajectories and motives of civilizations with optimized technology, at least to my knowledge.

Lastly, the assumption of optimized technology is a rather strong constraint that might enable us to say quite a lot about futures that conform to that assumption, suggesting that this could be a fruitful perspective to adopt in our attempts to think about and predict the future.

What can we say about optimized futures?

The question of what we can say about optimized futures is a big one that deserves elaborate analysis. In this section, I will merely raise some preliminary points and speculative reflections.

Humanity may be close to (at least some) end-state technologies

One point that is worth highlighting is that a continuation of current rates of progress seems to imply that humanity could develop end-state technologies in information processing power within a few hundred years, perhaps 250 years at most (if current growth rates persist and assuming that our current understanding of the relevant physics is largely correct).

So at least in this important respect, and under the assumption of continued steady growth, humanity is surprisingly close to reaching an optimized future (cf. Lloyd, 2000).

Optimized civilizations may be highly interested in near-optimized civilizations

Such potential closeness to an optimized future could have significant implications in various ways. For example, if, hypothetically, there exists an older civilization that has already reached a state of optimized technology, any younger civilization that begins to approach optimized technologies within the same cosmic region would likely be of great interest to that older civilization.

One reason it might be of interest is that the optimized technologies of the younger civilization could potentially become competitive with the optimized technologies of the older civilization, and hence the older civilization may see a looming threat in the younger civilization’s advance toward such technologies. After all, since optimized technologies would represent a kind of upper bound of technological development, it is plausible that different instances of such technologies could be competitive with each other regardless of their origins.

Another reason the younger civilization might be of interest is that its trajectory could provide valuable information regarding the likely trajectories and goals of distant optimized civilizations that the older civilization may encounter in the future. (More on this point here.)

Taken together, these considerations suggest that if a given civilization is approaching optimized technology, and if there is an older civilization with optimized technology in its vicinity, this older civilization should take an increasing interest in this younger civilization so as to learn about it before the older civilization might have to permanently halt the development of the younger one.

Strong technological convergence across civilizations?

Another implication of optimized futures is that the technology of advanced civilizations across the universe might be remarkably convergent. Indeed, there are already many examples of convergent evolution in biology on Earth (e.g. eyes and large brains evolving several times independently). Likewise, many cases of convergence are found in cultural evolution in both early history (e.g. the independent emergence of farming, cities, and writing across the globe) as well as in recent history (e.g. independent discoveries in science and mathematics).

Yet the degree of convergence could well be even more pronounced in the case of the end-state technologies of advanced civilizations. After all, this is a case where highly advanced agents are bumping up against the same fundamental constraints, and the optimal engineering solutions in the face of these constraints will likely converge toward the same relatively narrow space of optimal designs — or at least toward the same narrow frontier of optimal designs given potential tradeoffs between different abilities.

In other words, the technologies of advanced civilizations might be far more similar and more firmly dictated by fundamental physical limits than we intuitively expect, especially given that we in our current world are used to seeing continually changing and improving technologies.

If technology stabilizes at an optimum, what might change?

The plausible convergence and stabilization of technological hardware also raises the interesting question of what, if anything, might change and vary in optimized futures.

This question can be understood in at least two distinct ways: what might change or vary across different optimized civilizations, and what might change over time within such civilizations? And note that prevalent change of the one kind need not imply prevalent change of the other kind. For example, it is conceivable that there might be great variation across civilizations, yet virtually no change in goals and values over time within civilizations (cf. “lock-in scenarios”).

Conversely, it is conceivable that goals and values change greatly over time within all optimized civilizations, yet such change could in principle still be convergent across civilizations, such that optimized civilizations tend to undergo roughly the same pattern of changes over time (though such convergence admittedly seems unlikely conditional on there being great changes over time in all optimized civilizations).

If we assume that technological hardware becomes roughly fixed, what might still change and vary — both over time and across different civilizations — includes the following (I am not claiming that this is an exhaustive list):

Space expansion: Civilizations might expand into space so as to acquire more resources; and civilizations may differ greatly in terms of how much space they manage to acquire.
More or different information: Knowledge may improve or differ over time and space; even if fundamental physics gets solved fairly quickly, there could still be knowledge to gain about, for example, how other civilizations tend to develop.
- There would presumably also be optimization for information that is useful and actionable. After all, even a technologically optimized probe would still have limited memory, and hence there would be a need to fill this memory with the most relevant information given its tasks and storage capacity.
Different algorithms: The way in which information is structured, distributed, and processed might evolve and vary over time and across civilizations (though it is also conceivable that algorithms will ultimately converge toward a relatively narrow space of optima).
Different goals and values: As mentioned above, goals and values might change and vary, such as due to internal or external competition, or (perhaps less likely) through processes of reflection.

In other words, even if everyone has — or is — practically the same “iPhone End-State”, what is running on these iPhone End-States, and how many of them there are, may still vary greatly, both across civilizations and over time. And these distinct dimensions of variation could well become the main focus of optimized civilizations, plausibly becoming the main dimensions on which civilizations seek to develop and compete.

Note also that there may be conflicts between improvements along these respective dimensions. For example, perhaps the most aggressive forms of space expansion could undermine the goal of gaining useful information about how other civilizations tend to develop, and hence advanced civilizations might avoid or delay aggressive expansion if the information in question would be sufficiently valuable (cf. the “info gain motive”). Or perhaps aggressive expansion would pose serious risks at the level of a civilization’s internal coordination and control, thereby risking a drift in goals and values.

In general, it seems worth trying to understand what might be the most coveted resources and the most prioritized domains of development for civilizations with optimized technology.

Information that says something about other optimized civilizations as an extremely coveted resource?

As hinted above, one of the key objectives of a civilization with optimized technology might be to learn, directly or indirectly, about other civilizations that it could encounter in the future. After all, if a civilization manages to both gain control of optimized technology and avoid destructive internal conflicts, the greatest threat to its apex status over time will likely be other civilizations with optimized technology. More generally, the main determinant of an optimized civilization’s success in achieving its goals — whether it can maintain an unrivaled apex status or not — could well be its ability to predict and interact gainfully with other optimized civilizations.

Thus, the most precious resource for any civilization with optimized technology might be information that can prepare this civilization for better exchanges with other optimized agents, whether those exchanges end up being cooperative, competitive, or outright aggressive. In particular, since the technology of optimized civilizations is likely to be highly convergent, the most interesting features to understand about other civilizations might be what kinds of institutions, values, decision procedures, and so on they end up adopting — the kinds of features that seem more contingent.

But again, I should stress that I mention these possibilities as speculative conjectures that seem worth exploring, not as confident predictions.

Practical implications?

In this section, I will briefly speculate on the implications of the prospect of optimized futures. Specifically, what might this prospect imply in terms of how we can best influence the future?

Prioritizing values and institutions rather than pushing for technological progress?

One implication is that there may be limited long-term payoffs in pushing for better technology per se, and that it might make more sense to prioritize the improvement of other factors, such as values and institutions. That is, if the future is in any case likely to be headed toward some technological optimum, and if the values and institutions (etc.) that will run this optimal technology are more contingent and “up for grabs”, then it arguably makes sense to prioritize those more contingent aspects.

To be clear, this is not to say that values and institutions will not also be subject to significant optimization pressures that push them in certain directions, but these pressures will plausibly still be weaker by comparison. After all, a wide range of values will imply a convergent incentive to create optimized technology, yet optimized technology seems compatible with a wide range of values and institutions. And it is not clear that there is a similarly strong pull toward some “optimized” set of values or institutions given optimized technology.

This perspective is arguably also supported by recent history. For example, we have seen technology improve greatly, with computing power heading in a clear upward direction over the past decades. Yet if we look at our values and institutions, it is much less clear whether they have moved in any particular direction over time, let alone an upward direction. Our values and institutions seem to have faced much less of a directional pressure compared to our technology.

More research

Perhaps one of the best things we can do to make better decisions with respect to optimized futures is to do research on such futures. The following are some broad questions that might be worth exploring:

What are the likely features and trajectories of optimized futures?
- Are optimized futures likely to involve conflicts between different optimized civilizations?
- Other things being equal, is a smaller or a larger number of optimized civilizations generally better for reducing risks of large-scale conflicts?
- More broadly, is a smaller or larger number of optimized civilizations better for reducing future suffering?
What might the likely features and trajectories of optimized futures imply in terms of how we can best influence the future?
Are there some values or cooperation mechanisms that would be particularly beneficial to instill in optimized technology?
- If so, what might they be, and how can we best work to ensure their (eventual) implementation?

Conclusion

The future might in some ways be more predictable than we imagine. I am not claiming to have drawn any clear or significant conclusions about how optimized futures are likely to unfold; I have mostly aired various conjectures. But I do think the question is valuable, and that it may provide a helpful lens for exploring how we can best impact the future.

Acknowledgments

Thanks to Tobias Baumann for helpful comments.

November 18, 2023

Reasons to doubt that suffering is ontologically prevalent

It is sometimes claimed that we cannot know whether suffering is ontologically prevalent — for example, we cannot rule out that suffering might exist in microorganisms such as bacteria, or even in the simplest physical processes. Relatedly, it has been argued that we cannot trust common-sense views and intuitions regarding the physical basis of suffering.

I agree with the spirit of these arguments, in that I think it is true that we cannot definitively rule out that suffering might exist in bacteria or fundamental physics, and I agree that we have good reasons to doubt common-sense intuitions about the nature of suffering. Nevertheless, I think discussions of expansive views of the ontological prevalence of suffering often present a somewhat unbalanced and, in my view, overly agnostic view of the physical basis of suffering. (By “expansive views”, I do not refer to views that hold that, say, insects are sentient, but rather views that hold that suffering exists in considerably simpler systems, such as in bacteria or fundamental physics.)

While we cannot definitively rule out that suffering might be ontologically prevalent, I do think that we have strong reasons to doubt it, as well as to doubt the practical importance of this possibility. My goal in this post is to present some of these reasons.

Contents

Counterexamples: People who do not experience pain or suffering

One argument against the notion that suffering is ontologically prevalent is that we seem to have counterexamples in people who do not experience pain or suffering. For example, various genetic conditions seemingly lead to a complete absence of pain and/or suffering. This, I submit, has significant implications for our views of the ontological prevalence (or non-prevalence) of suffering.

After all, the brains of these individuals include countless subatomic particles, basic biological processes, diverse instances of information processing, and so on, suggesting that none of these are in themselves sufficient to generate pain or suffering.

One might object that the brains of such people could be experiencing suffering — perhaps even intense suffering — that these people are just not able to consciously access. Yet even if we were to grant this claim, it does not change the basic argument that generic processes at the level of subatomic particles, basic biology, etc. do not seem sufficient to create suffering. For the processes that these people do consciously access presumably still entail at least some (indeed probably countless) subatomic particles, basic biological processes, electrochemical signals, different types of biological cells, diverse instances of information processing, and so on. This gives us reason to doubt all views that see suffering as an inherent or generic feature of processes at any of these (quite many) respective levels.

Of course, this argument is not limited to people who are congenitally unable to experience suffering; it applies to anyone who is just momentarily free from noticeable — let alone significant — pain or suffering. Any experiential moment that is free from significant suffering is meaningful evidence against highly expansive views of the ontological prevalence of significant suffering.

Our emerging understanding of pain and suffering

Another argument against expansive views of the prevalence of suffering is that our modern understanding of the biology of suffering gives us reason to doubt such views. That is, we have gained an increasingly refined understanding of the evolutionary, genetic, and neurobiological bases of pain and suffering, and the picture that emerges is that suffering is a complex phenomenon associated with specific genes and neural structures (as exemplified by the above-mentioned genetic conditions that knock out pain and/or suffering).

To be sure, the fact that suffering is associated with specific genes and neural structures in animals does not imply that suffering cannot be created in other ways in other systems. It does, however, suggest that suffering is unlikely to be found in simple systems that do not have remote analogues of these specific structures (since we otherwise should expect suffering to be associated with a much wider range structures and processes, not such an intricate and narrowly delineated set).

By analogy, consider the experience of wanting to go to a Taylor Swift concert so as to share the event with your Instagram followers. Do we have reason to believe that fundamental particles such as electrons, or microorganisms such as bacteria, might have such experiences? To go a step further, do we have reason to be agnostic as to whether electrons or bacteria might have such experiences?

These questions may seem too silly to merit contemplation. After all, we know that having a conscious desire to go to a concert for the purpose of online sharing requires rather advanced cognitive abilities that, at least in our case, are associated with extremely complex structures in the brain — not to mention that it requires an understanding of a larger cultural context that is far removed from the everyday concerns of electrons and bacteria. But the question is why we would see the case of suffering as being so different.

Of course, one might object that this is a bad analogy, since the experience described above is far more narrowly specified than is suffering as a general class of experience. I would agree that the experience described above is far more specific and unusual, but I still think the basic point of the analogy holds, in that our understanding is that suffering likewise rests on rather complex and specific structures (when it occurs in animal brains) — we might just not intuitively appreciate how complex and distinctive these structures are in the case of suffering, as opposed to in the Swift experience.

It seems inconsistent to allow ourselves to apply our deeper understanding of the Swift experience to strongly downgrade our credence in electron- or bacteria-level Swift experiences, while not allowing our deeper understanding of pain and suffering to strongly downgrade our credence in electron- or bacteria-level pain and suffering, even if the latter downgrade should be comparatively weaker (given the lower level of specificity of this broader class of experiences).

Practical relevance

It is worth stressing that, in the context of our priorities, the question is not whether we can rule out suffering in simple systems like electrons or bacteria. Rather, the question is whether the all-things-considered probability and weight of such hypothetical suffering is sufficiently large for it to merit any meaningful priority relative to other forms of suffering.

For example, one may hold a lexical view according to which no amount of putative “micro-discomfort” that we might ascribe to electrons or bacteria can ever be collectively worse than a single instance of extreme suffering. Likewise, even if one does not hold a strictly lexical view in theory, one might still hold that the probability of suffering in simple systems is so low that, relative to the expected prevalence of other kinds of suffering, it is so strongly dominated so as to merit practically no priority by comparison (cf. “Lexical priority to extreme suffering — in practice”).

After all, the risk of suffering in simple systems would not only have to be held up against the suffering of all currently existing animals on Earth, but also against the risk of worst-case outcomes that involve astronomical numbers of overtly tormented beings. In this broader perspective, it seems reasonable to believe that the risk of suffering in simple systems is massively dwarfed by the risk of such astronomical worst-case outcomes, partly because the latter risk seems considerably less speculative, and because it seems far more likely to involve the worst instances of suffering.

Relatedly, just as we should be open to considering the possibility of suffering in simple systems such as bacteria, it seems that we should also be open to the possibility that spending a lot of time contemplating this issue — and not least trying to raise concern for it — might be an enormous opportunity cost that will overall increase extreme suffering in the future (e.g. because it distracts people from more important issues, or because it pushes people toward dismissing suffering reducers as absurd or crazy).

To be clear, I am not saying that contemplating this issue in fact is such an opportunity cost. My point is simply that it is important not to treat highly speculative possibilities in a manner that is too one-sided, such that we make one speculative possibility disproportionately salient (e.g. there might be a lot of suffering in microorganisms or in fundamental physics), while neglecting to consider other speculative possibilities that may in some sense “balance out” the former (e.g. that prioritizing the risk of suffering in simple systems significantly increases extreme suffering).

In more general terms, it can be misleading to consider Pascallian wagers if we do not also consider their respective “counter-Pascallian” wagers. For example, what if believing in God actually overall increases the probability of you experiencing eternal suffering, such as by marginally increasing the probability that future people will create infinite universes that contain infinitely many versions of you that get tortured for life?

In this way, our view of Pascal’s wager may change drastically when we go beyond its original one-sided framing and consider a broader range of possibilities, and the same applies to Pascallian wagers relating to the purported suffering of simple entities like bacteria or electrons. When we consider a broader range of speculative hypotheses, it is hardly clear whether we should overall give more or less consideration to such simple entities than we currently do, at least when compared to how much consideration and priority we give to other forms of suffering.

November 10, 2023

Does digital or “traditional” sentience dominate in expectation?

My aim in this post is to critique two opposite positions that I think are both mistaken, or which at least tend to be endorsed with too much confidence.

The first position is that the vast majority of future sentient beings will, in expectation, be digital, meaning that they will be “implemented” in digital computers.

The second position is in some sense a rejection of the first one. Based on a skepticism of the possibility of digital sentience, this position holds that future sentience will not be artificial, but instead be “traditionally” biological — that is, most future sentient beings will, in expectation, be biological beings roughly as we know them today.

I think the main problem with this dichotomy of positions is that it leaves out a reasonable third option, which is that most future beings will be artificial but not necessarily digital.

Contents

Reasons to doubt that digital sentience dominates in expectation

One can roughly identify two classes of reasons to doubt that most future sentient beings will be digital.

First, there are object-level arguments against the possibility of digital sentience. For example, based on his physicalist view of consciousness, David Pearce argues that the discrete and disconnected bits of a digital computer cannot, if they remain discrete and disconnected, join together into a unified state of sentience. They can at most, Pearce argues, be “micro-experiential pixels”.

Second, regardless of whether one believes in the possibility of digital sentience, the future dominance of digital sentience can be doubted on the grounds that it is a fairly strong and specific claim. After all, even if digital sentience is perfectly possible, it by no means follows that future sentient beings will necessarily converge toward being digital.

In other words, the digital dominance position makes strong assumptions about the most prevalent forms of sentient computation in the future, and it seems that there is a fairly large space of possibilities that does not imply digital dominance, such as (a future predominance of) non-digital neuron-based computers, non-digital neuron-inspired computers, and various kinds of quantum computers that have yet to be invented.

When one takes these arguments into account, it at least seems quite uncertain whether digital sentience dominates in expectation, even if we grant that artificial sentience does.

Reasons to doubt that “traditional” biological sentience dominates in expectation

A reason to doubt that “traditional” sentience dominates is that, whatever one’s theory of sentience, it seems likely that sentience can be created artificially — i.e. in a way that we would deem artificial. (An example might be further developed and engineered versions of brain organoids.) Specifically, regardless of which physical processes or mechanisms we take to be critical to sentience, those processes or mechanisms can most likely be replicated in other systems than just live biological animals as we know them.

If we combine this premise with an assumption of continued technological evolution (which likely holds true in the future scenarios that contain the largest numbers of sentient beings), it overall seems doubtful that the majority of future beings will, in expectation, be “traditional” biological organisms — especially when we consider the prospect of large futures that involve space colonization.

More broadly, we have reason to doubt the “traditional” biological dominance position for the same reason that we have reason to doubt the digital dominance position, namely that the position entails a rather strong and specific claim along the lines that: “this particular class of sentient being is most numerous in expectation”. And, as in the case of digital dominance, it seems that there are many plausible ways in which this could turn out to be wrong, such as due to neuron-inspired or other yet-to-be-invented artificial systems that could become both sentient and prevalent.

Why does this matter?

Whether artificial sentience dominates in expectation plausibly matters for our priorities (though it is unclear how much exactly, since some of our most robust strategies for reducing suffering are probably worth pursuing in roughly the same form regardless). Yet those who take artificial sentience seriously might adopt suboptimal priorities and communication strategies if they primarily focus on digital sentience in particular.

At the level of priorities, they might restrict their focus to an overly narrow set of potentially sentient systems, and perhaps neglect the great majority of future suffering as a result. At the level of communication, they might needlessly hamper their efforts to raise concern for artificial sentience by mostly framing the issue in terms of digital sentience. This framing might lead people who are skeptical of digital sentience to mistakenly dismiss the broader issue of artificial sentience.

Similar points apply to those who believe that “traditional” biological sentience dominates in expectation: they, too, might restrict their focus to an overly narrow set of systems, and thereby neglect to consider a wide range of scenarios that may intuitively seem like science fiction, yet which nevertheless deserve serious consideration on reflection (e.g. scenarios that involve a large-scale spread of suffering due to space colonization).

In summary, there are reasons to doubt both the digital dominance position and the “traditional” biological dominance position. Moreover, it seems that there is something to be gained by not using the narrow term “digital sentience” to refer to the broader category of “artificial sentience”, and by being clear about just how much broader this latter category is.

November 8, 2023

What does a future dominated by AI imply?

Among altruists working to reduce risks of bad outcomes due to AI, I sometimes get the impression that there is a rather quick step from the premise “the future will be dominated by AI” to a practical position that roughly holds that “technical AI safety research aimed at reducing risks associated with fast takeoff scenarios is the best way to prevent bad AI outcomes”.

I am not saying that this is the most common view among those who work to prevent bad outcomes due to AI. Nor am I saying that the practical position outlined above is necessarily an unreasonable one. But I think I have seen (something like) this sentiment assumed often enough for it to be worthy of a critique. My aim in this post is to argue that there are many other practical positions that one could reasonably adopt based on that same starting premise.

Contents

“A future dominated by AI” can mean many things

“AI” can mean many things

It is worth noting that the premise that “the future will be dominated by AI” covers a wide range of scenarios. After all, it covers scenarios in which advanced machine learning software is in power; scenarios in which brain emulations are in power; as well as scenarios in which humans stay in power while gradually updating their brains with gene technologies, brain implants, nanobots, etc., such that their intelligence would eventually be considered (mostly) artificial intelligence by our standards. And there are surely more categories of AI than just the three broad ones outlined above.

“Dominated by” can mean many things

The words “in power” and “dominated by” can likewise mean many different things. For example, they could mean anything from “mostly in power” and “mostly dominated by” to “absolutely in power” and “absolutely dominated by”. And these respective terms cover a surprisingly wide spectrum.

After all, a government in a democratic society could reasonably be claimed to be “mostly in power” in that society, and a future AI system that is given similar levels of power could likewise be said to be “mostly in power” in the society it governs. By contrast, even the government of North Korea falls considerably short of being “absolutely in power” on a strong definition of that term, which hints at the wide spectrum of meanings covered by the general term “in power”.

Note that the contrast above actually hints at two distinct (though related) dimensions on which different meanings of “in power” can vary. One has to do with the level of power — i.e. whether one has more or less of it — while the other has to do with how the power is exercised, e.g. whether it is democratic or totalitarian in nature.

Thus, “a future society with AI in power” could mean a future in which AI possesses most of the power in a democratically elected government, or it could mean a future in which AI possesses total power with no bounds except the limits of physics.

Combinations of many things

Lastly, we can make a combinatorial extension of the points made above. That is, we should be aware that “a future dominated by AI” could — and is perhaps likely to — combine different kinds of AI. For instance, one could imagine futures that contain significant numbers of AIs from each of the three broad categories of AI mentioned above.

Additionally, these AIs could exercise power in distinct ways and in varying degrees across different parts of the world. For example, some parts of the world might make decisions in ways that resemble modern democratic processes, with power distributed among many actors, while other parts of the world might make decisions in ways that resemble autocratic decision procedures.

Such a diversity of power structures and decision procedures may be especially likely in scenarios that involve large-scale space expansion, since different parts of the world would then eventually be causally disconnected, and since a larger volume of AI systems presumably renders greater variation more likely in general.

These points hint at the truly vast space of possible futures covered by a term such as “a future dominated by AI”.

Future AI dominance does not imply fast AI development

Another conceptual point is that “a future dominated by AI” does not imply that technological or social progress toward such a future will happen soon or that it will occur suddenly. Furthermore, I think one could reasonably argue that such an imminent or sudden change is quite unlikely (though it obviously becomes more likely the broader our conception of “a future dominated by AI” is).

An elaborate justification for my low credence in such sudden change is beyond the scope of this post, though I can at least note that part of the reason for my skepticism is that I think trends and projections in both computer hardware and economic growth speak against such rapid future change. (For more reasons to be skeptical, see Reflections on Intelligence and “A Contra AI FOOM Reading List”.)

A future dominated by AI could emerge through a very gradual process that occurs over many decades or even hundreds of years (conditional on it ever happening). And AI scenarios involving such gradual development could well be both highly likely and highly consequential.

An objection against focusing on such slow-growth scenarios might be that scenarios involving rapid change have higher stakes, and hence they are more worth prioritizing. But it is not clear to me why this should be the case. As I have noted elsewhere, a so-called value lock-in could also happen in a slow-growth scenario, and the probability of success — and of avoiding accidental harm — may well be higher in slow-growth scenarios (cf. “Which World Gets Saved”).

The upshot could thus be the very opposite, namely that it is ultimately more promising to focus on scenarios with relatively steady growth in AI capabilities and power. (I am not claiming that this focus is in fact more promising; my point is simply that it is not obvious and that there are good reasons to question a strong focus on fast-growth scenarios.)

Fast AI development does not imply concentrated AI development

Likewise, even if we grant that the pace of AI development will increase rapidly, it does not follow that this growth will be concentrated in a single (or a few) AI system(s), as opposed to being widely distributed, akin to an entire economy of machines that grow fast together. This issue of centralized versus distributed growth was in fact the main point of contention in the Hanson-Yudkowsky FOOM debate; and I agree with Hanson that distributed growth is considerably more likely.

Similar to the argument outlined in the previous section, one could argue that there is a wager to focus on scenarios that entail highly concentrated growth over those that involve highly distributed growth, even if the latter may be more likely. Perhaps the main argument in favor of this view is that it seems that our impact can be much greater if we manage to influence a single system that will eventually gain power compared to if our influence is dispersed across countless systems.

Yet I think there are good reasons to doubt that argument. One reason is that the strategy of influencing such a single AI system may require us to identify that system in advance, which might be a difficult bet that we could easily get wrong. In other words, our expected influence may be greatly reduced by the risk that we are wrong about which systems are most likely to gain power. Moreover, there might be similar and ultimately more promising levers for “concentrated influence” in scenarios that involve more distributed growth and power. Such levers may include formal institutions and societal values, both of which could exert a significant influence on the decisions of a large number of agents simultaneously — by affecting the norms, laws, and social equilibria under which they interact.

“A future dominated by AI” does not mean that either “technical AI safety” or “AI governance” is most promising

Another impression I have is that we sometimes tacitly assume that work on “avoiding bad AI outcomes” will fall either in the categories of “technical AI safety” or “AI governance”, or at least that it will mostly fall within these categories. But I do not think that this is the case, partly for the reasons alluded to above.

In particular, it seems to me that we sometimes assume that the aim of influencing “AI outcomes” is necessarily best pursued in ways that pertain quite directly to AI today. Yet why should we assume this to be the case? After all, it seems that there are many plausible alternatives.

For example, one could think that it is generally better to pursue broad investments so as to build flexible resources that make us better able to tackle these problems down the line — e.g. investments toward general movement building and toward increasing the amount of money that we will be able to spend later, when we might be better informed and have better opportunities to pursue direct work.

A complementary option is to focus on the broader contextual factors hinted at in the previous section. That is, rather than focusing primarily on the design of the AI systems themselves, or on the laws that directly govern their development, one may focus on influencing the wider context in which they will be developed and deployed — e.g. general values, institutions, diplomatic relations, collective knowledge and wisdom, etc. After all, the broader context in which AI systems will be developed and put into action could well prove critical to the outcomes that future AI systems will eventually create.

Note that I am by no means saying that work on technical AI safety or AI governance is not worth pursuing. My point is merely that these other strategies focused on building flexible resources and influencing broader contextual factors should not be overlooked as ways to influence “a future dominated by AI”. Indeed, I believe that these strategies are among the most promising ways in which we can have a beneficial such influence at this point.

Concluding clarification

On a final note, I should clarify that the main conceptual points I have been trying to make in this post likely do not contradict the explicitly endorsed views of anyone who works to reduce risks from AI. The objects of my concern are more (what I perceive to be) certain implicit models and commonly employed terminologies that I worry may distort how we think and talk about these issues.

Specifically, it seems to me that there might be a sort of collective availability heuristic at work, through which we continually boost the salience of a particular AI narrative — or a certain class of AI scenarios — along with a certain terminology that has come to be associated with that narrative (e.g. ‘AI takeoff’, ‘transformative AI’, etc). Yet if we change our assumptions a bit, or replace the most salient narrative with another plausible one, we might find that this terminology does not necessarily make a lot of sense anymore. We might find that our typical ways of thinking about AI outcomes may be resting on a lot of implicit assumptions that are more questionable and more narrow than we tend to realize.

September 6, 2022

Research vs. non-research work to improve the world: In defense of more research and reflection

When trying to improve the world, we can either pursue direct interventions, such as directly helping beings in need and doing activism on their behalf, or we can pursue research on how we can best improve the world, as well as on what improving the world even means in the first place.

Of course, the distinction between direct work and research is not a sharp one. We can, after all, learn a lot about the “how” question by pursuing direct interventions, testing out what works and what does not. Conversely, research publications can effectively function as activism, and may thereby help bring about certain outcomes quite directly, even when such publications do not deliberately try to do either.

But despite these complications, we can still meaningfully distinguish more or less research-oriented efforts to improve the world. My aim here is to defend more research-oriented efforts, and to highlight certain factors that may lead us to underinvest in research and reflection. (Note that I here use the term “research” to cover more than just original research, as it also covers efforts to learn about existing research.)

Contents

Some examples

Perhaps the best way to give a sense of what I am talking about is by providing a few examples.

I. Cause Prioritization

Say our aim is to reduce suffering. Which concrete aims should we then pursue? Maybe our first inclination is to work to reduce human poverty. But when confronted with the horrors of factory farming, and the much larger number of non-human animals compared to humans, we may conclude that factory farming seems the more pressing issue. However, having turned our gaze to non-human animals, we may soon realize that the scale of factory farming is small compared to the scale of wild-animal suffering, which might in turn be small compared to the potentially astronomical scale of future moral catastrophes.

With so many possible causes one could pursue, it is likely suboptimal to settle on the first one that comes to mind, or to settle on any one of them without having made a significant effort considering where one can make the greatest difference.

II. Effective Interventions

Next, say we have settled on a specific cause, such as ending factory farming. Given this aim, there is a vast range of direct interventions one could pursue, including various forms of activism, lobbying to influence legislation, or working to develop novel foods that can outcompete animal products. Yet it is likely suboptimal to pursue any of these particular interventions without first trying to figure out which of them have the best expected impact. After all, different interventions may differ greatly in terms of their cost-effectiveness, which suggests that it is reasonable to make significant investments into figuring out which interventions are best, rather than to rush into action mode (although the drive to do the latter is understandable and intuitive, given the urgency of the problem).

III. Core Values

Most fundamentally, there is the question of what matters and what is most worth prioritizing at the level of core values. Our values ultimately determine our priorities, which renders clarification of our values a uniquely important and foundational step in any systematic endeavor to improve the world.

For example, is our aim to maximize a net sum of “happiness minus suffering”, or is our aim chiefly to minimize extreme suffering? While there is significant common ground between these respective aims, there are also significant divergences between them, which can matter greatly for our priorities. The first view implies that it would be a net benefit to create a future that contains vast amounts of extreme suffering as long as that future contains a lot of happiness, while the other view would recommend the path of least extreme suffering.

In the absence of serious reflection on our values, there is a high risk that our efforts to improve the world will not only be suboptimal, but even positively harmful relative to the aims that we would endorse most strongly upon reflection. Yet efforts to clarify values are nonetheless extremely neglected — and often completely absent — in endeavors to improve the world.

The steelman case for “doing”

Before making a case for a greater focus on research, it is worth outlining some of the strongest reasons in favor of direct action (e.g. directly helping other beings and doing activism on their behalf).

We can learn a lot by acting

The pursuit of direct interventions is a great way to learn important lessons that may be difficult to learn by doing pure research or reflection.
In particular, direct action may give us practical insights that are often more in touch with reality than are the purely theoretical notions that we might come up with in intellectual isolation. And practical insights and skills often cannot be compensated for by purely intellectual insights.
Direct action often has clearer feedback loops, and may therefore provide a good opportunity to both develop and display useful skills.

Direct action can motivate people to keep working to improve the world

Research and reflection can be difficult, and it is often hard to tell whether one has made significant progress. In contrast, direct action may offer a clearer indication that one is really doing something to improve the world, and it can be easier to see when one is making progress (e.g. whether people altered their behavior in response to a given intervention, or whether a certain piece of legislation changed or not).

There are obvious problems in the world that are clearly worth addressing

For example, we do not need to do more research to know that factory farming is bad, and it seems reasonable to think that evidence-based interventions that significantly reduce the number of beings who suffer on factory farms will be net beneficial.
Likewise, it is probably beneficial to build a healthy movement of people who aim to help others in effective ways, and who reflect on and discuss what “helping others” ideally entails.

Certain biases plausibly prevent us from pursuing direct action

It seems likely that we have a passivity bias of sorts. After all, it is often convenient to stay in one’s intellectual armchair rather than to get one’s hands dirty with direct work that may fall outside of one’s comfort zone, such as doing street advocacy or running a political campaign.
There might also be an omission bias at work, whereby we judge an omission to do direct work that prevents harm less harshly than an equivalent commission of harm.

The case for (more) research

I endorse all the arguments outlined above in favor of “doing”. In particular, I think they are good arguments in favor of maintaining a strong element of direct action in our efforts to improve the world. Yet they are less compelling when it comes to establishing the stronger claim that we should focus more on direct action (on the current margin), or that direct action should represent the majority of our altruistic efforts at this point in time. I do not think any of those claims follow from the arguments above.

In general, it seems to me that altruistic endeavors tend to focus far too strongly on direct action while focusing far too little on research. This is hardly a controversial claim, at least not among aspiring effective altruists, who often point out that research on cause prioritization and on the cost-effectiveness of different interventions is important and neglected. Yet it seems to me that even effective altruists tend to underinvest in research, and to jump the gun when it comes to cause selection and direct action, and especially when it comes to the values that they choose to steer by.

A helpful starting point might be to sketch out some responses to the arguments outlined in the previous section, to note why those arguments need not undermine a case for more research.

We can learn a lot by acting — but we are arguably most limited by research insights

The fact that we can learn a lot by acting, and that practical insights and skills often cannot be substituted by pure conceptual knowledge, does not rule out that our potential for beneficial impact might generally be most bottlenecked by conceptual insights.

In particular, clarifying our core values and exploring the best causes and interventions arguably represent the most foundational steps in our endeavors to improve the world, suggesting that they should — at least at the earliest stages of our altruistic endeavors — be given primary importance relative to direct action (even as direct action and the development of practical skills also deserve significant priority, perhaps even more than 20 percent of the collective resources we spend at this point in time).

The case for prioritizing direct action would be more compelling if we had a lot of research that delivered clear recommendations for direct action. But I think there is generally a glaring shortage of such research. Moreover, research on cause prioritization often reveals plausible ways in which direct altruistic actions that seem good at first sight may actually be harmful. Such potential downsides of seemingly good actions constitute a strong and neglected reason to prioritize research more — not to get perpetually stuck in research, but to at least map out the main considerations for and against various actions.

To be more specific, it seems to me that the expected value of our actions can change a lot depending on how deep our network of crucial considerations goes, so much so that adding an extra layer of crucial considerations can flip the expected value of our actions. Inconvenient as it may be, this means that our views on what constitutes the best direct actions have a high risk of being unreliable as long as we have not explored crucial considerations in depth. (Such a risk always exists, of course, yet it seems that it can at least be markedly reduced, and that our estimates can become significantly better informed even with relatively modest research efforts.)

At the level of an individual altruist’s career, it seems warranted to spend at least one year reading about and reflecting on fundamental values, one year learning about the most important cause areas, and one year learning about optimal interventions within those cause areas (ideally in that order, although one may fruitfully explore them in parallel to some extent; and such a full year’s worth of full-time exploration could, of course, be conducted over several years). In an altruistic career spanning 40 years, this would still amount to less than ten percent of one’s work time focused on such basic exploration, and less than three percent focused on exploring values in particular.

A similar argument can be made at a collective level: if we are aiming to have a beneficial influence on the long-term future — say, the next million years — it seems warranted to spend at least a few years focused primarily on what a beneficial influence would entail (i.e. clarifying our views on normative ethics), as well as researching how we can best influence the long-term future before we proceed to spend most of our resources on direct action. And it may be even better to try to encourage more people to pursue such research, ideally creating an entire research project in which a large number of people collaborate to address these questions.

Thus, even if it is ideal to mostly focus on direct action over the entire span of humanity’s future, it seems plausible that we should focus most strongly on advancing research at this point, where relatively little research has been done, and where the explore-exploit tradeoff is likely to favor exploration quite strongly.

Objections: What about “long reflection” and the division of labor?

An objection to this line of reasoning is that heavy investment into reflection is premature, and that our main priority at this point should instead be to secure a condition of “long reflection” — a long period of time in which humanity focuses on reflection rather than action.

Yet this argument is problematic for a number of reasons. First, there are reasons to doubt that a condition of long reflection is feasible or even desirable, given that it would seem to require strong limits to voluntary actions that diverge from the ideal of reflection. To think that we can choose to create a condition of long reflection may be an instance of the illusion of control. Human civilization is likely to develop according to its immediate interests, and seems unlikely to ever be steered via a common process of reflection.

Second, even if we were to secure a condition of long reflection, there is no guarantee that humanity would ultimately be able to reach a sufficient level of agreement regarding the right path forward — after all, it is conceivable that a long reflection could go awfully wrong, and that bad values could win out due to poor execution or malevolent agents hijacking the process.

The limited feasibility of a long reflection suggests that there is no substitute for reflecting now. Failing to clarify and act on our values from this point onward carries a serious risk of pursuing a suboptimal path that we may not be able to reverse later. The resources we spend pursuing a long reflection (which seems unlikely to ever occur) are resources not spent on addressing issues that might be more important and more time-sensitive, such as steering away from worst-case outcomes.

Another objection might be that there is a division of labor case favoring that only some people focus on research, while others, perhaps even most, should focus comparatively little on research. Yet while it seems trivially true that some people should focus more on research than others, this is not necessarily much of a reason against devoting more of our collective attention toward research (on the current margin), nor a reason against each altruist making a significant effort to read up on existing research.

After all, even if only a limited number of altruists should focus primarily on research, it still seems necessary that those who aim to put cutting-edge research into practice also spend time reading that research, which requires a considerable time investment. Indeed, even when one chooses to mostly defer to the judgments of other people, one will still need to make an effort to evaluate which people are most worth deferring to on different issues, followed by an effort to adequately understand what those people’s views and findings entail.

This point also applies to research on values in particular. That is, even if one prioritizes direct action over research on fundamental values, it still seems necessary to spend a significant amount of time reading up on other people’s work on fundamental values if one is to make a qualified judgment regarding which values one will attempt to steer by.

The division of altruistic labor is thus consistent with the recommendation that every dedicated altruist should spend at least a full year reading about and reflecting on fundamental values (just as the division of “ordinary” labor is consistent with everyone spending a certain amount of time on basic education). And one can further argue that the division of altruistic labor, and specialized work on fundamental values in particular, is only fully utilized if most people spend a decent amount of time reading up on and making use of the insights provided by others.

Direct action can motivate people — but so can (the importance of) research

While research work is often challenging and difficult to be motivated to pursue, it is probably a mistake to view our motivation to do research as something that is fixed. There are likely many ways to increase our motivation to pursue research, not least by strongly internalizing the (highly counterintuitive) importance of research.

Moreover, the motivating force provided by direct action might be largely maintained as long as one includes a strong component of direct action in one’s altruistic work (by devoting, say, 25 percent of one’s resources toward direct action).

In any case, reduced individual motivation to pursue research seems unlikely to be a strong reason against devoting a greater priority to research at the level of collective resources and priorities (even if it might play a significant role in many individual cases). This is partly because the average motivation to pursue these respective endeavors seems unlikely to differ greatly — after all, many people will be more motivated to pursue research over direct action — and partly because urgent necessities are worth prioritizing and paying for even if they happen to be less than highly motivating.

By analogy, the cleaning of public toilets is also worth prioritizing and paying for, even if it may not be the most motivating pursuit for those who do it, and the same point arguably applies even more strongly in the case of the most important tasks necessary for achieving altruistic aims such as reducing extreme suffering. Moreover, the fact that altruistic research may be unusually taxing on our motivation (e.g. due to a feeling of “analysis paralysis”) is a reason to think that such taxing research is generally neglected and hence worth pursuing on the margin.

Finally, to the extent one finds direct action more motivating than research, this might constitute a bias in one’s prioritization efforts, even if it represents a relevant data point about one’s personal fit and comparative advantage. And the same point applies in the opposite direction: to the extent that one finds research more motivating, this might make one more biased against the importance of direct action. While personal motivation is an important factor to consider, it is still worth being mindful of the tendency to overprioritize that which we consider fun and inspiring at the expense of that which is most important in impartial terms.

There are obvious problems in the world that are clearly worth addressing — but research is needed to best prioritize and address them

Knowing that there are serious problems in the world, as well as interventions that reduce those problems, does not in itself inform us about which problems are most pressing or which interventions are most effective at addressing them. Both of these aspects — roughly, cause prioritization and estimating the effectiveness of interventions — seem best advanced by research.

A similar point applies to our core values: we cannot meaningfully pursue cause prioritization and evaluations of interventions without first having a reasonably clear view of what matters, and what would constitute a better or worse world. And clarifying our values is arguably also best done through further research rather than through direct action, even as the latter may be helpful as well.

Certain biases plausibly prevent us from pursuing direct action — but there are also biases pushing us toward too much or premature action

The putative “passivity bias” outlined above has a counterpart in the “action bias”, also known as “bias for action” — a tendency toward action even when action makes no difference or is positively harmful. A potential reason behind the action bias relates to signaling: actively doing something provides a clear signal that we are at least making an effort, and hence that we care (even if the effect might ultimately be harmful). By comparison, doing nothing might be interpreted as a sign that we do not care.

There might also be individual psychological benefits explaining the action bias, such as the satisfaction of feeling that one is “really doing something”, as well as a greater feeling of being in control. In contrast, pursuing research on difficult questions can feel unsatisfying, since progress may be relatively slow, and one may not intuitively feel like one is “really doing something”, even if learning additional research insights is in fact the best thing one can do.

Political philosopher Michael Huemer similarly argues that there is a harmful tendency toward too much action in politics. Since most people are uninformed about politics, Huemer argues that most people ought to be passive in politics, as there is otherwise a high risk that they will make things worse through ignorant choices.

Whatever one thinks of the merits of Huemer’s argument in the political context, I think one should not be too quick to dismiss a similar argument when it comes to improving the long-term future — especially considering that action bias seems to be greater when we face increased uncertainty. At the very least, it seems worth endorsing a modified version of the argument that says that we should not be eager to act before we have considered our options carefully.

Furthermore, the fact that we evolved in a condition that was highly action-oriented rather than reflection-oriented, and in which action generally had far more value for our genetic fitness than did systematic research (indeed, the latter was hardly even possible), likewise suggests that we may be inclined to underemphasize research relative to how important it is for optimal impact from an impartial perspective.

This also seems true when it comes to our altruistic drives and behaviors in particular, where we have strong inclinations toward pursuing publicly visible actions that make us appear good and helpful (Hanson, 2015; Simler & Hanson, 2018, ch. 12). In contrast, we seem to have much less of an inclination toward reflecting on our values. Indeed, it seems plausible that we generally have an inclination against questioning our instinctive aims and drives — including our drive to signal altruistic intentions with highly visible actions — as well as an inclination against questioning the values held by our peers. After all, such questioning would likely have been evolutionarily costly in the past, and may still feel socially costly today.

Moreover, it is very unnatural for us to be as agnostic and open-minded as we should ideally be in the face of the massive uncertainty associated with endeavors that seek to have the best impact for all sentient beings (Vinding, 2020, sec. 9.1-9.2). This suggests that we may tend to be overconfident about — and too quick to conclude — that some particular direct action happens to be the optimal path for helping others.

Lastly, while some kind of omission bias plausibly causes us to discount the value of making an active effort to help others, it is not clear whether this bias counts more strongly against direct action than against research efforts aimed at helping others, since omission bias likely works against both types of action (relative to doing nothing). In fact, the omission bias might count more strongly against research, since a failure to do important research may feel like less of a harmful inaction than does a failure to pursue direct actions, whose connection to addressing urgent needs is usually much clearer.

The Big Neglected Question

There is one question that I consider particularly neglected among aspiring altruists — as though it occupies a uniquely impenetrable blindspot. I am tempted to call it “The Big Neglected Question”.

The question, in short, is whether anything can morally outweigh or compensate for extreme suffering. Our answer to this question has profound implications for our priorities. And yet astonishingly few people seem to seriously ponder it, even among dedicated altruists. In my view, reflecting on this question is among the first, most critical steps in any systematic endeavor to improve the world. (I suspect that a key reason this question tends to be shunned is that it seems too dark, and because people may intuitively feel that it fundamentally questions all positive and meaning-giving aspects of life — although it arguably does not, as even a negative answer to the question above is compatible with personal fulfillment and positive roles and lives.)

More generally, as hinted earlier, it seems to me that reflection on fundamental values is extremely neglected among altruists. Ozzie Gooen argues that many large-scale altruistic projects are pursued without any serious exploration as to whether the projects in question are even a good way to achieve the ultimate (stated) aims of these projects, despite this seeming like a critical first question to ponder.

I would make a similar argument, only one level further down: just as it is worth exploring whether a given project is among the best ways to achieve a given aim before one pursues that project, so it is worth exploring which aims are most worth striving for in the first place. This, it seems to me, is even more neglected than is exploring whether our pet projects represent the best way to achieve our (provisional) aims. There is often a disproportionate amount of focus on impact, and comparatively little focus on what is the most plausible aim of the impact.

Conclusion

In closing, I should again stress that my argument is not that we should only do research and never act — that would clearly be a failure mode, and one that we must also be keen to steer clear of. But my point is that there are good reasons to think that it would be helpful to devote more attention to research in our efforts to improve the world, both on moral and empirical issues — especially at this early point in time.

Acknowledgments

For helpful comments, I thank Teo Ajantaival, Tobias Baumann, and Winston Oswald-Drummond.

May 9, 2022

Why Altruists Should Perhaps Not Prioritize Artificial Intelligence: A Lengthy Critique

The following is a point-by-point critique of Lukas Gloor’s essay Altruists Should Prioritize Artificial Intelligence. My hope is that this critique will serve to make it clear — to Lukas, myself, and others — where and why I disagree with this line of argument, and thereby hopefully also bring some relevant considerations to the table with respect to what we should be working on to best reduce suffering. I should like to note, before I begin, that I have the deepest respect for Lukas, and that I consider his work very important and inspiring.

Below, I quote every paragraph from the body of Lukas’ article, which begins with the following abstract:

The large-scale adoption of today’s cutting-edge AI technologies across different industries would already prove transformative for human society. And AI research rapidly progresses further towards the goal of general intelligence. Once created, we can expect smarter-than-human artificial intelligence (AI) to not only be transformative for the world, but also (plausibly) to be better than humans at self-preservation and goal preservation. This makes it particularly attractive, from the perspective of those who care about improving the quality of the future, to focus on affecting the development goals of such AI systems, as well as to install potential safety precautions against likely failure modes. Some experts emphasize that steering the development of smarter-than-human AI into beneficial directions is important because it could make the difference between human extinction and a utopian future. But because we cannot confidently rule out the possibility that some AI scenarios will go badly and also result in large amounts of suffering, thinking about the impacts of AI is paramount for both suffering-focused altruists as well as those focused on actualizing the upsides of the very best futures.

An abstract of my thoughts on this argument:

My response to this argument is twofold: 1) I do not consider the main argument presented by Lukas, as I understand it, to be plausible, and 2) I think we should think hard about whether we have considered the opportunity cost carefully enough. We should not be particularly confident, I would argue, that any of us have found the best thing to focus on to reduce the most suffering.

I do not think the claim that “altruists can expect to have the largest positive impact by focusing on artificial intelligence” is warranted. In part, my divergence from Lukas rests on empirical disagreements, and in larger part it stems from what may be called “conceptual disagreements” — I think most talk about “superintelligence” is conceptually confused. For example, intelligence as “cognitive abilities” is liberally conflated with intelligence as “the ability to achieve goals in general”, and this confusion does a lot of deceptive work.

I would advocate for more foundational research into the question of what we ought to prioritize. Artificial intelligence undoubtedly poses many serious risks, yet it is important that we maintain a sense of proportion with respect to these risks relative to other serious risks, many of which we have not even contemplated yet.

I will now turn to the full argument presented by Lukas.

I. Introduction and definitions

Terms like “AI” or “intelligence” can have many different (and often vague) meanings. “Intelligence” as used here refers to the ability to achieve goals in a wide range of environments. This definition captures the essence of many common perspectives on intelligence (Legg & Hutter, 2005), and conveys the meaning that is most relevant to us, namely that agents with the highest comparative goal-achieving ability (all things considered) are the most likely to shape the future.

A crucial thing to flag is that “intelligence” here refers to the ability to achieve goals — not to scoring high on an IQ test, or “intelligence” as “advanced cognitive abilities”. And these are not the same, and should not be conflated (indeed, this is one of the central points of my book Reflections on Intelligence, which dispenses with the muddled term “intelligence” at an early point, and instead examines the nature of this better defined “ability to achieve goals” in greater depth).

While it is true that the concept of goal achieving is related to the concept of IQ, the latter is much narrower, as it relates to a specific class of goals. Boosting the IQ of everyone would not immediately boost our ability to achieve goals in every respect — at least not immediately, and not to the same extent across all domains. For even if we all woke up with an IQ of 200 tomorrow, all the external technology with which we run and grow our economy would still be the same. Our cars would drive just as fast, the energy available to us would be the same, and so would the energy efficiency of our machines. And while a higher IQ might now enable us to grow this external technology faster, there are quite restricting limits to how much it can grow. Most of our machines and energy harvesting technology cannot be made many times more efficient, as their efficiency is already a significant fraction — 15 to 40 percent — of the maximum physical limit. In other words, their efficiency cannot be doubled more than a couple of times, if even that.

One could then, of course, build more machines and power plants, yet such an effort would itself be constrained strongly by the state of our external technology, including the energy available to us; not just by the cognitive abilities available. This is one of the reasons I am skeptical of the idea of AI-powered runaway growth. Yes, greater cognitive abilities is a highly significant factor, yet there is just so much more to growing the economy and our ability to achieve a wide range of goals than that, as evidenced by the fact that we have seen a massive increase in computer-powered cognitive abilities — indeed, exponential growth for many decades by many measures — and yet we have continued to see fairly stable, in fact modestly declining, economic growth.

If one considers the concept of “increase in cognitive powers” to be the same as “increase in the ability to achieve goals, period” then this criticism will be missed. “I defined intelligence to be the ability to achieve goals, so when I say intelligence is increased, then all abilities are increased.” One can easily come to entertain a kind of motte and bailey argument in this way, by moving back and forth between this broad notion of intelligence as “the ability to achieve goals” and the more narrow sense of intelligence as “cognitive abilities”. To be sure, a statement like the one above need not be problematic as such, as long as one is clear that this concept of intelligence lies very far from “intelligence as measured by IQ/raw cognitive power”. Such clarity is often absent, however, and thus the statement is quite problematic in practice, with respect to the goals of communicating clearly and not confusing ourselves.

Again, my main point here is that increasing cognitive powers should not be conflated with increasing the ability to achieve goals in general — in every respect. I think much confusion springs from a lack of clarity on this matter.

While everyday use of the term “intelligence” often refers merely to something like “brainpower” or “thinking speed,” our usage also presupposes rationality, or goal-optimization in an agent’s thinking and acting. In this usage, if someone is e.g. displaying overconfidence or confirmation bias, they may not qualify as very intelligent overall, even if they score high on an IQ test. The same applies to someone who lacks willpower or self control.

This is an important step toward highlighting the distinction between “goal achieving ability” and “IQ”, yet it is still quite a small step, as it does not really go much beyond distinguishing “high IQ” from “optimal cognitive abilities for goal achievement”. We are still talking about things going on in a single human head (or computer), while leaving out the all-important aspect that is (external) culture and technology. We are still not talking about the ability to achieve goals in general.

Artificial intelligence refers to machines designed with the ability to pursue tasks or goals. The AI designs currently in use – ranging from trading algorithms in finance, to chess programs, to self-driving cars – are intelligent in a domain-specific sense only. Chess programs beat the best human players in chess, but they would fail terribly at operating a car. Similarly, car-driving software in many contexts already performs better than human drivers, but no amount of learning (at least not with present algorithms) would make [this] software work safely on an airplane.

My only comment here would be that it is not quite clear what counts as artificial intelligence. For example, would a human, edited as well as unedited, count as “a machine designed with the ability to pursue tasks or goals”? And could not all software be considered “designed with the ability to pursue tasks or goals”, and hence all software would be artificial intelligence by this definition? If so, we should then just be clear that this definition is quite broad, including both all humans and all software, and more.

The most ambitious AI researchers are working to build systems that exhibit (artificial) general intelligence (AGI) – the type of intelligence we defined above, which enables the expert pursuit of virtually any task or objective.

This is where the distinction we drew above becomes relevant. While the claim quoted above may be true in one sense, we should be clear that the most ambitious AI researchers are not working to increase “all our abilities”, including our ability to get more energy out of our steam engines and solar panels. Our economy arguably works on that broader endeavor. AI researchers, in contrast, work only on bettering what may be called “artificial cognitive abilities”, which, granted, may in turn help spur growth in many other areas (although the degree to which it would do so is quite unclear, and likely surprisingly limited in the big picture, since “growth may be constrained not by what we are good at but rather by what is essential and yet hard to improve”).

In the past few years, we have witnessed impressive progress in algorithms becoming more and more versatile. Google’s DeepMind team for example built an algorithm that learned to play 2-D Atari games on its own, achieving superhuman skill at several of them (Mnih et al., 2015). DeepMind then developed a program that beat the world champion in the game of Go (Silver et al., 2016), and – tackling more practical real-world applications – managed to cut down data center electricity costs by rearranging the cooling systems.

I think it is important not to overstate recent progress compared to progress in the past. We also saw computers becoming better than humans at many things several decades ago, including many kinds of mathematical calculations (and people also thought that computers would soon beat humans at everything back then). So superhuman skill at many tasks is not what is new and unique about recent progress, but rather that these superhuman skills have been attained via self-training, and, as Lukas notes, that the skills achieved by this training seem of a broader, more general nature than the skills of a single algorithm in the past.

And yet the breadth of these skills should not be overstated either, as the skills cited are all acquired in a rather expensive trial-and-error fashion with readily accessible feedback. This mode of learning surely holds a lot of promise in many areas, yet there are reasons to be skeptical that such learning can bring us significantly closer to achieving all the cognitive and motor abilities humans have (see also David Pearce’s “Humans and Intelligent Machines“; one need not agree with Pearce on everything to agree with some of his reasons to be skeptical).

That DeepMind’s AI technology makes quick progress in many domains, without requiring researchers to build new architecture from scratch each time, indicates that their machine learning algorithms have already reached an impressive level of general applicability. (Edit: I wrote the previous sentence in 2016. In the meantime [January 2018] DeepMind went on to refine its Go-playing AI, culminating in a version called AlphaGo Zero. While the initial version of DeepMind’s Go-playing AI started out with access to a large database of games played by human experts, AlphaGo Zero only learns through self-play. Nevertheless, it managed to become superhuman after a mere 4 days of practice. After 40 days of practice, it was able to beat its already superhuman predecessor 100–0. Moreover, Deepmind then created the version AlphaZero, which is not a “Go-specific” algorithm anymore. Fed with nothing but the rules for either Go, chess, or shogi, it managed to become superhuman at each of these games in less than 24 hours of practice.)

This is no doubt impressive. Yet it is also important not to overstate how much progress that was achieved in 24 hours of practice. This is not, we should be clear, a story about innovation going from zero to superhuman in 24 hours, but rather the story of immense amounts of hardware developed over decades which has then been fed with an algorithm that has also been developed over many years by many people. And then, this highly refined algorithm running on specialized, cutting-edge hardware is unleashed to reach its dormant potential.

And this potential was, it should be noted, not vastly superior to the abilities of previous systems. In chess, for instance, AlphaZero beat the chess program Stockfish (although Stockfish author Tord Romstad notes that it was a version that was a year old and not running on optimal hardware) 25 times as white, 3 as black, and drew the remaining 72 times. Thus, it was significantly better, yet it still did not win in most of the games. Similarly, in Go, AlphaZero won 60 games and lost 40, while in Shogi it won 90 times, lost 8, and drew twice.

Thus, AlphaZero undoubtedly constituted clear progress with respect to these games, yet not an enormous leap that rendered it unbeatable, and certainly not a leap made in a single day.

The road may still be long, but if this trend continues, developments in AI research will eventually lead to superhuman performance across all domains. As there is no reason to assume that humans have attained the maximal degree of intelligence (Section III), AI may soon after reaching our own level of intelligence surpass it.

Again, I would start by noting that human “intelligence” as our “ability to achieve goals” is strongly dependent on the state of our technology and culture at large, not merely our raw cognitive powers. And the claim made above that there is no reason to believe that humans have attained “the maximal degree of intelligence” seems, in this context, to mostly refer to our cognitive abilities rather than our ability to achieve goals in general. For with respect to our ability to achieve goals in general, it is clear that our abilities are not maximal, but indeed continually growing, largely as the result of better software and better machines. Thus, there is not a dichotomous relationship between “human abilities to achieve goals” and “our machines’ abilities to achieve goals”. And given that our ability to achieve goals is in many ways mostly limited by what our best technology can do — how fast our airplanes can fly, how fast our hardware is, how efficient our power plants are, etc. — it is not clear why some other agent or set of agents coming to control this technology (which is extremely difficult to imagine in the first place given the collaborative nature of the grosser infrastructure of this technology) should be vastly more capable of achieving goals than humans powered by/powering this technology.

As for AI surpassing “our own level of intelligence”, one can say that, at the level of cognitive tasks, machines have already been vastly superhuman in many respects for many years — in virtually all mathematical calculations, for instance. And now also in many games, ranging from Atari to Go. Yet, as noted above, I would argue that, so far, such progress has amounted to a clear increase in human “intelligence” in the general sense: it has increased our ability to achieve goals.

Nick Bostrom (2014) popularized the term superintelligence to refer to (AGI-)systems that are vastly smarter than human experts in virtually all respects. This includes not only skills that computers traditionally excel at, such as calculus or chess, but also tasks like writing novels or talking people into doing things they otherwise would not. Whether AI systems would quickly develop superhuman skills across all possible domains, or whether we will already see major transformations with [superhuman skills in] just a [few] such domains while others lag behind, is an open question.

I would argue that our machines already have superhuman skills in countless domains, and that this has indeed already given rise to major transformations, in one sense of this term at least.

Note that the definitions of “AGI” and “superintelligence” leave open the question of whether these systems would exhibit something like consciousness.

I have argued to the contrary in the chapter “Consciousness — Orthogonal or Crucial?” in Reflections on Intelligence.

This article focuses on the prospect of creating smarter-than-human artificial intelligence. For simplicity, we will use the term “AI” in a non-standard way here, to refer specifically to artificial general intelligence (AGI).

Again, I would flag that the meaning of the term general intelligence, or AGI, in this context is not clear. It was defined above as the ability that “enables the expert pursuit of virtually any task or objective”. Yet the ability of humans to achieve goals in general is, I would still argue, in large part the product of their technology and culture at large, and AGI, as Lukas uses it here, does not seem to refer to anything remotely like this, i.e. “the sum of the capabilities of our technology and culture”. Instead, it seems to refer to something much more narrow and singular — something akin to “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level”. I think this is worth highlighting.

The use of “AI” in this article will also leave open how such a system is implemented: While it seems plausible that the first artificial system exhibiting smarter-than-human intelligence will be run on some kind of “supercomputer,” our definition allows for alternative possibilities.

Again, what does “smarter-than-human intelligence” mean here? Machines can already do things that no unaided human can. It seems to refer to what I defined above: “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” — not the ability to achieve goals in general. And as for when a computer might have “(virtually) all the cognitive abilities that a human does”, it seems highly doubtful that any system will ever suddenly emerge with them all, given the modular, many-faceted nature of our minds. Instead, it seems much more likely that the gradual process of machines becoming better than humans at particular tasks will continue in its usual, gradual way. Or so I have argued.

The claim that altruists should focus on affecting AI outcomes is therefore intended to mean that we should focus on scenarios where the dominant force shaping the future is no longer (biological) human minds, but rather some outgrowth of information technology – perhaps acting in concert with biotechnology or other technologies. This would also e.g. allow for AI to be distributed over several interacting systems.

I think this can again come close to resembling a motte and bailey argument: it seems very plausible that the future will not be controlled mostly by what we would readily recognize as biological humans today. Yet to say that we should aim to impact such a future by no means implies that we should aim to impact, say, a small set of AI systems which might determine the entire future based on their goal functions (note: I am not saying Lukas has made this claim above, but this is often what people seem to consider the upshot of arguments of this kind, and also what it seems to me that Lukas is arguing below, in the rest of his essay). Indeed, the claim above is hardly much different from saying that we should aim to impact the long-term future. But Lukas seems to be moving back and forth between this general claim and the much narrower claim that we should focus on scenarios involving rapid growth acceleration driven mostly by software, which is the kind of scenario his essay seems almost exclusively focused on.

II. It is plausible that we create human-level AI this century

Even if we expect smarter-than-human artificial intelligence to be a century or more away, its development could already merit serious concern. As Sam Harris emphasized in his TED talk on risks and benefits of AI, we do not know how long it will take to figure out how to program ethical goals into an AI, solve other technical challenges in the space of AI safety, or establish an environment with reduced dangers of arms races. When the stakes are high enough, it pays to start preparing as soon as possible. The sooner we prepare, the better our chances of safely managing the upcoming transition.

I agree that it is worth preparing for high-stakes outcomes. But I think it is crucial that we get a clear sense of what these might look like, as well as how likely they are. “Altruists Should Prioritize Exploring Long-Term Future Outcomes, and Work out How to Best Influence Them”. To say that we should focus on “artificial intelligence”, which has a rather narrow meaning in most contexts (something akin to a software program), when we really mean that we should focus on the future of goal achieving systems in general is, I think, somewhat misleading.

The need for preparation is all the more urgent given that considerably shorter timelines are not out of the question, especially in light of recent developments. While timeline predictions by different AI experts span a wide range, many of those experts think it likely that human-level AI will be created this century (conditional on civilization facing no major disruptions in the meantime). Some even think it may emerge in the first half of this century: In a survey where the hundred most-cited AI researchers were asked in what year they think human-level AI is 10% likely to have arrived by, the median reply was 2024 and the mean was 2034. In response to the same question for a 50% probability of arrival, the median reply was 2050 with a mean of 2072 (Müller & Bostrom, 2016).¹

Again, it is important to be careful about definitions. For what is meant by “human-level AI” in this context? The authors of the cited source are careful to define what they mean: “Define a ‘high–level machine intelligence’ (HLMI) as one that can carry out most human professions at least as well as a typical human.”

And yet even this definition is quite vague, since “most human professions” is not a constant. A couple of hundred years ago, the profession of virtually all humans was farming, whereas only a couple percent of people in developed nations are employed in farming today. And this is not an idle point, because as machines become able to do jobs hitherto performed by humans, market forces will push humans to take new jobs that machines cannot do. And these new jobs may be those that require abilities that it will take many centuries for machines to acquire, if non-biological machines will indeed ever acquire them (this is not necessarily that implausible, as these abilities may include “looking like a real, empathetic biological human who ignites our brain circuits in the right ways”).

Thus, the questionnaire above seems poorly defined. And if it asks about most current human professions, its relevance appears quite limited; also because the nature of different professions change over time as well. A doctor today does not do all the same things a doctor did a hundred years ago, and the same will likely apply to doctors of the future. In other words, also within existing professions can we expect to see humans move toward doing the things that machines cannot do/we do not prefer them to do, even as machines become ever more capable.

While it could be argued that these AI experts are biased towards short timelines, their estimates should make us realize that human-level AI this century is a real possibility.

Yet we should keep in mind what they were asked about, and how relevant this is. Even if most (current?) human professions might be done by machines within this century, this does not imply that we will see “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” within this century. These are quite different claims.

The next section will argue that the subsequent transition from human-level AI to superintelligence could happen very rapidly after human-level AI actualizes. We are dealing with the decent possibility – e.g. above 15% likelihood even under highly conservative assumptions – that human intelligence will be surpassed by machine intelligence later this century, perhaps even in the next couple of decades. As such a transition will bring about huge opportunities as well as huge risks, it would be irresponsible not to prepare for it.

I want to flag, again, that it is not clear what “human-level AI” means. Lukas seemed to first define intelligence as something like “the ability to achieve goals in general”, which I have argued is not really what he means here (indeed, it is a rather different beast which I seek to examine in Reflections on Intelligence). And the two senses of the term “human-level intelligence” mentioned in the previous paragraph — “the ability to do most human professions” versus “possessing virtually all human cognitive abilities” — should not be conflated either. So it is in fact not clear what is being referred to here, although I believe it is the latter: “possessing virtually all human cognitive abilities at a similar or greater level”.

It should be noted that a potentially short timeline does not imply that the road to superintelligence is necessarily one of smooth progress: Metrics like Moore’s law are not guaranteed to continue indefinitely, and the rate of breakthrough publications in AI research may not increase (or even stay constant) either. The recent progress in machine learning is impressive and suggests that fairly short timelines of a decade or two are not to be ruled out. However, this progress could also be mostly due to some important but limited insights that enable companies like DeepMind to reap the low-hanging fruit before progress would slow down again. There are large gaps still to be filled before AIs reach human-level intelligence, and it is difficult to estimate how long it will take researchers to bridge these gaps. Current hype about AI may lead to disappointment in the medium term, which could bring about an “AI safety winter” with people mistakenly concluding that the safety concerns were exaggerated and smarter-than-human AI is not something we should worry about yet.

This seems true, yet it should also be conceded that a consistent lack of progress in AI would count as at least weak evidence against the claim that we should mainly prioritize what is usually referred to as “AI safety“. And more generally, we should be careful not to make the hypothesis “AI safety is the most important thing we could be working on” into an unfalsifiable one.

As for Moore’s law, not only is it “not guaranteed to continue indefinitely”, but we know, for theoretical reasons, that it must come to an end within a decade, at least in its original formulation concerning silicon transistors, and progress has indeed already been below the prediction of “the law” for some time now. And the same can be said about other aspects in hardware progress: it shows signs of waning off.

If AI progress were to slow down for a long time and then unexpectedly speed up again, a transition to superintelligence could happen with little warning (Shulman & Sandberg, 2010). This scenario is plausible because gains in software efficiency make a larger comparative difference to an AI’s overall capabilities when the hardware available is more powerful. And once an AI develops the intelligence of its human creators, it could start taking part in its own self-improvement (see section IV).

I am not sure I understand the claims being made here. With respect to the first argument about gains in efficiency, the question is how likely we should expect such gains to be if progress has been slow for long. Other things being equal, this would seem less likely in a time where growth is slow than in a time when it is fast, and especially if there is not much growth in hardware either, since hardware growth may in large part be driving growth in software.

I am not sure I follow the claim about AI developing the intelligence of its human creators, and then taking part in its own improvement, but I would just note, as Ramez Naam has argued, that AI, and our machines in general, are already playing a significant role in their own improvement in many ways. In other words, we already actively use our best, most capable technology to build the next generation of such technology.

Indeed, on a more general, yet also less directly relevant note, I would also add that we humans have in some sense been using our most advanced cognitive tools to build the next generation of such tools for hundreds of thousands of years. For over the course of evolution, individual humans have been using the best of their cognitive abilities to select the mates who had the best total package (they could get), of which cognitive abilities were a significant part. In this sense, the idea that “dumb and blind” evolution created intelligent humans is actually quite wrong. The real story is rather one of cognitive abilities actively selecting cognitive abilities (along with other things). A gradual design process over the course of which ever greater cognitive powers were “creating” and in turn created.

For AI progress to stagnate for a long period of time before reaching human-level intelligence, biological brains would have to have surprisingly efficient architectures that AI cannot achieve despite further hardware progress and years of humans conducting more AI research.

Looking over the past decades of AI research and progress, we can say that it indeed has been a fairly long period of time since computers first surpassed humans in the ability to do mathematical calculations, and yet there are still many things humans can do which computers cannot, such as having meaningful conversations with other humans, learning fast from a few examples, and experiencing and expressing feelings. And yet these examples still mostly pertain to cognitive abilities, and hence still overlook other abilities that are also relevant with respect to machines taking over human jobs (if we focus on that definition of “human-level AI”), such as having the physical appearance of a real, biological human, which does seem in strong demand in many professions, especially in the service industry.

However, as long as hardware progress does not come to a complete halt, AGI research will eventually not have to surpass the human brain’s architecture or efficiency anymore. Instead, it could become possible to just copy it: The “foolproof” way to build human-level intelligence would be to develop whole brain emulation (WBE) (Sandberg & Bostrom, 2008), the exact copying of the brain’s pattern of computation (input-output behavior as well as isomorphic internal states at any point in the computation) onto a computer and a suitable virtual environment. In addition to sufficiently powerful hardware, WBE would require scanning technology with fine enough resolution to capture all the relevant cognitive function, as well as a sophisticated understanding of neuroscience to correctly draw the right abstractions. Even though our available estimates are crude, it is possible that all these conditions will be fulfilled well before the end of this century (Sandberg, 2014).

Yet it should be noted that there are many who doubt that this is a foolproof way to build “human-level intelligence” (a term that in this context again seems to mean “a system with roughly the same cognitive abilities as the human brain”). Many doubt that it is even a possibility, and they do so for many different reasons (e.g. that a single, high-resolution scanning of the brain is not enough to capture and enable an emulation of its dynamic workings; that a digital computer cannot adequately simulate the physical complexity of the brain, and that such a computer cannot solve the so-called binding problem.)

Thus, it seems to stand as an open question whether mind uploading is indeed possible, let alone feasible (and it also seems that many people in the broader transhumanist community, who tend to be the people who write and talk the most about mind uploading, could well be biased toward believing it possible, as many of them seem to hope that it can save them from death).

The perhaps most intriguing aspect of WBE technology is that once the first emulation exists and can complete tasks on a computer like a human researcher can, it would then be very easy to make more such emulations by copying the original. Moreover, with powerful enough hardware, it would also become possible to run emulations at higher speeds, or to reset them back to a well-rested state after they performed exhausting work (Hanson, 2016).

Assuming, of course, that WBE will indeed be feasible in the first place. Also, it is worth noting that Robin Hanson himself is critical of the idea that WBEs would be able to create software that is superior to themselves very quickly; i.e. he expects a WBE economy to undergo “many doublings” before it happens.

Sped-up WBE workers could be given the task of improving computer hardware (or AI technology itself), which would trigger a wave of steeply exponential progress in the development of superintelligence.

This is an exceptionally strong claim that would seem in need of justification, and not least some specification, given that it is not clear what “steeply exponential progress in the development of superintelligence” refers to in this context. It hardly means “steeply exponential progress in the development of a super ability to achieve goals in general”, including in energy efficiency and energy harvesting. Such exponential progress is not, I submit, likely to follow from progress in computer hardware or AI technology alone. Indeed, as we saw above, such progress cannot happen with respect to the energy efficiency of most of our machines, as physical limits mean that it cannot double more than a couple of times.

But even if we understand it to be a claim about the abilities of certain particular machines and their cognitive abilities more narrowly, the claim is still a dubious one. It seems to assume that progress in computer hardware and AI technology is constrained chiefly by the amount of hours put into it by those who work on it directly, as opposed to also being significantly constrained by countless other factors, such as developments in other areas, e.g. in physics, production, and transportation, many of which imply limits on development imposed by factors such as hardware and money, not just the amount of human-like genius available.

For example, how much faster should we expect the hardware that AlphaZero was running on to have been developed and completed if a team of super-WBEs had been working on it? Would the materials used for the hardware have been dug up and transported significantly faster? Would they have been assembled significantly faster? Perhaps somewhat, yet hardly anywhere close to twice as fast. The growth story underlying many worries about explosive AI growth is quite detached from how we actually improve our machines, including AI (software and hardware) as well as the harvesting of the energy that powers it (Vaclav Smil: “Energy transitions are inherently gradual processes and this reality should be kept in mind when judging the recent spate of claims about the coming rapid innovative take-overs […]”). Such growth is the result of countless processes distributed across our entire economy. Just as nobody knows how to make a pencil, nobody, including the very best programmers, knows (more than a tiny part of) how to make better machines.

To get a sense of the potential of this technology, imagine WBEs of the smartest and most productive AI scientists, copied a hundred times to tackle AI research itself as a well-coordinated research team, sped up so they can do years of research in mere weeks or even days, and reset periodically to skip sleep (or other distracting activities) in cases where memory-formation is not needed. The scenario just described requires no further technologies beyond WBE and sufficiently powerful hardware. If the gap from current AI algorithms to smarter-than-human AI is too hard to bridge directly, it may eventually be bridged (potentially very quickly) after WBE technology drastically accelerates further AI research.

As far as I understand, much of the progress in machine learning in modern times was essentially due to modern hardware and computing power that made it possible to implement old ideas invented decades ago (of course then implemented with all the many adjustments and tinkering one cannot foresee from the drawing board). In other words, software progress was hardly the most limiting factor. Arguably, the limiting factor was rather that the economy just had not caught up to be able to make hardware advanced enough to implement these theoretical ideas successfully. And it also seems to me quite naive to think that better hardware design, and genius ideas about how to make hardware more generally, was and is a main limiting factor in our growth of computer hardware. Such progress tends to rest critically on other progress in other kinds of hardware and globally distributed production processes. Processes that no doubt can be sped up, yet hardly that significantly by advanced software alone, in large part because such progress is limited by the fact that many of the crucial processes involved in this progress, such as digging up, refining, and transporting materials, are physical processes that can only go so fast.

Beyond that, there is also an opportunity cost consideration that is ignored by the story of fast growth above. For the hardware and energy required for this team of WBEs could otherwise have been used to run other kinds of computations that could help further innovation, including those we already run on full steam to further progress — CAD programs, simulations, equation solving. And it is not clear that using all this hardware for WBEs would be a better use of hardware than would running these other programs, whose work may be considered a limiting factor to AI progress at a similar level as more “purely” human or human-like work is. Indeed, we should not expect engineers and companies to do these kinds of things with their computing resources if they were not among the most efficient things they could do with them. And even if WBEs are a better use of hardware for fast progress, it is far from clear that it would be that much better.

The potential for WBE to come before de novo AI means that – even if the gap between current AI designs and the human brain is larger than we thought – we should not significantly discount the probability of human-level AI being created eventually. And perhaps paradoxically, we should expect such a late transition to happen abruptly. Barring no upcoming societal collapse, believing that superintelligence is highly unlikely to ever happen requires not only confidence that software or “architectural” improvements to AI are insufficient to ever bridge the gap, but also that – in spite of continued hardware progress – WBE could not get off the ground either. We do not seem to have sufficient reason for great confidence in either of these propositions, let alone both.

Again, what does the term “superintelligence” refer to here? Above, it was defined as “(AGI-)systems that are vastly smarter than human experts in virtually all respects”. And given that AGI is defined as a general ability to pursue goals, and that “smart” here presumably means “better able to achieve goals”, one can say that the definition of superintelligence given here translates to “a system that pursues goals better than human experts in virtually all areas”. Yet we are already building systems that satisfy this definition of superintelligence. Our entire economy is already able to do tasks that no single human expert could ever accomplish. But superintelligence likely refers to something else here, something along the lines of: “a system that is vastly more cognitively capable than any human expert in virtually all respects”. And yet, even by this definition, we already have computer systems that can do countless cognitive tasks much better than any human, and the super system that is the union of all these systems can therefore, in many respects at least, be considered to have vastly superior cognitive abilities relative to humans. And systems composed of humans and technology are clearly vastly more capable than any human expert alone in virtually all respects.

In this sense, we clearly do have “superintelligence” already, and we are continually expanding its capabilities. And, with respect to worries above a FOOM takeover, it seems highly unlikely that a single, powerful machine could ever overtake and become more powerful than the entire collective that is the human-machine civilization, which is not to say that low-risk events should be dismissed. But they should be measured against other risks we could be focusing on.

III. Humans are not at peak intelligence

Again, it is important to be clear about what we mean by “intelligence”. Most cognitively advanced? Or best able to achieve goals in general? Humans extended by technology can clearly increase their intelligence, i.e. ability to achieve goals, significantly. We have done so consistently over the last few centuries, and we continue to do so today. And in a world where humans build this growing body of technology to serve their own ends, and in some cases build it to be provably secure, it is far from clear that some non-human system with much greater cognitive powers than humans (which, again, already exists in many domains) will also become more capable of achieving goals in general than humanity, given that it is surrounded by a capable super-system of technology designed for and by humans, controlled by humans, to serve their ends. Again, this is not to say that one should not worry about seemingly improbable risks — we definitely should — but merely that we should doubt the assumption that our making machines more cognitively capable will necessarily imply that they will be better able to achieve goals in general. Again, despite being related, these two senses of “intelligence” must not be confused.

It is difficult to intuitively comprehend the idea that machines – or any physical system for that matter – could become substantially more intelligent than the most intelligent humans. Because the intelligence gap between humans and other animals appears very large to us, we may be tempted to think of intelligence as an “on-or-off concept,” one that humans have and other animals do not. People may believe that computers can be better than humans at certain tasks, but only at tasks that do not require “real” intelligence. This view would suggest that if machines ever became “intelligent” across the board, their capabilities would have to be no greater than those of an intelligent human relying on the aid of (computer-)tools.

Again, we should be clear that the word “intelligence” here seems to mean “most cognitively capable” rather than “best able to achieve goals in general”. And the gap between the “intelligence”, as in the ability to achieve goals, of humans and other animals does arguably not appear very large when we compare individuals. Most other animals can do things that no single human can do, and to the extent we humans can learn to do things other animals naturally beat us at, e.g. lift heavier objects or traverse distances faster than speedy animals, we do so by virtue of technology, in essence the product of collective, cultural evolution.

And even with respect to cognitive abilities, one can argue that humans are not superior to other animals in a general sense. We do not have superior cognitive abilities with respect to echo location, for example, much less long-distance navigation. Nor are humans superior when it comes to all aspects of short-term/working memory,

Measuring goal achieving ability in general, as well as abilities to solve cognitive tasks in particular, along a single axis may be useful in some contexts, yet it can easily become meaningless when the systems being compared are not sufficiently similar.

But this view is mistaken. There is no threshold for “absolute intelligence.” Nonhuman animals such as primates or rodents differ in cognitive abilities a great deal, not just because of domain-specific adaptations, but also due to a correlational “g factor” responsible for a large part of the variation across several cognitive domains (Burkart et al., 2016). In this context, the distinction between domain-specific and general intelligence is fuzzy: In many ways, human cognition is still fairly domain-specific. Our cognitive modules were optimized specifically for reproductive success in the simpler, more predictable environment of our ancestors. We may be great at interpreting which politician has the more confident or authoritative body language, but deficient in evaluating whose policy positions will lead to better developments according to metrics we care about. Our intelligence is good enough or “general enough” that we manage to accomplish impressive feats even in an environment quite unlike the one our ancestors evolved in, but there are many areas where our cognition is slower or more prone to bias than it could be.

I agree with this. I would just note that “intelligence” here again seems to be referring to cognitive abilities, not the ability to achieve goals in general, and that we humans have expanded both over time via culture: our cognitive abilities, as measured by IQ, have increased significantly over the last century, while our ability to achieve goals in general has expanded much more still as we have developed ever more advanced technology.

Intelligence is best thought of in terms of a gradient. Imagine a hypothetical “intelligence scale” (inspired by part 2.1 of this FAQ) with rats at 100, chimpanzees at, say, 350, the village idiot at 400, average humans at 500 and Einstein at 750.² Of course, this scale is open at the top and could go much higher.

Again, intelligence here seems to refer to cognitive abilities, not the ability to achieve goals in general. Einstein was likely not better at shooting hoops than the average human, or indeed more athletic in general (by all appearances), although he was much more cognitively capable, at least in some respects, than virtually all other humans.

A more elaborate critique of the intelligence scale mentioned above can be found in my post Chimps, Humans, and AI: A Deceptive Analogy.

To quote Bostrom (2014, p. 44): “Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization – a niche we filled because we got there first, not because we are in any sense optimally adapted to it.”

Again, the words “smart” and “stupid” here seem to pertain to cognitive abilities, not the ability to achieve goals in general. And this phrasing is misleading, as it seems to presume that cognitive ability is all it takes to build an advanced civilization, which is not the case. In fact, humans are not the species with the biggest brain on the planet, or even the species with the biggest cerebral cortex; indeed, long-finned pilot whales have more than twice as many neocortical neurons.

What we are, however, is a species with a lot of unique tools — fine motor hands, upright walk, vocal cords, a large brain with a large prefrontal cortex, etc. — which together enabled humans to (gradually build a lot of tools with which they could) take over the world. Remove just one of these unique tools from all of humanity, and we would be almost completely incapable. And this story of a multiplicity of components that are all necessary yet insufficient for the maintenance and growth of human civilization is even more true today, where we have countless external tools — trucks, the internet, computers, screwdrivers, etc. — without which we could not maintain our civilization. And the necessity of all these many different components seems overlooked by the story that views advanced cognitive abilities as the sole driver, or near enough, of growth and progress in the ability to achieve goals in general. This, I would argue, is a mistake.

Thinking about intelligence as a gradient rather than an “on-or-off” concept prompts a Copernican shift of perspective. Suddenly it becomes obvious that humans cannot be at the peak of possible intelligence. On the contrary, we should expect AI to be able to surpass us in intelligence just like we surpass chimpanzees.

Depending on what we mean by the word “intelligence”, one can argue that computers have already surpassed humans. If we define “intelligence” to be “that which is measured by an IQ test”, for example, then computers have already been better than humans in at least some of these tests for a few years now.

In terms of our general ability to achieve goals, however, it is not clear that computers will so readily surpass humans, in large part because we do not aim to build them to be better than humans in many respects. Take self-repair, for example, which is something human bodies, just like virtually all animal bodies, are in a sense designed to do — indeed, most of our self-repair mechanisms are much older than we are as a species. Evolution has built humans to be competent and robust autonomous systems who do not for the most part depend on a global infrastructure to repair their internal parts. Our computers, in contrast, are generally not built to be self-repairing, at least not at the level of hardware. Their notional thrombocytes are entirely external to themselves, in the form of a thousand and one specialized tools and humans distributed across the entire economy. And there is little reason to think that this will change, as there is little incentive to create self-repairing computers. We are not aiming to build generally able, human-independent computers in this sense.

Biological evolution supports the view that AI could reach levels of intelligence vastly beyond ours. Evolutionary history arguably exhibits a weak trend of lineages becoming more intelligent over time, but evolution did not optimize for intelligence (only for goal-directed behavior in specific niches or environment types). Intelligence is metabolically costly, and without strong selection pressures for cognitive abilities specifically, natural selection will favor other traits. The development of new traits always entails tradeoffs or physical limitations: If our ancestors had evolved to have larger heads at birth, maternal childbirth mortality would likely have become too high to outweigh the gains of increased intelligence (Wittman & Wall, 2007). Because evolutionary change happens step-by-step as random mutations change the pre-existing architecture, the changes are path dependent and can only result in local optima, not global ones.

Here we see how the distinction between “intelligence as cognitive abilities” and “intelligence as the ability to achieve goals” is crucial. Indeed, the example provided above clearly proves the point that advanced cognitive abilities are often not the most relevant thing for achieving goals, since the goal of surviving and reproducing was often not best achieved, as Lukas hints, with the best cognitive abilities. Often it was better achieved with longer teeth or stronger muscles. Or a prettier face.

So the question is: why do we think that advanced cognitive abilities are, to a first approximation, identical with the ability to achieve goals? And, more importantly, why do we imagine that this lesson about the sub-optimality of spending one’s limited resources on better cognitive abilities does not still hold today? Why should cognitive abilities be the sole optimal thing, or near enough, to spend all one’s resources on in order to best achieve a broad range of goals? I would argue that it is not. It was not optimal in the past (with respect to the goal of survival), and it does not seem to be optimal today either.

It would be a remarkable coincidence if evolution had just so happened to stumble upon the most efficient way to assemble matter into an intelligent system.

But it would be less remarkable if it had happened to assemble matter into a system that is broadly capable of achieving a broad range of goals, and which another system, especially one that is not built over a billion year process to be robust and highly autonomous, cannot readily outdo in terms of autonomous function. It would also not be that remarkable if biological humans, functioning within a system built by and for biological humans, happened to be among the most capable systems within such a system, not least given all the legal, social and political aspects this system entails.

Beyond that, one can dispute the meaning of “intelligent system” in the quote above, but if we look at the intelligent system that is our civilization at large, one can say that the optimization going on at this level is not coincidental but indeed deliberate, often aiming toward peak efficiency. Thus, in this regard as well, we should not be too surprised if our current system is quite efficient and competent relative to the many constraints we are facing.

But let us imagine that we could go back to the “drawing board” and optimize for a system’s intelligence without any developmental limitations. This process would provide the following benefits for AI over the human brain (Bostrom, 2014, p. 60-61):

Free choice of substrate: Signal transmission with computer hardware is millions of times faster than in biological brains. AI is not restricted to organic brains, and can be built on the substrate that is overall best suited for the design of intelligent systems.

“Supersizing:” Machines have (almost) no size-restrictions. While humans with elephant-sized brains would run into developmental impossibilities, (super)computers already reach the size of warehouses and could in theory be built even bigger.

No cognitive biases: We should be able to construct AI in a way that uses more flexible heuristics, and always the best heuristics for a given context, to prevent the encoding or emergence of substantial biases. Imagine the benefits if humans did not suffer from confirmation bias, overconfidence, status quo bias, etc.!

Modular superpowers: Humans are particularly good at tasks for which we have specialized modules. For instance, we excel at recognizing human faces because our brains have hard-wired structures that facilitate that facial recognition in particular. An artificial intelligence could have many more such specialized modules, including extremely useful ones like a module for programming.

Editability and copying: Software on a computer can be copied and edited, which facilitates trying out different variations to see what works best (and then copying it hundreds of times). By contrast, the brain is a lot messier, which makes it harder to study or improve. We also lack correct introspective access to the way we make most of our decisions, which is an important advantage that (some) AI designs could have.

Superior architecture: Starting anew, we should expect it to be possible to come up with radically more powerful designs than the patchwork architecture that natural selection used to construct the human brain. This difference could be enormously significant.

It should be noted that computers already 1) can be built with a wide variety of substrates, 2) can be supersized, 3) do not tend to display cognitive biases, 4) have modular superpowers, 5) can be edited and copied (or at least software readily can), 6) can be made with any architecture we can come up with. All of these advantages exist and are being exploited already, just not as much as they can be. And it is not clear why we should expect future change to be more radical than the change we have seen in past decades in which we have continually built ever more competent computers which can do things that no human can by exploiting these advantages.

With regard to the last point, imagine we tried to optimize for something like speed or sight rather than intelligence. Even if humans had never built anything faster than the fastest animal, we should assume that technological progress – unless it is halted – would eventually surpass nature in these respects. After all, natural selection does not optimize directly for speed or sight (but rather for gene copying success), making it a slower optimization process than those driven by humans for this specific purpose. Modern rockets already fly at speeds of up to 36,373 mph, which beats the peregrine falcon’s 240 mph by a huge margin. Similarly, eagle vision may be powerful, but it cannot compete with the Hubble space telescope. (General) intelligence is harder to replicate technologically, but natural selection did not optimize for intelligence either, and there do not seem to be strong reasons to believe that intelligence as a trait should differ categorically from examples like speed or sight, i.e., there are as far as we know no hard physical limits that would put human intelligence at the peak of what is possible.³

Again, what is being referred to by the word “intelligence” here seems to be cognitive abilities, not the ability to achieve goals in general. And with respect to cognitive abilities in particular, it is clear that computers already beat humans by a long shot in countless respects. So the point Lukas is making here is clearly true.

Another way to develop an intuition for the idea that there is significant room for improvement above human intelligence is to study variation in humans. An often-discussed example in this context is the intellect of John von Neumann. Von Neumann was not some kind of an alien, nor did he have a brain twice as large as the human average. And yet, von Neumann’s accomplishments almost seem “superhuman.” The section in his Wikipedia entry that talks about him having “founded the field of Game theory as a mathematical discipline” – an accomplishment so substantial that for most other intellectual figures it would make up most of their Wikipedia page – is just one out of many of von Neumann’s major achievements.

There are already individual humans (with normal-sized brains) whose intelligence vastly exceeds that of the typical human. So just how much room there is above their intelligence? To visualize this, consider for instance what could be done with an AI architecture more powerful than the human brain running on a warehouse-sized supercomputer.

A counterpoint to this line of reasoning can be found by contemplating chess ratings. Ratings of the skills of chess players are usually done via the so-called Elo rating system, which measures the relative skills of different players against each other. A beginner will usually have a rating around 800, whereas a rating in the range 2000-2199 ranks one as a chess “Expert”, and a ranking of 2400 and above renders one a “Senior Master”. The highest rating ever achieved was 2882 by Magnus Carlsen. Surely, this amount of variation must be puny given that all the humans who have ever played chess have roughly the same brain sizes and structures. And yet it turns out that human variation in chess ability is in fact quite enormous in an absolute sense.

For example, it took more than four decades from computers were able to beat a chess beginner (the 1950s), until they were able to beat the very best human player (1997 officially). Thus, the span from ordinary human beginner to the best human expert was more than four decades of progress in hardware — i.e. a million times more computing power — and software. That seems quite a wide range.

And yet the range seems even broader if we consider the ultimate limits of optimal chess play. For one may argue that the fact that it took computers a fairly long time to go from the average human level to the level of the best human does not mean that the best human is not still ridiculously far from the best a computer could be in theory. Surprisingly, however, this latter distance does in fact seem quite small, at least in one sense. For estimates suggest that the best possible chess machine would have an Elo rating around 3600, which means that the relative distance between the best possible computer and the best human is only around 700 Elo points, implying that the distance between the best human and a chess “Expert” is similar to the distance between the best human and the best possible chess brain, while the distance between an ordinary human beginner and the best human is far greater.

It seems plausible that a similar pattern obtains with respect to many other complex cognitive tasks. Indeed, it seems plausible that many of our abilities, especially those we evolved to do well, such as our ability to interact with other humans, have an “Elo rating” quite close to the notional maximum level for most humans.

IV. The transition from human to superhuman intelligence could be rapid

Perhaps the people who think it is unlikely that superintelligent AI will ever be created are not objecting to it being possible in principle. Maybe they think it is simply too difficult to bridge the gap from human-level intelligence to something much greater. After all, evolution took a long time to produce a species as intelligent as humans, and for all we know, there could be planets with biological life where intelligent civilizations never evolved.⁴ But considering that there could come a point where AI algorithms start taking part in their own self-improvement, we should be more optimistic.

We should again be clear that the term “superintelligent AI” seems to refer to a system with greater cognitive abilities, across a wide range of tasks, than humans. As for “a point where AI algorithms start taking part in their own self-improvement”, it should be noted, again, that we already use our best software and hardware in the process of developing better software and hardware. True, they are only a part of a process that involves far more elements, yet this is true of most everything that we produce and improve in our economy: many contributions drawn from and distributed across our economy at large are required. And we have good reason to believe that this will continue to be true of the construction of more capable machines in the future.

AIs contributing to AI research will make it easier to bridge the gap, and could perhaps even lead to an acceleration of AI progress to the point that AI not only ends up smarter than us, but vastly smarter after only a short amount of time.

Again, we already use our best software and hardware to contribute to AI research, and yet we do not appear to see acceleration in the growth of our best supercomputers. In fact, in terms of their computing power, we see a modest decline.

Several points in the list of AI advantages above – in particular the advantages derived from the editability of computer software or the possibility for modular superpowers to have crucial skills such as programming – suggest that AI architectures might both be easier to further improve than human brains, and that AIs themselves might at some point become better at actively developing their own improvements.

Again, computers are already “easier to further improve than human brains” in these ways, and our hardware and software are already among the most active parts in their own improvement. So why should we expect to see a different pattern in the future from the pattern we see today of gradual, slightly declining growth?

If we ever build a machine with human-level intelligence, it should then be comparatively easy to speed it up or make tweaks to its algorithm and internal organization to make it more powerful. The updated version, which would at this point be slightly above human-level intelligence, could be given the task of further self-improvement, and so on until the process runs into physical limits or other bottlenecks.

Or better yet than “human-level intelligence” would be if we built software that was critical for the further development of more powerful computers. And we in fact already have such software, many different kinds of it, and yet it is not that easy to simply “speed it up or make tweaks to its algorithm and internal organization to make it more powerful”. More generally, as noted above, we already use our latest, updated technology to improve our latest, updated technology, and the result is not rapid, runaway growth.

Perhaps self-improvement does not have to require human-level general intelligence at all. There may be comparatively simple AI designs that are specialized for AI science and (initially) lack proficiency in other domains. The theoretical foundations for an AI design that can bootstrap itself to higher and higher intelligence already exist (Schmidhuber, 2006), and it remains an empirical question where exactly the threshold is after which AI designs would become capable of improving themselves further, and whether the slope of such an improvement process is steep enough to go on for multiple iterations.

Again, I would just reiterate that computers are already an essential component in the process of improving computers. And the fact that humans who need to sleep and have lunch breaks are also part of this improvement process does not seem a main constraint on it compared to other factors, such as physical limitations implied by transportation and the assemblage of materials. Oftentimes in modern research, computers run simulations at their maximum capacity while the humans do their sleeping and lunching, in which case these resting activities (through which humans often get their best ideas) do not limit progress much at all, whereas the available computing power does.

For the above reasons, it cannot be ruled out that breakthroughs in AI could at some point lead to an intelligence explosion (Good, 1965; Chalmers, 2010), where recursive self-improvement leads to a rapid acceleration of AI progress. In such a scenario, AI could go from subhuman intelligence to vastly superhuman intelligence in a very short timespan, e.g. in (significantly) less than a year.

“It cannot be ruled out” can be said of virtually everything; the relevant question is how likely we should expect these possibilities to be. Beyond that, it is also not clear what would count as a “rapid acceleration of AI progress”, and thus what exactly it is that cannot be ruled out. AI going from subhuman performance to vastly greater than human performance in a short amount of time has already been seen in many different domains, including Go most recently.

But if one were to claim, to take a specific claim, that it cannot be ruled out that an AI system will improve itself so much that it can overpower human civilization and control the future, then I would argue that the reasoning above does not support considering this a likely possibility, i.e. something that is more likely to happen than, say, one in a thousand.

While the idea of AI advancing from human-level to vastly superhuman intelligence in less than a year may sound implausible, as it violates long-standing trends in the speed of human-driven development, it would not be the first time where changes to the underlying dynamics of an optimization process cause an unprecedented speed-up. Technology has been accelerating ever since innovations (such as agriculture or the printing press) began to feed into the rate at which further innovations could be generated.⁵

In the endnote “5” referred to above, Lukas writes:

[…] Finally, over the past decades, many tasks, including many areas of research and development, have already been improved through outsourcing them to machines – a process that it is still ongoing and accelerating.

That this process of outsourcing of tasks is accelerating seems in need of justification. We have been outsourcing tasks to machines in various ways and at a rapid pace for at least two centuries now, and so it is not a trivial claim that this process is accelerating.

Compared to the rate of change we see in biological evolution, cultural evolution broke the sound barrier: It took biological evolution a few million years to improve on the intelligence of our ape-like ancestors to the point where they became early hominids. By contrast, technology needed little more than ten thousand years to progress from agriculture to space shuttles.

And I would argue that the reason technology could grow so fast is because an ever larger system of technology consisting of an ever greater variety of tools was contributing to it through recursive self-improvement — human genius was but one important component. And I think we have good reason to think the same about the future.

Just as inventions like the printing press fed into – and significantly sped up – the process of technological evolution, rendering it qualitatively different from biological evolution, AIs improving their own algorithms could cause a tremendous speed-up in AI progress, rendering AI development through self-improvement qualitatively different from “normal” technological progress.

I think there is very little reason to believe this story. Again, we already use our best machines to build the next generation of machines. “Normal” technological progress of the kind we see today already depends on computers running programs created to optimize future technology as efficiently as they can, and it is far from clear that running a more human kind of program would be a more efficient use of resources toward this end.

It should be noted, however, that while the arguments in favor of a possible intelligence explosion are intriguing, they nevertheless remain speculative. There are also some good reasons why some experts consider a slower takeoff of AI capabilities more likely. In a slower takeoff, it would take several years or even decades for AI to progress from human to superhuman intelligence.

Again, the word “intelligence” here seems to refer to cognitive abilities, not the ability to achieve goals in general. And it is again not clear what it means to say that it might “take several years or even decades for AI to progress from human to superhuman intelligence”, since computers have already been more capable than humans at a wide variety of cognitive tasks for many decades. So I would argue that this statement suffers from a lack of conceptual clarity.

Unless we find decisive arguments for one scenario over the other, we should expect both rapid and comparably slow takeoff scenarios to remain plausible. It is worth noting that because “slow” in this context also includes transitions on the order of ten or twenty years, it would still be very fast practically speaking, when we consider how much time nations, global leaders or the general public would need to adequately prepare for these changes.

To reiterate the statement I just made, it is not clear what a fast takeoff means in this context given that computers are already vastly superior to humans in many domains, and probably will continue to beat humans at ever more tasks before they come close to being able to do virtually all cognitive tasks humans can do. So what it is we are supposed to consider plausible is not entirely clear. As for whether it is plausible for rapid progress to occur over a wide range of cognitive tasks such that an AI system becomes able to take over the world, I would argue that we have not seen arguments to support this claim.

V. By default, superintelligent AI would be indifferent to our well-being

The typical mind fallacy refers to the belief that other minds operate the same way our own does. If an extrovert asks an introvert, “How can you possibly not enjoy this party; I talked to half a dozen people the past thirty minutes and they were all really interesting!” they are committing the typical mind fallacy.

When envisioning the goals of smarter-than-human artificial intelligence, we are in danger of committing this fallacy and projecting our own experience onto the way an AI would reason about its goals. We may be tempted to think that an AI, especially a superintelligent one, will reason its way through moral arguments⁶ and come to the conclusion that it should, for instance, refrain from harming sentient beings. This idea is misguided, because according to the intelligence definition we provided above – which helps us identify the processes likely to shape the future – making a system more intelligent does not change its goals/objectives; it only adds more optimization power for pursuing those objectives.

Again, we need to be clear about what “smarter-than-human artificial intelligence” means here. In this case, we seem to be talking about a fairly singular and coherent system, a “mind” of sorts — as opposed to a thousand and one different software programs that do their own thing well — and hence in this regard it seems that the term “smarter-than-human artificial intelligence” here refers to something that is quite similar to a human mind. We are seemingly also talking about a system that “would reason about its goals”.

It seems worth noting that this is quite different from how we think about contemporary software programs, even including the most advanced ones such as AlphaZero and IBM’s Watson, which we are generally not tempted to consider “minds”. Expecting competent software programs of the future to be like minds may itself be to commit a typical mind fallacy of sorts, or perhaps just a mind fallacy. It is conceivable that software will continue to outdo humans at many tasks without acquiring anything resembling what we usually conceive of as a mind.

Another thing worth clarifying is what we mean by the term “by default” here. Does it refer to what AI systems will be built to do by our economy in the absence of altruistic intervention? If “by default” means that which our economy will naturally tend to produce, it seems likely that future AI indeed will be programmed to not be indifferent, at least in a behavioral sense, to human well-being “by default”. Indeed, it seems a much greater risk that future software systems will be constructed to act in a way that exclusively benefits, and is indifferent toward anything else than, human beings. In other words, that it will share our speciesist bias, with catastrophic consequences ensuing.

My point here is merely that, just as it is almost meaningless to claim that biological minds will not care about our well-being by default, as it lacks any specification of what “by default” means — given what evolutionary history? — so is it highly unclear what “by default” means when we are talking about machines created by humans. It seems to assume that we are going to suddenly have a lot of “undirected competence” delivered to us which does not itself come with countless sub-goals and adaptations built into it to attain ends desired by human programmers, and, perhaps to a greater extent, markets.

To give a silly example, imagine that an arms race between spam producers and companies selling spam filters leads to increasingly more sophisticated strategies on both sides, until the side selling spam filters has had it and engineers a superintelligent AI with the sole objective to minimize the number of spam emails in their inboxes.

Again, I would flag that it is not clear what “superintelligent AI” means here. Does it refer to a system that is better able to achieve goals across the board than humans? Or merely a system with greater cognitive abilities than any human expert in virtually all domains? Even if it is merely the latter, it is unlikely that a system developed by a single team of software developers will have much greater cognitive competences across the board than the systems developed by other competing teams, let alone those developed by the rest of the economy combined.

With its level of sophistication, the spam-blocking AI would have more strategies at its disposal than normal spam filters.

Yet how many more? What could account for this large jump in capabilities from previous versions of spam filters? What is hinted here seems akin to the sudden emergence of a Bugatti in the Stone Age. It does not seem credible.

For instance, it could try to appeal to human reason by voicing sophisticated, game-theoretic arguments against the negative-sum nature of sending out spam. But it would be smart enough to realize the futility of such a plan, as this naive strategy would backfire because some humans are trolls (among other reasons). So the spam-minimizing AI would quickly conclude that the safest way to reduce spam is not by being kind, but by gaining control over the whole planet and killing everything that could possibly try to trick its spam filter.

First of all, it is by no means clear that this would be “the safest way” to minimize spam. Indeed, I would argue that trying to gain control in this way would be a very bad action in expectation with respect to the goal of minimizing spam.

But even more fundamentally, the scenario above seems to assume that it would be much easier to build a system with the abilities to take over the world than it would to properly instantiate the goals we want it to achieve. For instance, in the case of earlier versions of AlphaZero, these were all equally aligned with the goal of winning Go. The hard problem was to make it more capable at doing it. The assumption that the situation would be inverted with respect to future goal implementation seems to me unwarranted. Not because the goals are necessarily easy to instantiate, but because the competences in question appear extremely difficult to create. The scenario described above seems to ignore this consideration, and instead assumes that the default scenario is that we will suddenly get advanced machines with a lot of competence, but where we do not know how to direct this competence toward doing what we want it to, as opposed to gradually directing and integrating these competences as they are (gradually) acquired. Beyond that, on a more general note, I think many aspiring effective altruists who worry about AI safety tend to underestimate the extent to which computer programmers are already focused on making software do what they intend it to.

Moreover, the scenario considered here also seems to assume that it would be relatively easy to make a competent machine optimize a particular goal insistently, and I would also question that this is anything less than extremely difficult. In other words, not only do I think it is extremely difficult to create the competences in question, as noted above, but I also think it is extremely difficult to orient all these competences, not just a few subroutines, toward insistently accomplishing some perverse goal. For this reason too, I think one should be highly skeptical of scenarios of this kind.

The AI in this example may fully understand that humans would object to these actions on moral grounds, but human “moral grounds” are based on what humans care about – which is not the minimization of spam! And the AI – whose whole decision architecture only selects for actions that promote the terminal goal of minimizing spam – would therefore not be motivated to think through, let alone follow our arguments, even if it could “understand” them in the same way introverts understand why some people enjoy large parties.

I think this is inaccurate. Any goal-oriented agent would be motivated to think through these things for the same reason that we humans are motivated to think through what those who disagree with us morally would say and do: because it impacts how we ourselves can act effectively toward our goals (this, we should be honest, is also often why humans think about the views and arguments made by others; not because of a deep yearning for truth and moral goodness but for purely pragmatic and selfish reasons). Thus, it makes sense to be mindful of those things, especially given that one has imperfect information and an imperfect ability to predict the future, no matter how “smart” one is.

The typical mind fallacy tempts us to conclude that because moral arguments appeal to us,⁷ they would appeal to any generally intelligent system. This claim is after all already falsified empirically by the existence of high-functioning psychopaths. While it may be difficult for most people to imagine how it would feel to not be moved by the plight of anyone but oneself, this is nothing compared to the difficulties of imagining all the different ways that minds in general could be built. Eliezer Yudkowsky coined the term mind space to refer to the set of all possible minds – including animals (of existing species as well as extinct ones), aliens, and artificial intelligences, as well as completely hypothetical “mind-like” designs that no one would ever deliberately put together. The variance in all human individuals, throughout all of history, only represents a tiny blob in mind space.

Yes, but this does not mean that the competences of human minds only span a tiny range of the notional “competence range” of various abilities. As we saw in the example of chess above, humans span a surprisingly large range, and the best humans are surprisingly close to the best mind possible. And with respect to the competences required for navigating within a world built by and for humans, it is not that unreasonable to believe that, on a continuum that measures competence across these many domains with a single measure, we are probably quite high and quite difficult to beat. This is not arrogance. It is merely to acknowledge the contingent structure of our civilization, and the fact that it is adapted to many contingent features of the human organism in general, including the human mind in particular.

Some of the minds outside this blob would “think” in ways that are completely alien to us; most would lack empathy and other (human) emotions for that matter; and many of these minds may not even relevantly qualify as “conscious.”

Most of these minds would not be moved by moral arguments, because the decision to focus on moral arguments has to come from somewhere, and many of these minds would simply lack the parts that make moral appeals work in humans. Unless AIs are deliberately designed⁸ to share our values, their objectives will in all likelihood be orthogonal to ours (Armstrong, 2013).

Again, an agent trying to achieve goals in our world need not be moved by moral arguments in an emotional sense in order to pay attention to them and the preferences of humans more generally, and to choose to avoid causing chaos. Second, the question is why we should expect future software designed by humans to not be “deliberately designed to share our values”? And what marginal difference should we expect altruists to be able to make on them? And how would this influence best be achieved?

VI. AIs will instrumentally value self-preservation and goal preservation

Even though AI designs may differ radically in terms of their top-level goals, we should expect most AI designs to converge on some of the same subgoals. These convergent subgoals (Omohundro, 2008; Bostrom, 2012) include intelligence amplification, self-preservation, goal preservation and the accumulation of resources. All of these are instrumentally very useful to the pursuit of almost any goal. If an AI is able to access the resources it needs to pursue these subgoals, and does not explicitly have concern for human preferences as (part of) its top-level goal, its pursuit of these subgoals is likely to lead to human extinction (and eventually space colonization; see below).

Again, what does “AI design” refer to in this context? Presumably a machine that possesses most of the cognitive abilities a human does to a similar or greater extent, and, on top of that, this machine is in some sense highly integrated into something akin to a coherent unified mind subordinate to a few supreme “top-level goals”. Thus, when Lukas writes “most AI designs” above, he is in fact referring to most systems that meet a very particular definition of “AI”, and one which I strongly doubt will be anywhere close to the most prevalent source of “machine competence” in the future (note that this is not to say that software, as well as our machines in general, will not become ever more competent in the future, but merely that such greater competences may not be subordinate to one goal to rule them all, or a few for that matter).

Beyond that, the claim that such a capable machine of the future seeking to achieve these subgoals is likely to lead to human extinction is a very strong claim that is not supported here, nor in the papers cited. More on this below.

AI safety work refers to interdisciplinary efforts to ensure that the creation of smarter-than-human artificial intelligence will result in excellent outcomes rather than disastrous ones. Note that the worry is not that AI would turn evil, but that indifference to suffering and human preferences will be the default unless we put in a lot of work to ensure that AI is developed with the right values.

Again, I would take issue with this “default” claim, as I would argue that “a lot of work” is exactly what we should expect that there will be made to ensure that future software will do what humans want it to. And the question is, again, how much of a difference altruists should expect to make here, as well as how to best make it.

VI.I Intelligence amplification

Increasing an agent’s intelligence improves its ability to efficiently pursue its goals. All else equal, any agent has a strong incentive to amplify its intelligence. A real-life example of this convergent drive is the value of education: Learning important skills and (thinking-)habits early in life correlates with good outcomes. In the AI context, intelligence amplification as a convergent drive implies that AIs with the ability to improve their own intelligence will do so (all else equal). To self-improve, AIs would try to gain access to more hardware, make copies of themselves to increase their overall productivity, or devise improvements to their own cognitive algorithms.

Again, what does the word “intelligence” mean in this context? Above, it was defined as “the ability to achieve goals in a wide range of environments”, which means that what is being said here reduces to the tautological claim that increasing an agent’s ability to achieve goals improves its ability to achieve goals. If one defines “intelligence” to refer to cognitive abilities, however, the claim becomes less empty. Yet it also becomes much less obvious, especially if one thinks in terms of investments of marginal resources, as it is questionable whether investing in greater cognitive abilities (as opposed to a prettier face or stronger muscles) is the best investment one can make with respect to the goal of achieving goals “in general”.

On a more general note, I would argue that “intelligence amplification”, as in “increasing our ability to achieve goals”, is already what we collectively do in our economy to a great extent, although this increase is, of course, much broader than one merely oriented toward optimizing cognitive abilities. We seek to optimize materials, supply chains, transportation networks, energy efficiency, etc. And it is not clear why this growth process should speed up significantly due to greater machine capabilities in the future than it has in the past, where more capable machines also helped grow the economy in general, as well as to increase the capability of machines in particular.

More broadly, intelligence amplification also implies that an AI would try to develop all technologies that may be of use to its pursuits.

Yet should we expect such “an AI” to be better able to develop “all technologies that may be of use to its pursuits” better than entire industries currently dedicated to it, let alone our entire economy? Indeed, should we even expect it to contribute significantly, i.e. double current growth rates across the board? I would argue that this is most dubious.

I.J. Good, a mathematician and cryptologist who worked alongside Alan Turing, asserted that “the ﬁrst ultraintelligent machine is the last invention that man need ever make,” because once we build it, such a machine would be capable of developing all further technologies on its own.

To say that a single machine would be able to develop all further technologies on its own is, I submit, unsound. For what does “on its own” mean here? “On its own” independently of the existing infrastructure of machines run by humans? Or “on its own” as in taking over this entire infrastructure? And how exactly could such a take-over scenario occur without destroying the productivity of this system? None of these scenarios seem plausible.

VI.II Goal preservation

AIs would in all likelihood also have an interest in preserving their own goals. This is because they optimize actions in terms of their current goals, not in terms of goals they might end up having in the future.

This again seems to assume that we will create highly competent systems which will be subordinate to a single or a few explicit goals that it will insistently optimize all its actions for. Why should we believe this?

Another critical note of mine on this idea quoted from elsewhere:

Stephen Omohundro (Omohundro, 2008) argues that a chess-playing robot with the supreme goal of playing good chess would attempt to acquire resources to increase its own power and work to preserve its own goal of playing good chess. Yet in order to achieve such complex subgoals, and to even realize they might be helpful with respect to achieving the ultimate goal, this robot will need access to, and be built to exercise advanced control over, an enormous host of intellectual tools and faculties. Building such tools is extremely hard and requires many resources, and harder still, if at all possible, is it to build them so that they are subordinate to a single supreme goal. And even if all this is possible, it is far from clear that access to these many tools would not enable – perhaps even force – this now larger system to eventually “reconsider” the goals that it evolved from. For instance, if the larger system has a sufficient amount of subsystems with sub-goals that involve preservation of the larger system of tools, and if the “play excellent chess” goal threatens, or at least is not optimal with respect to, this goal, could one not imagine that, in some evolutionary competition, these sub-goals could overthrow the supreme goal?

Footnote: After all, humans are such a system of competing drives, and it has been argued (e.g. in Ainslie, 2001 [Breakdown of Will]) that this competition is what gives us our unique cognitive strengths (as well as weaknesses). Our ultimate goals, to the extent we have any, are just those that win this competition most of the time.

And Paul Christiano has also described agents that would not be subject to this “basic drive” of self-preservation described by Omohundro.

Lukas continues:

From the current goal’s perspective, a change in the AI’s goal function is potentially disastrous, as the current goal would not persevere. Therefore, AIs will try to prevent researchers from changing their goals.

Granted that such a highly competent system is built so as to be subordinate to a single goal in this way, which I do not think there is good reason to consider likely to be the case in future AI systems “by default”.

Consequently, there is pressure for AI researchers to get things right on the first try: If we develop a superintelligent AI with a goal that is not quite what we were after – because someone made a mistake, or was not precise enough, or did not think about particular ways the specified goal could backfire – the AI would pursue the goal that it was equipped with, not the goal that was intended. This applies even if it could understand perfectly well what the intentioned goal was. This feature of going with the actual goal instead of the intended one could lead to cases of perverse instantiation, such as the AI “paralyz[ing] human facial musculatures into constant beaming smiles” to pursue an objective of “make us smile” (Bostrom, 2014, p. 120).

This again seems to assume that this “first superintelligent AI” would be so much more powerful than everything else in the world, yet why should we expect a single system to be so much more powerful than everything else across the board? Beyond that, it also seems to assume that the design of this system would happen in something akin to a single step — that there would be a “first try”. Yet what could a first try consist in? How could a super capable system emerge in the absence of a lot of test models that are slightly less competent? I think this “first try” idea betrays an underlying belief in a sudden growth explosion powered by a single, highly competent machine, which, again, I would argue is highly unlikely in light of what we know about the nature of the growth of the capabilities of machines.

VI.III Self-preservation

Some people have downplayed worries about AI risks with the argument that when things begin to look dangerous, humans can literally “pull the plug” in order to shut down AIs that are behaving suspiciously. This argument is naive because it is based on the assumption that AIs would be too stupid to take precautions against this.

There is a difference between being “stupid” and being ill-informed. And there is no reason to think that an extremely cognitively capable agent will be informed about everything relevant to its own self-preservation. To think otherwise is to conflate great cognitive abilities with near-omniscience.

Because the scenario we are discussing concerns smarter-than-human intelligence, an AI would understand the implications of losing its connection to electricity, and would therefore try to proactively prevent being shut down any means necessary – especially when shutdown might be permanent.

Even if all implications were understood by such a notional agent, this by no means implies that an attempt to stop its termination would be successful, nor particularly likely, or indeed even possible.

This is not to say that AIs would necessarily be directly concerned about their own “death” – after all, whether an AI’s goal includes its own survival or not depends on the specifics of its goal function. However, for most goals, staying around pursuing one’s goal will lead to better expected goal achievement. AIs would therefore have strong incentives to prevent permanent shutdown even if their goal was not about their own “survival” at all. (AIs might, however, be content to outsource their goal achievement by making copies of themselves, in which case shutdown of the original AI would not be so terrible as long as one or several copies with the same goal remain active.)

I would question the tacit notion that the self-preservation of such a machine could be done with a significantly greater level of skill than could the “counter self-preservation” work of the existing human-machine civilization. After all, why should a single system be so much more capable than the rest of the world at any given task? Why should humans not develop specialized software systems and other machines that enable them to counteract and overpower rogue machines, for example by virtue of having more information and training? What seems described here as an almost sure to happen default outcome strikes me as highly unlikely. This is not to say that one should not worry about small risks of terrible outcomes, yet we need to get a clear view of the probabilities if we are to make a qualified assessment of the expected value of working on these risks.

The convergent drive for self-preservation has the unfortunate implication that superintelligent AI would almost inevitably see humans as a potential threat to its goal achievement. Even if its creators do not plan to shut the AI down for the time being, the superintelligence could reasonably conclude that the creators might decide to do so at some point. Similarly, a newly-created AI would have to expect some probability of interference from external actors such as the government, foreign governments or activist groups. It would even be concerned that humans in the long term are too stupid to keep their own civilization intact, which would also affect the infrastructure required to run the AI. For these reasons, any AI intelligent enough to grasp the strategic implications of its predicament would likely be on the lookout for ways to gain dominance over humanity. It would do this not out of malevolence, but simply as the best strategy for self-preservation.

Again, to think that a single agent could gain dominance over the rest of the human-machine civilization in which it would find itself appears extremely unlikely. What growth story could plausibly lead to this outcome?

This does not mean that AIs would at all times try to overpower their creators: If an AI realizes that attempts at trickery are likely to be discovered and punished with shutdown, it may fake being cooperative, and may fake having the goals that the researchers intended, while privately plotting some form of takeover. Bostrom has referred to this scenario as a “treacherous turn” (Bostrom, 2014, p. 116).

We may be tempted to think that AIs implemented on some kind of normal computer substrate, without arms or legs for mobility in the non-virtual world, may be comparatively harmless and easy to overpower in case of misbehavior. This would likely be a misconception, however. We should not underestimate what a superintelligence with access to the internet could accomplish. And it could attain such access in many ways and for many reasons, e.g. because the researchers were careless or underestimated its capacities, or because it successfully pretended to be less capable than it actually was. Or maybe it could try to convince the “weak links” in its [team] of supervisors to give it access in secret – promising bribes. Such a strategy could work even if most people in the developing team thought it would be best to deny their AI internet access until they have more certainty about the AI’s alignment status and its true capabilities. Importantly, if the first superintelligence ever built was prevented from accessing the internet (or other efficient channels of communication), its impact on the world would remain limited, making it possible for other (potentially less careful) teams to catch up. The closer the competition, the more the teams are incentivized to give their AIs riskier access over resources in a gamble for the potential benefits in case of proper alignment.

Again, this all seems to assume a very rapid take-off in capabilities with one system being vastly more capable than all others. What reasons do we have to consider such a scenario plausible? Barely any, I have argued.

The following list contains some examples of strategies a superintelligent AI could use to gain power over more and more resources, with the goal of eventually reaching a position where humans cannot harm or obstruct it. Note that these strategies were thought of by humans, and are therefore bound to be less creative and less effective than the strategies an actual superintelligence would be able to devise.

Backup plans: Superintelligent AI could program malware of unprecedented sophistication that inserted partial copies of itself into computers distributed around the globe (adapted from part 3.1.2 of this FAQ). This would give it further options to act even if its current copy was destroyed or if its internet connection was cut. Alternatively, it could send out copies of its source code, alongside detailed engineering instructions, to foreign governments, ideally ones who have little to lose and a lot to gain, with the promise of helping them attain world domination if they build a second version of the AI and handed it access to all their strategic resources.

Making money: Superintelligent AI could easily make fortunes with online poker, stock markets, scamming people, hacking bank accounts, etc.⁹

Influencing opinions: Superintelligent AI could fake convincing email exchanges with influential politicians or societal elites, pushing an agenda that serves its objectives of gaining power and influence. Similarly, it could orchestrate large numbers of elaborate sockpuppet accounts on social media or other fora to influence public opinion in favorable directions.

Hacking and extortion: Superintelligent AI could hack into sensitive documents, nuclear launch codes or other compromising assets in order to blackmail world leaders into giving it access over more resources. Or it could take over resources directly if hacking allows for it.

(Bio-)engineering projects: Superintelligent AI could pose as the head researcher of a biology lab and send lab assistants instructions to produce viral particles with specific RNA sequences, which then, unbeknownst to the people working on the project, turned out to release a deadly virus that incapacitated most of humanity.¹⁰

Through some means or another – and let’s not forget that the AI could well attempt many strategies at once to safeguard against possible failure in some of its pursuits – the AI may eventually gain a decisive strategic advantage over all competition (Bostrom, 2014, p. 78-90). Once this is the case, it would carefully build up further infrastructure on its own. This stage will presumably be easier to reach as the world economy becomes more and more automated.

These various strategies could also be pursued by other agents, and indeed by vast systems of agents and programs. Why should one such agent be much more competent than others at doing any of these things?

Once humans are no longer a threat, the AI would focus its attention on natural threats to its existence. It would for instance notice that the sun will expand in about seven billion years to the point where existence on earth will become impossible. For the reason of self preservation alone, a superintelligent AI would thus eventually be incentivized to expand its influence beyond Earth.

Following the arguments I have made above (as well as here), I would argue that such a take-over of the world subordinate to a single or a few goals originally instilled in a single machine is extremely unlikely.

VI.IV Resource accumulation

For the fulfillment of most goals, accumulating as many resources as possible is an important early step. Resource accumulation is also intertwined with the other subgoals in that it tends to facilitate them.

The resources available on Earth are only a tiny fraction of the total resources that an AI could access in the entire universe. Resource accumulation as a convergent subgoal implies that most AIs would eventually colonize space (provided that it is not prohibitively costly), in order to gain access to the maximum amount of resources. These resources would then be put to use for the pursuit of its other subgoals and, ultimately, for optimizing its top-level goal.

Superintelligent AI might colonize space in order to build (more of) the following:

Supercomputers: As part of its intelligence enhancement, an AI could build planet-sized supercomputers (Sandberg, 1999) to figure out the mysteries of the cosmos. Almost no matter the precise goal, having an accurate and complete understanding of the universe is crucial for optimal goal achievement.

Infrastructure: In order to accomplish anything, an AI needs infrastructure (factories, control centers, etc.) and “helper robots” of some sort. This would be similar (but much larger in scale) to how the Manhattan Project had its own “project sites” and employed tens of thousands of people. While some people worry that an AI would enslave humans, these helpers would more plausibly be other AIs specifically designed for the tasks at hand.

Defenses: An AI could build shields to protect itself or other sensitive structures from cosmic rays. Perhaps it would build weapon systems to deal with potential threats.

Goal optimization: Eventually, an AI would convert most of its resources into machinery that directly achieves its objectives. If the goal is to produce paperclips, the AI will eventually tile the accessible universe with paperclips. If the goal is to compute pi to as many decimal places as possible, the AI will eventually tile the accessible universe with computers to compute pi. Even if an AI’s goal appears to be limited to something “local” or “confined,” such as e.g. “protect the White House,” the AI would want to make success as likely as possible and thus continue to accumulate resources to better achieve that goal.

To elaborate on the point of goal optimization: Humans tend to be satisficers with respect to most things in life. We have minimum requirements for the quality of the food we want to eat, the relationships we want to have, or the job we want to work in. Once these demands are met and we find options that are “pretty good,” we often end up satisfied and settle down on the routine. Few of us spend decades of our lives pushing ourselves to invest as many waking hours as sustainably possible into systematically finding the optimal food in existence, the optimal romantic partner, or anything really.

AI systems on the other hand, in virtue of how they are usually built, are more likely to act as maximizers. A chess computer is not trying to look for “pretty good moves” – it is trying to look for the best move it can find with the limited time and computing power it has at its disposal. The pressure to build ever more powerful AIs is a pressure to build ever more powerful maximizers. Unless we deliberately program AIs in a way that reduces their impact, the AIs we build will be maximizers that never “settle” or consider their goals “achieved.” If their goal appears to be achieved, a maximizer AI will spend its remaining time double- and triple-checking whether it made a mistake. When it is only 99.99% certain that the goal is achieved, it will restlessly try to increase the probability further – even if this means using the computing power of a whole galaxy to drive the probability it assigns to its goal being achieved from 99.99% to 99.991%.

Because of the nature of maximizing as a decision-strategy, a superintelligent AI is likely to colonize space in pursuit of its goals unless we program it in a way to deliberately reduce its impact. This is the case even if its goals appear as “unambitious” as e.g. “minimize spam in inboxes.”

Why should we expect a single machine to be better able to accumulate resources than other actors in the economy, much less whole teams of actors powered by specialized software programs optimized toward that very purpose? Again, what seems to be considered the default outcome here is one that I would argue is extremely unlikely. This is still not to say that we then have reason to dismiss such a scenario. Yet it is important that we make an honest assessment of its probability if we are to make qualified assessments of the value of prioritizing it.

VII. Artificial sentience and risks of astronomical suffering

Space colonization by artificial superintelligence would increase goal-directed activity and computations in the world by an astronomically large factor.¹¹

So would space colonization driven by humans. And it is not clear why we should expect a human-driven colonization to increase goal-directed computations any less. Beyond that, such human-driven colonization also seems much more likely to happen than does rogue AI colonization.

If the superintelligence holds objectives that are aligned with our values, then the outcome could be a utopia. However, if the AI has randomly, mistakenly, or sufficiently suboptimally implemented values, the best we could hope for is if all the machinery it used to colonize space was inanimate, i.e. not sentient. Such an outcome – even though all humans would die – would still be much better than other plausible outcomes, because it would at least not contain any suffering. Unfortunately, we cannot rule out that the space colonization machinery orchestrated by a superintelligent AI would also contain sentient minds, including minds that suffer. The same way factory farming led to a massive increase in farmed animal populations, multiplying the direct suffering humans cause to animals by a large factor, an AI colonizing space could cause a massive increase in the total number of sentient entities, potentially creating vast amounts of suffering.

The same applies to a human-driven colonization, which I would still argue seems a much more likely outcome. So why should we focus more on colonization driven by rogue AI?

The following are some ways AI outcomes could result in astronomical amounts of suffering:

Suffering in AI workers: Sentience appears to be linked to intelligence and learning (Daswani & Leike, 2015), both of which would be needed (e.g. in robot workers) for the coordination and execution of space colonization. An AI could therefore create and use sentient entities to help it pursue its goals. And if the AI’s creators did not take adequate safety measures or program in compassionate values, it may not care about those entities’ suffering in their assistance.

Optimization for sentience: Some people want to colonize space in order for there to be more life or (happy) sentient minds. If the AI in question has values that reflect this goal, either because human researchers managed to get value loading right (or “half-right”), or because the AI itself is sentient and values creating copies of itself, the result could be astronomical numbers of sentient minds. If the AI does not accurately assess how happy or unhappy these beings are, or if it only cares about their existence but not their experiences, or simply if something goes wrong in even a small portion of these minds, the total suffering that results could be very high.

Ancestor simulations: Turning history and (evolutionary) biology into an empirical science, AIs could run many “experiments” with simulations of evolution on planets with different starting conditions. This would e.g. give the AIs a better sense of the likelihood of intelligent aliens existing, as well as a better grasp on the likely distribution of their values and whether they would end up building AIs of their own. Unfortunately, such ancestor simulations could recreate millions of years of human or wild-animal suffering many times in parallel.

Warfare: Perhaps space-faring civilizations would eventually clash, with at least one of the two civilizations containing many sentient minds. Such a conflict would have vast frontiers of contact and could result in a lot of suffering.

All of these scenarios could also occur in a human-driven colonization, which I would argue is significantly more likely to happen. So again: why should we focus more on colonization driven by rogue AI?

More ways AI scenarios could contain astronomical amounts of suffering are described here and here. Sources of future suffering are likely to follow a power law distribution, where most of the expected suffering comes from a few rare scenarios where things go very wrong – analogous to how most casualties are the result of very few, very large wars; how most of the casualty-risks from terrorist attacks fall into tail scenarios where terrorists would get their hands on weapons of mass destruction; or how most victims of epidemics succumbed to the few very worst outbreaks (Newman, 2005). It is therefore crucial to not only to factor in which scenarios are most likely to occur, but also how bad scenarios would be should they occur.

Again, most of the very worst scenarios could well be due to human-driven colonization, such as US versus China growth races taken beyond Earth. So, again, why focus mostly on colonization scenarios driven by rogue AI? Beyond that, the expected value of influencing a broad class of medium-value outcomes could easily be much higher than the expected value of influencing much fewer, much higher-stakes outcomes, provided that the outcomes that fall into this medium value class are sufficiently probable and amenable to impact. In other words, it is by no means far-fetched to imagine that we can take actions that are robust over a wide range of medium-value outcomes, and that such actions are in fact best in expectation.

Critics may object because the above scenarios are largely based on the possibility of artificial sentience, particularly sentience implemented on a computer substrate. If this turns out to be impossible, there may not be much suffering in futures with AI after all. However, computer-based minds also being able to suffer in the morally relevant sense is a common implication in philosophy of mind. Functionalism and type A physicalism (“eliminativism”) both imply that there can be morally relevant minds on digital substrates. Even if one were skeptical of these two positions and instead favored the views of philosophers like David Chalmers or Galen Strawson (e.g. Strawson, 2006), who believe consciousness is an irreducible phenomenon, there are at least some circumstances under which these views would also allow for computer-based minds to be sentient.¹² Crude “carbon chauvinism,” or a belief that consciousness is only linked to carbon atoms, is an extreme minority position in philosophy of mind.

The case for artificial sentience is not just abstract but can also be made on the intuitive level: Imagine we had whole brain emulation with a perfect mapping from inputs to outputs, behaving exactly like a person’s actual brain. Suppose we also give this brain emulation a robot body, with a face and facial expressions created with particular attention to detail. The robot will, by the stipulations of this thought experiment, behave exactly like a human person would behave in the same situation. So the robot-person would very convincingly plead that it has consciousness and moral relevance. How certain would we be that this was all just an elaborate facade? Why should it be?

Because we are unfamiliar with artificial minds and have a hard time experiencing empathy for things that do not appear or behave in animal-like ways, we may be tempted to dismiss the possibility of artificial sentience or deny artificial minds moral relevance – the same way animal sentience was dismissed for thousands of years. However, the theoretical reasons to anticipate artificial sentience are strong, and it would be discriminatory to deny moral consideration to a mind simply because it is implemented on a substrate different from ours. As long as we are not very confident indeed that minds on a computer substrate would be incapable of suffering in the morally relevant sense, we should believe that most of the future’s expected suffering is located in futures where superintelligent AI colonizes space.

I fail to see how this final conclusion is supported by the argument made above. Again, human-driven colonization seems to pose at least as big a risk of outcomes of this sort.

One could argue that “superintelligent AI” could travel much faster and convert matter and energy into ordered computations much faster than a human-driven colonization could, yet I see little reason to expect a rogue AI-driven colonization to be significantly more effective in this regard than a human civilization powered by advanced tools built to be as efficient as possible. For instance, why should “superintelligent AI” be able to build significantly faster spaceships? I would expect both tail-end scenarios — i.e. both maximally sentient rogue AI-driven colonization and maximally sentient human-driven colonization — to converge toward an optimal expansion solution in a relatively short time, at least on cosmic timescales.

VIII. Impact analysis

The world currently contains a great deal of suffering. Large sources of suffering include for instance poverty in developing countries, mental health issues all over the world, and non-human animal suffering in factory farms and in the wild. We already have a good overview – with better understanding in some areas than others – of where altruists can cost-effectively reduce substantial suffering. Charitable interventions are commonly chosen according to whether they produce measurable impact in the years or decades to come. Unfortunately, altruistic interventions are rarely chosen with the whole future in mind, i.e. with a focus on reducing as much suffering as possible for the rest of time, until the heat death of the universe.¹³ This is potentially problematic, because we should expect the far future to contain vastly more suffering than the next decades, not only because there might be sentient beings around for millions or billions of years to come, but also because it is possible for Earth-originating life to eventually colonize space, which could multiply the total amount of sentient beings many times over. While it is important to reduce the suffering of sentient beings now, it seems unlikely that the most consequential intervention for the future of all sentience will also be the intervention that is best for reducing short-term suffering.

I think this is true, but also because the word “best” here refers to two very narrow peaks that have to coincide in a very large landscape. In contrast, I do not think it seems unlikely that the best, most robust interventions we can make to influence the long-term future are also highly robust and positive with respect to the short-term future, such as promoting concern for suffering as well as greater moral consideration of neglected beings.

And given that the probability of extinction (evaluated from now) increases over time, and hence that one should discount the value of influencing the long-term future of civilization by a certain factor, it in fact seems reasonable to choose actions that seem positive both in the short and long term.

Instead, as judged from the distant future, the most consequential development of our decade would more likely have something to do with novel technologies or the ways they will be used.

And when it comes to how technologies will be used, it is clear that influencing ideas matters a great deal. By analogy, we have also seen important technologies developed in the past, and yet ideas seem to have been no less significant, such as specific religions (e.g. Islam and Christianity) as well as political ideologies (e.g. communism and liberalism). One may, of course, argue that it is very difficult to influence ideas on a large scale, yet the same can be said about influencing technology. Indeed, influencing ideas, whether broadly or narrowly, might just be the best way to influence technology.

And yet, politics, science, economics and especially the media are biased towards short timescales. Politicians worry about elections, scientists worry about grant money, and private corporations need to work on things that produce a profit in the foreseeable future. We should therefore expect interventions targeted at the far future to be much more neglected than interventions targeted at short-term sources of suffering.

Admittedly, the far future is difficult to predict. If our models fail to account for all the right factors, our predictions may turn out very wrong. However, rather than trying to simulate in detail through everything that might happen all the way into the distant future – which would be a futile endeavor, needless to say – we should focus our altruistic efforts on influencing levers that remain agile and reactive to future developments. An example of such a lever is institutions that persist for decades or centuries. The US Constitution for instance still carries significant relevance in today’s world, even though it was formulated hundreds of years ago. Similarly, the people who founded the League of Nations after World War I did not succeed in preventing the next war, but they contributed to the founding and the charter of its successor organization, the United Nations, which still exerts geopolitical influence today. The actors who initially influenced the formation of these institutions as well as their values and principles, had a long-lasting impact.

In order to positively influence the future for hundreds of years, we fortunately do not need to predict the next hundreds of years in detail. Instead, all we need to predict is what type of institutions – or, more generally, stable and powerful decision-making agencies – are most likely to react to future developments maximally well.¹⁴

AI is the ultimate lever through which to influence the future. The goals of an artificial superintelligence would plausibly be much more stable than the values of human leaders or those enshrined in any constitution or charter. And a superintelligent AI would, with at least considerable likelihood, remain in control of the future not only for centuries, but for millions or even billions of years to come. In non-AI scenarios on the other hand, all the good things we achieve in the coming decade(s) will “dilute” over time, as current societies, with all their norms and institutions, change or collapse.

In a future where smarter-than-human artificial intelligence won’t be created, our altruistic impact – even if we manage to achieve a lot in greatly influencing this non-AI future – would be comparatively “capped” and insignificant when contrasted with the scenarios where our actions do affect the development of superintelligent AI (or how AI would act).¹⁵

I think this is another claim that is widely overstated, and which I have not seen a convincing case for. Again, this notion that “an artificial superintelligence”, a single machine with much greater cognitive powers than everything else, will emerge and be programmed to be subordinate to a single goal that it would be likely to preserve does not seem credible to me. Sure, we can easily imagine it as an abstract notion, but why should we think such a system will ever emerge? The creation of such a system is, I would argue, far from being a necessary, or even particularly likely, outcome of our creating ever more competent machines.

And even if such a system did exist, it is not even clear, as Robin Hanson has argued, that it would be significantly more likely to preserve its values than would a human civilization — not so much because one should expect humans to be highly successful at it, but rather because there are also reasons to think that it would be unlikely for such a “superintelligent AI” to do it (such as those mentioned in my note on Omohundro’s argument above, as well as those provided by Hanson, e.g. that “the values of AIs with protected values should still drift due to influence drift and competition”).

We should expect AI scenarios to not only contain the most stable lever we can imagine – the AI’s goal function which the AI will want to preserve carefully – but also the highest stakes.

Again, I do not think a convincing case has been made for either of these claims. Why would the stakes be higher than in a human-driven colonization, which we may expect, for evolutionary reasons, to be performed primarily by those who want to expand and colonize as much and as effectively as possible?

In comparison with non-AI scenarios, space colonization by superintelligent AI would turn the largest amount of matter and energy into complex computations.

It depends on what we mean by non-AI scenarios. Scenarios where humans use advanced tools, such as near-maximally fast spaceships and near-optimal specialized software, to fill up space with sentient beings at a near maximal rate is, I would argue, not only at least as conceivable but also at least as likely as similar scenarios brought about by the kind of AI Lukas seems to have in mind here.

In a best-case scenario, all these resources could be turned into a vast utopia full of happiness, which provides as strong incentive for us to get AI creation perfectly right. However, if the AI is equipped with insufficiently good values, or if it optimizes for random goals not intended by its creators, the outcome could also include astronomical amounts of suffering. In combination, these two reasons of highest influence/goal-stability and highest stakes build a strong case in favor of focusing our attention on AI scenarios.

Again, things could also go very wrong or very well with human-driven colonization, so there does not seem a big difference in this regard either.

While critics may object that all this emphasis on the astronomical stakes in AI scenarios appears unfairly Pascalian, it should be noted that AI is not a frivolous thought experiment where we invoke new kinds of physics to raise the stakes.

Right, but the kind of AI system envisioned here does, I would argue, rest on various, highly questionable conceptions of how a single system could grow, as well as what the design of future machines are likely to be like. And I would argue, again, that such a system is highly unlikely to emerge.

Smarter-than-human artificial intelligence and space colonization are both realistically possible and plausible developments that fit squarely into the laws of nature as we currently understand them.

A Bugatti appearing in the Stone Age also in some sense fits squarely into the laws of nature as we currently understand them. Yet that does not mean that such a car was likely to emerge in that time, once we consider the history and evolution of technology. Similarly, I would argue that the scenario Lukas seems to have hinted at throughout his piece is a lot less credible than what this appeal to compatibility with the laws of nature would seem to suggest.

If either of them turn out to be impossible, that would be a big surprise, and would suggest that we are fundamentally misunderstanding something about the way physical reality works. While the implications of smarter-than-human artificial intelligence are hard to grasp intuitively, the underlying reasons for singling out AI as a scenario to worry about are sound.

Well, I have tried to argue to the contrary here. Much more plausible would it be, I think, to argue that the scenario Lukas envisions is one scenario among others that warrants some priority.

As illustrated by Leó Szilárd’s lobbying for precautions around nuclear bombs well before the first such bombs were built, it is far from hopeless to prepare for disruptive new technologies in advance, before they are completed.

This text argued that altruists concerned about the quality of the future should [be] focusing their attention on futures where AI plays an important role.

I would say that the argument that has been made is much more narrow than that, since “AI” here is used in a relatively narrow sense in the first place, and because it is a very particular scenario involving such narrowly defined AI that Lukas has been focusing on the most here — as far as I can tell, it is a scenario where a single system takes over the world and determines the future based on a single, arduously preserved goal. There are many other scenarios we can envision in which AI, both in the ordinary sense as well as in the more narrow sense invoked here by Lukas, plays “an important role”, including scenarios involving human-driven space colonization.

This can mean many things. It does not mean that everyone should think about AI scenarios or technical work in AI alignment directly. Rather, it just means we should pick interventions to support according to their long-term consequences, and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI. Whether it is best to try to affect AI outcomes in a narrow and targeted way, or whether we should go for a broader strategy, depends on several factors and requires further study.

FRI has looked systematically into paths to impact for affecting AI outcomes with particular emphasis on preventing suffering, and we have come up with a few promising candidates. The following list presents some tentative proposals:

Worst-case AI safety

Differential intellectual progress

Promoting cooperation

Raising awareness of:

suffering-focused ethics

anti-speciesism

concern for the suffering of “weird” minds

It is important to note that human values may not affect the goals of an AI at all if researchers fail to solve the value-loading problem. Raising awareness of certain values may therefore be particularly impactful if it concerns groups likely to be in control of the goals of smarter-than-human artificial intelligence.

Further research is needed to flesh out these paths to impact in more detail, and to discover even more promising ways to affect AI outcomes.

Lukas writes about the implications of his argument that it means that “we should pick interventions to support according to their long-term consequences”. I agree with this completely. He then continues to write, “and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI”. And this claim, as I understand it, is what I would argue has not been justified. Again, to argue that one should grant it some priority, even significant priority, along with many other scenarios, is a plausible claim, but not, I would argue, that it should be granted greater priority than all other things.

And as for how we can best reduce suffering in the future, I would agree with pretty much all the proposals Lukas suggests, although I would argue that things like promoting concern for suffering and widening our moral circles (and we should do both) become even more important when we take other scenarios into consideration, such as human-driven colonization. In other words, these things seem even more robust and more positive when we also consider these other high-stakes scenarios.

Beyond that, I would also note that we likely have moral intuitions that make a notional rogue AI-takeover seem worse in expectation than what a more detached analysis relative to a more impartial moral ideal such as “reduce suffering” would suggest. Furthermore, it should be noted that many of those who focus most prominently on AI safety (for example, people at MIRI and FHI) seem to have values according to which it is important that humans maintain control or remain in existence, which may render their view that AI safety is the most important thing to focus on less relevant for other value systems than one might intuitively suppose.

To zoom out a bit, one way to think about my disagreement with Lukas, as well as the overall argument I have tried to make here, is that one can view Lukas’ line of argument as consisting of a certain number of steps where, in each of them, he describes a default scenario he believes to be highly probable, whereas I generally find these respective “default” scenarios quite improbable. And when one then combines our respective probabilities into a single measure of the probability that the grosser scenario Lukas envisions will occur, one gets a very different overall probability for Lukas and myself respectively. It may look something like this, assuming Lukas’ argument consists of eight steps in a conditional chain, each assigned a certain probability which then gets multiplied by the rest (i.e. P(A) * P(B|A) * P(C|B) * . . . ):

L: 0.98 * 0.96 * 0.93 * 0.99 * 0.95 * 0.99 * 0.97 * 0.98 ≈ 0.77

M: 0.1 * 0.3 * 0.01 * 0.1 * 0.2 * 0.08 * 0.2 * 0.4 ≈ 0.00000004

(These particular numbers are just more or less random ones I have picked for illustrative purposes, except that their approximate range do illustrate where I think the respective credences of Lukas and myself roughly lie with regard to most of the arguments discussed throughout this essay.)

And an important point to note here is that even if one disagrees both with Lukas and me on these respective probabilities, and instead picks credences roughly in-between those of Lukas and me, or indeed significantly closer to those of Lukas, the overall argument I have made here still stands, namely that it is far from clear that scenarios of the kind Lukas outlines are the most important ones to focus on to best reduce suffering. For then the probability of Lukas’ argument being correct/the probability that the scenario Lukas envisions will occur (one can think of it in both ways, I think, even if these formulations are not strictly equivalent) becomes something like the following:

In-between credence: 0.5^8 ≈ 0.004

Credences significantly closer to Lukas’: 0.75^8 ≈ 0.1

Which would not seem to support the conclusion that a focus on the AI-scenarios Lukas has outlined should dominate other scenarios we can envision (e.g. human-driven colonization).

Lukas ends his post on the following note:

As there is always the possibility that we have overlooked something or are misguided or misinformed, we should remain open-minded and periodically rethink the assumptions our current prioritization is based on.

With that, I could not agree more. In fact, this is in some sense the core point I have been trying to make here.

September 18, 2018