Thoughts on AI pause

Whether to push for an AI pause is a hotly debated question. This post contains some of my thoughts on the issue of AI pause and the discourse that surrounds it.


Contents

  1. The motivation for an AI pause
  2. My thoughts on AI pause, in brief
  3. My thoughts on AI pause discourse
  4. Massive moral urgency: Yes, in both categories of worst-case risks

The motivation for an AI pause

Generally speaking, it seems that the primary motivation behind pushing for an AI pause is that work on AI safety is far from where it needs to be for humanity to maintain control of future AI progress. Therefore, a pause is needed so that work on AI safety — and other related work, such as AI governance — can catch up with the pace of progress in AI capabilities.

My thoughts on AI pause, in brief

Whether it is worth pushing for an AI pause obviously depends on various factors. For one, it depends on the opportunity cost: what could we be doing otherwise? After all, even if one thinks that an AI pause is desirable, one might still have reservations about its tractability compared to other aims. And even if one thinks that an AI pause is both desirable and tractable, there might still be other aims and activities that are even more beneficial (in expectation), such as working on worst-case AI safety (Gloor, 2016; Yudkowsky, 2017; Baumann, 2018), or increasing the priority that people devote to reducing risks of astronomical suffering (s-risks) (Althaus & Gloor, 2016; Baumann, 2017, 2022; DiGiovanni, 2021).

Furthermore, there is the question of whether an AI pause would even be beneficial in the first place. This is a complicated question, and I will not explore it in detail here. (For a critical take, see “AI Pause Will Likely Backfire” by Nora Belrose.) Suffice it to say that, in my view, it seems highly uncertain whether any realistic AI pause would be beneficial overall — not just from a suffering-focused perspective, but from the perspective of virtually all impartial value systems. It seems to me that most advocates for AI pause are quite overconfident on this issue.

But to clarify, I am by no means opposed to advocating for an AI pause. It strikes me as something that one can reasonably conclude is helpful and worth doing (depending on one’s values and empirical judgement calls). But my current assessment is just that it is unlikely to be among the best ways to reduce future suffering, mainly because I view the alternative activities outlined above as being more promising, and because I suspect that most realistic AI pauses are unlikely to be clearly beneficial overall.

My thoughts on AI pause discourse

A related critical observation about much of the discourse around AI pause is that it tends toward a simplistic “doom vs. non-doom” dichotomy. That is, the picture that is conveyed seems to be that either humanity loses control of AI and goes extinct, which is bad; or humanity maintains control, which is good. And your probability of the former is your “p(doom)”.

Of course, one may argue that for strategic and communication purposes, it makes sense to simplify things and speak in such dichotomous terms. Yet the problem, in my view, is that this kind of picture is not accurate even to a first approximation. From an altruistic perspective, it is not remotely the case that “loss of control to AI” = “bad”, while “humans maintaining control” = “good”.

For example, if we are concerned with the reduction of s-risks (which is important by the lights of virtually all impartial value systems), we must compare the relative risks of “loss of control to AI” with the risks of “humans maintaining control” — however we define these rough categories. And sadly, it is not the case that “humans maintaining control” is associated with a negligible or trivial risk of worst-case outcomes. Indeed, it is not clear whether “humans maintaining control” is generally associated with better or worse prospects than “loss of control to AI” when it comes to s-risks.

In general, the question of whether a “human-controlled future” is better or worse with respect to reducing future suffering is a difficult one that has been discussed and debated at some length, and no clear consensus has emerged. As a case in point, Brian Tomasik places a 52 percent subjective probability on the claim that “Human-controlled AGI in expectation would result in less suffering than uncontrolled”.

This near-50/50 view stands in stark contrast to what often seems assumed as a core premise in much of the discourse surrounding AI pause, namely that a human-controlled future would obviously be far better (in expectation).

(Some reasons why one might be pessimistic regarding human-controlled futures can be found in the literature on human moral failings; see e.g. Cooper, 2018; Huemer, 2019; Kidd, 2020; Svoboda, 2022. Other reasons include basic competitive aims and dynamics that are likely to be found in a wide range of futures, including human-controlled ones; see e.g. Tomasik, 2013; Knutsson, 2022, sec. 3. See also Vinding, 2022.)

Massive moral urgency: Yes, in both categories of worst-case risks

There is a key point on which I agree strongly with advocates for an AI pause: there is a massive moral urgency in ensuring that we do not end up with horrific AI-controlled outcomes. Too few people appreciate this insight, and even fewer seem to be deeply moved by it.

At the same time, I think there is a similarly massive urgency in ensuring that we do not end up with horrific human-controlled outcomes. And humanity’s current trajectory is unfortunately not all that reassuring with respect to either of these broad classes of risks. (To be clear, this is not to say that an s-risk outcome is the most likely outcome in either of these two classes of future scenarios, but merely that the current trajectory looks highly suboptimal and concerning with respect to both of them.)

The upshot for me is that there is a roughly equal moral urgency in avoiding each of these categories of worst-case risks, and as hinted earlier, it seems doubtful to me that pushing for an AI pause is the best way to reduce these risks overall.

From AI to distant probes

The aim of this post is to present a hypothetical future scenario that challenges some of our basic assumptions and intuitions about our place in the cosmos.


Hypothetical future scenario: Earth-descendant probes

Imagine a future scenario in which AI progress continues, and where the ruling powers on Earth eventually send out advanced AI-driven probes to explore other star systems. The ultimate motives of these future Earth rulers may be mysterious and difficult to grasp from our current vantage point, yet we can nevertheless understand that their motives — in this hypothetical scenario — include the exploration of life forms that might have emerged or will emerge elsewhere in the universe. (The fact that there are already projects aimed at sending out (much less advanced) probes to other star systems is arguably some evidence of the plausibility of this future scenario.)

Such exploration may be considered important by these future Earth rulers for a number of reasons, but a prominent reason they consider it important is that it helps inform their broader strategy for the long-term future. By studying the frequency and character of nascent life elsewhere, they can build a better picture of the long-run future of life in the universe. This includes gaining a better picture of where and when these Earth descendants might eventually encounter other species — or probes — that are as advanced as themselves, and not least what these other advanced species might be like in terms of their motives and their propensities toward conflict or cooperation.

The Earth-descendant probes will take an especially strong interest in life forms that are relatively close to matching their own, functionally optimized level of technological development. Why? First of all, they wish to ensure that these ascending civilizations do not come to match their own level of technological sophistication, which the Earth-descendant probes will eventually take steps to prevent so as not to lose their power and influence over the future.

Second, they will study ascending civilizations because what takes place at that late “sub-optimized” stage may be particularly informative for estimating the nature of the fully optimized civilizations that the Earth-descendant probes might encounter in the future (at least this late sub-optimized stage seems more informative than earlier stages of life, where comparatively less change happens over time).

From the point of view of these distant life forms, the Earth-descendant probes are almost never visible, and when they occasionally are, they appear altogether mysterious. After all, the probes represent a highly advanced form of technology that the distant life forms do not yet understand, much less master, and the potential motives behind the study protocols of these rarely appearing probes are likewise difficult to make sense of from the outside. Thus, the distant life forms are being studied by the Earth-descendant probes without having any clear sense of their zoo-like condition.

Back to Earth

Now, what is the point of this hypothetical scenario? One point I wish to make is that this is not an absurd or unthinkable scenario. There are, I submit, no fantastical or unbelievable steps involved here, and we can hardly rule out that some version of this scenario could play out in the future. This is obviously not to say that it is the most likely future scenario, but merely that something like this scenario seems fairly plausible provided that technological development continues and eventually expands into space (perhaps around 1 to 10 percent likely?).

But what if we now make just one (theoretically) small change to this scenario such that Earth is no longer the origin of the advanced probes in question, but instead one of the perhaps many planets that are being visited and studied by advanced probes that originated elsewhere in the universe? Essentially, we are changing nothing in the scenario above, except for swapping which exact planet Earth happens to be.

Given the structural equivalence of these respective scenarios, we should hardly consider the swapped scenario to be much less plausible. Sure, we know for a fact that life has arisen on Earth, and hence the projection that Earth-originating life might eventually give rise to advanced probes is not entirely speculative. Yet there is a countervailing consideration that suggests that — conditional on a scenario equivalent to the one described above occurring — Earth is unlikely to be the first planet to give rise to advanced space probes, and is instead more likely to be observed by probes from elsewhere. 

The reason is simply that Earth is but one planet, whereas there are many other planets from which probes could have been sent to study Earth. For example, in a scenario in which a single civilization creates advanced probes that eventually go out and explore, say, a thousand other planets with life at roughly our stage of development (observed at different points in time), we would have a 1 in 1,001 chance of being that first exploring civilization — and a 1,000 in 1,001 chance of being an observed one, under this assumed scenario.

Indeed, even if the exploring civilization in this kind of scenario only ever visits, say, two other planets with life at roughly our stage, we would still be more likely to be among the observed ones than that first observing one (2 in 3 versus 1 in 3). Thus, whatever probability we assign to the hypothetical future scenario in which Earth-descendant space probes observe other life forms at roughly our stage, we should arguably assign a greater probability to a scenario in which we are being observed by similar such probes.
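To make the simple arithmetic behind these examples explicit, here is a minimal sketch in Python. The setup, with exactly one probe-sending civilization and a fixed number of observed planets, is just the illustrative assumption used in the examples above:

```python
# Minimal sketch of the arithmetic above. Illustrative assumption: exactly one
# civilization sends out probes and observes n_observed other planets with life
# at roughly our stage, and we could equally well be any of the n_observed + 1
# parties involved.

def p_we_are_the_observer(n_observed: int) -> float:
    """Probability of being the single probe-sending civilization."""
    return 1 / (n_observed + 1)

def p_we_are_observed(n_observed: int) -> float:
    """Probability of being one of the observed civilizations."""
    return n_observed / (n_observed + 1)

for n in (2, 1000):
    print(n, p_we_are_the_observer(n), p_we_are_observed(n))
# n = 2:    1/3 vs. 2/3, as in the two-planet example above
# n = 1000: ~0.001 vs. ~0.999, as in the thousand-planet example above
```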

Nevertheless, I think many of us will intuitively think just the opposite, namely that the scenario involving Earth-descendant probes observing others seems far more plausible than the scenario in which we are currently being observed by foreign probes. Indeed, many of us intuitively find the foreign-probes scenario to be quite ridiculous. (That is also largely the attitude that is expressed in leading scholarly books on the Fermi paradox, with scant justification.)

Yet this complete dismissal is difficult to square with the apparent plausibility — or at least the non-ridiculousness — of the “Earth-descendant probes observing others” scenario, as well as the seemingly greater plausibility of the foreign probe scenario compared to the “Earth-descendant probes observing others” scenario. There appears to be a breakdown of the transitivity of plausibility and ridiculousness at the level of our intuitions.

What explains this inconsistency?

I can only speculate on what explains this apparent inconsistency, but I suspect that various biases and cultural factors are part of the explanation.

For example, wishful thinking could well play a role: we may prefer a scenario in which Earth’s descendants will be the most advanced species in the universe to a scenario in which we are a relatively late-coming and feeble party without any unique influence over the future. This could in turn cause us to ignore or downplay any considerations that speak against our preferred beliefs. And, of course, apart from our relative feebleness, being observed by an apparently indifferent superpower that does not intervene to prevent even the most gratuitous suffering would seem like bad news as well.

Perhaps more significantly, there is the force of cultural sentiment and social stigma. Most of us have grown up in a culture that openly ridicules the idea of an extraterrestrial presence around Earth. Taking that idea seriously has effectively been just another way of saying that you are a dumb-dumb (or worse), and few of us want to be seen in that way. For the human mind, that is a pressure so strong that it can move continents, and even block mere open-mindedness.

Given the unreasonable effectiveness of such cultural forces in schooling our intuitions, many of us intuitively “just know” in our bones that the idea of an extraterrestrial presence around Earth is ridiculous, with little need to invoke actual cogent reasons.

To be clear, my point here is not that we should positively believe in such a foreign presence, but merely that we may need to revise our intuitive assessment of this possibility, or at least question whether our intuitions and our level of open-mindedness toward this possibility are truly well-grounded.

What might we infer about optimized futures?

It is plausible to assume that technology will keep on advancing along various dimensions until it hits fundamental physical limits. We may refer to futures that involve such maxed-out technological development as “optimized futures”.

My aim in this post is to explore what we might be able to infer about optimized futures. Most of all, my aim is to advance this as an important question that is worth exploring further.


Contents

  1. Optimized futures: End-state technologies in key domains
  2. Why optimized futures are plausible
  3. Why optimized futures are worth exploring
  4. What can we say about optimized futures?
    1. Humanity may be close to (at least some) end-state technologies
    2. Optimized civilizations may be highly interested in near-optimized civilizations
    3. Strong technological convergence across civilizations?
    4. If technology stabilizes at an optimum, what might change?
    5. Information that says something about other optimized civilizations as an extremely coveted resource?
  5. Practical implications?
    1. Prioritizing values and institutions rather than pushing for technological progress?
    2. More research
  6. Conclusion
  7. Acknowledgments

Optimized futures: End-state technologies in key domains

The defining feature of optimized futures is that they entail end-state technologies that cannot be further improved in various key domains. Some examples of these domains include computing power, data storage, speed of travel, maneuverability, materials technology, precision manufacturing, and so on.

Of course, there may be significant tradeoffs between optimization across these respective domains. Likewise, there could be forms of “ultimate optimization” that are only feasible at an impractical cost — say, at extreme energy levels. Yet these complications are not crucial in this context. What I mean by “optimized futures” are futures that involve practically optimal technologies within key domains (such as those listed above).

Why optimized futures are plausible

There are both theoretical and empirical reasons to think that optimized futures are plausible (by which I here mean that they are at least somewhat probable — perhaps more than 10 percent likely).

Theoretically, if the future contains advanced goal-driven agents, we should generally expect those agents to want to achieve their goals in the most efficient ways possible. This in turn predicts continual progress toward ever more efficient technologies, at least as long as such progress is cost-effective.

Empirically, we have an extensive record of goal-oriented agents trying to improve their technology so as to better achieve their aims. Humanity has gone from having virtually no technology to creating a modern society surrounded by advanced technologies of various kinds. And even in our modern age of advanced technology, we still observe persistent incentives and trends toward further improvements in many domains of technology — toward better computers, robots, energy technology, and so on.

It is worth noting that the technological progress we have observed throughout human history has generally not been the product of some overarching collective plan that was deliberately aimed at technological progress. Instead, technological progress has in some sense been more robust than that, since even in the absence of any overarching plan, progress has happened as the result of ordinary demands and desires — for faster computers, faster and safer transportation, cheaper energy, etc.

This robustness is a further reason to think that optimized futures are plausible: even without any overarching plan aimed toward such a future, and even without any individual human necessarily wanting continued technological development leading to an optimized future, we might still be pulled in that direction all the same. And, of course, this point about plausibility applies to more than just humans: it applies to any set of agents who will be — or have been — structuring themselves in a sufficiently similar way so as to allow their everyday demands to push them toward continued technological development.

An objection against the plausibility of optimized futures is that there might be a lot of hidden potential for progress far beyond what our current understanding of physics seems to allow. However, such hidden potential would presumably be discovered eventually, and it seems probable that such hidden potential would likewise be exhausted at some point, even if it may happen later and at more extreme limits than we currently envision. That is, the broad claim that there will ultimately be some fundamental limits to technological development is not predicated on the more narrow claim that our current understanding of those limits is necessarily correct; the broader claim is robust to quite substantial extensions of currently envisioned limits. Indeed, the claim that there will be no fundamental limits to future technological development overall seems a stronger and less empirically grounded claim than does the claim that there will be such limits (cf. Lloyd, 2000; Krauss & Starkman, 2004).

Why optimized futures are worth exploring

The plausibility of optimized futures is one reason to explore them further, and arguably a sufficient reason in itself. Another reason is the scope of such futures: the futures that contain the largest numbers of sentient beings will most likely be optimized futures, suggesting that we have good reason to pay disproportionate attention to such futures, beyond what their degree of plausibility might suggest.

Optimized futures are also worth exploring given that they seem to be a likely point of convergence for many different kinds of technological civilizations. For example, an optimized future seems a plausible outcome of both human-controlled and AI-controlled Earth-originating civilizations, and it likewise seems a plausible outcome of advanced alien civilizations. Thus, a better understanding of optimized futures can potentially apply robustly to many different kinds of future scenarios.

An additional reason it is worth exploring optimized futures is that they overall seem quite neglected, especially given how plausible and consequential such futures appear to be. While some efforts have been made to clarify the physical limits of technology (see e.g. Sandberg, 1999; Lloyd, 2000; Krauss & Starkman, 2004), almost no work has been done on the likely trajectories and motives of civilizations with optimized technology, at least to my knowledge.

Lastly, the assumption of optimized technology is a rather strong constraint that might enable us to say quite a lot about futures that conform to that assumption, suggesting that this could be a fruitful perspective to adopt in our attempts to think about and predict the future.

What can we say about optimized futures?

The question of what we can say about optimized futures is a big one that deserves elaborate analysis. In this section, I will merely raise some preliminary points and speculative reflections.

Humanity may be close to (at least some) end-state technologies

One point that is worth highlighting is that a continuation of current rates of progress seems to imply that humanity could develop end-state technologies in information processing power within a few hundred years, perhaps 250 years at most (if current growth rates persist and assuming that our current understanding of the relevant physics is largely correct).

So at least in this important respect, and under the assumption of continued steady growth, humanity is surprisingly close to reaching an optimized future (cf. Lloyd, 2000).
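To make the rough arithmetic behind this claim explicit, here is a minimal sketch. The specific numbers are order-of-magnitude assumptions in the spirit of Lloyd (2000), not authoritative figures:

```python
import math

# Illustrative order-of-magnitude assumptions (not authoritative figures):
current_ops_per_sec = 1e18   # roughly a present-day exascale machine
limit_ops_per_sec = 1e50     # rough Lloyd-style physical limit for ~1 kg of matter
doubling_time_years = 2.0    # assumed continued doubling time for computing power

doublings_needed = math.log2(limit_ops_per_sec / current_ops_per_sec)
years_needed = doublings_needed * doubling_time_years
print(f"~{doublings_needed:.0f} doublings, i.e. ~{years_needed:.0f} years")
# With these assumptions: ~106 doublings, i.e. roughly two centuries of sustained growth.
```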

Optimized civilizations may be highly interested in near-optimized civilizations

Such potential closeness to an optimized future could have significant implications in various ways. For example, if, hypothetically, there exists an older civilization that has already reached a state of optimized technology, any younger civilization that begins to approach optimized technologies within the same cosmic region would likely be of great interest to that older civilization.

One reason it might be of interest is that the optimized technologies of the younger civilization could potentially become competitive with the optimized technologies of the older civilization, and hence the older civilization may see a looming threat in the younger civilization’s advance toward such technologies. After all, since optimized technologies would represent a kind of upper bound of technological development, it is plausible that different instances of such technologies could be competitive with each other regardless of their origins.

Another reason the younger civilization might be of interest is that its trajectory could provide valuable information regarding the likely trajectories and goals of distant optimized civilizations that the older civilization may encounter in the future. (More on this point here.)

Taken together, these considerations suggest that if a given civilization is approaching optimized technology, and if there is an older civilization with optimized technology in its vicinity, this older civilization should take an increasing interest in this younger civilization so as to learn about it before the older civilization might have to permanently halt the development of the younger one.

Strong technological convergence across civilizations?

Another implication of optimized futures is that the technology of advanced civilizations across the universe might be remarkably convergent. Indeed, there are already many examples of convergent evolution in biology on Earth (e.g. eyes and large brains evolving several times independently). Likewise, many cases of convergence are found in cultural evolution, both in early history (e.g. the independent emergence of farming, cities, and writing across the globe) and in recent history (e.g. independent discoveries in science and mathematics).

Yet the degree of convergence could well be even more pronounced in the case of the end-state technologies of advanced civilizations. After all, this is a case where highly advanced agents are bumping up against the same fundamental constraints, and the optimal engineering solutions in the face of these constraints will likely converge toward the same relatively narrow space of optimal designs — or at least toward the same narrow frontier of optimal designs given potential tradeoffs between different abilities.

In other words, the technologies of advanced civilizations might be far more similar and more firmly dictated by fundamental physical limits than we intuitively expect, especially given that we in our current world are used to seeing continually changing and improving technologies.

If technology stabilizes at an optimum, what might change?

The plausible convergence and stabilization of technological hardware also raises the interesting question of what, if anything, might change and vary in optimized futures.

This question can be understood in at least two distinct ways: what might change or vary across different optimized civilizations, and what might change over time within such civilizations? And note that prevalent change of the one kind need not imply prevalent change of the other kind. For example, it is conceivable that there might be great variation across civilizations, yet virtually no change in goals and values over time within civilizations (cf. “lock-in scenarios”).

Conversely, it is conceivable that goals and values change greatly over time within all optimized civilizations, yet such change could in principle still be convergent across civilizations, such that optimized civilizations tend to undergo roughly the same pattern of changes over time (though such convergence admittedly seems unlikely conditional on there being great changes over time in all optimized civilizations).

If we assume that technological hardware becomes roughly fixed, what might still change and vary — both over time and across different civilizations — includes the following (I am not claiming that this is an exhaustive list):

  • Space expansion: Civilizations might expand into space so as to acquire more resources; and civilizations may differ greatly in terms of how much space they manage to acquire.
  • More or different information: Knowledge may improve or differ over time and space; even if fundamental physics gets solved fairly quickly, there could still be knowledge to gain about, for example, how other civilizations tend to develop.
    • There would presumably also be optimization for information that is useful and actionable. After all, even a technologically optimized probe would still have limited memory, and hence there would be a need to fill this memory with the most relevant information given its tasks and storage capacity.
  • Different algorithms: The way in which information is structured, distributed, and processed might evolve and vary over time and across civilizations (though it is also conceivable that algorithms will ultimately converge toward a relatively narrow space of optima).
  • Different goals and values: As mentioned above, goals and values might change and vary, such as due to internal or external competition, or (perhaps less likely) through processes of reflection.

In other words, even if everyone has — or is — practically the same “iPhone End-State”, what is running on these iPhone End-States, and how many of them there are, may still vary greatly, both across civilizations and over time. And these distinct dimensions of variation could well become the main focus of optimized civilizations, plausibly becoming the main dimensions on which civilizations seek to develop and compete.

Note also that there may be conflicts between improvements along these respective dimensions. For example, perhaps the most aggressive forms of space expansion could undermine the goal of gaining useful information about how other civilizations tend to develop, and hence advanced civilizations might avoid or delay aggressive expansion if the information in question would be sufficiently valuable (cf. the “info gain motive”). Or perhaps aggressive expansion would pose serious risks at the level of a civilization’s internal coordination and control, thereby risking a drift in goals and values.

In general, it seems worth trying to understand what might be the most coveted resources and the most prioritized domains of development for civilizations with optimized technology. 

Information that says something about other optimized civilizations as an extremely coveted resource?

As hinted above, one of the key objectives of a civilization with optimized technology might be to learn, directly or indirectly, about other civilizations that it could encounter in the future. After all, if a civilization manages to both gain control of optimized technology and avoid destructive internal conflicts, the greatest threat to its apex status over time will likely be other civilizations with optimized technology. More generally, the main determinant of an optimized civilization’s success in achieving its goals — whether it can maintain an unrivaled apex status or not — could well be its ability to predict and interact gainfully with other optimized civilizations.

Thus, the most precious resource for any civilization with optimized technology might be information that can prepare this civilization for better exchanges with other optimized agents, whether those exchanges end up being cooperative, competitive, or outright aggressive. In particular, since the technology of optimized civilizations is likely to be highly convergent, the most interesting features to understand about other civilizations might be what kinds of institutions, values, decision procedures, and so on they end up adopting — the kinds of features that seem more contingent.

But again, I should stress that I mention these possibilities as speculative conjectures that seem worth exploring, not as confident predictions.

Practical implications?

In this section, I will briefly speculate on the implications of the prospect of optimized futures. Specifically, what might this prospect imply in terms of how we can best influence the future?

Prioritizing values and institutions rather than pushing for technological progress?

One implication is that there may be limited long-term payoffs in pushing for better technology per se, and that it might make more sense to prioritize the improvement of other factors, such as values and institutions. That is, if the future is in any case likely to be headed toward some technological optimum, and if the values and institutions (etc.) that will run this optimal technology are more contingent and “up for grabs”, then it arguably makes sense to prioritize those more contingent aspects.

To be clear, this is not to say that values and institutions will not also be subject to significant optimization pressures that push them in certain directions, but these pressures will plausibly still be weaker by comparison. After all, a wide range of values will imply a convergent incentive to create optimized technology, yet optimized technology seems compatible with a wide range of values and institutions. And it is not clear that there is a similarly strong pull toward some “optimized” set of values or institutions given optimized technology.

This perspective is arguably also supported by recent history. For example, we have seen technology improve greatly, with computing power heading in a clear upward direction over the past decades. Yet if we look at our values and institutions, it is much less clear whether they have moved in any particular direction over time, let alone an upward direction. Our values and institutions seem to have faced much less of a directional pressure compared to our technology.

More research

Perhaps one of the best things we can do to make better decisions with respect to optimized futures is to do research on such futures. The following are some broad questions that might be worth exploring:

  • What are the likely features and trajectories of optimized futures?
    • Are optimized futures likely to involve conflicts between different optimized civilizations?
    • Other things being equal, is a smaller or a larger number of optimized civilizations generally better for reducing risks of large-scale conflicts?
    • More broadly, is a smaller or larger number of optimized civilizations better for reducing future suffering?
  • What might the likely features and trajectories of optimized futures imply in terms of how we can best influence the future?
  • Are there some values or cooperation mechanisms that would be particularly beneficial to instill in optimized technology?
    • If so, what might they be, and how can we best work to ensure their (eventual) implementation?

Conclusion

The future might in some ways be more predictable than we imagine. I am not claiming to have drawn any clear or significant conclusions about how optimized futures are likely to unfold; I have mostly aired various conjectures. But I do think the question is valuable, and that it may provide a helpful lens for exploring how we can best impact the future.

Acknowledgments

Thanks to Tobias Baumann for helpful comments.

What does a future dominated by AI imply?

Among altruists working to reduce risks of bad outcomes due to AI, I sometimes get the impression that there is a rather quick step from the premise “the future will be dominated by AI” to a practical position that roughly holds that “technical AI safety research aimed at reducing risks associated with fast takeoff scenarios is the best way to prevent bad AI outcomes”.

I am not saying that this is the most common view among those who work to prevent bad outcomes due to AI. Nor am I saying that the practical position outlined above is necessarily an unreasonable one. But I think I have seen (something like) this sentiment assumed often enough for it to be worthy of a critique. My aim in this post is to argue that there are many other practical positions that one could reasonably adopt based on that same starting premise.


Contents

  1. “A future dominated by AI” can mean many things
    1. “AI” can mean many things
    2. “Dominated by” can mean many things
    3. Combinations of many things
  2. Future AI dominance does not imply fast AI development
  3. Fast AI development does not imply concentrated AI development
  4. “A future dominated by AI” does not mean that either “technical AI safety” or “AI governance” is most promising
  5. Concluding clarification

“A future dominated by AI” can mean many things

“AI” can mean many things

It is worth noting that the premise that “the future will be dominated by AI” covers a wide range of scenarios. After all, it covers scenarios in which advanced machine learning software is in power; scenarios in which brain emulations are in power; and scenarios in which humans stay in power while gradually upgrading their brains with gene technologies, brain implants, nanobots, etc., such that their intelligence would eventually be considered (mostly) artificial intelligence by our standards. And there are surely more categories of AI than just the three broad ones outlined above.

“Dominated by” can mean many things

The words “in power” and “dominated by” can likewise mean many different things. For example, they could mean anything from “mostly in power” and “mostly dominated by” to “absolutely in power” and “absolutely dominated by”. And these respective terms cover a surprisingly wide spectrum.

After all, a government in a democratic society could reasonably be claimed to be “mostly in power” in that society, and a future AI system that is given similar levels of power could likewise be said to be “mostly in power” in the society it governs. By contrast, even the government of North Korea falls considerably short of being “absolutely in power” on a strong definition of that term, which hints at the wide spectrum of meanings covered by the general term “in power”.

Note that the contrast above actually hints at two distinct (though related) dimensions on which different meanings of “in power” can vary. One has to do with the level of power — i.e. whether one has more or less of it — while the other has to do with how the power is exercised, e.g. whether it is democratic or totalitarian in nature.

Thus, “a future society with AI in power” could mean a future in which AI possesses most of the power in a democratically elected government, or it could mean a future in which AI possesses total power with no bounds except the limits of physics.

Combinations of many things

Lastly, we can make a combinatorial extension of the points made above. That is, we should be aware that “a future dominated by AI” could — and is perhaps likely to — combine different kinds of AI. For instance, one could imagine futures that contain significant numbers of AIs from each of the three broad categories of AI mentioned above.

Additionally, these AIs could exercise power in distinct ways and in varying degrees across different parts of the world. For example, some parts of the world might make decisions in ways that resemble modern democratic processes, with power distributed among many actors, while other parts of the world might make decisions in ways that resemble autocratic decision procedures.

Such a diversity of power structures and decision procedures may be especially likely in scenarios that involve large-scale space expansion, since different parts of the world would then eventually be causally disconnected, and since a larger volume of AI systems presumably renders greater variation more likely in general.

These points hint at the truly vast space of possible futures covered by a term such as “a future dominated by AI”.

Future AI dominance does not imply fast AI development

Another conceptual point is that “a future dominated by AI” does not imply that technological or social progress toward such a future will happen soon or that it will occur suddenly. Furthermore, I think one could reasonably argue that such an imminent or sudden change is quite unlikely (though it obviously becomes more likely the broader our conception of “a future dominated by AI” is).

An elaborate justification for my low credence in such sudden change is beyond the scope of this post, though I can at least note that part of the reason for my skepticism is that I think trends and projections in both computer hardware and economic growth speak against such rapid future change. (For more reasons to be skeptical, see Reflections on Intelligence and “A Contra AI FOOM Reading List”.)

A future dominated by AI could emerge through a very gradual process that occurs over many decades or even hundreds of years (conditional on it ever happening). And AI scenarios involving such gradual development could well be both highly likely and highly consequential.

An objection against focusing on such slow-growth scenarios might be that scenarios involving rapid change have higher stakes, and hence they are more worth prioritizing. But it is not clear to me why this should be the case. As I have noted elsewhere, a so-called value lock-in could also happen in a slow-growth scenario, and the probability of success — and of avoiding accidental harm — may well be higher in slow-growth scenarios (cf. “Which World Gets Saved”).

The upshot could thus be the very opposite, namely that it is ultimately more promising to focus on scenarios with relatively steady growth in AI capabilities and power. (I am not claiming that this focus is in fact more promising; my point is simply that it is not obvious and that there are good reasons to question a strong focus on fast-growth scenarios.)

Fast AI development does not imply concentrated AI development

Likewise, even if we grant that the pace of AI development will increase rapidly, it does not follow that this growth will be concentrated in a single (or a few) AI system(s), as opposed to being widely distributed, akin to an entire economy of machines that grow fast together. This issue of centralized versus distributed growth was in fact the main point of contention in the Hanson-Yudkowsky FOOM debate; and I agree with Hanson that distributed growth is considerably more likely.

Similar to the argument outlined in the previous section, one could argue that there is a wager to focus on scenarios that entail highly concentrated growth over those that involve highly distributed growth, even if the latter may be more likely. Perhaps the main argument in favor of this view is that it seems that our impact can be much greater if we manage to influence a single system that will eventually gain power compared to if our influence is dispersed across countless systems.

Yet I think there are good reasons to doubt that argument. One reason is that the strategy of influencing such a single AI system may require us to identify that system in advance, which might be a difficult bet that we could easily get wrong. In other words, our expected influence may be greatly reduced by the risk that we are wrong about which systems are most likely to gain power. Moreover, there might be similar and ultimately more promising levers for “concentrated influence” in scenarios that involve more distributed growth and power. Such levers may include formal institutions and societal values, both of which could exert a significant influence on the decisions of a large number of agents simultaneously — by affecting the norms, laws, and social equilibria under which they interact.

“A future dominated by AI” does not mean that either “technical AI safety” or “AI governance” is most promising

Another impression I have is that we sometimes tacitly assume that work on “avoiding bad AI outcomes” will fall under either “technical AI safety” or “AI governance”, or at least that it will mostly fall within these two categories. But I do not think that this is the case, partly for the reasons alluded to above.

In particular, it seems to me that we sometimes assume that the aim of influencing “AI outcomes” is necessarily best pursued in ways that pertain quite directly to AI today. Yet why should we assume this to be the case? After all, it seems that there are many plausible alternatives.

For example, one could think that it is generally better to pursue broad investments so as to build flexible resources that make us better able to tackle these problems down the line — e.g. investments toward general movement building and toward increasing the amount of money that we will be able to spend later, when we might be better informed and have better opportunities to pursue direct work.

A complementary option is to focus on the broader contextual factors hinted at in the previous section. That is, rather than focusing primarily on the design of the AI systems themselves, or on the laws that directly govern their development, one may focus on influencing the wider context in which they will be developed and deployed — e.g. general values, institutions, diplomatic relations, collective knowledge and wisdom, etc. After all, the broader context in which AI systems will be developed and put into action could well prove critical to the outcomes that future AI systems will eventually create.

Note that I am by no means saying that work on technical AI safety or AI governance is not worth pursuing. My point is merely that these other strategies focused on building flexible resources and influencing broader contextual factors should not be overlooked as ways to influence “a future dominated by AI”. Indeed, I believe that these strategies are among the most promising ways in which we can exert such a beneficial influence at this point.

Concluding clarification

On a final note, I should clarify that the main conceptual points I have been trying to make in this post likely do not contradict the explicitly endorsed views of anyone who works to reduce risks from AI. The objects of my concern are more (what I perceive to be) certain implicit models and commonly employed terminologies that I worry may distort how we think and talk about these issues.

Specifically, it seems to me that there might be a sort of collective availability heuristic at work, through which we continually boost the salience of a particular AI narrative — or a certain class of AI scenarios — along with a certain terminology that has come to be associated with that narrative (e.g. ‘AI takeoff’, ‘transformative AI’, etc.). Yet if we change our assumptions a bit, or replace the most salient narrative with another plausible one, we might find that this terminology does not necessarily make a lot of sense anymore. We might find that our typical ways of thinking about AI outcomes may be resting on a lot of implicit assumptions that are more questionable and more narrow than we tend to realize.

Some reasons not to expect a growth explosion

Many people expect global economic growth to accelerate in the future, with growth rates that are not just significantly higher than those of today, but orders of magnitude higher.

The following are some of the main reasons I do not consider a growth explosion to be the most likely future outcome.


Contents

  1. Most economists do not expect a growth explosion
  2. Rates of innovation and progress in science have slowed down
  3. Moore’s law is coming to an end
  4. The growth of supercomputers has been slowing down for years
  5. Many of our technologies cannot get orders of magnitude more efficient
  6. Three objections in brief

Most economists do not expect a growth explosion

Estimates of the future of economic growth from economists themselves generally predict a continual decline in growth rates. For instance, one “review of publicly available projections of GDP per capita over long time horizons” concluded that growth will most likely continue to decline in most countries in the coming decades. A report from PwC arrived at broadly similar projections.

Some accessible books that explore economic growth in the past and explain why it is reasonable to expect stagnant growth rates in the future include Robert J. Gordon’s The Rise and Fall of American Growth (short version) and Tyler Cowen’s The Great Stagnation (synopsis).

It is true that there are some economists who expect growth rates to be several orders of magnitude higher in the future, but these are generally outliers. Robin Hanson suggests that such a growth explosion is likely in his book The Age of Em, which, to give some context, fellow economist Bryan Caplan calls “the single craziest claim” of the book. Caplan further writes that Hanson’s arguments for such growth expectations were “astoundingly weak”.

The point here is not that the general opinion of economists is by any means a decisive reason to reject a growth explosion (as the most likely outcome). The point is merely that it represents a significant reason to doubt an imminent growth explosion, and that it is not in fact those who doubt a rapid rise in growth rates who are the consensus-defying contrarians (and in terms of imminence, it is worth noting that even Robin Hanson does not expect a growth explosion within the next couple of decades).

Rates of innovation and progress in science have slowed down

See Bloom et al.’s Are Ideas Getting Harder to Find? and Cowen & Southwood’s Is the rate of scientific progress slowing down?

Moore’s law is coming to an end

One of the main reasons to expect a growth acceleration in the future is the promise of information technology. And economists, including Gordon and Cowen mentioned above, indeed agree that information technology has been a key driver of the growth we have seen in recent decades. But the problem is that we have strong theoretical reasons to expect the underlying trend that has been driving most progress in information technology since the 1960s — i.e. Moore’s law — to come to an end within the next few years.

And while it may be that other hardware paradigms will replace silicon chips as we know them, and continue the by now familiar growth in information technology, we must admit that it is quite unclear whether this will happen, especially since we are already lagging noticeably behind this trend line.

One may object that this is just a matter of hardware, and that the real growth in information technology lies in software. But a problem with this claim is that, empirically, growth in software seems largely determined by growth in hardware.

The growth of supercomputers has been slowing down for years

The performance trajectory of the 500 fastest supercomputers in the world conforms well to the pattern we should expect given that we are nearing the end of Moore’s law:


The 500th fastest supercomputer in the world was on a clear exponential trajectory from the early 1990s to 2010, after which its performance growth has steadily declined. Roughly the same holds true of both the fastest supercomputer and the combined performance of the 500 fastest: a clear exponential trajectory from the early 1990s to around 2013, after which performance has diverged ever further from the previous trajectory. Indeed, the divergence is now so large that the combined performance of the 500 fastest supercomputers falls below what a 1993-2013 extrapolation would predict for the single fastest supercomputer alone.
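For readers who want to check this kind of divergence themselves, the following is a minimal sketch of the underlying calculation. The function is merely illustrative, and no TOP500 figures are hard-coded; the year and performance series would have to be supplied from the published lists:

```python
import numpy as np

def ratio_to_exponential_trend(years, perf, fit_window=(1993, 2013)):
    """Fit an exponential trend (linear in log space) to `perf` over `fit_window`,
    then return, for each year, the ratio of actual performance to the extrapolated
    trend. Ratios drifting well below 1 after the fit window indicate the kind of
    divergence described above."""
    years = np.asarray(years, dtype=float)
    perf = np.asarray(perf, dtype=float)
    mask = (years >= fit_window[0]) & (years <= fit_window[1])
    slope, intercept = np.polyfit(years[mask], np.log10(perf[mask]), 1)
    extrapolated = 10 ** (slope * years + intercept)
    return perf / extrapolated
```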

Many of our technologies cannot get orders of magnitude more efficient

This point is perhaps most elaborately explored in Robert J. Gordon’s book mentioned above: it seems that we have already reaped much of the low-hanging fruit in terms of technological innovation, and in some respects it is impossible to improve things much further.

Energy efficiency is an obvious example, as many of our machines and energy harvesting technologies have already reached a significant fraction of the maximum possible efficiency. For instance, electric pumps and motors tend to have around 90 percent energy efficiency, while the efficiency of the best solar panels is above 40 percent. Many of our technologies thus cannot be made orders of magnitude more efficient, and many of them can at most be marginally improved, simply because they have reached the ceiling of hard physical limits.
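To make the quantitative point explicit, here is a trivial sketch of the maximum remaining headroom for a technology near a hard efficiency ceiling. The numbers simply restate the rough figures mentioned above:

```python
def max_improvement_factor(current_efficiency: float, ceiling: float = 1.0) -> float:
    """Largest possible multiplicative efficiency gain, given a hard ceiling."""
    return ceiling / current_efficiency

print(max_improvement_factor(0.90))  # a ~90% efficient motor: at most ~1.11x better
print(max_improvement_factor(0.40))  # a ~40% efficient solar cell: at most 2.5x, even
                                     # against a (generous) 100 percent ceiling
```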

Three objections in brief

#1. What about the exponential growth in the compute of the largest AI training runs from 2012-2018?

This is indeed a data point in the other direction. Note, however, that this growth does not appear to have continued after 2018. Moreover, much of this growth seems to have been unsustainable. For example, DeepMind lost more than a billion dollars in 2016-2018, with the loss getting greater each year: “$154 million in 2016, $341 million in 2017, $572 million in 2018”. And the loss was apparently even greater in 2019.

#2. What about the Open Philanthropy post in which David Roodman presented a diffusion model of future growth that predicted much higher growth rates?

First, I think that model overlooks most of the points made above. Second, I think the following figure from Roodman’s article is a strong indication of how poorly the model fits: the growth rates in 1600-1970 fall almost entirely in the high percentiles of the model, while the growth rates in 1980-2019 all fall in the low percentiles, and generally in ever lower percentiles as time progresses. That is a strong sign that the model does not capture our actual trajectory, and that the fit is getting worse over time.

[Figure from Roodman’s article: historical growth rates of gross world product plotted against the model’s predictive percentiles.]
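As an aside, this kind of calibration worry can be checked more formally: if the model captured the actual trajectory, the observed growth rates should fall in roughly uniformly distributed percentiles of the model’s predictive distribution over time, rather than drifting from high toward ever lower percentiles. The following is a minimal sketch of that check, with hypothetical inputs, since I am not reproducing Roodman’s model samples here:

```python
import numpy as np

def observed_percentiles(model_samples_by_year, observed_by_year):
    """For each year, the fraction of the model's sampled growth rates that fall
    below the observed growth rate. Inputs are hypothetical placeholders: dicts
    mapping year -> array of model-sampled growth rates, and year -> observed rate."""
    return {
        year: float(np.mean(np.asarray(model_samples_by_year[year]) < observed))
        for year, observed in observed_by_year.items()
    }
```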

#3. We have a wager to give much more weight to high-growth scenarios.

First, I think it is questionable whether scenarios with higher growth rates merit greater priority (e.g. a so-called value lock-in could also emerge in slow-growth scenarios, and it may be more feasible to influence slow-growth scenarios because they give us more time to acquire the requisite insights and resources to exert a significant and robustly positive influence). And it is less clear still that scenarios with higher growth merit much greater priority than scenarios with lower growth rates. But even if we grant that high-growth scenarios do merit greater priority, this should not change the bare epistemic credence we assign to different scenarios. Our descriptive picture should not be distorted by such priority claims.

Effective altruism and common sense

Thomas Sowell once called Milton Friedman “one of those rare thinkers who had both genius and common sense”.

I am not here interested in Sowell’s claim about Friedman, but rather in his insight into the tension between abstract smarts and common sense, and particularly how it applies to the effective altruism (EA) community. For it seems to me that there sometimes is an unbalanced ratio of clever abstractions to common sense in EA discussions.

To be clear, my point is not that abstract ideas are unimportant, or even that everyday common sense should generally be favored over abstract ideas. After all, many of the core ideas of effective altruism are highly abstract in nature, such as impartiality and the importance of numbers, and I believe we are right to stand by these ideas. But my point is that common sense is underutilized as a sanity check that can prevent our abstractions from floating into the clouds. More generally, I seem to observe a tendency to make certain assumptions, and to do a lot of clever analysis and deductions based on those assumptions, but without spending anywhere near as much energy exploring the plausibility of these assumptions themselves.

Below are three examples that I think follow this pattern.

Boltzmann brains

A highly abstract idea that is admittedly intriguing to ponder is that of a Boltzmann brain: a hypothetical conscious brain that arises as the product of random quantum fluctuations. Boltzmann brains are a trivial corollary given certain assumptions: let some basic combinatorial assumptions hold for a set amount of time, and we can conclude that a lot of Boltzmann brains must exist in this span of time (at least as a matter of statistical certainty, similar to how we can derive and be certain of the second law of thermodynamics).

But this does not mean that Boltzmann brains are in fact possible, as the underlying assumptions may well be false. Beyond the obvious possibility that the lifetime of the universe could be too short, it is also conceivable that the combinatorial assumptions that allow a functioning 310 K human brain to emerge in ~ 0 K empty space do not in fact obtain, e.g. because they falsely assume a combinatorial independence concerning the fluctuations that happen in each neighboring “bit” of the universe (or for some other reason). If any such key assumption is false, it could be that the emergence of a 310 K human brain in ~ 0 K space is not in fact allowed by the laws of physics, even in principle, meaning that even an infinite amount of time would never spontaneously produce a 310 K human Boltzmann brain.

Note that I am not claiming that Boltzmann brains cannot emerge in ~ 0 K space. My claim is simply that there is a big step from abstract assumptions to actual reality, and there is considerable uncertainty about whether the starting assumptions in question can indeed survive that step.

Quantum immortality

Another example is the notion of quantum immortality — not in the sense of merely surviving an attempted quantum suicide for improbably long, but in the sense of literal immortality because a tiny fraction of Everett branches continue to support a conscious survivor indefinitely.

This is a case where I think skeptical common sense and a search for erroneous assumptions are essential. Specifically, even granting a picture in which, say, a victim of a serious accident survives for a markedly longer time in one branch than in another, there are still strong reasons to doubt that there will be any branches in which the victim survives for long. In particular, we have good reason to believe that the measure of branches in which the victim survives will converge rapidly toward zero.

An objection might be that the measure indeed will converge toward zero, but that it never actually reaches zero, and hence there will in fact always be a tiny fraction of branches in which the victim survives. Yet I believe this rests on a false assumption. Our understanding of physics suggests that there is only — and could only be — a finite number of distinct branches, meaning that even if the measure of branches in which the victim survives is approximated well by a continuous function that never exactly reaches zero, the critical threshold that corresponds to a zero measure of actual branches with a surviving victim will in fact be reached, and probably rather quickly.
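
To make the point more concrete, here is a minimal numerical sketch. It is purely a toy illustration with made-up numbers (a hypothetical halving time for the survival measure and hypothetical upper bounds on the number of distinct branches), not a claim about the actual physics; its sole purpose is to show that the time until the measure falls below the weight of a single branch grows only logarithmically with the number of branches, and is therefore short even for absurdly generous branch counts.

```python
import math

# Toy illustration only (made-up numbers, not a physical model): if the measure
# of "survivor" branches halves every `halving_time_s` seconds, and there can
# only be a finite number `num_branches` of distinct branches, then the measure
# drops below the weight of a single branch -- i.e. the number of actual
# surviving branches reaches zero -- after log2(num_branches) halvings.
def time_until_zero_survivors(num_branches: float, halving_time_s: float = 1.0) -> float:
    return halving_time_s * math.log2(num_branches)

# Hypothetical upper bounds on the number of distinct branches:
for n in (1e9, 1e80, 1e120):
    print(f"N = {n:.0e}: zero surviving branches after ~{time_until_zero_survivors(n):.0f} s")

# N = 1e+09: zero surviving branches after ~30 s
# N = 1e+80: zero surviving branches after ~266 s
# N = 1e+120: zero surviving branches after ~399 s
```

Even with a staggeringly generous bound of 10^120 distinct branches, a measure that halves every second would cross the single-branch threshold within minutes.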

Of course, one may argue that we should still assign some probability to quantum immortality being possible, and that this possibility is still highly relevant in expectation. But I think there are many risks that are much less Pascalian and far more worthy of our attention.

Intelligence explosion

Unlike the two previous examples, this last example has become quite an influential idea in EA: the notion of a fast and local “intelligence explosion”.

I will not restate here my lengthy critiques of the plausibility of this notion (or the critiques advanced by others). And to be clear, I do not think the effective altruism community is at all wrong to have a strong focus on AI. But the mistake I do think I see is that many abstractly grounded assumptions pertaining to a hypothetical intelligence explosion have received insufficient scrutiny from common sense and empirical data (Garfinkel, 2018, argues along similar lines).

I think part of the problem stems from the fact that Nick Bostrom’s book Superintelligence framed the future of AI in a certain way. Here, for instance, is how Bostrom frames the issue in the conclusion of his book (p. 319):

Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. … We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound. … Some little idiot is bound to press the ignite button just to see what happens.

I realize Bostrom is employing a metaphor here, and I realize that he assigns a substantial credence to many different future scenarios. But the way his book is framed is nonetheless mostly in terms of such a metaphorical bomb that could ignite an intelligence explosion (i.e. FOOM). And it seems that this kind of scenario in effect became the standard scenario many people assumed and worked on, with comparatively little effort going into the more fundamental question of how plausible this future scenario is in the first place. An abstract argument about (a rather vague notion of) “intelligence” recursively improving itself was given much weight, and much clever analysis focusing on this FOOM picture and its canonical problems followed.

Again, my claim here is not that this picture is wrong or implausible, but rather that the more fundamental questions about the nature and future of “intelligence” should be kept more alive, and that our approach to these questions should be more informed by empirical data, lest we misprioritize our resources.


In sum, our fondness for abstractions is plausibly a bias we need to control for. We can do this by applying common-sense heuristics to a greater extent, by spending more time considering how our abstract models might be wrong, and by making a greater effort to hold our assumptions up against empirical reality.

Two biases relevant to expected AI scenarios

My aim in this essay is to briefly review two plausible biases in relation to our expectations of future AI scenarios. In particular, these are biases that I think risk increasing our estimates of the probability of a local, so-called FOOM takeoff.

An important point to clarify from the outset is that these biases, if indeed real, do not in themselves represent reasons to simply dismiss FOOM scenarios. It would clearly be a mistake to think so. But they do, I submit, constitute reasons to be somewhat more skeptical of them, and to re-examine our beliefs regarding FOOM scenarios. (Stronger, more direct reasons to doubt FOOM have been reviewed elsewhere.)

Egalitarian intuitions looking for upstarts

The first putative bias has its roots in our egalitarian origins. As Christopher Boehm argues in his Hierarchy in the Forest, we humans evolved in egalitarian tribes in which we created reverse dominance hierarchies to prevent domineering individuals from taking over. Boehm thus suggests that our minds are built to be acutely aware of the potential for any individual to rise and take over, perhaps even to the extent that we have specialized modules whose main task is to be attuned to this risk.

Western “Great Man” intuitions

The second putative bias is much more culturally contingent, and should be expected to be most pronounced in Western (“WEIRD”) minds. As Joe Henrich shows in his book The WEIRDest People in the World, Western minds are uniquely focused on individuals, so much so that their entire way of thinking about the world tends to revolve around individuals and individual properties (as opposed to thinking in terms of collectives and networks, which is more common among East Asian cultures).

The problem is that this Western, individualist mode of thinking, when applied straightforwardly to the dynamics of large-scale societies, is quite wrong. For while it may be mnemonically pragmatic to recount history, including the history of ideas and technology, in terms of individual actions and decisions, the truth is usually far more complex than this individualist narrative lets on. As Henrich argues, innovation is largely the product of large-scale systemic factors (such as the degree of connectedness between people), and these factors are usually far more important than any individual, which suggests that Westerners tend to strongly overestimate the role that single individuals play in innovation and in history more generally. Henrich thus argues that the Western way of thinking about innovation reflects an “individualism bias” of sorts, and further notes that:

thinking about individuals and focusing on them as having dispositions and kind of always evaluating everybody [in terms of which] attributes they have … leads us to what’s called “the myth of the heroic inventor”, and that’s the idea that the great advances in technology and innovation are the products of individual minds that kind of just burst forth and give us these wonderful inventions. But if you look at the history of innovation, what you’ll find time after time was that there was lucky recombinations, people often invent stuff at the same time, and each individual only makes a small increment to a much larger, longer process.

In other words, innovation is the product of numerous small and piecemeal contributions to a much greater extent than Western “Great Man” storytelling suggests. (Of course, none of this is to say that individuals are unimportant, but merely that Westerners seem likely to vastly overestimate the influence that single individuals have on history and innovation.)

Upshot

If we have mental modules specialized to look for individuals that accumulate power and take control, and if we have expectations that roughly conform to this pattern in the context of future technology, with one individual entity innovating its way to a takeover, it seems that we should at least wonder whether this expectation derives partly from our forager-age intuitions rather than resting purely on solid epistemics. This seems especially worth asking given that this view of the future is in strong tension with our actual understanding of innovation, namely that innovation — contra Western intuition — is distributed, with increases in abilities generally being the product of countless “small” insights and tools rather than a few big ones.

Both of the tendencies listed above lead us (or in the second case, mostly Westerners) to focus on individual agents rather than larger, systemic issues that may be crucial to future outcomes, yet which are less intuitively appealing for us to focus on. And there may well be more general explanations for this lack of appeal than just the two reasons listed above. The fact that there were no large-scale systemic issues of any kind for almost all of our species’ history renders it unsurprising that we are not particularly prone to focus on such issues (except for local signaling purposes).

Perhaps we need to control for this, and try to look more toward systemic issues than we are intuitively inclined to do. After all, the claim that the future will be dominated by AI systems in some form need not imply that the best way to influence that future is to focus on individual AI systems, as opposed to broader, institutional issues.

When Machines Improve Machines

The following is an excerpt from my book Reflections on Intelligence (2016/2024).


The term “Artificial General Intelligence” (AGI) refers to a machine that can perform any cognitive task at least as well as any human. This is often considered the holy grail of artificial intelligence research. It is also what many believe will give rise to an “intelligence explosion”, as machines will then be able to take over the design of smarter machines, and hence their further development will no longer be held back by the slowness of humans.

A Radical Shift?

Luke Muehlhauser and Anna Salamon describe the transition toward machines designing machines in the following way:

Once human programmers build an AI with a better-than-human capacity for AI design, the instrumental goal for self-improvement may motivate a positive feedback loop of self-enhancement. Now when the machine intelligence improves itself, it improves the intelligence that does the improving. (Muehlhauser & Salamon, 2012, p. 13)

While this might seem like a radical shift, software engineer Ramez Naam has argued that it is less radical than we might think, since we already use our latest technology to improve on itself and build the next generation of technology (Naam, 2010). As noted in the previous chapter, the way new tools are built and improved is by means of an enormous conglomerate of tools, and newly developed tools tend to become an addition to this existing set of tools. In Naam’s words:

[A] common assertion is that the advent of greater-than-human intelligence will herald The Singularity. These super intelligences will be able to advance science and technology faster than unaugmented humans can. They’ll be able to understand things that baseline humans can’t. And perhaps most importantly, they’ll be able to use their superior intellectual powers to improve on themselves, leading to an upward spiral of self improvement with faster and faster cycles each time.

In reality, we already have greater-than-human intelligences. They’re all around us. And indeed, they drive forward the frontiers of science and technology in ways that unaugmented individual humans can’t.

These superhuman intelligences are the distributed intelligences formed of humans, collaborating with one another, often via electronic means, and almost invariably with support from software systems and vast online repositories of knowledge. (Naam, 2010)

The design and construction of new machines is not the product of human ingenuity alone, but instead the product of a large system of advanced tools in which human ingenuity is just one component, albeit a component that plays many roles. Moreover, as Naam hints, superhuman intellectual abilities already play a crucial role in this design process. For example, computer programs make illustrations and calculations that no human could possibly make, and these have become indispensable components in the design of new tools in virtually all technological domains. In this way, superhuman intellectual abilities are already a significant part of the process of building superhuman intellectual abilities. This has led to continued growth, yet hardly an abrupt intelligence explosion.

Naam gives a specific example of an existing self-improving “superintelligence” (i.e. a super goal achiever), namely Intel:

Intel employs giant teams of humans and computers to design the next generation of its microprocessors. Faster chips mean that the computers it uses in the design become more powerful. More powerful computers mean that Intel can do more sophisticated simulations, that its CAD (computer aided design) software can take more of the burden off of the many hundreds of humans working on each chip design, and so on. There’s a direct feedback loop between Intel’s output and its own capabilities. …

Self-improving superintelligences have changed our lives tremendously, of course. But they don’t seem to have spiraled into a hard takeoff towards “singularity”. On a percentage basis, Google’s growth in revenue, in employees, and in servers have all slowed over time. It’s still a rapidly growing company, but that growth rate is slowly decelerating, not accelerating. The same is true of Intel and of the bulk of tech companies that have achieved a reasonable size. Larger typically means slower growing.

My point here is that neither superintelligence nor the ability to improve or augment oneself always lead to runaway growth. Positive feedback loops are a tremendously powerful force, but in nature (and here I’m liberally including corporate structures and the worldwide market economy in general as part of ‘nature’) negative feedback loops come into play as well, and tend to put brakes on growth. (Naam, 2010)

I quote Naam at length here because he makes this important point well, and because he is an expert with experience in the pursuit of using technology to make better technology. In addition to Naam’s point about Intel and other large tech companies that effectively improve themselves, I would add that although such mega-companies are highly competent collectives, they still only constitute a tiny part of the larger collective system that is the world economy, which they each contribute modestly to, and which they are entirely dependent upon.

A Familiar Dynamic

It has always been the latest, most advanced tools that, combined with the already existing set of tools, have collaborated to build the latest, most advanced tools. The expected “machines building machines” revolution is therefore not as revolutionary as it might seem at first sight. Strong versions of the “once machines can program AI better than humans” argument seem to assume that human software engineers are by far the main bottleneck to progress in the construction of more competent machines, which is a questionable premise. But even if it were true, and if we suddenly had a million times as many agents working to create better software, other bottlenecks would soon emerge, such as hardware production and energy. Essentially, we would be returned to the task of advancing our entire economy, something that pretty much all humans and machines are participating in already, knowingly or not.

The question concerning whether “intelligence” can explode is therefore basically: can the economy explode? To which we can answer that rapid increases in the growth rate of the world economy certainly have occurred in the past, and some argue that this is likely to happen again in the future (Hanson 1998; 2016). However, recent trends in economic growth, as well as in hardware growth in particular, give us some reason to be skeptical of such a future growth explosion (see e.g. Vinding, 2021; 2022).


See also Chimps, Humans, and AI: A Deceptive Analogy.

Consciousness: Orthogonal or Crucial?

The following is an excerpt from my book Reflections on Intelligence (2016/2024).


A question that is often considered open, sometimes even irrelevant, when it comes to “AGIs” and “superintelligences” is whether such entities would be conscious. Here is Nick Bostrom expressing such a sentiment:

By a “superintelligence” we mean an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills. This definition leaves open how the superintelligence is implemented: it could be a digital computer, an ensemble of networked computers, cultured cortical tissue or what have you. It also leaves open whether the superintelligence is conscious and has subjective experiences. (Bostrom, 2012, “Definition of ‘superintelligence’”)

Yet the question of consciousness can hardly be left open in this way. If a system is “much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills”, the question of consciousness is highly relevant. Consciousness is integral to much of what we do and excel at, and thus if an entity is not conscious, it cannot outperform the best humans “in practically every field”, especially not in “general wisdom” and “scientific creativity”. Let us look at these in turn.

General Wisdom

A core aspect of “general wisdom” is to be wise about ethical issues. Yet being wise about ethical issues requires that one can consider and evaluate questions like the following in an informed manner:

  • Is there anything about the experience of suffering that makes its reduction a moral priority?
  • Does anything about the experience of suffering justify the claim that reducing suffering has greater moral priority than increasing happiness (for the already happy)?
  • Is there anything about states of extreme suffering that makes their reduction an overriding moral priority?

It seems that one would have to be conscious in order to explore and answer such questions in an informed way. That is, one would have to know what such experiences are like in order to understand their experiential properties and significance. Knowing what a term like “suffering” refers to — i.e. knowing what actual experiences of suffering are like — is thus crucial for informed ethical reflection.

The same point holds true about other areas of philosophy that bear on wisdom, such as the philosophy of mind: without knowing what it is like to have a conscious mind, one cannot contribute much to the discussion about what it is like to have one and to the exploration of different modes of consciousness. Indeed, an unconscious entity has no genuine understanding about what the issue of consciousness is even about in the first place (Pearce, 2012a; 2012b).

So both in ethics and in the philosophy of mind, an unconscious entity would be less than clueless about many of the deepest questions at hand. If an entity not only fails to surpass humans in these areas, but fails to even have the slightest clue about what we are talking about, it hardly surpasses the best humans in practically every field. After all, questions about the phenomenology of consciousness are also relevant to many other fields, including psychology, epistemology, and ontology.

In short, experiencing and reasoning about consciousness is a key part of “human abilities”, and hence an entity that is unable to do this cannot be claimed to outperform humans in the most important, much less all, human abilities (see also Pearce, 2012a; 2012b).

Scientific Creativity

Another ability mentioned above that an unconscious entity could supposedly outdo humans at is scientific creativity. Yet scientific creativity must relate to all fields of knowledge, including the science of the conscious mind itself. This is also a part of the natural world, and a most relevant one at that.

Experiencing and accurately reporting what a given state of consciousness is like is essential for the science of mind, yet an unconscious entity obviously cannot do such a thing, as there is no experience it can report from. It cannot display any genuine scientific creativity, or even produce mere observations, in the direct exploration of consciousness.

Chimps, Humans, and AI: A Deceptive Analogy

The prospect of smarter-than-human artificial intelligence (AI) is often presented and thought of in terms of a simple analogy: AI will stand in relation to us the way we stand in relation to chimps. In other words, AI will be qualitatively more competent and powerful than us, and its actions will be as inscrutable to humans as current human endeavors (e.g. science and politics) are to chimps.

My aim in this essay is to show that this is in many ways a false analogy. The difference in understanding and technological competence found between modern humans and chimps is, in an important sense, a zero-to-one difference that cannot be repeated.


Contents

  1. How Are Humans Different from Chimps?
    I. Symbolic Language
    II. Cumulative Technological Innovation
  2. The Range of Human Abilities Is Surprisingly Wide
  3. Why This Is Relevant

How Are Humans Different from Chimps?

A common answer to this question is that humans are smarter. Specifically, at the level of our individual cognitive abilities, humans, with our roughly three times larger brains, are just far more capable.

This claim no doubt contains a large grain of truth, as humans surely do beat chimps in a wide range of cognitive tasks. Yet it is also false in some respects. For example, chimps have superior working memory compared to humans, and they can apparently also beat humans in certain video games, including games involving navigation in complex mazes.

Researchers who study human uniqueness provide some rather different, more specific answers to the question. If we focus on individual mental differences in particular, researchers have found that, crudely speaking, humans are different from chimps in three principal ways: 1) we can learn language, 2) we have a strong orientation toward social learning, and 3) we are highly cooperative (among our ingroup, compared to chimps).

These differences have in turn resulted in two qualitative differences in the abilities of humans and chimps in today’s world.

I. Symbolic Language

The first qualitative difference is that we humans have acquired an ability to think and communicate in terms of symbolic language that represents complex concepts. We can learn about the deep history of life and about the likely future of the universe, including the fundamental limits to space travel and future computations given our current understanding of physics. Any educated human can learn a good deal about these things whereas no chimp can.

Note how this is truly a zero-to-one difference: no symbolic language versus advanced symbolic language through which knowledge can be represented and continually developed (Deacon, 1997, ch. 1). It is the difference between having no science of physics versus having an extensive such science with which we can predict future events and estimate some hard limits on future possibilities.

In many respects, this zero-to-one difference cannot be repeated. Given that we already have physical models that predict, say, the future motion of planets and the solar system to a high degree of accuracy, the best one can do in this respect is to (slightly) improve the accuracy of these predictions. Such further improvements cannot be compared to going from zero conceptual physics to current physics.

The same point applies to our scientific understanding more generally: we currently have theories that work decently at explaining most of the phenomena around us. And while one can significantly improve the accuracy and sophistication of many of these theories, such further improvements will likely be less significant than the qualitative leap from absolutely no conceptual models to the entire collection of models and theories that we currently have.

For example, going from no understanding of evolution by natural selection to the elaborate understanding of biology we have today can hardly be matched, in terms of qualitative and revolutionary leaps, by further refinements in biology. We have already mapped out the core basics of biology, especially when it comes to the history of life on Earth, and this can only be done once.

The point that the emergence of conceptual understanding is a kind of zero-to-one step has been made by others. Robin Hanson has made essentially the same point in response to the notion that future machines will be “as incomprehensible to us as we are to goldfish”:


This seems to me to ignore our rich multi-dimensional understanding of intelligence elaborated in our sciences of mind (computer science, AI, cognitive science, neuroscience, animal behavior, etc.).

… the ability of one mind to understand the general nature of another mind would seem mainly to depend on whether that first mind can understand abstractly at all, and on the depth and richness of its knowledge about minds in general. Goldfish do not understand us mainly because they seem incapable of any abstract comprehension. …

It seems to me that human cognition is general enough, and our sciences of mind mature enough, that we can understand much about quite a diverse zoo of possible minds, many of them much more capable than ourselves on many dimensions.


Ramez Naam has argued similarly in relation to the idea that there will be some future time or intelligence that current humans are fundamentally unable to understand. He argues that our understanding of the future is growing rather than shrinking as time progresses, and that AI and other future technologies will not be beyond comprehension:


All of those [future technologies] are still governed by the laws of physics. We can describe and model them through the tools of economics, game theory, evolutionary theory, and information theory. It may be that at some point humans or our descendants will have transformed the entire solar system into a living information processing entity — a Matrioshka Brain. We may have even done the same with the other hundred billion stars in our galaxy, or perhaps even spread to other galaxies.

Surely that is a scale beyond our ability to understand? Not particularly. I can use math to describe to you the limits on such an object, how much computing it would be able to do for the lifetime of the star it surrounded. I can describe the limit on the computing done by networks of multiple Matrioshka Brains by coming back to physics, and pointing out that there is a guaranteed latency in communication between stars, determined by the speed of light. I can turn to game theory and evolutionary theory to tell you that there will most likely be competition between different information patterns within such a computing entity, as its resources (however vast) are finite, and I can describe to you some of the dynamics of that competition and the existence of evolution, co-evolution, parasites, symbiotes, and other patterns we know exist.


Chimps can hardly understand human politics and science to a similar extent. Thus, the truth is that there is a strong disanalogy between the understanding that chimps have of humans versus the understanding that we humans — thanks to our conceptual tools — can have of any possible future intelligence (in physical and computational terms, say).

Note that the qualitative leap reviewed above was not one that happened shortly after human ancestors diverged from chimp ancestors. Instead, it was a much more recent leap that has been unfolding gradually since the first humans appeared, and which has continued to accelerate in recent centuries, as we have developed ever more advanced science and mathematics. In other words, this qualitative step has been a product of cultural evolution just as much as biological evolution. Early humans presumably had a roughly similar potential to learn modern language, science, mathematics, and so on. But such conceptual tools could not be acquired in the absence of a surrounding culture able to teach these innovations.

Ramez Naam has made a similar point:


If there was ever a singularity in human history, it occurred when humans evolved complex symbolic reasoning, which enabled language and eventually mathematics and science. Homo sapiens before this point would have been totally incapable of understanding our lives today. We have a far greater ability to understand what might happen at some point 10 million years in the future than they would to understand what would happen a few tens of thousands of years in the future.


II. Cumulative Technological Innovation

The second zero-to-one difference between humans and chimps is that we humans build things and refine our technology over time. To be sure, many non-human animals use tools in the form of sticks and stones, and some even shape primitive tools of their own. But only humans improve and build upon the technological inventions of their ancestors.

Thus, humans are unique in expanding their abilities by systematically exploiting their environment, molding the things around them into increasingly productive self-extensions. We have turned wildlands into crop fields, we have created technologies that can harvest energy, and we have built external memories far more reliable than our own, such as books and hard disks.

This is another qualitative leap that cannot be repeated: the step from having absolutely no cumulative technology to exploiting and optimizing our external environment toward our own ends — the step from having no external memory to having the current repository of stored human knowledge at our fingertips, and from harvesting absolutely no energy (other than through individual digestion) to collectively harvesting and using hundreds of quintillions of Joules every year.

Of course, it is possible to improve on and expand these innovations. We can harvest greater amounts of energy, for example, and create even larger external memories. Yet these are merely quantitative differences, and humanity indeed continually makes such improvements each year. They are not zero-to-one differences that only a new species could bring about.

In sum, we are unique in being the first species that systematically sculpted our surrounding environment and turned it into ever-improving tools. This step cannot be repeated, only expanded further.

Just like the qualitative leap in our symbolic reasoning skills, the qualitative leap in our ability to create technology and shape our environment emerged not between chimps and early humans, but between early humans and today’s humans, as the result of a cultural process occurring over thousands of years. In fact, the two leaps have been closely related: our ability to reason and communicate symbolically has enabled us to create cumulative technological innovation. Conversely, our technologies have allowed us to refine our knowledge and conceptual tools (e.g. via books, telescopes, and particle accelerators); and such improved knowledge has in turn made us able to build even better technologies with which we could advance our knowledge even further, and so on.

This, in a nutshell, is the story of the interdependent growth of human knowledge and technology, a story of recursive self-improvement (Simler, 2019, “On scientific networks”). It is not really a story about the individual human brain per se. After all, the human brain does not accomplish much in isolation. It is more a story about what happened between and around brains: in the exchange of information in networks of brains and in the external creations designed by them — a story made possible by the fact that the human brain is unique in being by far the most cultural brain of all, with its singular capacity to learn from and cooperate with others.

The Range of Human Abilities Is Surprisingly Wide

Another way in which an analogy to chimps is often drawn is by imagining an intelligence scale along which different species are ranked, such that, for example, we have “rats at 30, chimps at 60, the village idiot at 90, the average human at 98, and Einstein at 100”, and where future AI may in turn be ranked many hundreds of points higher than Einstein. According to this picture, it is not just that humans will stand in relation to AI the way chimps stand in relation to humans, but that AI will be far superior still. The human-chimp analogy is, on this view, a severe understatement of the difference between humans and future AI.

Such an intelligence scale may seem intuitively compelling, but how does it correspond to reality? One way to probe this question is to examine the range of human abilities in chess (as but one example that may provide some perspective; it obviously does not represent the full picture by any means).

The standard way to rank chess skills is with the Elo rating system, which is a good predictor of the outcomes of chess games between different players, whether human, digital, or otherwise. A human beginner will have a rating of around 300, a novice around 800, while a rating in the range 2000-2199 earns the title of “Expert”. The highest rating ever achieved is 2882, by Magnus Carlsen.

How large is this range of chess skills in an absolute sense? Remarkably large, it turns out. For example, it took more than four decades from when computers were first able to beat a human chess novice (in the 1950s) until a computer was able to beat the best human player (officially in 1997). In other words, the span from novice to Kasparov corresponded to more than four decades of progress in both software and hardware, with the hardware progress amounting to a million times more computing power. This alone suggests that the human range of chess skills is rather wide.
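
As a rough consistency check on that last figure (my own back-of-the-envelope arithmetic, assuming a hypothetical Moore's-law-style doubling of computing power every two years, which is not a claim made above), four decades of hardware progress does indeed come out at roughly a factor of a million:

```python
# Back-of-the-envelope check (the two-year doubling period is an assumption,
# not a measured figure): doubling compute every ~2 years for ~40 years.
years = 40
doubling_period_years = 2
factor = 2 ** (years / doubling_period_years)
print(f"~{factor:,.0f}x more computing power")  # ~1,048,576x, i.e. about a million
```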

Yet the range seems even broader when we consider the upper bounds of chess performance. After all, the fact that it took computers decades to go from human novice to world champion does not mean that the best human is not still far from the best a computer could be in theory. Surprisingly, however, this latter distance does in fact seem quite small. Estimates suggest that the best possible chess machine would have an Elo rating around 3600.

This would mean that the relative distance between the best possible computer and the best human is only around 700 Elo points (the Elo rating is essentially a measure of relative distance; a 700-point gap corresponds to an expected score of under 2 percent for the weaker player).

Thus, the distance between the best human and a chess “Expert” appears similar to the distance between the best human and the best possible chess brain, while the distance between a human beginner and the best human is far greater (2500 Elo points). This stands in stark contrast to the intelligence scale outlined above, which would predict the complete opposite: the distance from a human novice to the best human should be comparatively small whereas the distance from the best human to the optimal brain should be the larger one by far.
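
For readers who want to check these numbers, here is a minimal sketch using the standard Elo expected-score formula (my own illustration; the specific ratings are the approximate figures mentioned above). A 700-point gap gives the weaker player an expected score of roughly 1.7 percent, while a 2500-point gap gives an expected score that is vanishingly small:

```python
# Standard Elo expected-score formula: the weaker player's expected score
# (wins plus half of draws) against an opponent rated `gap` points higher.
def expected_score(gap: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (gap / 400.0))

# Approximate gap between the best human (~2882) and an estimated best possible engine (~3600):
print(f"700-point gap:  expected score ~ {expected_score(700):.4f}")   # ~0.0175

# Approximate gap between a human beginner (~300) and the best human (~2882):
print(f"2500-point gap: expected score ~ {expected_score(2500):.2e}")  # ~5.62e-07
```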

Of course, chess is a limited game that by no means reflects all relevant tasks and abilities. Even so, the wide range of human abilities in chess still serves to question popular claims about the supposed narrowness of the human range of ability.

Why This Is Relevant

The errors of the human-chimp analogy are worth highlighting for a few reasons. First, the analogy may lead us to underestimate how much we currently know and are able to understand. To think that intelligent systems of the future will be as incomprehensible to us today as human affairs are to chimps is to underestimate how extensive and universal our current knowledge of the world in fact is — not just when it comes to physical and computational principles, but also in relation to general economic and game-theoretic principles. For example, we know a good deal about economic growth, and this knowledge has a lot to say about how we should expect future intelligent systems to grow. In particular, it suggests that a sudden local AI takeoff scenario (AI-FOOM growth) is unlikely.

The analogy can thus have an insidious influence by making us feel like current theories and trends cannot be trusted much, because look how different humans are from chimps, and look how puny the human brain is compared to ultimate limits. I think this is exactly the wrong way to think about the future. I believe we have good reasons to base our expectations on our best available theories and on a deep study of past trends, including the actual evolution of human competences — not on simple analogies.

Relatedly, the human-chimp analogy is also relevant in that it can lead us to greatly overestimate the probability of a localized AI takeoff scenario. That is, if we get the story about the evolution of human competences so wrong that we think the differences we observe today between chimps and humans reduce chiefly to a story about changes in individual brains — as opposed to a much broader story about biological, cultural, and technological developments — then we are likely to have similarly inaccurate expectations about what comparable “brain innovations” in some individual machine would lead to on their own.

If the human-chimp analogy causes us to overestimate the probability of a localized AI takeoff scenario, it may nudge us toward focusing too much on some single, concentrated future thing that we expect to be all-important: the AI that suddenly becomes qualitatively more competent than humans. In effect, the human-chimp analogy can lead us to neglect broader factors, such as cultural and institutional developments.

To be clear, the points above are by no means a case for complacency about risks from AI. It is important that we get a clear picture of such risks, and that we allocate our resources accordingly. But this requires us to rely on accurate models of the world. If we overemphasize one set of risks, we are by necessity underemphasizing others.
