What does a future dominated by AI imply?

Among altruists working to reduce risks of bad outcomes due to AI, I sometimes get the impression that there is a rather quick step from the premise “the future will be dominated by AI” to a practical position that roughly holds that “technical AI safety research aimed at reducing risks associated with fast takeoff scenarios is the best way to prevent bad AI outcomes”.

I am not saying that this is the most common view among those who work to prevent bad outcomes due to AI. Nor am I saying that the practical position outlined above is necessarily an unreasonable one. But I think I have seen (something like) this sentiment assumed often enough for it to be worthy of a critique. My aim in this post is to argue that there are many other practical positions that one could reasonably adopt based on that same starting premise.


Contents

  1. “A future dominated by AI” can mean many things
    1. “AI” can mean many things
    2. “Dominated by” can mean many things
    3. Combinations of many things
  2. Future AI dominance does not imply fast AI development
  3. Fast AI development does not imply concentrated AI development
  4. “A future dominated by AI” does not mean that either “technical AI safety” or “AI governance” is most promising
  5. Concluding clarification

“A future dominated by AI” can mean many things

“AI” can mean many things

It is worth noting that the premise that “the future will be dominated by AI” covers a wide range of scenarios. After all, it covers scenarios in which advanced machine learning software is in power; scenarios in which brain emulations are in power; as well as scenarios in which humans stay in power while gradually updating their brains with gene technologies, brain implants, nanobots, etc., such that their intelligence would eventually be considered (mostly) artificial intelligence by our standards. And there are surely more categories of AI than just the three broad ones outlined above.

“Dominated by” can mean many things

The words “in power” and “dominated by” can likewise mean many different things. For example, they could mean anything from “mostly in power” and “mostly dominated by” to “absolutely in power” and “absolutely dominated by”. And these respective terms cover a surprisingly wide spectrum.

After all, a government in a democratic society could reasonably be claimed to be “mostly in power” in that society, and a future AI system that is given similar levels of power could likewise be said to be “mostly in power” in the society it governs. By contrast, even the government of North Korea falls considerably short of being “absolutely in power” on a strong definition of that term, which hints at the wide spectrum of meanings covered by the general term “in power”.

Note that the contrast above actually hints at two distinct (though related) dimensions on which different meanings of “in power” can vary. One has to do with the level of power — i.e. whether one has more or less of it — while the other has to do with how the power is exercised, e.g. whether it is democratic or totalitarian in nature.

Thus, “a future society with AI in power” could mean a future in which AI possesses most of the power in a democratically elected government, or it could mean a future in which AI possesses total power with no bounds except the limits of physics.

Combinations of many things

Lastly, we can make a combinatorial extension of the points made above. That is, we should be aware that “a future dominated by AI” could — and is perhaps likely to — combine different kinds of AI. For instance, one could imagine futures that contain significant numbers of AIs from each of the three broad categories of AI mentioned above.

Additionally, these AIs could exercise power in distinct ways and in varying degrees across different parts of the world. For example, some parts of the world might make decisions in ways that resemble modern democratic processes, with power distributed among many actors, while other parts of the world might make decisions in ways that resemble autocratic decision procedures.

Such a diversity of power structures and decision procedures may be especially likely in scenarios that involve large-scale space expansion, since different parts of the world would then eventually be causally disconnected, and since a larger volume of AI systems presumably renders greater variation more likely in general.

These points hint at the truly vast space of possible futures covered by a term such as “a future dominated by AI”.

Future AI dominance does not imply fast AI development

Another conceptual point is that “a future dominated by AI” does not imply that technological or social progress toward such a future will happen soon or that it will occur suddenly. Furthermore, I think one could reasonably argue that such an imminent or sudden change is quite unlikely (though it obviously becomes more likely the broader our conception of “a future dominated by AI” is).

An elaborate justification for my low credence in such sudden change is beyond the scope of this post, though I can at least note that part of the reason for my skepticism is that I think trends and projections in both computer hardware and economic growth speak against such rapid future change. (For more reasons to be skeptical, see Reflections on Intelligence and “A Contra AI FOOM Reading List”.)

A future dominated by AI could emerge through a very gradual process that occurs over many decades or even hundreds of years (conditional on it ever happening). And AI scenarios involving such gradual development could well be both highly likely and highly consequential.

An objection against focusing on such slow-growth scenarios might be that scenarios involving rapid change have higher stakes, and hence they are more worth prioritizing. But it is not clear to me why this should be the case. As I have noted elsewhere, a so-called value lock-in could also happen in a slow-growth scenario, and the probability of success — and of avoiding accidental harm — may well be higher in slow-growth scenarios (cf. “Which World Gets Saved”).

The upshot could thus be the very opposite, namely that it is ultimately more promising to focus on scenarios with relatively steady growth in AI capabilities and power. (I am not claiming that this focus is in fact more promising; my point is simply that it is not obvious and that there are good reasons to question a strong focus on fast-growth scenarios.)

Fast AI development does not imply concentrated AI development

Likewise, even if we grant that the pace of AI development will increase rapidly, it does not follow that this growth will be concentrated in a single (or a few) AI system(s), as opposed to being widely distributed, akin to an entire economy of machines that grow fast together. This issue of centralized versus distributed growth was in fact the main point of contention in the Hanson-Yudkowsky FOOM debate; and I agree with Hanson that distributed growth is considerably more likely.

Similar to the argument outlined in the previous section, one could argue that there is a wager to focus on scenarios that entail highly concentrated growth over those that involve highly distributed growth, even if the latter may be more likely. Perhaps the main argument in favor of this view is that it seems that our impact can be much greater if we manage to influence a single system that will eventually gain power compared to if our influence is dispersed across countless systems.

Yet I think there are good reasons to doubt that argument. One reason is that the strategy of influencing such a single AI system may require us to identify that system in advance, which might be a difficult bet that we could easily get wrong. In other words, our expected influence may be greatly reduced by the risk that we are wrong about which systems are most likely to gain power. Moreover, there might be similar and ultimately more promising levers for “concentrated influence” in scenarios that involve more distributed growth and power. Such levers may include formal institutions and societal values, both of which could exert a significant influence on the decisions of a large number of agents simultaneously — by affecting the norms, laws, and social equilibria under which they interact.

“A future dominated by AI” does not mean that either “technical AI safety” or “AI governance” is most promising

Another impression I have is that we sometimes tacitly assume that work on “avoiding bad AI outcomes” will fall under either “technical AI safety” or “AI governance”, or at least that it will mostly fall within these two categories. But I do not think that this is the case, partly for the reasons alluded to above.

In particular, it seems to me that we sometimes assume that the aim of influencing “AI outcomes” is necessarily best pursued in ways that pertain quite directly to AI today. Yet why should we assume this to be the case? After all, it seems that there are many plausible alternatives.

For example, one could think that it is generally better to pursue broad investments so as to build flexible resources that make us better able to tackle these problems down the line — e.g. investments toward general movement building and toward increasing the amount of money that we will be able to spend later, when we might be better informed and have better opportunities to pursue direct work.

A complementary option is to focus on the broader contextual factors hinted at in the previous section. That is, rather than focusing primarily on the design of the AI systems themselves, or on the laws that directly govern their development, one may focus on influencing the wider context in which they will be developed and deployed — e.g. general values, institutions, diplomatic relations, collective knowledge and wisdom, etc. After all, the broader context in which AI systems will be developed and put into action could well prove critical to the outcomes that future AI systems will eventually create.

Note that I am by no means saying that work on technical AI safety or AI governance is not worth pursuing. My point is merely that these other strategies focused on building flexible resources and influencing broader contextual factors should not be overlooked as ways to influence “a future dominated by AI”. Indeed, I believe that these strategies are among the most promising ways in which we can have such a beneficial influence at this point.

Concluding clarification

On a final note, I should clarify that the main conceptual points I have been trying to make in this post likely do not contradict the explicitly endorsed views of anyone who works to reduce risks from AI. The objects of my concern are more (what I perceive to be) certain implicit models and commonly employed terminologies that I worry may distort how we think and talk about these issues.

Specifically, it seems to me that there might be a sort of collective availability heuristic at work, through which we continually boost the salience of a particular AI narrative — or a certain class of AI scenarios — along with a certain terminology that has come to be associated with that narrative (e.g. ‘AI takeoff’, ‘transformative AI’, etc). Yet if we change our assumptions a bit, or replace the most salient narrative with another plausible one, we might find that this terminology does not necessarily make a lot of sense anymore. We might find that our typical ways of thinking about AI outcomes may be resting on a lot of implicit assumptions that are more questionable and more narrow than we tend to realize.

Some reasons not to expect a growth explosion

Many people expect global economic growth to accelerate in the future, with growth rates that are not just significantly higher than those of today, but orders of magnitude higher.

The following are some of the main reasons I do not consider a growth explosion to be the most likely future outcome.


Contents

  1. Most economists do not expect a growth explosion
  2. The history of economic growth does not support a growth explosion
  3. Rates of innovation and progress in science have slowed down
  4. Moore’s law is coming to an end
  5. The growth of supercomputers has been slowing down for years
  6. Many of our technologies cannot get orders of magnitude more efficient
  7. Three objections in brief

Most economists do not expect a growth explosion

Estimates of the future of economic growth from economists themselves generally predict a continual decline in growth rates. For instance, one “review of publicly available projections of GDP per capita over long time horizons” concluded that growth will most likely continue to decline in most countries in the coming decades. A report from PwC arrived at similar projections.

Some accessible books that explore economic growth in the past and explain why it is reasonable to expect stagnant growth rates in the future include Robert J. Gordon’s The Rise and Fall of American Growth (short version) and Tyler Cowen’s The Great Stagnation (synopsis).

It is true that there are some economists who expect growth rates to be several orders of magnitude higher in the future, but these are generally outliers. Robin Hanson suggests that such a growth explosion is likely in his book The Age of Em, which, to give some context, fellow economist Bryan Caplan calls “the single craziest claim” of the book. Caplan further writes that Hanson’s arguments for such growth expectations were “astoundingly weak”.

The point here is not that the general opinion of economists is by any means a decisive reason to reject a growth explosion (as the most likely outcome). The point is merely that it represents a significant reason to doubt an imminent growth explosion, and that it is not in fact those who doubt a rapid rise in growth rates who are the consensus-defying contrarians (and in terms of imminence, it is worth noting that even Robin Hanson does not expect a growth explosion within the next couple of decades).

Rates of innovation and progress in science have slowed down

See Bloom et al.’s Are Ideas Getting Harder to Find? and Cowen & Southwood’s Is the rate of scientific progress slowing down? (The latter includes several graphs illustrating the slowdown.)

Moore’s law is coming to an end

One of the main reasons to expect a growth acceleration in the future is the promise of information technology. And economists, including Gordon and Cowen mentioned above, indeed agree that information technology has been a key driver of the growth we have seen in recent decades. But the problem is that we have strong theoretical reasons to expect that the underlying trend that has been driving most progress in information technology since the 1960s — i.e. Moore’s law — will come to an end within the next few years.

And while it may be that other hardware paradigms will replace silicon chips as we know them, and continue the by-now familiar growth in information technology, we must admit that it is quite unclear whether this will happen, especially since we are already lagging noticeably behind this trend line.

One may object that this is just a matter of hardware, and that the real growth in information technology lies in software. But a problem with this claim is that, empirically, growth in software seems largely determined by growth in hardware.

The growth of supercomputers has been slowing down for years

The development of the performance of the 500 fastest supercomputers in the world conforms well to the pattern we should expect given that we are nearing the end of Moore’s law:


The 500th fastest supercomputer in the world was on a clear exponential trajectory from the early 1990s to 2010, after which its growth in performance has been steadily declining. Roughly the same holds true of both the fastest supercomputer and the sum of the 500 fastest supercomputers: a clear exponential trajectory from the early 1990s to around 2013, after which performance has diverged ever further from the previous trajectory. Indeed, the divergence is now so large that the combined performance of the 500 fastest supercomputers falls below the performance we should expect the single fastest supercomputer to have today based on an extrapolation of the 1993-2013 trend.
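
To make the extrapolation concrete, here is a minimal sketch of the comparison. The Rmax figures are approximate values from the public TOP500 lists, quoted from memory and used purely for illustration; the exact numbers should be checked against the lists themselves.

```python
# Rough sketch of the extrapolation argument above (illustrative numbers only).
rmax_top1_1993 = 59.7e9    # ~59.7 gigaFLOPS: fastest system, June 1993
rmax_top1_2013 = 33.9e15   # ~33.9 petaFLOPS: fastest system, June 2013

# Implied annual growth factor over the 1993-2013 period (an exponential fit
# through the two endpoints).
annual_growth = (rmax_top1_2013 / rmax_top1_1993) ** (1 / 20)

# Extrapolate the fastest system seven more years, to 2020.
projected_top1_2020 = rmax_top1_2013 * annual_growth ** 7

# Approximate actual figures for late 2020.
actual_top1_2020 = 0.44e18    # ~0.44 exaFLOPS: fastest system
actual_sum500_2020 = 2.4e18   # ~2.4 exaFLOPS: sum of all 500 systems

print(f"Annual growth factor, 1993-2013: {annual_growth:.2f}x")
print(f"Projected #1 system in 2020:     {projected_top1_2020:.2e} FLOPS")
print(f"Actual #1 system in 2020:        {actual_top1_2020:.2e} FLOPS")
print(f"Actual sum of the 500 in 2020:   {actual_sum500_2020:.2e} FLOPS")
print("Sum of the 500 below the extrapolated #1:",
      actual_sum500_2020 < projected_top1_2020)
```

On these rough numbers, the 1993-2013 trend projects a single fastest machine of roughly 3.5 exaFLOPS in 2020, which exceeds the actual combined performance of all 500 systems, i.e. the gap described above.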

Many of our technologies cannot get orders of magnitude more efficient

This point is perhaps most elaborately explored in Robert J. Gordon’s book mentioned above: it seems that we have already reaped much of the low-hanging fruit in terms of technological innovation, and in some respects it is impossible to improve things much further.

Energy efficiency is an obvious example, as many of our machines and energy harvesting technologies have already reached a significant fraction of the maximum possible efficiency. For instance, electric pumps and motors tend to have around 90 percent energy efficiency, while the efficiency of the best solar panels is above 40 percent. Many of our technologies thus cannot be made orders of magnitude more efficient, and many of them can at most be marginally improved, simply because they have reached the ceiling of hard physical limits.
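
As a minimal arithmetic illustration of this ceiling argument, using the approximate efficiency figures just cited:

```python
# Maximum possible improvement factor for a technology whose efficiency is
# bounded by a hard physical ceiling (here 100%, i.e. perfect conversion).
def max_improvement(current_efficiency: float, ceiling: float = 1.0) -> float:
    return ceiling / current_efficiency

print(f"Electric motors at ~90% efficiency: at most ~{max_improvement(0.90):.2f}x better")
print(f"Best solar cells at ~40% efficiency: at most ~{max_improvement(0.40):.1f}x better")
```

An improvement factor of roughly 1.1x or 2.5x is obviously nothing like the orders-of-magnitude gains that a growth explosion would seem to require.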

Three objections in brief

#1. What about the exponential growth in the compute of the largest AI training runs from 2012-2018?

This is indeed a data point in the other direction. Note, however, that this growth does not appear to have continued after 2018. Moreover, much of this growth seems to have been unsustainable. For example, DeepMind lost more than a billion dollars in 2016-2018, with the loss getting greater each year: “$154 million in 2016, $341 million in 2017, $572 million in 2018”. And the loss was apparently even greater in 2019.
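
As a quick arithmetic check of the “more than a billion” figure from the reported numbers: 154 + 341 + 572 = 1,067 million dollars, i.e. roughly $1.07 billion over 2016-2018, before counting the apparently larger 2019 loss.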

#2. What about the Open Philanthropy post in which David Roodman presented a diffusion model of future growth that predicted much higher growth rates?

First, I think that model overlooks most of the points made above. Second, I think the following figure from Roodman’s article is a strong indication of how poorly the model fits: if the model captured the underlying process well, the observed growth rates should fall roughly uniformly across its percentiles over time, yet the growth rates in 1600-1970 are virtually all in the high percentiles of the model, while the growth rates in 1980-2019 are all in the low percentiles, and generally fall into ever lower percentiles over time. That is a strong sign that the model does not capture our actual trajectory, and that the fit is getting worse as time goes on.

[Figure from Roodman’s article: world GWP data plotted against the model’s predicted percentiles.]

#3. We have a wager to give much more weight to high-growth scenarios.

First, I think it is questionable that scenarios with higher growth rates merit greater priority (e.g. a so-called value lock-in could also emerge in slow-growth scenarios, and it may be more feasible to influence slow-growth scenarios because they give us more time to acquire the requisite insights and resources to exert a significant and robustly positive influence). And it is less clear still that scenarios with higher growth merit much greater priority than scenarios with lower growth rates. But even if we grant that high-growth scenarios do merit greater priority, this should not change the bare epistemic credence we assign different scenarios. Our descriptive picture should not be distorted by such priority claims.

Effective altruism and common sense

Thomas Sowell once called Milton Friedman “one of those rare thinkers who had both genius and common sense”.

I am not here interested in Sowell’s claim about Friedman, but rather in his insight into the tension between abstract smarts and common sense, and particularly how it applies to the effective altruism (EA) community. For it seems to me that there sometimes is an unbalanced ratio of clever abstractions to common sense in EA discussions.

To be clear, my point is not that abstract ideas are unimportant, or even that everyday common sense should generally be favored over abstract ideas. After all, many of the core ideas of effective altruism are highly abstract in nature, such as impartiality and the importance of numbers, and I believe we are right to stand by these ideas. But my point is that common sense is underutilized as a sanity check that can prevent our abstractions from floating into the clouds. More generally, I seem to observe a tendency to make certain assumptions, and to do a lot of clever analysis and deductions based on those assumptions, but without spending anywhere near as much energy exploring the plausibility of these assumptions themselves.

Below are three examples that I think follow this pattern.

Boltzmann brains

A highly abstract idea that is admittedly intriguing to ponder is that of a Boltzmann brain: a hypothetical conscious brain that arises as the product of random quantum fluctuations. Boltzmann brains are a trivial corollary given certain assumptions: let some basic combinatorial assumptions hold for a set amount of time, and we can conclude that a lot of Boltzmann brains must exist in this span of time (at least as a matter of statistical certainty, similar to how we can derive and be certain of the second law of thermodynamics).
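
As a rough sketch of the combinatorial reasoning involved (the specific form of the probability here is an illustrative assumption on my part, not something the argument depends on in detail): if each of $N$ independent regions or “trials” has some fixed probability $p > 0$ of fluctuating into a brain-like configuration, then

$$P(\text{at least one Boltzmann brain}) \;=\; 1 - (1 - p)^{N} \;\longrightarrow\; 1 \quad \text{as } N \to \infty,$$

which is why the conclusion follows with near statistical certainty once a fixed $p > 0$ and an unbounded number of independent trials are granted.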

But this does not mean that Boltzmann brains are in fact possible, as the underlying assumptions may well be false. Beyond the obvious possibility that the lifetime of the universe could be too short, it is also conceivable that the combinatorial assumptions that allow a functioning 310 K human brain to emerge in ~ 0 K empty space do not in fact obtain, e.g. because they falsely assume a combinatorial independence concerning the fluctuations that happen in each neighboring “bit” of the universe (or because they fail for some other reason). If any such key assumption is false, it could be that the emergence of a 310 K human brain in ~ 0 K space is not in fact allowed by the laws of physics, even in principle, meaning that even an infinite amount of time would never spontaneously produce a 310 K human Boltzmann brain.

Note that I am not claiming that Boltzmann brains cannot emerge in ~ 0 K space. My claim is simply that there is a big step from abstract assumptions to actual reality, and there is considerable uncertainty about whether the starting assumptions in question can indeed survive that step.

Quantum immortality

Another example is the notion of quantum immortality — not in the sense of merely surviving an attempted quantum suicide for improbably long, but in the sense of literal immortality because a tiny fraction of Everett branches continue to support a conscious survivor indefinitely.

This is a case where I think skeptical common sense and a search for erroneous assumptions are essential. Specifically, even granting a picture in which, say, a victim of a serious accident survives for a markedly longer time in one branch than in another, there are still strong reasons to doubt that there will be any branches in which the victim survives for long. In particular, we have good reason to believe that the measure of branches in which the victim survives will converge rapidly toward zero.

An objection might be that the measure indeed will converge toward zero, but that it never actually reaches zero, and hence there will in fact always be a tiny fraction of branches in which the victim survives. Yet I believe this rests on a false assumption. Our understanding of physics suggests that there is only — and could only be — a finite number of distinct branches, meaning that even if the measure of branches in which the victim survives is approximated well by a continuous function that never exactly reaches zero, the critical threshold that corresponds to a zero measure of actual branches with a surviving victim will in fact be reached, and probably rather quickly.
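
To give a toy formalization of this point (under the simplifying and merely illustrative assumptions that the surviving measure shrinks by a roughly constant factor $p < 1$ per unit time, and that the smallest nonzero branch weight is some $\mu_{\min} > 0$):

$$\mu(t) \approx p^{\,t} < \mu_{\min} \quad \text{once} \quad t > \frac{\ln \mu_{\min}}{\ln p},$$

and since any nonempty set of branches has measure at least $\mu_{\min}$, the set of branches containing a survivor must be empty from that point on. With geometric decay, this threshold is crossed quickly.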

Of course, one may argue that we should still assign some probability to quantum immortality being possible, and that this possibility is still highly relevant in expectation. But I think there are many risks that are much less Pascalian and far more worthy of our attention.

Intelligence explosion

Unlike the two previous examples, this last example has become quite an influential idea in EA: the notion of a fast and local “intelligence explosion”.

I will not here restate my lengthy critiques of the plausibility of this notion (or the critiques advanced by others). And to be clear, I do not think the effective altruism community is at all wrong to have a strong focus on AI. But what I think I do see is that many abstractly grounded assumptions pertaining to a hypothetical intelligence explosion have received insufficient scrutiny from common sense and empirical data (Garfinkel, 2018, argues along similar lines).

I think part of the problem stems from the fact that Nick Bostrom’s book Superintelligence framed the future of AI in a certain way. Here, for instance, is how Bostrom frames the issue in the conclusion of his book (p. 319):

Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. … We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound. … Some little idiot is bound to press the ignite button just to see what happens.

I realize Bostrom is employing a metaphor here, and I realize that he assigns a substantial credence to many different future scenarios. But the way his book is framed is nonetheless mostly in terms of such a metaphorical bomb that could ignite an intelligence explosion (i.e. FOOM). And it seems that this kind of scenario in effect became the standard scenario many people assumed and worked on, with comparatively little effort going into the more fundamental question of how plausible this future scenario is in the first place. An abstract argument about (a rather vague notion of) “intelligence” recursively improving itself was given much weight, and much clever analysis focusing on this FOOM picture and its canonical problems followed.

Again, my claim here is not that this picture is wrong or implausible, but rather that the more fundamental questions about the nature and future of “intelligence” should be kept more alive, and that our approach to these questions should be more informed by empirical data, lest we misprioritize our resources.


In sum, our fondness for abstractions is plausibly a bias we need to control for. We can do this by applying common-sense heuristics to a greater extent, by spending more time considering how our abstract models might be wrong, and by making a greater effort to hold our assumptions up against empirical reality.

Two biases relevant to expected AI scenarios

My aim in this essay is to briefly review two plausible biases in relation to our expectations of future AI scenarios. In particular, these are biases that I think risk increasing our estimates of the probability of a local, so-called FOOM takeoff.

An important point to clarify from the outset is that these biases, if indeed real, do not in themselves represent reasons to simply dismiss FOOM scenarios. It would clearly be a mistake to think so. But they do, I submit, constitute reasons to be somewhat more skeptical of them, and to re-examine our beliefs regarding FOOM scenarios. (Stronger, more direct reasons to doubt FOOM have been reviewed elsewhere.)

Egalitarian intuitions looking for upstarts

The first putative bias has its roots in our egalitarian origins. As Christopher Boehm argues in his Hierarchy in the Forest, we humans evolved in egalitarian tribes in which we created reverse dominance hierarchies to prevent domineering individuals from taking over. Boehm thus suggests that our minds are built to be acutely aware of the potential for any individual to rise and take over, perhaps even to the extent that we have specialized modules whose main task is to be attuned to this risk.

Western “Great Man” intuitions

The second putative bias is much more culturally contingent, and should be expected to be most pronounced in Western (“WEIRD”) minds. As Joe Henrich shows in his book The WEIRDest People in the World, Western minds are uniquely focused on individuals, so much so that their entire way of thinking about the world tends to revolve around individuals and individual properties (as opposed to thinking in terms of collectives and networks, which is more common among East Asian cultures).

The problem is that this Western, individualist mode of thinking, when applied straightforwardly to the dynamics of large-scale societies, is quite wrong. For while it may be mnemonically pragmatic to recount history, including the history of ideas and technology, in terms of individual actions and decisions, the truth is usually far more complex than this individualist narrative lets on. As Henrich argues, innovation is largely the product of large-scale systemic factors (such as the degree of connectedness between people), and these factors are usually far more important than is any individual, suggesting that Westerners tend to strongly overestimate the role that single individuals play in innovation and history more generally. Henrich thus alleges that the Western way of thinking about innovation reflects an “individualism bias” of sorts, and further notes that:

thinking about individuals and focusing on them as having dispositions and kind of always evaluating everybody [in terms of which] attributes they have … leads us to what’s called “the myth of the heroic inventor”, and that’s the idea that the great advances in technology and innovation are the products of individual minds that kind of just burst forth and give us these wonderful inventions. But if you look at the history of innovation, what you’ll find time after time was that there was lucky recombinations, people often invent stuff at the same time, and each individual only makes a small increment to a much larger, longer process.

In other words, innovation is the product of numerous small and piecemeal contributions to a much greater extent than Western “Great Man” storytelling suggests. (Of course, none of this is to say that individuals are unimportant, but merely that Westerners seem likely to vastly overestimate the influence that single individuals have on history and innovation.)

Upshot

If we have mental modules specialized to look for individuals that accumulate power and take control, and if we have expectations that roughly conform to this pattern in the context of future technology, with one individual entity innovating its way to a takeover, it seems that we should at least wonder whether this expectation derives partly from our forager-age intuitions rather than resting purely on solid epistemics. This is especially so given that this view of the future is in strong tension with our actual understanding of innovation, namely that innovation — contra Western intuition — is distributed, with increases in abilities generally being the product of countless “small” insights and tools rather than a few big ones.

Both of the tendencies listed above lead us (or in the second case, mostly Westerners) to focus on individual agents rather than larger, systemic issues that may be crucial to future outcomes, yet which are less intuitively appealing for us to focus on. And there may well be more general explanations for this lack of appeal than just the two reasons listed above. The fact that there were no large-scale systemic issues of any kind for almost all of our species’ history renders it unsurprising that we are not particularly prone to focus on such issues (except for local signaling purposes).

Perhaps we need to control for this, and try to look more toward systemic issues than we are intuitively inclined to do. After all, the claim that the future will be dominated by AI systems in some form need not imply that the best way to influence that future is to focus on individual AI systems, as opposed to broader, institutional issues.

When Machines Improve Machines

The following is an excerpt from my book Reflections on Intelligence (2016/2020).

 

The term “Artificial General Intelligence” (AGI) refers to a machine that can perform any task at least as well as any human. This is often considered the holy grail of artificial intelligence research, and also the thing that many consider likely to give rise to an “intelligence explosion”, the reason being that machines then will be able to take over the design of smarter machines, and hence their further development will no longer be held back by the slowness of humans. Luke Muehlhauser and Anna Salamon express the idea in the following way:

Once human programmers build an AI with a better-than-human capacity for AI design, the instrumental goal for self-improvement may motivate a positive feedback loop of self-enhancement. Now when the machine intelligence improves itself, it improves the intelligence that does the improving.

(Muehlhauser & Salamon, 2012, p. 13)

This seems like a radical shift, yet is it really? As author and software engineer Ramez Naam has pointed out (Naam, 2010), not quite, since we already use our latest technology to improve on itself and build the next generation of technology. As I argued in the previous chapter, the way new tools are built and improved is by means of an enormous conglomerate of tools, and newly developed tools merely become an addition to this existing set of tools. In Naam’s words:

[A] common assertion is that the advent of greater-than-human intelligence will herald The Singularity. These super intelligences will be able to advance science and technology faster than unaugmented humans can. They’ll be able to understand things that baseline humans can’t. And perhaps most importantly, they’ll be able to use their superior intellectual powers to improve on themselves, leading to an upward spiral of self improvement with faster and faster cycles each time.

In reality, we already have greater-than-human intelligences. They’re all around us. And indeed, they drive forward the frontiers of science and technology in ways that unaugmented individual humans can’t.

These superhuman intelligences are the distributed intelligences formed of humans, collaborating with one another, often via electronic means, and almost invariably with support from software systems and vast online repositories of knowledge.

(Naam, 2010)

The design and construction of new machines is not the product of human ingenuity alone, but of a large system of advanced tools in which human ingenuity is just one component, albeit a component that plays many roles. And these roles, it must be emphasized, go way beyond mere software engineering – they include everything from finding ways to drill and transport oil more effectively, to coordinating sales and business agreements across countless industries.

Moreover, as Naam hints, superhuman intellectual abilities already play a crucial role in this design process. For example, computer programs make illustrations and calculations that no human could possibly make, and these have become indispensable components in the design of new tools in virtually all technological domains. In this way, superhuman intellectual abilities are already a significant part of the process of building superhuman intellectual abilities. This has led to continued growth, yet hardly an intelligence explosion.

Naam gives a specific example of an existing self-improving “superintelligence” (a “super” goal achiever, that is), namely Intel:

Intel employs giant teams of humans and computers to design the next generation of its microprocessors. Faster chips mean that the computers it uses in the design become more powerful. More powerful computers mean that Intel can do more sophisticated simulations, that its CAD (computer aided design) software can take more of the burden off of the many hundreds of humans working on each chip design, and so on. There’s a direct feedback loop between Intel’s output and its own capabilities. …

Self-improving superintelligences have changed our lives tremendously, of course. But they don’t seem to have spiraled into a hard takeoff towards “singularity”. On a percentage basis, Google’s growth in revenue, in employees, and in servers have all slowed over time. It’s still a rapidly growing company, but that growth rate is slowly decelerating, not accelerating. The same is true of Intel and of the bulk of tech companies that have achieved a reasonable size. Larger typically means slower growing.

My point here is that neither superintelligence nor the ability to improve or augment oneself always lead to runaway growth. Positive feedback loops are a tremendously powerful force, but in nature (and here I’m liberally including corporate structures and the worldwide market economy in general as part of ‘nature’) negative feedback loops come into play as well, and tend to put brakes on growth.

(Naam, 2010)

I quote Naam at length here because he makes this important point well, and because he is an expert with experience in the pursuit of using technology to make better technology. In addition to Naam’s point about Intel and other companies that improve themselves, I would add that although these are enormously competent collectives, they still only constitute a tiny part of the larger collective system that is the world economy, which they contribute modestly to and which they are entirely dependent upon.

“The” AI?

The discussion above hints at a deeper problem in the scenario Muehlhauser and Salamon lay out, namely the idea that we will build an AI that will be a game-changer. This idea seems widespread in modern discussions about both risks and opportunities of AI. Yet why should this be the case? Why should the most powerful software competences we develop in the future be concentrated into anything remotely like a unitary system?

The human mind is unitary and trapped inside a single skull for evolutionary reasons. The only way additional cognitive competences could be added was by lumping them onto the existing core in gradual steps. But why should the extended “mind” of software that we build to expand our capabilities be bound in such a manner? In terms of the current and past trends of the development of this “mind”, it only seems to be developing in the opposite direction: toward diversity, not unity. The pattern of distributed specialization mentioned in the previous chapter is repeating itself in this area as well. What we see is many diverse systems used by many diverse systems in a complex interplay to create ever more, increasingly diverse systems. We do not appear to be headed toward any singular super-powerful system, but instead toward an increasingly powerful society of systems (Kelly, 2010).

Greater Than Individual or Collective Human Abilities?

This also hints at another way in which our speaking of “intelligent machines” is somewhat deceptive and arbitrary. For why talk about the point at which these machines become as capable as human individuals rather than, say, an entire human society? After all, it is not at the level of individuals that accomplishments such as machine building occur, but rather at the level of the entire economy. If we talked about the latter, it would be clear to us, I think, that the capabilities that are relevant for the accomplishment of any real-world goal are many and incredibly diverse, and that they are much more than just intellectual: they also require mechanical abilities and a vast array of materials.

If we talked about “the moment” when machines can do everything a society can, we would hardly be tempted to think of these machines as being singular in kind. Instead, we would probably think of them as a society of sorts, one that must evolve and adapt gradually. And I see no reason why we should not think about the emergence of “intelligent machines” with abilities that surpass human intellectual abilities in the same way.

After all, this is exactly what we see today: we gradually build new machines – both software and hardware – that can do things better than human individuals, but these are different machines that do different things better than humans. Again, there is no trend toward the building of disproportionately powerful, unitary machines. Yes, we do see some algorithms that are impressively general in nature, but their generality and capabilities still pale in comparison to the generality and the capabilities of our larger collective of ever more diverse tools (as is also true of individual humans).

Relatedly, the idea of a “moment” or “event” at which machines surpass human abilities is deeply problematic in the first place. It ignores the many-faceted nature of the capabilities to be surpassed, both in the case of human individuals and human societies, and, by extension, the gradual nature of the surpassing of these abilities. Machines have been better than humans at many tasks for centuries, yet we continue to speak as though there will be something like a “from-nothing-to-everything” moment – e.g. “once human programmers build an AI with a better-than-human capacity for AI design”. Again, this is not congruous with the way in which we actually develop software: we already have software that is superhuman in many regards, and this software already plays a large role in the collective system that builds smarter machines.

A Familiar Dynamic

It has always been the latest, most advanced tools that, in combination with the already existing set of tools, have collaborated to build the latest, most advanced tools. The expected “machines building machines” revolution is therefore not as revolutionary as it seems at first sight. The “once machines can program AI better than humans” argument seems to assume that human software engineers are the sole bottleneck of progress in the building of more competent machines, yet this is not the case. But even if it were, and if we suddenly had a thousand times as many people working to create better software, other bottlenecks would quickly emerge – materials, hardware production, energy, etc. All of these things, indeed the whole host of tasks that maintain and grow our economy, are crucial for the building of more capable machines. Essentially, we are returned to the task of advancing our entire economy, something that pretty much all humans and machines are participating in already, knowingly or not, willingly or not.

By themselves, the latest, most advanced tools do not do much. A CAD program alone is not going to build much, and the same holds true of the entire software industry. In spite of all its impressive feats, it is still just another cog in a much grander machinery.

Indeed, to say that software alone can lead to an “intelligence explosion” – i.e. a capability explosion – is akin to saying that a neuron can hold a conversation. Such statements express a fundamental misunderstanding of the level at which these accomplishments are made. The software industry, like any software program in particular, relies on the larger economy in order to produce progress of any kind, and the only way it can do so is by becoming part of – i.e. working with and contributing to – this grander system that is the entire economy. Again, individual goal-achieving ability is a function of the abilities of the collective. And it is here, in the entire economy, that the greatest goal-achieving ability is found, or rather distributed.

The question concerning whether “intelligence” can explode is therefore essentially: can the economy explode? To which we can answer that rapid increases in the growth rate of the world economy certainly have occurred in the past, and some argue that this is likely to happen again in the future (Hanson 1998/2000, 2016). However, there are reasons to be skeptical of such a future growth explosion (Murphy, 2011; Modis, 2012; Gordon, 2016; Caplan, 2016; Vinding, 2017b; Cowen & Southwood, 2019).

“Intelligence Though!” – A Bad Argument

A type of argument often made in discussions about the future of AI is that we can just never know what a “superintelligent machine” could do. “It” might be able to do virtually anything we can think of, and much more than that, given “its” vastly greater “intelligence”.

The problem with this argument is that it again rests on a vague notion of “intelligence” that this machine “has a lot of”. For what exactly is this “stuff” it has a lot of? Goal-achieving ability? If so, then, as we saw in the previous chapter, “intelligence” requires an enormous array of tools and tricks that entails much more than mere software. It cannot be condensed into anything we can identify as a single machine.

Claims of the sort that a “superintelligent machine” could just do this or that complex task are extremely vague, since the nature of this “superintelligent machine” is not accounted for, and neither are the plausible means by which “it” will accomplish the extraordinarily difficult – perhaps even impossible – task in question. Yet such claims are generally taken quite seriously nonetheless, the reason being that the vague notion of “intelligence” that they rest upon is taken seriously in the first place. This, I have tried to argue, is the cardinal mistake.

We cannot let a term like “superintelligence” provide carte blanche to make extraordinary claims or assumptions without a bare minimum of justification. I think Bostrom’s book Superintelligence is an example of this. Bostrom worries about a rapid “intelligence explosion” initiated by “an AI” throughout the book, yet offers very little in terms of arguments for why we should believe that such a rapid explosion is plausible (Hanson, 2014), not to mention what exactly it is that is supposed to explode (Hanson, 2010; 2011a).

No Singular Thing, No Grand Control Problem

The problem is that we talk about “intelligence” as though it were a singular thing; or, in the words of brain and AI researcher Jeff Hawkins, as though it were “some sort of magic sauce” (Hawkins, 2015). This is also what gives rise to the idea that “intelligence” can explode, because one of the things that this “intelligence” can do, if you have enough of it, is to produce more “intelligence”, which can in turn produce even more “intelligence”.

This stands in stark contrast to the view that “intelligence” – whether we talk about cognitive abilities in particular or goal-achieving abilities in general – is anything but singular in nature, but rather the product of countless clever tricks and hacks built by a long process of testing and learning. On this latter view, there is no single master problem to crack for increasing “intelligence”, but rather just many new tricks and hacks we can discover. And finding these is essentially what we have always been doing in science and engineering.

Robin Hanson makes a similar point in relation to his skepticism of a “blank-slate AI mind-design” intelligence explosion:

Sure if there were a super mind theory that allowed vast mental efficiency gains all at once, but there isn’t. Minds are vast complex structures full of parts that depend intricately on each other, much like the citizens of a city. Minds, like cities, best improve gradually, because you just never know enough to manage a vast redesign of something with such complex inter-dependent adaptations.

(Hanson, 2010)

Rather than a concentrated center of capability that faces a grand control problem, what we see is a development of tools and abilities that are distributed throughout the larger economy. And we “control” – i.e. specify the function of – these tools, including software programs, gradually as we make them and put them to use in practice. The design of the larger system is thus the result of our solutions to many, comparatively small “control problems”. I see no compelling reason to believe that the design of the future will be any different.


See also Chimps, Humans, and AI: A Deceptive Analogy.

Consciousness – Orthogonal or Crucial?

The following is an excerpt from my book Reflections on Intelligence (2016/2020).

 

A question often considered open, sometimes even irrelevant, when it comes to “AGIs” and “superintelligences” is whether such entities would be conscious. Here is Nick Bostrom expressing such a sentiment:

By a “superintelligence” we mean an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills. This definition leaves open how the superintelligence is implemented: it could be a digital computer, an ensemble of networked computers, cultured cortical tissue or what have you. It also leaves open whether the superintelligence is conscious and has subjective experiences.

(Bostrom, 2012, “Definition of ‘superintelligence’”)

This is false, however. On no meaningful definition of “much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills” can the question of consciousness be considered irrelevant. This is like defining a “superintelligence” as an entity “smarter” than any human, and then claiming that this definition leaves open whether such an entity can read natural language or perform mathematical calculations. Consciousness is integral to virtually everything we do and excel at, and thus if an entity is not conscious, it cannot possibly outperform the best humans “in practically every field”. Especially not in “scientific creativity, general wisdom, and social skills”. Let us look at these three in turn.

Social Skills

Good social skills depend on an ability to understand others. And in order to understand other people, we have to simulate what it is like to be them. Fortunately, this comes quite naturally to most of us. We know what it is like to consciously experience emotions such as sadness, fear, and joy directly, and this enables us to understand where people are coming from when they report and act on these emotions.

Consider the following example: without knowing anything about a stranger you observe on the street, you can roughly know how that person would feel and react if they suddenly, by the snap of a finger, had no clothes on right there on the street. Embarrassment, distress, wanting to cover up and get away from the situation are almost certain to be the reaction of any randomly selected person. We know this, not because we have read about it, but because of our immediate simulations of the minds of others – one of the main things our big brains evolved to do. This is what enables us to understand the minds of other people, and hence without running this conscious simulation of the minds of others, one will have no chance of gaining good social skills and interpersonal understanding.

But couldn’t a computer just simulate people’s brains and then understand them without being conscious? Is the consciousness bit really relevant here?

Yes, consciousness is relevant. At the very least, it is relevant for us. Consider, for instance, the job of a therapist, or indeed the “job” of any person who attempts to listen to another person in a deep conversation. When we tell someone about our own state or situation, it matters deeply to us that the listener actually understands what we are saying. A listener who merely pretends to feel and understand would be no good. Indeed, this would be worse than no good, as such a “listener” would then essentially be lying and deceiving in a most insensitive way, in every sense of the word.

Frustrated Human: “Do you actually know the feeling I’m talking about here? Do you even know the difference between joy and hopeless despair?”

Unconscious liar: “Yes.”

Whether someone is actually feeling us when we tell them something matters to us, especially when it comes to our willingness to share our perspectives, and hence it matters for “social skills”. An unconscious entity cannot have better social skills than “the best human brains” because it would lack the very essence of social skills: truly feeling and understanding others. Without a conscious mind there is no way to understand what it is like to have such a mind.

General Wisdom

Given how relevant social skills are for general wisdom, and given the relevance of consciousness for social skills, the claim that consciousness is irrelevant to general wisdom should already stand in serious doubt at this point.

Yet rather than restricting our focus to “general wisdom”, let us consider ethics in its entirety, which, broadly construed at least, includes any relevant sense of “general wisdom”. For in order to reason about ethics, one must be able to consider and evaluate questions like the following:

Can certain forms of suffering be outweighed by a certain amount of happiness?

Does the nature of the experience of suffering in some sense demand that reducing suffering is given greater moral priority than increasing happiness (for the already happy)?

Can realist normative claims be made on the basis of the properties of such experiences?

One has to be conscious to answer such questions. That is, one must know what such experiences are like in order to understand their experiential properties and significance. Knowing what terms like “suffering” and “happiness” refer to – i.e. knowing what the actual experiences of suffering and happiness are like – is as crucial to ethics as numbers are to mathematics.

The same point holds true about other areas of philosophy that bear on wisdom, such as the philosophy of mind: without knowing what it is like to have a conscious mind, one cannot contribute to the discussion about what it is like to have one and what the nature of consciousness is. Indeed, an unconscious entity has no idea about what the issue is even about in the first place.

So both in ethics and in the philosophy of mind, an unconscious entity would be less than clueless about the deep questions at hand. If an entity not only fails to surpass humans in this area, but fails to even have the slightest clue about what we are talking about, it hardly surpasses the best human brains in practically every field. After all, these questions are also relevant to many other fields, ranging from questions in psychology to questions concerning the core foundations of knowledge.

Experiencing and reasoning about consciousness is a most essential part of “human abilities”, and hence an entity that cannot do this cannot be claimed to surpass humans in the most important, much less all, human abilities.

Scientific Creativity

The third and final ability mentioned above that an unconscious entity can supposedly surpass humans in is scientific creativity. Yet scientific creativity must relate to all fields of knowledge, including the science of the conscious mind itself. This is also a part of the natural world, and a most relevant one at that.

Experiencing and accurately reporting what a given state of consciousness is like is essential for the science of mind, yet an unconscious entity obviously cannot do such a thing, as there is no experience it can report from. It cannot display any scientific creativity, or even produce mere observations, in this most important science. Again, the most it can do is produce lies – the very anti-matter of science.

 

Chimps, Humans, and AI: A Deceptive Analogy

The prospect of smarter-than-human artificial intelligence (AI) is often presented and thought of in terms of a simple analogy: AI will stand in relation to us the way we stand in relation to chimps. In other words, AI will be qualitatively more competent and powerful than us, and its actions will be as inscrutable to humans as current human endeavors (e.g. science and politics) are to chimps.

My aim in this essay is to show that this is in many ways a false analogy. The difference in understanding and technological competence found between modern humans and chimps is, in an important sense, a zero-to-one difference that cannot be repeated.


Contents

  1. How are humans different from chimps?
    1. I. Symbolic language
    2. II. Cumulative technological innovation
  2. The range of human abilities is surprisingly wide
  3. The cultural basis of the human capability expansion
  4. Why this is relevant

How are humans different from chimps?

A common answer to this question is that humans are smarter. Specifically, at the level of our individual cognitive abilities, humans, with our roughly three times larger brains, are just far more capable.

This claim no doubt contains a large grain of truth, as humans surely do beat chimps in a wide range of cognitive tasks. Yet it is also false in some respects. For example, chimps have superior working memory compared to humans, and apparently also beat humans in certain video games, including games involving navigation in complex mazes.

But researchers who study human uniqueness actually provide some rather different, more specific answers to this question. If we focus on individual mental differences in particular, researchers have found that, crudely speaking, humans are different from chimps in three principal ways: 1) we can learn language, 2) we have a strong orientation toward social learning, and 3) we are highly cooperative (among our ingroup, compared to chimps).

These differences have in turn resulted in two qualitative differences in the abilities of humans and chimps in today’s world.

I. Symbolic language

The first is that we humans have acquired an ability to think and communicate in terms of symbolic language that represents elaborate concepts. We can learn about the deep history of life and the universe, as well as the likely future of the universe, including the fundamental limits to future space travel and future computations. Any educated human can learn a good deal about these things whereas no chimp can.

Note how this is truly a zero-to-one difference: no symbolic language versus an elaborate symbolic language through which knowledge can be represented and continually developed (see chapter 1 in Deacon, 1997). It is the difference between having no science of physics versus having an elaborate such science with which we can predict future events and put hard limits on future possibilities.

This zero-to-one difference cannot really be repeated. Given that we already have physical models that predict, say, the future motion of planets and the solar system to a fairly high degree of accuracy, the best one can do in this respect is to (slightly) improve the accuracy of these predictions. Such further improvements cannot be compared to going from zero physics to current physics.

The same point applies to our scientific understanding more generally: we currently have theories that work decently well at explaining most of the phenomena around us. And though one can significantly improve the accuracy and sophistication of many of these theories, any such further improvement would be much less significant than the qualitative leap from absolutely no conceptual models to the entire collection of models and theories we currently have.

For example, going from no understanding of evolution by natural selection to the elaborate understanding of biology we have today cannot be matched, in terms of qualitative and revolutionary leaps, by further refinements in biology. We have already mapped out the core basics of biology (in fact a great deal more than that), and this can only be done once.

This is not an original point. Robin Hanson has made essentially the same point in response to the notion that future machines will be “as incomprehensible to us as we are to goldfish”:

This seems to me to ignore our rich multi-dimensional understanding of intelligence elaborated in our sciences of mind (computer science, AI, cognitive science, neuroscience, animal behavior, etc.).

… the ability of one mind to understand the general nature of another mind would seem mainly to depend on whether that first mind can understand abstractly at all, and on the depth and richness of its knowledge about minds in general. Goldfish do not understand us mainly because they seem incapable of any abstract comprehension. …

It seems to me that human cognition is general enough, and our sciences of mind mature enough, that we can understand much about quite a diverse zoo of possible minds, many of them much more capable than ourselves on many dimensions.

Ramez Naam has argued similarly in relation to the idea that there will be some future time or intelligence that we are fundamentally unable to understand. He argues that our understanding of the future is growing rather than shrinking as time progresses, and that AI and other future technologies will not be beyond comprehension:

All of those [future technologies] are still governed by the laws of physics. We can describe and model them through the tools of economics, game theory, evolutionary theory, and information theory. It may be that at some point humans or our descendants will have transformed the entire solar system into a living information processing entity — a Matrioshka Brain. We may have even done the same with the other hundred billion stars in our galaxy, or perhaps even spread to other galaxies.

Surely that is a scale beyond our ability to understand? Not particularly. I can use math to describe to you the limits on such an object, how much computing it would be able to do for the lifetime of the star it surrounded. I can describe the limit on the computing done by networks of multiple Matrioshka Brains by coming back to physics, and pointing out that there is a guaranteed latency in communication between stars, determined by the speed of light. I can turn to game theory and evolutionary theory to tell you that there will most likely be competition between different information patterns within such a computing entity, as its resources (however vast) are finite, and I can describe to you some of the dynamics of that competition and the existence of evolution, co-evolution, parasites, symbiotes, and other patterns we know exist.

Chimps cannot understand human politics and science to a similar extent. Thus, the truth is that there is a strong disanalogy between the understanding chimps have of humans versus the understanding that we humans — thanks to our conceptual tools — can have of any possible future intelligence (in physical and computational terms, say).

Note that the qualitative leap reviewed above was not one that happened shortly after human ancestors diverged from chimp ancestors. Instead, it was a much more recent leap that has been unfolding gradually since the first humans appeared, and which has continued to accelerate in recent centuries, as we have developed ever more advanced science and mathematics. In other words, this qualitative step has been a product of cultural evolution just as much as biological evolution. Early humans presumably had a roughly similar potential to learn modern language, science, mathematics, etc. But such conceptual tools could not be acquired in the absence of a surrounding culture able to teach these innovations.

Ramez Naam has made a similar point:

If there was ever a singularity in human history, it occurred when humans evolved complex symbolic reasoning, which enabled language and eventually mathematics and science. Homo sapiens before this point would have been totally incapable of understanding our lives today. We have a far greater ability to understand what might happen at some point 10 million years in the future than they would to understand what would happen a few tens of thousands of years in the future.

II. Cumulative technological innovation

The second zero-to-one difference between humans and chimps is that we humans build things. Not just that we build things, but that we refine our technology over time. After all, many non-human animals use tools in the form of sticks and stones, and some even shape primitive tools of their own. But only humans improve and build upon the technological inventions of their ancestors.

Consequently, humans are unique in expanding their abilities by systematically exploiting their environment, molding the things around them into ever more useful self-extensions. We have turned wildlands into crop fields; we have created technologies that can harvest energy — from oil, gas, wind, and sun — and we have built external memories far more reliable than our own, such as books and hard disks.

This is another qualitative leap that cannot be repeated: the step from having absolutely no cumulative technology to exploiting and optimizing our external environment toward our own ends. The step from having no external memory to having the current repository of stored human knowledge at our fingertips, and from harvesting absolutely no energy (other than through individual digestion) to collectively harvesting and using hundreds of quintillions of Joules every year.

To be sure, it is possible to improve on and expand these innovations. We can harvest greater amounts of energy, for example, and create even larger external memories. Yet these are merely quantitative differences, and humanity indeed continually makes such improvements each year. They are not zero-to-one differences that only a new species could bring about. And what is more, we know that the potential for making further technological improvements is, at least in many respects, quite limited.

Take energy efficiency as an example. Many of our machines and energy harvesting technologies have already reached a significant fraction of the maximum possible efficiency. For example, electric motors and pumps tend to have around 90 percent energy efficiency, and the best solar panels have an efficiency greater than 40 percent. So as a matter of hard physical limits, many of our technologies cannot be made orders of magnitude more efficient; in fact, a large number of them can at most be marginally improved.

In sum, we are unique in being the first species that systematically sculpted our surrounding environment and turned it into ever-improving tools, many of which have near-maximal efficiency. This step cannot be repeated, only expanded further.


Just like the qualitative leap in our symbolic reasoning skills, the qualitative leap in our ability to create technology and shape our environment emerged, not between chimps and early humans, but between early humans and today’s humans, as the result of a cultural process occurring over thousands of years. In fact, the two leaps have been closely related: our ability to reason and communicate symbolically has enabled us to create cumulative technological innovation. Conversely, our technologies have allowed us to refine our knowledge and conceptual tools, by enabling us to explore and experiment, which in turn made us able to build even better technologies with which we could advance our knowledge even further, and so on.

This, in a nutshell, is the story of the growth of human knowledge and technology, a story of recursive self-improvement (see Simler, 2019, “On scientific networks”). It is not really a story about the individual human brain per se. After all, the human brain does not accomplish much in isolation (nor is it the brain with the largest number of neurons; several species have more neurons in the forebrain). It is more a story about what happened between and around brains: in the exchange of information in networks of brains and in the external creations designed by them. A story made possible by the fact that the human brain is unique in being by far the most cultural brain of all, with its singular capacity to learn from and cooperate with others.

The range of human abilities is surprisingly wide

Another way in which an analogy to chimps is frequently drawn is by imagining an intelligence scale along which different species are ranked, such that, for example, we have “rats at 30, chimps at 60, the village idiot at 90, the average human at 98, and Einstein at 100”, and where future AI may in turn be ranked many hundreds of points higher than Einstein. According to this picture, it is not just that humans will stand in relation to AI the way chimps stand in relation to humans, but that AI will be far superior still. The human-chimp analogy is, on this view, a severe understatement of the difference between humans and future AI.

Such an intelligence scale may seem intuitively compelling, but how does it correspond to reality? One way to probe this question is to examine the range of human abilities in chess. The standard way to rank chess skills is with the Elo rating system, which is a good predictor of the outcomes of chess games between different players, whether human, digital, or otherwise.

A human just starting out will have a rating of around 300, a novice around 800, while a rating in the range 2000-2199 is ranked as “Expert”. The highest rating ever achieved is 2882, by Magnus Carlsen.

How large is this range of chess skills in an absolute sense? Remarkably large, it turns out. For example, it took more than four decades from when computers were first able to beat a human chess novice (the 1950s), until a computer was able to beat the best human player (1997, officially). In other words, the span from novice to Kasparov corresponded to more than four decades of progress in hardware — i.e. a million times more computing power — and software. This alone suggests that the human range of chess skills is rather wide.

Yet the range seems even broader when we consider the upper bounds of chess performance. After all, the fact that it took computers decades to go from human novice to world champion does not mean that the best human is not still ridiculously far from the best a computer could be in theory. Surprisingly, however, this latter distance does in fact seem quite small. Estimates suggest that the best possible chess machine would have an Elo rating around 3600, which means that the relative distance between the best possible computer and the best human is only around 700 Elo points (the Elo rating is essentially a measure of relative distance; 700 Elo points corresponds to a winning percentage of around 1.5 percent for the losing player).

This implies that the distance between the best human (Carlsen) and a chess “Expert” (someone belonging to the top 5 percent of chess players) is similar to the distance between the best human and the best possible chess brain, while the distance between a human beginner and the best human is far greater (2500 Elo points). This stands in stark contrast to the intelligence scale outlined above, which would predict the complete opposite: the distance from a human novice to the best human should be comparatively small whereas the distance from the best human to the optimal brain should be the larger one by far.
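To make these Elo comparisons concrete, here is a minimal sketch of the standard Elo expected-score formula; the specific ratings below are the approximate figures cited above, and the 3600 figure for a “best possible” engine is, again, only an estimate.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B
    (a draw counts as half a point)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

beginner, expert, carlsen, best_possible = 300, 2100, 2882, 3600  # approximate ratings

# Carlsen against a hypothetical ~3600-rated engine: an expected score
# of roughly 0.016, i.e. on the order of 1.5-2 percent.
print(round(expected_score(carlsen, best_possible), 3))

# A 2000-2199 "Expert" against Carlsen: also an expected score of
# roughly 0.01, i.e. a similar relative distance.
print(round(expected_score(expert, carlsen), 3))

# A beginner against Carlsen: an expected score that is effectively zero
# (on the order of 10^-7), reflecting the ~2,500-point gap.
print(expected_score(beginner, carlsen))
```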


It may be objected that chess is a bad example, and that it does not really reflect what is meant by the intelligence scale above. But the question is then what would be a better measure. After all, a similar story seems to apply to other games, such as shogi and go: the human range of abilities is surprisingly wide and the best players are significantly closer to optimal than they are to novice players.

In fact, one can argue that the objection should go in the opposite direction, as human brains are not built for chess, and hence we should expect even the best humans to be far from optimal at it. We should expect to be much closer to “optimal” at solving problems that are more important for our survival, such as social cognition and natural language processing — skills that most people are wired to master at super-Carlsen levels.

Regardless, the truth is that humans are mastering ever more “games”, literal as well as figurative ones, at optimal or near-optimal levels. Not because evolution “just so happened to stumble upon the most efficient way to assemble matter into an intelligent system”, but rather because it created a species able to make cultural and technological progress toward ever greater levels of competence.

The cultural basis of the human capability expansion

The intelligence scale outlined above misses two key points. First, human abilities are not a constant. Whether we speak of individual abilities (e.g. the abilities of elite chess players) or humanity’s collective abilities (e.g. building laptops and sending people to the moon), it is clear that our abilities have increased dramatically as our culture and technology have expanded.

Second, because human abilities are not a constant, the range of human abilities is far wider, in an absolute sense, than the intelligence scale outlined above suggests, as it has grown and still continues to grow over time.

Chess is a good example of this. Untrained humans and chimps have the same (non-)skill level at chess. Yet thanks to culture, some people can learn to master the game. A wealthy society allows people to specialize in chess and makes it possible for knowledge to accumulate in books and experts. Eventually, it enables learning from super-human chess engines, whose innovations we can adopt just as we do those of other humans.

And yet we humans expand our abilities to a much greater extent than the example of increased human chess abilities suggests, as we not only expand our abilities by stimulating our brains with progressively better forms of practice and information, but also by extending ourselves directly with technology. For example, we can all use a chess engine to find great chess moves for us. Our latest technologies enable us to accomplish ever-more tasks that no human could ever accomplish unaided.

Worth noting in this regard is that this self-extension process seems to have slowed down in recent decades, likely because we have already picked most of the low-hanging fruit, and in some respects because it is impossible to improve things much further (we already mentioned energy efficiency as an example where we are getting close to the upper limits in many respects).

This suggests that not only is there not a qualitative leap similar to that between chimps and modern humans ahead of us, but that even a quantitative growth explosion, with relative growth rates significantly higher than what we have seen in the past, should not be our default expectation either (for some support for this claim, see “Peak growth might lie in the past” in Vinding, 2017).

Why this is relevant

The errors of the human-chimp analogy are worth highlighting for a few reasons. First, the analogy can lead us to overestimate how much everything will change with AI. It leads us to expect qualitative leaps of sorts that cannot be repeated.

Second, the human-chimp analogy makes us underestimate how much we currently know and are able to understand. To think that intelligent systems of the future will be as incomprehensible to us today as human affairs are to chimps is to underestimate how extensive and universal our current knowledge of the world in fact is — not just when it comes to physical and computational limits, but also in relation to general economic and game-theoretic principles. We know a good deal about economic growth, for example, and this knowledge has a lot to say about how we should expect future intelligent systems to grow. In particular, it suggests that local AI-FOOM growth is unlikely.

The analogy can thus have an insidious influence by making us feel like current data and trends cannot be trusted much, because look how different humans are from chimps, and look how puny the human brain is compared to ultimate limits. I think this is exactly the wrong way to think about the future. We should base our expectations on a deep study of past trends, including the actual evolution of human competences — not simple analogies.

Relatedly, the human-chimp analogy is also relevant in that it can lead us to grossly overestimate the probability of an AI-FOOM scenario. That is, if we get the story about the evolution of human competences so wrong that we think the differences we observe today between chimps and modern humans reduce mostly to a story about changes in individual brains, then we are likely to have similarly inaccurate expectations about what comparable innovations in some individual machine are able to effect on their own.

If the human-chimp analogy leads us to (marginally) overestimate the probability of a FOOM scenario, it may nudge us toward focusing too much on some single, concentrated future thing that we expect to be all-important: the AI that suddenly becomes qualitatively more competent than humans. In effect, the human-chimp analogy can lead us to neglect broader factors, such as cultural and institutional developments.

Note that the above is by no means a case for complacency about risks from AI. It is important that we get a clear picture of such risks, and that we allocate our resources accordingly. But this requires us to rely on accurate models of the world. If we overemphasize one set of risks, we are by necessity underemphasizing others.

The future of growth: Near-zero growth rates

First written: Jul. 2017; Last update: Nov 2022.

Exponential growth is a common pattern found throughout nature. Yet it is also a pattern that tends not to last, as growth rates tend to decline sooner or later.

In biology, this pattern of exponential growth that wanes off is found in everything from the development of individual bodies — for instance, in the growth of humans, which levels off in the late teenage years — to population sizes.

One may of course be skeptical that this general trend will also apply to the growth of our technology and economy at large, as innovation seems to continually postpone our clash with the ceiling, yet it seems inescapable that it eventually will. For in light of what we know about physics, we can conclude that exponential growth of the kinds we see today, in technology in particular and in our economy more generally, must come to an end, and do so relatively soon.

Limits to growth

Physical limits to computation and Moore’s law

One reason we can make this assertion is that there are theoretical limits to computation. As physicist Seth Lloyd’s calculations show, a continuation of Moore’s law — in its most general formulation: “the amount of information that computers are capable of processing and the rate at which they process it doubles every two years” — would imply that we hit the theoretical limits of computation within 250 years:

If, as seems highly unlikely, it is possible to extrapolate the exponential progress of Moore’s law into the future, then it will only take two hundred and fifty years to make up the forty orders of magnitude in performance between current computers that perform 10^10 operations per second on 10^10 bits and our one kilogram ultimate laptop that performs 10^51 operations per second on 10^31 bits.

Similarly, physicists Lawrence Krauss and Glenn Starkman have calculated that, even if we factor in colonization of space at the speed of light, this doubling of processing power cannot continue for more than 600 years in any civilization:

Our estimate for the total information processing capability of any system in our Universe implies an ultimate limit on the processing capability of any system in the future, independent of its physical manifestation and implies that Moore’s Law cannot continue unabated for more than 600 years for any technological civilization.

In a more recent lecture and a subsequent interview, Krauss said that the absolute limit for the continuation of Moore’s law, in our case, would be reached in less than 400 years (the discrepancy — between the numbers 400 and 600 — is at least in part because Moore’s law, in its most general formulation, has played out for more than a century in our civilization at this point). And, as both Krauss and Lloyd have stressed, these are ultimate theoretical limits, resting on assumptions that are unlikely to be met in practice, such as expansion at the speed of light. What is possible, in terms of how long Moore’s law can continue given both engineering and economic constraints, is likely significantly less. Indeed, we are already approaching the physical limits of the paradigm that Moore’s law has been riding on for more than 50 years — silicon transistors, the only paradigm that Gordon Moore was talking about originally — and it is not clear whether other paradigms will be able to take over and keep the trend going.
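As a rough, back-of-the-envelope check on the figures above, here is a minimal sketch of the arithmetic, assuming the two-year doubling time from Lloyd’s general formulation of Moore’s law:

```python
import math

def years_to_bridge(orders_of_magnitude, doubling_time_years=2.0):
    """Years of sustained Moore's-law growth needed to improve
    performance by a given number of orders of magnitude."""
    doublings = orders_of_magnitude * math.log2(10)  # 10^n = 2^(n * log2(10))
    return doublings * doubling_time_years

# Lloyd's ~40 orders of magnitude between current computers and the
# "ultimate laptop" correspond to roughly 250-270 years at a 2-year
# doubling time, depending on the exact assumptions.
print(round(years_to_bridge(40)))
```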

Limits to the growth of energy use

Physicist Tom Murphy has calculated a similar limit for the growth of the energy consumption of our civilization. Based on the observation that the energy consumption of the United States has increased fairly consistently with an average annual growth rate of 2.9 percent over the last 350-odd years (although the growth rate appears to have slowed down in recent times and has remained stably below 2.9 percent since c. 1980), Murphy proceeds to derive the limits for the continuation of similar energy growth. He does this, however, by assuming an annual growth rate of “only” 2.3 percent, which conveniently results in an increase in total energy consumption by a factor of ten every 100 years. If we assume that we will continue expanding our energy use at this rate by covering Earth with solar panels, this would, on Murphy’s calculations, imply that we will have to cover all of Earth’s land with solar panels in less than 350 years, and all of Earth, including the oceans, in 400 years.

Beyond that, assuming that we could capture all of the energy from the sun by surrounding it in solar panels, the 2.3 percent growth rate would come to an end within 1,350 years from now. And if we go further out still, to capture the energy emitted from all the stars in our galaxy, we get that this growth rate must hit the ceiling and become near-zero within 2,500 years (of course, the limit of the physically possible must be hit earlier, indeed more than 500 years earlier, as we cannot traverse our 100,000 light year-wide Milky Way in only 2,500 years).
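The arithmetic behind these timescales is simple compound growth. Here is a minimal sketch; the energy multiples below are illustrative round numbers of my own, not Murphy’s exact figures:

```python
import math

def years_to_multiple(multiple, growth_rate=0.023):
    """Years of compound growth at the given annual rate needed to
    multiply current energy use by the given factor."""
    return math.log(multiple) / math.log(1 + growth_rate)

# At 2.3 percent per year, energy use grows roughly tenfold per century.
print(round(years_to_multiple(10)))     # ~100 years
# Roughly 10^4 times current use (order of all sunlight hitting Earth): ~400 years.
print(round(years_to_multiple(1e4)))
# Roughly 10^13 times current use (order of total solar output): ~1,300 years.
print(round(years_to_multiple(1e13)))
# Roughly 10^24 times current use (order of total galactic starlight): ~2,400 years.
print(round(years_to_multiple(1e24)))
```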

One may suggest that alternative sources of energy might change this analysis significantly, yet, as Murphy notes, this does not seem to be the case:

Some readers may be bothered by the foregoing focus on solar/stellar energy. If we’re dreaming big, let’s forget the wimpy solar energy constraints and adopt fusion. The abundance of deuterium in ordinary water would allow us to have a seemingly inexhaustible source of energy right here on Earth. We won’t go into a detailed analysis of this path, because we don’t have to. The merciless growth illustrated above means that in 1400 years from now, any source of energy we harness would have to outshine the sun.

Essentially, keeping up the annual growth rate of 2.3 percent by harnessing energy from matter not found in stars would force us to make such matter hotter than stars themselves. We would have to create new stars of sorts, and, even if we assume that the energy required to create such stars is less than the energy gained, such an endeavor would quickly run into limits as well. For according to one estimate, the total mass of the Milky Way, including dark matter, is only 20 times greater than the mass of its stars. Assuming a 5:1 ratio of dark matter to ordinary matter, this implies that there is only about 3.3 times as much ordinary non-stellar matter as there is stellar matter in our galaxy. Thus, even if we could convert all this matter into stars without spending any energy and harvest the resulting energy, this would only give us about 50 years more of keeping up with the annual growth rate of 2.3 percent.1

Limits derived from economic considerations

Similar conclusions to the ones drawn above for computation and energy also seem to follow from calculations of a more economic nature. For, as economist Robin Hanson has argued, projecting present economic growth rates into the future also leads to a clash against fundamental limits:

Today we have about ten billion people with an average income about twenty times subsistence level, and the world economy doubles roughly every fifteen years. If that growth rate continued for ten thousand years[,] the total growth factor would be 10^200.

There are roughly 10^57 atoms in our solar system, and about 10^70 atoms in our galaxy, which holds most of the mass within a million light years. So even if we had access to all the matter within a million light years, to grow by a factor of 10^200, each atom would on average have to support an economy equivalent to 10^140 people at today’s standard of living, or one person with a standard of living 10^140 times higher, or some mix of these.

Indeed, current growth rates would “only” have to continue for three thousand years before each atom in our galaxy would have to support an economy equivalent to a single person living at today’s living standard, which already seems rather implausible (not least because we can only access a tiny fraction of “all the matter within a million light years” in three thousand years). Hanson does not, however, expect the current growth rate to remain constant, but instead, based on the history of growth rates, expects a new growth mode where the world economy doubles within 15 days rather than 15 years:

If a new growth transition were to be similar to the last few, in terms of the number of doublings and the increase in the growth rate, then the remarkable consistency in the previous transitions allows a remarkably precise prediction. A new growth mode should arise sometime within about the next seven industry mode doublings (i.e., the next seventy years) and give a new wealth doubling time of between seven and sixteen days.

And given this more than a hundred times greater growth rate, the net growth that would take 10,000 years to accomplish given our current growth rate (cf. Hanson’s calculation above) would now take less than a century to reach, while growth otherwise requiring 3,000 years would require less than 30 years. So if Hanson is right, and we will see such a shift within the next seventy years, what seems to follow is that we will reach the limits of economic growth, or at least reach near-zero growth rates, within a century or two. Such a projection is also consistent with the physically derived limits of the continuation of Moore’s law; not that economic growth and Moore’s law are remotely the same, yet they are no doubt closely connected: economic growth is largely powered by technological progress, of which Moore’s law has been a considerable subset in recent times.
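For reference, a brief sketch verifying the orders of magnitude in Hanson’s calculation and in the comparison above, using the doubling times quoted in the text:

```python
import math

def growth_in_orders_of_magnitude(years, doubling_time_years):
    """Total growth over the given period, expressed as powers of ten."""
    return years / doubling_time_years * math.log10(2)

# 10,000 years of doubling every 15 years: a factor of roughly 10^200.
print(round(growth_in_orders_of_magnitude(10_000, 15)))   # ~200
# 3,000 years at the same rate: roughly 10^60; with ~10^10 people today
# and ~10^70 atoms in the galaxy, that is about one person-equivalent
# economy per atom.
print(round(growth_in_orders_of_magnitude(3_000, 15)))    # ~60
# With a ~15-day doubling time instead, 200 orders of magnitude take
# only on the order of 25-30 years.
print(round(200 / math.log10(2) * 15 / 365))
```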

The conclusion we reach by projecting past growth trends in computing power, energy, and the economy is the same: our current growth rates cannot go on forever. In fact, they will have to decline to near-zero levels very soon on a cosmic timescale. Given the physical limits to computation, and hence, ultimately, to economic growth, we can conclude that we must be close to the point where peak relative growth in our economy and our ability to process information occurs — that is, the point where this growth rate is the highest in the entire history of our civilization, past and future.

Peak growth might lie in the past

This is not, however, to say that this point of maximum relative growth necessarily lies in the future. Indeed, in light of the declining economic growth rates we have seen over the last few decades, it cannot be ruled out that we are now already past the point of “peak economic growth” in the history of our civilization, with the highest growth rates having occurred around 1960-1980, cf. these declining growth rates and this essay by physicist Theodore Modis. This is not to say that we most likely are, yet it seems that the probability that we are is non-trivial.

A relevant data point here is that the global economy has seen three doublings since 1965, when the annual growth rate was around six percent. Yet the annual growth rate today is only around 3 percent, a little over half of what it was those three doublings ago, and it has remained stably below that earlier level. In the entire history of economic growth, this seems unprecedented, suggesting that we may already be on the other side of the highest growth rates we will ever see. For up until this point, three doublings of the economy have, rare fluctuations aside, led to an increase in the annual growth rate.

And this “past peak growth” hypothesis looks even stronger if we look at 1955, when the growth rate was a little less than six percent and the world product stood at 5,430 billion 1990 U.S. dollars, which doubled four times gives just under 87,000 billion — about where we should expect today’s world product to be. Yet throughout the history of our economic development, four doublings have meant a clear increase in the annual growth rate, at least in terms of the underlying trend; not a stable decrease of almost 50 percent. This tentatively suggests that we should not expect to see growth rates significantly higher than those of today sustained in the future.
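A quick check of the doubling arithmetic just mentioned, using the figure cited above:

```python
world_product_1955 = 5_430  # billion 1990 U.S. dollars, as cited above
# Four doublings give 86,880 billion, i.e. just under 87,000 billion.
print(world_product_1955 * 2 ** 4)
```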

Could we be past peak growth in science and technology?

That peak growth lies in the past may also be true of technological progress in particular, or at least many forms of technological progress, including the progress in computing power tracked by Moore’s law, where the growth rate appears to have been highest around 1990-2005, and to since have been in decline, cf. this article and the first graphs found here and here. Similarly, various sources of data and proxies tracking the number of scientific articles published and references cited over time also suggest that we could be past peak growth in science as well, at least in many fields when evaluated based on such metrics, with peak growth seeming to have been reached around 2000-2010.

Yet again, these numbers — those tracking economic, technological, and scientific progress — are of course closely connected, as growth in each of these respects contributes to, and is even part of, growth in the others. Indeed, one study found the doubling time of the total number of scientific articles in recent decades to be 15 years, corresponding to an annual growth rate of 4.7 percent, strikingly similar to the growth rate of the global economy in recent decades. Thus, declining growth rates in our economy, technology, and science cannot be considered wholly independent sources of evidence that growth rates are now declining for good. We can by no means rule out that growth rates might increase in all these areas in the future — although, as we saw above with respect to the limits of Moore’s law and economic progress, such an increase, if it is going to happen, must be imminent if current growth rates remain relatively stable.

Might recent trends make us bias-prone?

How might it be relevant that we may be past peak economic growth at this point? Could it mean that our expectations for the future are likely to be biased? Looking back toward the 1960s might be instructive in this regard. For given our economic history up until the 1960s, it is not so strange that people made many unrealistic predictions about the future around that time. Not only might it have appeared natural to project that the high growth rate of the era would remain constant into the future, which would have implied a global GDP today more than twice its actual size; it might also have seemed reasonable to predict that growth rates would keep rising even further. After all, that was what they had been doing consistently up until that point, so why should they not continue rising in the following decades, resulting in flying cars and conversing robots by the year 2000? Such expectations were not that unreasonable given the preceding economic trends.

The question is whether we might be similarly overoptimistic about future economic progress today given recent, possibly unique, growth trends, specifically the unprecedented increase in absolute annual growth that we have seen over the past two decades. The same may apply to the trends in scientific and technological progress cited above, where peak growth in many areas appears to have happened in the period 1990-2010, meaning that we could now be at a point where we are disposed to being overoptimistic about further progress.

Yet, again, it is highly uncertain at this point whether growth rates, of the economy in general and of progress in technology and science in particular, will increase again in the future. Future economic growth may not conform well to the model with roughly symmetric growth rates around the 1960s, although the model certainly deserves some weight. All we can say for sure is that growth rates must become near-zero relatively soon. What the path toward that point will look like remains an open question. We could well be in the midst of a temporary decline in growth rates that will be followed by growth rates significantly greater than those of the 1960s, cf. the new growth mode envisioned by Robin Hanson.2

Implications: This is an extremely special time

Applying the mediocrity principle, we should not expect to live in a highly exceptional time. Yet, in light of the facts about the ultimate limits to growth seen above, it is clear that we do: we are living during the childhood of civilization, in which there is still rapid growth, at the pace of doublings within a couple of decades. If civilization persists with similar growth rates, it will soon become a grown-up with near-zero relative growth. And it will then look back at our time — today plus or minus a couple of centuries, most likely — as the one where growth rates were by far the highest in its entire history, which may be more than a trillion years.

It seems that a few things follow from this. First, more than just being the time where growth rates are the highest, this may also, for that very reason, be the time where individuals can influence the future of civilization more than any other time. In other words, this may be the time where the outcome of the future is most sensitive to small changes, as it seems plausible, although far from clear, that small changes in the trajectory of civilization are most significant when growth rates are highest. An apt analogy might be a psychedelic balloon with fluctuating patterns on its surface, where the fluctuations that happen to occur when we blow up the balloon will then also be blown up and leave their mark in a way that fluctuations occurring before and after this critical growth period will not (just like quantum fluctuations in the early universe got blown up during cosmic expansion, and thereby in large part determined the grosser structure of the universe today). Similarly, it seems much more difficult to cause changes across all of civilization when it spans countless star systems compared to today.

That being said, it is not obvious that small changes — in our actions, say — are more significant in this period where growth rates are many orders of magnitude higher than in any other time. It could also be that such changes are more consequential when the absolute growth is the highest. Or perhaps when it is smallest, at least as we go backwards in time, as there were far fewer people back when growth rates were orders of magnitude lower than today, and hence any given individual comprised a much greater fraction of all individuals than an individual does today.

Still, we may well find ourselves in a period where we are uniquely positioned to make irreversible changes that will echo down throughout the entire future of civilization.3 To the extent that we are, this should arguably lead us to update toward trying to influence the far future rather than the near future. More than that, if it does hold true that the time where the greatest growth rates occur is indeed the time where small changes are most consequential, this suggests that we should increase our credence in the simulation hypothesis. For if realistic sentient simulations of the past become feasible at some point, the period where the future trajectory of civilization seems the most up for grabs would seem an especially relevant one to simulate and learn more about. However, one can also argue that the sheer historical uniqueness of our current growth rates alone, regardless of whether this is a time where the fate of our civilization is especially volatile, should lead us to increase this credence, as such uniqueness may make it a more interesting time to simulate, and because being in a special time in general should lead us to increase our credence in the simulation hypothesis (see for instance this talk for a case for why being in a special time makes the simulation hypothesis more likely).4

On the other hand, one could also argue that imminent near-zero growth rates, along with the weak indications that we may now be past peak growth in many respects, provide a reason to lower our credence in the simulation hypothesis, as these observations suggest that the ceiling for what will be feasible in the future may be lower than we naively expect in light of today’s high growth rates. And thus, one could argue, it should make us more skeptical of the central premise of the simulation hypothesis: that there will be (many) ancestor simulations in the future. To me, the consideration in favor of increased credence seems stronger, although it does not significantly move my overall credence in the hypothesis, as there are countless other factors to consider.5


Appendix: Questioning our assumptions

Caspar Oesterheld pointed out to me that it might be worth meditating on how confident we can be in these conclusions given that apparently solid predictions concerning the ultimate limits to growth have been made before, yet quite a few of these turned out to be wrong. Should we not be open to the possibility that the same might be true of (at least some of) the limits we reviewed in the beginning of this essay?

Could our understanding of physics be wrong?

One crucial difference to note is that these failed predictions were based on a set of assumptions — e.g. about the amount of natural resources and food that would be available — that seem far more questionable than the assumptions that go into the physics-based predictions we have reviewed here: that our apparently well-established physical laws and measurements indeed are valid, or at least roughly so. The epistemic status of this assumption seems a lot more solid, to put it mildly. So there does seem to be a crucial difference here. This is not to say, however, that we should not maintain some degree of doubt as to whether this assumption is correct (I would argue that we always should). It just seems that this degree of doubt should be quite low.

Yet, to continue the analogy above, what went wrong with the aforementioned predictions was not so much that limits did not exist, but rather that humans found ways of circumventing them through innovation. Could the same perhaps be the case here? Could we perhaps some day find ways of deriving energy from dark energy or some other yet unknown source, even though physicists seem skeptical? Or could we, as Ray Kurzweil speculates, access more matter and energy by finding ways of travelling faster than light, or by finding ways of accessing other parts of our notional multiverse? Might we even become able to create entirely new ones? Or to eventually rewrite the laws of nature as we please? (Perhaps by manipulating our notional simulators?) Again, I do not think any of these possibilities can be ruled out completely. Indeed, some physicists argue that the creation of new pocket universes might be possible, not in spite of “known” physical principles (or rather theories that most physicists seem to believe, such as inflationary theory), but as a consequence of them. However, it is not clear that anything from our world would be able to expand into, or derive anything from, the newly created worlds on any of these models (which of course does not mean that we should not worry about the emergence of such worlds, or the fate of other “worlds” that we perhaps could access).

All in all, the speculative possibilities raised above seem unlikely, yet they cannot be ruled out for sure. The limits we have reviewed here thus represent a best estimate given our current, admittedly incomplete, understanding of the universe in which we find ourselves, not an absolute guarantee. However, it should be noted that this uncertainty cuts both ways, in that the estimates we have reviewed could also overestimate the limits to various forms of growth by countless orders of magnitude.

Might our economic reasoning be wrong?

Less speculatively, I think, one can also question the validity of our considerations about the limits of economic progress. I argued that it seems implausible that we in three thousand years could have an economy so big that each atom in our galaxy would have to support an economy equivalent to a single person living at today’s living standard. Yet could one not argue that the size of the economy need not depend on matter in this direct way, and that it might instead depend on the possible representations that can be instantiated in matter? If economic value could be mediated by the possible permutations of matter, our argument about a single atom’s need to support entire economies might not have the force it appears to have. For instance, there are far more legal positions on a Go board than there are atoms in the visible universe, and that’s just legal positions on a Go board. Perhaps we need to be more careful when thinking about how atoms might be able to create and represent economic value?

It seems like there is a decent point here. Still, I think economic growth at current rates is doomed. First, it seems reasonable to be highly skeptical of the notion that mere potential states could have any real economic value. Today at least, what we value and pay for is not such “permutation potential”, but the actual state of things, which is as true of the digital realm as of the physical. We buy and stream digital files such as songs and movies because of the actual states of these files, while their potential states mean nothing to us. And even when we invest in something we think has great potential, like a start-up, the value we expect to be realized is still ultimately one that derives from its actual state, namely the actual state we hope it will assume, not its number of theoretically possible permutations.

It is not clear why this would change, or how it could. After all, the number of ways one can put all the atoms in the galaxy together is the same today as it will be ten thousand years from now. Organizing all these atoms into a single galactic supercomputer would only seem to increase the value of their actual state.

Second, economic growth still seems tightly constrained by the shackles of physical limitations. For it seems inescapable that economies, of any kind, are ultimately dependent on the transfer of resources, whether these take the form of information or concrete atoms. And such transfers require access to energy, the growth of which we know to be constrained, as is true of the growth of our ability to process information. As these underlying resources that constitute the lifeblood of any economy stop growing, it seems unlikely that the economy can avoid this fate as well. (Tom Murphy touches on similar questions in his analysis of the limits to economic growth.)

Again, we of course cannot exclude that something crucial might be missing from these considerations. Yet the conclusion that economic growth rates will decline to near-zero levels relatively soon, on a cosmic timescale at least, still seems a safe bet in my view.

Acknowledgments

I would like to thank Brian Tomasik, Caspar Oesterheld, Duncan Wilson, Kaj Sotala, Lukas Gloor, Magnus Dam, Max Daniel, and Tobias Baumann for valuable comments and inputs. This essay was originally published at the website of the Foundational Research Institute, now the Center on Long-Term Risk. 


Notes

1. One may wonder whether there might not be more efficient ways to derive energy from the non-stellar matter in our galaxy than to convert it into stars as we know them. I don’t know, yet a friend of mine who does research in plasma physics and fusion says that he does not think one could, especially if we, as we have done here, disregard the energy required to clump the dispersed matter together so as to “build” the star, a process that may well take more energy than the star can eventually deliver.

The aforementioned paper by Lawrence Krauss and Glenn Starkman also contains much information about the limits of energy use, and in fact uses accessible energy as the limiting factor that bounds the amount of information processing any (local) civilization could do (they assume that the energy that is harvested is beamed back to a “central observer”).

2. It should be noted, though, that Hanson by no means rules out that such a growth mode may never occur, and that we might already be past, or in the midst of, peak economic growth: “[…] it is certainly possible that the economy is approaching fundamental limits to economic growth rates or levels, so that no faster modes are possible […]”

3. The degree to which there is sensitivity to changes of course varies between different endeavors. For instance, natural science seems more convergent than moral philosophy, and thus its development is arguably less sensitive to the particular ideas of individuals working on it than the development of moral philosophy is.

4. One may then argue that this should lead us to update toward focusing more on the near future. This may be true. Yet should we update more toward focusing on the far future given our ostensibly unique position to influence it? Or should we update more toward focusing on the near future given increased credence in the simulation hypothesis? (Provided that we indeed do increase this credence, cf. the counter-consideration above.) In short, it mostly depends on the specific probabilities we assign to these possibilities. I myself happen to think the far future should dominate, as I assign the simulation hypothesis (as commonly conceived) a very small probability.

5. For instance, fundamental epistemological issues concerning how much one can infer based on impressions from a simulated world (which may only be your single mind) about a simulating one (e.g. do notions such as “time” and “memory” correspond to anything, or even make sense, in such a “world”?); the fact that the past cannot be simulated realistically, since we can only have incomplete information about a given physical state in the past (not only because we have no way to uncover all the relevant information, but also because we cannot possibly represent it all, even if we somehow could access it — for instance, we cannot faithfully represent the state of every atom in our solar system in any point in the past, as this would require too much information), and a simulation of the past that contains incomplete information would depart radically from how the actual past unfolded, as all of it has a non-negligible causal impact (even single photons, which, it appears, are detectable by the human eye), and this is especially true given that the vast majority of information would have to be excluded (both due to practical constraints to what can be recovered and what can be represented); whether conscious minds can exist on different levels of abstraction; etc.

Is AI Alignment Possible?

The problem of AI alignment is usually defined roughly as the problem of making powerful artificial intelligence do what we humans want it to do. My aim in this essay is to argue that this problem is less well-defined than many people seem to think, and to argue that it is indeed impossible to “solve” with any precision, not merely in practice but in principle.

There are two basic problems for AI alignment as commonly conceived. The first is that human values are non-unique. Indeed, in many respects, there is more disagreement about values than people tend to realize. The second problem is that even if we were to zoom in on the preferences of a single human, there is, I will argue, no way to instantiate a person’s preferences in a machine so as to make it act as this person would have preferred.

Problem I: Human Values Are Non-Unique

The common conception of the AI alignment problem is something like the following: we have a set of human preferences, X, which we must, somehow (and this is usually considered the really hard part), map onto some machine’s goal function, Y, via a map f, let’s say, such that X and Y are in some sense isomorphic. At least, this is a way of thinking about it that roughly tracks what people are trying to do.

Speaking in these terms, much attention is being devoted to Y and f compared to X. My argument in this essay is that we are deeply confused about the nature of X, and hence confused about AI alignment.

The first point of confusion is about the values of humanity as a whole. It is usually acknowledged that human values are fuzzy, and that there are some disagreements over values among humans. Yet it is rarely acknowledged just how strong this disagreement in fact is.

For example, concerning the ideal size of the future population of sentient beings, the disagreement is near-total, as some (e.g. some defenders of the so-called Asymmetry in population ethics, as well as anti-natalists such as David Benatar) argue that the future population should ideally be zero, while others, including many classical utilitarians, argue that the future population should ideally be very large. Many similar examples could be given of strong disagreements concerning the most fundamental and consequential of ethical issues, including whether any positive good can ever outweigh extreme suffering. And on many of these crucial disagreements, a very large number of people will be found on both sides.

Different answers to ethical questions of this sort do not merely give rise to small practical disagreements. In many cases, they imply completely opposite practical implications. This is not a matter of human values being fuzzy, but a matter of them being sharply, irreconcilably inconsistent. And hence there is no way to map the totality of human preferences, “X”, onto a single, well-defined goal-function in a way that does not conflict strongly with the values of a significant fraction of humanity. This is a trivial point, and yet most talk of human-aligned AI seems to skirt this fact.

Problem II: Present Human Preferences Are Underdetermined Relative to Future Actions

The second problem and point of confusion with respect to the nature of human preferences is that, even if we focus only on the present preferences of a single human, then these in fact do not, and indeed could not, determine with much precision what kind of world this person would prefer to bring about in the future.

One way to see this point is to think in terms of the information required to represent the world around us. A perfectly precise such representation would require an enormous amount of information, indeed far more information than what can be contained in our brain. This holds true even if we only consider morally relevant entities around us — on the planet, say. There are just too many of them for us to have a precise representation of them. By extension, there are also too many of them for us to be able to have precise preferences about their individual states. Given that we have very limited information at our disposal, all we can do is express extremely coarse-grained and compressed preferences about what state the world around us should ideally have. In other words, any given human’s preferences are bound to be extremely vague about the exact ideal state of the world right now, and there will be countless moral dilemmas occurring across the world right now to which our preferences, in their present state, do not specify a unique solution.

And yet this is just considering the present state of the world. When we consider future states, the problem of specifying ideal states and resolutions to hitherto unknown moral dilemmas only explodes in complexity, and indeed explodes exponentially as time progresses. It is simply a fact, and indeed quite an obvious one at that, that no single brain could possibly contain enough information to specify unique, or indeed just qualified, solutions to all moral dilemmas that will arise in the future. So what, then, could AI alignment relative to even a single brain possibly mean? How can we specify Y with respect to these future dilemmas when X itself does not specify solutions?

We can, of course, try to guess what a given human, or we ourselves, might say if confronted with a particular future moral dilemma and given knowledge about it, yet the problem is that our extrapolated guess is bound to be just that: a highly imperfect guess. For even a tiny bit of extra knowledge or experience can readily change a person’s view of a given moral dilemma to be the opposite of what it was prior to acquiring that knowledge (for instance, I myself switched from being a classical to a negative utilitarian based on a modest amount of information in the form of arguments I had not considered before). This high sensitivity to small changes in our brain implies that even a system with near-perfect information about some person’s present brain state would be forced to make a highly uncertain guess about what that person would actually prefer in a given moral dilemma. And the further ahead in time we go, and thus further away from our familiar circumstance and context, the greater the uncertainty will be.

By analogy, consider the task of AI alignment with respect to our ancestors ten million years ago. What would their preferences have been with respect to, say, the future of space colonization? One may object that this is underdetermined because our ancestors could not conceive of this possibility, yet the same applies to us and things we cannot presently conceive of, such as alien states of consciousness. Our current preferences say about as little about the (dis)value of such states as the preferences of our ancestors ten million years ago said about space colonization.

A more tangible analogy might be to ask how confidently we could, based on knowledge of your current brain state, determine your dinner preferences twenty years from now with respect to dishes made from ingredients not yet invented — a preference that will likely be influenced by contingent, environmental factors found between now and then. Not with great confidence, it seems safe to say. And this point pertains not only to dinner preferences but also to the most consequential of choices. Our present preferences cannot realistically determine, with any considerable precision, what we would deem ideal in as yet unknown, realistic future scenarios. Thus, by extension, there can be no such thing as value extrapolation or preservation in anything but the vaguest sense. No human mind has ever contained, or indeed ever could contain, a set of preferences that evaluatively orders more than the tiniest sliver of (highly compressed versions of) the real-world states and choices an agent in our world is likely to face in the future. To think otherwise amounts to a strange Platonization of human preferences. We just do not have enough information in our heads to possess such fine-grained values.

The truth is that our preferences are not some fixed entity that uniquely determines future actions; they simply could not be that. Rather, our preferences are themselves interactive and adjustive in nature, changing in response to new experiences and new information we encounter. Thus, to say that we can “idealize” our present preferences so as to obtain answers to all realistic future moral dilemmas is rather like calling the evolution of our ancestors’ DNA toward human DNA a “DNA idealization”. In both cases, we find no hidden Deep Essences waiting to be purified; no information that points uniquely toward one particular solution in the face of all realistic future “problems”. All we find are physical systems that evolve contingently based on the inputs they receive.*

The bottom line of all this is not that it makes no sense to devote resources toward ensuring the safety of future machines. We can still meaningfully and cooperatively seek to instill rules and mechanisms in our machines and institutions that seem optimal in expectation given our respective, coarse-grained values. The conclusion here is just that 1) the rules instantiated cannot be the result of a universally shared human will or anything close; the closest thing possible would be rules that embody some compromise between people with strongly disagreeing values. And 2) such an instantiation of coarse-grained rules in fact comprises the upper bound of what we can expect to accomplish in this regard. Indeed, this is all we can expect with respect to future influence in general: rough and imprecise influence and guidance with the limited information we can possess and transmit. The idea of a future machine that will do exactly what we would want, and whose design therefore constitutes a lever for precise future control, is a pipe dream.


* Note that this account of our preferences is not inconsistent with value or moral realism. By analogy, consider human preferences and truth-seeking: humans are able to discover many truths about the universe, yet most of these truths are not hidden in, nor extrapolated from, our DNA or our preferences. Indeed, in many cases, we only discover these truths by actively transcending rather than “extrapolating” our immediate preferences (for comfortable and intuitive beliefs, say). The same could apply to the realm of value and morality.

Why Altruists Should Perhaps Not Prioritize Artificial Intelligence: A Lengthy Critique

The following is a point-by-point critique of Lukas Gloor’s essay Altruists Should Prioritize Artificial Intelligence. My hope is that this critique will serve to make it clear — to Lukas, myself, and others — where and why I disagree with this line of argument, and thereby hopefully also bring some relevant considerations to the table with respect to what we should be working on to best reduce suffering. I should like to note, before I begin, that I have the deepest respect for Lukas, and that I consider his work very important and inspiring.

Below, I quote every paragraph from the body of Lukas’ article, which begins with the following abstract:

The large-scale adoption of today’s cutting-edge AI technologies across different industries would already prove transformative for human society. And AI research rapidly progresses further towards the goal of general intelligence. Once created, we can expect smarter-than-human artificial intelligence (AI) to not only be transformative for the world, but also (plausibly) to be better than humans at self-preservation and goal preservation. This makes it particularly attractive, from the perspective of those who care about improving the quality of the future, to focus on affecting the development goals of such AI systems, as well as to install potential safety precautions against likely failure modes. Some experts emphasize that steering the development of smarter-than-human AI into beneficial directions is important because it could make the difference between human extinction and a utopian future. But because we cannot confidently rule out the possibility that some AI scenarios will go badly and also result in large amounts of suffering, thinking about the impacts of AI is paramount for both suffering-focused altruists as well as those focused on actualizing the upsides of the very best futures.

An abstract of my thoughts on this argument:

My response to this argument is twofold: 1) I do not consider the main argument presented by Lukas, as I understand it, to be plausible, and 2) I think we should think hard about whether we have considered the opportunity cost carefully enough. We should not be particularly confident, I would argue, that any of us have found the best thing to focus on to reduce the most suffering.

I do not think the claim that “altruists can expect to have the largest positive impact by focusing on artificial intelligence” is warranted. In part, my divergence from Lukas rests on empirical disagreements, and in larger part it stems from what may be called “conceptual disagreements” — I think most talk about “superintelligence” is conceptually confused. For example, intelligence as “cognitive abilities” is liberally conflated with intelligence as “the ability to achieve goals in general”, and this confusion does a lot of deceptive work.

I would advocate for more foundational research into the question of what we ought to prioritize. Artificial intelligence undoubtedly poses many serious risks, yet it is important that we maintain a sense of proportion with respect to these risks relative to other serious risks, many of which we have not even contemplated yet.

I will now turn to the full argument presented by Lukas.

I. Introduction and definitions

Terms like “AI” or “intelligence” can have many different (and often vague) meanings. “Intelligence” as used here refers to the ability to achieve goals in a wide range of environments. This definition captures the essence of many common perspectives on intelligence (Legg & Hutter, 2005), and conveys the meaning that is most relevant to us, namely that agents with the highest comparative goal-achieving ability (all things considered) are the most likely to shape the future.

A crucial thing to flag is that “intelligence” here refers to the ability to achieve goals — not to scoring high on an IQ test, or “intelligence” as “advanced cognitive abilities”. And these are not the same, and should not be conflated (indeed, this is one of the central points of my book Reflections on Intelligence, which dispenses with the muddled term “intelligence” at an early point, and instead examines the nature of this better defined “ability to achieve goals” in greater depth).

While it is true that the concept of goal achieving is related to the concept of IQ, the latter is much narrower, as it relates to a specific class of goals. Boosting the IQ of everyone would not boost our ability to achieve goals in every respect — at least not immediately, and not to the same extent across all domains. For even if we all woke up with an IQ of 200 tomorrow, all the external technology with which we run and grow our economy would still be the same. Our cars would drive just as fast, the energy available to us would be the same, and so would the energy efficiency of our machines. And while a higher IQ might then enable us to grow this external technology faster, there are quite restricting limits to how much it can grow. Most of our machines and energy harvesting technology cannot be made many times more efficient, as their efficiency is already a significant fraction — 15 to 40 percent — of the maximum physical limit. In other words, their efficiency cannot be doubled more than a couple of times, if even that.
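To spell out the arithmetic behind that last point, here is a minimal sketch of my own that simply plugs the 15 to 40 percent figures into the bound: since efficiency cannot exceed the relevant physical limit, at most about log2(limit/current) doublings remain.

    import math

    PHYSICAL_LIMIT = 1.0  # efficiency can at most approach 100% of the relevant physical bound

    for current_efficiency in (0.15, 0.25, 0.40):
        doublings_left = math.log2(PHYSICAL_LIMIT / current_efficiency)
        print(f"current efficiency {current_efficiency:.0%}: "
              f"at most ~{doublings_left:.1f} doublings remain")

So even a machine starting at the low end of that range can double its efficiency at most two to three times before hitting the ceiling.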

One could then, of course, build more machines and power plants, yet such an effort would itself be constrained strongly by the state of our external technology, including the energy available to us, and not just by the cognitive abilities available. This is one of the reasons I am skeptical of the idea of AI-powered runaway growth. Yes, greater cognitive abilities are a highly significant factor, yet there is just so much more to growing the economy and our ability to achieve a wide range of goals than that, as evidenced by the fact that we have seen a massive increase in computer-powered cognitive abilities — indeed, exponential growth for many decades by many measures — and yet we have continued to see fairly stable, in fact modestly declining, economic growth.

If one considers the concept of “increase in cognitive powers” to be the same as “increase in the ability to achieve goals, period” then this criticism will be missed. “I defined intelligence to be the ability to achieve goals, so when I say intelligence is increased, then all abilities are increased.” One can easily come to entertain a kind of motte and bailey argument in this way, by moving back and forth between this broad notion of intelligence as “the ability to achieve goals” and the more narrow sense of intelligence as “cognitive abilities”. To be sure, a statement like the one above need not be problematic as such, as long as one is clear that this concept of intelligence lies very far from “intelligence as measured by IQ/raw cognitive power”. Such clarity is often absent, however, and thus the statement is quite problematic in practice, with respect to the goals of communicating clearly and not confusing ourselves.

Again, my main point here is that increasing cognitive powers should not be conflated with increasing the ability to achieve goals in general — in every respect. I think much confusion springs from a lack of clarity on this matter.

While everyday use of the term “intelligence” often refers merely to something like “brainpower” or “thinking speed,” our usage also presupposes rationality, or goal-optimization in an agent’s thinking and acting. In this usage, if someone is e.g. displaying overconfidence or confirmation bias, they may not qualify as very intelligent overall, even if they score high on an IQ test. The same applies to someone who lacks willpower or self control.

This is an important step toward highlighting the distinction between “goal achieving ability” and “IQ”, yet it is still quite a small step, as it does not really go much beyond distinguishing “high IQ” from “optimal cognitive abilities for goal achievement”. We are still talking about things going on in a single human head (or computer), while leaving out the all-important aspect that is (external) culture and technology. We are still not talking about the ability to achieve goals in general.

Artificial intelligence refers to machines designed with the ability to pursue tasks or goals. The AI designs currently in use – ranging from trading algorithms in finance, to chess programs, to self-driving cars – are intelligent in a domain-specific sense only. Chess programs beat the best human players in chess, but they would fail terribly at operating a car. Similarly, car-driving software in many contexts already performs better than human drivers, but no amount of learning (at least not with present algorithms) would make [this] software work safely on an airplane.

My only comment here would be that it is not quite clear what counts as artificial intelligence. For example, would a human, whether edited or unedited, count as “a machine designed with the ability to pursue tasks or goals”? And could not all software be considered “designed with the ability to pursue tasks or goals”, and hence count as artificial intelligence by this definition? If so, we should simply be clear that this definition is quite broad, including all humans, all software, and more.

The most ambitious AI researchers are working to build systems that exhibit (artificial) general intelligence (AGI) – the type of intelligence we defined above, which enables the expert pursuit of virtually any task or objective.

This is where the distinction we drew above becomes relevant. While the claim quoted above may be true in one sense, we should be clear that the most ambitious AI researchers are not working to increase “all our abilities”, including our ability to get more energy out of our steam engines and solar panels. Our economy arguably works on that broader endeavor. AI researchers, in contrast, work only on bettering what may be called “artificial cognitive abilities”, which, granted, may in turn help spur growth in many other areas (although the degree to which it would do so is quite unclear, and likely surprisingly limited in the big picture, since “growth may be constrained not by what we are good at but rather by what is essential and yet hard to improve”).

In the past few years, we have witnessed impressive progress in algorithms becoming more and more versatile. Google’s DeepMind team for example built an algorithm that learned to play 2-D Atari games on its own, achieving superhuman skill at several of them (Mnih et al., 2015). DeepMind then developed a program that beat the world champion in the game of Go (Silver et al., 2016), and – tackling more practical real-world applications – managed to cut down data center electricity costs by rearranging the cooling systems.

I think it is important not to overstate recent progress compared to progress in the past. We also saw computers becoming better than humans at many things several decades ago, including many kinds of mathematical calculations (and people also thought that computers would soon beat humans at everything back then). So superhuman skill at many tasks is not what is new and unique about recent progress, but rather that these superhuman skills have been attained via self-training, and, as Lukas notes, that the skills achieved by this training seem of a broader, more general nature than the skills of a single algorithm in the past.

And yet the breadth of these skills should not be overstated either, as the skills cited are all acquired in a rather expensive trial-and-error fashion with readily accessible feedback. This mode of learning surely holds a lot of promise in many areas, yet there are reasons to be skeptical that such learning can bring us significantly closer to achieving all the cognitive and motor abilities humans have (see also David Pearce’s “Humans and Intelligent Machines”; one need not agree with Pearce on everything to agree with some of his reasons to be skeptical).

That DeepMind’s AI technology makes quick progress in many domains, without requiring researchers to build new architecture from scratch each time, indicates that their machine learning algorithms have already reached an impressive level of general applicability. (Edit: I wrote the previous sentence in 2016. In the meantime [January 2018] DeepMind went on to refine its Go-playing AI, culminating in a version called AlphaGo Zero. While the initial version of DeepMind’s Go-playing AI started out with access to a large database of games played by human experts, AlphaGo Zero only learns through self-play. Nevertheless, it managed to become superhuman after a mere 4 days of practice. After 40 days of practice, it was able to beat its already superhuman predecessor 100–0. Moreover, Deepmind then created the version AlphaZero, which is not a “Go-specific” algorithm anymore. Fed with nothing but the rules for either Go, chess, or shogi, it managed to become superhuman at each of these games in less than 24 hours of practice.)

This is no doubt impressive. Yet it is also important not to overstate how much progress was achieved in 24 hours of practice. This is not, we should be clear, a story about innovation going from zero to superhuman in 24 hours, but rather a story of immense amounts of hardware, developed over decades, being fed with an algorithm that was itself developed over many years by many people. And then this highly refined algorithm, running on specialized, cutting-edge hardware, is unleashed to reach its dormant potential.

And this potential was, it should be noted, not vastly superior to the abilities of previous systems. In chess, for instance, AlphaZero beat the chess program Stockfish (although Stockfish author Tord Romstad notes that it was a year-old version not running on optimal hardware), winning 25 times as white and 3 times as black, and drawing the remaining 72 games. Thus, it was significantly better, yet it still did not win most of the games. Similarly, in Go, AlphaZero won 60 games and lost 40, while in shogi it won 90 times, lost 8, and drew twice.
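To put those margins in a single common currency, here is a small sketch of my own that converts the cited figures into overall match scores, counting a win as 1 point, a draw as 0.5, and a loss as 0.

    def match_score(wins: int, draws: int, losses: int) -> float:
        """Overall match score: wins count 1 point, draws 0.5, losses 0."""
        games = wins + draws + losses
        return (wins + 0.5 * draws) / games

    # Figures as cited above
    print(f"Chess vs. Stockfish: {match_score(25 + 3, 72, 0):.0%}")  # 25 wins as white, 3 as black, 72 draws
    print(f"Go:                  {match_score(60, 0, 40):.0%}")
    print(f"Shogi:               {match_score(90, 2, 8):.0%}")

A 64 percent score in chess and a 60 percent score in Go are clear but far from crushing margins, which is precisely the point; only the shogi result approaches outright dominance.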

Thus, AlphaZero undoubtedly constituted clear progress with respect to these games, yet not an enormous leap that rendered it unbeatable, and certainly not a leap made in a single day.

The road may still be long, but if this trend continues, developments in AI research will eventually lead to superhuman performance across all domains. As there is no reason to assume that humans have attained the maximal degree of intelligence (Section III), AI may soon after reaching our own level of intelligence surpass it.

Again, I would start by noting that human “intelligence” as our “ability to achieve goals” is strongly dependent on the state of our technology and culture at large, not merely our raw cognitive powers. And the claim made above that there is no reason to believe that humans have attained “the maximal degree of intelligence” seems, in this context, to mostly refer to our cognitive abilities rather than our ability to achieve goals in general. For with respect to our ability to achieve goals in general, it is clear that our abilities are not maximal, but indeed continually growing, largely as the result of better software and better machines. Thus, there is not a dichotomous relationship between “human abilities to achieve goals” and “our machines’ abilities to achieve goals”. And given that our ability to achieve goals is in many ways mostly limited by what our best technology can do — how fast our airplanes can fly, how fast our hardware is, how efficient our power plants are, etc. — it is not clear why some other agent or set of agents coming to control this technology should be vastly more capable of achieving goals than humans powered by, and powering, this technology. (And such a takeover is extremely difficult to imagine in the first place, given the collaborative nature of the broader infrastructure of this technology.)

As for AI surpassing “our own level of intelligence”, one can say that, at the level of cognitive tasks, machines have already been vastly superhuman in many respects for many years — in virtually all mathematical calculations, for instance. And now also in many games, ranging from Atari to Go. Yet, as noted above, I would argue that, so far, such progress has amounted to a clear increase in human “intelligence” in the general sense: it has increased our ability to achieve goals.

Nick Bostrom (2014) popularized the term superintelligence to refer to (AGI-)systems that are vastly smarter than human experts in virtually all respects. This includes not only skills that computers traditionally excel at, such as calculus or chess, but also tasks like writing novels or talking people into doing things they otherwise would not. Whether AI systems would quickly develop superhuman skills across all possible domains, or whether we will already see major transformations with [superhuman skills in] just a [few] such domains while others lag behind, is an open question.

I would argue that our machines already have superhuman skills in countless domains, and that this has indeed already given rise to major transformations, in one sense of this term at least.

Note that the definitions of “AGI” and “superintelligence” leave open the question of whether these systems would exhibit something like consciousness.

I have argued to the contrary in the chapter “Consciousness — Orthogonal or Crucial?” in Reflections on Intelligence.

This article focuses on the prospect of creating smarter-than-human artificial intelligence. For simplicity, we will use the term “AI” in a non-standard way here, to refer specifically to artificial general intelligence (AGI).

Again, I would flag that the meaning of the term general intelligence, or AGI, in this context is not clear. It was defined above as the ability that “enables the expert pursuit of virtually any task or objective”. Yet the ability of humans to achieve goals in general is, I would still argue, in large part the product of their technology and culture at large, and AGI, as Lukas uses it here, does not seem to refer to anything remotely like this, i.e. “the sum of the capabilities of our technology and culture”. Instead, it seems to refer to something much more narrow and singular — something akin to “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level”. I think this is worth highlighting.

The use of “AI” in this article will also leave open how such a system is implemented: While it seems plausible that the first artificial system exhibiting smarter-than-human intelligence will be run on some kind of “supercomputer,” our definition allows for alternative possibilities.

Again, what does “smarter-than-human intelligence” mean here? Machines can already do things that no unaided human can. It seems to refer to what I defined above: “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” — not the ability to achieve goals in general. And as for when a computer might have “(virtually) all the cognitive abilities that a human does”, it seems highly doubtful that any system will ever suddenly emerge with them all, given the modular, many-faceted nature of our minds. Instead, it seems much more likely that the gradual process of machines becoming better than humans at particular tasks will continue in its usual, gradual way. Or so I have argued.

The claim that altruists should focus on affecting AI outcomes is therefore intended to mean that we should focus on scenarios where the dominant force shaping the future is no longer (biological) human minds, but rather some outgrowth of information technology – perhaps acting in concert with biotechnology or other technologies. This would also e.g. allow for AI to be distributed over several interacting systems.

I think this can again come close to resembling a motte and bailey argument: it seems very plausible that the future will not be controlled mostly by what we would readily recognize as biological humans today. Yet to say that we should aim to impact such a future by no means implies that we should aim to impact, say, a small set of AI systems which might determine the entire future based on their goal functions (note: I am not saying Lukas has made this claim above, but this is often what people seem to consider the upshot of arguments of this kind, and also what it seems to me that Lukas is arguing below, in the rest of his essay). Indeed, the claim above is hardly much different from saying that we should aim to impact the long-term future. But Lukas seems to be moving back and forth between this general claim and the much narrower claim that we should focus on scenarios involving rapid growth acceleration driven mostly by software, which is the kind of scenario his essay seems almost exclusively focused on.

II. It is plausible that we create human-level AI this century

Even if we expect smarter-than-human artificial intelligence to be a century or more away, its development could already merit serious concern. As Sam Harris emphasized in his TED talk on risks and benefits of AI, we do not know how long it will take to figure out how to program ethical goals into an AI, solve other technical challenges in the space of AI safety, or establish an environment with reduced dangers of arms races. When the stakes are high enough, it pays to start preparing as soon as possible. The sooner we prepare, the better our chances of safely managing the upcoming transition.

I agree that it is worth preparing for high-stakes outcomes. But I think it is crucial that we get a clear sense of what these might look like, as well as how likely they are. A framing that better captures what I think we should do would be something like: “Altruists Should Prioritize Exploring Long-Term Future Outcomes, and Work Out How to Best Influence Them”. To say that we should focus on “artificial intelligence”, which has a rather narrow meaning in most contexts (something akin to a software program), when we really mean that we should focus on the future of goal achieving systems in general is, I think, somewhat misleading.

The need for preparation is all the more urgent given that considerably shorter timelines are not out of the question, especially in light of recent developments. While timeline predictions by different AI experts span a wide range, many of those experts think it likely that human-level AI will be created this century (conditional on civilization facing no major disruptions in the meantime). Some even think it may emerge in the first half of this century: In a survey where the hundred most-cited AI researchers were asked in what year they think human-level AI is 10% likely to have arrived by, the median reply was 2024 and the mean was 2034. In response to the same question for a 50% probability of arrival, the median reply was 2050 with a mean of 2072 (Müller & Bostrom, 2016).1

Again, it is important to be careful about definitions. For what is meant by “human-level AI” in this context? The authors of the cited source are careful to define what they mean: “Define a ‘high–level machine intelligence’ (HLMI) as one that can carry out most human professions at least as well as a typical human.”

And yet even this definition is quite vague, since “most human professions” is not a constant. A couple of hundred years ago, the profession of virtually all humans was farming, whereas only a couple of percent of people in developed nations are employed in farming today. And this is not an idle point, because as machines become able to do jobs hitherto performed by humans, market forces will push humans to take new jobs that machines cannot do. And these new jobs may be ones that require abilities that will take machines many centuries to acquire, if non-biological machines ever acquire them at all (which is not as implausible as it may sound, since these abilities may include “looking like a real, empathetic biological human who ignites our brain circuits in the right ways”).

Thus, the survey question above seems poorly defined. And if it asks about most current human professions, its relevance appears quite limited, also because the nature of different professions changes over time as well. A doctor today does not do all the same things a doctor did a hundred years ago, and the same will likely apply to doctors of the future. In other words, even within existing professions we can expect to see humans move toward doing the things that machines cannot do, or that we do not prefer them to do, even as machines become ever more capable.

While it could be argued that these AI experts are biased towards short timelines, their estimates should make us realize that human-level AI this century is a real possibility.

Yet we should keep in mind what they were asked about, and how relevant this is. Even if most (current?) human professions might be done by machines within this century, this does not imply that we will see “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” within this century. These are quite different claims.

The next section will argue that the subsequent transition from human-level AI to superintelligence could happen very rapidly after human-level AI actualizes. We are dealing with the decent possibility – e.g. above 15% likelihood even under highly conservative assumptions – that human intelligence will be surpassed by machine intelligence later this century, perhaps even in the next couple of decades. As such a transition will bring about huge opportunities as well as huge risks, it would be irresponsible not to prepare for it.

I want to flag, again, that it is not clear what “human-level AI” means. Lukas seemed to first define intelligence as something like “the ability to achieve goals in general”, which I have argued is not really what he means here (indeed, it is a rather different beast which I seek to examine in Reflections on Intelligence). And the two senses of the term “human-level intelligence” mentioned in the previous paragraph — “the ability to do most human professions” versus “possessing virtually all human cognitive abilities” — should not be conflated either. So it is in fact not clear what is being referred to here, although I believe it is the latter: “possessing virtually all human cognitive abilities at a similar or greater level”.

It should be noted that a potentially short timeline does not imply that the road to superintelligence is necessarily one of smooth progress: Metrics like Moore’s law are not guaranteed to continue indefinitely, and the rate of breakthrough publications in AI research may not increase (or even stay constant) either. The recent progress in machine learning is impressive and suggests that fairly short timelines of a decade or two are not to be ruled out. However, this progress could also be mostly due to some important but limited insights that enable companies like DeepMind to reap the low-hanging fruit before progress would slow down again. There are large gaps still to be filled before AIs reach human-level intelligence, and it is difficult to estimate how long it will take researchers to bridge these gaps. Current hype about AI may lead to disappointment in the medium term, which could bring about an “AI safety winter” with people mistakenly concluding that the safety concerns were exaggerated and smarter-than-human AI is not something we should worry about yet.

This seems true, yet it should also be conceded that a consistent lack of progress in AI would count as at least weak evidence against the claim that we should mainly prioritize what is usually referred to as “AI safety“. And more generally, we should be careful not to make the hypothesis “AI safety is the most important thing we could be working on” into an unfalsifiable one.

As for Moore’s law, not only is it “not guaranteed to continue indefinitely”, but we know, for theoretical reasons, that it must come to an end within a decade, at least in its original formulation concerning silicon transistors, and progress has indeed already fallen below the prediction of “the law” for some time now. And much the same can be said about other aspects of hardware progress: it shows signs of tapering off.

If AI progress were to slow down for a long time and then unexpectedly speed up again, a transition to superintelligence could happen with little warning (Shulman & Sandberg, 2010). This scenario is plausible because gains in software efficiency make a larger comparative difference to an AI’s overall capabilities when the hardware available is more powerful. And once an AI develops the intelligence of its human creators, it could start taking part in its own self-improvement (see section IV).

I am not sure I understand the claims being made here. With respect to the first argument about gains in efficiency, the question is how likely such gains should be expected to be if progress has been slow for a long time. Other things being equal, this would seem less likely in a period when growth is slow than in one when it is fast, especially if there is not much growth in hardware either, since hardware growth may in large part be driving growth in software.

I am not sure I follow the claim about AI developing the intelligence of its human creators, and then taking part in its own improvement, but I would just note, as Ramez Naam has argued, that AI, and our machines in general, are already playing a significant role in their own improvement in many ways. In other words, we already actively use our best, most capable technology to build the next generation of such technology.

Indeed, on a more general, yet also less directly relevant note, I would also add that we humans have in some sense been using our most advanced cognitive tools to build the next generation of such tools for hundreds of thousands of years. For over the course of evolution, individual humans have been using the best of their cognitive abilities to select the mates who had the best total package (they could get), of which cognitive abilities were a significant part. In this sense, the idea that “dumb and blind” evolution created intelligent humans is actually quite wrong. The real story is rather one of cognitive abilities actively selecting cognitive abilities (along with other things). A gradual design process over the course of which ever greater cognitive powers were “creating” and in turn created.

For AI progress to stagnate for a long period of time before reaching human-level intelligence, biological brains would have to have surprisingly efficient architectures that AI cannot achieve despite further hardware progress and years of humans conducting more AI research.

Looking over the past decades of AI research and progress, we can say that it indeed has been a fairly long period of time since computers first surpassed humans in the ability to do mathematical calculations, and yet there are still many things humans can do which computers cannot, such as having meaningful conversations with other humans, learning fast from a few examples, and experiencing and expressing feelings. And yet these examples still mostly pertain to cognitive abilities, and hence still overlook other abilities that are also relevant with respect to machines taking over human jobs (if we focus on that definition of “human-level AI”), such as having the physical appearance of a real, biological human, which does seem in strong demand in many professions, especially in the service industry.

However, as long as hardware progress does not come to a complete halt, AGI research will eventually not have to surpass the human brain’s architecture or efficiency anymore. Instead, it could become possible to just copy it: The “foolproof” way to build human-level intelligence would be to develop whole brain emulation (WBE) (Sandberg & Bostrom, 2008), the exact copying of the brain’s pattern of computation (input-output behavior as well as isomorphic internal states at any point in the computation) onto a computer and a suitable virtual environment. In addition to sufficiently powerful hardware, WBE would require scanning technology with fine enough resolution to capture all the relevant cognitive function, as well as a sophisticated understanding of neuroscience to correctly draw the right abstractions. Even though our available estimates are crude, it is possible that all these conditions will be fulfilled well before the end of this century (Sandberg, 2014).

Yet it should be noted that there are many who doubt that this is a foolproof way to build “human-level intelligence” (a term that in this context again seems to mean “a system with roughly the same cognitive abilities as the human brain”). Many doubt that it is even a possibility, and they do so for many different reasons: for example, that a single, high-resolution scan of the brain is not enough to capture and enable an emulation of its dynamic workings; that a digital computer cannot adequately simulate the physical complexity of the brain; and that such a computer cannot solve the so-called binding problem.

Thus, it seems to stand as an open question whether mind uploading is indeed possible, let alone feasible (and it also seems that many people in the broader transhumanist community, who tend to be the people who write and talk the most about mind uploading, could well be biased toward believing it possible, as many of them seem to hope that it can save them from death).

The perhaps most intriguing aspect of WBE technology is that once the first emulation exists and can complete tasks on a computer like a human researcher can, it would then be very easy to make more such emulations by copying the original. Moreover, with powerful enough hardware, it would also become possible to run emulations at higher speeds, or to reset them back to a well-rested state after they performed exhausting work (Hanson, 2016).

Assuming, of course, that WBE will indeed be feasible in the first place. Also, it is worth noting that Robin Hanson himself is critical of the idea that WBEs would be able to create software that is superior to themselves very quickly; i.e. he expects a WBE economy to undergo “many doublings” before it happens.

Sped-up WBE workers could be given the task of improving computer hardware (or AI technology itself), which would trigger a wave of steeply exponential progress in the development of superintelligence.

This is an exceptionally strong claim that would seem in need of justification, and not least some specification, given that it is not clear what “steeply exponential progress in the development of superintelligence” refers to in this context. It hardly means “steeply exponential progress in the development of a super ability to achieve goals in general”, including in energy efficiency and energy harvesting. Such exponential progress is not, I submit, likely to follow from progress in computer hardware or AI technology alone. Indeed, as we saw above, such progress cannot happen with respect to the energy efficiency of most of our machines, as physical limits mean that it cannot double more than a couple of times.

But even if we understand it to be a claim about the abilities of certain particular machines and their cognitive abilities more narrowly, the claim is still a dubious one. It seems to assume that progress in computer hardware and AI technology is constrained chiefly by the amount of hours put into it by those who work on it directly, as opposed to also being significantly constrained by countless other factors, such as developments in other areas, e.g. in physics, production, and transportation, many of which imply limits on development imposed by factors such as hardware and money, not just the amount of human-like genius available.

For example, how much faster should we expect the hardware that AlphaZero was running on to have been developed and completed if a team of super-WBEs had been working on it? Would the materials used for the hardware have been dug up and transported significantly faster? Would they have been assembled significantly faster? Perhaps somewhat, yet hardly anywhere close to twice as fast. The growth story underlying many worries about explosive AI growth is quite detached from how we actually improve our machines, including AI (software and hardware) as well as the harvesting of the energy that powers it (Vaclav Smil: “Energy transitions are inherently gradual processes and this reality should be kept in mind when judging the recent spate of claims about the coming rapid innovative take-overs […]”). Such growth is the result of countless processes distributed across our entire economy. Just as nobody knows how to make a pencil, nobody, including the very best programmers, knows (more than a tiny part of) how to make better machines.

To get a sense of the potential of this technology, imagine WBEs of the smartest and most productive AI scientists, copied a hundred times to tackle AI research itself as a well-coordinated research team, sped up so they can do years of research in mere weeks or even days, and reset periodically to skip sleep (or other distracting activities) in cases where memory-formation is not needed. The scenario just described requires no further technologies beyond WBE and sufficiently powerful hardware. If the gap from current AI algorithms to smarter-than-human AI is too hard to bridge directly, it may eventually be bridged (potentially very quickly) after WBE technology drastically accelerates further AI research.

As far as I understand, much of the progress in machine learning in modern times was essentially due to modern hardware and computing power that made it possible to implement old ideas invented decades ago (of course then implemented with all the many adjustments and tinkering one cannot foresee from the drawing board). In other words, software progress was hardly the most limiting factor. Arguably, the limiting factor was rather that the economy just had not caught up enough to make hardware advanced enough to implement these theoretical ideas successfully. And it also seems to me quite naive to think that better hardware design, and genius ideas about how to make hardware more generally, were and are a main limiting factor in the growth of computer hardware. Such progress tends to rest critically on progress in other kinds of hardware and on globally distributed production processes. These processes can no doubt be sped up, yet hardly that significantly by advanced software alone, in large part because many of the crucial steps involved, such as digging up, refining, and transporting materials, are physical processes that can only go so fast.

Beyond that, there is also an opportunity cost consideration that is ignored by the story of fast growth above. For the hardware and energy required for this team of WBEs could otherwise have been used to run other kinds of computations that could help further innovation, including those we already run at full steam to further progress: CAD programs, simulations, equation solving. And it is not clear that using all this hardware for WBEs would be a better use of it than running these other programs, whose work may be considered a limiting factor on AI progress at a similar level as more “purely” human or human-like work is. Indeed, we should not expect engineers and companies to do these kinds of things with their computing resources if they were not among the most efficient things they could do with them. And even if WBEs were a better use of hardware for fast progress, it is far from clear that they would be that much better.

The potential for WBE to come before de novo AI means that – even if the gap between current AI designs and the human brain is larger than we thought – we should not significantly discount the probability of human-level AI being created eventually. And perhaps paradoxically, we should expect such a late transition to happen abruptly. Barring no upcoming societal collapse, believing that superintelligence is highly unlikely to ever happen requires not only confidence that software or “architectural” improvements to AI are insufficient to ever bridge the gap, but also that – in spite of continued hardware progress – WBE could not get off the ground either. We do not seem to have sufficient reason for great confidence in either of these propositions, let alone both.

Again, what does the term “superintelligence” refer to here? Above, it was defined as “(AGI-)systems that are vastly smarter than human experts in virtually all respects”. And given that AGI is defined as a general ability to pursue goals, and that “smart” here presumably means “better able to achieve goals”, one can say that the definition of superintelligence given here translates to “a system that pursues goals better than human experts in virtually all areas”. Yet we are already building systems that satisfy this definition of superintelligence. Our entire economy is already able to do tasks that no single human expert could ever accomplish. But superintelligence likely refers to something else here, something along the lines of: “a system that is vastly more cognitively capable than any human expert in virtually all respects”. And yet, even by this definition, we already have computer systems that can do countless cognitive tasks much better than any human, and the super system that is the union of all these systems can therefore, in many respects at least, be considered to have vastly superior cognitive abilities relative to humans. And systems composed of humans and technology are clearly vastly more capable than any human expert alone in virtually all respects.

In this sense, we clearly do have “superintelligence” already, and we are continually expanding its capabilities. And with respect to worries about a FOOM takeover, it seems highly unlikely that a single, powerful machine could ever overtake and become more powerful than the entire collective that is the human-machine civilization, which is not to say that low-probability risks should be dismissed. But they should be measured against other risks we could be focusing on.

III. Humans are not at peak intelligence

Again, it is important to be clear about what we mean by “intelligence”. Most cognitively advanced? Or best able to achieve goals in general? Humans extended by technology can clearly increase their intelligence, i.e. their ability to achieve goals, significantly. We have done so consistently over the last few centuries, and we continue to do so today. Moreover, we live in a world where humans build this growing body of technology to serve their own ends, and in some cases build it to be provably secure. In such a world, it is far from clear that some non-human system with much greater cognitive powers than humans (which, again, already exists in many domains) will also become more capable of achieving goals in general than humanity, given that it is surrounded by a capable super-system of technology designed for and by humans, and controlled by humans, to serve their ends. Again, this is not to say that one should not worry about seemingly improbable risks — we definitely should — but merely that we should doubt the assumption that making machines more cognitively capable will necessarily make them better able to achieve goals in general. Again, despite being related, these two senses of “intelligence” must not be confused.

It is difficult to intuitively comprehend the idea that machines – or any physical system for that matter – could become substantially more intelligent than the most intelligent humans. Because the intelligence gap between humans and other animals appears very large to us, we may be tempted to think of intelligence as an “on-or-off concept,” one that humans have and other animals do not. People may believe that computers can be better than humans at certain tasks, but only at tasks that do not require “real” intelligence. This view would suggest that if machines ever became “intelligent” across the board, their capabilities would have to be no greater than those of an intelligent human relying on the aid of (computer-)tools.

Again, we should be clear that the word “intelligence” here seems to mean “most cognitively capable” rather than “best able to achieve goals in general”. And the gap between the “intelligence”, as in the ability to achieve goals, of humans and other animals arguably does not appear very large when we compare individuals. Most other animals can do things that no single human can do, and to the extent that we humans can learn to do things other animals naturally beat us at, e.g. lift heavier objects or traverse distances faster than speedy animals, we do so by virtue of technology, which is in essence the product of collective, cultural evolution.

And even with respect to cognitive abilities, one can argue that humans are not superior to other animals in a general sense. We do not have superior cognitive abilities with respect to echolocation, for example, much less long-distance navigation. Nor are humans superior when it comes to all aspects of short-term/working memory.

Measuring goal achieving ability in general, as well as abilities to solve cognitive tasks in particular, along a single axis may be useful in some contexts, yet it can easily become meaningless when the systems being compared are not sufficiently similar. 

But this view is mistaken. There is no threshold for “absolute intelligence.” Nonhuman animals such as primates or rodents differ in cognitive abilities a great deal, not just because of domain-specific adaptations, but also due to a correlational “g factor” responsible for a large part of the variation across several cognitive domains (Burkart et al., 2016). In this context, the distinction between domain-specific and general intelligence is fuzzy: In many ways, human cognition is still fairly domain-specific. Our cognitive modules were optimized specifically for reproductive success in the simpler, more predictable environment of our ancestors. We may be great at interpreting which politician has the more confident or authoritative body language, but deficient in evaluating whose policy positions will lead to better developments according to metrics we care about. Our intelligence is good enough or “general enough” that we manage to accomplish impressive feats even in an environment quite unlike the one our ancestors evolved in, but there are many areas where our cognition is slower or more prone to bias than it could be.

I agree with this. I would just note that “intelligence” here again seems to be referring to cognitive abilities, not the ability to achieve goals in general, and that we humans have expanded both over time via culture: our cognitive abilities, as measured by IQ, have increased significantly over the last century, while our ability to achieve goals in general has expanded much more still as we have developed ever more advanced technology.

Intelligence is best thought of in terms of a gradient. Imagine a hypothetical “intelligence scale” (inspired by part 2.1 of this FAQ) with rats at 100, chimpanzees at, say, 350, the village idiot at 400, average humans at 500 and Einstein at 750.2 Of course, this scale is open at the top and could go much higher.

Again, intelligence here seems to refer to cognitive abilities, not the ability to achieve goals in general. Einstein was likely not better at shooting hoops than the average human, or indeed more athletic in general (by all appearances), although he was much more cognitively capable, at least in some respects, than virtually all other humans.

A more elaborate critique of the intelligence scale mentioned above can be found in my post Chimps, Humans, and AI: A Deceptive Analogy.

To quote Bostrom (2014, p. 44): “Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization – a niche we filled because we got there first, not because we are in any sense optimally adapted to it.”

Again, the words “smart” and “stupid” here seem to pertain to cognitive abilities, not the ability to achieve goals in general. And this phrasing is misleading, as it seems to presume that cognitive ability is all it takes to build an advanced civilization, which is not the case. In fact, humans are not the species with the biggest brain on the planet, or even the species with the biggest cerebral cortex; indeed, long-finned pilot whales have more than twice as many neocortical neurons.

What we are, however, is a species with a lot of unique tools — fine motor hands, upright walk, vocal cords, a large brain with a large prefrontal cortex, etc. — which together enabled humans to (gradually build a lot of tools with which they could) take over the world. Remove just one of these unique tools from all of humanity, and we would be almost completely incapable. And this story of a multiplicity of components that are all necessary yet insufficient for the maintenance and growth of human civilization is even more true today, where we have countless external tools — trucks, the internet, computers, screwdrivers, etc. — without which we could not maintain our civilization. And the necessity of all these many different components seems overlooked by the story that views advanced cognitive abilities as the sole driver, or near enough, of growth and progress in the ability to achieve goals in general. This, I would argue, is a mistake.

Thinking about intelligence as a gradient rather than an “on-or-off” concept prompts a Copernican shift of perspective. Suddenly it becomes obvious that humans cannot be at the peak of possible intelligence. On the contrary, we should expect AI to be able to surpass us in intelligence just like we surpass chimpanzees.

Depending on what we mean by the word “intelligence”, one can argue that computers have already surpassed humans. If we define “intelligence” to be “that which is measured by an IQ test”, for example, then computers have already been better than humans in at least some of these tests for a few years now.

In terms of our general ability to achieve goals, however, it is not clear that computers will so readily surpass humans, in large part because we do not aim to build them to be better than humans in many respects. Take self-repair, for example, which is something human bodies, just like virtually all animal bodies, are in a sense designed to do — indeed, most of our self-repair mechanisms are much older than we are as a species. Evolution has built humans to be competent and robust autonomous systems who do not for the most part depend on a global infrastructure to repair their internal parts. Our computers, in contrast, are generally not built to be self-repairing, at least not at the level of hardware. Their notional thrombocytes are entirely external to themselves, in the form of a thousand and one specialized tools and humans distributed across the entire economy. And there is little reason to think that this will change, as there is little incentive to create self-repairing computers. We are not aiming to build generally able, human-independent computers in this sense.

Biological evolution supports the view that AI could reach levels of intelligence vastly beyond ours. Evolutionary history arguably exhibits a weak trend of lineages becoming more intelligent over time, but evolution did not optimize for intelligence (only for goal-directed behavior in specific niches or environment types). Intelligence is metabolically costly, and without strong selection pressures for cognitive abilities specifically, natural selection will favor other traits. The development of new traits always entails tradeoffs or physical limitations: If our ancestors had evolved to have larger heads at birth, maternal childbirth mortality would likely have become too high to outweigh the gains of increased intelligence (Wittman & Wall, 2007). Because evolutionary change happens step-by-step as random mutations change the pre-existing architecture, the changes are path dependent and can only result in local optima, not global ones.

Here we see how the distinction between “intelligence as cognitive abilities” and “intelligence as the ability to achieve goals” is crucial. Indeed, the example provided above clearly proves the point that advanced cognitive abilities are often not the most relevant thing for achieving goals, since the goal of surviving and reproducing was often not best achieved, as Lukas hints, with the best cognitive abilities. Often it was better achieved with longer teeth or stronger muscles. Or a prettier face.

So the question is: why do we think that advanced cognitive abilities are, to a first approximation, identical with the ability to achieve goals? And, more importantly, why do we imagine that this lesson about the sub-optimality of spending one’s limited resources on better cognitive abilities does not still hold today? Why should cognitive abilities be the sole optimal thing, or near enough, to spend all one’s resources on in order to best achieve a broad range of goals? I would argue that they are not. Such a spending strategy was not optimal in the past (with respect to the goal of survival), and it does not seem to be optimal today either.

It would be a remarkable coincidence if evolution had just so happened to stumble upon the most efficient way to assemble matter into an intelligent system.

But it would be less remarkable if it had happened to assemble matter into a system that is broadly capable of achieving a broad range of goals, and which another system (especially one that has not been built over a billion-year process to be robust and highly autonomous) cannot readily outdo in terms of autonomous function. It would also not be that remarkable if biological humans, functioning within a system built by and for biological humans, happened to be among the most capable agents within that system, not least given all the legal, social and political aspects this system entails.

Beyond that, one can dispute the meaning of “intelligent system” in the quote above, but if we look at the intelligent system that is our civilization at large, one can say that the optimization going on at this level is not coincidental but indeed deliberate, often aiming toward peak efficiency. Thus, in this regard as well, we should not be too surprised if our current system is quite efficient and competent relative to the many constraints we are facing.

But let us imagine that we could go back to the “drawing board” and optimize for a system’s intelligence without any developmental limitations. This process would provide the following benefits for AI over the human brain (Bostrom, 2014, p. 60-61):

  • Free choice of substrate: Signal transmission with computer hardware is millions of times faster than in biological brains. AI is not restricted to organic brains, and can be built on the substrate that is overall best suited for the design of intelligent systems.
  • Supersizing: Machines have (almost) no size-restrictions. While humans with elephant-sized brains would run into developmental impossibilities, (super)computers already reach the size of warehouses and could in theory be built even bigger.
  • No cognitive biases: We should be able to construct AI in a way that uses more flexible heuristics, and always the best heuristics for a given context, to prevent the encoding or emergence of substantial biases. Imagine the benefits if humans did not suffer from confirmation bias, overconfidence, status quo bias, etc.!
  • Modular superpowers: Humans are particularly good at tasks for which we have specialized modules. For instance, we excel at recognizing human faces because our brains have hard-wired structures that facilitate facial recognition in particular. An artificial intelligence could have many more such specialized modules, including extremely useful ones like a module for programming.
  • Editability and copying: Software on a computer can be copied and edited, which facilitates trying out different variations to see what works best (and then copying it hundreds of times). By contrast, the brain is a lot messier, which makes it harder to study or improve. We also lack correct introspective access to the way we make most of our decisions, which is an important advantage that (some) AI designs could have.
  • Superior architecture: Starting anew, we should expect it to be possible to come up with radically more powerful designs than the patchwork architecture that natural selection used to construct the human brain. This difference could be enormously significant.

It should be noted that computers already 1) can be built with a wide variety of substrates, 2) can be supersized, 3) do not tend to display cognitive biases, 4) have modular superpowers, 5) can be edited and copied (or at least software readily can), 6) can be made with any architecture we can come up with. All of these advantages exist and are being exploited already, just not as much as they can be. And it is not clear why we should expect future change to be more radical than the change we have seen in past decades in which we have continually built ever more competent computers which can do things that no human can by exploiting these advantages.

With regard to the last point, imagine we tried to optimize for something like speed or sight rather than intelligence. Even if humans had never built anything faster than the fastest animal, we should assume that technological progress – unless it is halted – would eventually surpass nature in these respects. After all, natural selection does not optimize directly for speed or sight (but rather for gene copying success), making it a slower optimization process than those driven by humans for this specific purpose. Modern rockets already fly at speeds of up to 36,373 mph, which beats the peregrine falcon’s 240 mph by a huge margin. Similarly, eagle vision may be powerful, but it cannot compete with the Hubble space telescope. (General) intelligence is harder to replicate technologically, but natural selection did not optimize for intelligence either, and there do not seem to be strong reasons to believe that intelligence as a trait should differ categorically from examples like speed or sight, i.e., there are as far as we know no hard physical limits that would put human intelligence at the peak of what is possible.3

Again, what is being referred to by the word “intelligence” here seems to be cognitive abilities, not the ability to achieve goals in general. And with respect to cognitive abilities in particular, it is clear that computers already beat humans by a long shot in countless respects. So the point Lukas is making here is clearly true.

Another way to develop an intuition for the idea that there is significant room for improvement above human intelligence is to study variation in humans. An often-discussed example in this context is the intellect of John von Neumann. Von Neumann was not some kind of an alien, nor did he have a brain twice as large as the human average. And yet, von Neumann’s accomplishments almost seem “superhuman.” The section in his Wikipedia entry that talks about him having “founded the field of Game theory as a mathematical discipline” – an accomplishment so substantial that for most other intellectual figures it would make up most of their Wikipedia page – is just one out of many of von Neumann’s major achievements.

There are already individual humans (with normal-sized brains) whose intelligence vastly exceeds that of the typical human. So just how much room is there above their intelligence? To visualize this, consider for instance what could be done with an AI architecture more powerful than the human brain running on a warehouse-sized supercomputer.

A counterpoint to this line of reasoning can be found by contemplating chess ratings. The skills of chess players are usually rated via the so-called Elo rating system, which measures the relative skill of different players against each other. A beginner will usually have a rating around 800, whereas a rating in the range 2000-2199 ranks one as a chess “Expert”, and a rating of 2400 and above renders one a “Senior Master”. The highest rating ever achieved was 2882, by Magnus Carlsen. Surely, one might think, this amount of variation must be puny, given that all the humans who have ever played chess have roughly the same brain sizes and structures. And yet it turns out that human variation in chess ability is in fact quite enormous in an absolute sense.

For example, it took more than four decades from the time computers were able to beat a chess beginner (in the 1950s) until they were able to beat the very best human player (officially in 1997). Thus, the span from ordinary human beginner to the best human expert corresponded to more than four decades of progress in hardware — i.e. roughly a million times more computing power — and software. That seems quite a wide range.

And yet the range seems even broader if we consider the ultimate limits of optimal chess play. For one may argue that the fact that it took computers a fairly long time to go from the average human level to the level of the best human does not mean that the best human is not still ridiculously far from the best a computer could be in theory. Surprisingly, however, this latter distance does in fact seem quite small, at least in one sense. Estimates suggest that the best possible chess machine would have an Elo rating around 3600, which puts the best human only around 700 Elo points below the best possible computer. In other words, the distance between the best human and a chess “Expert” is similar to the distance between the best human and the best possible chess brain, whereas the distance between an ordinary human beginner and the best human is far greater.
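To get a rough sense of what these Elo gaps amount to, the sketch below plugs the ratings mentioned above into the standard Elo expected-score formula. The specific numbers used (the 2100 “Expert” midpoint and the ~3600 ceiling) are simply the figures cited in the text, taken here for illustration only:

```python
def elo_expected_score(r_a, r_b):
    """Expected score of player A against player B under the standard Elo model
    (roughly, A's probability of winning, ignoring draws)."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

ratings = {
    "beginner": 800,        # typical beginner rating
    "expert": 2100,         # midpoint of the 2000-2199 "Expert" band
    "best human": 2882,     # Carlsen's peak rating
    "best possible": 3600,  # rough estimate for an optimal chess machine, as cited above
}

print(elo_expected_score(ratings["beginner"], ratings["best human"]))       # ~0.000006
print(elo_expected_score(ratings["expert"], ratings["best human"]))         # ~0.011
print(elo_expected_score(ratings["best human"], ratings["best possible"]))  # ~0.016
```

On this admittedly crude model, the best human’s expected score against the strongest possible engine is comparable to an Expert’s expected score against the best human, whereas a beginner’s expected score against the best human is vanishingly small, which is the relative-distance point made above. (The “million times more computing power” figure in the preceding paragraph likewise follows from assuming a doubling of available computing power roughly every two years: 2^(40/2) = 2^20 ≈ one million.)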

It seems plausible that a similar pattern obtains with respect to many other complex cognitive tasks. Indeed, it seems plausible that many of our abilities, especially those we evolved to do well, such as our ability to interact with other humans, have an “Elo rating” quite close to the notional maximum level for most humans.

IV. The transition from human to superhuman intelligence could be rapid

Perhaps the people who think it is unlikely that superintelligent AI will ever be created are not objecting to it being possible in principle. Maybe they think it is simply too difficult to bridge the gap from human-level intelligence to something much greater. After all, evolution took a long time to produce a species as intelligent as humans, and for all we know, there could be planets with biological life where intelligent civilizations never evolved.4 But considering that there could come a point where AI algorithms start taking part in their own self-improvement, we should be more optimistic.

We should again be clear that the term “superintelligent AI” seems to refer to a system with greater cognitive abilities, across a wide range of tasks, than humans. As for “a point where AI algorithms start taking part in their own self-improvement”, it should be noted, again, that we already use our best software and hardware in the process of developing better software and hardware. True, they are only a part of a process that involves far more elements, yet this is true of most everything that we produce and improve in our economy: many contributions drawn from and distributed across our economy at large are required. And we have good reason to believe that this will continue to be true of the construction of more capable machines in the future.

AIs contributing to AI research will make it easier to bridge the gap, and could perhaps even lead to an acceleration of AI progress to the point that AI not only ends up smarter than us, but vastly smarter after only a short amount of time.

Again, we already use our best software and hardware to contribute to AI research, and yet we do not appear to see an acceleration in the growth of our best supercomputers. In fact, in terms of the growth rate of their computing power, we have arguably seen a modest decline.

Several points in the list of AI advantages above – in particular the advantages derived from the editability of computer software or the possibility for modular superpowers to have crucial skills such as programming – suggest that AI architectures might both be easier to further improve than human brains, and that AIs themselves might at some point become better at actively developing their own improvements.

Again, computers are already “easier to further improve than human brains” in these ways, and our hardware and software are already among the most active parts in their own improvement. So why should we expect to see a different pattern in the future from the pattern we see today of gradual, slightly declining growth?

If we ever build a machine with human-level intelligence, it should then be comparatively easy to speed it up or make tweaks to its algorithm and internal organization to make it more powerful. The updated version, which would at this point be slightly above human-level intelligence, could be given the task of further self-improvement, and so on until the process runs into physical limits or other bottlenecks.

Indeed, something even more useful than “human-level intelligence” in this regard would be software that is critical for the further development of more powerful computers. And we in fact already have such software, many different kinds of it, and yet it is not that easy to simply “speed it up or make tweaks to its algorithm and internal organization to make it more powerful”. More generally, as noted above, we already use our latest, updated technology to improve our latest, updated technology, and the result is not rapid, runaway growth.

Perhaps self-improvement does not have to require human-level general intelligence at all. There may be comparatively simple AI designs that are specialized for AI science and (initially) lack proficiency in other domains. The theoretical foundations for an AI design that can bootstrap itself to higher and higher intelligence already exist (Schmidhuber, 2006), and it remains an empirical question where exactly the threshold is after which AI designs would become capable of improving themselves further, and whether the slope of such an improvement process is steep enough to go on for multiple iterations.

Again, I would just reiterate that computers are already an essential component in the process of improving computers. And the fact that humans who need to sleep and have lunch breaks are also part of this improvement process does not seem to be a main constraint on it compared to other factors, such as the physical limitations implied by transportation and the assembly of materials. Oftentimes in modern research, computers run simulations at their maximum capacity while the humans do their sleeping and lunching, in which case these resting activities (through which humans often get their best ideas) do not limit progress much at all, whereas the available computing power does.

For the above reasons, it cannot be ruled out that breakthroughs in AI could at some point lead to an intelligence explosion (Good, 1965; Chalmers, 2010), where recursive self-improvement leads to a rapid acceleration of AI progress. In such a scenario, AI could go from subhuman intelligence to vastly superhuman intelligence in a very short timespan, e.g. in (significantly) less than a year.

“It cannot be ruled out” can be said of virtually everything; the relevant question is how likely we should expect these possibilities to be. Beyond that, it is also not clear what would count as a “rapid acceleration of AI progress”, and thus what exactly it is that cannot be ruled out. AI going from subhuman performance to vastly greater than human performance in a short amount of time has already been seen in many different domains, including Go most recently.

But if one were to claim, to take a more specific example, that it cannot be ruled out that an AI system will improve itself so much that it can overpower human civilization and control the future, then I would argue that the reasoning above does not support considering this a likely possibility, i.e. something with a probability greater than, say, one in a thousand.

While the idea of AI advancing from human-level to vastly superhuman intelligence in less than a year may sound implausible, as it violates long-standing trends in the speed of human-driven development, it would not be the first time where changes to the underlying dynamics of an optimization process cause an unprecedented speed-up. Technology has been accelerating ever since innovations (such as agriculture or the printing press) began to feed into the rate at which further innovations could be generated.5

In the endnote “5” referred to above, Lukas writes:

[…] Finally, over the past decades, many tasks, including many areas of research and development, have already been improved through outsourcing them to machines – a process that is still ongoing and accelerating.

That this process of outsourcing of tasks is accelerating seems in need of justification. We have been outsourcing tasks to machines in various ways and at a rapid pace for at least two centuries now, and so it is not a trivial claim that this process is accelerating.

Compared to the rate of change we see in biological evolution, cultural evolution broke the sound barrier: It took biological evolution a few million years to improve on the intelligence of our ape-like ancestors to the point where they became early hominids. By contrast, technology needed little more than ten thousand years to progress from agriculture to space shuttles.

And I would argue that the reason technology could grow so fast is because an ever larger system of technology consisting of an ever greater variety of tools was contributing to it through recursive self-improvement — human genius was but one important component. And I think we have good reason to think the same about the future.

Just as inventions like the printing press fed into – and significantly sped up – the process of technological evolution, rendering it qualitatively different from biological evolution, AIs improving their own algorithms could cause a tremendous speed-up in AI progress, rendering AI development through self-improvement qualitatively different from “normal” technological progress.

I think there is very little reason to believe this story. Again, we already use our best machines to build the next generation of machines. “Normal” technological progress of the kind we see today already depends on computers running programs created to optimize future technology as efficiently as they can, and it is far from clear that running a more human kind of program would be a more efficient use of resources toward this end.

It should be noted, however, that while the arguments in favor of a possible intelligence explosion are intriguing, they nevertheless remain speculative. There are also some good reasons why some experts consider a slower takeoff of AI capabilities more likely. In a slower takeoff, it would take several years or even decades for AI to progress from human to superhuman intelligence.

Again, the word “intelligence” here seems to refer to cognitive abilities, not the ability to achieve goals in general. And it is again not clear what it means to say that it might “take several years or even decades for AI to progress from human to superhuman intelligence”, since computers have already been more capable than humans at a wide variety of cognitive tasks for many decades. So I would argue that this statement suffers from a lack of conceptual clarity.

Unless we find decisive arguments for one scenario over the other, we should expect both rapid and comparably slow takeoff scenarios to remain plausible. It is worth noting that because “slow” in this context also includes transitions on the order of ten or twenty years, it would still be very fast practically speaking, when we consider how much time nations, global leaders or the general public would need to adequately prepare for these changes.

To reiterate the statement I just made, it is not clear what a fast takeoff means in this context given that computers are already vastly superior to humans in many domains, and probably will continue to beat humans at ever more tasks before they come close to being able to do virtually all cognitive tasks humans can do. So what it is we are supposed to consider plausible is not entirely clear. As for whether it is plausible for rapid progress to occur over a wide range of cognitive tasks such that an AI system becomes able to take over the world, I would argue that we have not seen arguments to support this claim.

V. By default, superintelligent AI would be indifferent to our well-being

The typical mind fallacy refers to the belief that other minds operate the same way our own does. If an extrovert asks an introvert, “How can you possibly not enjoy this party; I talked to half a dozen people the past thirty minutes and they were all really interesting!” they are committing the typical mind fallacy.

When envisioning the goals of smarter-than-human artificial intelligence, we are in danger of committing this fallacy and projecting our own experience onto the way an AI would reason about its goals. We may be tempted to think that an AI, especially a superintelligent one, will reason its way through moral arguments6 and come to the conclusion that it should, for instance, refrain from harming sentient beings. This idea is misguided, because according to the intelligence definition we provided above – which helps us identify the processes likely to shape the future – making a system more intelligent does not change its goals/objectives; it only adds more optimization power for pursuing those objectives.

Again, we need to be clear about what “smarter-than-human artificial intelligence” means here. In this case, we seem to be talking about a fairly singular and coherent system, a “mind” of sorts — as opposed to a thousand and one different software programs that do their own thing well — and hence in this regard it seems that the term “smarter-than-human artificial intelligence” here refers to something that is quite similar to a human mind. We are seemingly also talking about a system that “would reason about its goals”.

It seems worth noting that this is quite different from how we think about contemporary software programs, even including the most advanced ones such as AlphaZero and IBM’s Watson, which we are generally not tempted to consider “minds”. Expecting competent software programs of the future to be like minds may itself be to commit a typical mind fallacy of sorts, or perhaps just a mind fallacy. It is conceivable that software will continue to outdo humans at many tasks without acquiring anything resembling what we usually conceive of as a mind.

Another thing worth clarifying is what we mean by the term “by default” here. Does it refer to what AI systems will be built to do by our economy in the absence of altruistic intervention? If “by default” means that which our economy will naturally tend to produce, it seems likely that future AI will indeed be programmed to not be indifferent, at least in a behavioral sense, to human well-being “by default”. Indeed, it seems a much greater risk that future software systems will be constructed to act in ways that exclusively benefit, and are indifferent to everything other than, human beings. In other words, the risk is that such systems will share our speciesist bias, with catastrophic consequences ensuing.

My point here is merely that, just as it is almost meaningless to claim that biological minds will not care about our well-being by default, as it lacks any specification of what “by default” means — given what evolutionary history? — so is it highly unclear what “by default” means when we are talking about machines created by humans. It seems to assume that we are going to suddenly have a lot of “undirected competence” delivered to us which does not itself come with countless sub-goals and adaptations built into it to attain ends desired by human programmers, and, perhaps to a greater extent, markets.

To give a silly example, imagine that an arms race between spam producers and companies selling spam filters leads to increasingly more sophisticated strategies on both sides, until the side selling spam filters has had it and engineers a superintelligent AI with the sole objective to minimize the number of spam emails in their inboxes.

Again, I would flag that it is not clear what “superintelligent AI” means here. Does it refer to a system that is better able to achieve goals across the board than humans? Or merely a system with greater cognitive abilities than any human expert in virtually all domains? Even if it is merely the latter, it is unlikely that a system developed by a single team of software developers will have much greater cognitive competences across the board than the systems developed by other competing teams, let alone those developed by the rest of the economy combined.

With its level of sophistication, the spam-blocking AI would have more strategies at its disposal than normal spam filters.

Yet how many more? What could account for this large jump in capabilities from previous versions of spam filters? What is hinted at here seems akin to the sudden emergence of a Bugatti in the Stone Age. It does not seem credible.

For instance, it could try to appeal to human reason by voicing sophisticated, game-theoretic arguments against the negative-sum nature of sending out spam. But it would be smart enough to realize the futility of such a plan, as this naive strategy would backfire because some humans are trolls (among other reasons). So the spam-minimizing AI would quickly conclude that the safest way to reduce spam is not by being kind, but by gaining control over the whole planet and killing everything that could possibly try to trick its spam filter.

First of all, it is by no means clear that this would be “the safest way” to minimize spam. Indeed, I would argue that trying to gain control in this way would be a very bad action in expectation with respect to the goal of minimizing spam.

But even more fundamentally, the scenario above seems to assume that it would be much easier to build a system with the abilities to take over the world than it would be to properly instantiate the goals we want it to achieve. In the case of AlphaZero, for instance, the earlier versions were all equally aligned with the goal of winning at Go; the hard problem was to make the system more capable of doing so. The assumption that this situation would be inverted with respect to future goal implementation seems to me unwarranted: not because the goals are necessarily easy to instantiate, but because the competences in question appear extremely difficult to create. The scenario described above seems to ignore this consideration, and instead assumes that the default outcome is that we will suddenly get advanced machines with a lot of competence, but without knowing how to direct this competence toward doing what we want, as opposed to gradually directing and integrating these competences as they are (gradually) acquired. Beyond that, on a more general note, I think many aspiring effective altruists who worry about AI safety tend to underestimate the extent to which computer programmers are already focused on making software do what they intend it to do.

Moreover, the scenario considered here also seems to assume that it would be relatively easy to make a competent machine optimize a particular goal insistently, whereas I would argue that this, too, is extremely difficult. In other words, not only do I think it is extremely difficult to create the competences in question, as noted above, but I also think it is extremely difficult to orient all these competences, not just a few subroutines, toward insistently accomplishing some perverse goal. For this reason too, I think one should be highly skeptical of scenarios of this kind.

The AI in this example may fully understand that humans would object to these actions on moral grounds, but human “moral grounds” are based on what humans care about – which is not the minimization of spam! And the AI – whose whole decision architecture only selects for actions that promote the terminal goal of minimizing spam – would therefore not be motivated to think through, let alone follow our arguments, even if it could “understand” them in the same way introverts understand why some people enjoy large parties.

I think this is inaccurate. Any goal-oriented agent would be motivated to think through these things for the same reason that we humans are motivated to think through what those who disagree with us morally would say and do: because it impacts how we ourselves can act effectively toward our goals (this, we should be honest, is also often why humans think about the views and arguments made by others; not because of a deep yearning for truth and moral goodness but for purely pragmatic and selfish reasons). Thus, it makes sense to be mindful of those things, especially given that one has imperfect information and an imperfect ability to predict the future, no matter how “smart” one is.

The typical mind fallacy tempts us to conclude that because moral arguments appeal to us,7 they would appeal to any generally intelligent system. This claim is after all already falsified empirically by the existence of high-functioning psychopaths. While it may be difficult for most people to imagine how it would feel to not be moved by the plight of anyone but oneself, this is nothing compared to the difficulties of imagining all the different ways that minds in general could be built. Eliezer Yudkowsky coined the term mind space to refer to the set of all possible minds – including animals (of existing species as well as extinct ones), aliens, and artificial intelligences, as well as completely hypothetical “mind-like” designs that no one would ever deliberately put together. The variance in all human individuals, throughout all of history, only represents a tiny blob in mind space.

Yes, but this does not mean that the competences of human minds only span a tiny range of the notional “competence range” of various abilities. As we saw in the example of chess above, humans span a surprisingly large range, and the best humans are surprisingly close to the best mind possible. And with respect to the competences required for navigating within a world built by and for humans, it is not that unreasonable to believe that, on a continuum that measures competence across these many domains with a single measure, we are probably quite high and quite difficult to beat. This is not arrogance. It is merely to acknowledge the contingent structure of our civilization, and the fact that it is adapted to many contingent features of the human organism in general, including the human mind in particular.

Some of the minds outside this blob would “think” in ways that are completely alien to us; most would lack empathy and other (human) emotions for that matter; and many of these minds may not even relevantly qualify as “conscious.”

Most of these minds would not be moved by moral arguments, because the decision to focus on moral arguments has to come from somewhere, and many of these minds would simply lack the parts that make moral appeals work in humans. Unless AIs are deliberately designed8 to share our values, their objectives will in all likelihood be orthogonal to ours (Armstrong, 2013).

Again, an agent trying to achieve goals in our world need not be moved by moral arguments in an emotional sense in order to pay attention to them, and to the preferences of humans more generally, and to choose to avoid causing chaos. Beyond that, why should we expect future software designed by humans not to be “deliberately designed to share our values”? What marginal difference should we expect altruists to be able to make on this front? And how would this influence best be achieved?

VI. AIs will instrumentally value self-preservation and goal preservation

Even though AI designs may differ radically in terms of their top-level goals, we should expect most AI designs to converge on some of the same subgoals. These convergent subgoals (Omohundro, 2008; Bostrom, 2012) include intelligence amplification, self-preservation, goal preservation and the accumulation of resources. All of these are instrumentally very useful to the pursuit of almost any goal. If an AI is able to access the resources it needs to pursue these subgoals, and does not explicitly have concern for human preferences as (part of) its top-level goal, its pursuit of these subgoals is likely to lead to human extinction (and eventually space colonization; see below).

Again, what does “AI design” refer to in this context? Presumably a machine that possesses most of the cognitive abilities of a human to a similar or greater degree, and which, on top of that, is in some sense highly integrated into something akin to a coherent, unified mind subordinate to a few supreme “top-level goals”. Thus, when Lukas writes “most AI designs” above, he is in fact referring to most systems that meet a very particular definition of “AI”, and one which I strongly doubt will be anywhere close to the most prevalent source of “machine competence” in the future. (Note that this is not to say that software, and our machines in general, will not become ever more competent in the future, but merely that such greater competences may not be subordinate to one goal to rule them all, or a few for that matter.)

Beyond that, the claim that such a capable machine of the future seeking to achieve these subgoals is likely to lead to human extinction is a very strong claim that is not supported here, nor in the papers cited. More on this below.

AI safety work refers to interdisciplinary efforts to ensure that the creation of smarter-than-human artificial intelligence will result in excellent outcomes rather than disastrous ones. Note that the worry is not that AI would turn evil, but that indifference to suffering and human preferences will be the default unless we put in a lot of work to ensure that AI is developed with the right values.

Again, I would take issue with this “default” claim, as I would argue that “a lot of work” is exactly what we should expect will be done to ensure that future software does what humans want it to. And the question is, again, how much of a difference altruists should expect to make here, as well as how best to make it.

VI.I Intelligence amplification

Increasing an agent’s intelligence improves its ability to efficiently pursue its goals. All else equal, any agent has a strong incentive to amplify its intelligence. A real-life example of this convergent drive is the value of education: Learning important skills and (thinking-)habits early in life correlates with good outcomes. In the AI context, intelligence amplification as a convergent drive implies that AIs with the ability to improve their own intelligence will do so (all else equal). To self-improve, AIs would try to gain access to more hardware, make copies of themselves to increase their overall productivity, or devise improvements to their own cognitive algorithms.

Again, what does the word “intelligence” mean in this context? Above, it was defined as “the ability to achieve goals in a wide range of environments”, which means that what is being said here reduces to the tautological claim that increasing an agent’s ability to achieve goals improves its ability to achieve goals. If one defines “intelligence” to refer to cognitive abilities, however, the claim becomes less empty. Yet it also becomes much less obvious, especially if one thinks in terms of investments of marginal resources, as it is questionable whether investing in greater cognitive abilities (as opposed to a prettier face or stronger muscles) is the best investment one can make with respect to the goal of achieving goals “in general”.

On a more general note, I would argue that “intelligence amplification”, as in “increasing our ability to achieve goals”, is already what we collectively do in our economy to a great extent, although this increase is, of course, much broader than one merely oriented toward optimizing cognitive abilities. We seek to optimize materials, supply chains, transportation networks, energy efficiency, etc. And it is not clear why this growth process should speed up significantly due to greater machine capabilities in the future than it has in the past, where more capable machines also helped grow the economy in general, as well as to increase the capability of machines in particular.

More broadly, intelligence amplification also implies that an AI would try to develop all technologies that may be of use to its pursuits.

Yet should we expect such “an AI” to be able to develop “all technologies that may be of use to its pursuits” better than the entire industries currently dedicated to this task, let alone our entire economy? Indeed, should we even expect it to contribute significantly, e.g. to double current growth rates across the board? I would argue that this is most dubious.

I.J. Good, a mathematician and cryptologist who worked alongside Alan Turing, asserted that “the first ultraintelligent machine is the last invention that man need ever make,” because once we build it, such a machine would be capable of developing all further technologies on its own.

To say that a single machine would be able to develop all further technologies on its own is, I submit, unsound. For what does “on its own” mean here? “On its own” independently of the existing infrastructure of machines run by humans? Or “on its own” as in taking over this entire infrastructure? And how exactly could such a take-over scenario occur without destroying the productivity of this system? None of these scenarios seem plausible.

VI.II Goal preservation

AIs would in all likelihood also have an interest in preserving their own goals. This is because they optimize actions in terms of their current goals, not in terms of goals they might end up having in the future.

This again seems to assume that we will create highly competent systems that are subordinate to a single or a few explicit goals which they will insistently optimize all their actions for. Why should we believe this?

Another critical note of mine on this idea quoted from elsewhere:

Stephen Omohundro (Omohundro, 2008) argues that a chess-playing robot with the supreme goal of playing good chess would attempt to acquire resources to increase its own power and work to preserve its own goal of playing good chess. Yet in order to achieve such complex subgoals, and to even realize they might be helpful with respect to achieving the ultimate goal, this robot will need access to, and be built to exercise advanced control over, an enormous host of intellectual tools and faculties. Building such tools is extremely hard and requires many resources, and harder still, if at all possible, is it to build them so that they are subordinate to a single supreme goal. And even if all this is possible, it is far from clear that access to these many tools would not enable – perhaps even force – this now larger system to eventually “reconsider” the goals that it evolved from. For instance, if the larger system has a sufficient amount of subsystems with sub-goals that involve preservation of the larger system of tools, and if the “play excellent chess” goal threatens, or at least is not optimal with respect to, this goal, could one not imagine that, in some evolutionary competition, these sub-goals could overthrow the supreme goal?

Footnote: After all, humans are such a system of competing drives, and it has been argued (e.g. in Ainslie, 2001 [Breakdown of Will]) that this competition is what gives us our unique cognitive strengths (as well as weaknesses). Our ultimate goals, to the extent we have any, are just those that win this competition most of the time.

And Paul Christiano has also described agents that would not be subject to this “basic drive” of self-preservation described by Omohundro.

Lukas continues:

From the current goal’s perspective, a change in the AI’s goal function is potentially disastrous, as the current goal would not persevere. Therefore, AIs will try to prevent researchers from changing their goals.

This holds only if such a highly competent system is built so as to be subordinate to a single goal in this way, which, again, I do not think there is good reason to consider likely to be the case in future AI systems “by default”.

Consequently, there is pressure for AI researchers to get things right on the first try: If we develop a superintelligent AI with a goal that is not quite what we were after – because someone made a mistake, or was not precise enough, or did not think about particular ways the specified goal could backfire – the AI would pursue the goal that it was equipped with, not the goal that was intended. This applies even if it could understand perfectly well what the intended goal was. This feature of going with the actual goal instead of the intended one could lead to cases of perverse instantiation, such as the AI “paralyz[ing] human facial musculatures into constant beaming smiles” to pursue an objective of “make us smile” (Bostrom, 2014, p. 120).

This again seems to assume that this “first superintelligent AI” would be so much more powerful than everything else in the world, yet why should we expect a single system to be so much more powerful than everything else across the board? Beyond that, it also seems to assume that the design of this system would happen in something akin to a single step — that there would be a “first try”. Yet what could a first try consist in? How could a super capable system emerge in the absence of a lot of test models that are slightly less competent? I think this “first try” idea betrays an underlying belief in a sudden growth explosion powered by a single, highly competent machine, which, again, I would argue is highly unlikely in light of what we know about the nature of the growth of the capabilities of machines.

VI.III Self-preservation

Some people have downplayed worries about AI risks with the argument that when things begin to look dangerous, humans can literally “pull the plug” in order to shut down AIs that are behaving suspiciously. This argument is naive because it is based on the assumption that AIs would be too stupid to take precautions against this.

There is a difference between being “stupid” and being ill-informed. And there is no reason to think that an extremely cognitively capable agent will be informed about everything relevant to its own self-preservation. To think otherwise is to conflate great cognitive abilities with near-omniscience.

Because the scenario we are discussing concerns smarter-than-human intelligence, an AI would understand the implications of losing its connection to electricity, and would therefore try to proactively prevent being shut down by any means necessary – especially when shutdown might be permanent.

Even if all implications were understood by such a notional agent, this by no means implies that an attempt to stop its termination would be successful, nor particularly likely, or indeed even possible.

This is not to say that AIs would necessarily be directly concerned about their own “death” – after all, whether an AI’s goal includes its own survival or not depends on the specifics of its goal function. However, for most goals, staying around pursuing one’s goal will lead to better expected goal achievement. AIs would therefore have strong incentives to prevent permanent shutdown even if their goal was not about their own “survival” at all. (AIs might, however, be content to outsource their goal achievement by making copies of themselves, in which case shutdown of the original AI would not be so terrible as long as one or several copies with the same goal remain active.)

I would question the tacit notion that the self-preservation of such a machine could be done with a significantly greater level of skill than could the “counter self-preservation” work of the existing human-machine civilization. After all, why should a single system be so much more capable than the rest of the world at any given task? Why should humans not develop specialized software systems and other machines that enable them to counteract and overpower rogue machines, for example by virtue of having more information and training? What seems described here as an almost sure to happen default outcome strikes me as highly unlikely. This is not to say that one should not worry about small risks of terrible outcomes, yet we need to get a clear view of the probabilities if we are to make a qualified assessment of the expected value of working on these risks.

The convergent drive for self-preservation has the unfortunate implication that superintelligent AI would almost inevitably see humans as a potential threat to its goal achievement. Even if its creators do not plan to shut the AI down for the time being, the superintelligence could reasonably conclude that the creators might decide to do so at some point. Similarly, a newly-created AI would have to expect some probability of interference from external actors such as the government, foreign governments or activist groups. It would even be concerned that humans in the long term are too stupid to keep their own civilization intact, which would also affect the infrastructure required to run the AI. For these reasons, any AI intelligent enough to grasp the strategic implications of its predicament would likely be on the lookout for ways to gain dominance over humanity. It would do this not out of malevolence, but simply as the best strategy for self-preservation.

Again, to think that a single agent could gain dominance over the rest of the human-machine civilization in which it would find itself appears extremely unlikely. What growth story could plausibly lead to this outcome?

This does not mean that AIs would at all times try to overpower their creators: If an AI realizes that attempts at trickery are likely to be discovered and punished with shutdown, it may fake being cooperative, and may fake having the goals that the researchers intended, while privately plotting some form of takeover. Bostrom has referred to this scenario as a “treacherous turn” (Bostrom, 2014, p. 116).

We may be tempted to think that AIs implemented on some kind of normal computer substrate, without arms or legs for mobility in the non-virtual world, may be comparatively harmless and easy to overpower in case of misbehavior. This would likely be a misconception, however. We should not underestimate what a superintelligence with access to the internet could accomplish. And it could attain such access in many ways and for many reasons, e.g. because the researchers were careless or underestimated its capacities, or because it successfully pretended to be less capable than it actually was. Or maybe it could try to convince the “weak links” in its [team] of supervisors to give it access in secret – promising bribes. Such a strategy could work even if most people in the developing team thought it would be best to deny their AI internet access until they have more certainty about the AI’s alignment status and its true capabilities. Importantly, if the first superintelligence ever built was prevented from accessing the internet (or other efficient channels of communication), its impact on the world would remain limited, making it possible for other (potentially less careful) teams to catch up. The closer the competition, the more the teams are incentivized to give their AIs riskier access over resources in a gamble for the potential benefits in case of proper alignment.

Again, this all seems to assume a very rapid take-off in capabilities with one system being vastly more capable than all others. What reasons do we have to consider such a scenario plausible? Barely any, I have argued.

The following list contains some examples of strategies a superintelligent AI could use to gain power over more and more resources, with the goal of eventually reaching a position where humans cannot harm or obstruct it. Note that these strategies were thought of by humans, and are therefore bound to be less creative and less effective than the strategies an actual superintelligence would be able to devise.

  • Backup plans: Superintelligent AI could program malware of unprecedented sophistication that inserted partial copies of itself into computers distributed around the globe (adapted from part 3.1.2 of this FAQ). This would give it further options to act even if its current copy was destroyed or if its internet connection was cut. Alternatively, it could send out copies of its source code, alongside detailed engineering instructions, to foreign governments, ideally ones who have little to lose and a lot to gain, with the promise of helping them attain world domination if they build a second version of the AI and handed it access to all their strategic resources.
  • Making money: Superintelligent AI could easily make fortunes with online poker, stock markets, scamming people, hacking bank accounts, etc.9
  • Influencing opinions: Superintelligent AI could fake convincing email exchanges with influential politicians or societal elites, pushing an agenda that serves its objectives of gaining power and influence. Similarly, it could orchestrate large numbers of elaborate sockpuppet accounts on social media or other fora to influence public opinion in favorable directions.
  • Hacking and extortion: Superintelligent AI could hack into sensitive documents, nuclear launch codes or other compromising assets in order to blackmail world leaders into giving it access over more resources. Or it could take over resources directly if hacking allows for it.
  • (Bio-)engineering projects: Superintelligent AI could pose as the head researcher of a biology lab and send lab assistants instructions to produce viral particles with specific RNA sequences, which then, unbeknownst to the people working on the project, turned out to release a deadly virus that incapacitated most of humanity.10

Through some means or another – and let’s not forget that the AI could well attempt many strategies at once to safeguard against possible failure in some of its pursuits – the AI may eventually gain a decisive strategic advantage over all competition (Bostrom, 2014, p. 78-90). Once this is the case, it would carefully build up further infrastructure on its own. This stage will presumably be easier to reach as the world economy becomes more and more automated.

These various strategies could also be pursued by other agents, and indeed by vast systems of agents and programs. Why should one such agent be much more competent than others at doing any of these things?

Once humans are no longer a threat, the AI would focus its attention on natural threats to its existence. It would for instance notice that the sun will expand in about seven billion years to the point where existence on earth will become impossible. For the reason of self preservation alone, a superintelligent AI would thus eventually be incentivized to expand its influence beyond Earth.

Following the arguments I have made above (as well as here), I would argue that such a take-over of the world subordinate to a single or a few goals originally instilled in a single machine is extremely unlikely.

VI.IV Resource accumulation

For the fulfillment of most goals, accumulating as many resources as possible is an important early step. Resource accumulation is also intertwined with the other subgoals in that it tends to facilitate them.

The resources available on Earth are only a tiny fraction of the total resources that an AI could access in the entire universe. Resource accumulation as a convergent subgoal implies that most AIs would eventually colonize space (provided that it is not prohibitively costly), in order to gain access to the maximum amount of resources. These resources would then be put to use for the pursuit of its other subgoals and, ultimately, for optimizing its top-level goal.

Superintelligent AI might colonize space in order to build (more of) the following:

  • Supercomputers: As part of its intelligence enhancement, an AI could build planet-sized supercomputers (Sandberg, 1999) to figure out the mysteries of the cosmos. Almost no matter the precise goal, having an accurate and complete understanding of the universe is crucial for optimal goal achievement.
  • Infrastructure: In order to accomplish anything, an AI needs infrastructure (factories, control centers, etc.) and “helper robots” of some sort. This would be similar (but much larger in scale) to how the Manhattan Project had its own “project sites” and employed tens of thousands of people. While some people worry that an AI would enslave humans, these helpers would more plausibly be other AIs specifically designed for the tasks at hand.
  • Defenses: An AI could build shields to protect itself or other sensitive structures from cosmic rays. Perhaps it would build weapon systems to deal with potential threats.
  • Goal optimization: Eventually, an AI would convert most of its resources into machinery that directly achieves its objectives. If the goal is to produce paperclips, the AI will eventually tile the accessible universe with paperclips. If the goal is to compute pi to as many decimal places as possible, the AI will eventually tile the accessible universe with computers to compute pi. Even if an AI’s goal appears to be limited to something “local” or “confined,” such as e.g. “protect the White House,” the AI would want to make success as likely as possible and thus continue to accumulate resources to better achieve that goal.

To elaborate on the point of goal optimization: Humans tend to be satisficers with respect to most things in life. We have minimum requirements for the quality of the food we want to eat, the relationships we want to have, or the job we want to work in. Once these demands are met and we find options that are “pretty good,” we often end up satisfied and settle down on the routine. Few of us spend decades of our lives pushing ourselves to invest as many waking hours as sustainably possible into systematically finding the optimal food in existence, the optimal romantic partner, or anything really.

AI systems on the other hand, in virtue of how they are usually built, are more likely to act as maximizers. A chess computer is not trying to look for “pretty good moves” – it is trying to look for the best move it can find with the limited time and computing power it has at its disposal. The pressure to build ever more powerful AIs is a pressure to build ever more powerful maximizers. Unless we deliberately program AIs in a way that reduces their impact, the AIs we build will be maximizers that never “settle” or consider their goals “achieved.” If their goal appears to be achieved, a maximizer AI will spend its remaining time double- and triple-checking whether it made a mistake. When it is only 99.99% certain that the goal is achieved, it will restlessly try to increase the probability further – even if this means using the computing power of a whole galaxy to drive the probability it assigns to its goal being achieved from 99.99% to 99.991%.

Because of the nature of maximizing as a decision-strategy, a superintelligent AI is likely to colonize space in pursuit of its goals unless we program it in a way to deliberately reduce its impact. This is the case even if its goals appear as “unambitious” as e.g. “minimize spam in inboxes.”
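To make the satisficer/maximizer contrast in the passage above concrete, the following is a minimal, purely illustrative sketch; the function names, candidate actions, scores, and threshold are hypothetical and not drawn from the text:

```python
def maximizer(options, score):
    """Examine every option and return the single highest-scoring one."""
    return max(options, key=score)

def satisficer(options, score, threshold):
    """Return the first option that is 'good enough' (clears the threshold),
    without checking whether anything better exists."""
    for option in options:
        if score(option) >= threshold:
            return option
    return maximizer(options, score)  # fall back to the best option if nothing clears the bar

# Hypothetical candidate actions with made-up scores
actions = ["a", "b", "c", "d"]
scores = {"a": 0.70, "b": 0.92, "c": 0.95, "d": 0.99}
score = scores.get

print(satisficer(actions, score, threshold=0.9))  # "b": settles on the first 'pretty good' option
print(maximizer(actions, score))                  # "d": keeps searching for the very best option
```

The point in the quoted passage is that a system built to maximize will, by construction, keep spending additional resources on marginal gains, whereas a satisficer stops once a threshold is met; whether future AI systems will in fact be built as open-ended, unified maximizers is, of course, precisely what the surrounding commentary calls into question.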

Why should we expect a single machine to be better able to accumulate resources than other actors in the economy, much less whole teams of actors powered by specialized software programs optimized toward that very purpose? Again, what seems to be considered the default outcome here is one that I would argue is extremely unlikely. This is still not to say that we then have reason to dismiss such a scenario. Yet it is important that we make an honest assessment of its probability if we are to make qualified assessments of the value of prioritizing it.

VII. Artificial sentience and risks of astronomical suffering

Space colonization by artificial superintelligence would increase goal-directed activity and computations in the world by an astronomically large factor.11

So would space colonization driven by humans. And it is not clear why we should expect a human-driven colonization to increase goal-directed computations any less. Beyond that, such human-driven colonization also seems much more likely to happen than does rogue AI colonization. 

If the superintelligence holds objectives that are aligned with our values, then the outcome could be a utopia. However, if the AI has randomly, mistakenly, or sufficiently suboptimally implemented values, the best we could hope for is that all the machinery it used to colonize space would be inanimate, i.e. not sentient. Such an outcome – even though all humans would die – would still be much better than other plausible outcomes, because it would at least not contain any suffering. Unfortunately, we cannot rule out that the space colonization machinery orchestrated by a superintelligent AI would also contain sentient minds, including minds that suffer. In the same way that factory farming led to a massive increase in farmed animal populations, multiplying the direct suffering humans cause to animals by a large factor, an AI colonizing space could cause a massive increase in the total number of sentient entities, potentially creating vast amounts of suffering.

The same applies to a human-driven colonization, which I would still argue seems a much more likely outcome. So why should we focus more on colonization driven by rogue AI?

The following are some ways AI outcomes could result in astronomical amounts of suffering:

Suffering in AI workers: Sentience appears to be linked to intelligence and learning (Daswani & Leike, 2015), both of which would be needed (e.g. in robot workers) for the coordination and execution of space colonization. An AI could therefore create and use sentient entities to help it pursue its goals. And if the AI’s creators did not take adequate safety measures or program in compassionate values, it may not care about the suffering these entities experience while assisting it.

Optimization for sentience: Some people want to colonize space in order for there to be more life or (happy) sentient minds. If the AI in question has values that reflect this goal, either because human researchers managed to get value loading right (or “half-right”), or because the AI itself is sentient and values creating copies of itself, the result could be astronomical numbers of sentient minds. If the AI does not accurately assess how happy or unhappy these beings are, or if it only cares about their existence but not their experiences, or simply if something goes wrong in even a small portion of these minds, the total suffering that results could be very high.

Ancestor simulations: Turning history and (evolutionary) biology into an empirical science, AIs could run many “experiments” with simulations of evolution on planets with different starting conditions. This would e.g. give the AIs a better sense of the likelihood of intelligent aliens existing, as well as a better grasp on the likely distribution of their values and whether they would end up building AIs of their own. Unfortunately, such ancestor simulations could recreate millions of years of human or wild-animal suffering many times in parallel.

Warfare: Perhaps space-faring civilizations would eventually clash, with at least one of the two civilizations containing many sentient minds. Such a conflict would have vast frontiers of contact and could result in a lot of suffering.

All of these scenarios could also occur in a human-driven colonization, which I would argue is significantly more likely to happen. So again: why should we focus more on colonization driven by rogue AI?

More ways AI scenarios could contain astronomical amounts of suffering are described here and here. Sources of future suffering are likely to follow a power law distribution, where most of the expected suffering comes from a few rare scenarios where things go very wrong – analogous to how most casualties are the result of very few, very large wars; how most of the casualty-risks from terrorist attacks fall into tail scenarios where terrorists would get their hands on weapons of mass destruction; or how most victims of epidemics succumbed to the few very worst outbreaks (Newman, 2005). It is therefore crucial not only to factor in which scenarios are most likely to occur, but also to consider how bad they would be should they occur.

Again, most of the very worst scenarios could well be due to human-driven colonization, such as US versus China growth races taken beyond Earth. So, again, why focus mostly on colonization scenarios driven by rogue AI? Beyond that, the expected value of influencing a broad class of medium-value outcomes could easily be much higher than the expected value of influencing far fewer, much higher-stakes outcomes, provided that the outcomes that fall into this medium-value class are sufficiently probable and amenable to impact. In other words, it is by no means far-fetched to imagine that we can take actions that are robust over a wide range of medium-value outcomes, and that such actions are in fact best in expectation.
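To illustrate the structure of this expected-value comparison, here is a minimal sketch with purely hypothetical numbers (both the probabilities and the values are placeholders of my own, not figures drawn from Lukas or from me):

    # Toy comparison (all numbers hypothetical) of the two kinds of bets
    # discussed above: a broad class of moderately probable, medium-stakes
    # outcomes we can influence, versus a few improbable, extreme-stakes ones.
    medium_value = [(0.05, 1_000)] * 20   # 20 outcomes: 5% chance each, value 1,000 if influenced
    extreme_value = [(0.001, 100_000)]    # 1 outcome: 0.1% chance, value 100,000 if influenced

    def expected_value(bets):
        """Sum of probability * value over a list of (probability, value) pairs."""
        return sum(p * v for p, v in bets)

    print(expected_value(medium_value))   # 1000.0
    print(expected_value(extreme_value))  # 100.0

The point is only that many influenceable medium-value outcomes can add up to a higher expectation than a single extreme but improbable one; whether they actually do depends on the probabilities and tractabilities in question.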

Critics may object because the above scenarios are largely based on the possibility of artificial sentience, particularly sentience implemented on a computer substrate. If this turns out to be impossible, there may not be much suffering in futures with AI after all. However, the view that computer-based minds can suffer in the morally relevant sense is a common implication of positions in philosophy of mind. Functionalism and type A physicalism (“eliminativism”) both imply that there can be morally relevant minds on digital substrates. Even if one were skeptical of these two positions and instead favored the views of philosophers like David Chalmers or Galen Strawson (e.g. Strawson, 2006), who believe consciousness is an irreducible phenomenon, there are at least some circumstances under which these views would also allow for computer-based minds to be sentient.12 Crude “carbon chauvinism,” or a belief that consciousness is only linked to carbon atoms, is an extreme minority position in philosophy of mind.

The case for artificial sentience is not just abstract but can also be made on the intuitive level: Imagine we had whole brain emulation with a perfect mapping from inputs to outputs, behaving exactly like a person’s actual brain. Suppose we also give this brain emulation a robot body, with a face and facial expressions created with particular attention to detail. The robot will, by the stipulations of this thought experiment, behave exactly like a human person would behave in the same situation. So the robot-person would very convincingly plead that it has consciousness and moral relevance. How certain would we be that this was all just an elaborate facade? Why should it be?

Because we are unfamiliar with artificial minds and have a hard time experiencing empathy for things that do not appear or behave in animal-like ways, we may be tempted to dismiss the possibility of artificial sentience or deny artificial minds moral relevance – the same way animal sentience was dismissed for thousands of years. However, the theoretical reasons to anticipate artificial sentience are strong, and it would be discriminatory to deny moral consideration to a mind simply because it is implemented on a substrate different from ours. As long as we are not very confident indeed that minds on a computer substrate would be incapable of suffering in the morally relevant sense, we should believe that most of the future’s expected suffering is located in futures where superintelligent AI colonizes space.

I fail to see how this final conclusion is supported by the argument made above. Again, human-driven colonization seems to pose at least as big a risk of outcomes of this sort.

One could argue that “superintelligent AI” could travel faster and convert matter and energy into ordered computations more efficiently than a human-driven colonization could, yet I see little reason to expect a rogue AI-driven colonization to be significantly more effective in this regard than a human civilization powered by advanced tools built to be as efficient as possible. For instance, why should “superintelligent AI” be able to build significantly faster spaceships? I would expect both tail-end scenarios — i.e. both maximally sentient rogue AI-driven colonization and maximally sentient human-driven colonization — to converge toward an optimal expansion solution in a relatively short time, at least on cosmic timescales.

VIII. Impact analysis

The world currently contains a great deal of suffering. Large sources of suffering include for instance poverty in developing countries, mental health issues all over the world, and non-human animal suffering in factory farms and in the wild. We already have a good overview – with better understanding in some areas than others – of where altruists can cost-effectively reduce substantial suffering. Charitable interventions are commonly chosen according to whether they produce measurable impact in the years or decades to come. Unfortunately, altruistic interventions are rarely chosen with the whole future in mind, i.e. with a focus on reducing as much suffering as possible for the rest of time, until the heat death of the universe.13 This is potentially problematic, because we should expect the far future to contain vastly more suffering than the next decades, not only because there might be sentient beings around for millions or billions of years to come, but also because it is possible for Earth-originating life to eventually colonize space, which could multiply the total amount of sentient beings many times over. While it is important to reduce the suffering of sentient beings now, it seems unlikely that the most consequential intervention for the future of all sentience will also be the intervention that is best for reducing short-term suffering.

I think this is true, but largely because the word “best” here refers to two very narrow peaks that would have to coincide in a very large landscape. By contrast, it does not seem unlikely to me that the best, most robust interventions we can make to influence the long-term future are also highly robust and positive with respect to the short-term future, such as promoting concern for suffering as well as greater moral consideration of neglected beings.

And given that the cumulative probability of extinction (evaluated from now) increases over time, and hence that one should discount the value of influencing the long-term future of civilization accordingly, it in fact seems reasonable to choose actions that appear positive in both the short and the long term.
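To make this discounting point a bit more concrete, here is a toy sketch under the purely hypothetical assumption of a constant per-century survival probability (the figure 0.9 is mine, chosen only for illustration):

    # Toy illustration of survival-based discounting. Assume (hypothetically)
    # that our influence survives any given century with probability s; then
    # influence that only pays off t centuries from now gets weight s**t.
    s = 0.9  # hypothetical per-century survival probability

    for t in [1, 10, 50]:
        print(t, round(s ** t, 4))
    # Output: 1 0.9, 10 0.3487, 50 0.0052

The point is simply that the weight on very long-term payoffs shrinks geometrically, which strengthens the case for actions that also look positive on shorter horizons.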

Instead, as judged from the distant future, the most consequential development of our decade would more likely have something to do with novel technologies or the ways they will be used.

And when it comes to how technologies will be used, it is clear that influencing ideas matters a great deal. After all, we have seen important technologies developed in the past, and yet ideas seem to have been no less significant, such as specific religions (e.g. Islam and Christianity) as well as political ideologies (e.g. communism and liberalism). One may, of course, argue that it is very difficult to influence ideas on a large scale, yet the same can be said about influencing technology. Indeed, influencing ideas, whether broadly or narrowly, might just be the best way to influence technology.

And yet, politics, science, economics and especially the media are biased towards short timescales. Politicians worry about elections, scientists worry about grant money, and private corporations need to work on things that produce a profit in the foreseeable future. We should therefore expect interventions targeted at the far future to be much more neglected than interventions targeted at short-term sources of suffering.

Admittedly, the far future is difficult to predict. If our models fail to account for all the right factors, our predictions may turn out very wrong. However, rather than trying to simulate in detail everything that might happen all the way into the distant future – which would be a futile endeavor, needless to say – we should focus our altruistic efforts on influencing levers that remain agile and reactive to future developments. An example of such a lever is institutions that persist for decades or centuries. The US Constitution, for instance, still carries significant relevance in today’s world, even though it was formulated hundreds of years ago. Similarly, the people who founded the League of Nations after World War I did not succeed in preventing the next war, but they contributed to the founding and the charter of its successor organization, the United Nations, which still exerts geopolitical influence today. The actors who initially influenced the formation of these institutions, as well as their values and principles, had a long-lasting impact.

In order to positively influence the future for hundreds of years, we fortunately do not need to predict the next hundreds of years in detail. Instead, all we need to predict is what type of institutions – or, more generally, stable and powerful decision-making agencies – are most likely to react to future developments maximally well.14

AI is the ultimate lever through which to influence the future. The goals of an artificial superintelligence would plausibly be much more stable than the values of human leaders or those enshrined in any constitution or charter. And a superintelligent AI would, with at least considerable likelihood, remain in control of the future not only for centuries, but for millions or even billions of years to come. In non-AI scenarios on the other hand, all the good things we achieve in the coming decade(s) will “dilute” over time, as current societies, with all their norms and institutions, change or collapse.

In a future where smarter-than-human artificial intelligence won’t be created, our altruistic impact – even if we manage to achieve a lot in greatly influencing this non-AI future – would be comparatively “capped” and insignificant when contrasted with the scenarios where our actions do affect the development of superintelligent AI (or how AI would act).15

I think this is another claim that is widely overstated, and which I have not seen a convincing case for. Again, this notion that “an artificial superintelligence”, a single machine with much greater cognitive powers than everything else, will emerge and be programmed to be subordinate to a single goal that it would be likely to preserve does not seem credible to me. Sure, we can easily imagine it as an abstract notion, but why should we think such a system will ever emerge? The creation of such a system is, I would argue, far from being a necessary, or even particularly likely, outcome of our creating ever more competent machines.

And even if such a system did exist, it is not even clear, as Robin Hanson has argued, that it would be significantly more likely to preserve its values than a human civilization would — not so much because one should expect humans to be highly successful at it, but rather because there are also reasons to think that such a “superintelligent AI” would be unlikely to do so (such as those mentioned in my note on Omohundro’s argument above, as well as those provided by Hanson, e.g. that “the values of AIs with protected values should still drift due to influence drift and competition”).

We should expect AI scenarios to not only contain the most stable lever we can imagine – the AI’s goal function which the AI will want to preserve carefully – but also the highest stakes.

Again, I do not think a convincing case has been made for either of these claims. Why would the stakes be higher than in a human-driven colonization, which we may expect, for evolutionary reasons, to be performed primarily by those who want to expand and colonize as much and as effectively as possible?

In comparison with non-AI scenarios, space colonization by superintelligent AI would turn the largest amount of matter and energy into complex computations.

It depends on what we mean by non-AI scenarios. Scenarios where humans use advanced tools, such as near-maximally fast spaceships and near-optimal specialized software, to fill up space with sentient beings at a near-maximal rate are, I would argue, not only at least as conceivable but also at least as likely as similar scenarios brought about by the kind of AI Lukas seems to have in mind here.

In a best-case scenario, all these resources could be turned into a vast utopia full of happiness, which provides a strong incentive for us to get AI creation perfectly right. However, if the AI is equipped with insufficiently good values, or if it optimizes for random goals not intended by its creators, the outcome could also include astronomical amounts of suffering. In combination, these two reasons of highest influence/goal-stability and highest stakes build a strong case in favor of focusing our attention on AI scenarios.

Again, things could also go very wrong or very well with human-driven colonization, so there does not seem to be a big difference in this regard either.

While critics may object that all this emphasis on the astronomical stakes in AI scenarios appears unfairly Pascalian, it should be noted that AI is not a frivolous thought experiment where we invoke new kinds of physics to raise the stakes.

Right, but the kind of AI system envisioned here does, I would argue, rest on various highly questionable conceptions of how a single system could grow, as well as of what the design of future machines is likely to be like. And I would argue, again, that such a system is highly unlikely to emerge.

Smarter-than-human artificial intelligence and space colonization are both realistically possible and plausible developments that fit squarely into the laws of nature as we currently understand them.

A Bugatti appearing in the Stone Age also in some sense fits squarely into the laws of nature as we currently understand them. Yet that does not mean that such a car was likely to emerge at that time, once we consider the history and evolution of technology. Similarly, I would argue that the scenario Lukas seems to have hinted at throughout his piece is a lot less credible than this appeal to compatibility with the laws of nature would seem to suggest.

If either of them turn out to be impossible, that would be a big surprise, and would suggest that we are fundamentally misunderstanding something about the way physical reality works. While the implications of smarter-than-human artificial intelligence are hard to grasp intuitively, the underlying reasons for singling out AI as a scenario to worry about are sound.

Well, I have tried to argue to the contrary here. It would be much more plausible, I think, to argue that the scenario Lukas envisions is one scenario among others that warrants some priority.

As illustrated by Leó Szilárd’s lobbying for precautions around nuclear bombs well before the first such bombs were built, it is far from hopeless to prepare for disruptive new technologies in advance, before they are completed.

This text argued that altruists concerned about the quality of the future should [be] focusing their attention on futures where AI plays an important role.

I would say that the argument that has been made is much narrower than that, since “AI” here is used in a relatively narrow sense in the first place, and because it is a very particular scenario involving such narrowly defined AI that Lukas has been focusing on the most here — as far as I can tell, it is a scenario where a single system takes over the world and determines the future based on a single, arduously preserved goal. There are many other scenarios we can envision in which AI, both in the ordinary sense as well as in the more narrow sense invoked here by Lukas, plays “an important role”, including scenarios involving human-driven space colonization.

This can mean many things. It does not mean that everyone should think about AI scenarios or technical work in AI alignment directly. Rather, it just means we should pick interventions to support according to their long-term consequences, and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI. Whether it is best to try to affect AI outcomes in a narrow and targeted way, or whether we should go for a broader strategy, depends on several factors and requires further study.

FRI has looked systematically into paths to impact for affecting AI outcomes with particular emphasis on preventing suffering, and we have come up with a few promising candidates. The following list presents some tentative proposals:

It is important to note that human values may not affect the goals of an AI at all if researchers fail to solve the value-loading problem. Raising awareness of certain values may therefore be particularly impactful if it concerns groups likely to be in control of the goals of smarter-than-human artificial intelligence.

Further research is needed to flesh out these paths to impact in more detail, and to discover even more promising ways to affect AI outcomes.

Lukas writes that the implication of his argument is that “we should pick interventions to support according to their long-term consequences”. I agree with this completely. He then continues, “and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI”. And this latter claim, as I understand it, is what has not been justified. Again, the claim that one should grant this scenario some priority, even significant priority, alongside many other scenarios, is plausible; the claim that it should be granted greater priority than everything else is not, I would argue.

And as for how we can best reduce suffering in the future, I would agree with pretty much all the proposals Lukas suggests, although I would argue that things like promoting concern for suffering and widening our moral circles (and we should do both) become even more important when we take other scenarios into consideration, such as human-driven colonization. In other words, these things seem even more robust and more positive when we also consider these other high-stakes scenarios.

Beyond that, I would also note that we likely have moral intuitions that make a notional rogue AI takeover seem worse in expectation than a more detached analysis, relative to an impartial moral ideal such as “reduce suffering”, would suggest. Furthermore, it should be noted that many of those who focus most prominently on AI safety (for example, people at MIRI and FHI) seem to have values according to which it is important that humans maintain control or remain in existence, which may render their view that AI safety is the most important thing to focus on less relevant for other value systems than one might intuitively suppose.

To zoom out a bit, one way to think about my disagreement with Lukas, as well as the overall argument I have tried to make here, is that one can view Lukas’ line of argument as consisting of a certain number of steps where, in each of them, he describes a default scenario he believes to be highly probable, whereas I generally find these respective “default” scenarios quite improbable. And when one then combines our respective probabilities into a single measure of the probability that the overall scenario Lukas envisions will occur, one gets a very different result for Lukas and myself respectively. It may look something like this, assuming Lukas’ argument consists of eight steps in a conditional chain, each assigned a certain probability which then gets multiplied by the rest (i.e. P(A) * P(B|A) * P(C|A,B) * ...):

L: 0.98 * 0.96 * 0.93 * 0.99 * 0.95 * 0.99 * 0.97 * 0.98 ≈ 0.77

M: 0.1 * 0.3 * 0.01 * 0.1 * 0.2 * 0.08 * 0.2 * 0.4 ≈ 0.00000004

(These particular numbers are just more or less random ones I have picked for illustrative purposes, except that their approximate range does illustrate where I think the respective credences of Lukas and myself roughly lie with regard to most of the arguments discussed throughout this essay.)

And an important point to note here is that even if one disagrees with both Lukas and me on these respective probabilities, and instead picks credences roughly in between those of Lukas and me, or indeed significantly closer to those of Lukas, the overall argument I have made here still stands, namely that it is far from clear that scenarios of the kind Lukas outlines are the most important ones to focus on in order to best reduce suffering. For then the probability of Lukas’ argument being correct, or the probability that the scenario Lukas envisions will occur (one can think of it in both ways, I think, even if these formulations are not strictly equivalent), becomes something like the following:

In-between credence: 0.5^8 ≈ 0.004

Credences significantly closer to Lukas’: 0.75^8 ≈ 0.1

This would not seem to support the conclusion that a focus on the AI scenarios Lukas has outlined should dominate other scenarios we can envision (e.g. human-driven colonization).
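For readers who want to reproduce the arithmetic above, here is a minimal sketch; the probability values are, as stressed, purely illustrative placeholders:

    from math import prod

    # Illustrative credences for the eight steps in the conditional chain,
    # i.e. P(A), P(B|A), P(C|A,B), and so on. The specific numbers are
    # placeholders, as noted above.
    lukas_credences = [0.98, 0.96, 0.93, 0.99, 0.95, 0.99, 0.97, 0.98]
    my_credences = [0.1, 0.3, 0.01, 0.1, 0.2, 0.08, 0.2, 0.4]

    print(prod(lukas_credences))   # ≈ 0.77
    print(prod(my_credences))      # ≈ 0.00000004
    print(0.5 ** 8)                # in-between credences: ≈ 0.004
    print(0.75 ** 8)               # credences closer to Lukas': ≈ 0.1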

Lukas ends his post on the following note:

As there is always the possibility that we have overlooked something or are misguided or misinformed, we should remain open-minded and periodically rethink the assumptions our current prioritization is based on.

With that, I could not agree more. In fact, this is in some sense the core point I have been trying to make here.
