Is AI Alignment Possible?

The problem of AI alignment is usually defined roughly as the problem of making powerful artificial intelligence do what we humans want it to do. My aim in this essay is to argue that this problem is less well-defined than many people seem to think, and to argue that it is indeed impossible to “solve” with any precision, not merely in practice but in principle.

There are two basic problems for AI alignment as commonly conceived. The first is that human values are non-unique. Indeed, in many respects, there is more disagreement about values than people tend to realize. The second problem is that even if we were to zoom in on the preferences of a single human, there is, I will argue, no way to instantiate a person’s preferences in a machine so as to make it act as this person would have preferred.

Problem I: Human Values Are Non-Unique

The common conception of the AI alignment problem is something like the following: we have a set of human preferences, X, which we must, somehow (and this is usually considered the really hard part), map onto some machine’s goal function, Y, via a map f, let’s say, such that X and Y are in some sense isomorphic. At least, this is a way of thinking about it that roughly tracks what people are trying to do.

Speaking in these terms, far more attention is devoted to Y and f than to X. My argument in this essay is that we are deeply confused about the nature of X, and hence confused about AI alignment.

The first point of confusion is about the values of humanity as a whole. It is usually acknowledged that human values are fuzzy, and that there are some disagreements over values among humans. Yet it is rarely acknowledged just how strong this disagreement in fact is.

For example, concerning the ideal size of the future population of sentient beings, the disagreement is near-total, as some (e.g. some defenders of the so-called Asymmetry in population ethics, as well as anti-natalists such as David Benatar) argue that the future population should ideally be zero, while others, including many classical utilitarians, argue that the future population should ideally be very large. Many similar examples could be given of strong disagreements concerning the most fundamental and consequential of ethical issues, including whether any positive good can ever outweigh extreme suffering. And on many of these crucial disagreements, a very large number of people will be found on both sides.

Different answers to ethical questions of this sort do not merely give rise to small practical disagreements; in many cases, they have completely opposite practical implications. This is not a matter of human values being fuzzy, but a matter of them being sharply, irreconcilably inconsistent. And hence there is no way to map the totality of human preferences, “X”, onto a single, well-defined goal function in a way that does not conflict strongly with the values of a significant fraction of humanity. This is a trivial point, and yet most talk of human-aligned AI seems oblivious to it.

Problem II: Present Human Preferences Are Underdetermined Relative to Future Actions

The second problem and point of confusion with respect to the nature of human preferences is that, even if we focus only on the present preferences of a single human, these do not, and indeed could not possibly, determine with much precision what kind of world this person would prefer to bring about in the future.

This claim requires some unpacking, but one way to realize what I am trying to say here is to think in terms of the information required to represent the world around us. A precise such representation would require an enormous amount of information, indeed far more information than what can be contained in our brain. This holds true even if we only consider morally relevant entities around us — on the planet, say. There are just too many of them for us to have a precise representation of them. By extension, there are also too many of them for us to be able to have precise preferences about their individual states. Given that we have very limited information at our disposal, all we can do is express extremely coarse-grained and compressed preferences about what state the world around us should ideally have. In other words: any given human’s preferences are bound to be extremely vague about the exact ideal state of the world right now, and there will be countless moral dilemmas occurring across the world right now to which our preferences, in their present state, do not specify a unique solution.

And yet this is just considering the present state of the world. When we consider future states, the problem of specifying ideal states and resolutions to hitherto unknown moral dilemmas only explodes in complexity, and indeed explodes exponentially as time progresses. It is simply a fact, and indeed quite an obvious one at that, that no single brain could possibly contain enough information to specify unique, or indeed just qualified, solutions to all moral dilemmas that will arise in the future. So what, then, could AI alignment relative to even a single brain possibly mean? How can we specify Y with respect to these future dilemmas when X itself does not specify solutions?
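The exponential claim above can be made concrete with a toy calculation. The sketch below is only an illustration under loudly stated assumptions: it supposes that each time step multiplies the number of distinguishable scenarios by a fixed branching factor, and that a mind needs at least one bit per scenario to hold a distinct preference about it. The function names, the branching factors, and the capacity figure of 10^15 bits are all hypothetical placeholders, not estimates of anything.

```python
def scenarios(branching: int, steps: int) -> int:
    """Number of distinct scenarios after `steps` time steps in the toy model."""
    return branching ** steps


def horizon_exceeding(capacity_bits: int, branching: int) -> int:
    """Smallest number of steps at which the scenario count exceeds capacity.

    Assumes (generously) that one bit suffices to store a preference
    about one scenario.
    """
    if branching < 2:
        raise ValueError("the toy model needs a branching factor of at least 2")
    steps = 0
    while scenarios(branching, steps) <= capacity_bits:
        steps += 1
    return steps


# Hypothetical capacity, chosen only to show the shape of the result.
CAPACITY_BITS = 10**15

for b in (2, 10):
    print(f"branching factor {b}: capacity exceeded after "
          f"{horizon_exceeding(CAPACITY_BITS, b)} steps")
```

Whatever numbers one plugs in, the horizon grows only logarithmically with capacity: multiplying the assumed capacity a thousandfold buys roughly ten more steps at a branching factor of 2.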

We can, of course, try to guess what a given human, or we ourselves, might say if confronted with a particular future moral dilemma and given knowledge about it, yet the problem is that our extrapolated guess is bound to be just that: a highly imperfect guess. For even a tiny bit of extra knowledge or experience can readily change a person’s view of a given moral dilemma to be the opposite of what it was prior to acquiring that knowledge (for instance, I myself switched from being a classical to a negative utilitarian based on a modest amount of information in the form of arguments I had not considered before). This high sensitivity to small changes in our brain implies that even a system with near-perfect information about some person’s present brain state would be forced to make a highly uncertain guess about what that person would actually prefer in a given moral dilemma. And the further ahead in time we go, and thus the further away from our familiar circumstances and context, the greater the uncertainty will be.

By analogy, consider the task of AI alignment with respect to our ancestors ten million years ago. What would their preferences have been with respect to, say, the future of space colonization? One may object that this is underdetermined because our ancestors could not conceive of this possibility, yet the same applies to us and things we cannot presently conceive of, such as alien states of consciousness. Our current preferences say about as little about the (dis)normativity of such states as the preferences of our ancestors ten million years ago said about space colonization.

A more tangible analogy might be to consider the level of confidence with which we, based on knowledge of your current brain state, can determine your dinner preferences twenty years from now with respect to dishes made from ingredients not yet invented, a preference that will likely be influenced by contingent, environmental factors found between now and then. Not with great confidence, it seems safe to say. And this point pertains not only to dinner preferences but also to the most consequential of choices. Our present preferences cannot realistically determine, with any considerable precision, what we would deem ideal in as yet unknown, realistic future scenarios. Thus, by extension, there can be no such thing as value extrapolation or preservation in anything but the vaguest sense. No human mind has ever contained, or indeed ever could contain, a set of preferences that evaluatively orders any more than the tiniest sliver of (highly compressed versions of) real-world states and choices an agent in our world is likely to face in the future. To think otherwise amounts to a strange Platonization of human preferences. We just do not have enough information in our heads to possess such fine-grained values.

The truth is that our preferences are not some fixed entity that determines future actions uniquely; they simply could not be that. Rather, our preferences are themselves interactive and adjustive in nature, changing in response to new experiences and new information we encounter. Thus, to say that we can “idealize” our present preferences so as to obtain answers to all realistic future moral dilemmas is rather like calling the evolution of our ancestors’ DNA toward human DNA a “DNA idealization”. In both cases, we find no hidden Deep Essences waiting to be purified; no information that points uniquely toward one particular solution in the face of all realistic future “problems”. All we find are physical systems that evolve contingently based on the inputs they receive.*

The bottom line of all this is not that it makes no sense to devote resources toward ensuring the safety of future machines. We can still meaningfully and cooperatively seek to instill rules and mechanisms in our machines and institutions that seem optimal in expectation given our respective, coarse-grained values. The conclusion here is just that 1) the rules instantiated cannot be the result of a universally shared human will or anything close; the closest thing possible would be rules that embody some compromise between people with strongly disagreeing values. And 2) such an instantiation of coarse-grained rules in fact comprises the upper bound of what we can expect to accomplish in this regard. Indeed, this is all we can expect with respect to future influence in general: rough and imprecise influence and guidance with the limited information we can possess and transmit. The idea of a future machine that will do exactly what we would want, and whose design therefore constitutes a lever for precise future control, is a pipe dream.


* Note that this account of our preferences is not inconsistent with value or moral realism. By analogy, consider human preferences and truth-seeking: humans are able to discover many truths about the universe, yet most of these truths are not hidden in, nor extrapolated from, our DNA or our preferences. Indeed, in many cases, we only discover these truths by actively transcending rather than “extrapolating” our immediate preferences (for comfortable and intuitive beliefs, say). The same could apply to the realm of value and morality.

Why Altruists Should Perhaps Not Prioritize Artificial Intelligence: A Lengthy Critique

The following is a point-by-point critique of Lukas Gloor’s essay Altruists Should Prioritize Artificial Intelligence. My hope is that this critique will serve to make it clear — to Lukas, myself, and others — where and why I disagree with this line of argument, and thereby hopefully also bring some relevant considerations to the table with respect to what we should be working on to best reduce suffering. I should like to note, before I begin, that I have the deepest respect for Lukas, and that I consider his work very important and inspiring.

Below, I quote every paragraph from the body of Lukas’ article, which begins with the following abstract:

The large-scale adoption of today’s cutting-edge AI technologies across different industries would already prove transformative for human society. And AI research rapidly progresses further towards the goal of general intelligence. Once created, we can expect smarter-than-human artificial intelligence (AI) to not only be transformative for the world, but also (plausibly) to be better than humans at self-preservation and goal preservation. This makes it particularly attractive, from the perspective of those who care about improving the quality of the future, to focus on affecting the development goals of such AI systems, as well as to install potential safety precautions against likely failure modes. Some experts emphasize that steering the development of smarter-than-human AI into beneficial directions is important because it could make the difference between human extinction and a utopian future. But because we cannot confidently rule out the possibility that some AI scenarios will go badly and also result in large amounts of suffering, thinking about the impacts of AI is paramount for both suffering-focused altruists as well as those focused on actualizing the upsides of the very best futures.

An abstract of my thoughts on this argument:

My response to this argument is twofold: 1) I do not consider the main argument presented by Lukas, as I understand it, to be plausible, and 2) I think we should think hard about whether we have considered the opportunity cost carefully enough. We should not be particularly confident, I would argue, that any of us have found the best thing to focus on to reduce the most suffering.

I do not think the claim that “altruists can expect to have the largest positive impact by focusing on artificial intelligence” is warranted. In part, my divergence from Lukas rests on empirical disagreements, and in larger part it stems from what may be called “conceptual disagreements” — I think most talk about “superintelligence” is conceptually confused. For example, intelligence as “cognitive abilities” is liberally conflated with intelligence as “the ability to achieve goals in general”, and this confusion does a lot of deceptive work.

I would advocate for more foundational research into the question of what we ought to prioritize. Artificial intelligence undoubtedly poses many serious risks, yet it is important that we maintain a sense of proportion with respect to these risks relative to other serious risks, many of which we have not even contemplated yet.

I will now turn to the full argument presented by Lukas.

I. Introduction and definitions

Terms like “AI” or “intelligence” can have many different (and often vague) meanings. “Intelligence” as used here refers to the ability to achieve goals in a wide range of environments. This definition captures the essence of many common perspectives on intelligence (Legg & Hutter, 2005), and conveys the meaning that is most relevant to us, namely that agents with the highest comparative goal-achieving ability (all things considered) are the most likely to shape the future.

A crucial thing to flag is that “intelligence” here refers to the ability to achieve goals — not to scoring high on an IQ test, or “intelligence” as “advanced cognitive abilities”. And these are not the same, and should not be conflated (indeed, this is one of the central points of my book Reflections on Intelligence, which dispenses with the muddled term “intelligence” at an early point, and instead examines the nature of this better defined “ability to achieve goals” in greater depth).

While it is true that the concept of goal achieving is related to the concept of IQ, the latter is much narrower, as it relates to a specific class of goals. Boosting the IQ of everyone would not boost our ability to achieve goals in every respect, at least not immediately, and not to the same extent across all domains. For even if we all woke up with an IQ of 200 tomorrow, all the external technology with which we run and grow our economy would still be the same. Our cars would drive just as fast, the energy available to us would be the same, and so would the energy efficiency of our machines. And while a higher IQ might now enable us to grow this external technology faster, there are quite restrictive limits to how much it can grow. Most of our machines and energy harvesting technology cannot be made many times more efficient, as their efficiency is already a significant fraction (15 to 40 percent) of the maximum physical limit. In other words, their efficiency cannot be doubled more than a couple of times, if even that.
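The arithmetic behind “a couple of times” is easy to check. As a minimal sketch, assuming efficiency is expressed as a fraction of the maximum physical limit (taken here as 1.0), the number of doublings still available is the base-2 logarithm of the limit divided by the current efficiency. The function name is my own, introduced only for illustration; the 15 and 40 percent figures are the ones cited above.

```python
import math


def doublings_available(current_efficiency: float, limit: float = 1.0) -> float:
    """Number of times efficiency could double before hitting the limit."""
    if not 0 < current_efficiency <= limit:
        raise ValueError("efficiency must be positive and no greater than the limit")
    return math.log2(limit / current_efficiency)


for eff in (0.15, 0.40):
    print(f"{eff:.0%} of the limit: at most {doublings_available(eff):.2f} doublings")
```

A machine at 15 percent of the limit thus has room for fewer than three doublings, and one at 40 percent for fewer than two.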

One could then, of course, build more machines and power plants, yet such an effort would itself be constrained strongly by the state of our external technology, including the energy available to us; not just by the cognitive abilities available. This is one of the reasons I am skeptical of the idea of AI-powered runaway growth. Yes, greater cognitive abilities are a highly significant factor, yet there is just so much more to growing the economy and our ability to achieve a wide range of goals than that, as evidenced by the fact that we have seen a massive increase in computer-powered cognitive abilities (indeed, exponential growth for many decades by many measures) and yet we have continued to see fairly stable, in fact modestly declining, economic growth.

If one considers the concept of “increase in cognitive powers” to be the same as “increase in the ability to achieve goals, period” then this criticism will be missed. “I defined intelligence to be the ability to achieve goals, so when I say intelligence is increased, then all abilities are increased.” One can easily come to entertain a kind of motte and bailey argument in this way, by moving back and forth between this broad notion of intelligence as “the ability to achieve goals” and the more narrow sense of intelligence as “cognitive abilities”. To be sure, a statement like the one above need not be problematic as such, as long as one is clear that this concept of intelligence lies very far from “intelligence as measured by IQ/raw cognitive power”. Such clarity is often absent, however, and thus the statement is quite problematic in practice, with respect to the goals of communicating clearly and not confusing ourselves.

Again, my main point here is that increasing cognitive powers should not be conflated with increasing the ability to achieve goals in general — in every respect. I think much confusion springs from a lack of clarity on this matter.

While everyday use of the term “intelligence” often refers merely to something like “brainpower” or “thinking speed,” our usage also presupposes rationality, or goal-optimization in an agent’s thinking and acting. In this usage, if someone is e.g. displaying overconfidence or confirmation bias, they may not qualify as very intelligent overall, even if they score high on an IQ test. The same applies to someone who lacks willpower or self control.

This is an important step toward highlighting the distinction between “goal achieving ability” and “IQ”, yet it is still quite a small step, as it does not really go much beyond distinguishing “high IQ” from “optimal cognitive abilities for goal achievement”. We are still talking about things going on in a single human head (or computer), while leaving out the all-important aspect that is (external) culture and technology. We are still not talking about the ability to achieve goals in general.

Artificial intelligence refers to machines designed with the ability to pursue tasks or goals. The AI designs currently in use – ranging from trading algorithms in finance, to chess programs, to self-driving cars – are intelligent in a domain-specific sense only. Chess programs beat the best human players in chess, but they would fail terribly at operating a car. Similarly, car-driving software in many contexts already performs better than human drivers, but no amount of learning (at least not with present algorithms) would make [this] software work safely on an airplane.

My only comment here would be that it is not quite clear what counts as artificial intelligence. For example, would a human, edited as well as unedited, count as “a machine designed with the ability to pursue tasks or goals”? And could not all software be considered “designed with the ability to pursue tasks or goals”, and hence all software would be artificial intelligence by this definition? If so, we should then just be clear that this definition is quite broad, including both all humans and all software, and more.

The most ambitious AI researchers are working to build systems that exhibit (artificial) general intelligence (AGI) – the type of intelligence we defined above, which enables the expert pursuit of virtually any task or objective.

This is where the distinction we drew above becomes relevant. While the claim quoted above may be true in one sense, we should be clear that the most ambitious AI researchers are not working to increase “all our abilities”, including our ability to get more energy out of our steam engines and solar panels. Our economy arguably works on that broader endeavor. AI researchers, in contrast, work only on bettering what may be called “artificial cognitive abilities”, which, granted, may in turn help spur growth in many other areas (although the degree to which it would do so is quite unclear, and likely surprisingly limited in the big picture, since “growth may be constrained not by what we are good at but rather by what is essential and yet hard to improve”).

In the past few years, we have witnessed impressive progress in algorithms becoming more and more versatile. Google’s DeepMind team for example built an algorithm that learned to play 2-D Atari games on its own, achieving superhuman skill at several of them (Mnih et al., 2015). DeepMind then developed a program that beat the world champion in the game of Go (Silver et al., 2016), and – tackling more practical real-world applications – managed to cut down data center electricity costs by rearranging the cooling systems.

I think it is important not to overstate recent progress compared to progress in the past. We also saw computers becoming better than humans at many things several decades ago, including many kinds of mathematical calculations (and people also thought that computers would soon beat humans at everything back then). So superhuman skill at many tasks is not what is new and unique about recent progress, but rather that these superhuman skills have been attained via self-training, and, as Lukas notes, that the skills achieved by this training seem of a broader, more general nature than the skills of a single algorithm in the past.

And yet the breadth of these skills should not be overstated either, as the skills cited are all acquired in a rather expensive trial-and-error fashion with readily accessible feedback. This mode of learning surely holds a lot of promise in many areas, yet there are reasons to be skeptical that such learning can bring us significantly closer to achieving all the cognitive and motor abilities humans have (see also David Pearce’s “Humans and Intelligent Machines”; one need not agree with Pearce on everything to agree with some of his reasons to be skeptical).

That DeepMind’s AI technology makes quick progress in many domains, without requiring researchers to build new architecture from scratch each time, indicates that their machine learning algorithms have already reached an impressive level of general applicability. (Edit: I wrote the previous sentence in 2016. In the meantime [January 2018] DeepMind went on to refine its Go-playing AI, culminating in a version called AlphaGo Zero. While the initial version of DeepMind’s Go-playing AI started out with access to a large database of games played by human experts, AlphaGo Zero only learns through self-play. Nevertheless, it managed to become superhuman after a mere 4 days of practice. After 40 days of practice, it was able to beat its already superhuman predecessor 100–0. Moreover, Deepmind then created the version AlphaZero, which is not a “Go-specific” algorithm anymore. Fed with nothing but the rules for either Go, chess, or shogi, it managed to become superhuman at each of these games in less than 24 hours of practice.)

This is no doubt impressive. Yet it is also important not to overstate how much progress was achieved in 24 hours of practice. This is not, we should be clear, a story about innovation going from zero to superhuman in 24 hours, but rather the story of immense amounts of hardware developed over decades which has then been fed with an algorithm that has also been developed over many years by many people. And then, this highly refined algorithm running on specialized, cutting-edge hardware is unleashed to reach its dormant potential.

And this potential was, it should be noted, not vastly superior to the abilities of previous systems. In chess, for instance, AlphaZero beat the chess program Stockfish (although Stockfish author Tord Romstad notes that it was a version that was a year old and not running on optimal hardware) 25 times as white, 3 as black, and drew the remaining 72 times. Thus, it was significantly better, yet it still did not win most of the games. Similarly, in Go, AlphaZero won 60 games and lost 40, while in shogi it won 90 times, lost 8, and drew twice.
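To put the match results just cited side by side, the sketch below tabulates them and computes two simple summaries: the share of games won outright, and the share of points scored when a draw counts as half a point. The half-point rule is the usual tournament convention and is my assumption here, not something the cited figures specify; the class and field names are likewise only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    """Wins, draws, and losses from AlphaZero's side of a 100-game match."""
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def win_share(self) -> float:
        """Fraction of games won outright."""
        return self.wins / self.games

    @property
    def score_share(self) -> float:
        """Fraction of points scored, counting each draw as half a point."""
        return (self.wins + 0.5 * self.draws) / self.games


# Figures as quoted in the text above.
RESULTS = {
    "chess": MatchResult(wins=25 + 3, draws=72, losses=0),
    "go": MatchResult(wins=60, draws=0, losses=40),
    "shogi": MatchResult(wins=90, draws=2, losses=8),
}

for game, result in RESULTS.items():
    print(f"{game}: won {result.win_share:.0%} of games, "
          f"scored {result.score_share:.0%} of points")
```

On these numbers AlphaZero won 28 percent of the chess games outright while scoring 64 percent of the points, which is consistent with the reading above: clearly stronger, but far from unbeatable.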

Thus, AlphaZero undoubtedly represented clear progress with respect to these games, yet not an enormous leap that rendered it unbeatable, and certainly not a leap made in a single day.

The road may still be long, but if this trend continues, developments in AI research will eventually lead to superhuman performance across all domains. As there is no reason to assume that humans have attained the maximal degree of intelligence (Section III), AI may soon after reaching our own level of intelligence surpass it.

Again, I would start by noting that human “intelligence” as our “ability to achieve goals” is strongly dependent on the state of our technology and culture at large, not merely our raw cognitive powers. And the claim made above that there is no reason to believe that humans have attained “the maximal degree of intelligence” seems, in this context, to mostly refer to our cognitive abilities rather than our ability to achieve goals in general. For with respect to our ability to achieve goals in general, it is clear that our abilities are not maximal, but indeed continually growing, largely as the result of better software and better machines. Thus, there is not a dichotomous relationship between “human abilities to achieve goals” and “our machines’ abilities to achieve goals”. And given that our ability to achieve goals is in many ways mostly limited by what our best technology can do (how fast our airplanes can fly, how fast our hardware is, how efficient our power plants are, etc.), it is not clear why some other agent or set of agents coming to control this technology (which is extremely difficult to imagine in the first place given the collaborative nature of the broader infrastructure of this technology) should be vastly more capable of achieving goals than humans powered by/powering this technology.

As for AI surpassing “our own level of intelligence”, one can say that, at the level of cognitive tasks, machines have already been vastly superhuman in many respects for many years, in virtually all mathematical calculations, for instance. And now also in many games, ranging from Atari to Go. Yet, as noted above, I would argue that, so far, such progress has amounted to a clear increase in human “intelligence” in the general sense: it has increased our ability to achieve goals.

Nick Bostrom (2014) popularized the term superintelligence to refer to (AGI-)systems that are vastly smarter than human experts in virtually all respects. This includes not only skills that computers traditionally excel at, such as calculus or chess, but also tasks like writing novels or talking people into doing things they otherwise would not. Whether AI systems would quickly develop superhuman skills across all possible domains, or whether we will already see major transformations with [superhuman skills in] just a [few] such domains while others lag behind, is an open question.

I would argue that our machines already have superhuman skills in countless domains, and that this has indeed already given rise to major transformations, in one sense of this term at least.

Note that the definitions of “AGI” and “superintelligence” leave open the question of whether these systems would exhibit something like consciousness.

I have argued to the contrary in the chapter “Consciousness — Orthogonal or Crucial?” in Reflections on Intelligence.

This article focuses on the prospect of creating smarter-than-human artificial intelligence. For simplicity, we will use the term “AI” in a non-standard way here, to refer specifically to artificial general intelligence (AGI).

Again, I would flag that the meaning of the term general intelligence, or AGI, in this context is not clear. It was defined above as the ability that “enables the expert pursuit of virtually any task or objective”. Yet the ability of humans to achieve goals in general is, I would still argue, in large part the product of their technology and culture at large, and AGI, as Lukas uses it here, does not seem to refer to anything remotely like this, i.e. “the sum of the capabilities of our technology and culture”. Instead, it seems to refer to something much more narrow and singular — something akin to “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level”. I think this is worth highlighting.

The use of “AI” in this article will also leave open how such a system is implemented: While it seems plausible that the first artificial system exhibiting smarter-than-human intelligence will be run on some kind of “supercomputer,” our definition allows for alternative possibilities.

Again, what does “smarter-than-human intelligence” mean here? Machines can already do things that no unaided human can. It seems to refer to what I defined above: “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” — not the ability to achieve goals in general. And as for when a computer might have “(virtually) all the cognitive abilities that a human does”, it seems highly doubtful that any system will ever suddenly emerge with them all, given the modular, many-faceted nature of our minds. Instead, it seems much more likely that the gradual process of machines becoming better than humans at particular tasks will continue in its usual, gradual way. Or so I have argued.

The claim that altruists should focus on affecting AI outcomes is therefore intended to mean that we should focus on scenarios where the dominant force shaping the future is no longer (biological) human minds, but rather some outgrowth of information technology – perhaps acting in concert with biotechnology or other technologies. This would also e.g. allow for AI to be distributed over several interacting systems.

I think this can again come close to resembling a motte and bailey argument: it seems very plausible that the future will not be controlled mostly by what we would readily recognize as biological humans today. Yet to say that we should aim to impact such a future by no means implies that we should aim to impact, say, a small set of AI systems which might determine the entire future based on their goal functions (note: I am not saying Lukas has made this claim above, but this is often what people seem to consider the upshot of arguments of this kind, and also what it seems to me that Lukas is arguing below, in the rest of his essay). Indeed, the claim above is hardly much different from saying that we should aim to impact the long-term future. But Lukas seems to be moving back and forth between this general claim and the much narrower claim that we should focus on scenarios involving rapid growth acceleration driven mostly by software, which is the kind of scenario his essay seems almost exclusively focused on.

II. It is plausible that we create human-level AI this century

Even if we expect smarter-than-human artificial intelligence to be a century or more away, its development could already merit serious concern. As Sam Harris emphasized in his TED talk on risks and benefits of AI, we do not know how long it will take to figure out how to program ethical goals into an AI, solve other technical challenges in the space of AI safety, or establish an environment with reduced dangers of arms races. When the stakes are high enough, it pays to start preparing as soon as possible. The sooner we prepare, the better our chances of safely managing the upcoming transition.

I agree that it is worth preparing for high-stakes outcomes. But I think it is crucial that we get a clear sense of what these might look like, as well as how likely they are. A more accurate framing might be: “Altruists Should Prioritize Exploring Long-Term Future Outcomes, and Work out How to Best Influence Them”. To say that we should focus on “artificial intelligence”, which has a rather narrow meaning in most contexts (something akin to a software program), when we really mean that we should focus on the future of goal-achieving systems in general is, I think, somewhat misleading.

The need for preparation is all the more urgent given that considerably shorter timelines are not out of the question, especially in light of recent developments. While timeline predictions by different AI experts span a wide range, many of those experts think it likely that human-level AI will be created this century (conditional on civilization facing no major disruptions in the meantime). Some even think it may emerge in the first half of this century: In a survey where the hundred most-cited AI researchers were asked in what year they think human-level AI is 10% likely to have arrived by, the median reply was 2024 and the mean was 2034. In response to the same question for a 50% probability of arrival, the median reply was 2050 with a mean of 2072 (Müller & Bostrom, 2016).¹

Again, it is important to be careful about definitions. For what is meant by “human-level AI” in this context? The authors of the cited source are careful to define what they mean: “Define a ‘high–level machine intelligence’ (HLMI) as one that can carry out most human professions at least as well as a typical human.”

And yet even this definition is quite vague, since “most human professions” is not a constant. A couple of hundred years ago, the profession of virtually all humans was farming, whereas only a couple percent of people in developed nations are employed in farming today. And this is not an idle point, because as machines become able to do jobs hitherto performed by humans, market forces will push humans to take new jobs that machines cannot do. And these new jobs may be those that require abilities that it will take many centuries for machines to acquire, if non-biological machines will indeed ever acquire them (this is not necessarily that implausible, as these abilities may include “looking like a real, empathetic biological human who ignites our brain circuits in the right ways”).

Thus, the questionnaire above seems poorly defined. And if it asks about most current human professions, its relevance appears quite limited, not least because the nature of individual professions changes over time as well. A doctor today does not do all the same things a doctor did a hundred years ago, and the same will likely apply to doctors of the future. In other words, within existing professions too, we can expect to see humans move toward doing the things that machines cannot do, or that we prefer machines not to do, even as machines become ever more capable.

While it could be argued that these AI experts are biased towards short timelines, their estimates should make us realize that human-level AI this century is a real possibility.

Yet we should keep in mind what they were asked about, and how relevant this is. Even if most (current?) human professions might be done by machines within this century, this does not imply that we will see “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” within this century. These are quite different claims.

The next section will argue that the subsequent transition from human-level AI to superintelligence could happen very rapidly after human-level AI actualizes. We are dealing with the decent possibility – e.g. above 15% likelihood even under highly conservative assumptions – that human intelligence will be surpassed by machine intelligence later this century, perhaps even in the next couple of decades. As such a transition will bring about huge opportunities as well as huge risks, it would be irresponsible not to prepare for it.

I want to flag, again, that it is not clear what “human-level AI” means. Lukas seemed to first define intelligence as something like “the ability to achieve goals in general”, which I have argued is not really what he means here (indeed, it is a rather different beast which I seek to examine in Reflections on Intelligence). And the two senses of the term “human-level intelligence” mentioned in the previous paragraph — “the ability to do most human professions” versus “possessing virtually all human cognitive abilities” — should not be conflated either. So it is in fact not clear what is being referred to here, although I believe it is the latter: “possessing virtually all human cognitive abilities at a similar or greater level”.

It should be noted that a potentially short timeline does not imply that the road to superintelligence is necessarily one of smooth progress: Metrics like Moore’s law are not guaranteed to continue indefinitely, and the rate of breakthrough publications in AI research may not increase (or even stay constant) either. The recent progress in machine learning is impressive and suggests that fairly short timelines of a decade or two are not to be ruled out. However, this progress could also be mostly due to some important but limited insights that enable companies like DeepMind to reap the low-hanging fruit before progress would slow down again. There are large gaps still to be filled before AIs reach human-level intelligence, and it is difficult to estimate how long it will take researchers to bridge these gaps. Current hype about AI may lead to disappointment in the medium term, which could bring about an “AI safety winter” with people mistakenly concluding that the safety concerns were exaggerated and smarter-than-human AI is not something we should worry about yet.

This seems true, yet it should also be conceded that a consistent lack of progress in AI would count as at least weak evidence against the claim that we should mainly prioritize what is usually referred to as “AI safety”. And more generally, we should be careful not to make the hypothesis “AI safety is the most important thing we could be working on” into an unfalsifiable one.

As for Moore’s law, not only is it “not guaranteed to continue indefinitely”, but we know, for theoretical reasons, that it must come to an end within a decade, at least in its original formulation concerning silicon transistors, and progress has indeed already been below the prediction of “the law” for some time now. And the same can be said about other aspects of hardware progress: it shows signs of waning.

If AI progress were to slow down for a long time and then unexpectedly speed up again, a transition to superintelligence could happen with little warning (Shulman & Sandberg, 2010). This scenario is plausible because gains in software efficiency make a larger comparative difference to an AI’s overall capabilities when the hardware available is more powerful. And once an AI develops the intelligence of its human creators, it could start taking part in its own self-improvement (see section IV).

I am not sure I understand the claims being made here. With respect to the first argument about gains in efficiency, the question is how likely we should expect such gains to be if progress has been slow for long. Other things being equal, this would seem less likely in a time where growth is slow than in a time when it is fast, and especially if there is not much growth in hardware either, since hardware growth may in large part be driving growth in software.

I am not sure I follow the claim about AI developing the intelligence of its human creators, and then taking part in its own improvement, but I would just note, as Ramez Naam has argued, that AI, and our machines in general, are already playing a significant role in their own improvement in many ways. In other words, we already actively use our best, most capable technology to build the next generation of such technology.

Indeed, on a more general, yet also less directly relevant note, I would also add that we humans have in some sense been using our most advanced cognitive tools to build the next generation of such tools for hundreds of thousands of years. For over the course of evolution, individual humans have been using the best of their cognitive abilities to select the mates who had the best total package (they could get), of which cognitive abilities were a significant part. In this sense, the idea that “dumb and blind” evolution created intelligent humans is actually quite wrong. The real story is rather one of cognitive abilities actively selecting cognitive abilities (along with other things). A gradual design process over the course of which ever greater cognitive powers were “creating” and in turn created.

For AI progress to stagnate for a long period of time before reaching human-level intelligence, biological brains would have to have surprisingly efficient architectures that AI cannot achieve despite further hardware progress and years of humans conducting more AI research.

Looking over the past decades of AI research and progress, we can say that it indeed has been a fairly long period of time since computers first surpassed humans in the ability to do mathematical calculations, and yet there are still many things humans can do which computers cannot, such as having meaningful conversations with other humans, learning fast from a few examples, and experiencing and expressing feelings. And yet these examples still mostly pertain to cognitive abilities, and hence still overlook other abilities that are also relevant with respect to machines taking over human jobs (if we focus on that definition of “human-level AI”), such as having the physical appearance of a real, biological human, which does seem in strong demand in many professions, especially in the service industry.

However, as long as hardware progress does not come to a complete halt, AGI research will eventually not have to surpass the human brain’s architecture or efficiency anymore. Instead, it could become possible to just copy it: The “foolproof” way to build human-level intelligence would be to develop whole brain emulation (WBE) (Sandberg & Bostrom, 2008), the exact copying of the brain’s pattern of computation (input-output behavior as well as isomorphic internal states at any point in the computation) onto a computer and a suitable virtual environment. In addition to sufficiently powerful hardware, WBE would require scanning technology with fine enough resolution to capture all the relevant cognitive function, as well as a sophisticated understanding of neuroscience to correctly draw the right abstractions. Even though our available estimates are crude, it is possible that all these conditions will be fulfilled well before the end of this century (Sandberg, 2014).

Yet it should be noted that there are many who doubt that this is a foolproof way to build “human-level intelligence” (a term that in this context again seems to mean “a system with roughly the same cognitive abilities as the human brain”). Many doubt that it is even a possibility, and they do so for many different reasons (e.g. that a single, high-resolution scan of the brain is not enough to capture and enable an emulation of its dynamic workings; that a digital computer cannot adequately simulate the physical complexity of the brain; and that such a computer cannot solve the so-called binding problem).

Thus, it seems to stand as an open question whether mind uploading is indeed possible, let alone feasible (and it also seems that many people in the broader transhumanist community, who tend to be the people who write and talk the most about mind uploading, could well be biased toward believing it possible, as many of them seem to hope that it can save them from death).

The perhaps most intriguing aspect of WBE technology is that once the first emulation exists and can complete tasks on a computer like a human researcher can, it would then be very easy to make more such emulations by copying the original. Moreover, with powerful enough hardware, it would also become possible to run emulations at higher speeds, or to reset them back to a well-rested state after they performed exhausting work (Hanson, 2016).

Assuming, of course, that WBE will indeed be feasible in the first place. Also, it is worth noting that Robin Hanson himself is critical of the idea that WBEs would be able to create software that is superior to themselves very quickly; i.e. he expects a WBE economy to undergo “many doublings” before it happens.

Sped-up WBE workers could be given the task of improving computer hardware (or AI technology itself), which would trigger a wave of steeply exponential progress in the development of superintelligence.

This is an exceptionally strong claim that would seem in need of justification, and not least some specification, given that it is not clear what “steeply exponential progress in the development of superintelligence” refers to in this context. It hardly means “steeply exponential progress in the development of a super ability to achieve goals in general”, including in energy efficiency and energy harvesting. Such exponential progress is not, I submit, likely to follow from progress in computer hardware or AI technology alone. Indeed, as we saw above, such progress cannot happen with respect to the energy efficiency of most of our machines, as physical limits mean that it cannot double more than a couple of times.

But even if we understand it to be a claim about the abilities of certain particular machines and their cognitive abilities more narrowly, the claim is still a dubious one. It seems to assume that progress in computer hardware and AI technology is constrained chiefly by the amount of hours put into it by those who work on it directly, as opposed to also being significantly constrained by countless other factors, such as developments in other areas, e.g. in physics, production, and transportation, many of which imply limits on development imposed by factors such as hardware and money, not just the amount of human-like genius available.

For example, how much faster should we expect the hardware that AlphaZero was running on to have been developed and completed if a team of super-WBEs had been working on it? Would the materials used for the hardware have been dug up and transported significantly faster? Would they have been assembled significantly faster? Perhaps somewhat, yet hardly anywhere close to twice as fast. The growth story underlying many worries about explosive AI growth is quite detached from how we actually improve our machines, including AI (software and hardware) as well as the harvesting of the energy that powers it (Vaclav Smil: “Energy transitions are inherently gradual processes and this reality should be kept in mind when judging the recent spate of claims about the coming rapid innovative take-overs […]”). Such growth is the result of countless processes distributed across our entire economy. Just as nobody knows how to make a pencil, nobody, including the very best programmers, knows (more than a tiny part of) how to make better machines.

To get a sense of the potential of this technology, imagine WBEs of the smartest and most productive AI scientists, copied a hundred times to tackle AI research itself as a well-coordinated research team, sped up so they can do years of research in mere weeks or even days, and reset periodically to skip sleep (or other distracting activities) in cases where memory-formation is not needed. The scenario just described requires no further technologies beyond WBE and sufficiently powerful hardware. If the gap from current AI algorithms to smarter-than-human AI is too hard to bridge directly, it may eventually be bridged (potentially very quickly) after WBE technology drastically accelerates further AI research.

As far as I understand, much of the progress in machine learning in modern times was essentially due to modern hardware and computing power that made it possible to implement old ideas invented decades ago (of course then implemented with all the many adjustments and tinkering whose necessity and exact nature one cannot foresee from the drawing board). In other words, software progress was hardly the most limiting factor. Arguably, the limiting factor was rather that the economy just had not caught up to be able to make hardware advanced enough to implement these theoretical ideas successfully. And it also seems to me quite naive to think that better hardware design, and genius ideas about how to make hardware more generally, was and is a main limiting factor in our growth of computer hardware. Such progress tends to rest critically on other progress in other kinds of hardware and globally distributed production processes. Processes that no doubt can be sped up, yet hardly that significantly by advanced software alone, in large part because such progress is limited by the fact that many of the crucial processes involved in this progress, such as digging up, refining, and transporting materials, are physical processes that can only go so fast.

Beyond that, there is also an opportunity cost consideration that is ignored by the story of fast growth above. For the hardware and energy required for this team of WBEs could otherwise have been used to run other kinds of computations that could help further innovation, including those we already run on full steam to further progress — CAD programs, simulations, equation solving. And it is not clear that using all this hardware for WBEs would be a better use of hardware than would running these other programs, whose work may be considered a limiting factor to AI progress at a similar level as more “purely” human or human-like work is. Indeed, we should not expect engineers and companies to do these kinds of things with their computing resources if they were not among the most efficient things they could do with them. And even if WBEs are a better use of hardware for fast progress, it is far from clear that it would be that much better.

The potential for WBE to come before de novo AI means that – even if the gap between current AI designs and the human brain is larger than we thought – we should not significantly discount the probability of human-level AI being created eventually. And perhaps paradoxically, we should expect such a late transition to happen abruptly. Barring no upcoming societal collapse, believing that superintelligence is highly unlikely to ever happen requires not only confidence that software or “architectural” improvements to AI are insufficient to ever bridge the gap, but also that – in spite of continued hardware progress – WBE could not get off the ground either. We do not seem to have sufficient reason for great confidence in either of these propositions, let alone both.

Again, what does the term “superintelligence” refer to here? Above, it was defined as “(AGI-)systems that are vastly smarter than human experts in virtually all respects”. And given that AGI is defined as a general ability to pursue goals, and that “smart” here presumably means “better able to achieve goals”, one can say that the definition of superintelligence given here translates to “a system that pursues goals better than human experts in virtually all areas”. Yet we are already building systems that satisfy this definition of superintelligence. Our entire economy is already able to do tasks that no single human expert could ever accomplish. But superintelligence likely refers to something else here, something along the lines of: “a system that is vastly more cognitively capable than any human expert in virtually all respects”. And yet, even by this definition, we already have computer systems that can do countless cognitive tasks much better than any human, and the super system that is the union of all these systems can therefore, in many respects at least, be considered to have vastly superior cognitive abilities relative to humans. And systems composed of humans and technology are clearly vastly more capable than any human expert alone in virtually all respects.

In this sense, we clearly do have “superintelligence” already, and we are continually expanding its capabilities. And, with respect to worries about a FOOM takeover, it seems highly unlikely that a single, powerful machine could ever overtake and become more powerful than the entire collective that is the human-machine civilization, which is not to say that low-probability events should be dismissed. But they should be measured against other risks we could be focusing on.

III. Humans are not at peak intelligence

Again, it is important to be clear about what we mean by “intelligence”. Most cognitively advanced? Or best able to achieve goals in general? Humans extended by technology can clearly increase their intelligence, i.e. ability to achieve goals, significantly. We have done so consistently over the last few centuries, and we continue to do so today. And in a world where humans build this growing body of technology to serve their own ends, and in some cases build it to be provably secure, it is far from clear that some non-human system with much greater cognitive powers than humans (which, again, already exists in many domains) will also become more capable of achieving goals in general than humanity, given that it is surrounded by a capable super-system of technology designed for and by humans, controlled by humans, to serve their ends. Again, this is not to say that one should not worry about seemingly improbable risks — we definitely should — but merely that we should doubt the assumption that our making machines more cognitively capable will necessarily imply that they will be better able to achieve goals in general. Again, despite being related, these two senses of “intelligence” must not be confused.

It is difficult to intuitively comprehend the idea that machines – or any physical system for that matter – could become substantially more intelligent than the most intelligent humans. Because the intelligence gap between humans and other animals appears very large to us, we may be tempted to think of intelligence as an “on-or-off concept,” one that humans have and other animals do not. People may believe that computers can be better than humans at certain tasks, but only at tasks that do not require “real” intelligence. This view would suggest that if machines ever became “intelligent” across the board, their capabilities would have to be no greater than those of an intelligent human relying on the aid of (computer-)tools.

Again, we should be clear that the word “intelligence” here seems to mean “most cognitively capable” rather than “best able to achieve goals in general”. And the gap between the “intelligence”, as in the ability to achieve goals, of humans and other animals arguably does not appear very large when we compare individuals. Most other animals can do things that no single human can do, and to the extent we humans can learn to do things other animals naturally beat us at, e.g. lift heavier objects or traverse distances faster than speedy animals, we do so by virtue of technology, in essence the product of collective, cultural evolution.

And even with respect to cognitive abilities, one can argue that humans are not superior to other animals in a general sense. We do not have superior cognitive abilities with respect to echolocation, for example, much less long-distance navigation. Nor are humans superior when it comes to all aspects of short-term/working memory.

Measuring goal achieving ability in general, as well as abilities to solve cognitive tasks in particular, along a single axis may be useful in some contexts, yet it can easily become meaningless when the systems being compared are not sufficiently similar. 

But this view is mistaken. There is no threshold for “absolute intelligence.” Nonhuman animals such as primates or rodents differ in cognitive abilities a great deal, not just because of domain-specific adaptations, but also due to a correlational “g factor” responsible for a large part of the variation across several cognitive domains (Burkart et al., 2016). In this context, the distinction between domain-specific and general intelligence is fuzzy: In many ways, human cognition is still fairly domain-specific. Our cognitive modules were optimized specifically for reproductive success in the simpler, more predictable environment of our ancestors. We may be great at interpreting which politician has the more confident or authoritative body language, but deficient in evaluating whose policy positions will lead to better developments according to metrics we care about. Our intelligence is good enough or “general enough” that we manage to accomplish impressive feats even in an environment quite unlike the one our ancestors evolved in, but there are many areas where our cognition is slower or more prone to bias than it could be.

I agree with this. I would just note that “intelligence” here again seems to be referring to cognitive abilities, not the ability to achieve goals in general, and that we humans have expanded both over time via culture: our cognitive abilities, as measured by IQ, have increased significantly over the last century, while our ability to achieve goals in general has expanded much more still as we have developed ever more advanced technology.

Intelligence is best thought of in terms of a gradient. Imagine a hypothetical “intelligence scale” (inspired by part 2.1 of this FAQ) with rats at 100, chimpanzees at, say, 350, the village idiot at 400, average humans at 500 and Einstein at 750.² Of course, this scale is open at the top and could go much higher.

Again, intelligence here seems to refer to cognitive abilities, not the ability to achieve goals in general. Einstein was likely not better at shooting hoops than the average human, or indeed more athletic in general (by all appearances), although he was much more cognitively capable, at least in some respects, than virtually all other humans.

To quote Bostrom (2014, p. 44): “Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization – a niche we filled because we got there first, not because we are in any sense optimally adapted to it.”

Again, the words “smart” and “stupid” here seem to pertain to cognitive abilities, not the ability to achieve goals in general. And this phrasing is misleading, as it seems to presume that cognitive ability is all it takes to build an advanced civilization, which is not the case. In fact, humans are not the species with the biggest brain on the planet, or even the species with the biggest cerebral cortex; indeed, long-finned pilot whales have more than twice as many neocortical neurons.

What we are, however, is a species with a lot of unique tools — fine motor hands, upright walk, vocal cords, a large brain with a large prefrontal cortex, etc. — which together enabled humans to (gradually build a lot of tools with which they could) take over the world. Remove just one of these unique tools from all of humanity, and we would be almost completely incapable. And this story of a multiplicity of components that are all necessary yet insufficient for the maintenance and growth of human civilization is even more true today, where we have countless external tools — trucks, the internet, computers, screwdrivers, etc. — without which we could not maintain our civilization. And the necessity of all these many different components seems overlooked by the story that views advanced cognitive abilities as the sole driver, or near enough, of growth and progress in the ability to achieve goals in general. This, I would argue, is a mistake.

Thinking about intelligence as a gradient rather than an “on-or-off” concept prompts a Copernican shift of perspective. Suddenly it becomes obvious that humans cannot be at the peak of possible intelligence. On the contrary, we should expect AI to be able to surpass us in intelligence just like we surpass chimpanzees.

Depending on what we mean by the word “intelligence”, one can argue that computers have already surpassed humans. If we define “intelligence” to be “that which is measured by an IQ test”, for example, then computers have already been better than humans in at least some of these tests for a few years now.

In terms of our general ability to achieve goals, however, it is not clear that computers will so readily surpass humans, in large part because we do not aim to build them to be better than humans in many respects. Take self-repair, for example, which is something human bodies, just like virtually all animal bodies, are in a sense designed to do — indeed, most of our self-repair mechanisms are much older than we are as a species. Evolution has built humans to be competent and robust autonomous systems who do not for the most part depend on a global infrastructure to repair their internal parts. Our computers, in contrast, are generally not built to be self-repairing, at least not at the level of hardware. Their notional thrombocytes are entirely external to themselves, in the form of a thousand and one specialized tools and humans distributed across the entire economy. And there is little reason to think that this will change, as there is little incentive to create self-repairing computers. We are not aiming to build generally able, human-independent computers in this sense.

Biological evolution supports the view that AI could reach levels of intelligence vastly beyond ours. Evolutionary history arguably exhibits a weak trend of lineages becoming more intelligent over time, but evolution did not optimize for intelligence (only for goal-directed behavior in specific niches or environment types). Intelligence is metabolically costly, and without strong selection pressures for cognitive abilities specifically, natural selection will favor other traits. The development of new traits always entails tradeoffs or physical limitations: If our ancestors had evolved to have larger heads at birth, maternal childbirth mortality would likely have become too high to outweigh the gains of increased intelligence (Wittman & Wall, 2007). Because evolutionary change happens step-by-step as random mutations change the pre-existing architecture, the changes are path dependent and can only result in local optima, not global ones.

Here we see how the distinction between “intelligence as cognitive abilities” and “intelligence as the ability to achieve goals” is crucial. Indeed, the example provided above clearly proves the point that advanced cognitive abilities are often not the most relevant thing for achieving goals, since the goal of surviving and reproducing was often not best achieved, as Lukas hints, with the best cognitive abilities. Often it was better achieved with longer teeth or stronger muscles. Or a prettier face.

So the question is: why do we think that advanced cognitive abilities are, to a first approximation, identical with the ability to achieve goals? And, more importantly, why do we imagine that this lesson about the sub-optimality of spending one’s limited resources on better cognitive abilities does not still hold today? Why should cognitive abilities be the sole optimal thing, or near enough, to spend all one’s resources on in order to best achieve a broad range of goals? I would argue that it is not. It was not optimal in the past (with respect to the goal of survival), and it does not seem to be optimal today either.

It would be a remarkable coincidence if evolution had just so happened to stumble upon the most efficient way to assemble matter into an intelligent system.

But it would be less remarkable if it had happened to assemble matter into a system that is broadly capable of achieving a broad range of goals, and which another system, especially one that is not built over a billion-year process to be robust and highly autonomous, cannot readily outdo in terms of autonomous function. It would also not be that remarkable if biological humans, functioning within a system built by and for biological humans, happened to be among the most capable systems within such a system, not least given all the legal, social and political aspects this system entails.

Beyond that, one can dispute the meaning of “intelligent system” in the quote above, but if we look at the intelligent system that is our civilization at large, one can say that the optimization going on at this level is not coincidental but indeed deliberate, often aiming toward peak efficiency. Thus, in this regard as well, we should not be too surprised if our current system is quite efficient and competent relative to the many constraints we are facing.

But let us imagine that we could go back to the “drawing board” and optimize for a system’s intelligence without any developmental limitations. This process would provide the following benefits for AI over the human brain (Bostrom, 2014, p. 60-61):

Free choice of substrate: Signal transmission with computer hardware is millions of times faster than in biological brains. AI is not restricted to organic brains, and can be built on the substrate that is overall best suited for the design of intelligent systems.

Supersizing: Machines have (almost) no size-restrictions. While humans with elephant-sized brains would run into developmental impossibilities, (super)computers already reach the size of warehouses and could in theory be built even bigger.

No cognitive biases: We should be able to construct AI in a way that uses more flexible heuristics, and always the best heuristics for a given context, to prevent the encoding or emergence of substantial biases. Imagine the benefits if humans did not suffer from confirmation bias, overconfidence, status quo bias, etc.!

Modular superpowers: Humans are particularly good at tasks for which we have specialized modules. For instance, we excel at recognizing human faces because our brains have hard-wired structures that facilitate facial recognition in particular. An artificial intelligence could have many more such specialized modules, including extremely useful ones like a module for programming.

Editability and copying: Software on a computer can be copied and edited, which facilitates trying out different variations to see what works best (and then copying it hundreds of times). By contrast, the brain is a lot messier, which makes it harder to study or improve. We also lack correct introspective access to the way we make most of our decisions, which is an important advantage that (some) AI designs could have.

Superior architecture: Starting anew, we should expect it to be possible to come up with radically more powerful designs than the patchwork architecture that natural selection used to construct the human brain. This difference could be enormously significant.

It should be noted that computers already 1) can be built with a wide variety of substrates, 2) can be supersized, 3) do not tend to display cognitive biases, 4) have modular superpowers, 5) can be edited and copied (or at least software readily can), 6) can be made with any architecture we can come up with. All of these advantages exist and are being exploited already, just not as much as they can be. And it is not clear why we should expect future change to be more radical than the change we have seen in past decades in which we have continually built ever more competent computers which can do things that no human can by exploiting these advantages.

With regard to the last point, imagine we tried to optimize for something like speed or sight rather than intelligence. Even if humans had never built anything faster than the fastest animal, we should assume that technological progress – unless it is halted – would eventually surpass nature in these respects. After all, natural selection does not optimize directly for speed or sight (but rather for gene copying success), making it a slower optimization process than those driven by humans for this specific purpose. Modern rockets already fly at speeds of up to 36,373 mph, which beats the peregrine falcon’s 240 mph by a huge margin. Similarly, eagle vision may be powerful, but it cannot compete with the Hubble space telescope. (General) intelligence is harder to replicate technologically, but natural selection did not optimize for intelligence either, and there do not seem to be strong reasons to believe that intelligence as a trait should differ categorically from examples like speed or sight, i.e., there are as far as we know no hard physical limits that would put human intelligence at the peak of what is possible.3

Again, what is being referred to by the word “intelligence” here seems to be cognitive abilities, not the ability to achieve goals in general. And with respect to cognitive abilities in particular, it is clear that computers already beat humans by a long shot in countless respects. So the point Lukas is making here is clearly true.

Another way to develop an intuition for the idea that there is significant room for improvement above human intelligence is to study variation in humans. An often-discussed example in this context is the intellect of John von Neumann. Von Neumann was not some kind of an alien, nor did he have a brain twice as large as the human average. And yet, von Neumann’s accomplishments almost seem “superhuman.” The section in his Wikipedia entry that talks about him having “founded the field of Game theory as a mathematical discipline” – an accomplishment so substantial that for most other intellectual figures it would make up most of their Wikipedia page – is just one out of many of von Neumann’s major achievements.

There are already individual humans (with normal-sized brains) whose intelligence vastly exceeds that of the typical human. So just how much room is there above their intelligence? To visualize this, consider for instance what could be done with an AI architecture more powerful than the human brain running on a warehouse-sized supercomputer.

A counterpoint to this line of reasoning can be found by contemplating chess ratings. Ratings of the skills of chess players are usually done via the so-called Elo rating system, which measures the relative skills of different players against each other. A beginner will usually have a rating around 800, whereas a rating in the range 2000-2199 ranks one as a chess “Expert”, and a rating of 2400 and above renders one a “Senior Master”. The highest rating ever achieved was 2882 by Magnus Carlsen. Surely, this amount of variation must be puny given that all the humans who have ever played chess have roughly the same brain sizes and structures. And yet it turns out that human variation in chess ability is in fact quite enormous in an absolute sense.

For example, it took more than four decades from the time computers were able to beat a chess beginner (the 1950s) until they were able to beat the very best human player (1997 officially). Thus, the span from ordinary human beginner to the best human expert was more than four decades of progress in hardware (i.e. a million times more computing power) and software. That seems quite a wide range.

And yet the range seems even broader if we consider the ultimate limits of optimal chess play. For one may argue that the fact that it took computers a fairly long time to go from the average human level to the level of the best human does not mean that the best human is not still ridiculously far from the best a computer could be in theory. Surprisingly, however, this latter distance does in fact seem quite small, at least in one sense. For estimates suggest that the best possible chess machine would have an Elo rating around 3600, which means that the relative distance between the best possible computer and the best human is only around 700 Elo points, implying that the distance between the best human and a chess “Expert” is similar to the distance between the best human and the best possible chess brain, while the distance between an ordinary human beginner and the best human is far greater.
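To make the arithmetic behind these comparisons explicit, here is a minimal sketch in Python. It uses only the ratings quoted above, and it treats the 3600 ceiling as the rough estimate it is, not as an established figure. The function and variable names are illustrative assumptions, and the expected-score function is the standard Elo logistic formula rather than anything specific to the estimates discussed here.

```python
# Ratings quoted in the text. The 3600 ceiling is an estimate, not a known fact.
RATINGS = {
    "beginner": 800,
    "expert_low": 2000,    # lower end of the 2000-2199 "Expert" band
    "expert_high": 2199,   # upper end of the 2000-2199 "Expert" band
    "best_human": 2882,
    "estimated_ceiling": 3600,
}


def gap(lower: str, higher: str) -> int:
    """Rating difference in Elo points between two named levels."""
    return RATINGS[higher] - RATINGS[lower]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (between 0 and 1) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    pairs = [
        ("beginner", "best_human"),
        ("expert_low", "best_human"),
        ("expert_high", "best_human"),
        ("best_human", "estimated_ceiling"),
    ]
    for lower, higher in pairs:
        score = expected_score(RATINGS[higher], RATINGS[lower])
        print(f"{lower} -> {higher}: {gap(lower, higher)} points, "
              f"expected score for the stronger side {score:.4f}")
```

On these numbers, the gap from the best human to the estimated ceiling (718 points) falls between the gaps from the two ends of the “Expert” band to the best human (683 and 882 points), while the gap from beginner to best human is 2082 points. Note that Elo differences are relative measures of expected results between players, so the comparison is only as meaningful as the rating scale itself.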

It seems plausible that a similar pattern obtains with respect to many other complex cognitive tasks. Indeed, it seems plausible that, for most humans, many of our abilities, especially those we evolved to do well, such as our ability to interact with other humans, have an “Elo rating” quite close to the notional maximum level.

IV. The transition from human to superhuman intelligence could be rapid

Perhaps the people who think it is unlikely that superintelligent AI will ever be created are not objecting to it being possible in principle. Maybe they think it is simply too difficult to bridge the gap from human-level intelligence to something much greater. After all, evolution took a long time to produce a species as intelligent as humans, and for all we know, there could be planets with biological life where intelligent civilizations never evolved.4 But considering that there could come a point where AI algorithms start taking part in their own self-improvement, we should be more optimistic.

We should again be clear that the term “superintelligent AI” seems to refer to a system with greater cognitive abilities, across a wide range of tasks, than humans. As for “a point where AI algorithms start taking part in their own self-improvement”, it should be noted, again, that we already use our best software and hardware in the process of developing better software and hardware. True, they are only a part of a process that involves far more elements, yet this is true of most everything that we produce and improve in our economy: many contributions drawn from and distributed across our economy at large are required. And we have good reason to believe that this will continue to be true of the construction of more capable machines in the future.

AIs contributing to AI research will make it easier to bridge the gap, and could perhaps even lead to an acceleration of AI progress to the point that AI not only ends up smarter than us, but vastly smarter after only a short amount of time.

Again, we already use our best software and hardware to contribute to AI research, and yet we do not appear to see acceleration in the growth of our best supercomputers. In fact, the growth rate of their computing power appears to show a modest decline.

Several points in the list of AI advantages above – in particular the advantages derived from the editability of computer software or the possibility for modular superpowers to have crucial skills such as programming – suggest that AI architectures might both be easier to further improve than human brains, and that AIs themselves might at some point become better at actively developing their own improvements.

Again, computers are already “easier to further improve than human brains” in these ways, and our hardware and software are already among the most active parts in their own improvement. So why should we expect to see a different pattern in the future from the pattern we see today of gradual, slightly declining growth?

If we ever build a machine with human-level intelligence, it should then be comparatively easy to speed it up or make tweaks to its algorithm and internal organization to make it more powerful. The updated version, which would at this point be slightly above human-level intelligence, could be given the task of further self-improvement, and so on until the process runs into physical limits or other bottlenecks.

Better yet than “human-level intelligence” would be software that is critical for the further development of more powerful computers. And we in fact already have such software, many different kinds of it, and yet it is not that easy to simply “speed it up or make tweaks to its algorithm and internal organization to make it more powerful”. More generally, as noted above, we already use our latest, updated technology to improve our latest, updated technology, and the result is not rapid, runaway growth.

Perhaps self-improvement does not have to require human-level general intelligence at all. There may be comparatively simple AI designs that are specialized for AI science and (initially) lack proficiency in other domains. The theoretical foundations for an AI design that can bootstrap itself to higher and higher intelligence already exist (Schmidhuber, 2006), and it remains an empirical question where exactly the threshold is after which AI designs would become capable of improving themselves further, and whether the slope of such an improvement process is steep enough to go on for multiple iterations.

Again, I would just reiterate that computers are already an essential component in the process of improving computers. And the fact that humans who need to sleep and have lunch breaks are also part of this improvement process does not seem a main constraint on it compared to other factors, such as physical limitations implied by transportation and the assemblage of materials. Oftentimes in modern research, computers run simulations at their maximum capacity while the humans do their sleeping and lunching, in which case these resting activities (through which humans often get their best ideas) do not limit progress much at all, whereas the available computing power does.

For the above reasons, it cannot be ruled out that breakthroughs in AI could at some point lead to an intelligence explosion (Good, 1965; Chalmers, 2010), where recursive self-improvement leads to a rapid acceleration of AI progress. In such a scenario, AI could go from subhuman intelligence to vastly superhuman intelligence in a very short timespan, e.g. in (significantly) less than a year.

“It cannot be ruled out” can be said of virtually everything; the relevant question is how likely we should expect these possibilities to be. Beyond that, it is also not clear what would count as a “rapid acceleration of AI progress”, and thus what exactly it is that cannot be ruled out. AI going from subhuman performance to vastly greater than human performance in a short amount of time has already been seen in many different domains, including Go most recently.

But if one were to claim, to take a specific claim, that it cannot be ruled out that an AI system will improve itself so much that it can overpower human civilization and control the future, then I would argue that the reasoning above does not support considering this a likely possibility, i.e. something that is more likely to happen than, say, one in a thousand.

While the idea of AI advancing from human-level to vastly superhuman intelligence in less than a year may sound implausible, as it violates long-standing trends in the speed of human-driven development, it would not be the first time where changes to the underlying dynamics of an optimization process cause an unprecedented speed-up. Technology has been accelerating ever since innovations (such as agriculture or the printing press) began to feed into the rate at which further innovations could be generated.5

In the endnote “5” referred to above, Lukas writes:

[…] Finally, over the past decades, many tasks, including many areas of research and development, have already been improved through outsourcing them to machines – a process that is still ongoing and accelerating.

That this process of outsourcing of tasks is accelerating seems in need of justification. We have been outsourcing tasks to machines in various ways and at a rapid pace for at least two centuries now, and so it is not a trivial claim that this process is accelerating.

Compared to the rate of change we see in biological evolution, cultural evolution broke the sound barrier: It took biological evolution a few million years to improve on the intelligence of our ape-like ancestors to the point where they became early hominids. By contrast, technology needed little more than ten thousand years to progress from agriculture to space shuttles.

And I would argue that the reason technology could grow so fast is that an ever larger system of technology, consisting of an ever greater variety of tools, was contributing to it through recursive self-improvement, with human genius being but one important component. And I think we have good reason to think the same about the future.

Just as inventions like the printing press fed into – and significantly sped up – the process of technological evolution, rendering it qualitatively different from biological evolution, AIs improving their own algorithms could cause a tremendous speed-up in AI progress, rendering AI development through self-improvement qualitatively different from “normal” technological progress.

I think there is very little reason to believe this story. Again, we already use our best machines to build the next generation of machines. “Normal” technological progress of the kind we see today already depends on computers running programs created to optimize future technology as efficiently as they can, and it is far from clear that running a more human kind of program would be a more efficient use of resources toward this end.

It should be noted, however, that while the arguments in favor of a possible intelligence explosion are intriguing, they nevertheless remain speculative. There are also some good reasons why some experts consider a slower takeoff of AI capabilities more likely. In a slower takeoff, it would take several years or even decades for AI to progress from human to superhuman intelligence.

Again, the word “intelligence” here seems to refer to cognitive abilities, not the ability to achieve goals in general. And it is again not clear what it means to say that it might “take several years or even decades for AI to progress from human to superhuman intelligence”, since computers have already been more capable than humans at a wide variety of cognitive tasks for many decades. So I would argue that this statement suffers from a lack of conceptual clarity.

Unless we find decisive arguments for one scenario over the other, we should expect both rapid and comparably slow takeoff scenarios to remain plausible. It is worth noting that because “slow” in this context also includes transitions on the order of ten or twenty years, it would still be very fast practically speaking, when we consider how much time nations, global leaders or the general public would need to adequately prepare for these changes.

To reiterate the statement I just made, it is not clear what a fast takeoff means in this context given that computers are already vastly superior to humans in many domains, and probably will continue to beat humans at ever more tasks before they come close to being able to do virtually all cognitive tasks humans can do. So what it is we are supposed to consider plausible is not entirely clear. As for whether it is plausible for rapid progress to occur over a wide range of cognitive tasks such that an AI system becomes able to take over the world, I would argue that we have not seen arguments to support this claim.

V. By default, superintelligent AI would be indifferent to our well-being

The typical mind fallacy refers to the belief that other minds operate the same way our own does. If an extrovert asks an introvert, “How can you possibly not enjoy this party; I talked to half a dozen people the past thirty minutes and they were all really interesting!” they are committing the typical mind fallacy.

When envisioning the goals of smarter-than-human artificial intelligence, we are in danger of committing this fallacy and projecting our own experience onto the way an AI would reason about its goals. We may be tempted to think that an AI, especially a superintelligent one, will reason its way through moral arguments6 and come to the conclusion that it should, for instance, refrain from harming sentient beings. This idea is misguided, because according to the intelligence definition we provided above – which helps us identify the processes likely to shape the future – making a system more intelligent does not change its goals/objectives; it only adds more optimization power for pursuing those objectives.

Again, we need to be clear about what “smarter-than-human artificial intelligence” means here. In this case, we seem to be talking about a fairly singular and coherent system, a “mind” of sorts — as opposed to a thousand and one different software programs that do their own thing well — and hence in this regard it seems that the term “smarter-than-human artificial intelligence” here refers to something that is quite similar to a human mind. We are seemingly also talking about a system that “would reason about its goals”.

It seems worth noting that this is quite different from how we think about contemporary software programs, even including the most advanced ones such as AlphaZero and IBM’s Watson, which we are generally not tempted to consider “minds”. Expecting competent software programs of the future to be like minds may itself be to commit a typical mind fallacy of sorts, or perhaps just a mind fallacy. It is conceivable that software will continue to outdo humans at many tasks without acquiring anything resembling what we usually conceive of as a mind.

Another thing worth clarifying is what we mean by the term “by default” here. Does it refer to what AI systems will be built to do by our economy in the absence of altruistic intervention? If “by default” means that which our economy will naturally tend to produce, it seems likely that future AI indeed will be programmed to not be indifferent, at least in a behavioral sense, to human well-being “by default”. Indeed, it seems a much greater risk that future software systems will be constructed to act in a way that exclusively benefits human beings and is indifferent toward everything else. In other words, the risk is that such software will share our speciesist bias, with catastrophic consequences ensuing.

My point here is merely that, just as it is almost meaningless to claim that biological minds will not care about our well-being by default, as it lacks any specification of what “by default” means — given what evolutionary history? — so is it highly unclear what “by default” means when we are talking about machines created by humans. It seems to assume that we are going to suddenly have a lot of “undirected competence” delivered to us which does not itself come with countless sub-goals and adaptations built into it to attain ends desired by human programmers, and, perhaps to a greater extent, markets.

To give a silly example, imagine that an arms race between spam producers and companies selling spam filters leads to increasingly more sophisticated strategies on both sides, until the side selling spam filters has had it and engineers a superintelligent AI with the sole objective to minimize the number of spam emails in their inboxes.

Again, I would flag that it is not clear what “superintelligent AI” means here. Does it refer to a system that is better able to achieve goals across the board than humans? Or merely a system with greater cognitive abilities than any human expert in virtually all domains? Even if it is merely the latter, it is unlikely that a system developed by a single team of software developers will have much greater cognitive competences across the board than the systems developed by other competing teams, let alone those developed by the rest of the economy combined.

With its level of sophistication, the spam-blocking AI would have more strategies at its disposal than normal spam filters.

Yet how many more? What could account for this large jump in capabilities from previous versions of spam filters? What is hinted at here seems akin to the sudden emergence of a Bugatti in the Stone Age. It does not seem credible.

For instance, it could try to appeal to human reason by voicing sophisticated, game-theoretic arguments against the negative-sum nature of sending out spam. But it would be smart enough to realize the futility of such a plan, as this naive strategy would backfire because some humans are trolls (among other reasons). So the spam-minimizing AI would quickly conclude that the safest way to reduce spam is not by being kind, but by gaining control over the whole planet and killing everything that could possibly try to trick its spam filter.

First of all, it is by no means clear that this would be “the safest way” to minimize spam. Indeed, I would argue that trying to gain control in this way would be a very bad action in expectation with respect to the goal of minimizing spam.

But even more fundamentally, the scenario above seems to assume that it would be much easier to build a system with the abilities to take over the world than it would be to properly instantiate the goals we want it to achieve. For instance, the earlier versions of AlphaZero were all equally aligned with the goal of winning at Go. The hard problem was to make them more capable of doing so. The assumption that the situation would be inverted with respect to future goal implementation seems to me unwarranted. Not because the goals are necessarily easy to instantiate, but because the competences in question appear extremely difficult to create. The scenario described above seems to ignore this consideration, and instead assumes that the default scenario is that we will suddenly get advanced machines with a lot of competence, but where we do not know how to direct this competence toward doing what we want it to, as opposed to gradually directing and integrating these competences as they are (gradually) acquired. Beyond that, on a more general note, I think many aspiring effective altruists who worry about AI safety tend to underestimate the extent to which computer programmers are already focused on making software do what they intend it to.

Moreover, the scenario considered here also seems to assume that it would be relatively easy to make a competent machine optimize a particular goal insistently, and I doubt that this is anything less than extremely difficult. In other words, not only do I think it is extremely difficult to create the competences in question, as noted above, but I also think it is extremely difficult to orient all these competences, not just a few subroutines, toward insistently accomplishing some perverse goal. For this reason too, I think one should be highly skeptical of scenarios of this kind.

The AI in this example may fully understand that humans would object to these actions on moral grounds, but human “moral grounds” are based on what humans care about – which is not the minimization of spam! And the AI – whose whole decision architecture only selects for actions that promote the terminal goal of minimizing spam – would therefore not be motivated to think through, let alone follow our arguments, even if it could “understand” them in the same way introverts understand why some people enjoy large parties.

I think this is inaccurate. Any goal-oriented agent would be motivated to think through these things for the same reason that we humans are motivated to think through what those who disagree with us morally would say and do: because it impacts how we ourselves can act effectively toward our goals (this, we should be honest, is also often why humans think about the views and arguments made by others; not because of a deep yearning for truth and moral goodness but for purely pragmatic and selfish reasons). Thus, it makes sense to be mindful of those things, especially given that one has imperfect information and an imperfect ability to predict the future, no matter how “smart” one is.

The typical mind fallacy tempts us to conclude that because moral arguments appeal to us,7 they would appeal to any generally intelligent system. This claim is after all already falsified empirically by the existence of high-functioning psychopaths. While it may be difficult for most people to imagine how it would feel to not be moved by the plight of anyone but oneself, this is nothing compared to the difficulties of imagining all the different ways that minds in general could be built. Eliezer Yudkowsky coined the term mind space to refer to the set of all possible minds – including animals (of existing species as well as extinct ones), aliens, and artificial intelligences, as well as completely hypothetical “mind-like” designs that no one would ever deliberately put together. The variance in all human individuals, throughout all of history, only represents a tiny blob in mind space.

Yes, but this does not mean that the competences of human minds only span a tiny range of the notional “competence range” of various abilities. As we saw in the example of chess above, humans span a surprisingly large range, and the best humans are surprisingly close to the best mind possible. And with respect to the competences required for navigating within a world built by and for humans, it is not that unreasonable to believe that, on a continuum that measures competence across these many domains with a single measure, we are probably quite high and quite difficult to beat. This is not arrogance. It is merely to acknowledge the contingent structure of our civilization, and the fact that it is adapted to many contingent features of the human organism in general, including the human mind in particular.

Some of the minds outside this blob would “think” in ways that are completely alien to us; most would lack empathy and other (human) emotions for that matter; and many of these minds may not even relevantly qualify as “conscious.”

Most of these minds would not be moved by moral arguments, because the decision to focus on moral arguments has to come from somewhere, and many of these minds would simply lack the parts that make moral appeals work in humans. Unless AIs are deliberately designed8 to share our values, their objectives will in all likelihood be orthogonal to ours (Armstrong, 2013).

First, as noted above, an agent trying to achieve goals in our world need not be moved by moral arguments in an emotional sense in order to pay attention to them and the preferences of humans more generally, and to choose to avoid causing chaos. Second, why should we expect future software designed by humans to not be “deliberately designed to share our values”? And what marginal difference should we expect altruists to be able to make here? And how would this influence best be achieved?

VI. AIs will instrumentally value self-preservation and goal preservation

Even though AI designs may differ radically in terms of their top-level goals, we should expect most AI designs to converge on some of the same subgoals. These convergent subgoals (Omohundro, 2008; Bostrom, 2012) include intelligence amplification, self-preservation, goal preservation and the accumulation of resources. All of these are instrumentally very useful to the pursuit of almost any goal. If an AI is able to access the resources it needs to pursue these subgoals, and does not explicitly have concern for human preferences as (part of) its top-level goal, its pursuit of these subgoals is likely to lead to human extinction (and eventually space colonization; see below).

Again, what does “AI design” refer to in this context? Presumably a machine that possesses most of the cognitive abilities a human does to a similar or greater extent, and, on top of that, this machine is in some sense highly integrated into something akin to a coherent unified mind subordinate to a few supreme “top-level goals”. Thus, when Lukas writes “most AI designs” above, he is in fact referring to most systems that meet a very particular definition of “AI”, and one which I strongly doubt will be anywhere close to the most prevalent source of “machine competence” in the future (note that this is not to say that software, as well as our machines in general, will not become ever more competent in the future, but merely that such greater competences may not be subordinate to one goal to rule them all, or a few for that matter).

Beyond that, the claim that such a capable machine of the future seeking to achieve these subgoals is likely to lead to human extinction is a very strong claim that is not supported here, nor in the papers cited. More on this below.

AI safety work refers to interdisciplinary efforts to ensure that the creation of smarter-than-human artificial intelligence will result in excellent outcomes rather than disastrous ones. Note that the worry is not that AI would turn evil, but that indifference to suffering and human preferences will be the default unless we put in a lot of work to ensure that AI is developed with the right values.

Again, I would take issue with this “default” claim, as I would argue that “a lot of work” is exactly what we should expect will be done to ensure that future software does what humans want it to. And the question is, again, how much of a difference altruists should expect to make here, as well as how best to make it.

VI.I Intelligence amplification

Increasing an agent’s intelligence improves its ability to efficiently pursue its goals. All else equal, any agent has a strong incentive to amplify its intelligence. A real-life example of this convergent drive is the value of education: Learning important skills and (thinking-)habits early in life correlates with good outcomes. In the AI context, intelligence amplification as a convergent drive implies that AIs with the ability to improve their own intelligence will do so (all else equal). To self-improve, AIs would try to gain access to more hardware, make copies of themselves to increase their overall productivity, or devise improvements to their own cognitive algorithms.

Again, what does the word “intelligence” mean in this context? Above, it was defined as “the ability to achieve goals in a wide range of environments”, which means that what is being said here reduces to the tautological claim that increasing an agent’s ability to achieve goals improves its ability to achieve goals. If one defines “intelligence” to refer to cognitive abilities, however, the claim becomes less empty. Yet it also becomes much less obvious, especially if one thinks in terms of investments of marginal resources, as it is questionable whether investing in greater cognitive abilities (as opposed to a prettier face or stronger muscles) is the best investment one can make with respect to the goal of achieving goals “in general”.

On a more general note, I would argue that “intelligence amplification”, as in “increasing our ability to achieve goals”, is already what we collectively do in our economy to a great extent, although this increase is, of course, much broader than one merely oriented toward optimizing cognitive abilities. We seek to optimize materials, supply chains, transportation networks, energy efficiency, etc. And it is not clear why greater machine capabilities should speed up this growth process significantly more in the future than they have in the past, when more capable machines also helped grow the economy in general and increase the capability of machines in particular.

More broadly, intelligence amplification also implies that an AI would try to develop all technologies that may be of use to its pursuits.

Yet should we expect such “an AI” to be better able to develop “all technologies that may be of use to its pursuits” than entire industries currently dedicated to it, let alone our entire economy? Indeed, should we even expect it to contribute significantly, i.e. double current growth rates across the board? I would argue that this is most dubious.

I.J. Good, a mathematician and cryptologist who worked alongside Alan Turing, asserted that “the first ultraintelligent machine is the last invention that man need ever make,” because once we build it, such a machine would be capable of developing all further technologies on its own.

To say that a single machine would be able to develop all further technologies on its own is, I submit, unsound. For what does “on its own” mean here? “On its own” independently of the existing infrastructure of machines run by humans? Or “on its own” as in taking over this entire infrastructure? And how exactly could such a take-over scenario occur without destroying the productivity of this system? None of these scenarios seem plausible.

VI.II Goal preservation

AIs would in all likelihood also have an interest in preserving their own goals. This is because they optimize actions in terms of their current goals, not in terms of goals they might end up having in the future.

This again seems to assume that we will create highly competent systems that will be subordinate to a single or a few explicit goals for which they will insistently optimize all their actions. Why should we believe this?

Another critical note of mine on this idea quoted from elsewhere:

Stephen Omohundro (Omohundro, 2008) argues that a chess-playing robot with the supreme goal of playing good chess would attempt to acquire resources to increase its own power and work to preserve its own goal of playing good chess. Yet in order to achieve such complex subgoals, and to even realize they might be helpful with respect to achieving the ultimate goal, this robot will need access to, and be built to exercise advanced control over, an enormous host of intellectual tools and faculties. Building such tools is extremely hard and requires many resources, and harder still, if at all possible, is it to build them so that they are subordinate to a single supreme goal. And even if all this is possible, it is far from clear that access to these many tools would not enable – perhaps even force – this now larger system to eventually “reconsider” the goals that it evolved from. For instance, if the larger system has a sufficient amount of subsystems with sub-goals that involve preservation of the larger system of tools, and if the “play excellent chess” goal threatens, or at least is not optimal with respect to, this goal, could one not imagine that, in some evolutionary competition, these sub-goals could overthrow the supreme goal?

Footnote: After all, humans are such a system of competing drives, and it has been argued (e.g. in Ainslie, 2001 [Breakdown of Will]) that this competition is what gives us our unique cognitive strengths (as well as weaknesses). Our ultimate goals, to the extent we have any, are just those that win this competition most of the time.

And Paul Christiano has also described agents that would not be subject to this “basic drive” of self-preservation described by Omohundro.

Lukas continues:

From the current goal’s perspective, a change in the AI’s goal function is potentially disastrous, as the current goal would not persevere. Therefore, AIs will try to prevent researchers from changing their goals.

This follows only if such a highly competent system is built so as to be subordinate to a single goal in this way, which I do not think there is good reason to consider likely to be the case in future AI systems “by default”.

Consequently, there is pressure for AI researchers to get things right on the first try: If we develop a superintelligent AI with a goal that is not quite what we were after – because someone made a mistake, or was not precise enough, or did not think about particular ways the specified goal could backfire – the AI would pursue the goal that it was equipped with, not the goal that was intended. This applies even if it could understand perfectly well what the intentioned goal was. This feature of going with the actual goal instead of the intended one could lead to cases of perverse instantiation, such as the AI “paralyz[ing] human facial musculatures into constant beaming smiles” to pursue an objective of “make us smile” (Bostrom, 2014, p. 120).

This again seems to assume that this “first superintelligent AI” would be so much more powerful than everything else in the world, yet why should we expect a single system to be so much more powerful than everything else across the board? Beyond that, it also seems to assume that the design of this system would happen in something akin to a single step — that there would be a “first try”. Yet what could a first try consist in? How could a super capable system emerge in the absence of a lot of test models that are slightly less competent? I think this “first try” idea betrays an underlying belief in a sudden growth explosion powered by a single, highly competent machine, which, again, I would argue is highly unlikely in light of what we know about the nature of the growth of the capabilities of machines.

VI.III Self-preservation

Some people have downplayed worries about AI risks with the argument that when things begin to look dangerous, humans can literally “pull the plug” in order to shut down AIs that are behaving suspiciously. This argument is naive because it is based on the assumption that AIs would be too stupid to take precautions against this.

There is a difference between being “stupid” and being ill-informed. And there is no reason to think that an extremely cognitively capable agent will be informed about everything relevant to its own self-preservation. To think otherwise is to conflate great cognitive abilities with near-omniscience.

Because the scenario we are discussing concerns smarter-than-human intelligence, an AI would understand the implications of losing its connection to electricity, and would therefore try to proactively prevent being shut down by any means necessary – especially when shutdown might be permanent.

Even if all implications were understood by such a notional agent, this by no means implies that an attempt to stop its termination would be successful, or particularly likely to succeed, or indeed even possible.

This is not to say that AIs would necessarily be directly concerned about their own “death” – after all, whether an AI’s goal includes its own survival or not depends on the specifics of its goal function. However, for most goals, staying around pursuing one’s goal will lead to better expected goal achievement. AIs would therefore have strong incentives to prevent permanent shutdown even if their goal was not about their own “survival” at all. (AIs might, however, be content to outsource their goal achievement by making copies of themselves, in which case shutdown of the original AI would not be so terrible as long as one or several copies with the same goal remain active.)

I would question the tacit notion that the self-preservation of such a machine could be carried out with a significantly greater level of skill than the “counter self-preservation” work of the existing human-machine civilization. After all, why should a single system be so much more capable than the rest of the world at any given task? Why should humans not develop specialized software systems and other machines that enable them to counteract and overpower rogue machines, for example by virtue of having more information and training? What seems to be described here as an almost certain default outcome strikes me as highly unlikely. This is not to say that one should not worry about small risks of terrible outcomes, yet we need to get a clear view of the probabilities if we are to make a qualified assessment of the expected value of working on these risks.

The convergent drive for self-preservation has the unfortunate implication that superintelligent AI would almost inevitably see humans as a potential threat to its goal achievement. Even if its creators do not plan to shut the AI down for the time being, the superintelligence could reasonably conclude that the creators might decide to do so at some point. Similarly, a newly-created AI would have to expect some probability of interference from external actors such as the government, foreign governments or activist groups. It would even be concerned that humans in the long term are too stupid to keep their own civilization intact, which would also affect the infrastructure required to run the AI. For these reasons, any AI intelligent enough to grasp the strategic implications of its predicament would likely be on the lookout for ways to gain dominance over humanity. It would do this not out of malevolence, but simply as the best strategy for self-preservation.

Again, it appears extremely unlikely that a single agent could gain dominance over the rest of the human-machine civilization in which it would find itself. What growth story could plausibly lead to this outcome?

This does not mean that AIs would at all times try to overpower their creators: If an AI realizes that attempts at trickery are likely to be discovered and punished with shutdown, it may fake being cooperative, and may fake having the goals that the researchers intended, while privately plotting some form of takeover. Bostrom has referred to this scenario as a “treacherous turn” (Bostrom, 2014, p. 116).

We may be tempted to think that AIs implemented on some kind of normal computer substrate, without arms or legs for mobility in the non-virtual world, may be comparatively harmless and easy to overpower in case of misbehavior. This would likely be a misconception, however. We should not underestimate what a superintelligence with access to the internet could accomplish. And it could attain such access in many ways and for many reasons, e.g. because the researchers were careless or underestimated its capacities, or because it successfully pretended to be less capable than it actually was. Or maybe it could try to convince the “weak links” in its [team] of supervisors to give it access in secret – promising bribes. Such a strategy could work even if most people in the developing team thought it would be best to deny their AI internet access until they have more certainty about the AI’s alignment status and its true capabilities. Importantly, if the first superintelligence ever built was prevented from accessing the internet (or other efficient channels of communication), its impact on the world would remain limited, making it possible for other (potentially less careful) teams to catch up. The closer the competition, the more the teams are incentivized to give their AIs riskier access over resources in a gamble for the potential benefits in case of proper alignment.

Again, this all seems to assume a very rapid take-off in capabilities with one system being vastly more capable than all others. What reasons do we have to consider such a scenario plausible? Barely any, I have argued.

The following list contains some examples of strategies a superintelligent AI could use to gain power over more and more resources, with the goal of eventually reaching a position where humans cannot harm or obstruct it. Note that these strategies were thought of by humans, and are therefore bound to be less creative and less effective than the strategies an actual superintelligence would be able to devise.

  • Backup plans: Superintelligent AI could program malware of unprecedented sophistication that inserted partial copies of itself into computers distributed around the globe (adapted from part 3.1.2 of this FAQ). This would give it further options to act even if its current copy was destroyed or if its internet connection was cut. Alternatively, it could send out copies of its source code, alongside detailed engineering instructions, to foreign governments, ideally ones who have little to lose and a lot to gain, with the promise of helping them attain world domination if they build a second version of the AI and handed it access to all their strategic resources.
  • Making money: Superintelligent AI could easily make fortunes with online poker, stock markets, scamming people, hacking bank accounts, etc.
  • Influencing opinions: Superintelligent AI could fake convincing email exchanges with influential politicians or societal elites, pushing an agenda that serves its objectives of gaining power and influence. Similarly, it could orchestrate large numbers of elaborate sockpuppet accounts on social media or other fora to influence public opinion in favorable directions.
  • Hacking and extortion: Superintelligent AI could hack into sensitive documents, nuclear launch codes or other compromising assets in order to blackmail world leaders into giving it access over more resources. Or it could take over resources directly if hacking allows for it.
  • (Bio-)engineering projects: Superintelligent AI could pose as the head researcher of a biology lab and send lab assistants instructions to produce viral particles with specific RNA sequences, which then, unbeknownst to the people working on the project, turned out to release a deadly virus that incapacitated most of humanity.

Through some means or another – and let’s not forget that the AI could well attempt many strategies at once to safeguard against possible failure in some of its pursuits – the AI may eventually gain a decisive strategic advantage over all competition (Bostrom, 2014, p. 78-90). Once this is the case, it would carefully build up further infrastructure on its own. This stage will presumably be easier to reach as the world economy becomes more and more automated.

These various strategies could also be pursued by other agents, and indeed by vast systems of agents and programs. Why should one such agent be much more competent than others at doing any of these things?

Once humans are no longer a threat, the AI would focus its attention on natural threats to its existence. It would for instance notice that the sun will expand in about seven billion years to the point where existence on earth will become impossible. For the reason of self preservation alone, a superintelligent AI would thus eventually be incentivized to expand its influence beyond Earth.

Following the arguments I have made above (as well as here), I would argue that such a take-over of the world subordinate to a single or a few goals originally instilled in a single machine is extremely unlikely.

VI.IV Resource accumulation

For the fulfillment of most goals, accumulating as many resources as possible is an important early step. Resource accumulation is also intertwined with the other subgoals in that it tends to facilitate them.

The resources available on Earth are only a tiny fraction of the total resources that an AI could access in the entire universe. Resource accumulation as a convergent subgoal implies that most AIs would eventually colonize space (provided that it is not prohibitively costly), in order to gain access to the maximum amount of resources. These resources would then be put to use for the pursuit of its other subgoals and, ultimately, for optimizing its top-level goal.

Superintelligent AI might colonize space in order to build (more of) the following:

  • Supercomputers: As part of its intelligence enhancement, an AI could build planet-sized supercomputers (Sandberg, 1999) to figure out the mysteries of the cosmos. Almost no matter the precise goal, having an accurate and complete understanding of the universe is crucial for optimal goal achievement.
  • Infrastructure: In order to accomplish anything, an AI needs infrastructure (factories, control centers, etc.) and “helper robots” of some sort. This would be similar (but much larger in scale) to how the Manhattan Project had its own “project sites” and employed tens of thousands of people. While some people worry that an AI would enslave humans, these helpers would more plausibly be other AIs specifically designed for the tasks at hand.
  • Defenses: An AI could build shields to protect itself or other sensitive structures from cosmic rays. Perhaps it would build weapon systems to deal with potential threats.
  • Goal optimization: Eventually, an AI would convert most of its resources into machinery that directly achieves its objectives. If the goal is to produce paperclips, the AI will eventually tile the accessible universe with paperclips. If the goal is to compute pi to as many decimal places as possible, the AI will eventually tile the accessible universe with computers to compute pi. Even if an AI’s goal appears to be limited to something “local” or “confined,” such as e.g. “protect the White House,” the AI would want to make success as likely as possible and thus continue to accumulate resources to better achieve that goal.

To elaborate on the point of goal optimization: Humans tend to be satisficers with respect to most things in life. We have minimum requirements for the quality of the food we want to eat, the relationships we want to have, or the job we want to work in. Once these demands are met and we find options that are “pretty good,” we often end up satisfied and settle down on the routine. Few of us spend decades of our lives pushing ourselves to invest as many waking hours as sustainably possible into systematically finding the optimal food in existence, the optimal romantic partner, or anything really.

AI systems on the other hand, in virtue of how they are usually built, are more likely to act as maximizers. A chess computer is not trying to look for “pretty good moves” – it is trying to look for the best move it can find with the limited time and computing power it has at its disposal. The pressure to build ever more powerful AIs is a pressure to build ever more powerful maximizers. Unless we deliberately program AIs in a way that reduces their impact, the AIs we build will be maximizers that never “settle” or consider their goals “achieved.” If their goal appears to be achieved, a maximizer AI will spend its remaining time double- and triple-checking whether it made a mistake. When it is only 99.99% certain that the goal is achieved, it will restlessly try to increase the probability further – even if this means using the computing power of a whole galaxy to drive the probability it assigns to its goal being achieved from 99.99% to 99.991%.

Because of the nature of maximizing as a decision-strategy, a superintelligent AI is likely to colonize space in pursuit of its goals unless we program it in a way to deliberately reduce its impact. This is the case even if its goals appear as “unambitious” as e.g. “minimize spam in inboxes.”

Why should we expect a single machine to be better able to accumulate resources than other actors in the economy, much less whole teams of actors powered by specialized software programs optimized toward that very purpose? Again, what seems to be considered the default outcome here is one that I would argue is extremely unlikely. This is still not to say that we then have reason to dismiss such a scenario. Yet it is important that we make an honest assessment of its probability if we are to make qualified assessments of the value of prioritizing it.

VII. Artificial sentience and risks of astronomical suffering

Space colonization by artificial superintelligence would increase goal-directed activity and computations in the world by an astronomically large factor.

So would space colonization driven by humans. And it is not clear why we should expect a human-driven colonization to increase goal-directed computations any less. Beyond that, such human-driven colonization also seems much more likely to happen than does rogue AI colonization. 

If the superintelligence holds objectives that are aligned with our values, then the outcome could be a utopia. However, if the AI has randomly, mistakenly, or sufficiently suboptimally implemented values, the best we could hope for is if all the machinery it used to colonize space was inanimate, i.e. not sentient. Such an outcome – even though all humans would die – would still be much better than other plausible outcomes, because it would at least not contain any suffering. Unfortunately, we cannot rule out that the space colonization machinery orchestrated by a superintelligent AI would also contain sentient minds, including minds that suffer. The same way factory farming led to a massive increase in farmed animal populations, multiplying the direct suffering humans cause to animals by a large factor, an AI colonizing space could cause a massive increase in the total number of sentient entities, potentially creating vast amounts of suffering.

The same applies to a human-driven colonization, which I would still argue seems a much more likely outcome. So why should we focus more on colonization driven by rogue AI?

The following are some ways AI outcomes could result in astronomical amounts of suffering:

Suffering in AI workers: Sentience appears to be linked to intelligence and learning (Daswani & Leike, 2015), both of which would be needed (e.g. in robot workers) for the coordination and execution of space colonization. An AI could therefore create and use sentient entities to help it pursue its goals. And if the AI’s creators did not take adequate safety measures or program in compassionate values, it may not care about those entities’ suffering in their assistance.

Optimization for sentience: Some people want to colonize space in order for there to be more life or (happy) sentient minds. If the AI in question has values that reflect this goal, either because human researchers managed to get value loading right (or “half-right”), or because the AI itself is sentient and values creating copies of itself, the result could be astronomical numbers of sentient minds. If the AI does not accurately assess how happy or unhappy these beings are, or if it only cares about their existence but not their experiences, or simply if something goes wrong in even a small portion of these minds, the total suffering that results could be very high.

Ancestor simulations: Turning history and (evolutionary) biology into an empirical science, AIs could run many “experiments” with simulations of evolution on planets with different starting conditions. This would e.g. give the AIs a better sense of the likelihood of intelligent aliens existing, as well as a better grasp on the likely distribution of their values and whether they would end up building AIs of their own. Unfortunately, such ancestor simulations could recreate millions of years of human or wild-animal suffering many times in parallel.

Warfare: Perhaps space-faring civilizations would eventually clash, with at least one of the two civilizations containing many sentient minds. Such a conflict would have vast frontiers of contact and could result in a lot of suffering.

All of these scenarios could also occur in a human-driven colonization, which I would argue is significantly more likely to happen. So again: why should we focus more on colonization driven by rogue AI?

More ways AI scenarios could contain astronomical amounts of suffering are described here and here. Sources of future suffering are likely to follow a power law distribution, where most of the expected suffering comes from a few rare scenarios where things go very wrong – analogous to how most casualties are the result of very few, very large wars; how most of the casualty-risks from terrorist attacks fall into tail scenarios where terrorists would get their hands on weapons of mass destruction; or how most victims of epidemics succumbed to the few very worst outbreaks (Newman, 2005). It is therefore crucial not only to factor in which scenarios are most likely to occur, but also how bad scenarios would be should they occur.

Again, most of the very worst scenarios could well be due to human-driven colonization, such as US versus China growth races taken beyond Earth. So, again, why focus mostly on colonization scenarios driven by rogue AI? Beyond that, the expected value of influencing a broad class of medium-value outcomes could easily be much higher than the expected value of influencing much fewer, much higher-stakes outcomes, provided that the outcomes that fall into this medium value class are sufficiently probable and amenable to impact. In other words, it is by no means far-fetched to imagine that we can take actions that are robust over a wide range of medium-value outcomes, and that such actions are in fact best in expectation.
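To make this expected-value comparison concrete, here is a minimal sketch in Python. All of the numbers in it (the probabilities, the stakes, and the share of the stakes our actions could avert) are purely hypothetical and chosen only for illustration; they are not estimates made in this essay or in the text being discussed. The sketch shows only that a broad class of medium-stakes outcomes can dominate in expectation when it is sufficiently probable and amenable to impact.

```python
from dataclasses import dataclass


@dataclass
class OutcomeClass:
    """A class of future outcomes; every figure here is hypothetical."""
    name: str
    probability: float   # chance that outcomes of this class occur
    stakes: float        # badness at stake if they occur (arbitrary units)
    tractability: float  # fraction of the stakes our actions could avert


def expected_impact(o: OutcomeClass) -> float:
    # Expected impact = P(class occurs) * stakes * share of stakes averted.
    return o.probability * o.stakes * o.tractability


# Illustrative assumption: the tail class has 100x higher stakes,
# but is far less probable and less amenable to impact.
tail = OutcomeClass("rare, very high-stakes",
                    probability=0.001, stakes=1_000_000, tractability=0.01)
broad = OutcomeClass("broad, medium-stakes",
                     probability=0.5, stakes=10_000, tractability=0.05)

for o in (tail, broad):
    print(f"{o.name}: expected impact = {expected_impact(o):.1f}")
```

Under these made-up inputs the broad class comes out ahead, and different inputs can reverse the ranking. That is the point: the conclusion depends on the probabilities and tractability, not on the size of the stakes alone.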

Critics may object because the above scenarios are largely based on the possibility of artificial sentience, particularly sentience implemented on a computer substrate. If this turns out to be impossible, there may not be much suffering in futures with AI after all. However, computer-based minds also being able to suffer in the morally relevant sense is a common implication in philosophy of mind. Functionalism and type A physicalism (“eliminativism”) both imply that there can be morally relevant minds on digital substrates. Even if one were skeptical of these two positions and instead favored the views of philosophers like David Chalmers or Galen Strawson (e.g. Strawson, 2006), who believe consciousness is an irreducible phenomenon, there are at least some circumstances under which these views would also allow for computer-based minds to be sentient. Crude “carbon chauvinism,” or a belief that consciousness is only linked to carbon atoms, is an extreme minority position in philosophy of mind.

The case for artificial sentience is not just abstract but can also be made on the intuitive level: Imagine we had whole brain emulation with a perfect mapping from inputs to outputs, behaving exactly like a person’s actual brain. Suppose we also give this brain emulation a robot body, with a face and facial expressions created with particular attention to detail. The robot will, by the stipulations of this thought experiment, behave exactly like a human person would behave in the same situation. So the robot-person would very convincingly plead that it has consciousness and moral relevance. How certain would we be that this was all just an elaborate facade? Why should it be?

Because we are unfamiliar with artificial minds and have a hard time experiencing empathy for things that do not appear or behave in animal-like ways, we may be tempted to dismiss the possibility of artificial sentience or deny artificial minds moral relevance – the same way animal sentience was dismissed for thousands of years. However, the theoretical reasons to anticipate artificial sentience are strong, and it would be discriminatory to deny moral consideration to a mind simply because it is implemented on a substrate different from ours. As long as we are not very confident indeed that minds on a computer substrate would be incapable of suffering in the morally relevant sense, we should believe that most of the future’s expected suffering is located in futures where superintelligent AI colonizes space.

I fail to see how this final conclusion is supported by the argument made above. Again, human-driven colonization seems to pose at least as big a risk of outcomes of this sort.

One could argue that “superintelligent AI” could travel much faster and convert matter and energy into ordered computations much faster than a human-driven colonization could, yet I see little reason to expect a rogue AI-driven colonization to be significantly more effective in this regard than a human civilization powered by advanced tools built to be as efficient as possible. For instance, why should “superintelligent AI” be able to build significantly faster spaceships? I would expect both tail-end scenarios — i.e. both maximally sentient rogue AI-driven colonization and maximally sentient human-driven colonization —  to converge toward an optimal expansion solution in a relatively short time, at least on cosmic timescales.

VIII. Impact analysis

The world currently contains a great deal of suffering. Large sources of suffering include for instance poverty in developing countries, mental health issues all over the world, and non-human animal suffering in factory farms and in the wild. We already have a good overview – with better understanding in some areas than others – of where altruists can cost-effectively reduce substantial suffering. Charitable interventions are commonly chosen according to whether they produce measurable impact in the years or decades to come. Unfortunately, altruistic interventions are rarely chosen with the whole future in mind, i.e. with a focus on reducing as much suffering as possible for the rest of time, until the heat death of the universe. This is potentially problematic, because we should expect the far future to contain vastly more suffering than the next decades, not only because there might be sentient beings around for millions or billions of years to come, but also because it is possible for Earth-originating life to eventually colonize space, which could multiply the total amount of sentient beings many times over. While it is important to reduce the suffering of sentient beings now, it seems unlikely that the most consequential intervention for the future of all sentience will also be the intervention that is best for reducing short-term suffering.

I think this is true, but in part simply because the word “best” here refers to two very narrow peaks that would have to coincide in a very large landscape. In contrast, it does not seem unlikely to me that the best, most robust interventions we can make to influence the long-term future are also highly robust and positive with respect to the short-term future, such as promoting concern for suffering as well as greater moral consideration of neglected beings.

And given that the probability of extinction (evaluated from now) increases over time, and hence that one should discount the value of influencing the long-term future of civilization by a certain factor, it in fact seems reasonable to choose actions that seem positive both in the short and long term.

Instead, as judged from the distant future, the most consequential development of our decade would more likely have something to do with novel technologies or the ways they will be used.

And when it comes to how technologies will be used, it is clear that influencing ideas matters a great deal. By analogy, we have also seen important technologies developed in the past, and yet ideas, such as specific religions (e.g. Islam and Christianity) as well as political ideologies (e.g. communism and liberalism), seem to have been no less significant. One may, of course, argue that it is very difficult to influence ideas on a large scale, yet the same can be said about influencing technology. Indeed, influencing ideas, whether broadly or narrowly, might just be the best way to influence technology.

And yet, politics, science, economics and especially the media are biased towards short timescales. Politicians worry about elections, scientists worry about grant money, and private corporations need to work on things that produce a profit in the foreseeable future. We should therefore expect interventions targeted at the far future to be much more neglected than interventions targeted at short-term sources of suffering.

Admittedly, the far future is difficult to predict. If our models fail to account for all the right factors, our predictions may turn out very wrong. However, rather than trying to simulate in detail through everything that might happen all the way into the distant future – which would be a futile endeavor, needless to say – we should focus our altruistic efforts on influencing levers that remain agile and reactive to future developments. An example of such a lever is institutions that persist for decades or centuries. The US Constitution for instance still carries significant relevance in today’s world, even though it was formulated hundreds of years ago. Similarly, the people who founded the League of Nations after World War I did not succeed in preventing the next war, but they contributed to the founding and the charter of its successor organization, the United Nations, which still exerts geopolitical influence today. The actors who initially influenced the formation of these institutions as well as their values and principles, had a long-lasting impact.

In order to positively influence the future for hundreds of years, we fortunately do not need to predict the next hundreds of years in detail. Instead, all we need to predict is what type of institutions – or, more generally, stable and powerful decision-making agencies – are most likely to react to future developments maximally well.

AI is the ultimate lever through which to influence the future. The goals of an artificial superintelligence would plausibly be much more stable than the values of human leaders or those enshrined in any constitution or charter. And a superintelligent AI would, with at least considerable likelihood, remain in control of the future not only for centuries, but for millions or even billions of years to come. In non-AI scenarios on the other hand, all the good things we achieve in the coming decade(s) will “dilute” over time, as current societies, with all their norms and institutions, change or collapse.

In a future where smarter-than-human artificial intelligence won’t be created, our altruistic impact – even if we manage to achieve a lot in greatly influencing this non-AI future – would be comparatively “capped” and insignificant when contrasted with the scenarios where our actions do affect the development of superintelligent AI (or how AI would act).

I think this is another claim that is widely overstated, and which I have not seen a convincing case for. Again, this notion that “an artificial superintelligence”, a single machine with much greater cognitive powers than everything else, will emerge and be programmed to be subordinate to a single goal that it would be likely to preserve does not seem credible to me. Sure, we can easily imagine it as an abstract notion, but why should we think such a system will ever emerge? The creation of such a system is, I would argue, far from being a necessary, or even particularly likely, outcome of our creating ever more competent machines.

And even if such a system did exist, it is not even clear, as Robin Hanson has argued, that it would be significantly more likely to preserve its values than would a human civilization — not so much because one should expect humans to be highly successful at it, but rather because there are also reasons to think that it would be unlikely for such a “superintelligent AI” to do it (such as those mentioned in my note on Omohundro’s argument above, as well as those provided by Hanson, e.g. that “the values of AIs with protected values should still drift due to influence drift and competition”).

We should expect AI scenarios to not only contain the most stable lever we can imagine – the AI’s goal function which the AI will want to preserve carefully – but also the highest stakes.

Again, I do not think a convincing case has been made for either of these claims. Why would the stakes be higher than in a human-driven colonization, which we may expect, for evolutionary reasons, to be performed primarily by those who want to expand and colonize as much and as effectively as possible?

In comparison with non-AI scenarios, space colonization by superintelligent AI would turn the largest amount of matter and energy into complex computations.

It depends on what we mean by non-AI scenarios. Scenarios where humans use advanced tools, such as near-maximally fast spaceships and near-optimal specialized software, to fill up space with sentient beings at a near-maximal rate are, I would argue, not only at least as conceivable but also at least as likely as similar scenarios brought about by the kind of AI Lukas seems to have in mind here.

In a best-case scenario, all these resources could be turned into a vast utopia full of happiness, which provides a strong incentive for us to get AI creation perfectly right. However, if the AI is equipped with insufficiently good values, or if it optimizes for random goals not intended by its creators, the outcome could also include astronomical amounts of suffering. In combination, these two reasons of highest influence/goal-stability and highest stakes build a strong case in favor of focusing our attention on AI scenarios.

Again, things could also go very wrong or very well with human-driven colonization, so there does not seem to be a big difference in this regard either.

While critics may object that all this emphasis on the astronomical stakes in AI scenarios appears unfairly Pascalian, it should be noted that AI is not a frivolous thought experiment where we invoke new kinds of physics to raise the stakes.

Right, but the kind of AI system envisioned here does, I would argue, rest on various highly questionable conceptions of how a single system could grow, as well as of what the design of future machines is likely to be. And I would argue, again, that such a system is highly unlikely to emerge.

Smarter-than-human artificial intelligence and space colonization are both realistically possible and plausible developments that fit squarely into the laws of nature as we currently understand them.

A Bugatti appearing in the Stone Age also in some sense fits squarely into the laws of nature as we currently understand them. Yet that does not mean that such a car was likely to emerge in that time, once we consider the history and evolution of technology. Similarly, I would argue that the scenario Lukas seems to have hinted at throughout his piece is a lot less credible than what this appeal to compatibility with the laws of nature would seem to suggest.

If either of them turn out to be impossible, that would be a big surprise, and would suggest that we are fundamentally misunderstanding something about the way physical reality works. While the implications of smarter-than-human artificial intelligence are hard to grasp intuitively, the underlying reasons for singling out AI as a scenario to worry about are sound.

Well, I have tried to argue to the contrary here. It would be much more plausible, I think, to argue that the scenario Lukas envisions is one scenario among others that warrants some priority.

As illustrated by Leó Szilárd’s lobbying for precautions around nuclear bombs well before the first such bombs were built, it is far from hopeless to prepare for disruptive new technologies in advance, before they are completed.

This text argued that altruists concerned about the quality of the future should [be] focusing their attention on futures where AI plays an important role.

I would say that the argument that has been made is much more narrow than that, since “AI” here is used in a relatively narrow sense in the first place, and because it is a very particular scenario involving such narrowly defined AI that Lukas has been focusing on the most here — as far as I can tell, it is a scenario where a single system takes over the world and determines the future based on a single, arduously preserved goal. There are many other scenarios we can envision in which AI, both in the ordinary sense as well as in the more narrow sense invoked here by Lukas, plays “an important role”, including scenarios involving human-driven space colonization.

This can mean many things. It does not mean that everyone should think about AI scenarios or technical work in AI alignment directly. Rather, it just means we should pick interventions to support according to their long-term consequences, and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI. Whether it is best to try to affect AI outcomes in a narrow and targeted way, or whether we should go for a broader strategy, depends on several factors and requires further study.

FRI has looked systematically into paths to impact for affecting AI outcomes with particular emphasis on preventing suffering, and we have come up with a few promising candidates. The following list presents some tentative proposals:

It is important to note that human values may not affect the goals of an AI at all if researchers fail to solve the value-loading problem. Raising awareness of certain values may therefore be particularly impactful if it concerns groups likely to be in control of the goals of smarter-than-human artificial intelligence.

Further research is needed to flesh out these paths to impact in more detail, and to discover even more promising ways to affect AI outcomes.

Lukas writes that his argument implies that “we should pick interventions to support according to their long-term consequences”. I agree with this completely. He then continues: “and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI”. This claim, as I understand it, is what I would argue has not been justified. Again, the claim that one should grant such scenarios some priority, even significant priority, along with many other scenarios, is plausible; the claim that they should be granted greater priority than all other things is not, I would argue.

And as for how we can best reduce suffering in the future, I would agree with pretty much all the proposals Lukas suggests, although I would argue that things like promoting concern for suffering and widening our moral circles (and we should do both) become even more important when we take other scenarios into consideration, such as human-driven colonization. In other words, these things seem even more robust and more positive when we also consider these other high-stakes scenarios.

Beyond that, I would also note that we likely have moral intuitions that make a notional rogue AI-takeover seem worse in expectation than what a more detached analysis relative to a more impartial moral ideal such as “reduce suffering” would suggest. Furthermore, it should be noted that many of those who focus most prominently on AI safety (for example, people at MIRI and FHI) seem to have values according to which it is important that humans maintain control or remain in existence, which may render their view that AI safety is the most important thing to focus on less relevant for other value systems than one might intuitively suppose.

To zoom out a bit, one way to think about my disagreement with Lukas, as well as the overall argument I have tried to make here, is that one can view Lukas’ line of argument as consisting of a certain number of steps where, in each of them, he describes a default scenario he believes to be highly probable, whereas I generally find these respective “default” scenarios quite improbable. And when one then combines our respective probabilities into a single measure of the probability that the overall scenario Lukas envisions will occur, one gets a very different overall probability for Lukas and myself respectively. It may look something like this, assuming Lukas’ argument consists of eight steps, each assigned a certain probability conditional on the preceding steps, which then gets multiplied by the rest (i.e. P(A) * P(B|A) * P(C|A, B) * ...):

L: 0.98 * 0.96 * 0.93 * 0.99 * 0.95 * 0.99 * 0.97 * 0.98 ≈ 0.77

M: 0.1 * 0.3 * 0.01 * 0.1 * 0.2 * 0.08 * 0.2 * 0.4 ≈ 0.00000004

(These particular numbers are just more or less random ones I have picked for illustrative purposes, except that their approximate ranges do illustrate where I think the respective credences of Lukas and myself roughly lie with regard to most of the arguments discussed throughout this essay.)

And an important point to note here is that even if one disagrees both with Lukas and me on these respective probabilities, and instead picks credences roughly in-between those of Lukas and me, or indeed significantly closer to those of Lukas, the overall argument I have made here still stands, namely that it is far from clear that scenarios of the kind Lukas outlines are the most important ones to focus on to best reduce suffering. For then the probability of Lukas’ argument being correct/the probability that the scenario Lukas envisions will occur (one can think of it in both ways, I think, even if these formulations are not strictly equivalent) becomes something like the following:

In-between credence: 0.5^8 ≈ 0.004

Credences significantly closer to Lukas’: 0.75^8 ≈ 0.1

Which would not seem to support the conclusion that a focus on the AI-scenarios Lukas has outlined should dominate other scenarios we can envision (e.g. human-driven colonization).
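The arithmetic above can be checked with a short sketch. The credences are copied from the illustrative figures in the text; the function and variable names are hypothetical labels introduced here, and treating each number as a credence conditional on the preceding steps is an assumption about how the figures are meant to combine.

```python
import math

# Illustrative per-step credences, copied from the "L" and "M" lines above.
credences_L = [0.98, 0.96, 0.93, 0.99, 0.95, 0.99, 0.97, 0.98]
credences_M = [0.1, 0.3, 0.01, 0.1, 0.2, 0.08, 0.2, 0.4]


def joint_probability(step_credences):
    """Multiply per-step credences into a single joint probability."""
    return math.prod(step_credences)


print(f"L: {joint_probability(credences_L):.2f}")
print(f"M: {joint_probability(credences_M):.1e}")
print(f"In-between (0.5 per step): {joint_probability([0.5] * 8):.3f}")
print(f"Closer to Lukas (0.75 per step): {joint_probability([0.75] * 8):.2f}")
```

Even fairly high per-step credences shrink quickly when eight of them are multiplied together, which is the point the figures above are meant to convey.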

Lukas ends his post on the following note:

As there is always the possibility that we have overlooked something or are misguided or misinformed, we should remain open-minded and periodically rethink the assumptions our current prioritization is based on.

With that, I could not agree more. In fact, this is in some sense the core point I have been trying to make here.

Suffering, Infinity, and Universe Anti-Natalism

Questions that concern infinite amounts of value seem worth spending some time contemplating, even if those questions are of a highly speculative nature. For instance, if we assume a general expected value framework of a kind where we evaluate the expected value of a given outcome based on its probability multiplied by its value, then any more than an infinitesimal probability of an outcome that has infinite value would imply that this outcome has infinite expected value. And hence that the expected value of such an outcome would trump that of any outcome with a “mere” finite amount of value.

Therefore, on this framework, even strongly convinced finitists are not exempt from taking seriously the possibility that infinities, of one ethically relevant kind or another, may be real. For however strong a conviction one may hold, maintaining only an infinitesimal probability that infinite value outcomes of some sort could be real seems difficult to defend.
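As a rough illustration of why infinite value dominates on this framework, here is a minimal sketch using floating-point infinity as a stand-in for infinite value. The function name and the specific numbers are hypothetical, and floats can only represent real-valued probabilities, not infinitesimal ones, so the sketch covers only the "more than infinitesimal" case.

```python
import math


def expected_value(probability, value):
    """Expected value of a single outcome: its probability times its value."""
    return probability * value


# Hypothetical numbers, chosen only for illustration.
near_certain_finite = expected_value(0.99, 1e100)
improbable_infinite = expected_value(1e-19, math.inf)

# Any positive real-valued probability of infinite value yields an
# infinite expected value, which exceeds every finite expected value.
print(improbable_infinite == math.inf)
print(improbable_infinite > near_certain_finite)
```

However large the finite value and however small the positive probability, the comparison comes out the same way, which is why the framework forces even a convinced finitist to take such outcomes seriously.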

Bounding the Influence of Expected Value Thinking

It is worth making clear, as a preliminary note, that we may reasonably put a bound on how much weight we give such an expected value framework in our ethical deliberations, so as to avoid crazy conclusions and actions; or simply to preserve our sanity, which may also be a priority for some.

In fact, it is easy to point to good reasons for why we should constrain the influence of such a framework on our decisions. For although it seems implausible to entirely reject such an expected value framework in one’s moral reasoning, it would seem equally implausible to consider such a framework complete and exhaustive in itself. One reason being that thinking in terms of expected value is just one way to theorize about the world among many others, and it seems difficult to justify granting it a particularly privileged status among these, especially given a tool-like conception of our thinking: if all our thinking about the world is best thought of as a tool that helps us navigate in the world rather than a set of Platonic ideals that perfectly track truths in a transcendent way, it seems difficult to elevate a single class of these tools, such as thinking in terms of expected value, to a higher status than all others. But also given that we cannot readily put numbers on most things in practice, both due to a lack of time in most real-world situations and because, even when we do have time, the numbers we assign are often bound to be entirely speculative, if at all meaningful in the first place.

Just as we need more than theoretical physics to navigate in the physical world, it seems likely that we will do well not to rely solely on an expected value framework to navigate the moral landscape, and this holds true even if all we care about is to maximize or minimize the realization of a certain class of states. Using only a single style of thinking makes us inherently vulnerable to mistakes in our judgments, and hence resting everything on one style of thinking without limits seems risky and unwise.

It therefore seems reasonable to limit the influence of this framework, and indeed any single framework, and one proposed way of doing so is by giving it only a limited number of the seats of one’s notional moral parliament; say, 40 percent of them. In this way, we should be better able to avoid the vulnerabilities of relying on a single framework, while remaining open to being guided by its inputs.

What Can Be the Case?

To get an overview, let us begin by briefly surveying (at least some of) the landscape of the conceivable possibilities concerning the size of the universe. Or, more precisely, the conceivable possibilities concerning the axiological size of the universe. For it is indeed possible, at least abstractly, for the universe to be physically finite, yet axiologically infinite; for instance, if some states of suffering are infinitely disvaluable, then a universe containing one or more of such states would be axiologically infinite, even if physically finite.

In fact, a finite universe containing such states could be worse, indeed infinitely worse, than even a physically infinite universe containing an infinite amount of suffering, if the states of suffering realized in the finite universe are more disvaluable than the infinitely many states of suffering found in the physically infinite universe. (I myself find the underlying axiological claim here more than plausible: that a single instance of certain states of suffering, such as torture, is more disvaluable than infinitely many instances of milder states of suffering, such as pinpricks.)

It is also conceivable that the universe is physically infinite, yet axiologically finite; if, for instance, our axiology is non-additive, if the universe contains only infinitesimal value throughout, or if only a freak bubble of it contains entities of value. This last option may seem impossibly unlikely, yet it is conceivable. Infinity does not imply infinite repetition; the infinite sequence ( 1, 0, 0, 0, … ) does not logically have to contain 1 again, and indeed doesn’t.

In terms of physical size, there are various ways in which infinity can be realized. For instance, the universe may be both temporally and spatially infinite in terms of its extension. Or it may be temporally bounded while spatially infinite in extension, or vice versa: be spatially finite, yet eternal. It should be noted, though, that these two may be considered equivalent, if we view only points in space and time as having value-bearing potential (arguably the only view consistent with physicalism, ultimately), and view space and time as a four-dimensional structure. Then one of these two universes will have infinite “length” and finite “breadth”, while the opposite is true of the other one, and a similar shape can thus be obtained via “90 degree” rotation.

Similarly, it is also conceivable (and apparently plausible) that the universe has a finite past and an infinite future, in which case it will always have a finite age, or it could have an infinite past and a finite future. Or, equivalently in spatial terms, be bounded in one spatial direction, yet have infinite extension in another.

Yet infinite extension is not the only way in which physical infinity may conceivably be realized. Indeed, a bounded space can, at least in one sense, contain more elements than an unbounded one, as exemplified by the cardinality of the real numbers in the interval (0, 1) compared to all the natural numbers. So not only might the universe be infinite in terms of extension, but also in terms of its divisibility (i.e. in terms of notional sub-worlds we may encounter as we “zoom down” at smaller scales), which could have far greater significance than infinite extension, at least if we believe we can use cardinality as a meaningful measure of size in concrete reality.

Taking this possibility into consideration as well, we get even more possible combinations — infinitely many, in fact. For example, we can conceive of a universe that is bounded both spatially and temporally, yet which is infinitely divisible. And it can then be infinitely divisible in infinitely many different ways. For instance, it may be divisible in such a way that it has the same cardinality as the natural numbers, i.e. its set of “sub-worlds” is countably infinite, or it could be divisible with the same cardinality as the real numbers, meaning that it consists of uncountably many “sub-worlds”. And given that there is no largest cardinality, we could continue like this ad infinitum.

One way we could try to imagine the notional place of such small worlds in our physical world is by conceiving of them as in some sense existing “below” the Planck scale, each with their own Planck scale below which even more worlds exist, ad infinitum. Many more interesting examples of different kinds of combinations of the possibilities reviewed so far could be mentioned.

Another conceivable, yet supremely speculative, possibility worth contemplating is that the size of the universe is not set in stone, and that it may be up to us/the universe itself to determine whether it will be infinite, and what “kind” of infinity.

Lastly, it is also conceivable that the size of the universe, both in physical and axiological terms, cannot faithfully be conceived of with any concept available to us. So although the conceivable possibilities are infinite, it remains conceivable that none of them are “right” in any meaningful sense.

What Is the Case? — Infinite Uncertainty?

Unfortunately, we do not know whether the universe is infinite or not; or, more generally, which of the possibilities mentioned above is true of our condition. And there are reasons to think that we will never know with great confidence. For even if we were to somehow encounter a boundary encapsulating our universe, or otherwise find strong reasons for believing in one, how could we possibly exclude that there might be something beyond that boundary? (Not to mention that the universe might still be infinitely divisible even if bounded.) Or, alternatively, even if we thought we had good reasons to believe that our universe is infinite, how can we be sure that the limited data we base that conclusion on can be generalized to locations arbitrarily far away from us? (This is essentially the problem of induction.)

Yet even if we thought we did know whether the universe is infinite with great confidence, the situation would arguably not be much different. For if we accept the proposition that we should have more than infinitesimal credence in any empirical claim about the world, what is known as Cromwell’s rule (I have argued that this applies to all claims, not just [stereotypically] “empirical” claims), then, on our general expected value framework, it would seem that any claim about the reality of infinite value outcomes should always be taken seriously, regardless of our specific credences in specific physical and axiological models of the universe.

In fact, not only should the conceivable realizations of infinity reviewed above be taken seriously (at least to the extent that they imply outcomes with infinite (dis)value), but so should a seemingly even more outrageous notion, namely that infinite (dis)value may rest on any particular action we do. However small a non-zero real-valued probability we assign such a claim — e.g. that the way you prepare your coffee tomorrow morning is going to impact an infinite amount of value — the expected value of getting the, indeed any, given action right remains infinite.

How should we act in light of this outrageous possibility?

Pascalian and Counter-Pascalian Claims

The problem, or perhaps our good fortune, is that, in most cases arguably, we do not seem to have reason to believe that one course of action is more likely to have an infinitely better outcome than another. For example, in the case of the morning coffee, we appear to have no more reason to believe that, say, making a strong cup of coffee will lead to infinitely more disvalue than making a mild one will, rather than it being the other way around. For such hypotheses, we seem able to construct an equal and oppositely directed counter-hypothesis.

Yet even if we concede that this is the case most of the time, what about situations where this is not the case? What about choices where we do have slightly better reasons to believe that one outcome will be infinitely better than another one?

This is difficult to address in the absence of any concrete hypotheses or scenarios, so I shall here consider two specific cases, or classes of scenarios, where a plausible reason may be given in favor of thinking that one course of action will influence infinitely more value than another. One is the case of an eternal civilization: our actions may impact infinite (dis)value by impacting whether, and in what form, an eternal civilization will exist in our universe.

In relation to the (extremely unlikely) prospect of the existence of such a civilization, it seems that we could well find reasons to believe that we can impact an infinite amount of value. But the crucial question is: how? From the perspective of negative utilitarianism, it is far from clear what outcomes are most likely to be infinitely better than others. This is especially true in light of the other class of ways in which we may plausibly impact infinite value that I shall consider here, namely by impacting the creation of, or the unfolding of events in, parallel universes, which may eventually be infinitely numerous.

For not only could an eternal civilization that is the descendant of ours be better in “our universe” than another eternal civilization that may emerge in our place if we go extinct; it could also be better with respect to its effects on the creation of parallel universes, in which case it may be normative for negative utilitarians to work to preserve our civilization, contrary to what is commonly considered the ultimate corollary of negative utilitarianism (and this could also hold true if the temporal extension of our civilization is bound to be finite). Indeed, this could be the case even if no other civilization were to emerge instead of ours: if the impact our civilization will have on other universes results in less suffering than what would otherwise be created naturally. It is, of course, also possible that the opposite is the case: that the continuation of our civilization would be worse than another civilization or no civilization. And I must admit that I have no idea what is more likely to be the case.

So in these cases where reasons pointing more in one way than another plausibly could be found, it is not clear which direction that would be. Except perhaps in the direction that we should do more research on this question: which actions are more likely to reduce infinitely more suffering than others? Indeed, from the point of view of a suffering-focused expected value framework, it would seem that this should be our highest priority.

Ignoring Small Credences?

One may be skeptical of my claim above: can it really be true that the considerations, or at least my considerations, in the case of the continuation of civilization cancel out exactly? Is there not even the smallest difference? Not even a hunch?

In his paper on infinite ethics, Nick Bostrom argues that such an exact cancellation seems extraordinarily unlikely, and that small tips in balance seem to have counter-intuitive, if not catastrophic, consequences:

This cancellation of probabilities would have to be perfectly accurate, down to the nineteenth decimal place and beyond. […]

It would seem almost miraculous if these motley factors, which could be subjectively correlated with infinite outcomes, always managed to conspire to cancel each other out without remainder. Yet if there is a remainder—if the balance of epistemic probability happens to tip ever so slightly in one direction—then the problem of fanaticism remains with undiminished force. Worse, its force might even be increased in this situation, for if what tilts the balance in favor of a seemingly fanatical course of action is the merest hunch rather than any solid conviction, then it is so much more counterintuitive to claim that we ought to pursue it in spite of any finite sacrifice doing so may entail. The “exact-cancellation” argument threatens to backfire catastrophically.

I do not happen to share Bostrom’s view, however. Apart from the aforementioned bounding of the influence of expected value thinking, there is also a way to avoid, from within the expected value framework, the apparent craziness of letting our actions rest on the slightest hunch: disregarding sufficiently low credences.

Bostrom is skeptical of this approach:

As a piece of pragmatic advice, the notion that we should ignore small probabilities is often sensible. Being creatures of limited cognitive capacities, we do well by focusing our attention on the most likely outcomes. Yet even common sense recognizes that whether a possible outcome can be ignored for the sake of simplifying our deliberations depends not only on its probability but also on the magnitude of the values at stake. The ignorable contingencies are those for which the product of likelihood and value is small. If the value in question is infinite, even improbable contingencies become significant according to common sense criteria. The postulation of an exception from these criteria for very low-likelihood events is, at the very least, theoretically ugly.

Yet Bostrom here seems to ignore that “the value in question” is infinite for every action, cf. the point that we should maintain some small credence in every claim, including the claim that any given action may effect an infinite amount of (dis)value.

So in this way, no action we can point toward is fundamentally different from any other. The only difference is what our credence is that a particular action may make “an infinite difference”, or that it makes “the greatest infinite difference”, compared to any other action. And when it comes to such credences, I would argue that it is eminently reasonable to ignore sufficiently small ones. In my view, to not do that would be the ugly thing, for the following reasons:

First, one could argue that, just as most models of physics break down beyond a certain range, it is reasonable to expect our ability to discriminate between different credence levels to break down when we reach a sufficiently fine scale. This is also well in line with the fact that it is generally difficult to put precise numbers on our credence levels with respect to specific claims. Thus, one could argue that we are way past the range of error of our intuitive credences when we reach the nineteenth decimal place.

This conclusion can also be reached via a rather different consideration: one can argue that our entire ontological and epistemological framework itself cannot be assumed credible with absolute certainty. Therefore, it would seem that our entire worldview, including this framework of assigning numerical values, or indeed any order at all, to our credences, should itself be assigned some credence of being wrong. And one can then argue, quite reasonably, that once we reach a level of credence in any claim that is lower than our level of credence in, say, the meaningfulness of ascribing credences in this way in the first place, this specific credence should be ignored, as it lies beyond what we consider the range of reliability of this framework in the first place.

In sum, I think it is fair to say that, when we only have a tiny credence that some action may be infinitely better than another, we should do more research and look for better reasons to act on rather than act on these hunches. We can reasonably ignore exceptionally small credences in practice, as we indeed already do every time we make a decision based on calculations of finite expected values; we then ignore the tiny credence we should have that the value of the outcomes in question is infinite.
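The effect of disregarding sufficiently low credences can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the cutoff `CREDENCE_FLOOR`, the two made-up actions, and the choice to simply drop low-credence outcomes without renormalizing the remaining credences. It is one possible way to make the idea concrete, not a definitive decision procedure.

```python
import math

# Hypothetical cutoff below which credences are treated as lying outside the
# reliable range of the framework. The number is an illustrative assumption.
CREDENCE_FLOOR = 1e-9


def expected_value(outcomes, credence_floor=0.0):
    """Sum credence * value over outcomes whose credence is at least credence_floor."""
    return sum(
        credence * value
        for credence, value in outcomes
        if credence >= credence_floor
    )


# Each action is a list of (credence, value) pairs with made-up numbers.
# Both actions carry the same tiny credence in an infinitely valuable outcome,
# mirroring the point that every action may effect infinite value.
# The credences are illustrative and not exactly normalized.
tiny = 1e-19
action_a = [(0.5, 10.0), (0.5, -4.0), (tiny, math.inf)]
action_b = [(0.5, 2.0), (0.5, -4.0), (tiny, math.inf)]

naive_a = expected_value(action_a)
naive_b = expected_value(action_b)
trunc_a = expected_value(action_a, CREDENCE_FLOOR)
trunc_b = expected_value(action_b, CREDENCE_FLOOR)

print(naive_a, naive_b)   # both are infinite, so they cannot rank the actions
print(trunc_a, trunc_b)   # finite values that can be compared
```

Without a cutoff, both expected values are infinite and the comparison is uninformative. With the cutoff, the calculation reduces to the ordinary finite expected value calculation we already rely on in practice.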

Infinitarian Paralyses?

Another thing Bostrom treats in his paper, actually the main subject of it, is whether the existence of infinite value implies, on aggregative consequentialist views, that it makes no difference what we do. As he puts it:

Aggregative consequentialist theories are threatened by infinitarian paralysis: they seem to imply that if the world is canonically infinite then it is always ethically indifferent what we do. In particular, they would imply that it is ethically indifferent whether we cause another holocaust or prevent one from occurring. If any non-contradictory normative implication is a reductio ad absurdum, this one is.

To elaborate a bit: the reason it is supposed to be indifferent whether we cause another holocaust is that the net sum of value in the universe supposedly is the same either way: infinite.

It should be noted, though, that whether this really is a problem depends on how we define and calculate the “sum of value”. And the question is then whether we can define this in a meaningful way that avoids absurdities and provides us with a useful ethical framework we can act on.

In my view, the solution to this conundrum is to give up our attachment to cardinal arithmetic. In a way, this is obvious: if you have an infinite set and add finitely many elements to it, you still have “the same as before”, in terms of the cardinality of the set. Yet, in another sense, we of course do not get “the same as before”, in that the new infinite set is not identical to the one we had before. Therefore, if we insist that adding another holocaust to a universe that already contains infinitely many holocausts should make a difference, we are simply forced to abandon standard cardinal arithmetic. In its stead, we should arguably just take our requirement as an axiom: that adding any amount of value to an infinity of value does make a difference — that it does change the “sum of value”.

This may seem simplistic, and one may reasonably ask how this “sum of value” could be defined. A simple answer is that we could add up whatever (presumably) finite difference we make within the larger (hypothetically) infinite world, and then consider that the relevant sum of value that should determine our actions. This has been referred to as “the causal approach” to this problem.

This approach has been met with various criticisms, one of them being that it leaves “the total sum of value” unchanged. As Bostrom puts it:

One consequence of the causal approach is that there are cases in which you ought to do something, and ought to not do something else, even though you are certain that neither action would have any effect at all on the total value of the world.

I fail to see the appeal of this criticism, however, not least because it is deceptively phrased. For how is the “total value of the world” defined here? It is not the case that “the total value of the world” is left unchanged on every possible definition of these terms; it is left unchanged only on one particular definition, indeed one we have good reason to consider implausible and irrelevant. The reason is that this definition implies that adding another holocaust makes no difference to the “total value of the world”. It then seems a strange move to say that it counts against a theory that it holds the prevention of finitely many holocausts to be normative because this has no “effect at all on the total value of the world”, by this very implausible definition. If forced to choose between these two mutually exclusive starting points (adding a holocaust makes a difference to the total value of the world, or it does not), I think it is an easy choice. If we can help alleviate the extreme suffering of just a single being, while keeping all else equal, this being will hardly agree that “the total value of the world” was left unchanged by our actions. Not in any sensible sense.

More than that, I also think that for an ethical theory to say that we should ignore whatever lies outside our sphere of influence should not be considered a weakness, but rather a strength. Imagine by analogy a hypothetical Earth identical to ours, with the two exceptions that 1) it has been inhabited by humans for an eternal and unalterable past, over which infinitely many holocausts have taken place, and 2) it has a finite future; the universe it inhabits will end peacefully in a hundred years. Now, if people on this Earth held an ethical theory that does not take this unalterable infinite past into account, and instead focuses on the finite future, including preventing holocausts from happening in that future, would this count against that theory in any way? I fail to see how it could, and yet this is essentially the same as taking the causal approach within an infinite universe, only phrased more “unilaterally”, i.e. more purely in temporal rather than spatio-temporal terms.

Another criticism that has been leveled against the causal approach is that we cannot rule out that our causal impact may in some sense be infinite, and therefore it is problematic to say that we should just measure the world’s value by, and take action based on, whatever finite difference we make. Here is Bostrom again:

When a finite positive probability is assigned to scenarios in which it is possible for us to exert a causal effect on an infinite number of value-bearing locations […] then the expectation value of the causal changes that we can make is undefined. Paralysis will thus strike even when the domain of aggregation is restricted to our causal sphere of influence.

Yet these claims actually do not follow. First, it should again be noted that the situation Bostrom refers to here is in fact the situation we are always in: we should always assign a positive probability to the possibility that we may effect infinite (dis)value. Second, we should be clear that the scenario where we can impact an infinite amount of value, and where we aggregate over the realm we can influence, is fundamentally different from the scenario in which we aggregate over an infinite universe that contains an infinite amount of value that we cannot impact. To the extent there are threats of “infinitarian paralysis” in these two scenarios, they are not identical.

For example, Bostrom’s claim that “the expectation value of the causal changes that we can make is undefined” need not be true even on standard cardinal arithmetic, at least in the abstract (i.e. if we ignore Cromwell’s rule), in the scenario where we focus only on our own future light cone. For it could be that the scenarios in which we can “exert a causal effect on an infinite number of value-bearing locations” were all scenarios that nonetheless contained only finite (dis)value, or, on a dipolar axiology, only a finite amount of disvalue and an infinite amount of value. A concrete example of the latter could be a scenario where the abolitionist project outlined by David Pearce is completed in an eternal civilization after a finite amount of time.

Hence, it is not necessarily the case that “paralysis will strike even when the domain of aggregation is restricted to our causal sphere of influence”, apart from in the sense treated earlier, when we factor in Cromwell’s rule: how should we act given that all actions may effect infinite (dis)value? But again, this is a very different kind of “paralysis” than the one that appears to be Bostrom’s primary concern, cf. this excerpt from the abstract of his paper Infinite Ethics:

Modern cosmology teaches that the world might well contain an infinite number of happy and sad people and other candidate value-bearing locations. Aggregative ethics implies that such a world contains an infinite amount of positive value and an infinite amount of negative value. You can affect only a finite amount of good or bad. In standard cardinal arithmetic, an infinite quantity is unchanged by the addition or subtraction of any finite quantity.

Indeed, one can argue that the “Cromwell paralysis” in a sense negates this latter paralysis, as it implies that it may not be true that we can affect only a finite amount of good or bad, and, more generally, that we should assign a non-zero probability to the claim that we can optimize the value of the universe everywhere throughout, including in those corners that seem theoretically inaccessible.

Adding Always Makes a Difference

As for the infinitarian paralysis supposed to threaten the causal approach in the absence of the “Cromwell paralysis” — how to compare the outcomes we can impact that contain infinite amounts of value? — it seems that we can readily identify reasonable consequentialist principles to act by that should at least allow us to compare some actions and outcomes against each other, including, perhaps, the most relevant ones.

One such principle is the one alluded to in the previous section: that adding something of (dis)value always makes a difference, even if the notional set we are adding it to contains infinitely many similar elements already. In terms of an axiology that holds the amount of suffering in the world to be the chief measure of value, this principle would hold that adding/failing to prevent an instance of suffering always makes for a less valuable outcome, provided that other things are equal, which they of course never quite are in the real world, yet they often are in expectation.

The following abstract example makes, I believe, a strong case for favoring such a measure of (dis)value over the cardinal sum of the units of (dis)value. As I formulate this thought experiment, this unit will, in accordance with my own view, be instances of intense suffering in the universe, yet the point applies generally:

Imagine that we have a universe with a countably infinite number of instances of intense suffering. We may visualize this universe as a unit ball. Now imagine that we perform an act in this universe that leaves the original universe unchanged, yet creates a new universe identical to the first one. The result is a new universe full of suffering. Imagine next that we perform this same act in a world where nothing exists. The result is exactly the same: the creation of a new universe full of suffering, in the exact same amount. In both cases, we have added exactly the same ball of infinite suffering. Yet on standard cardinal arithmetic, the difference the act makes in terms of the sum of instances of suffering is not the same in the two cases. In the first case, the total sum is the same, namely countably infinite, while there is an infinite difference in the second case: from zero to infinity. If we only count the difference added, however (the “delta universe”, so to speak), the acts are equally disvaluable in the two cases. The latter method of evaluating the (dis)value of the act seems far more plausible than does evaluation based on the cardinal sum of the units of (dis)value in the universe. It is, after all, the exact same act.
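The contrast between the two ways of counting in this thought experiment can be written out explicitly. This is only a notational sketch: the symbols T (the cardinal total of instances of suffering) and Δ (the amount the act adds) are labels introduced here for illustration.

```latex
\begin{align*}
\text{Case 1 (a universe already exists):}\quad
  & T_{\text{before}} = \aleph_0, \quad
    T_{\text{after}} = \aleph_0 + \aleph_0 = \aleph_0, \quad
    \Delta = \aleph_0 \\
\text{Case 2 (nothing exists):}\quad
  & T_{\text{before}} = 0, \quad
    T_{\text{after}} = 0 + \aleph_0 = \aleph_0, \quad
    \Delta = \aleph_0
\end{align*}
```

Comparing totals before and after, the act registers no change in Case 1 and an infinite change in Case 2, whereas Δ is the same in both cases.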

This is not an idle thought experiment. As noted above, impacting the creation of new universes is one of the ways in which we may plausibly be able to influence an infinite amount of (dis)value. Arguably even the most plausible one. Admittedly, it does rest on certain debatable assumptions about physics, yet these assumptions seem significantly more likely than does the possibility of the existence of an eternal civilization. For even disregarding specific civilization-hostile facts about the universe (e.g. the end of stars and a rapid expansion of space that is thought to eventually rip ordinary matter apart), we should, for each year in the future, assign a probability strictly greater than zero that civilization will go extinct that year, which means that, unless these annual probabilities shrink toward zero quickly enough, the probability of extinction will be arbitrarily close to 1 within a finite amount of time.
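The extinction argument above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, a constant and independent annual extinction risk; the function names and the example risk of one in a million are hypothetical. The argument itself does not need the risk to be constant, but it would fail if the annual risks shrank so quickly that their sum stayed finite.

```python
import math


def extinction_probability(annual_risk, years):
    """Probability of extinction within `years`, assuming a constant,
    independent annual extinction risk."""
    return 1.0 - (1.0 - annual_risk) ** years


def years_until(annual_risk, target):
    """Smallest whole number of years after which the cumulative
    extinction probability reaches `target`."""
    if not 0.0 < annual_risk < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("annual_risk and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - annual_risk))


# Hypothetical annual risk of one in a million, chosen only for illustration.
risk = 1e-6
for target in (0.5, 0.99, 0.999999):
    n = years_until(risk, target)
    p = extinction_probability(risk, n)
    print(f"target {target}: {n} years, cumulative probability {p:.6f}")
```

However small the assumed annual risk, every target probability short of 1 is reached after a finite number of years. A smaller risk only pushes that number further out.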

In other words, an eternal civilization seems immensely unlikely, even if the universe were to stay perfectly life-friendly forever. The same does not seem true of the prospect of influencing the generation of new universes. As far as I can tell, the latter is in a ballpark of its own when it comes to plausible ways in which we may be able to effect infinite (dis)value, which is not to say that universe creation is more likely than not to become possible, but merely that it seems significantly more likely than other ways we know of in which we could effect infinite (dis)value (though, again, our knowledge of “such ways” is admittedly limited at this point, and something we should probably do more research on). Not only that, it is also something that could be relevant in the relatively near future, and more disvalue could depend on a single such near-future act of universe creation than what is found, intrinsically at least, in the entire future of our civilization. Infinitely more, in fact. Thus, one could argue that it is not our impact on the quality of life of future generations in our civilization that matters most in expectation, but our impact on the generation of universes by our civilization.

Universe Anti-Natalism: The Most Important Cause?

It is therefore not unthinkable that this should be the main question of concern for consequentialists: how does this impact the creation of new universes? Or, similarly, that trying to impact future universe generation should be the main cause for aspiring effective altruists. And I would argue that the form this cause should take is universe anti-natalism: avoiding, or minimizing, the creation of new universes.

There are countless ways to argue for this. As Brian Tomasik notes, creating a new universe that in turn gives rise to infinitely many universes “would cause infinitely many additional instances of the Holocaust, infinitely many acts of torture, and worse. Creating lab universes would be very bad according to several ethical views.”

Such universe creation would obviously be wrong from the stance of negative utilitarianism, as well as from similar suffering-focused views. It would also be wrong according to what is known as The Asymmetry in population ethics: that creating beings with bad lives is wrong, and something we have an obligation to not do, while failing to create happy lives is not wrong, and we have no obligation to bring such lives into being. A much weaker, and even less controversial, stance on procreative ethics could also be used: do not create lives with infinite amounts of torture.

Indeed, how, we must ask ourselves, could a benevolent being justify bringing so much suffering into being? What could possibly justify the Holocaust, let alone infinitely many of them? What would be our answer to the screams of “why” to the heavens from the torture victims?

Universe anti-natalism should also be taken seriously by classical utilitarians, as a case can be made that the universe is likely to end up being net negative in terms of algo-hedonic tone. For instance, it may well be that most sentient life that will ever exist will find itself in a state of natural carnage, as civilizations may be rare even on planets where sentient life has emerged, and because even where civilizations have emerged, it may be that they are unlikely to be sustainable, perhaps overwhelmingly so, implying that most sentient life might be expected to exist at the stage it has existed on for the entire history of sentient life on Earth. A stage where sentient beings are born in great numbers only for the vast majority of them to die shortly thereafter, for instance due to starvation or by being eaten alive, which is most likely a net negative condition, even by wishful classical utilitarian standards. Simon Knutsson’s essay How Could an Empty World Be Better than a Populated One? is worth reading in this context, and of course applies to “no world” as well.

And if one takes a so-called meta-normative approach, where one decides by averaging over various ethical theories, one could argue that the case against universe creation becomes significantly stronger, for instance if one combines an unclear or negative-leaning verdict from a classical utilitarian stance with The Asymmetry and Kantian ethics.

As for those who hold anti-natalism at the core of their values, one could argue that they should make universe anti-natalism their main focus over human anti-natalism (which may not even reduce suffering in expectation), or at the very least expand their focus to also encompass this apparently esoteric position. This is not only because the scale is potentially unsurpassable in terms of what prevents the most births; it may also be an easier cause to advance, both because wishful thinking about “those horrors will not befall my creation” could be more difficult to maintain in the face of horrors that we know have occurred in the past, and because we do not seem as attached and adapted, biologically and culturally, to creating new universes as we are to creating new children. And just as anti-natalists argue with respect to human life, being against the creation of new universes need not be incompatible with a responsible sustainment of life in the one that does exist. This might also be a compromise solution that many people would be able to agree on.

Are Other Things Equal?

The discussion above assumes that the generation of a new universe would leave all else equal, or at least leave all else merely “finitely altered”. But how can we be sure that the generation of a new universe would not in fact prevent the emergence of another? Or perhaps even prevent many infinite universes from emerging? We can’t. Yet we do not appear to have any reason for believing that this is the case. As noted above, all else will often be equal in expectation, and that also seems true in this case. We can make counter-Pascallian hypotheses in both directions, and in the absence of evidence for any of them, we appear to have most reason to believe that the creation of a new universe results, in the aggregate, in a net addition of a new universe. But this could of course be wrong.

For instance, artificial universe creation would be dwarfed by the natural universe generation that happens all the time according to inflationary models, so could it not be that the generation of a new universe might prevent some of these natural ones from occurring? I doubt that there are compelling reasons for believing this, but natural universe generation does raise the interesting question of whether we might be able to reduce the rate of this generation. Brian Tomasik has discussed the idea, yet it remains an open, and virtually unexplored, research question. One that could dominate all other considerations.

It may be objected that considerations of identical, or virtually identical, copies of ourselves throughout the universe have been omitted in this discussion, yet as far as I can tell, including such considerations would not change the discussion in a fundamental way. For if universe generation is the main cause and most consequential action to focus on for us, more important even than the intrinsic importance of the entire future of our civilization, then this presumably applies to each copy of ourselves as well. Yet I am curious to hear arguments that suggest otherwise.

A final miscellaneous point I should like to add here is that the points made above may apply even if the universe is, and only ever will be, finite, as the generation of a new finite pocket universe in that case still could bring about far more suffering than what is found in the future light cone of our own universe.

Implications for Artificial Intelligence in Brief

The prospect of universe generation, and the fact that it may dominate everything else, also seems to have significant implications for our focus on the future of artificial intelligence. One of them, as hinted above, is that altruists should perhaps not focus on artificial intelligence as their main cause (which is also why we should be careful about claiming that it is clear that they should, as we may thereby risk overlooking crucial considerations). This would be the case if, for instance, artificial intelligence is sufficiently unlikely to ever “take over” in the way that is often feared, or if focusing directly on researching or arguing against universe generation has higher expected value.

Moreover, it suggests that, to the extent altruists indeed should focus primarily on artificial intelligence, this would be to the extent that artificial intelligence will determine the rate of universe generation in the universe. This might be the main thing to focus on when implementing “Fail-Safe” measures in artificial intelligence, or in any kind of future civilization, to the extent implementation of such measures is feasible.

 

In conclusion, the subjects of the potential to effect infinite (dis)value in general, and of impacting universe generation in particular, are extremely neglected at this point, and a case can be made that more research into such possibilities should be our top priority. It seems conceivable that a question related to such a prospect — e.g. should we create more universes? — will one day be the main ethical question facing our civilization, perhaps even one we will be forced to decide upon in a not too distant future. Given the potentially enormous stakes, it seems worth being prepared for such scenarios — including knowing more about their nature, how likely they are, and how to best act in them — even if they are unlikely.

Response to a Conversation on “Intelligence”

I think much confusion is caused by a lack of clarity about the meaning of the word “intelligence”, and not least a lack of clarity about the nature of the thing(s) we refer to by this word. This is especially true when it comes to discussions of artificial intelligence (AI) and the risks it may pose. A recently published conversation between Tobias Baumann (blue text) and Lukas Gloor (orange text) contains a lot of relevant considerations on this issue, along with some discussion of my views on it, which makes me feel compelled to respond.

The statement that gave rise to the conversation was apparently this:

> Intelligence is the only advantage we have over lions.

My thought on this is that it is a simplistic claim. First, I take it that “intelligence” here means cognitive abilities. But cognitive abilities alone (a competent head on a body without legs or arms) will not allow one to escape from lions; they will only enable one to think of and regret all the many useful “non-cognitive” tools one would have liked to have. The sense in which humans have an advantage over other animals, in terms of what has enabled us to take over the world for better or worse, is that we have a unique set of tools: upright walk, vocal cords, hands with fine motor skills, and a brain that can acquire culture. This has enabled us, over time, to build culture, with which we have been able to develop tools that have given us an advantage over lions, mostly in the sense of not needing to get near them, as that could easily prove fatal, even given our current level of cultural sophistication and “intelligence”.

I could hardly disagree more with the statement that “the reason we humans rule the earth is our big brain”. To the extent we do rule the Earth, there are many reasons, and the brain is just part of the story, and quite a modest one relative to what it gets credit for (which is often all of it). I think Jacob Bronowski’s The Ascent of Man is worth reading for a more nuanced and accurate picture of humanity’s ascent to power than the “it’s all due to the big brain” one.

> There is a pretty big threshold effect here between lions (and chimpanzees) and humans, where with a given threshold of intelligence, you’re also able to reap all the benefits from culture. (There might be an analogous threshold for self-improvement FOOM benefits.)

The question is what “threshold of intelligence” means in this context. Not all humans reap the same benefits from culture; some have traits and abilities that enable them to reap far more benefits than others. And many of these traits have nothing to do with “intelligence” in any usual sense. Good looks, for instance. Or a sexy voice.

And the same holds true for cognitive abilities in particular: it is more nuanced than what measurement along a single axis can capture. For instance, some people are mathematical geniuses, yet socially inept. There are many axes along which we can measure abilities, and what allows us to build culture is all these many abilities put together. Again, it is not, I maintain, a single “special intelligence thing”, although we often talk as though it were.

For this reason, I do not believe such a FOOM threshold along a single axis makes much sense. Rather, we see progress along many axes that, when certain thresholds are crossed, allow us to expand our abilities in new ways. For example, at the cultural level we may see progress beyond a certain threshold in the production of good materials, which then leads to progress in our ability to harvest energy, which then leads to better knowledge and materials, etc. A more complicated story with countless little specialized steps and cogs. As far as I can tell, this is the recurrent story of how progress happens, on every level: from biological cells to human civilization.

> Magnus Vinding seems to think that because humans do all the cool stuff “only because of tools,” innate intelligence differences are not very consequential.

I would like to see a quote that supports this statement. It is accurate to say that I think we do “all the cool stuff only because of tools”, because I think we do everything because of tools. That is, I do not think of that which we call “intelligence” as anything but the product of a lot of tools. I think it’s tools all the way down, if you will. I suppose I could even be considered an “intelligence eliminativist”, in that I think there is just a bunch of hacks; no “special intelligence thing” to be found anywhere. RNA is a tool, which has built another tool, DNA, which, among other things, has built many different brain structures, which are all tools. And so forth. It seems to me that the opposite position with respect to “intelligence” — what may be called “intelligence reification” — is the core basis of many worries about artificial intelligence take-offs.

It is not correct, however, that I think that “innate differences in intelligence [which I assume refers to IQ, not general goal-achieving ability] are not very consequential”. They are clearly consequential in many contexts. Yet IQ is far from being an exhaustive measure of all cognitive abilities (although it sure does say a lot), and cognitive abilities are far from being all that enables us to achieve the wide variety of goals we are able to achieve. It is merely one integral subset among many others.

> This seems wrong to me [MV: also to me], and among other things we can observe that e.g. von Neumann’s accomplishments were so much greater than the accomplishments that would be possible with an average human brain.

I wrote a section on von Neumann in my Reflections on Intelligence, which I will refer readers to. I will just stress, again, that I believe thinking of “accomplishments” and “intelligence” along a single axis is counterproductive. John von Neumann was no doubt a mathematical genius of the highest rank. Yet with respect to the goal of world domination in particular, which is what we seem especially concerned about in this context, putting von Neumann in charge hardly seems a recipe for success, but rather the opposite. As he reportedly said:

“If you say why not bomb them tomorrow, I say why not today? If you say today at five o’clock, I say why not one o’clock?”

To me, these do not seem to be the words of a brain optimized for taking over the world. If we want to look at such a brain, we should, by all appearances, rather peer into the skull of Putin or Trump (if it is indeed mainly their brain, rather than their looks, or perhaps a combination of many things, that brought them into power).

> One might argue that empirical evidence confirms the existence of a meaningful single measure of intelligence in the human case. I agree with this, but I think it’s a collection of modules that happen to correlate in humans for some reason that I don’t yet understand.

I think a good analogy is a country’s GDP. It’s a single, highly informative measure, yet a nation’s GDP is a function of countless things. This measure predicts a lot, too. Yet it clearly also leaves out a lot of information. More than that, we do not seem to fear that the GDP of a country (or a city, or indeed the whole world) will suddenly explode once it reaches a certain level. But why? (For the record, I think global GDP is a far better measure of a randomly selected human’s ability to achieve a wide variety of goals [of the kind we care about] than said person’s IQ is.)

> The “threshold” between chimps and humans just reflects the fact that all the tools, knowledge, etc. was tailored to humans (or maybe tailored to individuals with superior cognitive ability).

> So there’s a possible world full of lion-tailored tools where the lions are beating our asses all day?

Depending on the meaning of “lion-tailored tool” it seems to me the answer could well be “yes”. In terms of the history of our evolution, for instance, it could well be that a lion tool in the form of, say, powerful armor could have meant that humans were killed by them in high numbers rather than the other way around.

Further down you acknowledge that the difference is “or maybe tailored to individuals with superior cognitive ability” – but what would it mean for a tool to be tailored to inferior cognitive ability? The whole point of cognitive ability is to be good at making the most out of tool-shaped parts of the environment.

I suspect David Pearce might say that that’s a parochially male thing to say. One could also say that the whole point of cognitive abilities is to make others feel good — a drive/task that has no doubt played a large role both for human survival and the increase in our cognitive abilities and goal-achieving abilities in general, arguably just as great as “making the most out of tool-shaped parts of the environment”.

Second, I think the term “inferior cognitive ability” again overlooks that there are many dimensions along which we can measure cognitive abilities. Once again, take the mathematical genius who has bad social skills. How to best make tools — ranging from apps to statements to say to oneself — that improve the capabilities of such an individual seems likely to be different in significant ways from how to best make tools for someone who is, say, socially gifted and mathematically inept.

Magnus takes the human vs. chimp analogy to mean that intelligence is largely “in the (tool-and-culture-rich) environment”.

I would delete the word “intelligence” and instead say that the ability to achieve goals is a product of a large set of tools, of which, in our case, the human brain is a necessary but, for virtually all of our purposes, insufficient subset.

Also, chimps display superior cognitive abilities to humans in some respects, so saying that humans are more “intelligent” than chimps, period, is, I think, misleading. The same holds true of our usual employment of the word “intelligence” in general, in my view.

My view implies that quick AI takeover becomes more likely as society advances technologically. Intelligence would not be in the tools, but tools amplify how far you can get by being more intelligent than the competition (this might be mostly semantics, though).

First, it should be noted that “intelligence” here seems to mean “cognitive abilities”, not “the ability to achieve goals”. This distinction must be stressed. Second, as hinted above, I think the dichotomy between “intelligence” (i.e. cognitive ability) on the one hand and “tools” on the other is deeply problematic. I fail to see in what sense cognitive abilities are not tools. (And by “cognitive abilities” I also mean the abilities of computer software.) And I think the causal arrows between the different tools that determine how things unfold are far more mutual than they are according to the story that “intelligence (some subset of cognitive tools) is that which will control all other tools”.

Third, for reasons alluded to above, I think the meaning of “being more intelligent than the competition” stands in need of clarification. It is far from obvious to me what it means. More cognitively able, presumably, but in what ways? What kinds of cognitive abilities are most relevant with respect to the task of taking over the world? And how might they be likely to be created? Relevant questions to clarify, it seems to me.

Some reasons not to think that “quick AI takeover becomes more likely as society advances technologically” include the following: the more technologically advanced society is, the more capable other agents would be (closer to notional limits of “capabilities”); there would be more technology, already learned about and mastered by others, that one would have to learn about and master in order to take over; and society would presumably have learned more about the limits and risks of technology, including AI, and hence know more about what to expect and how to counter it.

 

This post was originally published at my old blog: http://magnusvinding.blogspot.dk/2017/07/response-to-conversation-on-intelligence.html
