Free Will: Emphasizing Possibilities

I suspect the crux of discussions and worries about (the absence of) “free will” is the issue of possibilities. I also think it is a key source of confusion. Different people are talking about possibilities in different senses without being clear about it, which leads them to talk past each other, and perhaps even to confuse and dispirit laypeople by making them feel they have no possibilities in any sense whatsoever.

Different Emphases

Thinkers who take different positions on free will tend to emphasize different things. One camp tends to say “we don’t have free will, since all our actions are caused by prior causes that are ultimately beyond our own control, and in this there are no ‘alternative possibilities’”.

Another camp, so-called compatibilists, will tend to agree with the latter point about prior causes, but they choose to emphasize possibilities: “complex agents can act within a range of possibilities in a way crude objects like rocks cannot, and such agents truly do weigh and choose between these options”.

In essence, what I think the latter camp is emphasizing is the fact that we have ex-ante possibilities: a range of possibilities we can choose from in expectation. (For example, in a game of chess, your ex-ante possibilities consist of the set of moves allowed by the rules of the game.) And since this latter camp defines free will roughly as the ability to make choices among such ex-ante possibilities, they conclude that we indeed do have free will.

I doubt any philosopher arguing against the existence of free will would deny the claim that we have ex-ante possibilities. After all, we all conceive of various possibilities in our minds that we weigh and choose between, and we indeed cannot talk meaningfully about ethics, or choices in general, without such a framework of ex-ante possibilities. (Whether possibilities exist in any other sense than ex ante, and whether this is ethically relevant, are separate questions.)

Given the apparent agreement on these two core points — 1) our actions are caused by prior causes, and 2) we have ex-ante possibilities — the difference between the two camps mostly seems to lie in how they define free will and whether they prefer to emphasize 1) or 2).

The “Right” Definition of Free Will

People in these two camps will often insist that their definition of free will is the one that matches what most people mean by free will. I think both camps are right and wrong about this. I think it is misguided to think that most people have anything close to a clear definition of free will in their minds, as opposed to having a jumbled network of associations that relate to a wide range of notions, including notions of independence from prior causes and notions of ex-ante possibilities.

Experimental philosophy indeed also hints at a much more nuanced picture of people’s intuitions and conceptions of “free will”, and reveals them to be quite unclear and conflicting, as one would expect.

Emphasizing Both

I believe the two distinct emphases outlined above are both important yet insufficient on their own. The emphasis on prior causes is important for understanding the nature of our choices and actions. In particular, it helps us understand that our choices do not comprise a break with physical mechanism, but that they are indeed the product of complex such mechanisms (which include the mechanisms of our knowledge and intentions, as well as the mechanism of weighing various ex-ante possibilities).

In turn, this emphasis may help free us from certain bad ideas about human choices, such as naive ideas about how anyone can always pull themselves up by their bootstraps. It may also help us construct better incentives and institutions based on an actual understanding of the mechanism of our choices rather than supernatural ideas about them. Lastly, it may help us become more understanding toward others, such as by reminding us that we cannot reasonably expect people to act on knowledge they do not possess.

Similarly, emphasizing our ex-ante possibilities is important for our ability to make good decisions. Mistakenly believing that one has only one possibility, ex ante, rather than thinking through all possibilities will likely lead to highly sub-optimal outcomes, whether it be in a game of chess or a major life decision. Aiming to choose the ex-ante possibility that seems best in expectation is crucial for us to make good choices. Indeed, this is what good decision-making is all about.

More than that, an emphasis on ex-ante possibilities can also help instill in us the healthy and realistic versions of bootstrap-pulling attitudes, namely that hard work and dedication indeed are worthwhile and truly can lead us in better directions.

Both Emphases Have Pitfalls (in Isolation)

Our minds intuitively draw inferences and associations based on the things we hear. When it comes to “free will”, I suspect most of us have quite leaky conceptual networks, in that the distinct clusters of sentiments we intuitively tie to the term “free will” readily cross-pollute each other — a form of sentiment synesthesia.

So when someone says “we don’t have free will, everything is caused by prior causes”, many people may naturally interpret this as implying “we don’t have ex-ante possibilities, and so we cannot meaningfully think in terms of alternative possibilities”, even though this does not follow. This may in turn lead to bad decisions and feelings of disempowerment. It may also lead people to think that it makes no sense to punish people, or that we cannot meaningfully say things like “you really should have made a better choice”. Yet these things do make sense. They serve to create incentives by making a promise for the future — “people who act like this will pay a price” — which in turn nudges people toward some of their ex-ante possibilities over others.

More than that, a naive emphasis on the causal origins of our actions may also lead people to think that certain feelings — such as pride, regret, and hatred — are always unreasonable and should never be entertained. Yet this does not follow either. Indeed, these feelings likely have great utility in some circumstances, even if such circumstances are rare.

A similar source of confusion is to say that our causal nature implies that everything is just a matter of luck. Although this is true in some ultimate sense, in another sense — the everyday sense that distinguishes between things won through hard effort versus dumb luck — everything is obviously not just a matter of luck. And I suspect most people’s intuitive associations can also be leaky between these very different notions of “luck”. Consequently, unreserved claims about everything being a matter of luck also risk having unfortunate effects, such as leading us to underemphasize the importance of effort.

Such pitfalls also exist relative to the claim “you could not have done otherwise”. For what we often mean by this claim, when we talk about specific events in everyday conversations, is that “this event would have happened even if you had done things differently” (that is: the environment constrained you, and your efforts were immaterial). This is very different from saying, for example, “you could not have done otherwise because your deepest values compelled you” (meaning: the environment may well have allowed alternative possibilities, but your values did not). The latter is often true of our actions, yet it is in many ways the very opposite of what we usually mean by “you could not have done otherwise”.

Hence, confusion is likely to emerge if someone simply declares “you could not have done otherwise” about all actions without qualification. And such confusion may well persist even in the face of explicit qualifications, since confusions deep down at the intuitive level may not be readily undone by just a few cerebral remarks.

Conversely, there are also pitfalls of sentiment leakiness in the opposite direction. When someone says “ex-ante possibilities are real, and they play a crucial role in our decision-making”, people may naturally interpret this as implying “our actions are not caused by prior causes, and this is crucial for our decision-making”. And this may in turn lead to the above-mentioned mistakes that the prior-causes emphasis can help us avoid: misunderstanding our mechanistic nature and failing to act on such an understanding, as well as entertaining unreasonable ideas about how we can expect people to act.


This is why one has to be careful in one’s communication about “free will”, and to clearly flag these non sequiturs. “We are caused by prior causes” does not mean “we have no ex-ante possibilities”, and conversely, “we have ex-ante possibilities” does not imply “we are not caused by prior causes”.


Acknowledgments: Thanks to Mikkel Vinding for comments.

On Insects and Lexicality

“Their experiences may be more simple than ours, but are they less intense? Perhaps a caterpillar’s primitive pain when squashed is greater than our more sophisticated sufferings.”

— Richard Ryder, Painism: A Modern Morality, p. 64.


Many people, myself included, find it plausible that suffering of a certain intensity, such as torture, carries greater moral significance than any amount of mild suffering. One may be tempted to think that views of this kind imply we should primarily prioritize the beings most likely to experience these “lexically worse” states of suffering (LWS) — presumably beings with large brains.* By extension, one may think such views will generally imply little priority to beings with small, less complex brains, such as insects. (Which is probably also a view we would intuitively like to embrace, given the inconvenience of the alternative.) 

Yet while perhaps intuitive, I do not think this conclusion follows. The main argument against it, in my view, is that we should maintain a non-trivial probability that beings with small brains, such as insects, indeed can experience LWS (regardless of how we define these states). After all, on what grounds can we confidently maintain they cannot?

And if we then assume an expected value framework, and multiply the large number of insects by a non-trivial probability of them being able to experience LWS, we find that, in terms of presently existing beings, the largest amount of LWS in expectation may well be found in small beings such as insects.
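The multiplication described above can be sketched in a few lines. This is only an illustration under loud assumptions: the population sizes and probabilities are hypothetical placeholders rather than estimates, the function names are invented for this example, and counting "beings capable of LWS" is a crude stand-in for the amount of LWS in expectation.

```python
def expected_lws_capable(population: float, p_capable: float) -> float:
    """Expected number of beings in a group that can experience LWS."""
    if not 0.0 <= p_capable <= 1.0:
        raise ValueError("p_capable must be a probability in [0, 1]")
    return population * p_capable


def break_even_probability(reference_expectation: float, population: float) -> float:
    """Probability at which a group of the given size matches the reference group."""
    return reference_expectation / population


# Hypothetical placeholder inputs, chosen only to show the structure of the argument.
large_brained = expected_lws_capable(population=1e10, p_capable=0.9)
insects = expected_lws_capable(population=1e18, p_capable=0.001)

# With these placeholders, the small-brained group dominates in expectation.
insects_dominate = insects > large_brained

# How low the probability for insects could go before the two groups tie.
tie_point = break_even_probability(large_brained, population=1e18)
```

The design point is that the conclusion is driven by the ratio of population sizes: when one group is many orders of magnitude larger, even a very small probability is enough for it to dominate the expectation.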

* It should be noted in this context, though, that many humans ostensibly cannot feel (at least physical) pain, whereas many beings with smaller brains show every sign of having this capacity, which suggests brain size is a poor proxy for the ability to experience pain, let alone the ability to experience LWS, and that genetic variation in certain pain-modulating genes may well be a more important factor.

More literature

On insects:

The Importance of Insect Suffering
Reducing Suffering Amongst Invertebrates Such As Insects
Do Bugs Feel Pain?
How to Avoid Hurting Insects
The Moral Importance of Invertebrates Such as Insects

On Lexicality:

Value Lexicality
Lexical views without abrupt breaks
Clarifying lexical thresholds
Many-valued logic as a reply to sequence arguments in value theory

Physics Is Also Qualia

In this post, I seek to clarify what I consider to be some common confusions about consciousness and “physics” stemming from a failure to distinguish clearly between ontological and epistemological senses of “physics”.

Clarifying Terms

Two senses of the word “physics” are worth distinguishing. There is physics in an ontological sense: roughly speaking, the spatio-temporal(-seeming) world that in many ways conforms well to our best physical theories. And then there is physics in an epistemological sense: a certain class of models we have of this world, the science of physics.

“Physics” in this latter, epistemological sense can be further divided into 1) the physical models we have in our minds, versus 2) the models we have external to our minds, such as in our physics textbooks and computer simulations. Yet it is worth noting that, to the extent we ourselves have any knowledge of the models in our books and simulations, we only have this knowledge by representing it in our minds. Thus, ultimately, all the knowledge of physical models we have, as subjects, is knowledge of the first kind: as appearances in our minds.*

In light of these very different senses of the term “physics”, it is clear that the claim that “physics is also qualia” can be understood in two very different ways: 1) in the sense that the physical world, in the ontological sense, is qualia, or “phenomenal”, and 2) that our models of physics are qualia, i.e. that our models of physics are certain patterns of consciousness. The first of these two claims is surely the most controversial one, and I shall not defend it here; I explore it elsewhere.

Instead, I shall here focus on the latter claim. My aim is not really to defend it, as I already briefly did that above: all the knowledge of physics we have, as subjects, ultimately appears as experiential patterns in our minds. (Although talk of the phenomenology of, say, operations in Hilbert spaces admittedly is rare.) I take this to be obvious, and hit an impasse with anyone who disagrees. My aim here is rather to clarify some confusions that arise due to a lack of clarity about this, and due to conflations of the two senses of “physics” described above.

The Problem of Reduction: Epistemological or Ontological?

I find it worth quoting the following excerpt from a Big Think interview with Sam Harris. Not because there is anything atypical about what Harris says, but rather because I think he here clearly illustrates the prevailing lack of clarity about the distinction between epistemology and ontology in relation to “the physical”.

If there’s an experiential internal qualitative dimension to any physical system then that is consciousness. And we can’t reduce the experiential side to talk of information processing and neurotransmitters and states of the brain […]. Someone like Francis Crick said famously you’re nothing but a pack of neurons. And that misses the fact that half of the reality we’re talking about is the qualitative experiential side. So when you’re trying to study human consciousness, for instance, by looking at states of the brain, all you can do is correlate experiential changes with changes in brain states. But no matter how tight these correlations become that never gives you license to throw out the first person experiential side. That would be analogous to saying that if you just flipped a coin long enough you would realize it had only one side. And now it’s true you can be committed to talking about just one side. You can say that heads being up is just a case of tails being down. But that doesn’t actually reduce one side of reality to the other.

Especially worth resting on here is the statement “half of the reality we’re talking about is the qualitative experiential side.” Yet is this “half of reality” an “ontological half” or an “epistemological half”? That is, is there a half of reality out there that is part phenomenal, and part “non-phenomenal” — perhaps “inertly physical”? Or are we rather talking about two different phenomenal descriptions of the same thing, respectively 1) physico-mathematical models of the mind-brain (and these models, again, are also qualia, i.e. patterns of consciousness), and 2) all other phenomenal descriptions, i.e. those drawing on the countless other experiential modalities we can currently conceive of — emotions, sounds, colors, etc. — as well as those we can’t? I suggest we are really talking about two different descriptions of the same thing.

A similar question can be raised in relation to Harris’ claim that we cannot “reduce one side of reality to the other.” Is the reduction in question, or rather failure of reduction, an ontological or an epistemological one? If it is ontological, then it is unclear what this means. Is it that one side of reality cannot “be” the other? This does not appear to be Harris’ view, even if he does tacitly buy into ontologically distinct sides (as opposed to descriptions) of reality in the first place.

Yet if the failure of reduction is epistemological, then there is in fact little unusual about it, as failures of epistemological reduction, or reductions from one model to another, are found everywhere in science. In the abstract sciences, for example, one axiomatic system does not necessarily reduce to another; indeed, we can readily create different axiomatic systems that not only fail to reduce to each other but actively contradict each other. And hence we cannot derive all of mathematics, broadly construed, from a single axiomatic system.

Similarly, in the empirical sciences, economics does not “reduce to” quantum physics. One may object that economics does reduce to quantum physics in principle, yet it should then be noted that 1) the term “in principle” does an enormous amount of work here, arguably about as much as it would have to do in the claim that “quantum physics can explain consciousness in principle”, since physics and economics invoke very different models and experiential modalities (economic theories are often qualitative in nature, and some prominent economists have even argued they are primarily so). And 2) a serious case can be made against the claim that even all the basic laws found in chemistry, the closest neighbor of physics, can be derived from fundamental physical theories, even in principle (see e.g. Berofsky, 2012, chap. 8). This case does not rest on there being something mysterious going on in the transition from theories of physics to theories of chemistry, nor on new fundamental forces being implicated, but merely on the fact that our models in these respective fields contain elements not reducible, even in principle, to our models in other areas.

Thus, at the level of our minds, we can clearly construct many different mental models which we cannot reduce to each other, even in principle. Yet this merely says something about our models and epistemology. It hardly comprises a deep metaphysical mystery.

Denying the Reality of Consciousness

The fact that the world conforms, at least roughly, to description in “physical” terms seems to have led some people to deny that consciousness in general exists. Yet this, I submit, is a fallacy: the fact that we can model the world in one set of terms which describe certain of its properties does not imply that we cannot describe it in another set of terms that describe other properties truly there as well, even if we cannot derive one from the other.

By analogy, consider again physics and economics: we can take the exact same object of study — say, a human society — and describe aspects of it in physical terms (with models of thermodynamics, classical mechanics, electrodynamics, etc.), yet we cannot from any such description or set of descriptions meaningfully derive a description of the economics of this society. It would clearly be a fallacy to suggest that this implies facts of economics cannot exist.

Again, I think the confusion derives from conflating epistemology with ontology: “physics”, in the epistemological sense of “descriptions of the world in physico-mathematical terms”, appears to encompass “everything out there”, and hence, the reasoning goes, nothing else can exist out there. Of course, in one sense, this is true: if a description in physico-mathematical terms exhaustively describes everything out there, then there is indeed nothing more to be said about it — in physico-mathematical terms. Yet this says nothing about the properties of what is out there in other terms, as illustrated by the economics example above. (Another reason some people seem to deny the reality of consciousness, distinct from conflation of the epistemological and the ontological, is “denial due to fuzziness”, which I have addressed here.)

This relates, I think, to the fundamental Kantian insight on epistemology: we never experience the world “out there” directly, only our own models of it. And the fact that our physical model of the world — including, say, a physical model of the mind-brain of one’s best friend — does not entail other phenomenal modalities, such as emotions, by no means implies that the real, ontological object out there which our physical model reflects, such as our friend’s actual mind-brain, does not instantiate these things. That would be to confuse the map with the territory. (Our emotional model of our best friend does, of course, entail emotions, and it would be just as much of a fallacy to say that, since such emotional models say nothing about brains in physical terms, descriptions of the latter kind have no validity.)

Denials of this sort can have serious ethical consequences, not least since the most relevant aspects of consciousness, including suffering, fall outside descriptions of the world in purely physical terms. Thus, if we insist that only such physico-mathematical descriptions truly describe the world, we seem forced to conclude that suffering, along with everything else that plausibly has moral significance, does not truly exist. Which, in turn, can keep us from working toward a sophisticated understanding of these things, and from creating a better world accordingly.


* And for this reason, the answer to the question “how do you know you are conscious?” will ultimately be the same as the answer to the question “how do you know physics (i.e. physical models) exist?” — we experience these facts directly.

Thinking of Consciousness as Waves

First written: Dec 14, 2018, Last update: Jan 2, 2019.


How can we think about the relationship between the conscious and the physical? In this essay I wish to propose a way of thinking about it that might be fruitful and surprisingly intuitive, namely to think of consciousness as waves.

The idea is quite simple: one kind of conscious experience corresponds to, or rather conforms to description in terms of, one kind of wave. And by combining different kinds of waves, we can obtain an experience with many different properties in one.

It should be noted that in this post I merely refer to waves in an abstract sense to illustrate a general point. That is, I do not refer to electromagnetic waves in particular (as some theories of consciousness do), nor to quantum waves (as other theories do), nor to any other particular kind of wave (such as Selen Atasoy’s so-called connectome-specific harmonic waves*). The point here is not what kind of wave, or indeed which physical state in general, mediates different states of consciousness. The point is merely to devise a metaphor that can render intuitive the seemingly unintuitive, namely: how can we get something complex and multifaceted from something very simple without having anything seemingly spooky or strange, such as strong emergence, in between? In particular, how can we say that brains mediate conscious experience without saying that, say, electrons mediate conscious experience? I believe thinking about consciousness in terms of waves can help dissolve this confusion.

The magic of waves is that we can produce (or to an arbitrary level of precision approximate) any kind of complex, multifaceted wave by adding simple sine waves together.


[Figure: Sine waves with different frequencies.]


In this way, it is possible, for instance, to decompose any recorded song — itself a complex, multifaceted wave — into simple, tedious-sounding sine waves. Each resulting sine wave can be said to comprise an aspect of the song, yet not in any recognizable way. The whole song is in fact a sum of such waves, not in a strange way that implies strong emergence, but merely in a complicated, composite way.
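To make the composition and decomposition concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a model of any real recording: the sample rate, the three component frequencies, and all function names are hypothetical choices for this example. It builds a composite wave by adding sine waves, then recovers the amplitude of each component by correlating the composite with sines and cosines (a naive discrete Fourier transform).

```python
import math

SAMPLE_RATE = 64  # samples per second (assumed for illustration)
N_SAMPLES = 64    # exactly one second, so whole-number frequencies fit evenly


def sine_wave(freq_hz, amplitude, n=N_SAMPLES, rate=SAMPLE_RATE):
    """Samples of a simple sine wave with zero phase."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]


def add_waves(*waves):
    """Superimpose waves by adding them sample by sample."""
    return [sum(samples) for samples in zip(*waves)]


def amplitude_at(signal, freq_hz, rate=SAMPLE_RATE):
    """Amplitude of one frequency component, via a naive Fourier sum."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * t / rate) for t, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * t / rate) for t, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


# A toy "song": three simple sine waves added together (hypothetical values).
components = {3: 1.0, 5: 0.5, 8: 0.25}
song = add_waves(*(sine_wave(f, a) for f, a in components.items()))

# Decompose: measure how much of each frequency from 1 to 10 Hz is present.
recovered = {f: round(amplitude_at(song, f), 6) for f in range(1, 11)}
```

In this toy setup the recovered amplitudes are nonzero only at the three frequencies that were put in, which is the sense in which the whole is nothing over and above a sum of simple parts. The sketch ignores phase, which a full decomposition of a real recording would also need to track.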

Another way to think about waves that can help us think more clearly about emergent complexity is to think of a wave that is very small in both amplitude and duration. If this were a sound wave, it would be an extremely short-lived, extremely low-volume sound. On a visual representation of an entire song file, this sound would look more akin to a dot than a wave.


[Figure: A dot.]


And such simple sound waves can also be put together so as to create a song (for instance, one can take the sine waves obtained by decomposing a song and then chop them into smaller bits and decrease their amplitude). Making a song this way simply takes a very great number of such small waves, superimposed (if the song is to be loud enough to hear) and placed in succession (if the song is to last for more than a split-second).


The deeper point here is that waves are waves, no matter how small or simple, large or complex. Yet not all waves comprise what we would recognize as music. Similarly, even if all physical states are phenomenal in the broadest sense, this does not imply that they are conscious in the sense of being an ordered, multifaceted whole. Unfortunately, we do not as yet have good, analogous terms for “sound” and “music” in the phenomenal realm — perhaps we could use “phenomenality” and “consciousness”, respectively?

The problem is indeed that we are limited by language, in that the word “conscious” usually only connotes an ordered, composite mind rather than the property of phenomenality in the most general sense. Consequently, if we think all that exists is either music or non-sound, metaphorically speaking, we are bound to be confused. But if we instead expand our vocabulary, and thereby expand our allowed ways of thinking, our confusion can, I think, be readily dissolved. If we think of the phenomenality of the simplest physical systems as being nothing like consciousness in the usual sense of a composite mind but rather as a state of hyper-crude phenomenality — i.e. “phenomenal noise” that is nothing like a song but more akin to a low, short-lived sound, and yet unimaginably more crude still — then the problem of consciousness, as commonly (mis)conceived, seems to become a lot less confusing.**

Avoiding Confusion Due to Fuzziness

A more specific point of confusion the wave metaphor can help us dissolve is the notion that consciousness is so fuzzy a category that it in fact does not really exist, just like tables and chairs do not really exist. As I have argued elsewhere, I think this is a non sequitur. The fact that the categories of tables and chairs are themselves fuzzy does not imply that the physical properties of the objects to which we refer with these labels are inexact, let alone non-existent. The objects have the physical properties they have regardless of how we label them. Or, to continue the analogy to waves above, and songs in particular: although there is ambiguity about what counts as a song, this does not imply that we cannot speak in precise, factual terms about the properties of a given song — for instance, whether a given song contains a 440 Hz tone.

Similarly, the fact that consciousness, as in “an ordered, composite mind”, is a fuzzy category (after all, what counts as ordered? Do psychotic states? Fleeting dreams?) does not imply that any given phenomenal state we refer to with this term does not have exact and clearly identifiable phenomenal properties — e.g. an experience of the color red or the sensation of fear; properties that exist regardless of how outside observers choose to label them.

And although our labels for categorizing particular phenomenal states themselves tend to be fuzzy to some extent — e.g. which part of the spectrum below counts as red? — this does not imply that we cannot distinguish between different states, nor that we cannot draw any clear boundaries. For instance, we can clearly distinguish between the blue and the red zones respectively on the illustration below despite its gradation.


[Figure: A linear representation of the visible light spectrum with wavelengths in nanometers.]


Just as we can point toward a confined range of wavelengths which induce an experience of (some kind of) red in most people upon hitting their retinas, we can also, in principle, point to a range of physical states that mediate specific phenomenal states. This includes the phenomenal states we call suffering, with the fuzziness of what counts as suffering contained within and near the bounds of this range, while the physical states outside this range, especially those far away, do not mediate suffering, cf. the non-red range in the illustration above.

Thus, by analogy to how we can have precise descriptions of the properties of a song, even as an exact definition of what counts as a song escapes us, there is no reason why we should not be able to speak in factual and precise terms about the phenomenal aspects of a mind and its physical signatures, including the “red range” of wavelengths that comprise phenomenal suffering, metaphorically speaking. And a sophisticated understanding of this notional range is indeed of paramount importance for the project of reducing suffering.

* Note that these seemingly different kinds of waves and theories of consciousness can be identical, since connectome-specific harmonic waves could turn out to be coherent waves in the electromagnetic quantum field, as would seem suggested by a hypothesis known as quantum brain dynamics (I do not necessarily endorse this particular hypothesis).

** Another useful analogy for thinking more clearly about the seemingly crazy notion that “everything is conscious” — or rather: phenomenal — is to think about the question, Is everything light? For in a highly non-standard sense, everything is indeed “light”, in that electromagnetic waves permeate the universe in the form of cosmic background radiation, although everything is not permeated by light in the usual sense of visible electromagnetic radiation (wavelengths around 400–700 nm). We may thus think of consciousness as analogous to visible light (they can also both be more or less intense and have various nuances), and electromagnetic radiation as analogous to phenomenality — the more general phenomenon that encompasses the specific one.


Is AI Alignment Possible?

The problem of AI alignment is usually defined roughly as the problem of making powerful artificial intelligence do what we humans want it to do. My aim in this essay is to argue that this problem is less well-defined than many people seem to think, and to argue that it is indeed impossible to “solve” with any precision, not merely in practice but in principle.

There are two basic problems for AI alignment as commonly conceived. The first is that human values are non-unique. Indeed, in many respects, there is more disagreement about values than people tend to realize. The second problem is that even if we were to zoom in on the preferences of a single human, there is, I will argue, no way to instantiate a person’s preferences in a machine so as to make it act as this person would have preferred.

Problem I: Human Values Are Non-Unique

The common conception of the AI alignment problem is something like the following: we have a set of human preferences, X, which we must, somehow (and this is usually considered the really hard part), map onto some machine’s goal function, Y, via a map f, let’s say, such that X and Y are in some sense isomorphic. At least, this is a way of thinking about it that roughly tracks what people are trying to do.

Speaking in these terms, much attention is being devoted to Y and f compared to X. My argument in this essay is that we are deeply confused about the nature of X, and hence confused about AI alignment.

The first point of confusion is about the values of humanity as a whole. It is usually acknowledged that human values are fuzzy, and that there are some disagreements over values among humans. Yet it is rarely acknowledged just how strong this disagreement in fact is.

For example, concerning the ideal size of the future population of sentient beings, the disagreement is near-total, as some (e.g. some defenders of the so-called Asymmetry in population ethics, as well as anti-natalists such as David Benatar) argue that the future population should ideally be zero, while others, including many classical utilitarians, argue that the future population should ideally be very large. Many similar examples could be given of strong disagreements concerning the most fundamental and consequential of ethical issues, including whether any positive good can ever outweigh extreme suffering. And on many of these crucial disagreements, a very large number of people will be found on both sides.

Different answers to ethical questions of this sort do not merely give rise to small practical disagreements; in many cases, they imply completely opposite practical implications. This is not a matter of human values being fuzzy, but a matter of them being sharply, irreconcilably inconsistent. And hence there is no way to map the totality of human preferences, “X”, onto a single, well-defined goal-function in a way that does not conflict strongly with the values of a significant fraction of humanity. This is a trivial point, and yet most talk of human-aligned AI seems oblivious to this fact.

Problem II: Present Human Preferences Are Underdetermined Relative to Future Actions

The second problem and point of confusion with respect to the nature of human preferences is that, even if we focus only on the present preferences of a single human, then these in fact do not, and indeed could not possibly, determine with much precision what kind of world this person would prefer to bring about in the future.

This claim requires some unpacking, but one way to realize what I am trying to say here is to think in terms of the information required to represent the world around us. A precise such representation would require an enormous amount of information, indeed far more information than what can be contained in our brain. This holds true even if we only consider morally relevant entities around us — on the planet, say. There are just too many of them for us to have a precise representation of them. By extension, there are also too many of them for us to be able to have precise preferences about their individual states. Given that we have very limited information at our disposal, all we can do is express extremely coarse-grained and compressed preferences about what state the world around us should ideally have. In other words: any given human’s preferences are bound to be extremely vague about the exact ideal state of the world right now, and there will be countless moral dilemmas occurring across the world right now to which our preferences, in their present state, do not specify a unique solution.

And yet this is just considering the present state of the world. When we consider future states, the problem of specifying ideal states and resolutions to hitherto unknown moral dilemmas only explodes in complexity, and indeed explodes exponentially as time progresses. It is simply a fact, and indeed quite an obvious one at that, that no single brain could possibly contain enough information to specify unique, or indeed just qualified, solutions to all moral dilemmas that will arrive in the future. So what, then, could AI alignment relative to even a single brain possibly mean? How can we specify Y with respect to these future dilemmas when X itself does not specify solutions?

We can, of course, try to guess what a given human, or we ourselves, might say if confronted with a particular future moral dilemma and given knowledge about it, yet the problem is that our extrapolated guess is bound to be just that: a highly imperfect guess. For even a tiny bit of extra knowledge or experience can readily change a person’s view of a given moral dilemma to be the opposite of what it was prior to acquiring that knowledge (for instance, I myself switched from being a classical to a negative utilitarian based on a modest amount of information in the form of arguments I had not considered before). This high sensitivity to small changes in our brain implies that even a system with near-perfect information about some person’s present brain state would be forced to make a highly uncertain guess about what that person would actually prefer in a given moral dilemma. And the further ahead in time we go, and thus further away from our familiar circumstance and context, the greater the uncertainty will be.

By analogy, consider the task of AI alignment with respect to our ancestors ten million years ago. What would their preferences have been with respect to, say, the future of space colonization? One may object that this is underdetermined because our ancestors could not conceive of this possibility, yet the same applies to us and things we cannot presently conceive of, such as alien states of consciousness. Our current preferences say about as little about the (dis)value of such states as the preferences of our ancestors ten million years ago said about space colonization.

A more tangible analogy might be to consider the level of confidence with which we, based on knowledge of your current brain state, can determine your dinner preferences twenty years from now with respect to dishes made from ingredients not yet invented — a preference that will likely be influenced by contingent, environmental factors found between now and then. Not with great confidence, it seems safe to say. And this point pertains not only to dinner preferences but also to the most consequential of choices. Our present preferences cannot realistically determine, with any considerable precision, what we would deem ideal in as yet unknown, realistic future scenarios. Thus, by extension, there can be no such thing as value extrapolation or preservation in anything but the vaguest sense. No human mind has ever contained, or indeed ever could contain, a set of preferences that evaluatively orders more than the tiniest sliver of (highly compressed versions of) real-world states and choices an agent in our world is likely to face in the future. To think otherwise amounts to a strange Platonization of human preferences. We just do not have enough information in our heads to possess such fine-grained values.

The truth is that our preferences are not some fixed entity that determines future actions uniquely; they simply could not be that. Rather, our preferences are themselves interactive and adjustive in nature, changing in response to new experiences and new information we encounter. Thus, to say that we can “idealize” our present preferences so as to obtain answers to all realistic future moral dilemmas is rather like calling the evolution of our ancestors’ DNA toward human DNA a “DNA idealization”. In both cases, we find no hidden Deep Essences waiting to be purified; no information that points uniquely toward one particular solution in the face of all realistic future “problems”. All we find are physical systems that evolve contingently based on the inputs they receive.*

The bottom line of all this is not that it makes no sense to devote resources toward ensuring the safety of future machines. We can still meaningfully and cooperatively seek to instill rules and mechanisms in our machines and institutions that seem optimal in expectation given our respective, coarse-grained values. The conclusion here is just that 1) the rules instantiated cannot be the result of a universally shared human will or anything close; the closest thing possible would be rules that embody some compromise between people with strongly disagreeing values. And 2) such an instantiation of coarse-grained rules in fact comprises the upper bound of what we can expect to accomplish in this regard. Indeed, this is all we can expect with respect to future influence in general: rough and imprecise influence and guidance with the limited information we can possess and transmit. The idea of a future machine that will do exactly what we would want, and whose design therefore constitutes a lever for precise future control, is a pipe dream.

* Note that this account of our preferences is not inconsistent with value or moral realism. By analogy, consider human preferences and truth-seeking: humans are able to discover many truths about the universe, yet most of these truths are not hidden in, nor extrapolated from, our DNA or our preferences. Indeed, in many cases, we only discover these truths by actively transcending rather than “extrapolating” our immediate preferences (for comfortable and intuitive beliefs, say). The same could apply to the realm of value and morality.

Why the Many-Worlds Interpretation May Not Have Significant Ethical Implications

At first glance, it seems like the many-worlds interpretation of quantum mechanics (MWI) might have significant ethical implications. After all, MWI implies that there are many more sentient beings in the world than one would think given a naive classical view, indeed a much greater number of them. And so it seems quite plausible, at least on the face of it, that ethical considerations pertaining to MWI should dominate everything else in expectation, even if we place only a small credence on this interpretation being true. In this post, I shall outline some reasons why this may in fact not be the case, at least with respect to two commonly supposed implications: 1) extreme caution, and 2) exponentially greater value over time. However, questions concerning the ethical implications of our best physical theories and their interpretations remain open and worth exploring.

Would Branching Worlds Imply Extreme Caution?

“I still recall vividly the shock I experienced on first encountering this multiworld concept [MWI]. The idea of 10^100 slightly imperfect copies of oneself all constantly splitting into further copies, which ultimately become unrecognizable, is not easy to reconcile with common sense.”

Bryce DeWitt

This is a common way to introduce the implications of MWI, and it seems plausible that this radically different conception of reality, if true, should lead us to change our actions in significant ways. In particular, it may seem intuitive that it should lead us to act more cautiously, as David Pearce argues:

So one should always act “unnaturally” responsibly, driving one’s car not just slowly and cautiously, for instance, but ultracautiously. This is because one should aim to minimise the number of branches in which one injures anyone, even if leaving a trail of mayhem is, strictly speaking, unavoidable. If a motorist doesn’t leave a (low-density) trail of mayhem, then quantum mechanics is false. This systematic re-evaluation of ethically acceptable risk needs to be adopted world-wide.

Yet, while intuitive, I would argue that this actually does not follow. For although it may be true that we should generally act much more cautiously than we do, this conclusion is not influenced by MWI, for various reasons.

First, if one is trying to reduce suffering, one should not “aim to minimise the number of branches in which one injures anyone”, but rather seek to reduce as much suffering as possible (in expectation) in the world. At an intuitive level, these may seem equivalent, yet they are not. The former is in fact impossible, as we are bound to injure others, even assuming the existence of just one world, whereas the latter — reducing the greatest amount of suffering possible throughout all branches — is possible by definition.

In particular, this argument for being highly cautious ignores the fact that such caution also carries risks — e.g. extreme caution might increase the probability that we will bring about more suffering by omission, by rendering our efforts to reduce suffering less effective. And these other risks may well be much larger, and thus result in the realization of a larger amount of suffering in a larger measure of branches. In other words, since it is far from clear that being ultracautious is the best way to reduce suffering in expectation throughout all branches, it is far from clear that we should practice such ultracaution in light of MWI.

Second, and quite relatedly, I would argue that, whether we live in many worlds or one, we should seek to minimize expected suffering regardless. For if we happened to exist in one world, a small probability of a very bad outcome would be equally worth avoiding, in expectation, as it would be if we happened to live in a quantum multiverse. Whether we do just one or an arbitrarily large number of “trials”, we should still pursue the same action: that which reduces the most suffering in expectation. 
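To make the point about “trials” concrete, here is a minimal sketch in Python. The action names, probabilities, and suffering values are all made up for illustration, and the function names are hypothetical. The sketch assumes that each action yields the same outcome probabilities in every world (or branch), so that the number of worlds merely scales the expected total. Under that assumption, the ranking of actions is unaffected by how many worlds there are.

```python
# Hypothetical actions, each a list of (probability, suffering) outcomes.
# All numbers are invented for illustration only; they are not estimates.
ACTIONS = {
    # Extreme caution lowers the risk of a very bad outcome, but (by assumption)
    # carries a higher baseline cost, e.g. suffering not prevented by omission.
    "ultracautious": [(0.999, 10.0), (0.001, 100.0)],
    "moderately_cautious": [(0.99, 5.0), (0.01, 100.0)],
    "reckless": [(0.9, 5.0), (0.1, 100.0)],
}


def expected_suffering(outcomes, n_worlds=1):
    """Expected total suffering across n_worlds worlds, assuming each
    world has the same outcome probabilities for the given action."""
    return n_worlds * sum(p * s for p, s in outcomes)


def best_action(actions, n_worlds=1):
    """Return the name of the action with the lowest expected suffering."""
    return min(actions, key=lambda name: expected_suffering(actions[name], n_worlds))


one_world = best_action(ACTIONS, n_worlds=1)
many_worlds = best_action(ACTIONS, n_worlds=10**100)
print(one_world, many_worlds)
```

With these invented numbers, the same action is selected in both cases, since multiplying every expected value by the same positive factor cannot change which one is smallest. Whether real-world caution behaves like the first entry is, of course, exactly what is in question; the sketch only illustrates that the number of worlds does not settle it.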

Third, any argument of the kind made above concerning how all slightly probable outcomes will be realized can also be made by assuming that the multiverse of inflation exists. Thus, if one already believes that we live in a spatially infinite, or indeed “merely” extremely large universe, then the radical conclusions supposed to follow from MWI would already be implied by that belief alone (as we shall see below, many prominent proponents of MWI actually consider MWI not only equivalent but identical with the multiverse of inflation). And if one does not think a spatially very large universe should change how we act, then why think that a large, in many ways equivalent, quantum universe should? As argued above, it seems that no radical conclusions should follow either way. One world or many, we should still do what seems best in expectation.

Another way to arrive at the same conclusion is by embracing Stuart Armstrong’s Anthropic Decision Theory, according to which we, as altruists aiming to reduce suffering, should act the same way regardless of how many similar copies of us there may be in the world.

Would Branching Worlds Imply More Value Later?

Following Bryce DeWitt’s quote about rapidly splitting copies, one can reasonably wonder whether MWI implies that the net amount of value in the world, and hence the value of our actions’ impact on the world, is increasing exponentially over time. Indeed, if we naively interpret DeWitt’s claim to mean that the number of sentient beings that exists is multiplied by 10^100 just about every second, this would imply that the value of the very last second of the existence of sentient life should massively dominate everything else. If this interpretation of MWI is correct, it would have extremely significant ethical implications. Yet is it? It would seem not. Here is Max Tegmark:

Does the number of universes exponentially increase over time? The surprising answer is no. From the bird perspective, there is of course only one quantum universe. From the frog perspective, what matters is the number of universes that are distinguishable at a given instant—that is, the number of noticeably different Hubble volumes. Imagine moving planets to random new locations, imagine having married someone else, and so on. At the quantum level, there are 10 to the 10^118 universes with temperatures below 10^8 kelvins. That is a vast number, but a finite one.

From the frog perspective, the evolution of the wave function corresponds to a never-ending sliding from one of these 10 to the 10^118 states to another. Now you are in universe A, the one in which you are reading this sentence. Now you are in universe B, the one in which you are reading this other sentence. Put differently, universe B has an observer identical to one in universe A, except with an extra instant of memories.

Thus, it seems one should think about MWI in terms of an intertwining rope rather than a branching tree. A good way to gain intuition about it may be to think in terms of the multiverse of inflation instead. Indeed, according to prominent proponents of MWI, the many-worlds of quantum mechanics and the multiverse of inflation are not only closely related notions but indeed the same thing, cf. (Aguirre & Tegmark, 2010; Nomura, 2011; Bousso & Susskind, 2011). In that case, not only is thinking about copies of ourselves in worlds spatially far away from us a great way to gain intuition about MWI; it is the correct way to think about it.

And when we think about it in these terms, it suddenly all becomes quite straightforward and intuitive, at least relatively speaking. For on the inflationary model, there are copies of us in the universe located far away with whom we share our entire history from the big bang up until now. Yet as time progresses, and more different outcomes become possible, the distance to the copies of us that share our exact history becomes ever greater, at a rapid pace, cf. (Garriga & Vilenkin, 2001). Thus, there is indeed a rapid branching in a very real sense, only, this branching consists in departing from “nearby” copies of us who had been just like us up until this point. No new worlds are really added. The “other worlds” were always there, and then merely went their separate ways.

Hence, given the assumptions made here, the number of sentient beings in our world does not in fact increase exponentially in the way naively supposed above, unless one keeps on aggregating over an exponentially larger fraction of the space that already existed. (There is, however, an exponential increase in the number of new universes created by inflating regions of the universe, assuming inflationary theory is correct. Yet this process does not create an exponentially greater number of sentient beings from our point in space and time, i.e. Earth, 13.8 billion years after the big bang. Rather, these new worlds are all created “from scratch”.) In short, MWI does not appear to imply greater value later.


In sum, I have argued that we seem to have good reason to maintain something akin to one-world common sense in most of our decisions (decisions that might influence the creation of new universes would be an exception). This conclusion may, however, be strongly biased given that it comes from a brain that very much wants to preserve common sense.

Narrative Self-Deception: The Ultimate Elephant in the Brain?

“the elephant in the brain, n. An important but unacknowledged feature of how our minds work; an introspective taboo.”

The Elephant in the Brain is an informative and well-written book, co-authored by Kevin Simler and Robin Hanson. It explains why much of our behavior is driven by unflattering, hidden motives, as well as why our minds are built to be unaware of these motives. In short: because a mind that is ignorant about what drives it and how it works is often more capable of achieving the aims it was built to achieve.

Beyond that, the book also seeks to apply this knowledge to shed some light on many of our social institutions to show that they are often not mostly about what we think they are. Rather than being about high-minded ideals and other pretty things that we like to say they are about, our institutions often serve much less pretty, more status-driven purposes, such as showing off in various ways, as well as to help us better get by in a tough world (for instance, the authors argue that religion in large part serves to bind communities together, and in this way can help bring about better life outcomes for believers).

All in all, I think The Elephant in the Brain provides a strong case for supplementing one’s mental toolkit with a new, important tool, namely to continuously ask: how might my mind skillfully be avoiding confrontation with ugly truths about myself that I would prefer not to face? And how might such unflattering truths explain aspects of our public institutions and public life in general?

This is an important lesson, I think, and it makes the book more than worth reading. At the same time, I cannot help but feel that the book ultimately falls short when it comes to putting this tool to proper use. For the main critique that came to my mind while reading the book was that it seemed to ignore the biggest elephant in the brain by far — the elephant I suspect we would all prefer to ignore the most — and hence it failed, in my view, to take a truly deep and courageous look at the human condition. In fact, the book even seemed to be a mouthpiece for this great elephant.

The great elephant I have in mind here is a tacitly embraced sentiment that goes something like: life is great, and we are accomplishing something worthwhile. As the authors write: “[…] life, for most of us, is pretty good.” (p. 11). And they end the book on a similar note:

In the end, our motives were less important than what we managed to achieve by them. We may be competitive social animals, self-interested and self-deceived, but we cooperated our way to the god-damned moon.

This seems to implicitly assume that what humans have managed to achieve, such as cooperating (i.e. two superpowers with nuclear weapons pointed at each other competing) their way to the moon, has been worthwhile all things considered. Might this, however, be a flippant elephant talking — rather than, say, a conclusion derived via a serious, scholarly analysis of our condition?

As a meta-observation, I would note that the fact that people often get offended and become defensive when one even just questions the value of our condition — and sometimes also accuse the one raising the question of having a mental illness — suggests that we may indeed be disturbing a great elephant here: something we would strongly prefer not to think too deeply about. (For the record, with respect to mental health, I think one can be among the happiest, most mentally healthy people on the planet and still think that a sober examination of the value of our condition yields a negative answer, although it may require some disciplined resistance against the pulls of a strong elephant.)

It is important to note here that one should not confuse the cynicism required for honest exploration of the human condition with misanthropy, as Simler and Hanson themselves are careful to point out:

The line between cynicism and misanthropy—between thinking ill of human motives and thinking ill of humans—is often blurry. So we want readers to understand that although we may often be skeptical of human motives, we love human beings. (Indeed, many of our best friends are human!) […] All in all, we doubt an honest exploration will detract much from our affection for [humans]. (p. 13)

Similarly, an honest and hard-nosed effort to assess the value of human life and the human endeavor need not lead us to have any less affection and compassion for humans. Indeed, it might lead us to have much more of both in many ways.

Is Life “Pretty Good”?

With respect to Simler and Hanson’s claim that “[…] life, for most of us, is pretty good”, it can be disputed that this is indeed the case. According to the 2017 World Happiness Report, a significant plurality of people rated their life satisfaction at five on a scale from zero to ten, which arguably does not translate to being “pretty good”. Indeed, one can argue that the scale employed in this report is biased, in that it does not allow for a negative evaluation of life. And one may further argue that if this scale instead ranged from minus five to plus five (i.e. if one transposed this zero-to-ten scale so as to make it symmetrical around zero), it may be that a plurality would rate their lives at zero. That is, after all, where the plurality would lie if one were to make this transposition on the existing data measured along the zero-to-ten scale (although it seems likely that people would have rated their life satisfaction differently if the scale had been constructed in this symmetrical way).

But even if we were to concede that most people say that their lives are pretty good, one can still reasonably question whether most people’s lives indeed are pretty good, and not least reasonably question whether such reports imply that the human condition is worthwhile in a broader sense.

Narrative Self-Deception: Is Life As Good As We Think?

Just as it is possible for us to be wrong about our own motives, as Simler and Hanson convincingly argue, could it be that we can also be wrong about how good our lives are? And, furthermore, could it be that we not only can be wrong but that most of us in fact are wrong about it most of the time? This is indeed what some philosophers argue, seemingly supported by psychological evidence.

One philosopher who has argued along these lines is Thomas Metzinger. In his essay “Suffering”, Metzinger reports on a pilot study he conducted in which students were asked at random times via their cell phones whether they would relive the experience they had just before their phone vibrated. The results were that, on average, students reported that their experience was not worth reliving 72 percent of the time. Metzinger uses this data, which he admits does not count as significant, as a starting point for a discussion on how our coarser narrative about the quality of our lives might be out of touch with the reality of our felt, moment-to-moment experience:

If, on the finest introspective level of phenomenological granularity that is functionally available to it, a self-conscious system would discover too many negatively valenced moments, then this discovery might paralyse it and prevent it from procreating. If the human organism would not repeat most individual conscious moments if it had any choice, then the logic of psychological evolution mandates concealment of the fact from the self-modelling system caught on the hedonic treadmill. It would be an advantage if insights into the deep structure of its own mind – insights of the type just sketched – were not reflected in its conscious self-model too strongly, and if it suffered from a robust version of optimism bias. Perhaps it is exactly the main function of the human self-model’s higher levels to drive the organism continuously forward, to generate a functionally adequate form of self-deception glossing over everyday life’s ugly details by developing a grandiose and unrealistically optimistic inner story – a “narrative self-model” with which we can identify? (pp. 6-7)

Metzinger goes on to conjecture that we might be subject to what he calls “narrative self-deception” — a self-distracting strategy that keeps us from getting a realistic view of the quality and prospects of our lives:

[…] a strategy of flexible, dynamic self-representation across a hierarchy of timescales could have a causal effect in continuously remotivating the self-conscious organism, systematically distracting it from the potential insight that the life of an anti-entropic system is one big uphill battle, a strenuous affair with minimal prospect of enduring success. Let us call this speculative hypothesis “narrative self-deception”. (p. 7)

If this holds true, such self-deception would seem to more than satisfy the definition of an elephant in the brain in Simler and Hanson’s sense: “an important but unacknowledged feature of how our minds work; an introspective taboo.”

To paraphrase Metzinger: the mere fact that we find life to be “pretty good” when we evaluate it all from the vantage point of a single moment does not mean that we in fact find most of our experiences “pretty good”, or indeed even worth (re)living most of the time, moment-to-moment. Our single-moment evaluations of the quality of the whole thing may well tend to be gross, self-deceived overestimates.

Another philosopher who makes a similar case is David Benatar, who in his book Better Never to Have Been argues that we tend to overestimate the quality of our lives due to well-documented psychological biases:

The first, most general and most influential of these psychological phenomena is what some have called the Pollyanna Principle, a tendency towards optimism. This manifests in many ways. First, there is an inclination to recall positive rather than negative experiences. For example, when asked to recall events from throughout their lives, subjects in a number of studies listed a much greater number of positive than negative experiences. This selective recall distorts our judgement of how well our lives have gone so far. It is not only assessments of our past that are biased, but also our projections or expectations about the future. We tend to have an exaggerated view of how good things will be. The Pollyannaism typical of recall and projection is also characteristic of subjective judgements about current and overall well-being. Many studies have consistently shown that self-assessments of well-being are markedly skewed toward the positive end of the spectrum. […] Indeed, most people believe that they are better off than most others or than the average person. (pp. 64-66)

Is “Pretty Good” Good Enough?

Beyond doubting whether most people would indeed say that their lives are “pretty good”, and beyond doubting that a single moment’s assessment of one’s quality of life actually reflects this quality particularly well, one can also question whether a life that is rated as “pretty good”, even in the vast majority of moments, is indeed good enough.

This is, for example, not necessarily the case on the so-called tranquilist view of value, according to which our experiences are valuable to the extent that they are free from suffering, and happiness and pleasure are hence valuable to the extent that they chase suffering away.

Similar to Metzinger’s point about narrative self-deception, one can argue that, if the tranquilist view holds true about how we feel the value of our experience moment-to-moment (upon closer, introspective inspection), we should probably expect to be quite blind to this fact. It is also interesting to note in this context that many of the traditions which have placed the greatest emphasis on paying attention to the nature of subjective experience moment-to-moment, such as Buddhism, have converged toward a view very similar to tranquilism.

Can the Good Lives Outweigh the Bad?

One can also question the value of our condition on a more collective level, by focusing not only on a single (self-reportedly) “pretty good” life but on all individual lives. In particular, we can question whether the good lives of some, indeed even a large majority, can justify the miserable lives of others.

A story that gives many people pause on this question is Ursula K. Le Guin’s The Ones Who Walk Away from Omelas. The story is about a near-paradisiacal city in which everyone lives deeply meaningful and fulfilling lives — that is, everyone except a single child who is locked in a basement room, forced to live a life of squalor:

The child used to scream for help at night, and cry a good deal, but now it only makes a kind of whining, “eh-haa, eh-haa,” and it speaks less and less often. It is so thin there are no calves to its legs; its belly protrudes; it lives on a half-bowl of corn meal and grease a day. It is naked. Its buttocks and thighs are a mass of festered sores, as it sits in its own excrement continually.

The story’s premise is that this child must exist in this condition for the happy people of Omelas to enjoy their wonderful lives, which then raises the question of whether these wonderful lives can in any sense outweigh and justify the miserable life of this single child. Some citizens of Omelas seem to decide that this is not the case: the ones who walk away from Omelas. And many people in the real world seem to agree with this decision.

Sadly, our world is much worse than the city of Omelas on every measure. For example, in the World Happiness Report cited above, around 200 million people reported their quality of life to be in the absolute worst category. If the story of Omelas gives us pause, we should also think twice before claiming that the “pretty good” lives of some people can outweigh the self-reportedly very bad lives of these hundreds of millions of people, many of whom end up committing suicide (and again, it should be remembered that a great plurality of humanity rated their life satisfaction to be exactly in the middle of the scale, while a significant majority rated it in the middle or lower).

Ratings of general life satisfaction aside, one can also reasonably question whether anything can outweigh the many instances of extreme suffering that occur every single day, something that can indeed befall anyone, regardless of one’s past self-reported life satisfaction.

Beyond that, one can also question whether the “pretty good” lives of some humans can in any sense outweigh and justify the enormous amount of suffering humanity imposes on non-human animals, including the torturous suffering we subject more than a trillion fish to each year, as well as the suffering we impose upon the tens of billions of chickens and turkeys who live out their lives under the horrific conditions of factory farming, many of whom end their lives by being boiled alive. Indeed, there is no justification for not taking humanity’s impact on non-human animals — the vast majority of sentient beings on the planet — into consideration as well when assessing the value of our condition.


My main purpose in this essay has not been to draw any conclusions about the value of our condition. Rather, my aim has merely been to argue that we likely have an enormous elephant in our brain that causes us to evaluate our lives, individually as well as collectively, in overoptimistic terms (though some of us perhaps do not), and to ignore the many considerations that might suggest a negative conclusion. An elephant that leads us to eagerly assume that “it’s all pretty good and worthwhile”, and to flinch away from serious, sober-minded engagement with questions concerning the value of our condition, including whether it would be better if there had been no sentient beings at all.

Why Altruists Should Perhaps Not Prioritize Artificial Intelligence: A Lengthy Critique

The following is a point-by-point critique of Lukas Gloor’s essay Altruists Should Prioritize Artificial Intelligence. My hope is that this critique will serve to make it clear — to Lukas, myself, and others — where and why I disagree with this line of argument, and thereby hopefully also bring some relevant considerations to the table with respect to what we should be working on to best reduce suffering. I should like to note, before I begin, that I have the deepest respect for Lukas, and that I consider his work very important and inspiring.

Below, I quote every paragraph from the body of Lukas’ article, which begins with the following abstract:

The large-scale adoption of today’s cutting-edge AI technologies across different industries would already prove transformative for human society. And AI research rapidly progresses further towards the goal of general intelligence. Once created, we can expect smarter-than-human artificial intelligence (AI) to not only be transformative for the world, but also (plausibly) to be better than humans at self-preservation and goal preservation. This makes it particularly attractive, from the perspective of those who care about improving the quality of the future, to focus on affecting the development goals of such AI systems, as well as to install potential safety precautions against likely failure modes. Some experts emphasize that steering the development of smarter-than-human AI into beneficial directions is important because it could make the difference between human extinction and a utopian future. But because we cannot confidently rule out the possibility that some AI scenarios will go badly and also result in large amounts of suffering, thinking about the impacts of AI is paramount for both suffering-focused altruists as well as those focused on actualizing the upsides of the very best futures.

An abstract of my thoughts on this argument:

My response to this argument is twofold: 1) I do not consider the main argument presented by Lukas, as I understand it, to be plausible, and 2) I think we should think hard about whether we have considered the opportunity cost carefully enough. We should not be particularly confident, I would argue, that any of us have found the best thing to focus on to reduce the most suffering.

I do not think the claim that “altruists can expect to have the largest positive impact by focusing on artificial intelligence” is warranted. In part, my divergence from Lukas rests on empirical disagreements, and in larger part it stems from what may be called “conceptual disagreements” — I think most talk about “superintelligence” is conceptually confused. For example, intelligence as “cognitive abilities” is liberally conflated with intelligence as “the ability to achieve goals in general”, and this confusion does a lot of deceptive work.

I would advocate for more foundational research into the question of what we ought to prioritize. Artificial intelligence undoubtedly poses many serious risks, yet it is important that we maintain a sense of proportion with respect to these risks relative to other serious risks, many of which we have not even contemplated yet.

I will now turn to the full argument presented by Lukas.

I. Introduction and definitions

Terms like “AI” or “intelligence” can have many different (and often vague) meanings. “Intelligence” as used here refers to the ability to achieve goals in a wide range of environments. This definition captures the essence of many common perspectives on intelligence (Legg & Hutter, 2005), and conveys the meaning that is most relevant to us, namely that agents with the highest comparative goal-achieving ability (all things considered) are the most likely to shape the future.

A crucial thing to flag is that “intelligence” here refers to the ability to achieve goals — not to scoring high on an IQ test, or “intelligence” as “advanced cognitive abilities”. And these are not the same, and should not be conflated (indeed, this is one of the central points of my book Reflections on Intelligence, which dispenses with the muddled term “intelligence” at an early point, and instead examines the nature of this better defined “ability to achieve goals” in greater depth).

While it is true that the concept of goal achieving is related to the concept of IQ, the latter is much narrower, as it relates to a specific class of goals. Boosting the IQ of everyone would not boost our ability to achieve goals in every respect, at least not immediately, and not to the same extent across all domains. For even if we all woke up with an IQ of 200 tomorrow, all the external technology with which we run and grow our economy would still be the same. Our cars would drive just as fast, the energy available to us would be the same, and so would the energy efficiency of our machines. And while a higher IQ might now enable us to grow this external technology faster, there are quite restrictive limits to how much it can grow. Most of our machines and energy harvesting technology cannot be made many times more efficient, as their efficiency is already a significant fraction (15 to 40 percent) of the maximum physical limit. In other words, their efficiency cannot be doubled more than a couple of times, if even that.
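The arithmetic behind "a couple of times" can be made explicit with a small sketch. It assumes, purely for illustration, that efficiency is expressed as a fraction of the physical maximum (so the ceiling is 1.0); the function name `doublings_remaining` is hypothetical and not taken from any source.

```python
import math


def doublings_remaining(current_fraction: float) -> float:
    """Return how many times an efficiency could double before reaching
    the physical maximum, where the maximum is normalized to 1.0."""
    if not 0 < current_fraction <= 1:
        raise ValueError("current_fraction must be in the interval (0, 1]")
    # Doubling n times multiplies efficiency by 2**n, so the ceiling is hit
    # when current_fraction * 2**n == 1, i.e. n == log2(1 / current_fraction).
    return math.log2(1.0 / current_fraction)


# The two ends of the 15 to 40 percent range mentioned in the text.
for fraction in (0.15, 0.40):
    n = doublings_remaining(fraction)
    print(f"{fraction:.0%} of the limit: at most {n:.2f} doublings")
```

On these assumptions, a machine at 15 percent of the limit has fewer than three doublings left, and one at 40 percent has fewer than two.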

One could then, of course, build more machines and power plants, yet such an effort would itself be constrained strongly by the state of our external technology, including the energy available to us; not just by the cognitive abilities available. This is one of the reasons I am skeptical of the idea of AI-powered runaway growth. Yes, greater cognitive abilities are a highly significant factor, yet there is just so much more to growing the economy and our ability to achieve a wide range of goals than that, as evidenced by the fact that we have seen a massive increase in computer-powered cognitive abilities (indeed, exponential growth for many decades by many measures) and yet we have continued to see fairly stable, in fact modestly declining, economic growth.

If one considers the concept of “increase in cognitive powers” to be the same as “increase in the ability to achieve goals, period” then this criticism will be missed. “I defined intelligence to be the ability to achieve goals, so when I say intelligence is increased, then all abilities are increased.” One can easily come to entertain a kind of motte and bailey argument in this way, by moving back and forth between this broad notion of intelligence as “the ability to achieve goals” and the more narrow sense of intelligence as “cognitive abilities”. To be sure, a statement like the one above need not be problematic as such, as long as one is clear that this concept of intelligence lies very far from “intelligence as measured by IQ/raw cognitive power”. Such clarity is often absent, however, and thus the statement is quite problematic in practice, with respect to the goals of communicating clearly and not confusing ourselves.

Again, my main point here is that increasing cognitive powers should not be conflated with increasing the ability to achieve goals in general — in every respect. I think much confusion springs from a lack of clarity on this matter.

While everyday use of the term “intelligence” often refers merely to something like “brainpower” or “thinking speed,” our usage also presupposes rationality, or goal-optimization in an agent’s thinking and acting. In this usage, if someone is e.g. displaying overconfidence or confirmation bias, they may not qualify as very intelligent overall, even if they score high on an IQ test. The same applies to someone who lacks willpower or self control.

This is an important step toward highlighting the distinction between “goal achieving ability” and “IQ”, yet it is still quite a small step, as it does not really go much beyond distinguishing “high IQ” from “optimal cognitive abilities for goal achievement”. We are still talking about things going on in a single human head (or computer), while leaving out the all-important aspect that is (external) culture and technology. We are still not talking about the ability to achieve goals in general.

Artificial intelligence refers to machines designed with the ability to pursue tasks or goals. The AI designs currently in use – ranging from trading algorithms in finance, to chess programs, to self-driving cars – are intelligent in a domain-specific sense only. Chess programs beat the best human players in chess, but they would fail terribly at operating a car. Similarly, car-driving software in many contexts already performs better than human drivers, but no amount of learning (at least not with present algorithms) would make [this] software work safely on an airplane.

My only comment here would be that it is not quite clear what counts as artificial intelligence. For example, would a human, edited as well as unedited, count as “a machine designed with the ability to pursue tasks or goals”? And could not all software be considered “designed with the ability to pursue tasks or goals”, and hence all software would be artificial intelligence by this definition? If so, we should then just be clear that this definition is quite broad, including both all humans and all software, and more.

The most ambitious AI researchers are working to build systems that exhibit (artificial) general intelligence (AGI) – the type of intelligence we defined above, which enables the expert pursuit of virtually any task or objective.

This is where the distinction we drew above becomes relevant. While the claim quoted above may be true in one sense, we should be clear that the most ambitious AI researchers are not working to increase “all our abilities”, including our ability to get more energy out of our steam engines and solar panels. Our economy arguably works on that broader endeavor. AI researchers, in contrast, work only on bettering what may be called “artificial cognitive abilities”, which, granted, may in turn help spur growth in many other areas (although the degree to which it would do so is quite unclear, and likely surprisingly limited in the big picture, since “growth may be constrained not by what we are good at but rather by what is essential and yet hard to improve”).

In the past few years, we have witnessed impressive progress in algorithms becoming more and more versatile. Google’s DeepMind team for example built an algorithm that learned to play 2-D Atari games on its own, achieving superhuman skill at several of them (Mnih et al., 2015). DeepMind then developed a program that beat the world champion in the game of Go (Silver et al., 2016), and – tackling more practical real-world applications – managed to cut down data center electricity costs by rearranging the cooling systems.

I think it is important not to overstate recent progress compared to progress in the past. We also saw computers becoming better than humans at many things several decades ago, including many kinds of mathematical calculations (and people also thought that computers would soon beat humans at everything back then). So superhuman skill at many tasks is not what is new and unique about recent progress, but rather that these superhuman skills have been attained via self-training, and, as Lukas notes, that the skills achieved by this training seem of a broader, more general nature than the skills of a single algorithm in the past.

And yet the breadth of these skills should not be overstated either, as the skills cited are all acquired in a rather expensive trial-and-error fashion with readily accessible feedback. This mode of learning surely holds a lot of promise in many areas, yet there are reasons to be skeptical that such learning can bring us significantly closer to achieving all the cognitive and motor abilities humans have (see also David Pearce’s “Humans and Intelligent Machines”; one need not agree with Pearce on everything to agree with some of his reasons to be skeptical).

That DeepMind’s AI technology makes quick progress in many domains, without requiring researchers to build new architecture from scratch each time, indicates that their machine learning algorithms have already reached an impressive level of general applicability. (Edit: I wrote the previous sentence in 2016. In the meantime [January 2018] DeepMind went on to refine its Go-playing AI, culminating in a version called AlphaGo Zero. While the initial version of DeepMind’s Go-playing AI started out with access to a large database of games played by human experts, AlphaGo Zero only learns through self-play. Nevertheless, it managed to become superhuman after a mere 4 days of practice. After 40 days of practice, it was able to beat its already superhuman predecessor 100–0. Moreover, Deepmind then created the version AlphaZero, which is not a “Go-specific” algorithm anymore. Fed with nothing but the rules for either Go, chess, or shogi, it managed to become superhuman at each of these games in less than 24 hours of practice.)

This is no doubt impressive. Yet it is also important not to overstate how much progress was achieved in 24 hours of practice. This is not, we should be clear, a story about innovation going from zero to superhuman in 24 hours, but rather the story of immense amounts of hardware developed over decades which has then been fed with an algorithm that has also been developed over many years by many people. And then, this highly refined algorithm running on specialized, cutting-edge hardware is unleashed to reach its dormant potential.

And this potential was, it should be noted, not vastly superior to the abilities of previous systems. In chess, for instance, AlphaZero beat the chess program Stockfish (although Stockfish author Tord Romstad notes that it was a version that was a year old and not running on optimal hardware) 25 times as white, 3 as black, and drew the remaining 72 times. Thus, it was significantly better, yet it still did not win most of the games. Similarly, in Go, AlphaZero won 60 games and lost 40, while in shogi it won 90 times, lost 8, and drew twice.

Thus, AlphaZero undoubtedly represented clear progress with respect to these games, yet not an enormous leap that rendered it unbeatable, and certainly not a leap made in a single day.
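To put the match figures cited above side by side, here is a minimal sketch that computes the share of games won outright, along with a points score. The win, loss, and draw counts are the ones quoted in the text; the class name `MatchResult` and the convention of counting a draw as half a point are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    """Win/loss/draw counts for AlphaZero against one opponent."""
    wins: int
    losses: int
    draws: int

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.draws

    @property
    def win_rate(self) -> float:
        """Fraction of games won outright."""
        return self.wins / self.games

    @property
    def score(self) -> float:
        """Fraction of points taken, counting a draw as half a point
        (an assumed convention, not one stated in the text)."""
        return (self.wins + 0.5 * self.draws) / self.games


# Counts as quoted in the text: chess wins are 25 as white plus 3 as black.
RESULTS = {
    "chess": MatchResult(wins=25 + 3, losses=0, draws=72),
    "Go": MatchResult(wins=60, losses=40, draws=0),
    "shogi": MatchResult(wins=90, losses=8, draws=2),
}

for game, result in RESULTS.items():
    print(f"{game}: won {result.win_rate:.0%} of {result.games} games, "
          f"score {result.score:.0%}")
```

Read this way, the chess result is a clear edge rather than a rout: 28 percent of the games were won outright, with the rest drawn.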

The road may still be long, but if this trend continues, developments in AI research will eventually lead to superhuman performance across all domains. As there is no reason to assume that humans have attained the maximal degree of intelligence (Section III), AI may soon after reaching our own level of intelligence surpass it.

Again, I would start by noting that human “intelligence” as our “ability to achieve goals” is strongly dependent on the state of our technology and culture at large, not merely our raw cognitive powers. And the claim made above that there is no reason to believe that humans have attained “the maximal degree of intelligence” seems, in this context, to mostly refer to our cognitive abilities rather than our ability to achieve goals in general. For with respect to our ability to achieve goals in general, it is clear that our abilities are not maximal, but indeed continually growing, largely as the result of better software and better machines. Thus, there is not a dichotomous relationship between “human abilities to achieve goals” and “our machines’ abilities to achieve goals”. And given that our ability to achieve goals is in many ways mostly limited by what our best technology can do — how fast our airplanes can fly, how fast our hardware is, how efficient our power plants are, etc. — it is not clear why some other agent or set of agents coming to control this technology (which is extremely difficult to imagine in the first place given the collaborative nature of the grosser infrastructure of this technology) should be vastly more capable of achieving goals than humans powered by/powering this technology.

As for AI surpassing “our own level of intelligence”, one can say that, at the level of cognitive tasks, machines have already been vastly superhuman in many respects for many years, for instance in virtually all mathematical calculations. And now also in many games, ranging from Atari to Go. Yet, as noted above, I would argue that, so far, such progress has constituted a clear increase in human “intelligence” in the general sense: it has increased our ability to achieve goals.

Nick Bostrom (2014) popularized the term superintelligence to refer to (AGI-)systems that are vastly smarter than human experts in virtually all respects. This includes not only skills that computers traditionally excel at, such as calculus or chess, but also tasks like writing novels or talking people into doing things they otherwise would not. Whether AI systems would quickly develop superhuman skills across all possible domains, or whether we will already see major transformations with [superhuman skills in] just a [few] such domains while others lag behind, is an open question.

I would argue that our machines already have superhuman skills in countless domains, and that this has indeed already given rise to major transformations, in one sense of this term at least.

Note that the definitions of “AGI” and “superintelligence” leave open the question of whether these systems would exhibit something like consciousness.

I have argued to the contrary in the chapter “Consciousness — Orthogonal or Crucial?” in Reflections on Intelligence.

This article focuses on the prospect of creating smarter-than-human artificial intelligence. For simplicity, we will use the term “AI” in a non-standard way here, to refer specifically to artificial general intelligence (AGI).

Again, I would flag that the meaning of the term general intelligence, or AGI, in this context is not clear. It was defined above as the ability that “enables the expert pursuit of virtually any task or objective”. Yet the ability of humans to achieve goals in general is, I would still argue, in large part the product of their technology and culture at large, and AGI, as Lukas uses it here, does not seem to refer to anything remotely like this, i.e. “the sum of the capabilities of our technology and culture”. Instead, it seems to refer to something much more narrow and singular — something akin to “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level”. I think this is worth highlighting.

The use of “AI” in this article will also leave open how such a system is implemented: While it seems plausible that the first artificial system exhibiting smarter-than-human intelligence will be run on some kind of “supercomputer,” our definition allows for alternative possibilities.

Again, what does “smarter-than-human intelligence” mean here? Machines can already do things that no unaided human can. It seems to refer to what I defined above: “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” — not the ability to achieve goals in general. And as for when a computer might have “(virtually) all the cognitive abilities that a human does”, it seems highly doubtful that any system will ever suddenly emerge with them all, given the modular, many-faceted nature of our minds. Instead, it seems much more likely that the gradual process of machines becoming better than humans at particular tasks will continue in its usual, gradual way. Or so I have argued.

The claim that altruists should focus on affecting AI outcomes is therefore intended to mean that we should focus on scenarios where the dominant force shaping the future is no longer (biological) human minds, but rather some outgrowth of information technology – perhaps acting in concert with biotechnology or other technologies. This would also e.g. allow for AI to be distributed over several interacting systems.

I think this can again come close to resembling a motte and bailey argument: it seems very plausible that the future will not be controlled mostly by what we would readily recognize as biological humans today. Yet to say that we should aim to impact such a future by no means implies that we should aim to impact, say, a small set of AI systems which might determine the entire future based on their goal functions (note: I am not saying Lukas has made this claim above, but this is often what people seem to consider the upshot of arguments of this kind, and also what it seems to me that Lukas is arguing below, in the rest of his essay). Indeed, the claim above is hardly much different from saying that we should aim to impact the long-term future. But Lukas seems to be moving back and forth between this general claim and the much narrower claim that we should focus on scenarios involving rapid growth acceleration driven mostly by software, which is the kind of scenario his essay seems almost exclusively focused on.

II. It is plausible that we create human-level AI this century

Even if we expect smarter-than-human artificial intelligence to be a century or more away, its development could already merit serious concern. As Sam Harris emphasized in his TED talk on risks and benefits of AI, we do not know how long it will take to figure out how to program ethical goals into an AI, solve other technical challenges in the space of AI safety, or establish an environment with reduced dangers of arms races. When the stakes are high enough, it pays to start preparing as soon as possible. The sooner we prepare, the better our chances of safely managing the upcoming transition.

I agree that it is worth preparing for high-stakes outcomes. But I think it is crucial that we get a clear sense of what these might look like, as well as how likely they are. A more fitting title for this line of argument might be “Altruists Should Prioritize Exploring Long-Term Future Outcomes, and Work out How to Best Influence Them”. To say that we should focus on “artificial intelligence”, which has a rather narrow meaning in most contexts (something akin to a software program), when we really mean that we should focus on the future of goal achieving systems in general is, I think, somewhat misleading.

The need for preparation is all the more urgent given that considerably shorter timelines are not out of the question, especially in light of recent developments. While timeline predictions by different AI experts span a wide range, many of those experts think it likely that human-level AI will be created this century (conditional on civilization facing no major disruptions in the meantime). Some even think it may emerge in the first half of this century: In a survey where the hundred most-cited AI researchers were asked in what year they think human-level AI is 10% likely to have arrived by, the median reply was 2024 and the mean was 2034. In response to the same question for a 50% probability of arrival, the median reply was 2050 with a mean of 2072 (Müller & Bostrom, 2016).

Again, it is important to be careful about definitions. For what is meant by “human-level AI” in this context? The authors of the cited source are careful to define what they mean: “Define a ‘high–level machine intelligence’ (HLMI) as one that can carry out most human professions at least as well as a typical human.”

And yet even this definition is quite vague, since “most human professions” is not a constant. A couple of hundred years ago, the profession of virtually all humans was farming, whereas only a couple percent of people in developed nations are employed in farming today. And this is not an idle point, because as machines become able to do jobs hitherto performed by humans, market forces will push humans to take new jobs that machines cannot do. And these new jobs may be those that require abilities that it will take many centuries for machines to acquire, if non-biological machines will indeed ever acquire them (this is not necessarily that implausible, as these abilities may include “looking like a real, empathetic biological human who ignites our brain circuits in the right ways”).

Thus, the questionnaire above seems poorly defined. And if it asks about most current human professions, its relevance appears quite limited, not least because the nature of each profession changes over time as well. A doctor today does not do all the same things a doctor did a hundred years ago, and the same will likely apply to doctors of the future. In other words, within existing professions, too, we can expect to see humans move toward doing the things that machines cannot do, or that we prefer machines not to do, even as machines become ever more capable.

While it could be argued that these AI experts are biased towards short timelines, their estimates should make us realize that human-level AI this century is a real possibility.

Yet we should keep in mind what they were asked about, and how relevant this is. Even if most (current?) human professions might be done by machines within this century, this does not imply that we will see “a system that possesses (virtually) all the cognitive abilities that a human does, and which possesses them at a similar or greater level” within this century. These are quite different claims.

The next section will argue that the subsequent transition from human-level AI to superintelligence could happen very rapidly after human-level AI actualizes. We are dealing with the decent possibility – e.g. above 15% likelihood even under highly conservative assumptions – that human intelligence will be surpassed by machine intelligence later this century, perhaps even in the next couple of decades. As such a transition will bring about huge opportunities as well as huge risks, it would be irresponsible not to prepare for it.

I want to flag, again, that it is not clear what “human-level AI” means. Lukas seemed to first define intelligence as something like “the ability to achieve goals in general”, which I have argued is not really what he means here (indeed, it is a rather different beast which I seek to examine in Reflections on Intelligence). And the two senses of the term “human-level intelligence” mentioned in the previous paragraph — “the ability to do most human professions” versus “possessing virtually all human cognitive abilities” — should not be conflated either. So it is in fact not clear what is being referred to here, although I believe it is the latter: “possessing virtually all human cognitive abilities at a similar or greater level”.

It should be noted that a potentially short timeline does not imply that the road to superintelligence is necessarily one of smooth progress: Metrics like Moore’s law are not guaranteed to continue indefinitely, and the rate of breakthrough publications in AI research may not increase (or even stay constant) either. The recent progress in machine learning is impressive and suggests that fairly short timelines of a decade or two are not to be ruled out. However, this progress could also be mostly due to some important but limited insights that enable companies like DeepMind to reap the low-hanging fruit before progress would slow down again. There are large gaps still to be filled before AIs reach human-level intelligence, and it is difficult to estimate how long it will take researchers to bridge these gaps. Current hype about AI may lead to disappointment in the medium term, which could bring about an “AI safety winter” with people mistakenly concluding that the safety concerns were exaggerated and smarter-than-human AI is not something we should worry about yet.

This seems true, yet it should also be conceded that a consistent lack of progress in AI would count as at least weak evidence against the claim that we should mainly prioritize what is usually referred to as “AI safety”. And more generally, we should be careful not to make the hypothesis “AI safety is the most important thing we could be working on” into an unfalsifiable one.

As for Moore’s law, not only is it “not guaranteed to continue indefinitely”, but we know, for theoretical reasons, that it must come to an end within a decade, at least in its original formulation concerning silicon transistors, and progress has indeed already been below the prediction of “the law” for some time now. And the same can be said about other aspects of hardware progress: it shows signs of tapering off.

If AI progress were to slow down for a long time and then unexpectedly speed up again, a transition to superintelligence could happen with little warning (Shulman & Sandberg, 2010). This scenario is plausible because gains in software efficiency make a larger comparative difference to an AI’s overall capabilities when the hardware available is more powerful. And once an AI develops the intelligence of its human creators, it could start taking part in its own self-improvement (see section IV).

I am not sure I understand the claims being made here. With respect to the first argument about gains in efficiency, the question is how likely we should expect such gains to be if progress has been slow for a long time. Other things being equal, such gains would seem less likely at a time when growth is slow than at a time when it is fast, and especially so if there is not much growth in hardware either, since hardware growth may in large part be driving growth in software.

I am not sure I follow the claim about AI developing the intelligence of its human creators, and then taking part in its own improvement, but I would just note, as Ramez Naam has argued, that AI, and our machines in general, are already playing a significant role in their own improvement in many ways. In other words, we already actively use our best, most capable technology to build the next generation of such technology.

Indeed, on a more general, yet also less directly relevant note, I would also add that we humans have in some sense been using our most advanced cognitive tools to build the next generation of such tools for hundreds of thousands of years. For over the course of evolution, individual humans have been using the best of their cognitive abilities to select the mates who had the best total package (they could get), of which cognitive abilities were a significant part. In this sense, the idea that “dumb and blind” evolution created intelligent humans is actually quite wrong. The real story is rather one of cognitive abilities actively selecting cognitive abilities (along with other things). A gradual design process over the course of which ever greater cognitive powers were “creating” and in turn created.

For AI progress to stagnate for a long period of time before reaching human-level intelligence, biological brains would have to have surprisingly efficient architectures that AI cannot achieve despite further hardware progress and years of humans conducting more AI research.

Looking over the past decades of AI research and progress, we can say that it indeed has been a fairly long period of time since computers first surpassed humans in the ability to do mathematical calculations, and yet there are still many things humans can do which computers cannot, such as having meaningful conversations with other humans, learning fast from a few examples, and experiencing and expressing feelings. And yet these examples still mostly pertain to cognitive abilities, and hence still overlook other abilities that are also relevant with respect to machines taking over human jobs (if we focus on that definition of “human-level AI”), such as having the physical appearance of a real, biological human, which does seem in strong demand in many professions, especially in the service industry.

However, as long as hardware progress does not come to a complete halt, AGI research will eventually not have to surpass the human brain’s architecture or efficiency anymore. Instead, it could become possible to just copy it: The “foolproof” way to build human-level intelligence would be to develop whole brain emulation (WBE) (Sandberg & Bostrom, 2008), the exact copying of the brain’s pattern of computation (input-output behavior as well as isomorphic internal states at any point in the computation) onto a computer and a suitable virtual environment. In addition to sufficiently powerful hardware, WBE would require scanning technology with fine enough resolution to capture all the relevant cognitive function, as well as a sophisticated understanding of neuroscience to correctly draw the right abstractions. Even though our available estimates are crude, it is possible that all these conditions will be fulfilled well before the end of this century (Sandberg, 2014).

Yet it should be noted that there are many who doubt that this is a foolproof way to build “human-level intelligence” (a term that in this context again seems to mean “a system with roughly the same cognitive abilities as the human brain”). Many doubt that it is even a possibility, and they do so for many different reasons (e.g. that a single, high-resolution scan of the brain is not enough to capture and enable an emulation of its dynamic workings; that a digital computer cannot adequately simulate the physical complexity of the brain; and that such a computer cannot solve the so-called binding problem).

Thus, it seems to stand as an open question whether mind uploading is indeed possible, let alone feasible (and it also seems that many people in the broader transhumanist community, who tend to be the people who write and talk the most about mind uploading, could well be biased toward believing it possible, as many of them seem to hope that it can save them from death).

The perhaps most intriguing aspect of WBE technology is that once the first emulation exists and can complete tasks on a computer like a human researcher can, it would then be very easy to make more such emulations by copying the original. Moreover, with powerful enough hardware, it would also become possible to run emulations at higher speeds, or to reset them back to a well-rested state after they performed exhausting work (Hanson, 2016).

Assuming, of course, that WBE will indeed be feasible in the first place. Also, it is worth noting that Robin Hanson himself is critical of the idea that WBEs would be able to create software that is superior to themselves very quickly; i.e. he expects a WBE economy to undergo “many doublings” before it happens.

Sped-up WBE workers could be given the task of improving computer hardware (or AI technology itself), which would trigger a wave of steeply exponential progress in the development of superintelligence.

This is an exceptionally strong claim that would seem in need of justification, and not least some specification, given that it is not clear what “steeply exponential progress in the development of superintelligence” refers to in this context. It hardly means “steeply exponential progress in the development of a super ability to achieve goals in general”, including in energy efficiency and energy harvesting. Such exponential progress is not, I submit, likely to follow from progress in computer hardware or AI technology alone. Indeed, as we saw above, such progress cannot happen with respect to the energy efficiency of most of our machines, as physical limits mean that it cannot double more than a couple of times.

But even if we understand it to be a claim about the abilities of certain particular machines and their cognitive abilities more narrowly, the claim is still a dubious one. It seems to assume that progress in computer hardware and AI technology is constrained chiefly by the amount of hours put into it by those who work on it directly, as opposed to also being significantly constrained by countless other factors, such as developments in other areas, e.g. in physics, production, and transportation, many of which imply limits on development imposed by factors such as hardware and money, not just the amount of human-like genius available.

For example, how much faster should we expect the hardware that AlphaZero was running on to have been developed and completed if a team of super-WBEs had been working on it? Would the materials used for the hardware have been dug up and transported significantly faster? Would they have been assembled significantly faster? Perhaps somewhat, yet hardly anywhere close to twice as fast. The growth story underlying many worries about explosive AI growth is quite detached from how we actually improve our machines, including AI (software and hardware) as well as the harvesting of the energy that powers it (Vaclav Smil: “Energy transitions are inherently gradual processes and this reality should be kept in mind when judging the recent spate of claims about the coming rapid innovative take-overs […]”). Such growth is the result of countless processes distributed across our entire economy. Just as nobody knows how to make a pencil, nobody, including the very best programmers, knows (more than a tiny part of) how to make better machines.

To get a sense of the potential of this technology, imagine WBEs of the smartest and most productive AI scientists, copied a hundred times to tackle AI research itself as a well-coordinated research team, sped up so they can do years of research in mere weeks or even days, and reset periodically to skip sleep (or other distracting activities) in cases where memory-formation is not needed. The scenario just described requires no further technologies beyond WBE and sufficiently powerful hardware. If the gap from current AI algorithms to smarter-than-human AI is too hard to bridge directly, it may eventually be bridged (potentially very quickly) after WBE technology drastically accelerates further AI research.

As far as I understand, much of the progress in machine learning in modern times was essentially due to modern hardware and computing power that made it possible to implement old ideas invented decades ago (of course then implemented with all the many adjustments and tinkering one cannot foresee from the drawing board). In other words, software progress was hardly the most limiting factor. Arguably, the limiting factor was rather that the economy just had not caught up to be able to make hardware advanced enough to implement these theoretical ideas successfully. And it also seems to me quite naive to think that better hardware design, and genius ideas about how to make hardware more generally, was and is a main limiting factor in our growth of computer hardware. Such progress tends to rest critically on other progress in other kinds of hardware and globally distributed production processes. Processes that no doubt can be sped up, yet hardly that significantly by advanced software alone, in large part because such progress is limited by the fact that many of the crucial processes involved in this progress, such as digging up, refining, and transporting materials, are physical processes that can only go so fast.

Beyond that, there is also an opportunity cost consideration that is ignored by the story of fast growth above. For the hardware and energy required for this team of WBEs could otherwise have been used to run other kinds of computations that could help further innovation, including those we already run on full steam to further progress — CAD programs, simulations, equation solving. And it is not clear that using all this hardware for WBEs would be a better use of hardware than would running these other programs, whose work may be considered a limiting factor to AI progress at a similar level as more “purely” human or human-like work is. Indeed, we should not expect engineers and companies to do these kinds of things with their computing resources if they were not among the most efficient things they could do with them. And even if WBEs are a better use of hardware for fast progress, it is far from clear that it would be that much better.

The potential for WBE to come before de novo AI means that – even if the gap between current AI designs and the human brain is larger than we thought – we should not significantly discount the probability of human-level AI being created eventually. And perhaps paradoxically, we should expect such a late transition to happen abruptly. Barring no upcoming societal collapse, believing that superintelligence is highly unlikely to ever happen requires not only confidence that software or “architectural” improvements to AI are insufficient to ever bridge the gap, but also that – in spite of continued hardware progress – WBE could not get off the ground either. We do not seem to have sufficient reason for great confidence in either of these propositions, let alone both.

Again, what does the term “superintelligence” refer to here? Above, it was defined as “(AGI-)systems that are vastly smarter than human experts in virtually all respects”. And given that AGI is defined as a general ability to pursue goals, and that “smart” here presumably means “better able to achieve goals”, one can say that the definition of superintelligence given here translates to “a system that pursues goals better than human experts in virtually all areas”. Yet we are already building systems that satisfy this definition of superintelligence. Our entire economy is already able to do tasks that no single human expert could ever accomplish. But superintelligence likely refers to something else here, something along the lines of: “a system that is vastly more cognitively capable than any human expert in virtually all respects”. And yet, even by this definition, we already have computer systems that can do countless cognitive tasks much better than any human, and the super system that is the union of all these systems can therefore, in many respects at least, be considered to have vastly superior cognitive abilities relative to humans. And systems composed of humans and technology are clearly vastly more capable than any human expert alone in virtually all respects.

In this sense, we clearly do have “superintelligence” already, and we are continually expanding its capabilities. And, with respect to worries about a FOOM takeover, it seems highly unlikely that a single, powerful machine could ever overtake and become more powerful than the entire collective that is the human-machine civilization. This is not to say that low-probability events should be dismissed, but they should be measured against other risks we could be focusing on.

III. Humans are not at peak intelligence

Again, it is important to be clear about what we mean by “intelligence”. Most cognitively advanced? Or best able to achieve goals in general? Humans extended by technology can clearly increase their intelligence, i.e. ability to achieve goals, significantly. We have done so consistently over the last few centuries, and we continue to do so today. And in a world where humans build this growing body of technology to serve their own ends, and in some cases build it to be provably secure, it is far from clear that some non-human system with much greater cognitive powers than humans (which, again, already exists in many domains) will also become more capable of achieving goals in general than humanity, given that it is surrounded by a capable super-system of technology designed for and by humans, controlled by humans, to serve their ends. Again, this is not to say that one should not worry about seemingly improbable risks — we definitely should — but merely that we should doubt the assumption that our making machines more cognitively capable will necessarily imply that they will be better able to achieve goals in general. Again, despite being related, these two senses of “intelligence” must not be confused.

It is difficult to intuitively comprehend the idea that machines – or any physical system for that matter – could become substantially more intelligent than the most intelligent humans. Because the intelligence gap between humans and other animals appears very large to us, we may be tempted to think of intelligence as an “on-or-off concept,” one that humans have and other animals do not. People may believe that computers can be better than humans at certain tasks, but only at tasks that do not require “real” intelligence. This view would suggest that if machines ever became “intelligent” across the board, their capabilities would have to be no greater than those of an intelligent human relying on the aid of (computer-)tools.

Again, we should be clear that the word “intelligence” here seems to mean “most cognitively capable” rather than “best able to achieve goals in general”. And the gap between the “intelligence”, as in the ability to achieve goals, of humans and other animals arguably does not appear very large when we compare individuals. Most other animals can do things that no single human can do, and to the extent we humans can learn to do things other animals naturally beat us at, e.g. lift heavier objects or traverse distances faster than speedy animals, we do so by virtue of technology, in essence the product of collective, cultural evolution.

And even with respect to cognitive abilities, one can argue that humans are not superior to other animals in a general sense. We do not have superior cognitive abilities with respect to echolocation, for example, much less long-distance navigation. Nor are humans superior when it comes to all aspects of short-term/working memory.

Measuring goal-achieving ability in general, as well as abilities to solve cognitive tasks in particular, along a single axis may be useful in some contexts, yet it can easily become meaningless when the systems being compared are not sufficiently similar.

But this view is mistaken. There is no threshold for “absolute intelligence.” Nonhuman animals such as primates or rodents differ in cognitive abilities a great deal, not just because of domain-specific adaptations, but also due to a correlational “g factor” responsible for a large part of the variation across several cognitive domains (Burkart et al., 2016). In this context, the distinction between domain-specific and general intelligence is fuzzy: In many ways, human cognition is still fairly domain-specific. Our cognitive modules were optimized specifically for reproductive success in the simpler, more predictable environment of our ancestors. We may be great at interpreting which politician has the more confident or authoritative body language, but deficient in evaluating whose policy positions will lead to better developments according to metrics we care about. Our intelligence is good enough or “general enough” that we manage to accomplish impressive feats even in an environment quite unlike the one our ancestors evolved in, but there are many areas where our cognition is slower or more prone to bias than it could be.

I agree with this. I would just note that “intelligence” here again seems to be referring to cognitive abilities, not the ability to achieve goals in general, and that we humans have expanded both over time via culture: our cognitive abilities, as measured by IQ, have increased significantly over the last century, while our ability to achieve goals in general has expanded much more still as we have developed ever more advanced technology.

Intelligence is best thought of in terms of a gradient. Imagine a hypothetical “intelligence scale” (inspired by part 2.1 of this FAQ) with rats at 100, chimpanzees at, say, 350, the village idiot at 400, average humans at 500 and Einstein at 750. Of course, this scale is open at the top and could go much higher.

Again, intelligence here seems to refer to cognitive abilities, not the ability to achieve goals in general. Einstein was likely not better at shooting hoops than the average human, or indeed more athletic in general (by all appearances), although he was much more cognitively capable, at least in some respects, than virtually all other humans.

A more elaborate critique of the intelligence scale mentioned above can be found in my post Chimps, Humans, and AI: A Deceptive Analogy.

To quote Bostrom (2014, p. 44): “Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization – a niche we filled because we got there first, not because we are in any sense optimally adapted to it.”

Again, the words “smart” and “stupid” here seem to pertain to cognitive abilities, not the ability to achieve goals in general. And this phrasing is misleading, as it seems to presume that cognitive ability is all it takes to build an advanced civilization, which is not the case. In fact, humans are not the species with the biggest brain on the planet, or even the species with the biggest cerebral cortex; indeed, long-finned pilot whales have more than twice as many neocortical neurons.

What we are, however, is a species with a lot of unique tools — fine motor hands, upright walk, vocal cords, a large brain with a large prefrontal cortex, etc. — which together enabled humans to (gradually build a lot of tools with which they could) take over the world. Remove just one of these unique tools from all of humanity, and we would be almost completely incapable. And this story of a multiplicity of components that are all necessary yet insufficient for the maintenance and growth of human civilization is even more true today, where we have countless external tools — trucks, the internet, computers, screwdrivers, etc. — without which we could not maintain our civilization. And the necessity of all these many different components seems overlooked by the story that views advanced cognitive abilities as the sole driver, or near enough, of growth and progress in the ability to achieve goals in general. This, I would argue, is a mistake.

Thinking about intelligence as a gradient rather than an “on-or-off” concept prompts a Copernican shift of perspective. Suddenly it becomes obvious that humans cannot be at the peak of possible intelligence. On the contrary, we should expect AI to be able to surpass us in intelligence just like we surpass chimpanzees.

Depending on what we mean by the word “intelligence”, one can argue that computers have already surpassed humans. If we define “intelligence” to be “that which is measured by an IQ test”, for example, then computers have already been better than humans in at least some of these tests for a few years now.

In terms of our general ability to achieve goals, however, it is not clear that computers will so readily surpass humans, in large part because we do not aim to build them to be better than humans in many respects. Take self-repair, for example, which is something human bodies, just like virtually all animal bodies, are in a sense designed to do — indeed, most of our self-repair mechanisms are much older than we are as a species. Evolution has built humans to be competent and robust autonomous systems who do not for the most part depend on a global infrastructure to repair their internal parts. Our computers, in contrast, are generally not built to be self-repairing, at least not at the level of hardware. Their notional thrombocytes are entirely external to themselves, in the form of a thousand and one specialized tools and humans distributed across the entire economy. And there is little reason to think that this will change, as there is little incentive to create self-repairing computers. We are not aiming to build generally able, human-independent computers in this sense.

Biological evolution supports the view that AI could reach levels of intelligence vastly beyond ours. Evolutionary history arguably exhibits a weak trend of lineages becoming more intelligent over time, but evolution did not optimize for intelligence (only for goal-directed behavior in specific niches or environment types). Intelligence is metabolically costly, and without strong selection pressures for cognitive abilities specifically, natural selection will favor other traits. The development of new traits always entails tradeoffs or physical limitations: If our ancestors had evolved to have larger heads at birth, maternal childbirth mortality would likely have become too high to outweigh the gains of increased intelligence (Wittman & Wall, 2007). Because evolutionary change happens step-by-step as random mutations change the pre-existing architecture, the changes are path dependent and can only result in local optima, not global ones.

Here we see how the distinction between “intelligence as cognitive abilities” and “intelligence as the ability to achieve goals” is crucial. Indeed, the example provided above clearly proves the point that advanced cognitive abilities are often not the most relevant thing for achieving goals, since the goal of surviving and reproducing was often not best achieved, as Lukas hints, with the best cognitive abilities. Often it was better achieved with longer teeth or stronger muscles. Or a prettier face.

So the question is: why do we think that advanced cognitive abilities are, to a first approximation, identical with the ability to achieve goals? And, more importantly, why do we imagine that this lesson about the sub-optimality of spending one’s limited resources on better cognitive abilities does not still hold today? Why should cognitive abilities be the sole optimal thing, or near enough, to spend all one’s resources on in order to best achieve a broad range of goals? I would argue that it is not. It was not optimal in the past (with respect to the goal of survival), and it does not seem to be optimal today either.

It would be a remarkable coincidence if evolution had just so happened to stumble upon the most efficient way to assemble matter into an intelligent system.

But it would be less remarkable if it had happened to assemble matter into a system that is broadly capable of achieving a broad range of goals, and which another system, especially one that is not built over a billion-year process to be robust and highly autonomous, cannot readily outdo in terms of autonomous function. It would also not be that remarkable if biological humans, functioning within a system built by and for biological humans, happened to be among the most capable systems within such a system, not least given all the legal, social and political aspects this system entails.

Beyond that, one can dispute the meaning of “intelligent system” in the quote above, but if we look at the intelligent system that is our civilization at large, one can say that the optimization going on at this level is not coincidental but indeed deliberate, often aiming toward peak efficiency. Thus, in this regard as well, we should not be too surprised if our current system is quite efficient and competent relative to the many constraints we are facing.

But let us imagine that we could go back to the “drawing board” and optimize for a system’s intelligence without any developmental limitations. This process would provide the following benefits for AI over the human brain (Bostrom, 2014, p. 60-61):

Free choice of substrate: Signal transmission with computer hardware is millions of times faster than in biological brains. AI is not restricted to organic brains, and can be built on the substrate that is overall best suited for the design of intelligent systems.

Supersizing: Machines have (almost) no size-restrictions. While humans with elephant-sized brains would run into developmental impossibilities, (super)computers already reach the size of warehouses and could in theory be built even bigger.

No cognitive biases: We should be able to construct AI in a way that uses more flexible heuristics, and always the best heuristics for a given context, to prevent the encoding or emergence of substantial biases. Imagine the benefits if humans did not suffer from confirmation bias, overconfidence, status quo bias, etc.!

Modular superpowers: Humans are particularly good at tasks for which we have specialized modules. For instance, we excel at recognizing human faces because our brains have hard-wired structures that facilitate that facial recognition in particular. An artificial intelligence could have many more such specialized modules, including extremely useful ones like a module for programming.

Editability and copying: Software on a computer can be copied and edited, which facilitates trying out different variations to see what works best (and then copying it hundreds of times). By contrast, the brain is a lot messier, which makes it harder to study or improve. We also lack correct introspective access to the way we make most of our decisions, which is an important advantage that (some) AI designs could have.

Superior architecture: Starting anew, we should expect it to be possible to come up with radically more powerful designs than the patchwork architecture that natural selection used to construct the human brain. This difference could be enormously significant.

It should be noted that computers already 1) can be built with a wide variety of substrates, 2) can be supersized, 3) do not tend to display cognitive biases, 4) have modular superpowers, 5) can be edited and copied (or at least software readily can), 6) can be made with any architecture we can come up with. All of these advantages exist and are being exploited already, just not as much as they can be. And it is not clear why we should expect future change to be more radical than the change we have seen in past decades in which we have continually built ever more competent computers which can do things that no human can by exploiting these advantages.

With regard to the last point, imagine we tried to optimize for something like speed or sight rather than intelligence. Even if humans had never built anything faster than the fastest animal, we should assume that technological progress – unless it is halted – would eventually surpass nature in these respects. After all, natural selection does not optimize directly for speed or sight (but rather for gene copying success), making it a slower optimization process than those driven by humans for this specific purpose. Modern rockets already fly at speeds of up to 36,373 mph, which beats the peregrine falcon’s 240 mph by a huge margin. Similarly, eagle vision may be powerful, but it cannot compete with the Hubble space telescope. (General) intelligence is harder to replicate technologically, but natural selection did not optimize for intelligence either, and there do not seem to be strong reasons to believe that intelligence as a trait should differ categorically from examples like speed or sight, i.e., there are as far as we know no hard physical limits that would put human intelligence at the peak of what is possible.

Again, what is being referred to by the word “intelligence” here seems to be cognitive abilities, not the ability to achieve goals in general. And with respect to cognitive abilities in particular, it is clear that computers already beat humans by a long shot in countless respects. So the point Lukas is making here is clearly true.

Another way to develop an intuition for the idea that there is significant room for improvement above human intelligence is to study variation in humans. An often-discussed example in this context is the intellect of John von Neumann. Von Neumann was not some kind of an alien, nor did he have a brain twice as large as the human average. And yet, von Neumann’s accomplishments almost seem “superhuman.” The section in his Wikipedia entry that talks about him having “founded the field of Game theory as a mathematical discipline” – an accomplishment so substantial that for most other intellectual figures it would make up most of their Wikipedia page – is just one out of many of von Neumann’s major achievements.

There are already individual humans (with normal-sized brains) whose intelligence vastly exceeds that of the typical human. So just how much room is there above their intelligence? To visualize this, consider for instance what could be done with an AI architecture more powerful than the human brain running on a warehouse-sized supercomputer.

A counterpoint to this line of reasoning can be found by contemplating chess ratings. The skills of chess players are usually rated via the so-called Elo rating system, which measures the relative skills of different players against each other. A beginner will usually have a rating around 800, whereas a rating in the range 2000-2199 ranks one as a chess “Expert”, and a rating of 2400 and above renders one a “Senior Master”. The highest rating ever achieved by a human was 2882 by Magnus Carlsen. Surely, this amount of variation must be puny given that all the humans who have ever played chess have roughly the same brain sizes and structures. And yet it turns out that human variation in chess ability is in fact quite enormous in an absolute sense.

For example, it took more than four decades from the time computers were first able to beat a chess beginner (the 1950s) until they were able to beat the very best human player (1997 officially). Thus, the span from ordinary human beginner to the best human expert was more than four decades of progress in hardware (i.e. a million times more computing power) and software. That seems quite a wide range.
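To give a rough feel for what "a million times more computing power" over four decades amounts to, here is a minimal sketch of the arithmetic. The 40-year figure is my own rounding of "more than four decades", so the result should be read as a ballpark only.

```python
import math

# Figures taken from the text above.
compute_growth_factor = 1_000_000  # "a million times more computing power"
years_elapsed = 40                 # assumption: "more than four decades" rounded to 40

# Number of times computing power had to double to grow a million-fold.
doublings = math.log2(compute_growth_factor)

# Average time per doubling implied by the two figures above.
years_per_doubling = years_elapsed / doublings

print(f"doublings of computing power: {doublings:.1f}")
print(f"implied years per doubling:   {years_per_doubling:.2f}")
```

In other words, on these assumptions the step from beginner level to world champion level took roughly twenty doublings of computing power, in addition to the accompanying software progress.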

And yet the range seems even broader if we consider the ultimate limits of optimal chess play. For one may argue that the fact that it took computers a fairly long time to go from the average human level to the level of the best human does not mean that the best human is not still ridiculously far from the best a computer could be in theory. Surprisingly, however, this latter distance does in fact seem quite small, at least in one sense. For estimates suggest that the best possible chess machine would have an Elo rating around 3600, which means that the relative distance between the best possible computer and the best human is only around 700 Elo points, implying that the distance between the best human and a chess “Expert” is similar to the distance between the best human and the best possible chess brain, while the distance between an ordinary human beginner and the best human is far greater.
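The comparison of distances above can be made concrete with a small sketch. The rating figures are the ones cited in the text, except for the "Expert" value of 2100, which is my own assumption (the midpoint of the 2000-2199 range). The expected-score function is the standard logistic formula used in the Elo system, where a score counts a win as 1 and a draw as 0.5; applying it to gaps this large is an extrapolation, so the outputs are only illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings cited in the text; EXPERT is an assumed midpoint of the 2000-2199 range.
BEGINNER = 800
EXPERT = 2100
BEST_HUMAN = 2882
BEST_POSSIBLE = 3600  # the estimate for an optimal chess machine cited above

# Rating gaps between the levels being compared.
gaps = {
    "best possible vs best human": BEST_POSSIBLE - BEST_HUMAN,
    "best human vs expert": BEST_HUMAN - EXPERT,
    "best human vs beginner": BEST_HUMAN - BEGINNER,
}

for label, gap in gaps.items():
    print(f"{label}: {gap} Elo points")

# Expected scores for the stronger side in each pairing.
print(f"best possible vs best human: {expected_score(BEST_POSSIBLE, BEST_HUMAN):.4f}")
print(f"best human vs expert:        {expected_score(BEST_HUMAN, EXPERT):.4f}")
print(f"best human vs beginner:      {expected_score(BEST_HUMAN, BEGINNER):.6f}")
```

On these figures, the first two gaps are of similar size (a bit over 700 points each), while the gap between the best human and a beginner is close to three times as large.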

It seems plausible that a similar pattern obtains with respect to many other complex cognitive tasks. Indeed, it seems plausible that many of our abilities, especially those we evolved to do well, such as our ability to interact with other humans, have an “Elo rating” quite close to the notional maximum level for most humans.

IV. The transition from human to superhuman intelligence could be rapid

Perhaps the people who think it is unlikely that superintelligent AI will ever be created are not objecting to it being possible in principle. Maybe they think it is simply too difficult to bridge the gap from human-level intelligence to something much greater. After all, evolution took a long time to produce a species as intelligent as humans, and for all we know, there could be planets with biological life where intelligent civilizations never evolved.4 But considering that there could come a point where AI algorithms start taking part in their own self-improvement, we should be more optimistic.

We should again be clear that the term “superintelligent AI” seems to refer to a system with greater cognitive abilities, across a wide range of tasks, than humans. As for “a point where AI algorithms start taking part in their own self-improvement”, it should be noted, again, that we already use our best software and hardware in the process of developing better software and hardware. True, they are only a part of a process that involves far more elements, yet this is true of most everything that we produce and improve in our economy: many contributions drawn from and distributed across our economy at large are required. And we have good reason to believe that this will continue to be true of the construction of more capable machines in the future.

AIs contributing to AI research will make it easier to bridge the gap, and could perhaps even lead to an acceleration of AI progress to the point that AI not only ends up smarter than us, but vastly smarter after only a short amount of time.

Again, we already use our best software and hardware to contribute to AI research, and yet we do not appear to see acceleration in the growth of our best supercomputers. In fact, in terms of their computing power, we see a modest decline in the growth rate.

Several points in the list of AI advantages above – in particular the advantages derived from the editability of computer software or the possibility for modular superpowers to have crucial skills such as programming – suggest that AI architectures might both be easier to further improve than human brains, and that AIs themselves might at some point become better at actively developing their own improvements.

Again, computers are already “easier to further improve than human brains” in these ways, and our hardware and software are already among the most active parts in their own improvement. So why should we expect to see a different pattern in the future from the pattern we see today of gradual, slightly declining growth?

If we ever build a machine with human-level intelligence, it should then be comparatively easy to speed it up or make tweaks to its algorithm and internal organization to make it more powerful. The updated version, which would at this point be slightly above human-level intelligence, could be given the task of further self-improvement, and so on until the process runs into physical limits or other bottlenecks.

Or better yet than “human-level intelligence” would be if we built software that was critical for the further development of more powerful computers. And we in fact already have such software, many different kinds of it, and yet it is not that easy to simply “speed it up or make tweaks to its algorithm and internal organization to make it more powerful”. More generally, as noted above, we already use our latest, updated technology to improve our latest, updated technology, and the result is not rapid, runaway growth.

Perhaps self-improvement does not have to require human-level general intelligence at all. There may be comparatively simple AI designs that are specialized for AI science and (initially) lack proficiency in other domains. The theoretical foundations for an AI design that can bootstrap itself to higher and higher intelligence already exist (Schmidhuber, 2006), and it remains an empirical question where exactly the threshold is after which AI designs would become capable of improving themselves further, and whether the slope of such an improvement process is steep enough to go on for multiple iterations.

Again, I would just reiterate that computers are already an essential component in the process of improving computers. And the fact that humans who need to sleep and have lunch breaks are also part of this improvement process does not seem a main constraint on it compared to other factors, such as physical limitations implied by transportation and the assemblage of materials. Oftentimes in modern research, computers run simulations at their maximum capacity while the humans do their sleeping and lunching, in which case these resting activities (through which humans often get their best ideas) do not limit progress much at all, whereas the available computing power does.

For the above reasons, it cannot be ruled out that breakthroughs in AI could at some point lead to an intelligence explosion (Good, 1965; Chalmers, 2010), where recursive self-improvement leads to a rapid acceleration of AI progress. In such a scenario, AI could go from subhuman intelligence to vastly superhuman intelligence in a very short timespan, e.g. in (significantly) less than a year.

“It cannot be ruled out” can be said of virtually everything; the relevant question is how likely we should expect these possibilities to be. Beyond that, it is also not clear what would count as a “rapid acceleration of AI progress”, and thus what exactly it is that cannot be ruled out. AI going from subhuman performance to vastly greater than human performance in a short amount of time has already been seen in many different domains, including Go most recently.

But if one were to claim, to take a specific claim, that it cannot be ruled out that an AI system will improve itself so much that it can overpower human civilization and control the future, then I would argue that the reasoning above does not support considering this a likely possibility, i.e. something that is more likely to happen than, say, one in a thousand.

While the idea of AI advancing from human-level to vastly superhuman intelligence in less than a year may sound implausible, as it violates long-standing trends in the speed of human-driven development, it would not be the first time where changes to the underlying dynamics of an optimization process cause an unprecedented speed-up. Technology has been accelerating ever since innovations (such as agriculture or the printing press) began to feed into the rate at which further innovations could be generated.5

In the endnote “5” referred to above, Lukas writes:

[…] Finally, over the past decades, many tasks, including many areas of research and development, have already been improved through outsourcing them to machines – a process that it is still ongoing and accelerating.

That this process of outsourcing of tasks is accelerating seems in need of justification. We have been outsourcing tasks to machines in various ways and at a rapid pace for at least two centuries now, and so it is not a trivial claim that this process is accelerating.

Compared to the rate of change we see in biological evolution, cultural evolution broke the sound barrier: It took biological evolution a few million years to improve on the intelligence of our ape-like ancestors to the point where they became early hominids. By contrast, technology needed little more than ten thousand years to progress from agriculture to space shuttles.

And I would argue that the reason technology could grow so fast is that an ever larger system of technology, consisting of an ever greater variety of tools, was contributing to it through recursive self-improvement; human genius was but one important component. And I think we have good reason to think the same about the future.

Just as inventions like the printing press fed into – and significantly sped up – the process of technological evolution, rendering it qualitatively different from biological evolution, AIs improving their own algorithms could cause a tremendous speed-up in AI progress, rendering AI development through self-improvement qualitatively different from “normal” technological progress.

I think there is very little reason to believe this story. Again, we already use our best machines to build the next generation of machines. “Normal” technological progress of the kind we see today already depends on computers running programs created to optimize future technology as efficiently as they can, and it is far from clear that running a more human kind of program would be a more efficient use of resources toward this end.

It should be noted, however, that while the arguments in favor of a possible intelligence explosion are intriguing, they nevertheless remain speculative. There are also some good reasons why some experts consider a slower takeoff of AI capabilities more likely. In a slower takeoff, it would take several years or even decades for AI to progress from human to superhuman intelligence.

Again, the word “intelligence” here seems to refer to cognitive abilities, not the ability to achieve goals in general. And it is again not clear what it means to say that it might “take several years or even decades for AI to progress from human to superhuman intelligence”, since computers have already been more capable than humans at a wide variety of cognitive tasks for many decades. So I would argue that this statement suffers from a lack of conceptual clarity.

Unless we find decisive arguments for one scenario over the other, we should expect both rapid and comparably slow takeoff scenarios to remain plausible. It is worth noting that because “slow” in this context also includes transitions on the order of ten or twenty years, it would still be very fast practically speaking, when we consider how much time nations, global leaders or the general public would need to adequately prepare for these changes.

To reiterate the statement I just made, it is not clear what a fast takeoff means in this context given that computers are already vastly superior to humans in many domains, and probably will continue to beat humans at ever more tasks before they come close to being able to do virtually all cognitive tasks humans can do. So what it is we are supposed to consider plausible is not entirely clear. As for whether it is plausible for rapid progress to occur over a wide range of cognitive tasks such that an AI system becomes able to take over the world, I would argue that we have not seen arguments to support this claim.

V. By default, superintelligent AI would be indifferent to our well-being

The typical mind fallacy refers to the belief that other minds operate the same way our own does. If an extrovert asks an introvert, “How can you possibly not enjoy this party; I talked to half a dozen people the past thirty minutes and they were all really interesting!” they are committing the typical mind fallacy.

When envisioning the goals of smarter-than-human artificial intelligence, we are in danger of committing this fallacy and projecting our own experience onto the way an AI would reason about its goals. We may be tempted to think that an AI, especially a superintelligent one, will reason its way through moral arguments6 and come to the conclusion that it should, for instance, refrain from harming sentient beings. This idea is misguided, because according to the intelligence definition we provided above – which helps us identify the processes likely to shape the future – making a system more intelligent does not change its goals/objectives; it only adds more optimization power for pursuing those objectives.

Again, we need to be clear about what “smarter-than-human artificial intelligence” means here. In this case, we seem to be talking about a fairly singular and coherent system, a “mind” of sorts — as opposed to a thousand and one different software programs that do their own thing well — and hence in this regard it seems that the term “smarter-than-human artificial intelligence” here refers to something that is quite similar to a human mind. We are seemingly also talking about a system that “would reason about its goals”.

It seems worth noting that this is quite different from how we think about contemporary software programs, even including the most advanced ones such as AlphaZero and IBM’s Watson, which we are generally not tempted to consider “minds”. Expecting competent software programs of the future to be like minds may itself be to commit a typical mind fallacy of sorts, or perhaps just a mind fallacy. It is conceivable that software will continue to outdo humans at many tasks without acquiring anything resembling what we usually conceive of as a mind.

Another thing worth clarifying is what we mean by the term “by default” here. Does it refer to what AI systems will be built to do by our economy in the absence of altruistic intervention? If “by default” means that which our economy will naturally tend to produce, it seems likely that future AI indeed will be programmed to not be indifferent, at least in a behavioral sense, to human well-being “by default”. Indeed, it seems a much greater risk that future software systems will be constructed to act in a way that exclusively benefits human beings and is indifferent toward everything else. In other words, that they will share our speciesist bias, with catastrophic consequences ensuing.

My point here is merely that, just as it is almost meaningless to claim that biological minds will not care about our well-being by default, as it lacks any specification of what “by default” means — given what evolutionary history? — so is it highly unclear what “by default” means when we are talking about machines created by humans. It seems to assume that we are going to suddenly have a lot of “undirected competence” delivered to us which does not itself come with countless sub-goals and adaptations built into it to attain ends desired by human programmers, and, perhaps to a greater extent, markets.

To give a silly example, imagine that an arms race between spam producers and companies selling spam filters leads to increasingly more sophisticated strategies on both sides, until the side selling spam filters has had it and engineers a superintelligent AI with the sole objective to minimize the number of spam emails in their inboxes.

Again, I would flag that it is not clear what “superintelligent AI” means here. Does it refer to a system that is better able to achieve goals across the board than humans? Or merely a system with greater cognitive abilities than any human expert in virtually all domains? Even if it is merely the latter, it is unlikely that a system developed by a single team of software developers will have much greater cognitive competences across the board than the systems developed by other competing teams, let alone those developed by the rest of the economy combined.

With its level of sophistication, the spam-blocking AI would have more strategies at its disposal than normal spam filters.

Yet how many more? What could account for this large jump in capabilities from previous versions of spam filters? What is hinted here seems akin to the sudden emergence of a Bugatti in the Stone Age. It does not seem credible.

For instance, it could try to appeal to human reason by voicing sophisticated, game-theoretic arguments against the negative-sum nature of sending out spam. But it would be smart enough to realize the futility of such a plan, as this naive strategy would backfire because some humans are trolls (among other reasons). So the spam-minimizing AI would quickly conclude that the safest way to reduce spam is not by being kind, but by gaining control over the whole planet and killing everything that could possibly try to trick its spam filter.

First of all, it is by no means clear that this would be “the safest way” to minimize spam. Indeed, I would argue that trying to gain control in this way would be a very bad action in expectation with respect to the goal of minimizing spam.

But even more fundamentally, the scenario above seems to assume that it would be much easier to build a system with the abilities to take over the world than it would be to properly instantiate the goals we want it to achieve. For instance, in the case of earlier versions of AlphaZero, these were all equally aligned with the goal of winning at Go. The hard problem was to make them more capable of doing so. The assumption that the situation would be inverted with respect to future goal implementation seems to me unwarranted. Not because the goals are necessarily easy to instantiate, but because the competences in question appear extremely difficult to create. The scenario described above seems to ignore this consideration, and instead assumes that the default scenario is that we will suddenly get advanced machines with a lot of competence, but where we do not know how to direct this competence toward doing what we want it to, as opposed to gradually directing and integrating these competences as they are (gradually) acquired. Beyond that, on a more general note, I think many aspiring effective altruists who worry about AI safety tend to underestimate the extent to which computer programmers are already focused on making software do what they intend it to.

Moreover, the scenario considered here also seems to assume that it would be relatively easy to make a competent machine optimize a particular goal insistently, whereas I would argue that this is extremely difficult. In other words, not only do I think it is extremely difficult to create the competences in question, as noted above, but I also think it is extremely difficult to orient all these competences, not just a few subroutines, toward insistently accomplishing some perverse goal. For this reason too, I think one should be highly skeptical of scenarios of this kind.

The AI in this example may fully understand that humans would object to these actions on moral grounds, but human “moral grounds” are based on what humans care about – which is not the minimization of spam! And the AI – whose whole decision architecture only selects for actions that promote the terminal goal of minimizing spam – would therefore not be motivated to think through, let alone follow our arguments, even if it could “understand” them in the same way introverts understand why some people enjoy large parties.

I think this is inaccurate. Any goal-oriented agent would be motivated to think through these things for the same reason that we humans are motivated to think through what those who disagree with us morally would say and do: because it impacts how we ourselves can act effectively toward our goals (this, we should be honest, is also often why humans think about the views and arguments made by others; not because of a deep yearning for truth and moral goodness but for purely pragmatic and selfish reasons). Thus, it makes sense to be mindful of those things, especially given that one has imperfect information and an imperfect ability to predict the future, no matter how “smart” one is.

The typical mind fallacy tempts us to conclude that because moral arguments appeal to us,7 they would appeal to any generally intelligent system. This claim is after all already falsified empirically by the existence of high-functioning psychopaths. While it may be difficult for most people to imagine how it would feel to not be moved by the plight of anyone but oneself, this is nothing compared to the difficulties of imagining all the different ways that minds in general could be built. Eliezer Yudkowsky coined the term mind space to refer to the set of all possible minds – including animals (of existing species as well as extinct ones), aliens, and artificial intelligences, as well as completely hypothetical “mind-like” designs that no one would ever deliberately put together. The variance in all human individuals, throughout all of history, only represents a tiny blob in mind space.

Yes, but this does not mean that the competences of human minds only span a tiny range of the notional “competence range” of various abilities. As we saw in the example of chess above, humans span a surprisingly large range, and the best humans are surprisingly close to the best mind possible. And with respect to the competences required for navigating within a world built by and for humans, it is not that unreasonable to believe that, on a continuum that measures competence across these many domains with a single measure, we are probably quite high and quite difficult to beat. This is not arrogance. It is merely to acknowledge the contingent structure of our civilization, and the fact that it is adapted to many contingent features of the human organism in general, including the human mind in particular.

Some of the minds outside this blob would “think” in ways that are completely alien to us; most would lack empathy and other (human) emotions for that matter; and many of these minds may not even relevantly qualify as “conscious.”

Most of these minds would not be moved by moral arguments, because the decision to focus on moral arguments has to come from somewhere, and many of these minds would simply lack the parts that make moral appeals work in humans. Unless AIs are deliberately designed8 to share our values, their objectives will in all likelihood be orthogonal to ours (Armstrong, 2013).

Again, an agent trying to achieve goals in our world need not be moved by moral arguments in an emotional sense in order to pay attention to them and the preferences of humans more generally, and to choose to avoid causing chaos. Beyond that, the question is why we should expect future software designed by humans not to be “deliberately designed to share our values”. And what marginal difference should we expect altruists to be able to make here? And how would this influence best be achieved?

VI. AIs will instrumentally value self-preservation and goal preservation

Even though AI designs may differ radically in terms of their top-level goals, we should expect most AI designs to converge on some of the same subgoals. These convergent subgoals (Omohundro, 2008; Bostrom, 2012) include intelligence amplification, self-preservation, goal preservation and the accumulation of resources. All of these are instrumentally very useful to the pursuit of almost any goal. If an AI is able to access the resources it needs to pursue these subgoals, and does not explicitly have concern for human preferences as (part of) its top-level goal, its pursuit of these subgoals is likely to lead to human extinction (and eventually space colonization; see below).

Again, what does “AI design” refer to in this context? Presumably a machine that possesses most of the cognitive abilities a human does to a similar or greater extent, and, on top of that, this machine is in some sense highly integrated into something akin to a coherent unified mind subordinate to a few supreme “top-level goals”. Thus, when Lukas writes “most AI designs” above, he is in fact referring to most systems that meet a very particular definition of “AI”, and one which I strongly doubt will be anywhere close to the most prevalent source of “machine competence” in the future (note that this is not to say that software, as well as our machines in general, will not become ever more competent in the future, but merely that such greater competences may not be subordinate to one goal to rule them all, or a few for that matter).

Beyond that, the claim that such a capable machine of the future seeking to achieve these subgoals is likely to lead to human extinction is a very strong claim that is not supported here, nor in the papers cited. More on this below.

AI safety work refers to interdisciplinary efforts to ensure that the creation of smarter-than-human artificial intelligence will result in excellent outcomes rather than disastrous ones. Note that the worry is not that AI would turn evil, but that indifference to suffering and human preferences will be the default unless we put in a lot of work to ensure that AI is developed with the right values.

Again, I would take issue with this “default” claim, as I would argue that “a lot of work” is exactly what we should expect to be done to ensure that future software will do what humans want it to. And the question is, again, how much of a difference altruists should expect to make here, as well as how to best make it.

VI.I Intelligence amplification

Increasing an agent’s intelligence improves its ability to efficiently pursue its goals. All else equal, any agent has a strong incentive to amplify its intelligence. A real-life example of this convergent drive is the value of education: Learning important skills and (thinking-)habits early in life correlates with good outcomes. In the AI context, intelligence amplification as a convergent drive implies that AIs with the ability to improve their own intelligence will do so (all else equal). To self-improve, AIs would try to gain access to more hardware, make copies of themselves to increase their overall productivity, or devise improvements to their own cognitive algorithms.

Again, what does the word “intelligence” mean in this context? Above, it was defined as “the ability to achieve goals in a wide range of environments”, which means that what is being said here reduces to the tautological claim that increasing an agent’s ability to achieve goals improves its ability to achieve goals. If one defines “intelligence” to refer to cognitive abilities, however, the claim becomes less empty. Yet it also becomes much less obvious, especially if one thinks in terms of investments of marginal resources, as it is questionable whether investing in greater cognitive abilities (as opposed to a prettier face or stronger muscles) is the best investment one can make with respect to the goal of achieving goals “in general”.

On a more general note, I would argue that “intelligence amplification”, as in “increasing our ability to achieve goals”, is already what we collectively do in our economy to a great extent, although this increase is, of course, much broader than one merely oriented toward optimizing cognitive abilities. We seek to optimize materials, supply chains, transportation networks, energy efficiency, etc. And it is not clear why greater machine capabilities should speed up this growth process significantly more in the future than they have in the past, where more capable machines also helped grow the economy in general, as well as increase the capability of machines in particular.

More broadly, intelligence amplification also implies that an AI would try to develop all technologies that may be of use to its pursuits.

Yet should we expect such “an AI” to be better able to develop “all technologies that may be of use to its pursuits” than entire industries currently dedicated to it, let alone our entire economy? Indeed, should we even expect it to contribute significantly, i.e. double current growth rates across the board? I would argue that this is most dubious.

I.J. Good, a mathematician and cryptologist who worked alongside Alan Turing, asserted that “the first ultraintelligent machine is the last invention that man need ever make,” because once we build it, such a machine would be capable of developing all further technologies on its own.

To say that a single machine would be able to develop all further technologies on its own is, I submit, unsound. For what does “on its own” mean here? “On its own” independently of the existing infrastructure of machines run by humans? Or “on its own” as in taking over this entire infrastructure? And how exactly could such a take-over scenario occur without destroying the productivity of this system? None of these scenarios seem plausible.

VI.II Goal preservation

AIs would in all likelihood also have an interest in preserving their own goals. This is because they optimize actions in terms of their current goals, not in terms of goals they might end up having in the future.

This again seems to assume that we will create highly competent systems that will be subordinate to a single explicit goal, or a few such goals, for which they will insistently optimize all their actions. Why should we believe this?

Another critical note of mine on this idea quoted from elsewhere:

Stephen Omohundro (Omohundro, 2008) argues that a chess-playing robot with the supreme goal of playing good chess would attempt to acquire resources to increase its own power and work to preserve its own goal of playing good chess. Yet in order to achieve such complex subgoals, and to even realize they might be helpful with respect to achieving the ultimate goal, this robot will need access to, and be built to exercise advanced control over, an enormous host of intellectual tools and faculties. Building such tools is extremely hard and requires many resources, and harder still, if at all possible, is it to build them so that they are subordinate to a single supreme goal. And even if all this is possible, it is far from clear that access to these many tools would not enable – perhaps even force – this now larger system to eventually “reconsider” the goals that it evolved from. For instance, if the larger system has a sufficient amount of subsystems with sub-goals that involve preservation of the larger system of tools, and if the “play excellent chess” goal threatens, or at least is not optimal with respect to, this goal, could one not imagine that, in some evolutionary competition, these sub-goals could overthrow the supreme goal?

Footnote: After all, humans are such a system of competing drives, and it has been argued (e.g. in Ainslie, 2001 [Breakdown of Will]) that this competition is what gives us our unique cognitive strengths (as well as weaknesses). Our ultimate goals, to the extent we have any, are just those that win this competition most of the time.

And Paul Christiano has also described agents that would not be subject to this “basic drive” of self-preservation described by Omohundro.

Lukas continues:

From the current goal’s perspective, a change in the AI’s goal function is potentially disastrous, as the current goal would not persevere. Therefore, AIs will try to prevent researchers from changing their goals.

This follows only if such a highly competent system is built so as to be subordinate to a single goal in this way, which I do not think there is good reason to consider likely to be the case in future AI systems “by default”.

Consequently, there is pressure for AI researchers to get things right on the first try: If we develop a superintelligent AI with a goal that is not quite what we were after – because someone made a mistake, or was not precise enough, or did not think about particular ways the specified goal could backfire – the AI would pursue the goal that it was equipped with, not the goal that was intended. This applies even if it could understand perfectly well what the intentioned goal was. This feature of going with the actual goal instead of the intended one could lead to cases of perverse instantiation, such as the AI “paralyz[ing] human facial musculatures into constant beaming smiles” to pursue an objective of “make us smile” (Bostrom, 2014, p. 120).

This again seems to assume that this “first superintelligent AI” would be so much more powerful than everything else in the world, yet why should we expect a single system to be so much more powerful than everything else across the board? Beyond that, it also seems to assume that the design of this system would happen in something akin to a single step — that there would be a “first try”. Yet what could a first try consist in? How could a super capable system emerge in the absence of a lot of test models that are slightly less competent? I think this “first try” idea betrays an underlying belief in a sudden growth explosion powered by a single, highly competent machine, which, again, I would argue is highly unlikely in light of what we know about the nature of the growth of the capabilities of machines.

VI.III Self-preservation

Some people have downplayed worries about AI risks with the argument that when things begin to look dangerous, humans can literally “pull the plug” in order to shut down AIs that are behaving suspiciously. This argument is naive because it is based on the assumption that AIs would be too stupid to take precautions against this.

There is a difference between being “stupid” and being ill-informed. And there is no reason to think that an extremely cognitively capable agent will be informed about everything relevant to its own self-preservation. To think otherwise is to conflate great cognitive abilities with near-omniscience.

Because the scenario we are discussing concerns smarter-than-human intelligence, an AI would understand the implications of losing its connection to electricity, and would therefore try to proactively prevent being shut down by any means necessary – especially when shutdown might be permanent.

Even if all implications were understood by such a notional agent, this by no means implies that an attempt to stop its termination would be successful, or particularly likely to succeed, or indeed even possible.

This is not to say that AIs would necessarily be directly concerned about their own “death” – after all, whether an AI’s goal includes its own survival or not depends on the specifics of its goal function. However, for most goals, staying around pursuing one’s goal will lead to better expected goal achievement. AIs would therefore have strong incentives to prevent permanent shutdown even if their goal was not about their own “survival” at all. (AIs might, however, be content to outsource their goal achievement by making copies of themselves, in which case shutdown of the original AI would not be so terrible as long as one or several copies with the same goal remain active.)

I would question the tacit notion that the self-preservation of such a machine could be done with a significantly greater level of skill than could the “counter self-preservation” work of the existing human-machine civilization. After all, why should a single system be so much more capable than the rest of the world at any given task? Why should humans not develop specialized software systems and other machines that enable them to counteract and overpower rogue machines, for example by virtue of having more information and training? What seems to be described here as an almost certain default outcome strikes me as highly unlikely. This is not to say that one should not worry about small risks of terrible outcomes, yet we need to get a clear view of the probabilities if we are to make a qualified assessment of the expected value of working on these risks.

The convergent drive for self-preservation has the unfortunate implication that superintelligent AI would almost inevitably see humans as a potential threat to its goal achievement. Even if its creators do not plan to shut the AI down for the time being, the superintelligence could reasonably conclude that the creators might decide to do so at some point. Similarly, a newly-created AI would have to expect some probability of interference from external actors such as the government, foreign governments or activist groups. It would even be concerned that humans in the long term are too stupid to keep their own civilization intact, which would also affect the infrastructure required to run the AI. For these reasons, any AI intelligent enough to grasp the strategic implications of its predicament would likely be on the lookout for ways to gain dominance over humanity. It would do this not out of malevolence, but simply as the best strategy for self-preservation.

Again, to think that a single agent could gain dominance over the rest of the human-machine civilization in which it would find itself appears extremely unlikely. What growth story could plausibly lead to this outcome?

This does not mean that AIs would at all times try to overpower their creators: If an AI realizes that attempts at trickery are likely to be discovered and punished with shutdown, it may fake being cooperative, and may fake having the goals that the researchers intended, while privately plotting some form of takeover. Bostrom has referred to this scenario as a “treacherous turn” (Bostrom, 2014, p. 116).

We may be tempted to think that AIs implemented on some kind of normal computer substrate, without arms or legs for mobility in the non-virtual world, may be comparatively harmless and easy to overpower in case of misbehavior. This would likely be a misconception, however. We should not underestimate what a superintelligence with access to the internet could accomplish. And it could attain such access in many ways and for many reasons, e.g. because the researchers were careless or underestimated its capacities, or because it successfully pretended to be less capable than it actually was. Or maybe it could try to convince the “weak links” in its [team] of supervisors to give it access in secret – promising bribes. Such a strategy could work even if most people in the developing team thought it would be best to deny their AI internet access until they have more certainty about the AI’s alignment status and its true capabilities. Importantly, if the first superintelligence ever built was prevented from accessing the internet (or other efficient channels of communication), its impact on the world would remain limited, making it possible for other (potentially less careful) teams to catch up. The closer the competition, the more the teams are incentivized to give their AIs riskier access over resources in a gamble for the potential benefits in case of proper alignment.

Again, this all seems to assume a very rapid take-off in capabilities with one system being vastly more capable than all others. What reasons do we have to consider such a scenario plausible? Barely any, I have argued.

The following list contains some examples of strategies a superintelligent AI could use to gain power over more and more resources, with the goal of eventually reaching a position where humans cannot harm or obstruct it. Note that these strategies were thought of by humans, and are therefore bound to be less creative and less effective than the strategies an actual superintelligence would be able to devise.

  • Backup plans: Superintelligent AI could program malware of unprecedented sophistication that inserted partial copies of itself into computers distributed around the globe (adapted from part 3.1.2 of this FAQ). This would give it further options to act even if its current copy was destroyed or if its internet connection was cut. Alternatively, it could send out copies of its source code, alongside detailed engineering instructions, to foreign governments, ideally ones who have little to lose and a lot to gain, with the promise of helping them attain world domination if they built a second version of the AI and handed it access to all their strategic resources.
  • Making money: Superintelligent AI could easily make fortunes with online poker, stock markets, scamming people, hacking bank accounts, etc.9
  • Influencing opinions: Superintelligent AI could fake convincing email exchanges with influential politicians or societal elites, pushing an agenda that serves its objectives of gaining power and influence. Similarly, it could orchestrate large numbers of elaborate sockpuppet accounts on social media or other fora to influence public opinion in favorable directions.
  • Hacking and extortion: Superintelligent AI could hack into sensitive documents, nuclear launch codes or other compromising assets in order to blackmail world leaders into giving it access over more resources. Or it could take over resources directly if hacking allows for it.
  • (Bio-)engineering projects: Superintelligent AI could pose as the head researcher of a biology lab and send lab assistants instructions to produce viral particles with specific RNA sequences, which then, unbeknownst to the people working on the project, turned out to release a deadly virus that incapacitated most of humanity.10

Through some means or another – and let’s not forget that the AI could well attempt many strategies at once to safeguard against possible failure in some of its pursuits – the AI may eventually gain a decisive strategic advantage over all competition (Bostrom, 2014, pp. 78-90). Once this is the case, it would carefully build up further infrastructure on its own. This stage will presumably be easier to reach as the world economy becomes more and more automated.

These various strategies could also be pursued by other agents, and indeed by vast systems of agents and programs. Why should one such agent be much more competent than others at doing any of these things?

Once humans are no longer a threat, the AI would focus its attention on natural threats to its existence. It would for instance notice that the sun will expand in about seven billion years to the point where existence on Earth will become impossible. For the reason of self-preservation alone, a superintelligent AI would thus eventually be incentivized to expand its influence beyond Earth.

Following the arguments I have made above (as well as here), I would argue that such a take-over of the world subordinate to a single or a few goals originally instilled in a single machine is extremely unlikely.

VI.IV Resource accumulation

For the fulfillment of most goals, accumulating as many resources as possible is an important early step. Resource accumulation is also intertwined with the other subgoals in that it tends to facilitate them.

The resources available on Earth are only a tiny fraction of the total resources that an AI could access in the entire universe. Resource accumulation as a convergent subgoal implies that most AIs would eventually colonize space (provided that it is not prohibitively costly), in order to gain access to the maximum amount of resources. These resources would then be put to use for the pursuit of its other subgoals and, ultimately, for optimizing its top-level goal.

Superintelligent AI might colonize space in order to build (more of) the following:

  • Supercomputers: As part of its intelligence enhancement, an AI could build planet-sized supercomputers (Sandberg, 1999) to figure out the mysteries of the cosmos. Almost no matter the precise goal, having an accurate and complete understanding of the universe is crucial for optimal goal achievement.
  • Infrastructure: In order to accomplish anything, an AI needs infrastructure (factories, control centers, etc.) and “helper robots” of some sort. This would be similar (but much larger in scale) to how the Manhattan Project had its own “project sites” and employed tens of thousands of people. While some people worry that an AI would enslave humans, these helpers would more plausibly be other AIs specifically designed for the tasks at hand.
  • Defenses: An AI could build shields to protect itself or other sensitive structures from cosmic rays. Perhaps it would build weapon systems to deal with potential threats.
  • Goal optimization: Eventually, an AI would convert most of its resources into machinery that directly achieves its objectives. If the goal is to produce paperclips, the AI will eventually tile the accessible universe with paperclips. If the goal is to compute pi to as many decimal places as possible, the AI will eventually tile the accessible universe with computers to compute pi. Even if an AI’s goal appears to be limited to something “local” or “confined,” such as “protect the White House,” the AI would want to make success as likely as possible and thus continue to accumulate resources to better achieve that goal.

To elaborate on the point of goal optimization: Humans tend to be satisficers with respect to most things in life. We have minimum requirements for the quality of the food we want to eat, the relationships we want to have, or the job we want to work in. Once these demands are met and we find options that are “pretty good,” we often end up satisfied and settle down on the routine. Few of us spend decades of our lives pushing ourselves to invest as many waking hours as sustainably possible into systematically finding the optimal food in existence, the optimal romantic partner, or anything really.

AI systems on the other hand, in virtue of how they are usually built, are more likely to act as maximizers. A chess computer is not trying to look for “pretty good moves” – it is trying to look for the best move it can find with the limited time and computing power it has at its disposal. The pressure to build ever more powerful AIs is a pressure to build ever more powerful maximizers. Unless we deliberately program AIs in a way that reduces their impact, the AIs we build will be maximizers that never “settle” or consider their goals “achieved.” If their goal appears to be achieved, a maximizer AI will spend its remaining time double- and triple-checking whether it made a mistake. When it is only 99.99% certain that the goal is achieved, it will restlessly try to increase the probability further – even if this means using the computing power of a whole galaxy to drive the probability it assigns to its goal being achieved from 99.99% to 99.991%.
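The satisficer/maximizer distinction drawn above can be made concrete with a minimal Python sketch. The function names, the option scores, and the threshold are all hypothetical and chosen purely for illustration; they are not taken from any actual AI system.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def satisfice(options: Iterable[T], score: Callable[[T], float],
              good_enough: float) -> Optional[T]:
    """Return the first option whose score meets the threshold, then stop."""
    for option in options:
        if score(option) >= good_enough:
            return option
    return None  # no option met the threshold


def maximize(options: Iterable[T], score: Callable[[T], float]) -> Optional[T]:
    """Examine every option and return the highest-scoring one."""
    return max(options, key=score, default=None)


# Hypothetical "meal quality" scores, purely illustrative.
meals = {"sandwich": 0.6, "pasta": 0.8, "curry": 0.9, "tasting menu": 0.99}

# The satisficer settles on the first "pretty good" option it encounters;
# the maximizer keeps searching until it has compared all of them.
satisficer_choice = satisfice(meals, meals.get, good_enough=0.75)
maximizer_choice = maximize(meals, meals.get)
```

In this toy setup the satisficer stops as soon as an option clears the threshold, whereas the maximizer's cost grows with the number of options it can consider, which is the feature the passage above is pointing at.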

Because of the nature of maximizing as a decision-strategy, a superintelligent AI is likely to colonize space in pursuit of its goals unless we program it in a way to deliberately reduce its impact. This is the case even if its goals appear as “unambitious” as e.g. “minimize spam in inboxes.”

Why should we expect a single machine to be better able to accumulate resources than other actors in the economy, much less whole teams of actors powered by specialized software programs optimized toward that very purpose? Again, what seems to be considered the default outcome here is one that I would argue is extremely unlikely. This is still not to say that we then have reason to dismiss such a scenario. Yet it is important that we make an honest assessment of its probability if we are to make qualified assessments of the value of prioritizing it.

VII. Artificial sentience and risks of astronomical suffering

Space colonization by artificial superintelligence would increase goal-directed activity and computations in the world by an astronomically large factor.11

So would space colonization driven by humans. And it is not clear why we should expect a human-driven colonization to increase goal-directed computations any less. Beyond that, such human-driven colonization also seems much more likely to happen than does rogue AI colonization. 

If the superintelligence holds objectives that are aligned with our values, then the outcome could be a utopia. However, if the AI has randomly, mistakenly, or sufficiently suboptimally implemented values, the best we could hope for is if all the machinery it used to colonize space was inanimate, i.e. not sentient. Such an outcome – even though all humans would die – would still be much better than other plausible outcomes, because it would at least not contain any suffering. Unfortunately, we cannot rule out that the space colonization machinery orchestrated by a superintelligent AI would also contain sentient minds, including minds that suffer. The same way factory farming led to a massive increase in farmed animal populations, multiplying the direct suffering humans cause to animals by a large factor, an AI colonizing space could cause a massive increase in the total number of sentient entities, potentially creating vast amounts of suffering.

The same applies to a human-driven colonization, which I would still argue seems a much more likely outcome. So why should we focus more on colonization driven by rogue AI?

The following are some ways AI outcomes could result in astronomical amounts of suffering:

  • Suffering in AI workers: Sentience appears to be linked to intelligence and learning (Daswani & Leike, 2015), both of which would be needed (e.g. in robot workers) for the coordination and execution of space colonization. An AI could therefore create and use sentient entities to help it pursue its goals. And if the AI’s creators did not take adequate safety measures or program in compassionate values, it may not care about those entities’ suffering in their assistance.
  • Optimization for sentience: Some people want to colonize space in order for there to be more life or (happy) sentient minds. If the AI in question has values that reflect this goal, either because human researchers managed to get value loading right (or “half-right”), or because the AI itself is sentient and values creating copies of itself, the result could be astronomical numbers of sentient minds. If the AI does not accurately assess how happy or unhappy these beings are, or if it only cares about their existence but not their experiences, or simply if something goes wrong in even a small portion of these minds, the total suffering that results could be very high.
  • Ancestor simulations: Turning history and (evolutionary) biology into an empirical science, AIs could run many “experiments” with simulations of evolution on planets with different starting conditions. This would e.g. give the AIs a better sense of the likelihood of intelligent aliens existing, as well as a better grasp on the likely distribution of their values and whether they would end up building AIs of their own. Unfortunately, such ancestor simulations could recreate millions of years of human or wild-animal suffering many times in parallel.
  • Warfare: Perhaps space-faring civilizations would eventually clash, with at least one of the two civilizations containing many sentient minds. Such a conflict would have vast frontiers of contact and could result in a lot of suffering.

All of these scenarios could also occur in a human-driven colonization, which I would argue is significantly more likely to happen. So again: why should we focus more on colonization driven by rogue AI?

More ways AI scenarios could contain astronomical amounts of suffering are described here and here. Sources of future suffering are likely to follow a power law distribution, where most of the expected suffering comes from a few rare scenarios where things go very wrong – analogous to how most casualties are the result of very few, very large wars; how most of the casualty-risks from terrorist attacks fall into tail scenarios where terrorists would get their hands on weapons of mass destruction; or how most victims of epidemics succumbed to the few very worst outbreaks (Newman, 2005). It is therefore crucial not only to factor in which scenarios are most likely to occur, but also how bad scenarios would be should they occur.

Again, most of the very worst scenarios could well be due to human-driven colonization, such as US versus China growth races taken beyond Earth. So, again, why focus mostly on colonization scenarios driven by rogue AI? Beyond that, the expected value of influencing a broad class of medium-value outcomes could easily be much higher than the expected value of influencing much fewer, much higher-stakes outcomes, provided that the outcomes that fall into this medium value class are sufficiently probable and amenable to impact. In other words, it is by no means far-fetched to imagine that we can take actions that are robust over a wide range of medium-value outcomes, and that such actions are in fact best in expectation.
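The expected-value comparison can be illustrated with a small worked calculation. All of the numbers below are hypothetical placeholders rather than estimates, and the three-factor decomposition (probability, stakes, tractability) is an assumption made for this sketch. The point is only that a higher probability and greater tractability can outweigh higher stakes.

```python
def expected_impact(probability: float, stakes: float, tractability: float) -> float:
    """Expected value of working on a class of outcomes.

    probability:  chance that an outcome in this class occurs
    stakes:       value at stake if it does (arbitrary units)
    tractability: fraction of that value our actions can realistically shift
    """
    return probability * stakes * tractability


# Hypothetical inputs, chosen only to show the structure of the argument.
medium_value_class = expected_impact(probability=0.5, stakes=100.0, tractability=0.01)
high_stakes_class = expected_impact(probability=0.001, stakes=10_000.0, tractability=0.005)

print(f"medium-value class: {medium_value_class:.3f}")
print(f"high-stakes class:  {high_stakes_class:.3f}")
```

With these made-up inputs the medium-value class comes out ahead, but other inputs would reverse the ranking, which is why an honest assessment of the probabilities matters.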

Critics may object because the above scenarios are largely based on the possibility of artificial sentience, particularly sentience implemented on a computer substrate. If this turns out to be impossible, there may not be much suffering in futures with AI after all. However, computer-based minds also being able to suffer in the morally relevant sense is a common implication in philosophy of mind. Functionalism and type A physicalism (“eliminativism”) both imply that there can be morally relevant minds on digital substrates. Even if one were skeptical of these two positions and instead favored the views of philosophers like David Chalmers or Galen Strawson (e.g. Strawson, 2006), who believe consciousness is an irreducible phenomenon, there are at least some circumstances under which these views would also allow for computer-based minds to be sentient.12 Crude “carbon chauvinism,” or a belief that consciousness is only linked to carbon atoms, is an extreme minority position in philosophy of mind.

The case for artificial sentience is not just abstract but can also be made on the intuitive level: Imagine we had whole brain emulation with a perfect mapping from inputs to outputs, behaving exactly like a person’s actual brain. Suppose we also give this brain emulation a robot body, with a face and facial expressions created with particular attention to detail. The robot will, by the stipulations of this thought experiment, behave exactly like a human person would behave in the same situation. So the robot-person would very convincingly plead that it has consciousness and moral relevance. How certain would we be that this was all just an elaborate facade? Why should it be?

Because we are unfamiliar with artificial minds and have a hard time experiencing empathy for things that do not appear or behave in animal-like ways, we may be tempted to dismiss the possibility of artificial sentience or deny artificial minds moral relevance – the same way animal sentience was dismissed for thousands of years. However, the theoretical reasons to anticipate artificial sentience are strong, and it would be discriminatory to deny moral consideration to a mind simply because it is implemented on a substrate different from ours. As long as we are not very confident indeed that minds on a computer substrate would be incapable of suffering in the morally relevant sense, we should believe that most of the future’s expected suffering is located in futures where superintelligent AI colonizes space.

I fail to see how this final conclusion is supported by the argument made above. Again, human-driven colonization seems to pose at least as big a risk of outcomes of this sort.

One could argue that “superintelligent AI” could travel much faster and convert matter and energy into ordered computations much faster than a human-driven colonization could, yet I see little reason to expect a rogue AI-driven colonization to be significantly more effective in this regard than a human civilization powered by advanced tools built to be as efficient as possible. For instance, why should “superintelligent AI” be able to build significantly faster spaceships? I would expect both tail-end scenarios — i.e. both maximally sentient rogue AI-driven colonization and maximally sentient human-driven colonization —  to converge toward an optimal expansion solution in a relatively short time, at least on cosmic timescales.

VIII. Impact analysis

The world currently contains a great deal of suffering. Large sources of suffering include for instance poverty in developing countries, mental health issues all over the world, and non-human animal suffering in factory farms and in the wild. We already have a good overview – with better understanding in some areas than others – of where altruists can cost-effectively reduce substantial suffering. Charitable interventions are commonly chosen according to whether they produce measurable impact in the years or decades to come. Unfortunately, altruistic interventions are rarely chosen with the whole future in mind, i.e. with a focus on reducing as much suffering as possible for the rest of time, until the heat death of the universe.13 This is potentially problematic, because we should expect the far future to contain vastly more suffering than the next decades, not only because there might be sentient beings around for millions or billions of years to come, but also because it is possible for Earth-originating life to eventually colonize space, which could multiply the total amount of sentient beings many times over. While it is important to reduce the suffering of sentient beings now, it seems unlikely that the most consequential intervention for the future of all sentience will also be the intervention that is best for reducing short-term suffering.

I think this is true, but also because the word “best” here refers to two very narrow peaks that have to coincide in a very large landscape. In contrast, it does not seem unlikely to me that the best, most robust interventions we can make to influence the long-term future are also highly robust and positive with respect to the short-term future, such as promoting concern for suffering as well as greater moral consideration of neglected beings.

And given that the probability of extinction (evaluated from now) increases over time, and hence that one should discount the value of influencing the long-term future of civilization by a certain factor, it in fact seems reasonable to choose actions that seem positive both in the short and long term.
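As a rough sketch of this discounting, suppose (purely as a simplifying assumption) that there is a constant annual probability of extinction. The probability that civilization survives to a given year can then be used as a discount factor on value realized in that year. The risk figure below is hypothetical and not an estimate.

```python
def survival_probability(annual_extinction_risk: float, years: int) -> float:
    """Probability of surviving `years` years under a constant annual risk."""
    return (1.0 - annual_extinction_risk) ** years


# Hypothetical annual risk of 0.1%, used only for illustration.
risk = 0.001
for horizon in (10, 100, 1_000, 10_000):
    factor = survival_probability(risk, horizon)
    print(f"{horizon:>6} years: discount factor {factor:.4f}")
```

Under this assumption the discount factor shrinks steadily with the time horizon, so value that is realized sooner counts for more, all else being equal.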

Instead, as judged from the distant future, the most consequential development of our decade would more likely have something to do with novel technologies or the ways they will be used.

And when it comes to how technologies will be used, it is clear that influencing ideas matters a great deal. By analogy, we have also seen important technologies developed in the past, and yet ideas seem to have been no less significant, such as specific religions (e.g. Islam and Christianity) as well as political ideologies (e.g. communism and liberalism). One may, of course, argue that it is very difficult to influence ideas on a large scale, yet the same can be said about influencing technology. Indeed, influencing ideas, whether broadly or narrowly, might just be the best way to influence technology.

And yet, politics, science, economics and especially the media are biased towards short timescales. Politicians worry about elections, scientists worry about grant money, and private corporations need to work on things that produce a profit in the foreseeable future. We should therefore expect interventions targeted at the far future to be much more neglected than interventions targeted at short-term sources of suffering.

Admittedly, the far future is difficult to predict. If our models fail to account for all the right factors, our predictions may turn out very wrong. However, rather than trying to simulate in detail everything that might happen all the way into the distant future – which would be a futile endeavor, needless to say – we should focus our altruistic efforts on influencing levers that remain agile and reactive to future developments. An example of such a lever is institutions that persist for decades or centuries. The US Constitution for instance still carries significant relevance in today’s world, even though it was formulated hundreds of years ago. Similarly, the people who founded the League of Nations after World War I did not succeed in preventing the next war, but they contributed to the founding and the charter of its successor organization, the United Nations, which still exerts geopolitical influence today. The actors who initially influenced the formation of these institutions, as well as their values and principles, had a long-lasting impact.

In order to positively influence the future for hundreds of years, we fortunately do not need to predict the next hundreds of years in detail. Instead, all we need to predict is what type of institutions – or, more generally, stable and powerful decision-making agencies – are most likely to react to future developments maximally well.14

AI is the ultimate lever through which to influence the future. The goals of an artificial superintelligence would plausibly be much more stable than the values of human leaders or those enshrined in any constitution or charter. And a superintelligent AI would, with at least considerable likelihood, remain in control of the future not only for centuries, but for millions or even billions of years to come. In non-AI scenarios on the other hand, all the good things we achieve in the coming decade(s) will “dilute” over time, as current societies, with all their norms and institutions, change or collapse.

In a future where smarter-than-human artificial intelligence won’t be created, our altruistic impact – even if we manage to achieve a lot in greatly influencing this non-AI future – would be comparatively “capped” and insignificant when contrasted with the scenarios where our actions do affect the development of superintelligent AI (or how AI would act).15

I think this is another claim that is widely overstated, and which I have not seen a convincing case for. Again, this notion that “an artificial superintelligence”, a single machine with much greater cognitive powers than everything else, will emerge and be programmed to be subordinate to a single goal that it would be likely to preserve does not seem credible to me. Sure, we can easily imagine it as an abstract notion, but why should we think such a system will ever emerge? The creation of such a system is, I would argue, far from being a necessary, or even particularly likely, outcome of our creating ever more competent machines.

And even if such a system did exist, it is not even clear, as Robin Hanson has argued, that it would be significantly more likely to preserve its values than would a human civilization — not so much because one should expect humans to be highly successful at it, but rather because there are also reasons to think that it would be unlikely for such a “superintelligent AI” to do it (such as those mentioned in my note on Omohundro’s argument above, as well as those provided by Hanson, e.g. that “the values of AIs with protected values should still drift due to influence drift and competition”).

We should expect AI scenarios to not only contain the most stable lever we can imagine – the AI’s goal function which the AI will want to preserve carefully – but also the highest stakes.

Again, I do not think a convincing case has been made for either of these claims. Why would the stakes be higher than in a human-driven colonization, which we may expect, for evolutionary reasons, to be performed primarily by those who want to expand and colonize as much and as effectively as possible?

In comparison with non-AI scenarios, space colonization by superintelligent AI would turn the largest amount of matter and energy into complex computations.

It depends on what we mean by non-AI scenarios. Scenarios where humans use advanced tools, such as near-maximally fast spaceships and near-optimal specialized software, to fill up space with sentient beings at a near-maximal rate are, I would argue, not only at least as conceivable but also at least as likely as similar scenarios brought about by the kind of AI Lukas seems to have in mind here.

In a best-case scenario, all these resources could be turned into a vast utopia full of happiness, which provides a strong incentive for us to get AI creation perfectly right. However, if the AI is equipped with insufficiently good values, or if it optimizes for random goals not intended by its creators, the outcome could also include astronomical amounts of suffering. In combination, these two reasons of highest influence/goal-stability and highest stakes build a strong case in favor of focusing our attention on AI scenarios.

Again, things could also go very wrong or very well with human-driven colonization, so there does not seem to be a big difference in this regard either.

While critics may object that all this emphasis on the astronomical stakes in AI scenarios appears unfairly Pascalian, it should be noted that AI is not a frivolous thought experiment where we invoke new kinds of physics to raise the stakes.

Right, but the kind of AI system envisioned here does, I would argue, rest on various highly questionable conceptions of how a single system could grow, as well as of what the design of future machines is likely to be. And I would argue, again, that such a system is highly unlikely to emerge.

Smarter-than-human artificial intelligence and space colonization are both realistically possible and plausible developments that fit squarely into the laws of nature as we currently understand them.

A Bugatti appearing in the Stone Age also in some sense fits squarely into the laws of nature as we currently understand them. Yet that does not mean that such a car was likely to emerge in that time, once we consider the history and evolution of technology. Similarly, I would argue that the scenario Lukas seems to have hinted at throughout his piece is a lot less credible than what this appeal to compatibility with the laws of nature would seem to suggest.

If either of them turn out to be impossible, that would be a big surprise, and would suggest that we are fundamentally misunderstanding something about the way physical reality works. While the implications of smarter-than-human artificial intelligence are hard to grasp intuitively, the underlying reasons for singling out AI as a scenario to worry about are sound.

Well, I have tried to argue to the contrary here. It would be much more plausible, I think, to argue that the scenario Lukas envisions is one scenario among others that warrants some priority.

As illustrated by Leó Szilárd’s lobbying for precautions around nuclear bombs well before the first such bombs were built, it is far from hopeless to prepare for disruptive new technologies in advance, before they are completed.

This text argued that altruists concerned about the quality of the future should [be] focusing their attention on futures where AI plays an important role.

I would say that the argument that has been made is much more narrow than that, since “AI” here is used in a relatively narrow sense in the first place, and because it is a very particular scenario involving such narrowly defined AI that Lukas has been focusing on the most here — as far as I can tell, it is a scenario where a single system takes over the world and determines the future based on a single, arduously preserved goal. There are many other scenarios we can envision in which AI, both in the ordinary sense as well as in the more narrow sense invoked here by Lukas, plays “an important role”, including scenarios involving human-driven space colonization.

This can mean many things. It does not mean that everyone should think about AI scenarios or technical work in AI alignment directly. Rather, it just means we should pick interventions to support according to their long-term consequences, and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI. Whether it is best to try to affect AI outcomes in a narrow and targeted way, or whether we should go for a broader strategy, depends on several factors and requires further study.

FRI has looked systematically into paths to impact for affecting AI outcomes with particular emphasis on preventing suffering, and we have come up with a few promising candidates. The following list presents some tentative proposals:

It is important to note that human values may not affect the goals of an AI at all if researchers fail to solve the value-loading problem. Raising awareness of certain values may therefore be particularly impactful if it concerns groups likely to be in control of the goals of smarter-than-human artificial intelligence.

Further research is needed to flesh out these paths to impact in more detail, and to discover even more promising ways to affect AI outcomes.

Of the implications of his argument, Lukas writes that it means that “we should pick interventions to support according to their long-term consequences”. I agree with this completely. He then continues, “and particularly according to the ways in which our efforts could make a difference to futures ruled by superintelligent AI”. And this claim, as I understand it, is what I would argue has not been justified. Again, to argue that one should grant such futures some priority, even significant priority, along with many other scenarios, is a plausible claim, but not, I would argue, that they should be granted greater priority than all other things.

And as for how we can best reduce suffering in the future, I would agree with pretty much all the proposals Lukas suggests, although I would argue that things like promoting concern for suffering and widening our moral circles (and we should do both) become even more important when we take other scenarios into consideration, such as human-driven colonization. In other words, these things seem even more robust and more positive when we also consider these other high-stakes scenarios.

Beyond that, I would also note that we likely have moral intuitions that make a notional rogue AI-takeover seem worse in expectation than what a more detached analysis relative to a more impartial moral ideal such as “reduce suffering” would suggest. Furthermore, it should be noted that many of those who focus most prominently on AI safety (for example, people at MIRI and FHI) seem to have values according to which it is important that humans maintain control or remain in existence, which may render their view that AI safety is the most important thing to focus on less relevant for other value systems than one might intuitively suppose.

To zoom out a bit, one way to think about my disagreement with Lukas, as well as the overall argument I have tried to make here, is that one can view Lukas’ line of argument as consisting of a certain number of steps where, in each of them, he describes a default scenario he believes to be highly probable, whereas I generally find these respective “default” scenarios quite improbable. And when one then combines our respective probabilities into a single measure of the probability that the larger scenario Lukas envisions will occur, one gets a very different overall probability for Lukas and myself respectively. It may look something like this, assuming Lukas’ argument consists of eight steps in a conditional chain, each assigned a certain probability which then gets multiplied by the rest (i.e. P(A) * P(B|A) * P(C|B) * . . . ):

L: 0.98 * 0.96 * 0.93 * 0.99 * 0.95 * 0.99 * 0.97 * 0.98 ≈ 0.77

M: 0.1 * 0.3 * 0.01 * 0.1 * 0.2 * 0.08 * 0.2 * 0.4 ≈ 0.00000004

(These particular numbers are just more or less random ones I have picked for illustrative purposes, except that their approximate range does illustrate where I think the respective credences of Lukas and myself roughly lie with regard to most of the arguments discussed throughout this essay.)

And an important point to note here is that even if one disagrees both with Lukas and me on these respective probabilities, and instead picks credences roughly in-between those of Lukas and me, or indeed significantly closer to those of Lukas, the overall argument I have made here still stands, namely that it is far from clear that scenarios of the kind Lukas outlines are the most important ones to focus on to best reduce suffering. For then the probability of Lukas’ argument being correct/the probability that the scenario Lukas envisions will occur (one can think of it in both ways, I think, even if these formulations are not strictly equivalent) becomes something like the following:

In-between credence: 0.5^8 ≈ 0.004

Credences significantly closer to Lukas’: 0.75^8 ≈ 0.1

Which would not seem to support the conclusion that a focus on the AI-scenarios Lukas has outlined should dominate other scenarios we can envision (e.g. human-driven colonization).
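
The four products above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `chain_probability` and the variable names are labels chosen here (assumptions, not part of the original argument), while the step values are simply the ones listed above.

```python
from math import prod

def chain_probability(steps):
    """Multiply the per-step conditional credences of a chain of claims."""
    return prod(steps)

# Illustrative per-step credences copied from the lines marked L and M above.
l_steps = [0.98, 0.96, 0.93, 0.99, 0.95, 0.99, 0.97, 0.98]
m_steps = [0.1, 0.3, 0.01, 0.1, 0.2, 0.08, 0.2, 0.4]

# Uniform credences for the two intermediate cases (eight steps each).
in_between = [0.5] * 8
closer_to_l = [0.75] * 8

for label, steps in [
    ("L", l_steps),
    ("M", m_steps),
    ("In-between", in_between),
    ("Closer to L", closer_to_l),
]:
    # Two significant figures is enough to compare with the rounded values above.
    print(f"{label}: {chain_probability(steps):.2g}")
```

The point the sketch makes concrete is how quickly a product of eight factors shrinks: even a fairly high credence of 0.75 at every step leaves an overall probability of only about 0.1.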

Lukas ends his post on the following note:

As there is always the possibility that we have overlooked something or are misguided or misinformed, we should remain open-minded and periodically rethink the assumptions our current prioritization is based on.

With that, I could not agree more. In fact, this is in some sense the core point I have been trying to make here.

Moral Circle Expansion Might Increase Future Suffering

Expanding humanity’s moral circle so that it includes all sentient beings seems among the most urgent and important missions before us. And yet there is a significant risk that such greater moral inclusion might in fact end up increasing future suffering. As Brian Tomasik notes:

One might ask, “Why not just promote broader circles of compassion, without a focus on suffering?” The answer is that more compassion by itself could increase suffering. For example, most people who care about wild animals in a general sense conclude that wildlife habitats should be preserved, in part because these people aren’t focused enough on the suffering that wild animals endure. Likewise, generically caring about future digital sentience might encourage people to create as many happy digital minds as possible, even if this means also increasing the risk of digital suffering due to colonizing space. Placing special emphasis on reducing suffering is crucial for taking the right stance on many of these issues.

Indeed, many classical utilitarians do include non-human animals in their moral circle, yet they still consider it permissible, indeed in some sense morally required of us, that we bring individuals into existence so that they can live “net positive lives” and we can eat them (I have argued that this view is mistaken, almost regardless of what kind of utilitarian view one assumes). And some even seem to think that most lives on factory farms might plausibly be such “net positive lives”. A wide circle of moral consideration clearly does not guarantee an unwillingness to allow large amounts of suffering to be brought into the world.

More generally, there is a considerable number of widely held ethical positions that favor bringing about larger rather than smaller populations of the beings who belong to our moral circle, at least provided that certain conditions are met in the lives of these beings. And many of these ethical positions set quite loose conditions, which implies that they can easily permit, and even demand, the creation of a lot of suffering for the sake of some (supposedly) greater good.

Indeed, the truth is that even if we require an enormous amount of happiness (or an enormous amount of other intrinsically good things) to outweigh a given amount of suffering, this can still easily permit the creation of large amounts of suffering, as illustrated by the following consideration (quoted from the penultimate chapter of my book on effective altruism):

[…] consider the practical implications of the following two moral principles: 1) we will not allow the creation of a single instance of the worst forms of suffering […] for any amount of happiness, and 2) we will allow one day of such suffering for ten years of the most sublime happiness. What kind of future would we accept with these respective principles? Imagine a future in which we colonize space and maximize the number of sentient beings that the accessible universe can sustain over the entire course of the future, which is probably more than 10^30. Given this number of beings, and assuming these beings each live a hundred years, principle 2) above would appear to permit a space colonization that all in all creates more than 10^28 years of [extreme suffering], provided that the other states of experience are sublimely happy. This is how extreme the difference can be between principles like 1) and 2); between whether we consider suffering irredeemable or not. And notice that even if we altered the exchange rate by orders of magnitude — say, by requiring 10^15 times more sublime happiness per unit of extreme suffering than we did in principle 2) above — we would still allow an enormous amount of extreme suffering to be created; in the concrete case of requiring 10^15 times more happiness, we would allow more than 10,000 billion years of [the worst forms of suffering].

This highlights the importance of thinking deeply about which trade-offs, if any, we find acceptable with respect to the creation of suffering, including extreme suffering.
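
The orders of magnitude in the quoted passage can be reproduced with a rough calculation. The sketch below rests on assumptions that are not in the original: a year is taken to be 365.25 days, each life is assumed to consist only of extreme suffering and sublime happiness in the stated ratio, and the function name `permitted_suffering_years` is hypothetical. The population (10^30), the lifespan (a hundred years), and the two exchange rates come from the quote.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, not stated in the text

def permitted_suffering_years(beings, years_per_life, happy_years_per_suffering_day):
    """Years of extreme suffering permitted when one day of it is accepted
    per `happy_years_per_suffering_day` years of sublime happiness."""
    total_years = beings * years_per_life
    happy_days_per_suffering_day = happy_years_per_suffering_day * DAYS_PER_YEAR
    # Each "bundle" is one suffering day plus the happy days that pay for it.
    suffering_fraction = 1 / (1 + happy_days_per_suffering_day)
    return total_years * suffering_fraction

# Principle 2): one day of extreme suffering per ten years of happiness.
base = permitted_suffering_years(1e30, 100, 10)

# The stricter variant: 10^15 times more happiness per unit of suffering.
stricter = permitted_suffering_years(1e30, 100, 10 * 1e15)

print(f"principle 2: {base:.1e} years")
print(f"10^15 times stricter: {stricter:.1e} years")
```

Under these assumptions both results exceed the lower bounds given in the quote (10^28 years and 10,000 billion years, respectively), which is all the sketch is meant to confirm.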

The considerations above concerning popular ethical positions that support larger future populations imply that there is a risk — a seemingly low yet still significant risk — that a more narrow moral circle may in fact lead to less future suffering for the morally excluded beings (e.g. by making efforts to bring these beings into existence, on Earth and beyond, less likely).


In spite of this risk, I still consider generic moral circle expansion quite positive in expectation. Yet it seems less positive, and arguably significantly less robust (with respect to the goal of reducing extreme suffering), than does the promotion of suffering-focused values. And it seems less robust and less positive still than the twin-track strategy of focusing on both expanding our moral circle and deepening our concern for suffering. Both seem necessary yet insufficient on their own. If we deepen concern for suffering without broadening the moral circle, our deepened concern risks failing to pertain to the vast majority of sentient beings. On the other hand, if we broaden our moral circle without deepening our concern for suffering, we may end up allowing the beings within our moral circle to endure enormous amounts of suffering, including extreme suffering.

Those who seek to minimize extreme suffering should seek to avoid both these pitfalls by pursuing the twin-track approach.

The Principle of Sympathy for Intense Suffering

This essay was first published as a chapter in my book Effective Altruism: How Can We Best Help Others? which is available for free download here. The chapter that precedes it makes a general case for suffering-focused ethics, whereas this chapter argues for a particular suffering-focused view. A more elaborate case for suffering-focused ethics can be found in my book Suffering-Focused Ethics: Defense and Implications.

The ethical view I would advocate most strongly is a suffering-focused view that centers on a core principle of Sympathy for Intense Suffering, or SIS for short, which roughly holds that we should prioritize the interests of those who are, or will be, in a state of extreme suffering. In particular: that we should prioritize their interest in avoiding such suffering higher than anything else.[1]

One can say that this view takes its point of departure in classical utilitarianism, the theory that we should maximize the net sum of happiness minus suffering. Yet it questions a tacit assumption, a particular existence claim, often held in conjunction with the classical utilitarian framework, namely that for every instance of suffering, there exists some amount of happiness that can outweigh it.

This is a deeply problematic assumption, in my view. More than that, it is peculiar that classical utilitarianism seems widely believed to entail this assumption, given that (to my knowledge) none of the seminal classical utilitarians — Jeremy Bentham, John Stuart Mill, and Henry Sidgwick — ever argued for this existence claim, or even discussed it.[2] Thus, it seems that the acceptance of this assumption is no more entailed by classical utilitarianism, defined as the ethical view, or views, expressed by these utilitarian philosophers, than is its rejection.

The question of whether this assumption is reasonable ties into a deeper discussion about how to measure and weigh happiness and suffering against each other, and I think this is much less well-defined than is commonly supposed (even though the trickiness of the task is often acknowledged).[3] The problem is that we have a common sense view that goes something like the following: if a conscious subject deems some state of suffering worth experiencing in order to attain some given pleasure, then this pleasure is worth the suffering. And this common sense view may work for most of us most of the time.[4] Yet it runs into problems in cases where the subject deems their suffering so unbearable that no amount of happiness could ever outweigh it.

For what would the common sense view say in such a situation? That the suffering indeed cannot be outweighed by any pleasure? That would seem an intuitive suggestion, yet the problem is that we can also imagine the case of an experience of some pleasure that the subject, in that experience-moment, deems so great that it can outweigh even the worst forms of suffering, which leaves us with mutually incompatible value claims (although it is worth noting that one can reasonably doubt the existence of such positive states, whereas, as we shall see below, the existence of correspondingly negative experiences is a certainty).[5] How are we to evaluate these claims?

The aforementioned common sense method of evaluation has clearly broken down at this point, and is entirely silent on the matter. We are forced to appeal to another principle of evaluation. And the principle I would argue we should employ is, as hinted above, to choose to sympathize with those who are worst off — those who are experiencing intense suffering. Hence the principle of sympathy for intense suffering: we should sympathize with, and prioritize, the evaluations of those subjects who deem their suffering unoutweighable, even if only for a brief experience-moment, and thus give total priority to helping these subjects. More precisely, we should minimize the amount of such experience-moments of extreme suffering.[6] That, on this account of value, is the greatest help we can do for others.

This principle actually seems to have a lot of support from common sense and “common wisdom”. For example, imagine two children are offered a ride on a roller coaster, one of whom would find the ride very pleasant, while the other child would find it very unpleasant, and imagine, furthermore, that the only two options available are that they either both ride or neither of them rides (and if neither of them rides, they are both perfectly fine).[7] Whose interests should we sympathize with and favor? Common sense would appear to favor the child who would not want to take the ride. The mere pleasure of the “ride-positive” child does not justify a violation of the interest of the other child not to suffer a very unpleasant experience. The interest in not enduring such suffering seems far more fundamental, and hence to have ethical primacy, compared to the relatively trivial and frivolous interest of having a very pleasant experience.[8]

Arguably, common sense even suggests the same in the case where there are many more children who would find the ride very pleasant, while still only one child who would find it very unpleasant (provided, again, that the children will all be perfectly fine if they do not ride). Indeed, I believe a significant fraction of people would say the same no matter how many such “ride-positive” children we put on the scale: it would still be wrong to give them the ride at the cost of forcing the “ride-negative” child to undergo the very unpleasant experience.[9]

And yet the suffering in this example — a very unpleasant experience on a roller coaster — can hardly be said to count as remotely extreme, much less an instance of the worst forms of suffering; the forms of suffering that constitute the strongest, and in my view overwhelming, case for the principle of sympathy for intense suffering. Such intense suffering, even if balanced against the most intense forms of pleasure imaginable, only demands even stronger relative sympathy and priority. However bad we may consider the imposition of a very unpleasant experience for the sake of a very pleasant one, the imposition of extreme suffering for the sake of extreme pleasure must be deemed far worse.

The Horrendous Support for SIS

The worst forms of suffering are so terrible that merely thinking about them for a brief moment can leave the average sympathetic person in a state of horror and darkness for a good while, and therefore, quite naturally, we strongly prefer not to contemplate these things. Yet if we are to make sure that we have our priorities right, and that our views about what matters most in this world are as well-considered as possible, then we cannot shy away from the task of contemplating and trying to appreciate the disvalue of these worst of horrors. This is no easy task, and not just because we are reluctant to think about the issue in the first place, but also because it is difficult to gain anything close to a true appreciation of the reality in question. As David Pearce put it:

It’s easy to convince oneself that things can’t really be that bad, that the horror invoked is being overblown, that what is going on elsewhere in space-time is somehow less real than this here-and-now, or that the good in the world somehow offsets the bad. Yet however vividly one thinks one can imagine what agony, torture or suicidal despair must be like, the reality is inconceivably worse. Hazy images of Orwell’s ‘Room 101’ barely hint at what I’m talking about. The force of ‘inconceivably’ is itself largely inconceivable here.[10]

Nonetheless, we can still gain at least some, admittedly rather limited, appreciation by considering some real-world examples of extreme suffering (what follows are examples of an extremely unpleasant character that may be triggering and traumatizing).

One such example is the tragic fate of the Japanese girl Junko Furuta who was kidnapped in 1988, at the age of 16, by four teenage boys. According to their own trial statements, the boys raped her hundreds of times; “inserted foreign objects, such as iron bars, scissors and skewers into her vagina and anus, rendering her unable to defecate and urinate properly”; “beat her several times with golf clubs, bamboo sticks and iron rods”; “used her as a punching bag by hanging her body from the ceiling”; “dropped barbells onto her stomach several times”; “set fireworks into her anus, vagina, mouth and ear”; “burnt her vagina and clitoris with cigarettes and lighters”; “tore off her left nipple with pliers”; and more. Eventually, she was no longer able to move from the ground, and she repeatedly begged the boys to kill her, which they eventually did, after 44 days.[11]

An example of extreme suffering that is much more common, indeed something that happens countless times every single day, is being eaten alive, a process that can sometimes last several hours with the victim still fully conscious of being devoured, muscle by muscle, organ by organ. A harrowing example of such a death that was caught on camera (see the following note) involved a baboon tearing apart the hind legs of a baby gazelle and eating this poor individual who remained conscious for longer than one would have thought and hoped possible.[12] A few minutes of a much more protracted such painful and horrifying death can be seen via the link in the following note (lions eating a baby elephant alive).[13] And a similar, yet quicker death of a man can be seen via the link in the following note.[14] Tragically, the man’s wife and two children were sitting in a car next to him while it happened, yet they were unable to help him, and knowing this probably made the man’s experience even more horrible, which ties into a point made by Simon Knutsson:

Sometimes when the badness or moral importance of torture is discussed, it is described in terms of different stimuli that cause tissue damage, such as burning, cutting or stretching. But one should also remember different ways to make someone feel bad, and different kinds of bad feelings, which can be combined to make one’s overall experience even more terrible. It is arguably the overall unpleasantness of one’s experience that matters most in this context.[15]

After giving a real-world example with several layers of extreme cruelty and suffering combined, Knutsson goes on to write:

Although this example is terrible, one can imagine how it could be worse if more types of violence and bad feelings were added to the mix. To take another example: [Brian] Tomasik often talks about the Brazen bull as a particularly bad form of torture. The victim is locked inside a metal bull, a fire is lit underneath the bull and the victim is fried to death. It is easy to imagine how this can be made worse. For example, inject the victim with chemicals that amplify pain and block the body’s natural pain inhibitors, and put her loved ones in the bull so that when she is being fried, she also sees her loved ones being fried. One can imagine further combinations that make it even worse. Talking only of stimuli such as burning almost trivializes how bad experiences can be.[16]

Another example of extreme suffering is what happened to Dax Cowart. In 1973, at the age of 25, Dax went on a trip with his father to visit land that he considered buying. Unfortunately, due to a pipeline leak, the air over the land was filled with propane gas, which is highly flammable when combined with oxygen. As they started their car, the propane ignited, and the two men found themselves in a burning inferno. Dax’s father died, and Dax himself had much of his hands, eyes, and ears burned away; two thirds of his skin was severely burned.[17]

The case of Dax has since become quite famous, not only, or even mainly, because of the extreme horror he experienced during this explosion, but because of the ethical issues raised by his treatment, which turned out to be about as torturous as the explosion itself. For Dax himself repeatedly said, immediately after the explosion as well as for months later, that he wanted to die more than anything else, and that he did not want to be subjected to any treatment that would keep him alive. Nonetheless, he was forcibly treated for a period of ten months, during which he tried to take his life several times.

Since then, Dax has managed to recover and live what he considers a happy life: he successfully sued the oil company responsible for the pipeline leak, which left him financially secure; he earned a law degree; and got married. Yet even so, he still wishes that he had been killed rather than treated. In Dax’s own view, no happiness could ever compensate for what he went through.[18]

This kind of evaluation is exactly what the ethical principle advocated here centers on, and what the principle amounts to is simply a refusal to claim that Dax’s evaluation, or any other like it, is wrong. It maintains that we should not allow the occurrence of such extreme horrors for the sake of any intrinsic good, and hence that we should prioritize alleviating and preventing them over anything else.[19]

One may object that the examples above do not all comprise clear cases where the suffering subject deems their suffering so bad that nothing could ever outweigh it. And more generally, one may object that there can exist intense suffering that is not necessarily deemed so bad that nothing could outweigh it, either because the subject is not able to make such an evaluation, or because the subject just chooses not to evaluate it that way. What would the principle of sympathy for intense suffering say about such cases? It would say the following: in cases where the suffering is intense, yet the sufferers choose not to deem it so bad that nothing could outweigh it (we may call this “red suffering”), we should prioritize reducing suffering of the kind that would be deemed unoutweighable (what we may call “black suffering”). And in cases where the sufferers cannot make such evaluations, we may say that suffering at a level of intensity comparable to the suffering deemed unoutweighable by subjects who can make such evaluations should also be considered unoutweighable, and its prevention should be prioritized over all less intense forms of suffering.

Yet this is, of course, all rather theoretical. In practice, even when subjects do have the ability to evaluate their experience, we will, as outside observers, usually not be able to know what their evaluation is — for instance, how someone who is burning alive might evaluate their experience. In practice, all we can do is make informed assessments of what counts as suffering so intense that such an evaluation of unoutweighability would likely be made by the sufferer, assuming an idealized situation where the sufferer is able to evaluate the disvalue of the experience.[20]


I shall spare the reader from further examples of extreme suffering here in the text, and instead refer to sources, found in the following note, that contain additional cases that are worth considering in order to gain a greater appreciation of extreme suffering and its disvalue.[21] And the crucial question we must ask ourselves in relation to these examples (which, as hinted by the quote above by Knutsson, are probably far from the worst possible manifestations of suffering) is whether the creation of happiness or any other intrinsic good could ever justify the creation of, or the failure to prevent, suffering this bad and worse. If not, this implies that our priority should not be to create happiness or other intrinsic goods, but instead to prevent extreme suffering of this kind above anything else, regardless of where in time and space it may risk emerging.

Objections to SIS

Among the objections against this view I can think of, the strongest, at least at first sight, is the sentiment: but what about that which is most precious in your life? What about the person who is most dear to you? If anything stands a chance of outweighing the disvalue of extreme suffering, surely this is it. In more specific terms: does it not seem plausible to claim that, say, saving the most precious person in one’s life could be worth an instance of the very worst form of suffering?

Yet one has to be careful about how this question is construed. If what we mean by “saving” is that we save them from extreme suffering, then we are measuring extreme suffering against extreme suffering, and hence we have not pointed to a rival candidate for outweighing the superlative disvalue of extreme suffering. Therefore, if we are to point to such a candidate, “saving” must here mean something that does not itself involve extreme suffering, and, if we wish to claim that there is something wholly different from the reduction of suffering that can be put on the scale, it should preferably involve no suffering at all. So the choice we should consider is rather one between 1) the mixed bargain of an instance of the very worst form of suffering, i.e. black suffering, and the continued existence of the most precious person one knows, or 2) the painless discontinuation of the existence of this person, yet without any ensuing suffering for others or oneself.

Now, when phrased in this way, choosing 1) may not sound all that bad to us, especially if we do not know the one who will suffer. Yet this would be cheating — nothing but an appeal to our faulty and all too partial moral intuitions. It clearly betrays the principle of impartiality,[22] according to which it should not matter whom the suffering in question is imposed upon; it should be considered equally disvaluable regardless.[23] Thus, we may equivalently phrase the choice above as being between 1) the continued existence of the most precious person one knows of, yet at the price that this being has to experience a state of extreme suffering, a state this person deems so bad that, according to them, it could never be outweighed by any intrinsic good, or 2) the discontinuation of the existence of this being without any ensuing suffering. When phrased in this way, it actually seems clearer to me than ever that 2) is the superior choice, and that we should adopt the principle of sympathy for intense suffering as our highest ethical principle. For how could one possibly justify imposing such extreme, and in the mind of the subject unoutweighable, suffering upon the most precious person one knows, suffering that this person would, at least in that moment, rather die than continue to experience? In this way, for me at least, it is no overstatement to say that this objection against the principle of sympathy for intense suffering, when considered more carefully, actually ends up being one of the strongest cases for it.

Another seemingly compelling objection would be to question whether an arbitrarily long duration of intense, yet, according to the subject, not unoutweighable suffering, i.e. red suffering, is really less bad than even just a split second of suffering that is deemed unoutweighable, i.e. black suffering. Counter-intuitively, my response, at least in this theoretical case, would be to bite the bullet and say “yes”. After all, if we take the subject’s own reports as the highest arbiter of the (dis)value of experiential states, then the black suffering cannot be outweighed by anything, whereas the red suffering can. Also, it should be noted that this thought experiment likely conflicts with quite a few sensible, real-world intuitions we have. For instance, in the real world, it seems highly likely that a subject who experiences extreme suffering for a long time will eventually find it unbearable, and say that nothing can outweigh it, contrary to the hypothetical case we are considering. Another such confounding real-world intuition might be one that reminds us that most things in the real world tend to fluctuate in some way, and hence, intuitively, it seems like there is a significant risk that a person who endures red suffering for a long time will also experience black suffering (again contrary to the actual conditions of the thought experiment), and perhaps even experience a lot of it, in which case this indeed is worse than a mere split second of black suffering on any account.

Partly for this latter reason, my response would also be different in practice. For again, in the real world, we are never able to determine the full consequences of our actions, nor are we usually able to determine from the outside whether someone is experiencing red or black suffering, which implies that we have to take uncertainty and risks into account. Moreover, even if we did know that a subject deemed some state of suffering “merely” red at one point, this would not imply that their suffering at other moments where they appear to be in a similar state will also be deemed red as opposed to black. For in the real world it is indeed to be expected that significant fluctuations will occur, and that “the same suffering”, in one sense at least, will be felt as worse over time. Indeed, if the suffering is extreme, it all but surely will be deemed unbearable eventually.

Thus, in the real world, any large amount of extreme suffering is likely to include black suffering too, and therefore, regardless of whether we think some black suffering is worse than any amount of red suffering, the only reasonable thing to do in practice is to avoid getting near the abyss altogether.

Bias Alert: We Prefer to Not Think About Extreme Suffering

As noted above, merely thinking about extreme suffering can evoke unpleasant feelings that we naturally prefer to avoid. And this is significant for at least two reasons. First, it suggests that thinking deeply about extreme suffering might put our mental health at risk, and hence that we have good reason, and a strong personal incentive, to avoid engaging in such deeper thinking. Second, in part for this first reason, it suggests that we are biased against thinking deeply about extreme suffering, and hence biased against properly appreciating the true horror and disvalue of such suffering. Somewhat paradoxically, (the mere thought of) the horror of extreme suffering keeps us from fully appreciating the true scope of this horror. And this latter consideration is significant in the context of trying to fairly evaluate the plausibility of views that say we should give special priority to such suffering, including the view presented above.

Indeed, one can readily tell a rather plausible story about how many of the well-documented biases we reviewed previously might conspire to produce such a bias against appreciating the horror of suffering.[24] For one, we have wishful thinking, our tendency to believe as true what we wish were true, which in this case likely pulls us toward the belief that it can’t be that bad, and that, surely, there must be something of greater value, some grander quest worth pursuing in this world than the merely negative, rather anti-climactic “journey” of alleviating and preventing extreme suffering. Like most inhabitants of Omelas, we wishfully avoid giving much thought to the bad parts, and instead focus on all the good, although our sin is, of course, much greater than theirs, as the bad parts in the real world are indescribably worse on every metric, including total amount, relative proportions, and intensity.

To defend this wishfully established view, we then have our confirmation bias. We comfortably believe that it cannot really be that bad, and so in perfect confirmation bias textbook-style, we shy away from and ignore data that might suggest otherwise. We choose not to look at the horrible real-world examples that might change our minds, and to not think too deeply about the arguments that challenge our merry conceptions of value and ethics. All of this for extremely good reasons, of course. Or at least so we tell ourselves.[25]

Next, we have groupthink and, more generally, our tendency to conform to our peers. Others do not seem to believe that extreme suffering is that horrible, or that reducing it should be our supreme goal, and thus our bias to conform smoothly points us in the same direction as our wishful thinking and confirmation bias. That direction being: “Come on, lighten up! Extreme suffering is probably not that bad, and it probably can be outweighed somehow. This is what I want to believe, it is what my own established and comfortable belief says, and it is what virtually all my peers seem to believe. Why in the world, then, would I believe anything else?”

Telling such a story of bias might be considered an unfair move, a crude exercise in pointing fingers at others and exclaiming “You’re just biased!”, and admittedly it is to some extent. Nonetheless, I think two things are worth noting in response to such a sentiment. First, rather than having its origin in finger pointing at others, the source of this story is really autobiographical: it is a fair characterization of how my own mind managed to repudiate the immense horror and primacy of extreme suffering for a long time. And merely combining this with the belief that I am not a special case then tentatively suggests that a similar story might well apply to the minds of others too.

Second, it should be noted that a similar story cannot readily be told in the opposite direction — about the values defended here. In terms of wishful thinking, it is not particularly wishful or feel-good to say that extreme suffering is immensely bad, and that there is nothing of greater value in the world than to prevent it. That is not a pretty or satisfying story for anyone. The view also seems difficult to explain via an appeal to confirmation bias, since many of those who hold this view of extreme suffering, including myself, did not hold it from the outset, but instead changed their minds toward it upon considering arguments and real-world examples that support it. The same holds true of our tendency to conform to our peers. For although virtually nobody appears to seriously doubt that suffering has disvalue, the view that nothing could be more important than preventing extreme suffering does not seem widely held, much less widely expressed. It lies far from the narrative about the ultimate mission and future purpose of humanity that prevails in most circles, which runs more along the lines of “Surely it must all be worth it somehow, right?”

This last consideration about how we stand in relation to our peers is perhaps especially significant. For the truth is that we are a signalling species: we like to appear cool and impressive.[26] And to express the view that nothing matters more than the prevention of extreme suffering seems a most unpromising way of doing so. It has a strong air of darkness and depression about it, and, worst of all, it is not a signal of strength and success, which is perhaps what we are most driven to signal to others, prospective friends and mates alike. Such success signalling is not best done with darkness, but with light: by exuding happiness, joy, and positivity. This is the image of ourselves, including our worldview, that we are naturally inclined to project, which then ties into the remark made above: that this view does not seem widely held, “much less widely expressed”. For even if we are inclined to hold this view, we appear motivated to not express it, lest we appear like sad losers.


In sum, by my lights, effective altruism proper is equivalent to effectively reducing extreme suffering. This, I would argue, is the highest meaning of “improving the world” and “benefiting others”, and hence what should be considered the ultimate goal of effective altruism. The principle of sympathy for intense suffering argued for here stems neither from depression, nor resentment, nor hatred. Rather, it simply stems, as the name implies, from a deep sympathy for intense suffering.[27] It stems from a firm choice to side with the evaluations of those who are superlatively worst off, and from this choice follows a principled unwillingness to allow the creation of such suffering for the sake of any amount of happiness or any other intrinsic good. And while it is true that this principle has the implication that it would have been better if the world had never existed, I think the fault here is to be found in the world, not the principle.

Most tragically, some pockets of the universe are in a state of insufferable darkness — a state of black suffering. In my view, such suffering is like a black hole that sucks all light out of the world. Or rather: the intrinsic value of all the light of the world pales in comparison to the disvalue of this darkness. Yet, by extension, this also implies that there is a form of light whose value does compare to this darkness, and that is the kind of light we should aspire to become, namely the light that brightens and prevents this darkness.[28] We shall delve into how this can best be done shortly, but first we shall delve into another issue: our indefensibly anthropocentric take on altruism and “philanthropy”.


(For the full bibliography, see the end of my book.)

[1] This view is similar to what Brian Tomasik calls consent-based negative utilitarianism:
And the Organisation for the Prevention of Intense Suffering (OPIS) appears founded upon a virtually identical principle:
I do not claim that this view is original; merely that it is important.

[2] And I have read them all, though admittedly not their complete works. Bentham can seem to come close in chapter 4 of his Principles of Morals and Legislation, where he outlines a method for measuring pain and pleasure. One of the steps of this method consists in summing up the values of “[…] all the pleasures on one side and of all the pains on the other.” And later he writes of this process that it is “[…] applicable to pleasure and pain in whatever form they appear […]”. Yet he does not write that the sum will necessarily be finite, nor, more specifically, whether every instance of suffering necessarily can be outweighed by some pleasure. I suspect Bentham, as well as Mill and Sidgwick, never contemplated this question in the first place.

[3] A recommendable essay on the issue is Simon Knutsson’s “Measuring Happiness and Suffering”:

[4] However, a defender of tranquilism would, of course, question whether we are indeed talking about a pleasure outweighing some suffering rather than it, upon closer examination, really being a case of a reduction of some form of suffering outweighing some other form of suffering.

[5] And therefore, if one assumes a framework of so-called moral uncertainty, it seems that one should assign much greater plausibility to negative value lexicality than to positive value lexicality (cf., also in light of the point made in the previous chapter that many have doubted the positive value of happiness (as being due to anything but its absence of suffering), whereas virtually nobody has seriously doubted the disvalue of suffering.

[6] But what if there are several levels of extreme suffering, where an experience on each level is deemed so bad that no amount of experiences on a lower level could outweigh it? This is a tricky issue, yet to the extent that these levels of badness are ordered such that, say, no amount of level I suffering can outweigh a single instance of level II suffering (according to a subject who has experienced both), then I would argue that we should give priority to reducing level II suffering. Yet what if level I suffering is found to be worse than level II suffering in the moment of experiencing it, while level II suffering is found to be worse than level I suffering when it is experienced? One may then say that the evaluation should be up to some third experience-moment with memory of both states, and that we should trust such an evaluation, or, if this is not possible, we may view both forms of suffering as equally bad. Whether such dilemmas arise in the real world, and how to best resolve them in case they do, stands to me as an open question.
Thus, cf. the point about the lack of clarity and specification of values we saw two chapters ago, the framework I present here is not only not perfectly specific, as it surely cannot be, but it is admittedly quite far from it indeed. Nonetheless, it still comprises a significant step in the direction of carving out a clearer set of values, much clearer than the core value of, say, “reducing suffering”.

[7] A similar example is often used by the suffering-focused advocate Inmendham.

[8] This is, of course, essentially the same claim we saw a case for in the previous chapter: that creating happiness at the cost of suffering is wrong. The principle advocated here may be considered a special case of this claim, namely the special case where the suffering in question is deemed irredeemably bad by the subject.

[9] Cf. the gut feeling many people seem to have that the scenario described in The Ones Who Walk Away from Omelas should not be brought into the world regardless of how big the city of Omelas would be. Weak support for this claim is also found in the following survey, in which a plurality of people said that they think future civilization should strive to minimize suffering (over, for instance, maximizing positive experiences):

A personal anecdote of mine in support of Pearce’s quote is that I tend to write and talk a lot about reducing suffering, and yet I am always unpleasantly surprised by how bad it is when I experience even just borderline intense suffering. I then always get the sense that I have absolutely no idea what I am talking about when I am talking about suffering in my usual happy state, although the words I use in that state are quite accurate: that it is really bad. In those bad states I realize that it is far worse than we tend to think, even when we think it is really, really bad. It truly is inconceivable, as Pearce writes, since we simply cannot simulate that badness in a remotely faithful way when we are feeling good, quite analogously to the phenomenon of binocular rivalry, where we can only perceive one of two visual images at a time.







[17] Dax describes the accident himself in the following video:

[18] Brülde, 2010, p. 576; Benatar, 2006, p. 63.

[19] And if one thinks such extreme suffering can be outweighed, an important question to ask oneself is: what exactly does it mean to say that it can be outweighed? More specifically, according to whom, and measured by what criteria, can such suffering be outweighed? The only promising option open, it seems, is to choose to prioritize the assessments of beings who say that their happiness, or other good things about their lives, can outweigh the existence of such extreme suffering, i.e. to actively prioritize the evaluations of such notional beings over the evaluations of those enduring, by their own accounts, unoutweighable suffering. This is what I would consider a profoundly unsympathetic choice.

[20] This once again hints at the point made earlier that we in practice are unable to specify in precise terms 1) what we value in the world, and 2) how to act in accordance with any set of plausible values. Rough, qualified approximations are all we can hope for.


[22] Or one could equivalently say that it betrays the core virtue of being consistent, as it amounts to treating/valuing similar beings differently.

[23] I make a more elaborate case for this conclusion in my book You Are Them.

[24] One might object that it makes little sense to call a failure to appreciate the value of something a bias, as this is a moral rather than an empirical disagreement, to which I would respond: 1) the two are not as easy to separate as is commonly supposed (cf. Putnam, 2002), 2) one clearly can be biased against fairly considering an argument for a moral position — for instance, we can imagine an example where someone encounters a moral position and then, due to being brought up in a culture that dislikes that moral position, fails to properly engage with and understand this position, although this person would in fact agree with it upon reflection; such a failure can fairly be said to be due to bias — and 3) at any rate, the question concerning what it is like to experience certain states of consciousness is a factual matter, including how horrible they are deemed from the inside, and this is something we can be factually wrong about as outside observers.

[25] Not that sparing our own mental health is not a good reason for not doing something potentially traumatizing, but the question is just whether it is really worth letting our view of our personal and collective purpose in life be handicapped and biased, or at the very least less well-informed than it otherwise could be, for that reason. The question, in other words, is whether such self-imposed ignorance can really be justified, both to ourselves and the world at large.

[26] Again, Robin Hanson and Kevin Simler’s book The Elephant in the Brain makes an excellent case for this claim.

[27] And hence being animated by this principle is perfectly compatible with living a happy, joyous, and meaningful life. Indeed, I would argue that it provides the deepest meaning one could possibly find.

[28] I suspect both the content and phrasing of the last couple of sentences are inspired by the following quote I saw written on Facebook by Robert Daoust: “What is at the center of the universe of ethics, I suggest, is not the sun of the good and its play of bad shadows, but the black hole of suffering.”
