Thoughts on AI pause

Whether to push for an AI pause is a hotly debated question. This post contains some of my thoughts on the issue of AI pause and the discourse that surrounds it.

The motivation for an AI pause

Generally speaking, it seems that the primary motivation behind pushing for an AI pause is that work on AI safety is far from where it needs to be for humanity to maintain control of future AI progress. Therefore, a pause is needed so that work on AI safety — and other related work, such as AI governance — can catch up with the pace of progress in AI capabilities.

My thoughts on AI pause, in brief

Whether it is worth pushing for an AI pause obviously depends on various factors. For one, it depends on the opportunity cost: what could we be doing otherwise? After all, even if one thinks that an AI pause is desirable, one might still have reservations about its tractability compared to other aims. And even if one thinks that an AI pause is both desirable and tractable, there might still be other aims and activities that are even more beneficial, such as working on worst-case AI safety (Gloor, 2016; Yudkowsky, 2017; Baumann, 2018), or increasing the priority that people devote to reducing risks of astronomical suffering (s-risks) (Althaus & Gloor, 2016; Baumann, 2017; 2022; DiGiovanni, 2021).

Furthermore, there is the question of whether an AI pause would even be beneficial in the first place. This is a complicated question, and I will not explore it in detail here. (For a critical take, see “AI Pause Will Likely Backfire” by Nora Belrose.) Suffice it to say that, in my view, it seems highly uncertain whether any realistic AI pause would be beneficial overall — not just from a suffering-focused perspective, but from the perspective of virtually all impartial value systems. It seems to me that most advocates for AI pause are quite overconfident on this issue.

But to clarify, I am by no means opposed to advocating for an AI pause. It strikes me as something that one can reasonably conclude is helpful and worth doing (depending on one’s values and empirical judgement calls). My current assessment is just that it is unlikely to be among the best ways to reduce future suffering, mainly because I view the alternative activities outlined above as being more promising, and because I suspect that most realistic AI pauses are unlikely to be clearly beneficial overall.

My thoughts on AI pause discourse

A related critical observation about much of the discourse around AI pause is that it tends toward a simplistic “doom vs. non-doom” dichotomy. That is, the picture that is conveyed seems to be that either humanity loses control of AI and goes extinct, which is bad, or humanity maintains control, which is good. And your probability of the former is your “p(doom)”.

Of course, one may argue that it makes sense to speak in such dichotomous terms for strategic and communication purposes. Yet the problem, in my view, is that this kind of picture is not accurate even to a first approximation. From an altruistic perspective, it is not remotely the case that “loss of control to AI” = “bad”, while “humans maintaining control” = “good”.

For example, if we are concerned with the reduction of s-risks (which is important by the lights of virtually all impartial value systems), we must compare the relative risks of “loss of control to AI” with the risks of “humans maintaining control” — however we define these rough categories. And sadly, it is not the case that “humans maintaining control” is associated with a negligible or trivial risk of worst-case outcomes. Indeed, it is not clear whether “humans maintaining control” is generally associated with better or worse prospects than “loss of control to AI” when it comes to s-risks.

In general, the question of whether a “human-controlled future” is better or worse with respect to reducing future suffering is a difficult one that has been discussed and debated at some length, and no clear consensus has emerged. As a case in point, Brian Tomasik places a 52 percent subjective probability on the claim that “Human-controlled AGI in expectation would result in less suffering than uncontrolled”.

This near-50/50 view stands in stark contrast to what often seems to be assumed as a core premise in much of the discourse surrounding AI pause, namely that a human-controlled future would obviously be far better.

(Some reasons why one might be pessimistic regarding human-controlled futures can be found in the literature on human moral failings; see e.g. Cooper, 2018; Huemer, 2019; Kidd, 2020; Svoboda, 2022. Other reasons include basic competitive aims and dynamics that are likely to be found in a wide range of futures, including human-controlled ones; see e.g. Tomasik, 2013; Knutsson, 2022, sec. 3. See also Vinding, 2022.)

Massive moral urgency: Yes, in both categories of worst-case risks

There is a key point on which I agree strongly with advocates for an AI pause: there is a massive moral urgency in ensuring that we do not end up with horrific AI-controlled outcomes. Too few people appreciate this insight, and even fewer seem deeply moved by it.

At the same time, I think there is a similarly massive urgency in ensuring that we do not end up with horrific human-controlled outcomes. And humanity’s current trajectory is unfortunately not all that reassuring with respect to either of these broad classes of risks.

The upshot for me is that there is a roughly equal moral urgency in avoiding each of these categories of worst-case risks, and it seems doubtful to me that pushing for an AI pause is the best way to reduce these risks overall.

June 6, 2024

From AI to distant probes

The aim of this post is to present a hypothetical future scenario that challenges some of our basic assumptions and intuitions about our place in the cosmos.

Hypothetical future scenario: Earth-descendant probes

Imagine a future scenario in which AI progress continues, and where the ruling powers on Earth eventually send out advanced AI-driven probes to explore other star systems. The ultimate motives of these future Earth rulers may be mysterious and difficult to grasp from our current vantage point, yet we can nevertheless understand that their motives — in this hypothetical scenario — include the exploration of life forms that might have emerged or will emerge elsewhere in the universe. (The fact that there are already projects aimed at sending far less advanced probes to other star systems is arguably some evidence of the plausibility of this future scenario.)

Such exploration may be considered important by these future Earth rulers for a number of reasons, but a prominent reason they consider it important is that it helps inform their broader strategy for the long-term future. By studying the frequency and character of nascent life elsewhere, they can build a better picture of the long-run future of life in the universe. This includes gaining a better picture of where and when these Earth descendants might eventually encounter other species — or probes — that are as advanced as themselves, and not least what these other advanced species might be like in terms of their motives and their propensities toward conflict or cooperation.

The Earth-descendant probes will take an especially strong interest in life forms that are relatively close to matching their own, functionally optimized level of technological development. Why? First of all, they wish to ensure that these ascending civilizations do not come to match their own level of technological sophistication, and they will eventually take steps to prevent this so as not to lose their power and influence over the future.

Second, they will study ascending civilizations because what takes place at that late “sub-optimized” stage may be particularly informative for estimating the nature of the fully optimized civilizations that the Earth-descendant probes might encounter in the future (at least the late sub-optimized stage of development seems more informative than do earlier stages of life where comparatively less change happens over time).

From the point of view of these distant life forms, the Earth-descendant probes are almost never visible, and when they occasionally are, they appear altogether mysterious. After all, the probes represent a highly advanced form of technology that the distant life forms do not yet understand, much less master, and the potential motives behind the study protocols of these rarely appearing probes are likewise difficult to make sense of from the outside. Thus, the distant life forms are being studied by the Earth-descendant probes without having any clear sense of their zoo-like condition.

Back to Earth

Now, what is the point of this hypothetical scenario? One point I wish to make is that this is not an absurd or unthinkable scenario. There are, I submit, no fantastical or unbelievable steps involved here, and we can hardly rule out that some version of this scenario could play out in the future. This is obviously not to say that it is the most likely future scenario, but merely that something like this scenario seems fairly plausible provided that technological development continues and eventually expands into space (perhaps around 1 to 10 percent likely?).

But what if we now make just one (theoretically) small change to this scenario such that Earth is no longer the origin of the advanced probes in question, but instead one of the perhaps many planets that are being visited and studied by advanced probes that originated elsewhere in the universe? Essentially, we are changing nothing in the scenario above except Earth's position within it: Earth is now one of the observed planets rather than the origin of the probes.

Given the structural equivalence of these respective scenarios, we should hardly consider the swapped scenario to be much less plausible. Sure, we know for a fact that life has arisen on Earth, and hence the projection that Earth-originating life might eventually give rise to advanced probes is not entirely speculative. Yet there is a countervailing consideration that suggests that — conditional on a scenario equivalent to the one described above occurring — Earth is unlikely to be the first planet to give rise to advanced space probes, and is instead more likely to be observed by probes from elsewhere.

The reason is simply that Earth is but one planet, whereas there are many other planets from which probes could have been sent to study Earth. For example, in a scenario in which a single civilization creates advanced probes that eventually go out and explore, say, a thousand other planets with life at roughly our stage of development (observed at different points in time), we would have a 1 in 1,001 chance of being that first exploring civilization — and a 1,000 in 1,001 chance of being an observed one, under this assumed scenario.

Indeed, even if the exploring civilization in this kind of scenario only ever visits, say, two other planets with life at roughly our stage, we would still be more likely to be among the observed ones than to be the first, observing one (2 in 3 versus 1 in 3). Thus, whatever probability we assign to the hypothetical future scenario in which Earth-descendant space probes observe other life forms at roughly our stage, we should arguably assign a greater probability to a scenario in which we are being observed by similar probes.
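The counting behind these figures can be sketched in a few lines. This is a minimal illustration that assumes, as the scenario above does, exactly one exploring civilization plus n observed civilizations at roughly our stage, and an equal chance of our being any one of them; the function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def chance_we_are_explorer(observed_planets: int) -> Fraction:
    """Chance of being the single exploring civilization, assuming one explorer
    plus `observed_planets` observed civilizations and an equal chance of being
    any one of them (an illustrative assumption, not an empirical estimate)."""
    if observed_planets < 0:
        raise ValueError("observed_planets must be non-negative")
    return Fraction(1, observed_planets + 1)


def chance_we_are_observed(observed_planets: int) -> Fraction:
    """Complementary chance of being one of the observed civilizations."""
    return 1 - chance_we_are_explorer(observed_planets)


# The two cases discussed in the text: a thousand observed planets, and just two.
for n in (1000, 2):
    print(n, chance_we_are_explorer(n), chance_we_are_observed(n))
```

Under these assumptions, the chance of being observed exceeds the chance of being the explorer whenever n is 2 or more. With n = 1 the two chances are equal, at 1/2 each.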

Nevertheless, I think many of us will intuitively think just the opposite, namely that the scenario involving Earth-descendant probes observing others seems far more plausible than the scenario in which we are currently being observed by foreign probes. Indeed, many of us intuitively find the foreign-probes scenario to be quite ridiculous. (That is also largely the attitude that is expressed in leading scholarly books on the Fermi paradox, with scant justification.)

Yet this complete dismissal is difficult to square with the apparent plausibility — or at least the non-ridiculousness — of the “Earth-descendant probes observing others” scenario, as well as the seemingly greater plausibility of the foreign-probes scenario compared to the “Earth-descendant probes observing others” scenario. There appears to be a breakdown of the transitivity of plausibility and ridiculousness at the level of our intuitions.

What explains this inconsistency?

I can only speculate on what explains this apparent inconsistency, but I suspect that various biases and cultural factors are part of the explanation.

For example, wishful thinking could well play a role: we may prefer a scenario in which Earth’s descendants will be the most advanced species in the universe to a scenario in which we are a relatively late-coming and feeble party without any unique influence over the future. This could in turn cause us to ignore or downplay any considerations that speak against our preferred beliefs. And, of course, apart from our relative feebleness, being observed by an apparently indifferent superpower that does not intervene to prevent even the most gratuitous suffering would seem like bad news as well.

Perhaps more significantly, there is the force of cultural sentiment and social stigma. Most of us have grown up in a culture that openly ridicules the idea of an extraterrestrial presence around Earth. Taking that idea seriously has effectively been just another way of saying that you are a dumb-dumb (or worse), and few of us want to be seen in that way. For the human mind, that is a pressure so strong that it can move continents, and even block mere open-mindedness.

Given the unreasonable effectiveness of such cultural forces in schooling our intuitions, many of us intuitively “just know” in our bones that the idea of an extraterrestrial presence around Earth is ridiculous, with little need to invoke actual cogent reasons.

To be clear, my point here is not that we should positively believe in such a foreign presence, but merely that we may need to revise our intuitive assessment of this possibility, or at least question whether our intuitions and our level of open-mindedness toward this possibility are truly well-grounded.

March 21, 2024

What might we infer about optimized futures?

It is plausible to assume that technology will keep on advancing along various dimensions until it hits fundamental physical limits. We may refer to futures that involve such maxed-out technological development as “optimized futures”.

My aim in this post is to explore what we might be able to infer about optimized futures. Most of all, my aim is to advance this as an important question that is worth exploring further.

Optimized futures: End-state technologies in key domains

The defining feature of optimized futures is that they entail end-state technologies that cannot be further improved in various key domains. Some examples of these domains include computing power, data storage, speed of travel, maneuverability, materials technology, precision manufacturing, and so on.

Of course, there may be significant tradeoffs between optimization across these respective domains. Likewise, there could be forms of “ultimate optimization” that are only feasible at an impractical cost — say, at extreme energy levels. Yet these complications are not crucial in this context. What I mean by “optimized futures” are futures that involve practically optimal technologies within key domains (such as those listed above).

Why optimized futures are plausible

There are both theoretical and empirical reasons to think that optimized futures are plausible (by which I here mean that they are at least somewhat probable — perhaps more than 10 percent likely).

Theoretically, if the future contains advanced goal-driven agents, we should generally expect those agents to want to achieve their goals in the most efficient ways possible. This in turn predicts continual progress toward ever more efficient technologies, at least as long as such progress is cost-effective.

Empirically, we have an extensive record of goal-oriented agents trying to improve their technology so as to better achieve their aims. Humanity has gone from having virtually no technology to creating a modern society surrounded by advanced technologies of various kinds. And even in our modern age of advanced technology, we still observe persistent incentives and trends toward further improvements in many domains of technology — toward better computers, robots, energy technology, and so on.

It is worth noting that the technological progress we have observed throughout human history has generally not been the product of some overarching collective plan that was deliberately aimed at technological progress. Instead, technological progress has in some sense been more robust than that, since even in the absence of any overarching plan, progress has happened as the result of ordinary demands and desires — for faster computers, faster and safer transportation, cheaper energy, etc.

This robustness is a further reason to think that optimized futures are plausible: even without any overarching plan aimed toward such a future, and even without any individual human necessarily wanting continued technological development leading to an optimized future, we might still be pulled in that direction all the same. And, of course, this point about plausibility applies to more than just humans: it applies to any set of agents who will be — or have been — structuring themselves in a sufficiently similar way so as to allow their everyday demands to push them toward continued technological development.

An objection against the plausibility of optimized futures is that there might be a lot of hidden potential for progress far beyond what our current understanding of physics seems to allow. However, such hidden potential would presumably be discovered eventually, and it seems probable that it would likewise be exhausted at some point, even if that may happen later and at more extreme limits than we currently envision. That is, the broad claim that there will ultimately be some fundamental limits to technological development is not predicated on the narrower claim that our current understanding of those limits is necessarily correct; the broader claim is robust to quite substantial extensions of currently envisioned limits. Indeed, the claim that there will be no fundamental limits to future technological development overall seems a stronger and less empirically grounded claim than does the claim that there will be such limits (cf. Lloyd, 2000; Krauss & Starkman, 2004).

Why optimized futures are worth exploring

The plausibility of optimized futures is one reason to explore them further, and arguably a sufficient reason in itself. Another reason is the scope of such futures: the futures that contain the largest numbers of sentient beings will most likely be optimized futures, suggesting that we have good reason to pay disproportionate attention to such futures, beyond what their degree of plausibility might suggest.

Optimized futures are also worth exploring given that they seem to be a likely point of convergence for many different kinds of technological civilizations. For example, an optimized future seems a plausible outcome of both human-controlled and AI-controlled Earth-originating civilizations, and it likewise seems a plausible outcome of advanced alien civilizations. Thus, a better understanding of optimized futures can potentially apply robustly to many different kinds of future scenarios.

An additional reason it is worth exploring optimized futures is that they overall seem quite neglected, especially given how plausible and consequential such futures appear to be. While some efforts have been made to clarify the physical limits of technology (see e.g. Sandberg, 1999; Lloyd, 2000; Krauss & Starkman, 2004), almost no work has been done on the likely trajectories and motives of civilizations with optimized technology, at least to my knowledge.

Lastly, the assumption of optimized technology is a rather strong constraint that might enable us to say quite a lot about futures that conform to that assumption, suggesting that this could be a fruitful perspective to adopt in our attempts to think about and predict the future.

What can we say about optimized futures?

The question of what we can say about optimized futures is a big one that deserves elaborate analysis. In this section, I will merely raise some preliminary points and speculative reflections.

Humanity may be close to (at least some) end-state technologies

One point that is worth highlighting is that a continuation of current rates of progress seems to imply that humanity could develop end-state technologies in information processing power within a few hundred years, perhaps 250 years at most (if current growth rates persist and assuming that our current understanding of the relevant physics is largely correct).

So at least in this important respect, and under the assumption of continued steady growth, humanity is surprisingly close to reaching an optimized future (cf. Lloyd, 2000).
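The arithmetic behind estimates of this kind is simple: if capability grows with a fixed doubling time, closing a multiplicative gap G between current capability and a physical limit takes log2(G) doublings. The sketch below shows only that relationship. The function name and the example inputs are hypothetical and are not the figures used by the author or by Lloyd (2000).

```python
import math


def years_to_limit(gap_factor: float, doubling_time_years: float) -> float:
    """Years for steady exponential growth (fixed doubling time) to close a
    multiplicative gap between current capability and a physical limit."""
    if gap_factor < 1 or doubling_time_years <= 0:
        raise ValueError("need gap_factor >= 1 and a positive doubling time")
    # Number of doublings needed, times the duration of each doubling.
    return doubling_time_years * math.log2(gap_factor)


# Hypothetical inputs for illustration only: a gap of 2**100 closed by
# doubling every 2.5 years takes 100 doublings.
print(years_to_limit(2**100, 2.5))  # 250.0
```

Because the number of doublings depends only logarithmically on the gap, even a thousandfold error in the estimated limit shifts the answer by only about ten doubling times. This is one way to see why such estimates are fairly robust to moderate revisions of the limits themselves.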

Optimized civilizations may be highly interested in near-optimized civilizations

Such potential closeness to an optimized future could have significant implications in various ways. For example, if, hypothetically, there exists an older civilization that has already reached a state of optimized technology, any younger civilization that begins to approach optimized technologies within the same cosmic region would likely be of great interest to that older civilization.

One reason it might be of interest is that the optimized technologies of the younger civilization could potentially become competitive with the optimized technologies of the older civilization, and hence the older civilization may see a looming threat in the younger civilization’s advance toward such technologies. After all, since optimized technologies would represent a kind of upper bound of technological development, it is plausible that different instances of such technologies could be competitive with each other regardless of their origins.

Another reason the younger civilization might be of interest is that its trajectory could provide valuable information regarding the likely trajectories and goals of distant optimized civilizations that the older civilization may encounter in the future. (More on this point here.)

Taken together, these considerations suggest that if a given civilization is approaching optimized technology, and if there is an older civilization with optimized technology in its vicinity, this older civilization should take an increasing interest in this younger civilization so as to learn about it before the older civilization might have to permanently halt the development of the younger one.

Strong technological convergence across civilizations?

Another implication of optimized futures is that the technology of advanced civilizations across the universe might be remarkably convergent. Indeed, there are already many examples of convergent evolution in biology on Earth (e.g. eyes and large brains evolving several times independently). Likewise, many cases of convergence are found in cultural evolution, both in early history (e.g. the independent emergence of farming, cities, and writing across the globe) and in recent history (e.g. independent discoveries in science and mathematics).

Yet the degree of convergence could well be even more pronounced in the case of the end-state technologies of advanced civilizations. After all, this is a case where highly advanced agents are bumping up against the same fundamental constraints, and the optimal engineering solutions in the face of these constraints will likely converge toward the same relatively narrow space of optimal designs — or at least toward the same narrow frontier of optimal designs given potential tradeoffs between different abilities.

In other words, the technologies of advanced civilizations might be far more similar and more firmly dictated by fundamental physical limits than we intuitively expect, especially given that we in our current world are used to seeing continually changing and improving technologies.

If technology stabilizes at an optimum, what might change?

The plausible convergence and stabilization of technological hardware also raises the interesting question of what, if anything, might change and vary in optimized futures.

This question can be understood in at least two distinct ways: what might change or vary across different optimized civilizations, and what might change over time within such civilizations? And note that prevalent change of the one kind need not imply prevalent change of the other kind. For example, it is conceivable that there might be great variation across civilizations, yet virtually no change in goals and values over time within civilizations (cf. “lock-in scenarios”).

Conversely, it is conceivable that goals and values change greatly over time within all optimized civilizations, yet such change could in principle still be convergent across civilizations, such that optimized civilizations tend to undergo roughly the same pattern of changes over time (though such convergence admittedly seems unlikely conditional on there being great changes over time in all optimized civilizations).

If we assume that technological hardware becomes roughly fixed, what might still change and vary — both over time and across different civilizations — includes the following (I am not claiming that this is an exhaustive list):

- Space expansion: Civilizations might expand into space so as to acquire more resources; and civilizations may differ greatly in terms of how much space they manage to acquire.
- More or different information: Knowledge may improve or differ over time and space; even if fundamental physics gets solved fairly quickly, there could still be knowledge to gain about, for example, how other civilizations tend to develop.
  - There would presumably also be optimization for information that is useful and actionable. After all, even a technologically optimized probe would still have limited memory, and hence there would be a need to fill this memory with the most relevant information given its tasks and storage capacity.
- Different algorithms: The way in which information is structured, distributed, and processed might evolve and vary over time and across civilizations (though it is also conceivable that algorithms will ultimately converge toward a relatively narrow space of optima).
- Different goals and values: As mentioned above, goals and values might change and vary, such as due to internal or external competition, or (perhaps less likely) through processes of reflection.

In other words, even if everyone has — or is — practically the same “iPhone End-State”, what is running on these iPhone End-States, and how many of them there are, may still vary greatly, both across civilizations and over time. And these distinct dimensions of variation could well become the main focus of optimized civilizations, plausibly becoming the main dimensions on which civilizations seek to develop and compete.

Note also that there may be conflicts between improvements along these respective dimensions. For example, perhaps the most aggressive forms of space expansion could undermine the goal of gaining useful information about how other civilizations tend to develop, and hence advanced civilizations might avoid or delay aggressive expansion if the information in question would be sufficiently valuable (cf. the “info gain motive”). Or perhaps aggressive expansion would pose serious risks at the level of a civilization’s internal coordination and control, thereby risking a drift in goals and values.

In general, it seems worth trying to understand what might be the most coveted resources and the most prioritized domains of development for civilizations with optimized technology.

Information that says something about other optimized civilizations as an extremely coveted resource?

As hinted above, one of the key objectives of a civilization with optimized technology might be to learn, directly or indirectly, about other civilizations that it could encounter in the future. After all, if a civilization manages to both gain control of optimized technology and avoid destructive internal conflicts, the greatest threat to its apex status over time will likely be other civilizations with optimized technology. More generally, the main determinant of an optimized civilization’s success in achieving its goals — whether it can maintain an unrivaled apex status or not — could well be its ability to predict and interact gainfully with other optimized civilizations.

Thus, the most precious resource for any civilization with optimized technology might be information that can prepare this civilization for better exchanges with other optimized agents, whether those exchanges end up being cooperative, competitive, or outright aggressive. In particular, since the technology of optimized civilizations is likely to be highly convergent, the most interesting features to understand about other civilizations might be what kinds of institutions, values, decision procedures, and so on they end up adopting — the kinds of features that seem more contingent.

But again, I should stress that I mention these possibilities as speculative conjectures that seem worth exploring, not as confident predictions.

Practical implications?

In this section, I will briefly speculate on the implications of the prospect of optimized futures. Specifically, what might this prospect imply in terms of how we can best influence the future?

Prioritizing values and institutions rather than pushing for technological progress?

One implication is that there may be limited long-term payoffs in pushing for better technology per se, and that it might make more sense to prioritize the improvement of other factors, such as values and institutions. That is, if the future is in any case likely to be headed toward some technological optimum, and if the values and institutions (etc.) that will run this optimal technology are more contingent and “up for grabs”, then it arguably makes sense to prioritize those more contingent aspects.

To be clear, this is not to say that values and institutions will not also be subject to significant optimization pressures that push them in certain directions, but these pressures will plausibly still be weaker by comparison. After all, a wide range of values will imply a convergent incentive to create optimized technology, yet optimized technology seems compatible with a wide range of values and institutions. And it is not clear that there is a similarly strong pull toward some “optimized” set of values or institutions given optimized technology.

This perspective is arguably also supported by recent history. For example, we have seen technology improve greatly, with computing power heading in a clear upward direction over the past decades. Yet if we look at our values and institutions, it is much less clear whether they have moved in any particular direction over time, let alone an upward direction. Our values and institutions seem to have faced much less of a directional pressure compared to our technology.

More research

Perhaps one of the best things we can do to make better decisions with respect to optimized futures is to do research on such futures. The following are some broad questions that might be worth exploring:

- What are the likely features and trajectories of optimized futures?
  - Are optimized futures likely to involve conflicts between different optimized civilizations?
  - Other things being equal, is a smaller or a larger number of optimized civilizations generally better for reducing risks of large-scale conflicts?
  - More broadly, is a smaller or a larger number of optimized civilizations better for reducing future suffering?
- What might the likely features and trajectories of optimized futures imply in terms of how we can best influence the future?
- Are there some values or cooperation mechanisms that would be particularly beneficial to instill in optimized technology?
  - If so, what might they be, and how can we best work to ensure their (eventual) implementation?

Conclusion

The future might in some ways be more predictable than we imagine. I am not claiming to have drawn any clear or significant conclusions about how optimized futures are likely to unfold; I have mostly aired various conjectures. But I do think the question is valuable, and that it may provide a helpful lens for exploring how we can best impact the future.

Acknowledgments

Thanks to Tobias Baumann for helpful comments.

November 18, 2023

Does digital or “traditional” sentience dominate in expectation?

My aim in this post is to critique two opposite positions that I think are both mistaken, or that at least tend to be endorsed with too much confidence.

The first position is that the vast majority of future sentient beings will, in expectation, be digital, meaning that they will be “implemented” in digital computers.

The second position is in some sense a rejection of the first one. Based on a skepticism of the possibility of digital sentience, this position holds that future sentience will not be artificial, but instead be “traditionally” biological — that is, most future sentient beings will, in expectation, be biological beings roughly as we know them today.

I think the main problem with this dichotomy of positions is that it leaves out a reasonable third option, which is that most future beings will be artificial but not necessarily digital.

Reasons to doubt that digital sentience dominates in expectation

One can roughly identify two classes of reasons to doubt that most future sentient beings will be digital.

First, there are object-level arguments against the possibility of digital sentience. For example, based on his physicalist view of consciousness, David Pearce argues that the discrete and disconnected bits of a digital computer cannot, if they remain discrete and disconnected, join together into a unified state of sentience. They can at most, Pearce argues, be “micro-experiential pixels”.

Second, regardless of whether one believes in the possibility of digital sentience, the future dominance of digital sentience can be doubted on the grounds that it is a fairly strong and specific claim. After all, even if digital sentience is perfectly possible, it by no means follows that future sentient beings will necessarily converge toward being digital.

In other words, the digital dominance position makes strong assumptions about the most prevalent forms of sentient computation in the future, and it seems that there is a fairly large space of possibilities that does not imply digital dominance, such as (a future predominance of) non-digital neuron-based computers, non-digital neuron-inspired computers, and various kinds of quantum computers that have yet to be invented.

When one takes these arguments into account, it at least seems quite uncertain whether digital sentience dominates in expectation, even if we grant that artificial sentience does.

Reasons to doubt that “traditional” biological sentience dominates in expectation

A reason to doubt that “traditional” sentience dominates is that, whatever one’s theory of sentience, it seems likely that sentience can be created artificially, i.e. in a way that we would deem artificial. (An example might be further developed and engineered versions of brain organoids.) Specifically, regardless of which physical processes or mechanisms we take to be critical to sentience, those processes or mechanisms can most likely be replicated in systems other than live biological animals as we know them.

If we combine this premise with an assumption of continued technological evolution (which likely holds true in the future scenarios that contain the largest numbers of sentient beings), it overall seems doubtful that the majority of future beings will, in expectation, be “traditional” biological organisms — especially when we consider the prospect of large futures that involve space colonization.

More broadly, we have reason to doubt the “traditional” biological dominance position for the same reason that we have reason to doubt the digital dominance position, namely that the position entails a rather strong and specific claim along the lines that: “this particular class of sentient being is most numerous in expectation”. And, as in the case of digital dominance, it seems that there are many plausible ways in which this could turn out to be wrong, such as due to neuron-inspired or other yet-to-be-invented artificial systems that could become both sentient and prevalent.

Why does this matter?

Whether artificial sentience dominates in expectation plausibly matters for our priorities (though it is unclear how much exactly, since some of our most robust strategies for reducing suffering are probably worth pursuing in roughly the same form regardless). Yet those who take artificial sentience seriously might adopt suboptimal priorities and communication strategies if they primarily focus on digital sentience in particular.

At the level of priorities, they might restrict their focus to an overly narrow set of potentially sentient systems, and perhaps neglect the great majority of future suffering as a result. At the level of communication, they might needlessly hamper their efforts to raise concern for artificial sentience by mostly framing the issue in terms of digital sentience. This framing might lead people who are skeptical of digital sentience to mistakenly dismiss the broader issue of artificial sentience.

Similar points apply to those who believe that “traditional” biological sentience dominates in expectation: they, too, might restrict their focus to an overly narrow set of systems, and thereby neglect to consider a wide range of scenarios that may intuitively seem like science fiction, yet which nevertheless deserve serious consideration on reflection (e.g. scenarios that involve a large-scale spread of suffering due to space colonization).

In summary, there are reasons to doubt both the digital dominance position and the “traditional” biological dominance position. Moreover, it seems that there is something to be gained by not using the narrow term “digital sentience” to refer to the broader category of “artificial sentience”, and by being clear about just how much broader this latter category is.

November 8, 2023

Distrusting salience: Keeping unseen urgencies in mind

The psychological appeal of salient events and risks can be a major hurdle to optimal altruistic priorities and impact. My aim in this post is to outline a few reasons to approach our intuitive fascination with salient events and risks with a fair bit of skepticism, and to actively focus on that which is important yet unseen, hiding in the shadows of the salient.

The human mind is subject to various biases that involve an overemphasis on the salient, i.e. that which readily stands out and captures our attention.

In general terms, there is the availability bias, also known as the availability heuristic, namely the common tendency to base our beliefs and judgments on information that we can readily recall. For example, we tend to overestimate the frequency of events when examples of these events easily come to mind.

Closely related is what is known as the salience bias, which is the tendency to overestimate salient features and events when making decisions. For instance, when deciding to buy a given product, the salience bias may lead us to give undue importance to a particularly salient feature of that product — e.g. some fancy packaging — while neglecting less salient yet perhaps more relevant features.

A similar bias is the recency bias: our tendency to give disproportionate weight to recent events in our belief-formation and decision-making. This bias is in some sense predicted by the availability bias, since recent events tend to be more readily available to our memory. Indeed, the availability bias and the recency bias are sometimes considered equivalent, even though it seems more accurate to view the recency bias as a consequence or a subset of the availability bias; after all, readily remembered information does not always pertain to recent events.

Finally, there is the phenomenon of belief digitization, which is the tendency to give undue weight to (what we consider) the single most plausible hypothesis in our inferences and decisions, even when other hypotheses also deserve significant weight. For example, if we are considering hypotheses A, B, and C, and we assign them the probabilities 50 percent, 30 percent, and 20 percent, respectively, belief digitization will push us toward simply accepting A as though it were true. In other words, belief digitization pushes us toward altogether discarding B and C, even though B and C collectively have the same probability as A. (See also related studies on Salience Theory and on the overestimation of salient causes and hypotheses in predictive reasoning.)
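The arithmetic in the example above can be made explicit with a minimal Python sketch. It uses the hypothetical probabilities from the example; the function names are illustrative only and model digitization crudely as treating the most probable hypothesis as certain.

```python
# Hypothetical probabilities from the A/B/C example in the text.
hypotheses = {"A": 0.50, "B": 0.30, "C": 0.20}

def digitize(beliefs):
    """Crude model of belief digitization: treat the single most
    probable hypothesis as certain and drop all the others."""
    top = max(beliefs, key=beliefs.get)
    return {h: (1.0 if h == top else 0.0) for h in beliefs}

def discarded_mass(beliefs):
    """Total probability assigned to every hypothesis except the top one."""
    top = max(beliefs, key=beliefs.get)
    return sum(p for h, p in beliefs.items() if h != top)

print(digitize(hypotheses))        # {'A': 1.0, 'B': 0.0, 'C': 0.0}
print(discarded_mass(hypotheses))  # 0.5
```

The discarded hypotheses B and C together carry as much probability as A, which is exactly what digitization throws away.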

All of the biases mentioned above can be considered different instances of a broader cluster of availability/salience biases, and they each give us reason to be cautious of the influence that salient information has on our beliefs and our priorities.

One way in which our attention can become preoccupied with salient (though not necessarily crucial) information is through the news. Much has been written against spending a lot of time on the news, and the reasons against it are probably even stronger for those who are trying to spend their time and resources in ways that help sentient beings most effectively.

For even if we grant that there is substantial value in following the news, it seems plausible that the opportunity costs are generally too high, in terms of what one could instead spend one’s limited time learning about or advocating for. Moreover, there is a real risk that a preoccupation with the news has outright harmful effects overall, such as by gradually pulling one’s focus away from the most important problems and toward less important and less neglected problems. After all, the prevailing news criteria or news values decidedly do not reflect the problems that are most important from an impartial perspective concerned with the suffering of all sentient beings.

I believe the same problem exists in academia: a certain topic becomes fashionable, there are calls for abstracts, and there is a strong pull to write and talk about that topic. And while it may indeed be important to talk and write about such topics for the purpose of getting ahead, or not falling behind, in academia, it seems more doubtful whether such topical talk is at all well-adapted for the purpose of making a difference in the world. In other words, the “news values” of academia are not necessarily much better than the news values of mainstream journalism.

The narrow urgency delusion

A salience-related pitfall that we can easily succumb to when following the news is what we may call the “narrow urgency delusion”. This is when the news covers some specific tragedy and we come to feel, at a visceral level, that this tragedy is the most urgent problem that is currently taking place. Such a perception is, in a very important sense, an illusion.

The reality is that tragedy on an unfathomable scale is always occurring, and the tragedies conveyed by the news are sadly but a tiny fraction of the horrors that are constantly taking place around us. Yet the tragedies that are always occurring, such as children who suffer and die from undernutrition and chickens who are boiled alive, are so common and so underreported that they all too readily fade from our moral perception. To our intuitions, these horrors seemingly register as mere baseline horror — as unsalient abstractions that carry little felt urgency — even though the horrors in question are every bit as urgent as the narrow sliver of salient horrors conveyed in the news (Vinding, 2020, sec. 7.6).

We should thus be clear that the delusion involved in the narrow urgency delusion is not the “urgency” part: there is indeed unspeakable horror and urgency involved in the tragedies reported by the news. The delusion rather lies in the “narrow” part, since the urgency is not confined to those tragedies; we find ourselves in a condition that contains extensive horror and torment, all of which merits compassion and concern.

So it is not that the salient victims are less important than what we intuitively feel, but rather that the countless victims whom we effectively overlook are far more important than what we (do not) feel.

Massive problems that always face us: Ongoing moral disasters and future risks

The following are some of the urgent problems that always face us, yet which are often less salient to us than the individual tragedies that are reported in the news:

Prevalent forms of human suffering (e.g. due to cancer, the second most common cause of human death, or due to political oppression — a recent report concluded that 70 percent of the world’s population live in autocracies).
The industrial farming and slaughter of non-human animals.
The suffering of wild animals due to natural processes.
Risks of astronomical future suffering (s-risks).

These common and ever-present problems are, by definition, not news, which hints at the inherent ineffectiveness of news when it comes to giving us a clear picture of the reality we inhabit and the problems that confront us.

As the final entry on the list above suggests, the problems that face us are not limited to ongoing moral disasters. We also face risks of future atrocities, potentially involving horrors on an unprecedented scale. Such risks will plausibly tend to feel even less salient and less urgent than do the ongoing moral disasters we are facing, even though our influence on these future risks — and future suffering in general — could well be more consequential given the vast scope of the long-term future.

So while salience-driven biases may blind us to ongoing large-scale atrocities, they probably blind us even more to future suffering and risks of future atrocities.

Salience-driven distortions in efforts to reduce s-risks

There are many salience-related hurdles that may prevent us from giving significant priority to the reduction of future suffering. Yet even if we do grant a strong priority to the reduction of future suffering, including s-risks in particular, there are reasons to think that salience-driven distortions still pose a serious challenge in our prioritization efforts.

Our general availability bias gives us some reason to believe that we will overemphasize salient ideas and hypotheses in efforts to reduce future suffering. Yet perhaps more compelling are the studies on how we tend to greatly overestimate salient hypotheses when we engage in predictive and multi-stage reasoning in particular. (Multi-stage reasoning is when we make inferences in successive steps, such that the output of one step provides the input for the next one.)

After all, when we are trying to predict the main sources of future suffering, including specific scenarios in which s-risks materialize, we are very much engaging in predictive and multi-stage reasoning. Therefore, we should arguably expect our reasoning about future causes of suffering to be too narrow by default, with a tendency to give too much weight to a relatively small set of salient risks at the expense of a broader class of less salient (yet still significant) risks that we are prone to dismiss in our multi-stage inferences and predictions.
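How narrowing at each step compounds in multi-stage reasoning can be illustrated with a minimal Python sketch. The two-stage scenario tree and its numbers are purely hypothetical, and the stages are assumed to be independent for simplicity. Keeping only the top hypothesis at each step yields a single salient scenario, while most of the probability mass lies in the paths that were dropped.

```python
import math
from itertools import product

# Hypothetical two-stage scenario tree (illustrative numbers only).
stage1 = {"A": 0.5, "B": 0.3, "C": 0.2}
stage2 = {"X": 0.6, "Y": 0.4}  # assumed independent of stage 1

def path_probabilities(*stages):
    """Joint probability of every full path through independent stages."""
    paths = {}
    for combo in product(*(s.items() for s in stages)):
        names = tuple(name for name, _ in combo)
        paths[names] = math.prod(p for _, p in combo)
    return paths

def digitized_path(*stages):
    """Keep only the most probable hypothesis at each stage."""
    names = tuple(max(s, key=s.get) for s in stages)
    prob = math.prod(s[name] for s, name in zip(stages, names))
    return names, prob

paths = path_probabilities(stage1, stage2)
salient, p_salient = digitized_path(stage1, stage2)
print(salient, round(p_salient, 2))                # ('A', 'X') 0.3
print(round(sum(paths.values()) - p_salient, 2))   # 0.7
```

Even though ('A', 'X') is the single most likely path, the five other paths collectively carry more than twice its probability.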

This effect can be further reinforced through other mechanisms. For example, if we have described and explored, or even just imagined, a certain class of risks in greater detail than other risks, then this alone may lead us to regard those more elaborately described risks as being more likely than less elaborately explored scenarios. Moreover, if we find ourselves in a group of people who focus disproportionately on a certain class of future scenarios, this may further increase the salience and perceived likelihood of these scenarios, compared to alternative scenarios that may be more salient in other groups and communities.

Reducing salience-driven distortions

The pitfalls mentioned above seem to suggest some concrete ways in which we might reduce salience-driven distortions in efforts to reduce future suffering.

First, they recommend caution about the danger of neglecting less salient hypotheses when engaging in predictive and multi-stage reasoning. Specifically, when thinking about future risks, we should be careful not to simply focus on what appears to be the single greatest risk while effectively neglecting all others. After all, even if the risk we regard as the single greatest risk indeed is the single greatest risk, that risk might still be fairly modest compared to the totality of future risks, and we might still do better by deliberately working to reduce a relatively broad class of risks.

Second, the tendency to judge scenarios to be more likely when we have thought about them in detail would seem to recommend that we avoid exploring future risks in starkly unbalanced ways. For instance, if we have explored one class of risks in elaborate detail while largely neglecting another, it seems worth trying to outline concrete scenarios that exemplify the more neglected class of risks, so as to correct any potentially unjustified disregard of their importance and likelihood.

Third, the possibility that certain ideas can become highly salient in part for sociological reasons may recommend a strategy of exchanging ideas with, and actively seeking critiques from, people who do not fully share the outlook that has come to prevail in one’s own group.

In general, it seems that we are likely to underestimate our empirical uncertainty (Vinding, 2020, sec. 9.1-9.2). The space of possible future outcomes is vast, and any specific risk that we may envision is but a tiny subset of the risks we are facing. Hence, our most salient ideas regarding future risks should ideally be held up against a big question mark that represents the many (currently) unsalient risks that confront us.

Put briefly, we need to cultivate a firm awareness of the limited reliability of salience, and a corresponding awareness of the immense importance of the unsalient. We need to make an active effort to keep unseen urgencies in mind.

December 8, 2022

What does a future dominated by AI imply?

Among altruists working to reduce risks of bad outcomes due to AI, I sometimes get the impression that there is a rather quick step from the premise “the future will be dominated by AI” to a practical position that roughly holds that “technical AI safety research aimed at reducing risks associated with fast takeoff scenarios is the best way to prevent bad AI outcomes”.

I am not saying that this is the most common view among those who work to prevent bad outcomes due to AI. Nor am I saying that the practical position outlined above is necessarily an unreasonable one. But I think I have seen (something like) this sentiment assumed often enough for it to be worthy of a critique. My aim in this post is to argue that there are many other practical positions that one could reasonably adopt based on that same starting premise.

“A future dominated by AI” can mean many things

“AI” can mean many things

It is worth noting that the premise that “the future will be dominated by AI” covers a wide range of scenarios. After all, it covers scenarios in which advanced machine learning software is in power; scenarios in which brain emulations are in power; as well as scenarios in which humans stay in power while gradually updating their brains with gene technologies, brain implants, nanobots, etc., such that their intelligence would eventually be considered (mostly) artificial intelligence by our standards. And there are surely more categories of AI than just the three broad ones outlined above.

“Dominated by” can mean many things

The words “in power” and “dominated by” can likewise mean many different things. For example, they could mean anything from “mostly in power” and “mostly dominated by” to “absolutely in power” and “absolutely dominated by”. And these respective terms cover a surprisingly wide spectrum.

After all, a government in a democratic society could reasonably be claimed to be “mostly in power” in that society, and a future AI system that is given similar levels of power could likewise be said to be “mostly in power” in the society it governs. By contrast, even the government of North Korea falls considerably short of being “absolutely in power” on a strong definition of that term, which hints at the wide spectrum of meanings covered by the general term “in power”.

Note that the contrast above actually hints at two distinct (though related) dimensions on which different meanings of “in power” can vary. One has to do with the level of power — i.e. whether one has more or less of it — while the other has to do with how the power is exercised, e.g. whether it is democratic or totalitarian in nature.

Thus, “a future society with AI in power” could mean a future in which AI possesses most of the power in a democratically elected government, or it could mean a future in which AI possesses total power with no bounds except the limits of physics.

Combinations of many things

Lastly, we can make a combinatorial extension of the points made above. That is, we should be aware that “a future dominated by AI” could — and is perhaps likely to — combine different kinds of AI. For instance, one could imagine futures that contain significant numbers of AIs from each of the three broad categories of AI mentioned above.

Additionally, these AIs could exercise power in distinct ways and in varying degrees across different parts of the world. For example, some parts of the world might make decisions in ways that resemble modern democratic processes, with power distributed among many actors, while other parts of the world might make decisions in ways that resemble autocratic decision procedures.

Such a diversity of power structures and decision procedures may be especially likely in scenarios that involve large-scale space expansion, since different parts of the world would then eventually be causally disconnected, and since a larger volume of AI systems presumably renders greater variation more likely in general.

These points hint at the truly vast space of possible futures covered by a term such as “a future dominated by AI”.

Future AI dominance does not imply fast AI development

Another conceptual point is that “a future dominated by AI” does not imply that technological or social progress toward such a future will happen soon or that it will occur suddenly. Furthermore, I think one could reasonably argue that such an imminent or sudden change is quite unlikely (though it obviously becomes more likely the broader our conception of “a future dominated by AI” is).

An elaborate justification for my low credence in such sudden change is beyond the scope of this post, though I can at least note that part of the reason for my skepticism is that I think trends and projections in both computer hardware and economic growth speak against such rapid future change. (For more reasons to be skeptical, see Reflections on Intelligence and “A Contra AI FOOM Reading List”.)

A future dominated by AI could emerge through a very gradual process that occurs over many decades or even hundreds of years (conditional on it ever happening). And AI scenarios involving such gradual development could well be both highly likely and highly consequential.

An objection against focusing on such slow-growth scenarios might be that scenarios involving rapid change have higher stakes, and hence they are more worth prioritizing. But it is not clear to me why this should be the case. As I have noted elsewhere, a so-called value lock-in could also happen in a slow-growth scenario, and the probability of success — and of avoiding accidental harm — may well be higher in slow-growth scenarios (cf. “Which World Gets Saved”).

The upshot could thus be the very opposite, namely that it is ultimately more promising to focus on scenarios with relatively steady growth in AI capabilities and power. (I am not claiming that this focus is in fact more promising; my point is simply that it is not obvious and that there are good reasons to question a strong focus on fast-growth scenarios.)

Fast AI development does not imply concentrated AI development

Likewise, even if we grant that the pace of AI development will increase rapidly, it does not follow that this growth will be concentrated in a single (or a few) AI system(s), as opposed to being widely distributed, akin to an entire economy of machines that grow fast together. This issue of centralized versus distributed growth was in fact the main point of contention in the Hanson-Yudkowsky FOOM debate; and I agree with Hanson that distributed growth is considerably more likely.

Similar to the argument outlined in the previous section, one could argue that there is a wager to focus on scenarios that entail highly concentrated growth over those that involve highly distributed growth, even if the latter may be more likely. Perhaps the main argument in favor of this view is that it seems that our impact can be much greater if we manage to influence a single system that will eventually gain power than if our influence is dispersed across countless systems.

Yet I think there are good reasons to doubt that argument. One reason is that the strategy of influencing such a single AI system may require us to identify that system in advance, which might be a difficult bet that we could easily get wrong. In other words, our expected influence may be greatly reduced by the risk that we are wrong about which systems are most likely to gain power. Moreover, there might be similar and ultimately more promising levers for “concentrated influence” in scenarios that involve more distributed growth and power. Such levers may include formal institutions and societal values, both of which could exert a significant influence on the decisions of a large number of agents simultaneously — by affecting the norms, laws, and social equilibria under which they interact.

“A future dominated by AI” does not mean that either “technical AI safety” or “AI governance” is most promising

Another impression I have is that we sometimes tacitly assume that work on “avoiding bad AI outcomes” will fall within either “technical AI safety” or “AI governance”, or at least that it will mostly fall within these categories. But I do not think that this is the case, partly for the reasons alluded to above.

In particular, it seems to me that we sometimes assume that the aim of influencing “AI outcomes” is necessarily best pursued in ways that pertain quite directly to AI today. Yet why should we assume this to be the case? After all, it seems that there are many plausible alternatives.

For example, one could think that it is generally better to pursue broad investments so as to build flexible resources that make us better able to tackle these problems down the line — e.g. investments toward general movement building and toward increasing the amount of money that we will be able to spend later, when we might be better informed and have better opportunities to pursue direct work.

A complementary option is to focus on the broader contextual factors hinted at in the previous section. That is, rather than focusing primarily on the design of the AI systems themselves, or on the laws that directly govern their development, one may focus on influencing the wider context in which they will be developed and deployed — e.g. general values, institutions, diplomatic relations, collective knowledge and wisdom, etc. After all, the broader context in which AI systems will be developed and put into action could well prove critical to the outcomes that future AI systems will eventually create.

Note that I am by no means saying that work on technical AI safety or AI governance is not worth pursuing. My point is merely that these other strategies focused on building flexible resources and influencing broader contextual factors should not be overlooked as ways to influence “a future dominated by AI”. Indeed, I believe that these strategies are among the most promising ways in which we can have such a beneficial influence at this point.

Concluding clarification

On a final note, I should clarify that the main conceptual points I have been trying to make in this post likely do not contradict the explicitly endorsed views of anyone who works to reduce risks from AI. The objects of my concern are more (what I perceive to be) certain implicit models and commonly employed terminologies that I worry may distort how we think and talk about these issues.

Specifically, it seems to me that there might be a sort of collective availability heuristic at work, through which we continually boost the salience of a particular AI narrative (or a certain class of AI scenarios), along with a certain terminology that has come to be associated with that narrative (e.g. ‘AI takeoff’, ‘transformative AI’, etc.). Yet if we change our assumptions a bit, or replace the most salient narrative with another plausible one, we might find that this terminology does not necessarily make a lot of sense anymore. We might find that our typical ways of thinking about AI outcomes rest on a lot of implicit assumptions that are more questionable and more narrow than we tend to realize.

September 6, 2022

Some reasons not to expect a growth explosion

Many people expect global economic growth to accelerate in the future, with growth rates that are not just significantly higher than those of today, but orders of magnitude higher.

The following are some of the main reasons I do not consider a growth explosion to be the most likely future outcome.

Most economists do not expect a growth explosion

Estimates of the future of economic growth from economists themselves generally predict a continual decline in growth rates. For instance, one “review of publicly available projections of GDP per capita over long time horizons” concluded that growth will most likely continue to decline in most countries in the coming decades. A report from PwC arrived at similar projections.

Some accessible books that explore economic growth in the past and explain why it is reasonable to expect stagnant growth rates in the future include Robert J. Gordon’s Rise and Fall of American Growth (short version) and Tyler Cowen’s The Great Stagnation (synopsis).

It is true that there are some economists who expect growth rates to be several orders of magnitude higher in the future, but these are generally outliers. Robin Hanson suggests that such a growth explosion is likely in his book The Age of Em, which, to give some context, fellow economist Bryan Caplan calls “the single craziest claim” of the book. Caplan further writes that Hanson’s arguments for such growth expectations were “astoundingly weak”.

The point here is not that the general opinion of economists is by any means a decisive reason to reject a growth explosion (as the most likely outcome). The point is merely that it represents a significant reason to doubt an imminent growth explosion, and that it is not in fact those who doubt a rapid rise in growth rates who are the consensus-defying contrarians (and in terms of imminence, it is worth noting that even Robin Hanson does not expect a growth explosion within the next couple of decades).

Rates of innovation and progress in science have slowed down

See Bloom et al.’s Are Ideas Getting Harder to Find? and Cowen & Southwood’s Is the rate of scientific progress slowing down? A couple of graphs from the latter:

Moore’s law is coming to an end

One of the main reasons to expect a growth acceleration in the future is the promise of information technology. And economists, including Gordon and Cowen mentioned above, indeed agree that information technology has been a key driver of the growth we have seen in recent decades. But the problem is that we have strong theoretical reasons to expect that the underlying trend that has been driving most progress in information technology since the 1960s, namely Moore’s law, will come to an end within the next few years.

And while it may be that other hardware paradigms will replace silicon chips as we know them, and continue the by now familiar growth in information technology, we must admit that it is quite unclear whether this will happen, especially since we are already lagging noticeably behind this trend line.

One may object that this is just a matter of hardware, and that the real growth in information technology lies in software. But a problem with this claim is that, empirically, growth in software seems largely determined by growth in hardware.

The growth of supercomputers has been slowing down for years

Developments in the performance of the 500 fastest supercomputers in the world conform well to the pattern we should expect given that we are nearing the end of Moore’s law:

The 500th fastest supercomputer in the world was on a clear exponential trajectory from the early 1990s to 2010, after which growth in performance has been steadily declining. Roughly the same holds true of both the fastest supercomputer and the sum of the 500 fastest supercomputers: a clear exponential trajectory from the early 1990s to around 2013, after which the performance has been diverging ever further from the previous trajectory, in fact so much so that the performance of the sum of the 500 fastest supercomputers is now below the performance we should expect the single fastest supercomputer to have today based on 1993-2013 extrapolation.
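The extrapolation described above amounts to fitting a straight line to the logarithm of performance over a baseline period and projecting it forward. Below is a minimal Python sketch of that procedure; the function names are hypothetical, and the usage numbers are synthetic (a series that doubles exactly once per year), not actual TOP500 data.

```python
import math


def fit_exponential(years, values):
    """Least-squares fit of log(value) = a + b * year; returns (a, b)."""
    logs = [math.log(v) for v in values]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def doubling_time(b):
    """Years per doubling implied by the fitted log-growth rate b."""
    return math.log(2) / b


def extrapolate(a, b, year):
    """Value that the fitted exponential trend predicts for `year`."""
    return math.exp(a + b * year)


def shortfall(a, b, year, observed):
    """Trend-predicted value divided by the observed value (> 1 means below trend)."""
    return extrapolate(a, b, year) / observed


# Synthetic baseline (not real data): doubles exactly once per year, 1993-2013.
years = list(range(1993, 2014))
values = [2.0 ** (y - 1993) for y in years]
a, b = fit_exponential(years, values)
print(f"doubling time: {doubling_time(b):.2f} years")
# Hypothetically, suppose the 2020 value is only 2**20 rather than the trend's 2**27.
print(f"2020 shortfall: {shortfall(a, b, 2020, 2.0 ** 20):.0f}x")
```

This prints a doubling time of 1.00 years and a 2020 shortfall of 128x. With a real series, one would pass the 1993-2013 measurements to `fit_exponential` and compare the projection for the current year against the observed value.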

Many of our technologies cannot get orders of magnitude more efficient

This point is perhaps most elaborately explored in Robert J. Gordon’s book mentioned above: it seems that we have already reaped much of the low-hanging fruit in terms of technological innovation, and in some respects it is impossible to improve things much further.

Energy efficiency is an obvious example, as many of our machines and energy harvesting technologies have already reached a significant fraction of the maximally possible efficiency. For instance, electric pumps and motors tend to have around 90 percent energy efficiency, while the efficiency of the best solar panels is above 40 percent. Many of our technologies thus cannot be made orders of magnitude more efficient, and many of them can at most be marginally improved, simply because they have reached the ceiling of hard physical limits.
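The ceiling argument can be made concrete: if a device already converts a fraction e of its input into useful output, no improvement can multiply its output per unit input by more than 1/e. The small Python sketch below uses the two figures quoted above and takes 100 percent as a deliberately generous upper bound (the true physical limit for solar conversion lies below 100 percent, so the real headroom is smaller still). The function name is hypothetical.

```python
def max_improvement_factor(current_efficiency, ceiling=1.0):
    """Largest possible multiplicative gain in useful output per unit input,
    given the current efficiency and an upper bound (100 percent by default)."""
    if not 0 < current_efficiency <= ceiling:
        raise ValueError("efficiency must be in (0, ceiling]")
    return ceiling / current_efficiency


for name, efficiency in [("electric motor", 0.90), ("best solar panel", 0.40)]:
    print(f"{name}: at most {max_improvement_factor(efficiency):.2f}x better")
```

This prints a factor of 1.11 for the motor and 2.50 for the solar panel, far short of an order of magnitude in either case.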

Three objections in brief

#1. What about the exponential growth in the compute of the largest AI training runs from 2012 to 2018?

This is indeed a data point in the other direction. Note, however, that this growth does not appear to have continued after 2018. Moreover, much of this growth seems to have been unsustainable. For example, DeepMind lost more than a billion dollars in 2016-2018, with the loss getting greater each year: “$154 million in 2016, $341 million in 2017, $572 million in 2018”. And the loss was apparently even greater in 2019.
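The loss figures quoted above can be checked with a few lines of Python. This sketch simply sums the three annual losses and computes each year's loss relative to the previous year's (values in millions of US dollars, as quoted).

```python
# Annual DeepMind losses as quoted above, in millions of US dollars.
losses_musd = {2016: 154, 2017: 341, 2018: 572}

years = sorted(losses_musd)
total = sum(losses_musd.values())
# Ratio of each year's loss to the previous year's loss.
growth = {y: losses_musd[y] / losses_musd[y - 1] for y in years[1:]}

print(f"total 2016-2018: ${total} million")
for year, ratio in growth.items():
    print(f"{year}: loss {ratio:.2f}x the previous year")
```

The total comes to $1,067 million, and each year's loss exceeds the previous one (about 2.21x in 2017 and 1.68x in 2018).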

#2. What about the Open Philanthropy post in which David Roodman presented a diffusion model of future growth that predicted much higher growth rates?

First, I think that model overlooks most of the points made above. Second, I think the following figure from Roodman’s article says a lot about the fit of the model, particularly in how the growth rates in 1600-1970 are virtually all in the high percentiles of the model, while the growth rates in 1980-2019 are all in the low percentiles, and generally in a lower percentile as time progresses. That is a strong sign that the model does not capture our actual trajectory, and that the fit is getting worse as time progresses.

#3. We have a wager to give much more weight to high-growth scenarios.

First, I think it is questionable that scenarios with higher growth rates merit greater priority (e.g. a so-called value lock-in could also emerge in slow-growth scenarios, and it may be more feasible to influence slow-growth scenarios because they give us more time to acquire the requisite insights and resources to exert a significant and robustly positive influence). And it is less clear still that scenarios with higher growth merit much greater priority than scenarios with lower growth rates. But even if we grant that high-growth scenarios do merit greater priority, this should not change the bare epistemic credence we assign different scenarios. Our descriptive picture should not be distorted by such priority claims.

June 7, 2021

Effective altruism and common sense

Thomas Sowell once called Milton Friedman “one of those rare thinkers who had both genius and common sense”.

I am not here interested in Sowell’s claim about Friedman, but rather in his insight into the tension between abstract smarts and common sense, and particularly how it applies to the effective altruism (EA) community. For it seems to me that there sometimes is an unbalanced ratio of clever abstractions to common sense in EA discussions.

To be clear, my point is not that abstract ideas are unimportant, or even that everyday common sense should generally be favored over abstract ideas. After all, many of the core ideas of effective altruism are highly abstract in nature, such as impartiality and the importance of numbers, and I believe we are right to stand by these ideas. But my point is that common sense is underutilized as a sanity check that can prevent our abstractions from floating into the clouds. More generally, I seem to observe a tendency to make certain assumptions, and to do a lot of clever analysis and deductions based on those assumptions, but without spending anywhere near as much energy exploring the plausibility of these assumptions themselves.

Below are three examples that I think follow this pattern.

Boltzmann brains

A highly abstract idea that is admittedly intriguing to ponder is that of a Boltzmann brain: a hypothetical conscious brain that arises as the product of random quantum fluctuations. Boltzmann brains are a trivial corollary given certain assumptions: let some basic combinatorial assumptions hold over a long enough span of time, and we can conclude that a lot of Boltzmann brains must exist in this span of time (at least as a matter of statistical certainty, similar to how we can derive and be certain of the second law of thermodynamics).

But this does not mean that Boltzmann brains are in fact possible, as the underlying assumptions may well be false. Beyond the obvious possibility that the lifetime of the universe could be too short, it is also conceivable that the combinatorial assumptions that allow a functioning 310 K human brain to emerge in ~ 0 K empty space do not in fact obtain, e.g. because they falsely assume a combinatorial independence concerning the fluctuations that happen in each neighboring “bit” of the universe (or for some other reason). If any such key assumption is false, it could be that the emergence of a 310 K human brain in ~ 0 K space is not in fact allowed by the laws of physics, even in principle, meaning that even an infinite amount of time would never spontaneously produce a 310 K human Boltzmann brain.

Note that I am not claiming that Boltzmann brains cannot emerge in ~ 0 K space. My claim is simply that there is a big step from abstract assumptions to actual reality, and there is considerable uncertainty about whether the starting assumptions in question can indeed survive that step.

Quantum immortality

Another example is the notion of quantum immortality — not in the sense of merely surviving an attempted quantum suicide for improbably long, but in the sense of literal immortality because a tiny fraction of Everett branches continue to support a conscious survivor indefinitely.

This is a case where I think skeptical common sense and a search for erroneous assumptions are essential. Specifically, even granting a picture in which, say, a victim of a serious accident survives for a markedly longer time in one branch than in another, there are still strong reasons to doubt that there will be any branches in which the victim will survive for long. In particular, we have good reason to believe that the measure of branches in which the victim survives will converge rapidly toward zero.

An objection might be that the measure indeed will converge toward zero, but that it never actually reaches zero, and hence there will in fact always be a tiny fraction of branches in which the victim survives. Yet I believe this rests on a false assumption. Our understanding of physics suggests that there is only — and could only be — a finite number of distinct branches, meaning that even if the measure of branches in which the victim survives is approximated well by a continuous function that never exactly reaches zero, the critical threshold that corresponds to a zero measure of actual branches with a surviving victim will in fact be reached, and probably rather quickly.
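A toy model may help show why a finite number of branches makes the threshold arrive quickly. Assume, purely for illustration, that there are N equally weighted distinct branches and that the measure of branches with a surviving victim decays exponentially at some rate k. The surviving count N * exp(-k * t) then drops below one branch at t = ln(N) / k, which grows only logarithmically with N: squaring the number of branches merely doubles the survival time. The exponential form, the equal weighting, and the function name are all assumptions of this sketch, not claims about actual physics.

```python
import math


def time_until_no_branch_survives(n_branches, decay_rate):
    """Time at which n_branches * exp(-decay_rate * t) falls below one branch.

    Hypothetical toy model: n_branches distinct, equally weighted branches,
    with the surviving measure decaying exponentially at decay_rate per unit time.
    """
    if n_branches < 1 or decay_rate <= 0:
        raise ValueError("need n_branches >= 1 and decay_rate > 0")
    return math.log(n_branches) / decay_rate


# Illustrative only: a decay rate of 1 per year and two enormous branch counts.
for n in (10**100, 10**200):
    print(f"N = 1e{len(str(n)) - 1}: {time_until_no_branch_survives(n, 1.0):.1f} years")
```

In this toy run, going from 10^100 to 10^200 branches only moves the threshold from about 230.3 to about 460.5 years.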

Of course, one may argue that we should still assign some probability to quantum immortality being possible, and that this possibility is still highly relevant in expectation. But I think there are many risks that are much less Pascalian and far more worthy of our attention.

Intelligence explosion

Unlike the two previous examples, this last example has become quite an influential idea in EA: the notion of a fast and local “intelligence explosion”.

I will not here restate my lengthy critiques of the plausibility of this notion (or the critiques advanced by others). And to be clear, I do not think the effective altruism community is at all wrong to have a strong focus on AI. But the mistake I think I do see is that there are many abstractly grounded assumptions pertaining to a hypothetical intelligence explosion that have received an insufficient amount of scrutiny from common sense and empirical data (Garfinkel, 2018 argues along similar lines).

I think part of the problem stems from the fact that Nick Bostrom’s book Superintelligence framed the future of AI in a certain way. Here, for instance, is how Bostrom frames the issue in the conclusion of his book (p. 319):

Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. … We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound. … Some little idiot is bound to press the ignite button just to see what happens.

I realize Bostrom is employing a metaphor here, and I realize that he assigns a substantial credence to many different future scenarios. But the way his book is framed is nonetheless mostly in terms of such a metaphorical bomb that could ignite an intelligence explosion (i.e. FOOM). And it seems that this kind of scenario in effect became the standard scenario many people assumed and worked on, with comparatively little effort going into the more fundamental question of how plausible this future scenario is in the first place. An abstract argument about (a rather vague notion of) “intelligence” recursively improving itself was given much weight, and much clever analysis focusing on this FOOM picture and its canonical problems followed.

Again, my claim here is not that this picture is wrong or implausible, but rather that the more fundamental questions about the nature and future of “intelligence” should be kept more alive, and that our approach to these questions should be more informed by empirical data, lest we misprioritize our resources.

In sum, our fondness for abstractions is plausibly a bias we need to control for. We can do this by applying common-sense heuristics to a greater extent, by spending more time considering how our abstract models might be wrong, and by making a greater effort to hold our assumptions up against empirical reality.

March 8, 2021

Two biases relevant to expected AI scenarios

My aim in this essay is to briefly review two plausible biases in relation to our expectations of future AI scenarios. In particular, these are biases that I think risk increasing our estimates of the probability of a local, so-called FOOM takeoff.

An important point to clarify from the outset is that these biases, if indeed real, do not in themselves represent reasons to simply dismiss FOOM scenarios. It would clearly be a mistake to think so. But they do, I submit, constitute reasons to be somewhat more skeptical of them, and to re-examine our beliefs regarding FOOM scenarios. (Stronger, more direct reasons to doubt FOOM have been reviewed elsewhere.)

Egalitarian intuitions looking for upstarts

The first putative bias has its roots in our egalitarian origins. As Christopher Boehm argues in his Hierarchy in the Forest, we humans evolved in egalitarian tribes in which we created reverse dominance hierarchies to prevent domineering individuals from taking over. Boehm thus suggests that our minds are built to be acutely aware of the potential for any individual to rise and take over, perhaps even to the extent that we have specialized modules whose main task is to be attuned to this risk.

Western “Great Man” intuitions

The second putative bias is much more culturally contingent, and should be expected to be most pronounced in Western (“WEIRD”) minds. As Joe Henrich shows in his book The WEIRDest People in the World, Western minds are uniquely focused on individuals, so much so that their entire way of thinking about the world tends to revolve around individuals and individual properties (as opposed to thinking in terms of collectives and networks, which is more common among East Asian cultures).

The problem is that this Western, individualist mode of thinking, when applied straightforwardly to the dynamics of large-scale societies, is quite wrong. For while it may be mnemonically pragmatic to recount history, including the history of ideas and technology, in terms of individual actions and decisions, the truth is usually far more complex than this individualist narrative lets on. As Henrich argues, innovation is largely the product of large-scale systemic factors (such as the degree of connectedness between people), and these factors are usually far more important than is any individual, suggesting that Westerners tend to strongly overestimate the role that single individuals play in innovation and history more generally. Henrich thus alleges that the Western way of thinking about innovation reflects an “individualism bias” of sorts, and further notes that:

thinking about individuals and focusing on them as having dispositions and kind of always evaluating everybody [in terms of which] attributes they have … leads us to what’s called “the myth of the heroic inventor”, and that’s the idea that the great advances in technology and innovation are the products of individual minds that kind of just burst forth and give us these wonderful inventions. But if you look at the history of innovation, what you’ll find time after time was that there was lucky recombinations, people often invent stuff at the same time, and each individual only makes a small increment to a much larger, longer process.

In other words, innovation is the product of numerous small and piecemeal contributions to a much greater extent than Western “Great Man” storytelling suggests. (Of course, none of this is to say that individuals are unimportant, but merely that Westerners seem likely to vastly overestimate the influence that single individuals have on history and innovation.)

Upshot

If we have mental modules specialized to look for individuals that accumulate power and take control, and if we have expectations that roughly conform to this pattern in the context of future technology, with one individual entity innovating its way to a takeover, it seems that we should at least wonder whether this expectation may derive partly from our forager-age intuitions rather than resting purely on solid epistemics. This is especially so given that this view of the future seems in strong tension with our actual understanding of innovation, namely that innovation, contra Western intuition, is distributed, with increases in abilities generally being the product of countless “small” insights and tools rather than a few big ones.

Both of the tendencies listed above lead us (or in the second case, mostly Westerners) to focus on individual agents rather than larger, systemic issues that may be crucial to future outcomes, yet which are less intuitively appealing for us to focus on. And there may well be more general explanations for this lack of appeal than just the two reasons listed above. The fact that there were no large-scale systemic issues of any kind for almost all of our species’ history renders it unsurprising that we are not particularly prone to focus on such issues (except for local signaling purposes).

Perhaps we need to control for this, and try to look more toward systemic issues than we are intuitively inclined to do. After all, the claim that the future will be dominated by AI systems in some form need not imply that the best way to influence that future is to focus on individual AI systems, as opposed to broader, institutional issues.

February 13, 2021

When Machines Improve Machines

The following is an excerpt from my book Reflections on Intelligence (2016/2024).

The term “Artificial General Intelligence” (AGI) refers to a machine that can perform any cognitive task at least as well as any human. This is often considered the holy grail of artificial intelligence research. It is also what many believe will give rise to an “intelligence explosion”, as machines will then be able to take over the design of smarter machines, and hence their further development will no longer be held back by the slowness of humans.

A Radical Shift?

Luke Muehlhauser and Anna Salamon describe the transition toward machines designing machines in the following way:

Once human programmers build an AI with a better-than-human capacity for AI design, the instrumental goal for self-improvement may motivate a positive feedback loop of self-enhancement. Now when the machine intelligence improves itself, it improves the intelligence that does the improving. (Muehlhauser & Salamon, 2012, p. 13)

While this might seem like a radical shift, software engineer Ramez Naam has argued that it is less radical than we might think, since we already use our latest technology to improve on itself and build the next generation of technology (Naam, 2010). As noted in the previous chapter, the way new tools are built and improved is by means of an enormous conglomerate of tools, and newly developed tools tend to become an addition to this existing set of tools. In Naam’s words:

[A] common assertion is that the advent of greater-than-human intelligence will herald The Singularity. These super intelligences will be able to advance science and technology faster than unaugmented humans can. They’ll be able to understand things that baseline humans can’t. And perhaps most importantly, they’ll be able to use their superior intellectual powers to improve on themselves, leading to an upward spiral of self improvement with faster and faster cycles each time.

In reality, we already have greater-than-human intelligences. They’re all around us. And indeed, they drive forward the frontiers of science and technology in ways that unaugmented individual humans can’t.

These superhuman intelligences are the distributed intelligences formed of humans, collaborating with one another, often via electronic means, and almost invariably with support from software systems and vast online repositories of knowledge. (Naam, 2010)

The design and construction of new machines is not the product of human ingenuity alone, but instead the product of a large system of advanced tools in which human ingenuity is just one component, albeit a component that plays many roles. Moreover, as Naam hints, superhuman intellectual abilities already play a crucial role in this design process. For example, computer programs make illustrations and calculations that no human could possibly make, and these have become indispensable components in the design of new tools in virtually all technological domains. In this way, superhuman intellectual abilities are already a significant part of the process of building superhuman intellectual abilities. This has led to continued growth, yet hardly an abrupt intelligence explosion.

Naam gives a specific example of an existing self-improving “superintelligence” (i.e. a super goal achiever), namely Intel:

Intel employs giant teams of humans and computers to design the next generation of its microprocessors. Faster chips mean that the computers it uses in the design become more powerful. More powerful computers mean that Intel can do more sophisticated simulations, that its CAD (computer aided design) software can take more of the burden off of the many hundreds of humans working on each chip design, and so on. There’s a direct feedback loop between Intel’s output and its own capabilities. …

Self-improving superintelligences have changed our lives tremendously, of course. But they don’t seem to have spiraled into a hard takeoff towards “singularity”. On a percentage basis, Google’s growth in revenue, in employees, and in servers have all slowed over time. It’s still a rapidly growing company, but that growth rate is slowly decelerating, not accelerating. The same is true of Intel and of the bulk of tech companies that have achieved a reasonable size. Larger typically means slower growing.

My point here is that neither superintelligence nor the ability to improve or augment oneself always lead to runaway growth. Positive feedback loops are a tremendously powerful force, but in nature (and here I’m liberally including corporate structures and the worldwide market economy in general as part of ‘nature’) negative feedback loops come into play as well, and tend to put brakes on growth. (Naam, 2010)

I quote Naam at length here because he makes this important point well, and because he is an expert with experience in the pursuit of using technology to make better technology. In addition to Naam’s point about Intel and other large tech companies that effectively improve themselves, I would add that although such mega-companies are highly competent collectives, they still only constitute a tiny part of the larger collective system that is the world economy, which they each contribute modestly to, and which they are entirely dependent upon.
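Naam's point about positive and negative feedback can be illustrated with a standard logistic growth model, used here as an illustrative stand-in rather than as Naam's own model. In the sketch below, the growth term is proportional to the system's current size (positive feedback), while a hypothetical capacity term brakes growth as the system gets larger (negative feedback). The function names and parameter values are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, rate, capacity, steps):
    """Discrete logistic growth: x_{t+1} = x_t + rate * x_t * (1 - x_t / capacity).

    rate * x_t is the positive feedback (bigger systems add more in absolute terms);
    the (1 - x_t / capacity) factor is a hypothetical negative feedback on growth.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + rate * x * (1 - x / capacity))
    return xs


def percentage_growth(xs):
    """Period-over-period growth rates, in percent."""
    return [100 * (b - a) / a for a, b in zip(xs, xs[1:])]


xs = logistic_trajectory(x0=1.0, rate=0.5, capacity=1000.0, steps=20)
rates = percentage_growth(xs)
for t in (0, 5, 10, 15, 19):
    print(f"step {t:2d}: size {xs[t + 1]:8.1f}, growth {rates[t]:5.1f}%")
```

In this toy run the system grows in absolute terms at every step while its percentage growth rate falls at every step, which mirrors the pattern Naam describes for Google and Intel: larger typically means slower growing.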

A Familiar Dynamic

It has always been the latest, most advanced tools that, combined with the already existing set of tools, have collaborated to build the latest, most advanced tools. The expected “machines building machines” revolution is therefore not as revolutionary as it might seem at first sight. Strong versions of the “once machines can program AI better than humans” argument seem to assume that human software engineers are by far the main bottleneck to progress in the construction of more competent machines, which is a questionable premise. But even if it were true, and if we suddenly had a million times as many agents working to create better software, other bottlenecks would soon emerge, such as hardware production and energy. Essentially, we would be returned to the task of advancing our entire economy, something that pretty much all humans and machines are participating in already, knowingly or not.

The question concerning whether “intelligence” can explode is therefore basically: can the economy explode? To which we can answer that rapid increases in the growth rate of the world economy certainly have occurred in the past, and some argue that this is likely to happen again in the future (Hanson 1998; 2016). However, recent trends in economic growth, as well as in hardware growth in particular, give us some reason to be skeptical of such a future growth explosion (see e.g. Vinding, 2021; 2022).
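The bottleneck argument made above resembles Amdahl's law from computer architecture, borrowed here only as an analogy: if a fraction f of the work is sped up by a factor s and the rest is not, the overall speedup is 1 / ((1 - f) + f / s), which can never exceed 1 / (1 - f). The sketch below assumes, hypothetically, that software engineering accounts for half of what limits progress, and treats “a million times as many agents” as a million-fold speedup of that half, which is itself a generous assumption.

```python
def overall_speedup(fraction_sped_up, factor):
    """Amdahl's law: overall speedup when only `fraction_sped_up` of the work
    becomes `factor` times faster and the remainder is unchanged."""
    if not 0 <= fraction_sped_up <= 1 or factor <= 0:
        raise ValueError("fraction must be in [0, 1] and factor > 0")
    return 1 / ((1 - fraction_sped_up) + fraction_sped_up / factor)


# Hypothetical: half of the bottleneck is software engineering, sped up a millionfold.
print(f"{overall_speedup(0.5, 1e6):.6f}")
```

This prints 1.999998: even a million-fold speedup of that half yields an overall speedup just under 2, since the remaining half (hardware production, energy, and so on) is untouched.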

August 9, 2020
