When Machines Improve Machines

The following is an excerpt from my book Reflections on Intelligence (2016/2020).


The term “Artificial General Intelligence” (AGI) refers to a machine that can perform any task at least as well as any human. This is often considered the holy grail of artificial intelligence research, and also the thing that many consider likely to give rise to an “intelligence explosion”, the reason being that machines then will be able to take over the design of smarter machines, and hence their further development will no longer be held back by the slowness of humans. Luke Muehlhauser and Anna Salamon express the idea in the following way:

Once human programmers build an AI with a better-than-human capacity for AI design, the instrumental goal for self-improvement may motivate a positive feedback loop of self-enhancement. Now when the machine intelligence improves itself, it improves the intelligence that does the improving.

(Muehlhauser & Salamon, 2012, p. 13)

This seems like a radical shift, yet is it really? As author and software engineer Ramez Naam has pointed out (Naam, 2010), not quite, since we already use our latest technology to improve on itself and build the next generation of technology. As I argued in the previous chapter, the way new tools are built and improved is by means of an enormous conglomerate of tools, and newly developed tools merely become an addition to this existing set of tools. In Naam’s words:

[A] common assertion is that the advent of greater-than-human intelligence will herald The Singularity. These super intelligences will be able to advance science and technology faster than unaugmented humans can. They’ll be able to understand things that baseline humans can’t. And perhaps most importantly, they’ll be able to use their superior intellectual powers to improve on themselves, leading to an upward spiral of self improvement with faster and faster cycles each time.

In reality, we already have greater-than-human intelligences. They’re all around us. And indeed, they drive forward the frontiers of science and technology in ways that unaugmented individual humans can’t.

These superhuman intelligences are the distributed intelligences formed of humans, collaborating with one another, often via electronic means, and almost invariably with support from software systems and vast online repositories of knowledge.

(Naam, 2010)

The design and construction of new machines is not the product of human ingenuity alone, but of a large system of advanced tools in which human ingenuity is just one component, albeit a component that plays many roles. And these roles, it must be emphasized, go way beyond mere software engineering – they include everything from finding ways to drill and transport oil more effectively, to coordinating sales and business agreements across countless industries.

Moreover, as Naam hints, superhuman intellectual abilities already play a crucial role in this design process. For example, computer programs make illustrations and calculations that no human could possibly make, and these have become indispensable components in the design of new tools in virtually all technological domains. In this way, superhuman intellectual abilities are already a significant part of the process of building superhuman intellectual abilities. This has led to continued growth, yet hardly an intelligence explosion.

Naam gives a specific example of an existing self-improving “superintelligence” (a “super” goal achiever, that is), namely Intel:

Intel employs giant teams of humans and computers to design the next generation of its microprocessors. Faster chips mean that the computers it uses in the design become more powerful. More powerful computers mean that Intel can do more sophisticated simulations, that its CAD (computer aided design) software can take more of the burden off of the many hundreds of humans working on each chip design, and so on. There’s a direct feedback loop between Intel’s output and its own capabilities. …

Self-improving superintelligences have changed our lives tremendously, of course. But they don’t seem to have spiraled into a hard takeoff towards “singularity”. On a percentage basis, Google’s growth in revenue, in employees, and in servers have all slowed over time. It’s still a rapidly growing company, but that growth rate is slowly decelerating, not accelerating. The same is true of Intel and of the bulk of tech companies that have achieved a reasonable size. Larger typically means slower growing.

My point here is that neither superintelligence nor the ability to improve or augment oneself always lead to runaway growth. Positive feedback loops are a tremendously powerful force, but in nature (and here I’m liberally including corporate structures and the worldwide market economy in general as part of ‘nature’) negative feedback loops come into play as well, and tend to put brakes on growth.

(Naam, 2010)

I quote Naam at length here because he makes this important point well, and because he is an expert with experience in the pursuit of using technology to make better technology. In addition to Naam’s point about Intel and other companies that improve themselves, I would add that although these are enormous competent collectives, they still only constitute a tiny part of the larger collective system that is the world economy that they contribute modestly to, and which they are entirely dependent upon.

“The” AI?

The discussion above hints at a deeper problem in the scenario Muelhauser and Salomon lay out, namely the idea that we will build an AI that will be a game-changer. This idea seems widespread in modern discussions about both risks and opportunities of AI. Yet why should this be the case? Why should the most powerful software competences we develop in the future be concentrated into anything remotely like a unitary system?

The human mind is unitary and trapped inside a single skull for evolutionary reasons. The only way additional cognitive competences could be added was by lumping them onto the existing core in gradual steps. But why should the extended “mind” of software that we build to expand our capabilities be bound in such a manner? In terms of the current and past trends of the development of this “mind”, it only seems to be developing in the opposite direction: toward diversity, not unity. The pattern of distributed specialization mentioned in the previous chapter is repeating itself in this area as well. What we see is many diverse systems used by many diverse systems in a complex interplay to create ever more, increasingly diverse systems. We do not appear to be headed toward any singular super-powerful system, but instead toward an increasingly powerful society of systems (Kelly, 2010).

Greater Than Individual or Collective Human Abilities?

This also hints at another way in which our speaking of “intelligent machines” is somewhat deceptive and arbitrary. For why talk about the point at which these machines become as capable as human individuals rather than, say, an entire human society? After all, it is not at the level of individuals that accomplishments such as machine building occurs, but rather at the level of the entire economy. If we talked about the latter, it would be clear to us, I think, that the capabilities that are relevant for the accomplishment of any real-world goal are many and incredibly diverse, and that they are much more than just intellectual: they also require mechanical abilities and a vast array of materials.

If we talked about “the moment” when machines can do everything a society can, we would hardly be tempted to think of these machines as being singular in kind. Instead, we would probably think of them as a society of sorts, one that must evolve and adapt gradually. And I see no reason why we should not think about the emergence of “intelligent machines” with abilities that surpass human intellectual abilities in the same way.

After all, this is exactly what we see today: we gradually build new machines – both software and hardware – that can do things better than human individuals, but these are different machines that do different things better than humans. Again, there is no trend toward the building of disproportionally powerful, unitary machines. Yes, we do see some algorithms that are impressively general in nature, but their generality and capabilities still pale in comparison to the generality and the capabilities of our larger collective of ever more diverse tools (as is also true of individual humans).

Relatedly, the idea of a “moment” or “event” at which machines surpass human abilities is deeply problematic in the first place. It ignores the many-faceted nature of the capabilities to be surpassed, both in the case of human individuals and human societies, and, by extension, the gradual nature of the surpassing of these abilities. Machines have been better than humans at many tasks for centuries, yet we continue to speak as though there will be something like a “from-nothing-to-everything” moment – e.g. “once human programmers build an AI with a better-than-human capacity for AI design”. Again, this is not congruous with the way in which we actually develop software: we already have software that is superhuman in many regards, and this software already plays a large role in the collective system that builds smarter machines.

A Familiar Dynamic

It has always been the latest, most advanced tools that, in combination with the already existing set of tools, have collaborated to build the latest, most advanced tools. The expected “machines building machines” revolution is therefore not as revolutionary as it seems at first sight. The “once machines can program AI better than humans” argument seems to assume that human software engineers are the sole bottleneck of progress in the building of more competent machines, yet this is not the case. But even if it were, and if we suddenly had a thousand times as many people working to create better software, other bottlenecks would quickly emerge – materials, hardware production, energy, etc. All of these things, indeed the whole host of tasks that maintain and grow our economy, are crucial for the building of more capable machines. Essentially, we are returned to the task of advancing our entire economy, something that pretty much all humans and machines are participating in already, knowingly or not, willingly or not.

By themselves, the latest, most advanced tools do not do much. A CAD program alone is not going to build much, and the same holds true of the entire software industry. In spite of all its impressive feats, it is still just another cog in a much grander machinery.

Indeed, to say that software alone can lead to an “intelligence explosion” – i.e. a capability explosion – is akin to saying that a neuron can hold a conversation. Such statements express a fundamental misunderstanding of the level at which these accomplishments are made. The software industry, like any software program in particular, relies on the larger economy in order to produce progress of any kind, and the only way it can do so is by becoming part of – i.e. working with and contributing to – this grander system that is the entire economy. Again, individual goal-achieving ability is a function of the abilities of the collective. And it is here, in the entire economy, that the greatest goal-achieving ability is found, or rather distributed.

The question concerning whether “intelligence” can explode is therefore essentially: can the economy explode? To which we can answer that rapid increases in the growth rate of the world economy certainly have occurred in the past, and some argue that this is likely to happen again in the future (Hanson 1998/2000, 2016). However, there are reasons to be skeptical of such a future growth explosion (Murphy, 2011; Modis, 2012; Gordon, 2016; Caplan, 2016; Vinding, 2017b; Cowen & Southwood, 2019).

“Intelligence Though!” – A Bad Argument

A type of argument often made in discussions about the future of AI is that we can just never know what a “superintelligent machine” could do. “It” might be able to do virtually anything we can think of, and much more than that, given “its” vastly greater “intelligence”.

The problem with this argument is that it again rests on a vague notion of “intelligence” that this machine “has a lot of”. For what exactly is this “stuff” it has a lot of? Goal-achieving ability? If so, then, as we saw in the previous chapter, “intelligence” requires an enormous array of tools and tricks that entails much more than mere software. It cannot be condensed into anything we can identify as a single machine.

Claims of the sort that a “superintelligent machine” could just do this or that complex task are extremely vague, since the nature of this “superintelligent machine” is not accounted for, and neither are the plausible means by which “it” will accomplish the extraordinarily difficult – perhaps even impossible – task in question. Yet such claims are generally taken quite seriously nonetheless, the reason being that the vague notion of “intelligence” that they rest upon is taken seriously in the first place. This, I have tried to argue, is the cardinal mistake.

We cannot let a term like “superintelligence” provide a carte blanche to make extraordinary claims or assumptions without a bare minimum of justification. I think Bostrom’s book Superintelligence is an example of this. Bostrom worries about a rapid “intelligence explosion” initiated by “an AI” throughout the book, yet offers very little in terms of arguments for why we should believe that such a rapid explosion is plausible (Hanson, 2014), not to mention what exactly it is that is supposed to explode (Hanson, 2010; 2011a).

No Singular Thing, No Grand Control Problem

The problem is that we talk about “intelligence” as though it were a singular thing; or, in the words of brain and AI researcher Jeff Hawkins, as though it were “some sort of magic sauce” (Hawkins, 2015). This is also what gives rise to the idea that “intelligence” can explode, because one of the things that this “intelligence” can do, if you have enough of it, is to produce more “intelligence”, which can in turn produce even more “intelligence”.

This stands in stark contrast to the view that “intelligence” – whether we talk about cognitive abilities in particular or goal-achieving abilities in general – is anything but singular in nature, but rather the product of countless clever tricks and hacks built by a long process of testing and learning. On this latter view, there is no single master problem to crack for increasing “intelligence”, but rather just many new tricks and hacks we can discover. And finding these is essentially what we have always been doing in science and engineering.

Robin Hanson makes a similar point in relation to his skepticism of a “blank-slate AI mind-design” intelligence explosion:

Sure if there were a super mind theory that allowed vast mental efficiency gains all at once, but there isn’t. Minds are vast complex structures full of parts that depend intricately on each other, much like the citizens of a city. Minds, like cities, best improve gradually, because you just never know enough to manage a vast redesign of something with such complex inter-dependent adaptations.

(Hanson, 2010)

Rather than a concentrated center of capability that faces a grand control problem, what we see is a development of tools and abilities that are distributed throughout the larger economy. And we “control” – i.e. specify the function of – these tools, including software programs, gradually as we make them and put them to use in practice. The design of the larger system is thus the result of our solutions to many, comparatively small “control problems”. I see no compelling reason to believe that the design of the future will be any different.

See also Chimps, Humans, and AI: A Deceptive Analogy.

Is AI Alignment Possible?

The problem of AI alignment is usually defined roughly as the problem of making powerful artificial intelligence do what we humans want it to do. My aim in this essay is to argue that this problem is less well-defined than many people seem to think, and to argue that it is indeed impossible to “solve” with any precision, not merely in practice but in principle.

There are two basic problems for AI alignment as commonly conceived. The first is that human values are non-unique. Indeed, in many respects, there is more disagreement about values than people tend to realize. The second problem is that even if we were to zoom in on the preferences of a single human, there is, I will argue, no way to instantiate a person’s preferences in a machine so as to make it act as this person would have preferred.

Problem I: Human Values Are Non-Unique

The common conception of the AI alignment problem is something like the following: we have a set of human preferences, X, which we must, somehow (and this is usually considered the really hard part), map onto some machine’s goal function, Y, via a map f, let’s say, such that X and Y are in some sense isomorphic. At least, this is a way of thinking about it that roughly tracks what people are trying to do.

Speaking in these terms, much attention is being devoted to Y and f compared to X. My argument in this essay is that we are deeply confused about the nature of X, and hence confused about AI alignment.

The first point of confusion is about the values of humanity as a whole. It is usually acknowledged that human values are fuzzy, and that there are some disagreements over values among humans. Yet it is rarely acknowledged just how strong this disagreement in fact is.

For example, concerning the ideal size of the future population of sentient beings, the disagreement is near-total, as some (e.g. some defenders of the so-called Asymmetry in population ethics, as well as anti-natalists such as David Benatar) argue that the future population should ideally be zero, while others, including many classical utilitarians, argue that the future population should ideally be very large. Many similar examples could be given of strong disagreements concerning the most fundamental and consequential of ethical issues, including whether any positive good can ever outweigh extreme suffering. And on many of these crucial disagreements, a very large number of people will be found on both sides.

Different answers to ethical questions of this sort do not merely give rise to small practical disagreements. In many cases, they imply completely opposite practical implications. This is not a matter of human values being fuzzy, but a matter of them being sharply, irreconcilably inconsistent. And hence there is no way to map the totality of human preferences, “X”, onto a single, well-defined goal-function in a way that does not conflict strongly with the values of a significant fraction of humanity. This is a trivial point, and yet most talk of human-aligned AI seems to skirt this fact.

Problem II: Present Human Preferences Are Underdetermined Relative to Future Actions

The second problem and point of confusion with respect to the nature of human preferences is that, even if we focus only on the present preferences of a single human, then these in fact do not, and indeed could not, determine with much precision what kind of world this person would prefer to bring about in the future.

One way to see this point is to think in terms of the information required to represent the world around us. A perfectly precise such representation would require an enormous amount of information, indeed far more information than what can be contained in our brain. This holds true even if we only consider morally relevant entities around us — on the planet, say. There are just too many of them for us to have a precise representation of them. By extension, there are also too many of them for us to be able to have precise preferences about their individual states. Given that we have very limited information at our disposal, all we can do is express extremely coarse-grained and compressed preferences about what state the world around us should ideally have. In other words, any given human’s preferences are bound to be extremely vague about the exact ideal state of the world right now, and there will be countless moral dilemmas occurring across the world right now to which our preferences, in their present state, do not specify a unique solution.

And yet this is just considering the present state of the world. When we consider future states, the problem of specifying ideal states and resolutions to hitherto unknown moral dilemmas only explodes in complexity, and indeed explodes exponentially as time progresses. It is simply a fact, and indeed quite an obvious one at that, that no single brain could possibly contain enough information to specify unique, or indeed just qualified, solutions to all moral dilemmas that will arrive in the future. So what, then, could AI alignment relative to even a single brain possibly mean? How can we specify Y with respect to these future dilemmas when X itself does not specify solutions?

We can, of course, try to guess what a given human, or we ourselves, might say if confronted with a particular future moral dilemma and given knowledge about it, yet the problem is that our extrapolated guess is bound to be just that: a highly imperfect guess. For even a tiny bit of extra knowledge or experience can readily change a person’s view of a given moral dilemma to be the opposite of what it was prior to acquiring that knowledge (for instance, I myself switched from being a classical to a negative utilitarian based on a modest amount of information in the form of arguments I had not considered before). This high sensitivity to small changes in our brain implies that even a system with near-perfect information about some person’s present brain state would be forced to make a highly uncertain guess about what that person would actually prefer in a given moral dilemma. And the further ahead in time we go, and thus further away from our familiar circumstance and context, the greater the uncertainty will be.

By analogy, consider the task of AI alignment with respect to our ancestors ten million years ago. What would their preferences have been with respect to, say, the future of space colonization? One may object that this is underdetermined because our ancestors could not conceive of this possibility, yet the same applies to us and things we cannot presently conceive of, such as alien states of consciousness. Our current preferences say about as little about the (dis)value of such states as the preferences of our ancestors ten million years ago said about space colonization.

A more tangible analogy might be to consider the level of confidence with which we, based on knowledge of your current brain state, can determine your dinner preferences twenty years from now with respect to dishes made from ingredients not yet invented — a preference that will likely be influenced by contingent, environmental factors found between now and then. Not with great confidence, it seems safe to say. And this point pertains not only to dinner preferences but also to the most consequential of choices. Our present preferences cannot realistically determine, with any considerable precision, what we would deem ideal in as yet unknown, realistic future scenarios. Thus, by extension, there can be no such thing as value extrapolation or preservation in anything but the vaguest sense. No human mind has ever contained, or indeed ever could contain, a set of preferences that evaluatively orders more than but the tiniest sliver of (highly compressed versions of) real-world states and choices an agent in our world is likely to face in the future. To think otherwise amounts to a strange Platonization of human preferences. We just do not have enough information in our heads to possess such fine-grained values.

The truth is that our preferences are not some fixed entity that determine future actions uniquely; they simply could not be that. Rather, our preferences are themselves interactive and adjustive in nature, changing in response to new experiences and new information we encounter. Thus, to say that we can “idealize” our present preferences so as to obtain answers to all realistic future moral dilemmas is rather like calling the evolution of our ancestors’ DNA toward human DNA a “DNA idealization”. In both cases, we find no hidden Deep Essences waiting to be purified; no information that points uniquely toward one particular solution in the face of all realistic future “problems”. All we find are physical systems that evolve contingently based on the inputs they receive.*

The bottom line of all this is not that it makes no sense to devote resources toward ensuring the safety of future machines. We can still meaningfully and cooperatively seek to instill rules and mechanisms in our machines and institutions that seem optimal in expectation given our respective, coarse-grained values. The conclusion here is just that 1) the rules instantiated cannot be the result of a universally shared human will or anything close; the closest thing possible would be rules that embody some compromise between people with strongly disagreeing values. And 2) such an instantiation of coarse-grained rules in fact comprises the upper bound of what we can expect to accomplish in this regard. Indeed, this is all we can expect with respect to future influence in general: rough and imprecise influence and guidance with the limited information we can possess and transmit. The idea of a future machine that will do exactly what we would want, and whose design therefore constitutes a lever for precise future control, is a pipe dream.

* Note that this account of our preferences is not inconsistent with value or moral realism. By analogy, consider human preferences and truth-seeking: humans are able to discover many truths about the universe, yet most of these truths are not hidden in, nor extrapolated from, our DNA or our preferences. Indeed, in many cases, we only discover these truths by actively transcending rather than “extrapolating” our immediate preferences (for comfortable and intuitive beliefs, say). The same could apply to the realm of value and morality.

Blog at WordPress.com.

Up ↑