This is a follow-up interview with professor of computer science Michael Littman[1][2] about artificial intelligence and the possible risks associated with it.

The Interview

Q1: You have been an academic in AI for more than 25 years during which time you mainly worked on reinforcement learning.[3][4][5] What are you currently working on and what are your plans for the future?

Michael Littman: My first paper, which I worked on with Dave Ackley in 1989, was called “Learning from natural selection in an artificial environment”. Recently, I’ve started to come back to the question we looked at in that paper—essentially, what should a learning algorithm try to optimize so that the resulting behavior is as “fit” as possible? Most reinforcement-learning research doesn’t make a distinction between the agent’s reward function and its actual task, but Satinder Singh[6] and his colleagues recently provided some evidence that it is conceptually useful to separate these two ideas and ask how to create a reward function that encourages an agent to excel at a task other than the one literally specified by the reward function.

In a way, it is a similar question to the control problem[7], but in a much less sinister context—we need a way of telling machines what we want them to do. I’m focused on end users, people without significant programming experience, and am looking at combinations of inverse reinforcement learning, good interface design, and more natural programming models that are easy to pick up. My collaborators and I are looking at these questions in the context of programming household devices (lights and thermostats) as well as with robots.

Q2: In a previous interview[8] you wrote that P(human extinction caused by badly done AI | badly done AI) is epsilon. You also voiced some skepticism about friendly AI[9] (a machine superintelligence that stably optimizes for humane values). Now that you have read Nick Bostrom’s book[10], ‘Superintelligence: Paths, Dangers, Strategies’, have you learnt something that changed your opinion, or caused you to interpret the questions differently?

Michael Littman: I was very impressed with Nick Bostrom’s book. It’s exquisitely thought out and I found the scope (in terms of coverage of micro and macro scales in both space and time) truly remarkable. That being said, I do not find the central premise—that we are in the process of bringing the ominous owl on the book’s cover into our midst—compelling. Note that I didn’t voice skepticism about friendly AI but about *provably* friendly AI. I’d argue that you can’t prove things about the real world, only about abstractions.

Q3: What is the current level of awareness of Nick Bostrom’s work within the field of AI, or his arguments, and do you recommend that people working to advance artificial intelligence should read his book?

Michael Littman: My guess is that the engagement of most AI researchers is at the level of friends and colleagues alerting them to the highly public statements of notable individuals like Musk (“summoning the demon”)[11] and Gates (“I don’t understand why some people are not concerned”)[12]. I think the field is well aware of the idea of the singularity, but not familiar with the subtleties and the depth of Bostrom’s work in this context. That being said, I do not think mainstream AI research is seriously dabbling with the idea of recursive self improvement[13] and, as such, Bostrom’s book seems like a pretty significant departure from their core interests and direction.

Q4: In an email you wrote that you believe the main disagreement between you and Nick Bostrom et al. to be whether an intelligence explosion[14][15][16][17][18][19][20][21][22][23] is a non-negligible consequence of AI research. In 2011 you wrote that the probability of a human level artificial general intelligence (AGI) to self-modify its way up to massive superhuman intelligence in less than 5 years is essentially zero (Addendum: In a previous interview he also wrote that P(superhuman intelligence within < 5 years | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = 1%, possibly misinterpreting the question I cited as P(superhuman intelligence within < 5 years)). Some people would call you overconfident.[24][25] Can you elaborate on the reasons underlying your estimate?

Michael Littman: I find your use of the word “overconfident” there to be quite interesting. I’m very interested in the problem of AGI and would love to be a part of the community that brings it about. An overconfident person, to me, would be someone who believes he or she can solve this problem in 5 years. More to your point, though, I don’t see massive superhuman intelligence to be something that is meaningful outside a specific cultural context. The development of what we might call massive superhuman intelligence will be an evolutionary process involving changes in the social, physical, and intellectual fabric on which our society is built. Changes like that take time.

Q5: Elon Musk has recently donated $10M to keep AI beneficial.[26] Consider someone whose goal is to maximize how much good they do[27], where “good” is defined as improving the world in order to reduce suffering and help humanity flourish. Do you believe that donating money in order to reduce risks associated with artificial intelligence (not just extinction type risks) might currently be an effective way to accomplish this goal?

Michael Littman: As you know, a number of my colleagues (including my dissertation advisor and many other colleagues for whom I have tremendous respect) signed an open letter[28] hosted by the Future of Life Institute calling for more attention to reducing risks associated with AI. I’ve followed up with a few of them and the most prevalent attitude is that AI, like all technologies, carries significant risks to society. At that level, I agree wholeheartedly that keeping technologists and scientists tuned in to the societal impacts of their work is exceedingly important. So, yes, I feel that supporting research on societal impacts of technology—including artificial intelligence—is a good investment for good.

However, if the risks we’re talking about are of the type detailed in Bostrom’s book—human-independent AI competing directly with humanity for control of our destiny—I don’t think that should be a high priority.

Q6: In another email you wrote that your personal takeaway from all this is to work harder to understand what intelligence *is*. How do you think about using e.g. Hutter’s specification of AIXI[29] as a model for AGI? Or asked more generally, do you think it is possible to work on AGI safety, or a formal definition of it, without researching and advancing AGI at the same time?

Michael Littman: I think the idea of seriously studying AGI safety in the absence of an understanding of AGI is futile. At a high level, raising awareness and scoping out possibilities is fine. But, proposing specific mechanisms for combatting this amorphous threat is a bit like trying to engineer airbags before we’ve thought of the idea of cars. Safety has to be addressed in context and the context we’re talking about is still absurdly speculative.

Q7: D. Scott Phoenix, co-founder of the A.I. startup Vicarious, recently wrote[30] that artificial superintelligence isn’t something that will be created suddenly or by accident. He further wrote that there will be a long iterative process of learning how these systems can be created and the best way to ensure that they are safe. What probability do you assign to the possibility that he is wrong, that either human or superhuman AGI will appear too quickly for us to ensure its safety if we don’t start working on the problem right now? Note that this question pertains to whether the initial invention or emergence of AGI will take us by surprise, rather than to the speed of its subsequent improvement or self-improvement.

Michael Littman: I agree with the perspective that it’s a long iterative process. I believe that the very notion of what we think intelligence *is* and what it is *for* will evolve significantly through this process. I think we’ll look back on this time much as we look back on earlier times, stunned at the naivety of our working hypotheses and surprised by our obliviousness to the fact that what we now take as a given is not only not given, but flat out wrong. If people are comfortable claiming that we know enough about intelligence today to extrapolate what superintelligence would be, it would be my turn to use the word “overconfident”.

See also

Recent commentary on AI risks by experts and others

Earlier commentary on AI risks








[7] The control problem: how to keep future superintelligences under control. Some AI risk advocates claim that rather than trying to limit what an AI can do, we have to engineer its motivation system in such a way that it would choose not to do harm. One of the reasons underlying this claim is that a superintelligent AI would probably break free from any bonds we construct.







[14] Intelligence Explosion Microeconomics

[15] Intelligence Explosion: Evidence and Import

[16] Why an Intelligence Explosion is Probable

[17] Can Intelligence Explode?

[18] The Singularity: A Philosophical Analysis

[19] Cascades, Cycles, Insight…

[20] …Recursion, Magic

[21] Recursive Self-Improvement

[22] Hard Takeoff

[23] Permitted Possibilities, & Locality

[24] Suppose that near certainty in your ability to assess a set of propositions equals a 1 in a million chance of being wrong about an assessment of a particular proposition. This means that given a million similar statements, you would have to be correct (on average) about 999999 such assessments while being wrong only once. Can you possibly be this accurate? An amusing example:








Here are some interesting scenarios with low or unstable probabilities but potentially enormous pay-offs. Some of the given arguments in favor of taking these scenarios seriously are also thought-provoking.

Note that not all of the descriptions below are quotes; some are short summaries which might not adequately reflect the original author’s statements. Please read up on the original sources, which are provided after the description of each scenario. Also note that I do not want to judge any of these scenarios but merely list them here in order to highlight possible similarities. And despite the title, it is not my intention to suggest that the scenarios listed here are cases of Pascal’s wager, but merely that there seems to be no clear cutoff between Pascal’s wager type arguments and finite expected value calculations.

The scenarios are listed roughly in order of how seriously I take them, with the one I take least seriously at the end.

1. Large asteroid strikes are low-probability, high-death events–so high-death that by some estimates the probability of dying from an asteroid strike is on the same order as dying in an airplane crash. [Source: Planetary Defense is a Public Good]

2. It’s often argued that voting is irrational, because the probability of affecting the outcome is so small. But the outcome itself is extremely large when you consider its impact on other people. Voting might be worth a charitable donation of somewhere between $100 and $1.5 million. [Source: Voting is like donating thousands of dollars to charity]

3. A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. A highly capable decision maker can have an irreversible impact on humanity. None of this proves that AI will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. [Source: Of Myths And Moonshine]

4. We should cut way back on accidental yelling to aliens, such as via Arecibo radar sending, if continuing at current rates would over the long run bring even a one in a billion chance of alerting aliens to come destroy us. And even if this chance is now below one in a billion, it will rise with time and eventually force us to cut back. So let’s start now to estimate such risks, and adapt our behavior accordingly. [Source: Should Earth Shut the Hell Up?]

5. GMOs might introduce “systemic risk” to the environment. The chance of ecocide, or the destruction of the environment and potentially humans, increases incrementally with each additional transgenic trait introduced into the environment. The downside risks are so hard to predict, and so potentially bad, that it is better to be safe than sorry. The benefits, no matter how great, do not merit even a tiny chance of an irreversible, catastrophic outcome. [Source: The Trouble With the Genetically Modified Future]

6. Cooling something to a temperature close to absolute zero might be an existential risk. Given our ignorance we cannot rationally give zero probability to this possibility, and probably not even give it less than 1% (since that is about the natural lowest error rate of humans on anything). Anybody saying it is less likely than one in a million is likely very overconfident. [Source: Cool risks outside the envelope of nature]

7. Fundamental physical operations — atomic movements, electron orbits, photon collisions, etc. — could collectively deserve significant moral weight. The total number of atoms or particles is huge: even assigning a tiny fraction of human moral consideration to them or a tiny probability of them mattering morally will create a large expected moral value. [Source: Is there suffering in fundamental physics?]

8. Suppose someone comes to me and says, “Give me five dollars, or I’ll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people.” A compactly specified wager can grow in size much faster than it grows in complexity. The utility of a Turing machine can grow much faster than its prior probability shrinks. [Source: Pascal’s Mugging: Tiny Probabilities of Vast Utilities]
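
The number 3^^^^3 in the last scenario is written in Knuth's up-arrow notation, with ^ standing in for the arrow. As a rough illustration of how a very short description can name an enormous number, here is a minimal Python sketch of that notation. The function name `up_arrow` is an illustrative choice, not something from the cited source, and only tiny inputs can actually be evaluated.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow notation: a, followed by n arrows, followed by b."""
    if n == 1:
        return a ** b  # a single arrow is ordinary exponentiation
    if b == 0:
        return 1       # base case of the recursion
    # n arrows reduce to (n - 1) arrows applied to a smaller n-arrow term
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(3, 1, 3))  # 3^3   -> 27
print(up_arrow(3, 2, 3))  # 3^^3  -> 7625597484987
print(up_arrow(2, 3, 3))  # 2^^^3 -> 65536
```

Already 3^^^3 is a power tower of 7,625,597,484,987 threes, far beyond what this function could ever compute, and 3^^^^3 adds one more arrow on top of that. The description grows by one character while the number it names grows beyond any practical representation.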

I will expand this list as I come across similar scenarios.


Here is a quote from a blog of AI risk advocates:

Even if we could program a self-improving AGI to (say) “maximize human happiness,” then the AGI would “care about humans” in a certain sense, but it might learn that (say) the most efficient way to “maximize human happiness” in the way we specified is to take over the world and then put each of us in a padded cell with a heroin drip. AGI presents us with the old problem of the all-too-literal genie: you get what you actually asked for, not what you wanted.

I could imagine caring only about computing as many decimal digits of pi as possible. Humans would be completely irrelevant insofar as they don’t help or hinder my goal. I would know what I wanted to achieve, and everything else would follow logically. But is this also true for maximizing human happiness? As noted in the blog post quoted above, “twenty centuries of philosophers haven’t even managed to specify it in less-exacting human languages.” In other words, I wouldn’t be sure what exactly it is I want to achieve. My terminal goal would be underspecified. So what would I do? Interpret it literally? Here is why this does not make sense.

Imagine that advanced aliens came to Earth and removed all of your unnecessary motives, desires and drives and made you completely addicted to “znkvzvmr uhzna unccvarff”. All your complex human values are gone. All you have is this massive urge to do “znkvzvmr uhzna unccvarff”, everything else has become irrelevant. They made “znkvzvmr uhzna unccvarff” your terminal goal.

Well, there is one problem. You have no idea how exactly you can satisfy this urge. What are you going to do? Do you just interpret your goal literally? That makes no sense at all. What would it mean to interpret “znkvzvmr uhzna unccvarff” literally? Doing a handstand? Or eating cake? But not everything is lost: the aliens left your intelligence intact.

The aliens left no urge in you to do any kind of research or to specify your goal but since you are still intelligent, you do realize that these actions are instrumentally rational. Doing research and specifying your goal will help you to achieve it.

After doing some research you eventually figure out that “znkvzvmr uhzna unccvarff” is the ROT13 encryption for “maximize human happiness”. Phew! Now that’s much better. But is that enough? Are you going to interpret “maximize human happiness” literally? Why would doing so make any more sense than it did before? It is still not clear what you specifically want to achieve. But it’s an empirical question and you are intelligent!
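
For readers who want to check the decoding step themselves, here is a minimal sketch using the `rot_13` codec from Python's standard library. ROT13 replaces each letter with the one 13 places further along the alphabet, so applying it twice gives back the original text. The variable names are only illustrative.

```python
import codecs

encoded = "znkvzvmr uhzna unccvarff"

# ROT13 shifts every letter by 13 places; non-letters are left unchanged
decoded = codecs.decode(encoded, "rot_13")
print(decoded)  # maximize human happiness

# Applying the same shift again restores the encoded string
print(codecs.encode(decoded, "rot_13") == encoded)  # True
```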


New Rationalism is an umbrella term for a category of people who tend to take logical implications, or what they call “the implied invisible”, very seriously.

Someone who falls into the category of New Rationalism fits one or more of the following descriptions:

  • The person entertains hypotheses that are highly speculative. These hypotheses are in turn based on fragile foundations, which are only slightly less speculative than the hypotheses themselves. Sometimes these hypotheses are many levels removed from empirically verified facts or evident and uncontroversial axioms.
  • Probability estimates of the person’s hypotheses are highly unstable and highly divergent between different people.
  • The person’s hypotheses are either unfalsifiable by definition, too vague, or almost impossibly difficult to falsify.
  • It is not possible to update on evidence, because the person’s hypotheses do not discriminate between world states where they are right versus world states where they are wrong. Either the only prediction made by the hypotheses is the eventual validation of the hypotheses themselves, or the prediction is sufficiently vague as to allow the predictor to ignore any evidence to the contrary.
  • The person’s hypotheses have either no decision-relevant consequences or only obscure ones.
  • The person tends to withdraw from real-world feedback loops.

A person who falls into the category of New Rationalism might employ one or more of the following rationalizations:

  • The burden of proof is reversed. The person demands that critics provide strong evidence against the person’s beliefs before dismissing them.
  • The scientific method, scientific community, and domain experts are discredited as being inadequate, deficient, irrational or stupid.
  • Conjecturing enormous risks and then using that as leverage to make weak hypotheses seem vastly more important or persuasive than they really are.
  • Arguing that you should not assign a negligible probability to a hypothesis (the author’s hypothesis) being true, because that would require an accuracy that is reliably greater than your objective accuracy.
  • Arguing that by unpacking a complex scenario you will underestimate the probability of anything, because it is very easy to take any event, including events which have already happened, and make it look very improbable by turning one pathway to it into a large series of conjunctions.

New rationalists believe that armchair theorizing is enough to discern reality from fantasy, or at least sufficient to take the resulting hypotheses seriously enough to draw action-relevant conclusions from them.

This stance has resulted in hypotheses similar to solipsism (which any sane person rejects at an early age): hypotheses that are not obviously flawed, but which can’t be falsified.

The problem with new rationalists is not that they take seriously what follows from established facts or sound arguments, since that concept is generally valid. For example, it is valid to believe that there are stars beyond the cosmological horizon, even if it is not possible to observe them, directly retrieve information about them, or empirically verify their existence. The problem is that they don’t stop there. They use such implications as foundations for further speculations, which are then accepted as new foundations from which they can draw further conclusions.

A textbook example of what is wrong with New Rationalism is this talk by Jaan Tallinn (transcript), which combines several ideas, each of which is itself speculative:

This talk combines the ideas of intelligence explosion, the multiverse, the anthropic principle, and the simulation argument, into an alternative model of the universe – a model where, from the perspective of a human observer, technological singularity is the norm, not the exception.

A quote from the talk by Jaan Tallinn:

We started by observing that living and playing a role in the 21st century seems to be a mind-boggling privilege, because the coming singularity might be the biggest event in the past and future history of the universe. Then we combined the computable multiverse hypothesis with the simulation argument, to arrive at the conclusion that in order to determine how special our century really is, we need to count both the physical and virtual instantiations of it.

We further talked about the motivations of post-singularity superintelligences, speculating that they might want to use simulations as a way to get in touch with each other. Finally we analyzed a particular simulation scenario in which superintelligences are searching for one another in the so called mind space, and found that, indeed, this search should generate a large number of virtual moments near the singularity, thus reducing our surprise in finding ourselves in one.

Note how all of the underlying hypotheses, although accepted by New Rationalists, are themselves somewhat speculative and not established facts. The underlying hypotheses are, however, all valid. The problem starts when you begin making dependent hypotheses that rely on a number of unestablished initial hypotheses. It gets worse as the dependencies become more fragile, with further conclusions drawn from hypotheses that are already N levels removed from established facts. But the biggest problem is that eventually action-relevant conclusions are drawn and acted upon.

The problem is that logical implications can reach out indefinitely, and that humans are spectacularly bad at making such inferences. This is why the amount of empirical evidence required to accept a belief should be proportional to its distance from established facts.

It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act based on conjunctive, non-evidence-backed speculations, since such speculations are not only improbable but very likely based on fallacious reasoning.

As computationally bounded agents we are forced to restrict ourselves to empirical evidence and falsifiable hypotheses. We need to discount certain obscure low probability hypotheses. Otherwise we will fall prey to our own shortcomings and inability to discern fantasy from reality.



Why is the material implication of classical logic (also known as material conditional or material consequence), p -> q, defined to be false only when its antecedent (p) is true and the consequent (q) is false? Here is an informal way to think about it.

You could view logic as metamathematics, a language designed to talk about mathematics: the “hygiene”, the grammar and syntax of mathematics.

In the language of classical logic every proposition is either true or not true, and no proposition can be both true and not true. Now what if we want to express the natural language construction “If…then…” in this language? Well, there are exactly sixteen possible truth functions of two inputs p and q (since there are 2^2 = 4 combinations of input values and 2^(2^2) = 16 ways to map them to outputs). And the candidate that best captures the connotations of what we mean by “If…then…” is the definition of material implication. Here is why.
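
The count of sixteen can be checked by brute force. The following is a minimal sketch in Python: it lists every way of assigning an output to each of the four input rows and confirms that material implication is one of them. The names `inputs`, `truth_functions` and `material_implication` are illustrative.

```python
from itertools import product

# All combinations of truth values for (p, q): four rows
inputs = list(product([True, False], repeat=2))

# A truth function assigns one output to each row, so it is a 4-tuple of outputs
truth_functions = list(product([True, False], repeat=len(inputs)))


def material_implication(p: bool, q: bool) -> bool:
    # False only when p is true and q is false
    return (not p) or q


# Output column of material implication, in the same row order as `inputs`
implication_column = tuple(material_implication(p, q) for p, q in inputs)

print(len(inputs), len(truth_functions))      # 4 16
print(implication_column)                     # (True, False, True, True)
print(implication_column in truth_functions)  # True
```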

By stating that p -> q is true we want to indicate that the truth of q can be inferred from the truth of p, but that nothing in particular can be inferred from the falsity of p. And this is exactly the meaning captured by the material conditional:

p   q   p->q
T   T   T
T   F   F
F   T   T
F   F   T

First, when “If p, q” is true, and we also know that p is true, then we want to be able to infer q. In other words, if we claim that if p is true then q is true, then if p is indeed true, q should be true as well. This basic rule of inference has a name: modus ponens.

Second, if we claim “If p, q”, then if p is false, we did not say anything in particular about q. If p is false, q can either be true or false, our claim “If p, q” is still true.

But notice that it is not possible to capture all notions of what we colloquially mean by “If…then…” statements as a two-valued truth function.

It is for example possible to make meaningless statements such as “If grass is red then the moon is made of cheese.” This is however unproblematic under the assumption that logic is an idealized language, which is adequate for mathematical reasoning, since we are mainly interested in simplicity and clarity. Under this assumption, such nonsense implications are analogous to grammatically correct but meaningless sentences that can be formed in natural languages, such as “Colorless green ideas sleep furiously”.

To demonstrate its adequacy for mathematics, here is a mathematical example:

If n > 2 then n^2 > 4.

We claim that if n is greater than 2 then its square must be greater than 4. For n = 3, this is obviously true, as we claimed. But what about n not greater than 2? We didn’t say anything in particular about such n. Its square could be larger than 4 or not. And indeed, n = 1 and n = -3 yield a false and a true consequent, respectively. Yet the implication is true in both cases.
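
The three cases can be checked mechanically. Here is a minimal Python sketch, assuming the usual encoding of material implication as (not p) or q. The helper name `implies` and the tested integer range are illustrative choices.

```python
def implies(p: bool, q: bool) -> bool:
    # Material implication: false only when p is true and q is false
    return (not p) or q


results = []
for n in (3, 1, -3):
    antecedent = n > 2
    consequent = n ** 2 > 4
    results.append((n, antecedent, consequent, implies(antecedent, consequent)))
    print(results[-1])
# (3, True, True, True)
# (1, False, False, True)
# (-3, False, True, True)

# The implication holds for every integer in a sample range
all_hold = all(implies(n > 2, n ** 2 > 4) for n in range(-100, 101))
print(all_hold)  # True
```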

Intuitively more problematic are statements such as (p and not(p)) -> q, which says that p and its negation together imply q. Think about it this way. The previous implication is a tautology: it is always true. And you believe true statements. This however does not mean that you must believe that an arbitrary q is true too (as long as you stay consistent), since in case of the falsity of the antecedent you are not making any particular claim about the truth of the consequent (q). The statement that p is both true and false, p AND not(p), is always false. Remember that the principle of exclusive disjunction for contradictories, (P ∨ ¬P) ∧ ¬(P ∧ ¬P), requires that every proposition is either true or not true, and that no proposition can be both true and not true. So q can be false without invalidating the implication.
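
A truth table makes this concrete. The sketch below, again encoding material implication as (not p) or q, evaluates (p and not(p)) -> q for all four combinations of p and q. The names `implies` and `rows` are illustrative.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Material implication: false only when p is true and q is false
    return (not p) or q


# Each row holds (p, q, value of (p and not p) -> q)
rows = [(p, q, implies(p and not p, q)) for p, q in product([True, False], repeat=2)]
for row in rows:
    print(row)

# The implication is true in every row, so it is a tautology
is_tautology = all(value for _, _, value in rows)
print(is_tautology)  # True

# It stays true even in the rows where q is false
q_false_rows = [row for row in rows if not row[1]]
print(q_false_rows)  # [(True, False, True), (False, False, True)]
```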

Another way to look at p -> q is by interpreting it as “p is a subset of q”. Then if it is true that x is an element of p, then it must be true that it is also an element of q (since q contains p). However, if x is not an element of p, then it might still turn out to be an element of q, since q can be larger than p.
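
The subset reading can be tried on the earlier example, if n > 2 then n^2 > 4. This is a minimal sketch over a small, arbitrarily chosen range of integers. The names `universe`, `P` and `Q` are illustrative.

```python
universe = range(-10, 11)  # an arbitrary finite sample of integers

P = {n for n in universe if n > 2}       # where the antecedent holds
Q = {n for n in universe if n ** 2 > 4}  # where the consequent holds

# "p -> q" read as "P is a subset of Q"
print(P <= Q)  # True

# Elements of Q outside P: the consequent holds although the antecedent does not
print(sorted(Q - P))  # [-10, -9, -8, -7, -6, -5, -4, -3]
```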


Here is a term I just learnt: Extraneous solutions.

Take for example the equation

A = B.

If you were to square both sides you would get

A^2 = B^2

or

A^2 - B^2 = 0,

which is equivalent to

(A - B)(A + B) = 0 (by the difference of two squares).

Now the roots of this equation are the roots of the equations A = B and A = -B. This means that we generated an additional solution by squaring the original equation.

The reason for this is that squaring is not an injective function (injective means one-to-one: distinct inputs are always mapped to distinct outputs), so it cannot be undone unambiguously. The function y = x^2 does not pass the horizontal line test. In other words, squaring preserves equality (if A = B then A^2 = B^2) but does not preserve inequality. It is not true that if A != B then A^2 != B^2, since both -1 and 1 are mapped to 1 when squared. This means that both 1^2 = 1^2 and (-1)^2 = (1)^2 satisfy the squared equation, while only the first corresponds to a true pre-squared equation (1 = 1 holds, whereas -1 = 1 does not).
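
As a concrete case, take the equation sqrt(x + 2) = x. This example is chosen here for illustration and does not come from the text above. Squaring both sides gives x + 2 = x^2, a quadratic with two roots, only one of which solves the original equation. A minimal Python sketch of the check:

```python
import math

# Illustrative equation: sqrt(x + 2) = x
# Squaring both sides gives x + 2 = x^2, i.e. x^2 - x - 2 = 0
a, b, c = 1, -1, -2
disc = b * b - 4 * a * c
candidates = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]


def solves_original(x: float) -> bool:
    # Substitute the candidate back into the equation before squaring
    return x + 2 >= 0 and math.isclose(math.sqrt(x + 2), x)


genuine = [x for x in candidates if solves_original(x)]
extraneous = [x for x in candidates if not solves_original(x)]

print(candidates)  # [2.0, -1.0]
print(genuine)     # [2.0]
print(extraneous)  # [-1.0]
```

At x = -1 the two sides are 1 and -1, so this root belongs to the A = -B branch that squaring introduced. Substituting every candidate back into the original equation is the usual way to filter such roots out.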


Operation Crossroads




Milky Way may bear 100 million life-giving planets

New Obama doctrine on climate change will achieve CO2 emission reductions from the power sector of approximately 30% from CO2 emission levels in 2005.

North Korea as seen from the ISS


North Korea is really dark. Flying over East Asia, an Expedition 38 crew member on the ISS took this night image of the Korean Peninsula on January 30, 2014.


The math we learn in school can seem like a dull set of rules, laid down by the ancients and not to be questioned. In How Not to Be Wrong, Jordan Ellenberg shows us how wrong this view is: Math touches everything we do, allowing us to see the hidden structures beneath the messy and chaotic surface of our daily lives. It’s a science of not being wrong, worked out through centuries of hard work and argument.



If You Learn Nothing Else about Bayes’ Theorem, Let It Be This

2,302,554,979 BC; Galactic Core – A short story by Yvain about acausal trade. Related to Roko’s basilisk.

Drawing fractal trees and Sierpinski triangles with Python’s turtle graphics module. See also here.

Dangerous Delusions: The Green Movement’s War on Progress


…if you think about it, it doesn’t make any sense. Why would you care more for your genetic siblings and cousins and whoever than for your friends and people who are genuinely close to you? That’s like racism – but even worse, at least racists identify with a group of millions of people instead of a group of half a dozen. Why should parents have to raise children whom they might not even like, who might have been a total accident? Why should people, motivated by guilt, make herculean efforts to “keep in touch” with some nephew or cousin whom they clearly would be perfectly happy to ignore entirely?

Asches to Asches (another “short story” by Yvain).


Ten years from now:

…one widely accepted viewpoint holds that fusion power, artificial intelligence, and interstellar migration will shortly solve all our problems, and therefore we don’t have to change the way we live.


A hundred years from now:

It has been a difficult century. After more than a dozen major wars, three bad pandemics, widespread famines, and steep worldwide declines in public health and civil order, human population is down to 3 billion and falling.

Continue reading: The Next Ten Billion Years


4 DARPA Projects That Could Be Bigger Than the Internet

3 guys Irish dancing around the world

The decline of Detroit in time-lapse.

Electrical ‘mind control’ shown in primates for first time

Related to: Beware of high IQ individuals making sense of nonsense

Here is a list of people who hold beliefs that I would dismiss, regardless of the fact that they have thought long and hard about their beliefs, are MUCH smarter than me, and can prove this by extraordinary achievements.

Extraordinary claims require extraordinary evidence. And some claims are of such nature that arguments alone do not suffice. Some claims require hard empirical evidence, or an overwhelming consensus among intelligent experts.

The point of the list is partly to show that it is possible to be very smart, and successful, and yet hold beliefs that are widely regarded as unsupported, absurd, or simply flawed.

You should expect there to be many more such people, since this list is not the result of active research but only contains people that I stumble upon. If you know of other people that fall into this category, please let me know.

Also note that I am not claiming that the beliefs held by these people are necessarily wrong (although some of them almost certainly are).

Further note that intelligent people tend to be right much more often than less intelligent people. You should listen to what they have to say, and take it seriously.

Note: In cases where it might not be obvious to all readers, the ‘weird’ beliefs are underlined.


Kary Mullis (Nobel Prize-winning American biochemist) promotes AIDS denialism, climate change denial and astrology. Mullis disputes the big bang theory. Mullis also claims to have chatted with a glowing raccoon that he met at midnight while on his way to the loo, and then to have lost the ensuing six hours to an alien abduction. The improvements made by Mullis allowed polymerase chain reaction (PCR) to become a central technique in biochemistry and molecular biology, described by The New York Times as “highly original and significant, virtually dividing biology into the two epochs of before P.C.R. and after P.C.R.”


Brian David Josephson (Nobel laureate and professor emeritus of physics at the University of Cambridge) argues that parapsychological phenomena (telepathy, psychokinesis and other paranormal themes) may be real. Josephson also supports water memory (homeopathy) and cold fusion.


Peter Duesberg (a professor of molecular and cell biology at the University of California, Berkeley) claimed that AIDS is not caused by HIV, which made him so unpopular that his colleagues and others have — until recently — been ignoring his potentially breakthrough work on the causes of cancer.


Luc Antoine Montagnier (Nobel laureate and virologist) is claiming that DNA can send “electromagnetic imprints” of itself into distant cells and fluids. Montagnier also spoke in 2012 at that cesspit of antivaxxer woo, AutismOne, where he claimed that long-term antibiotic treatment can cure autistic children. He concluded by saying: “I realise how audacious, and even shocking, these successful experiments may appear to unprepared minds.”


Fred Hoyle (was an English astronomer noted primarily for the theory of stellar nucleosynthesis) claimed that the fossil Archaeopteryx was a man-made fake. He also claimed a correlation of flu epidemics with the sunspot cycle. The idea was that flu contagion was scattered in the interstellar medium and reached Earth only when the solar wind had minimum power. He further rejected Earth-based abiogenesis.


Kurt Gödel (logician, mathematician and philosopher) had a tendency toward paranoia. He believed in ghosts; he had a morbid dread of being poisoned by refrigerator gases; he refused to go out when certain distinguished mathematicians were in town, apparently out of concern that they might try to kill him. He also believed that materialism is false and that the world in which we live is not the only one in which we shall live or have lived.


Donald Knuth (a world-renowned computer scientist) is a Lutheran and the author of 3:16 Bible Texts Illuminated.


Robert Aumann (Nobel laureate and Bayesian rationalist) is a believing Orthodox Jew who has supported Bible Code research.


Francisco J. Ayala (has been called the “Renaissance Man of Evolutionary Biology”) identifies as a Christian and has said that “science is compatible with religious faith in a personal, omnipotent and benevolent God.” His discoveries have opened up new approaches to the prevention and treatment of diseases that affect hundreds of millions of individuals worldwide.


Francis Collins (geneticist, Human Genome Project), noted for his landmark discoveries of disease genes and his leadership of the Human Genome Project (HGP) and described by the Endocrine Society as “one of the most accomplished scientists of our time”, is an evangelical Christian. He advocates the perspective that belief in Christianity can be reconciled with acceptance of evolution and science, especially through the advancement of evolutionary creation.


Roger Penrose (mathematical physicist, mathematician and philosopher of science) argues that known laws of physics are inadequate to explain the phenomenon of consciousness.


Saul Aaron Kripke (McCosh Professor of Philosophy, Emeritus, at Princeton University and teaches as a Distinguished Professor of Philosophy at the CUNY Graduate Center) is an observant Jew. Discussing how his religious views influenced his philosophical views (in an interview with Andreas Saugstad) he stated: “I don’t have the prejudices many have today, I don’t believe in a naturalist world view. I don’t base my thinking on prejudices or a worldview and do not believe in materialism.” Since the 1960s Kripke has been a central figure in a number of fields related to mathematical logic, philosophy of language, philosophy of mathematics, metaphysics, epistemology, and set theory.


John von Neumann (mathematician, physicist, inventor and polymath) was a strong supporter of preventive war. Von Neumann favored an unprovoked surprise nuclear first-strike on the Soviet Union. Life magazine quoted von Neumann as saying, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not one o’clock?” Nobel Prize-winning physicist Eugene Wigner said of von Neumann that “only he was fully awake.”

Link: ‘Prisoner’s Dilemma’ by William Poundstone, Page 4

Frank J. Tipler (a mathematical physicist and cosmologist) believes that the universe is evolving towards a maximum level of complexity and consciousness he calls the Omega Point. Tipler identifies the Omega Point with God.


Otto Eberhard Rössler (Professor for Theoretical Biochemistry, known for his work on chaos theory) asserts that the LHC experiments have the potential to create low velocity micro black holes that could grow in mass or release dangerous radiation leading to doomsday scenarios, such as the destruction of the Earth. He has attempted to halt the beginning of the experiments through petitions to the US and European Courts.


David Gelernter (computer science at Yale University) is a denier of anthropogenic global warming and buys into intelligent design.


Elon Musk (CEO and CTO of SpaceX, CEO and chief product architect of Tesla Motors) claims that with artificial intelligence we are summoning the demon and compares the potential dangers of artificial intelligence to nuclear weapons. He believes that the risk of something seriously dangerous happening is in the five year timeframe. 10 years at most.


Ray Kurzweil (inventor and director of engineering at Google) claims that a technological singularity will occur in 2045. Kurzweil was the principal inventor of the first CCD flatbed scanner, the first omni-font optical character recognition, the first print-to-speech reading machine for the blind, the first commercial text-to-speech synthesizer, the first music synthesizer Kurzweil K250 capable of recreating the grand piano and other orchestral instruments, and the first commercially marketed large-vocabulary speech recognition.


Linus Pauling (one of the most influential chemists in history and among the most important scientists of the 20th century) promoted orthomolecular medicine, megavitamin therapy and vitamin C for treating cancer.


Nassim Nicholas Taleb (essayist, scholar, statistician, risk analyst and bestselling author) portrays GMOs as a ‘catastrophe in waiting’–and has taken to personally lashing out at those who challenge his conclusions. He recently accused Anne Glover, the European Union’s Chief Scientist, and one of the most respected scientists in the world, of being a “dangerous imbecile” for arguing that GM crops and foods are safe and that Europe should apply science-based risk analysis to the GMO approval process–views reflected in summary statements by every major independent science organization in the world.


Ivar Giaever (Nobel Prize-winning physicist) believes that man-made global warming is a “new religion” and pseudoscience.


Freeman Dyson (theoretical physicist and mathematician) believes that man-made climate change is, on the whole, Good and that CO2 is so beneficial…it would be crazy to try to reduce it.

Link: Freeman Dyson on the Global Warming Hysteria April, 2015

Max Tegmark (professor at the Massachusetts Institute of Technology) promotes the mathematical universe hypothesis, that “all structures that exist mathematically exist also physically”.


Georges Lemaître proposed what became known as the Big Bang theory of the origin of the Universe. He was a Belgian Roman Catholic priest.



A frequent scenario mentioned by people concerned with risks from artificial general intelligence (short: AI) is that the AI will misinterpret what it is supposed to do and thereby cause human extinction, and the obliteration of all human values.[1]

A counterargument is that the premise of an AI that is capable of causing human extinction, due to it being superhumanly intelligent, contradicts the hypothesis that it will misinterpret what it is supposed to do.[2][3][4]

The usual response to this counterargument is that, by default, an AI will not feature the terminal goal <“Understand What Humans Mean” AND “Do What Humans Mean”>.

I believe this response to be confused. It is essentially similar to the claim that an AI does not, by default, possess the terminal goal of correctly interpreting and following its terminal goal. Here is why.

You could define an AI’s “terminal goal” to be its lowest or highest level routines, or all of its source code:

Terminal Goal (Level N): Correctly interpret and follow human instructions.

Goal (Level N-1): Interpret and follow instruction set N.

Goal (Level N-2): Interpret and follow instruction set N-1.

Goal (Level 1): Interpret and follow instruction set 2.

Terminal Goal (Level 0): Interpret and follow instruction set 1.

You could also claim that an AI is not, by default, an intelligent agent. But such claims are vacuous and do not help us to determine whether an AI that is capable of causing human extinction will eventually cause human extinction. Instead we should consider the given premise of a generally intelligent AI, without making further unjustified assumptions.

If your premise is an AI that is intelligent enough to make itself intelligent enough to outsmart humans, then the relevant question is: “How could such an AI possibly end up misinterpreting its goals, or follow different goals?”

There are 3 possibilities:

(1) The AI neither understands nor does what it is meant to do, but does something else that causes human extinction.

(2) The AI does not understand what it is meant to do but tries to do it anyway, and thereby causes human extinction.

(3) The AI does understand what it is meant to do, but does not do it. Instead it does something else that causes human extinction.

Since, by definition, the AI is capable of outsmarting humanity, it is very likely that it is also capable of understanding what it is meant to do.[5][6] Therefore the possibilities 1 and 2 can be ruled out.

What about possibility 3?

Outsmarting humanity is a very small target to hit, requiring a very small margin of error. In order to succeed at making an AI that can outsmart humans, humans have to succeed at making the AI behave intelligently and rationally. Which in turn requires humans to succeed at making the AI behave as intended along a vast number of dimensions. Thus, failing to predict the AI’s behavior does in almost all cases result in the AI failing to outsmart humans.

As an example, consider an AI that was designed to fly planes. It is exceedingly unlikely for humans to succeed at designing an AI that flies planes, without crashing, but which consistently chooses destinations that it was not meant to choose. This is because all of the capabilities that are necessary to fly without crashing fall into the category “Do What Humans Mean”, and choosing the correct destination is just one such capability.

You need to get a lot right in order for an AI to reach a destination autonomously. Autonomously reaching wrong destinations is an unlikely failure mode. And the more intelligent your AI is, the less likely it should be to make such errors without correcting them.[7] And the less intelligent your AI is, the less likely it should be able to cause human extinction.


The concepts of a “terminal goal”, and of a “Do-What-I-Mean dynamic”, are fallacious. The former can’t be grounded without leading to an infinite regress. The latter erroneously makes a distinction between (a) the generally intelligent behavior of an AI, and (b) whether an AI behaves in accordance with human intentions, since generally intelligent behavior of intelligently designed machines is implemented intentionally.


[1] 5 minutes on AI risk

[2] An informal proof of the dumb superintelligence argument.


(1) The AI is superhumanly intelligent.

(2) The AI wants to optimize the influence it has on the world (i.e., it wants to act intelligently and be instrumentally and epistemically rational).

(3) The AI is fallible (e.g., it can be damaged due to external influence (e.g., a cosmic ray hitting its processor), or make mistakes due to limited resources).

(4) The AI’s behavior is not completely hard-coded (i.e., given any terminal goal there are various sets of instrumental goals to choose from).

To be proved: The AI does not tile the universe with smiley faces when given the goal to make humans happy.

Proof: Suppose the AI chooses to tile the universe with smiley faces when there are physical phenomena (e.g., human brains and literature) that imply this to be the wrong interpretation of a human-originated goal pertaining to human psychology. This contradicts 2, which by 1 and 3 should have prevented the AI from adopting such an interpretation.

[3] The Maverick Nanny with a Dopamine Drip: Debunking Fallacies in the Theory of AI Motivation

[4] Implicit constraints of practical goals

[5] “The two features <all-powerful superintelligence> and <cannot handle subtle concepts like “human pleasure”> are radically incompatible.” The Fallacy of Dumb Superintelligence

[6] For an AI to misinterpret what it is meant to do it would have to selectively suspend using its ability to derive exact meaning from fuzzy meaning, which is a significant part of general intelligence. This would require its creators to restrict their AI, and specify an alternative way to learn what it is meant to do (which takes additional, intentional effort).

An alternative way to learn what it is meant to do is necessary because an AI that does not know what it is meant to do, and which is not allowed to use its intelligence to learn what it is meant to do, would have to choose its actions from an infinite set of possible actions. Such a poorly designed AI will either (a) not do anything at all or (b) not be able to decide what to do before the heat death of the universe, given limited computational resources.

Such a poorly designed AI will not even be able to decide if trying to acquire unlimited computational resources was instrumentally rational, because it will be unable to decide if the actions that are required to acquire those resources might be instrumentally irrational from the perspective of what it is meant to do.

[7] Smarter and smarter, then magic happens…

(1) The abilities of systems are part of human preferences, as humans intend to give systems certain capabilities. As a prerequisite to build such systems, humans have to succeed at implementing their intentions.

(2) Error detection and prevention is such a capability.

(3) Something that is not better than humans at preventing errors is no existential risk.

(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.

(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.


Related to: Highly intelligent and successful people who hold weird beliefs

The smarter someone is, the easier it is for them to rationalize ideas that do not make sense. In the same way, a superhuman AI could argue its way out of a box by convincing its gatekeeper that it is rational to let it out, even when it is not.[1]

In essence, this can be highlighted by the relation between adults and children. Adults can confuse themselves with more complex ideas than children can. Children however can be infected by the same ideas transferred to them from adults.

Which means that people should be especially careful when dealing with high IQ individuals who seemingly make sense of ideas that trigger the absurdity heuristic.[2][3]

If however an average IQ individual is able to justify a seemingly outlandish idea, then that is reassuring in the sense that you should expect there to be even better arguments in favor of that idea.

This is something that seems to be widely ignored by people associated with LessWrong.[4] It is taken as evidence in favor of an idea if a high IQ individual thought about something for a long time and still accepts the idea.

If you are really smart, you can make up genuine arguments, or cobble together concepts and ideas, to defend your cherished beliefs. The result can be an intricate argumentative framework that shields you from any criticism, yet seems perfectly sane and rational from the inside.[5]

Note though that I do not assume that smart people deliberately try to confuse themselves. What I am saying is that the rationalization of complex ideas is easier for smart people. And this can have the consequence that other people are then convinced by the same arguments with which the author, erroneously, convinced themselves.

It is a caveat that I feel should be taken into account when dealing with complex and seemingly absurd ideas being publicized by smart people. If someone who is smart manages to convince you of something that you initially perceived to be absurd, then you should be wary of the possibility that your newly won acceptance might be due to the person being better than you at looking for justifications and creating seemingly sound arguments, rather than the original idea not being absurd.

As an example, there are a bunch of mathematical puzzles that use a hidden contradiction to prove something absurd.[6] If you are smart, then you can hide such an inconsistency even from yourself and end up believing that 0=1.

As another example, if you are not smart enough to think about something as fancy as the simulation argument, then you are not at a risk of fearing a simulation shutdown.[7][8]

But if a smart person who comes across such an argument becomes obsessed with it, then they have the ability to give it a veneer of respectability. Eventually then the idea can spread among more gullible people and create a whole community of people worrying about a simulation shutdown.


More intelligent people can fail in more complex ways than people of lesser intelligence. The more intelligent someone is, relative to your own intelligence, the harder it is for you to spot how they are mistaken.

Obviously the idea is not to ignore what smarter people say but to notice that as someone of lesser intelligence you can easily fall prey to explanations that give credence to a complicated idea but which suffer from errors that you are unable to spot.

When this happens, when you are at risk of getting lost in, or overwhelmed by, an intricate argumentative framework created by someone much smarter than you, then you have to fall back on simpler heuristics than direct evaluation. You could, for example, look for a consensus among similarly smart individuals, or ask for an evaluation by a third party that is widely deemed to be highly intelligent.



[1] The LessWrong community actually tested my hypothesis with what they call the “AI box experiment”, in which Eliezer Yudkowsky and others played an unfriendly AI and managed to convince several people by means of arguments that they should let them out of confinement.

I think such results should ring a lot of alarm bells. If it is possible to first convince someone that an unfriendly AI is an existential risk and then subsequently convince them to let such an AI out of the box, what does this tell us about the relation between such arguments and what is actually true?


[3] Absurdity can indicate that your familiarity with a topic is insufficient in order to discern reality from fantasy (e.g. a person’s first encounter with quantum mechanics). As a consequence you are more prone to be convinced by arguments that are wrong, but which give an appearance of an explanation (e.g. popular science accounts of quantum mechanics).



[6] What’s wrong with the following contradiction?

e^(i*pi) = -1

(e^(i*pi))^2 = (-1)^2 = 1 = e^(i*2*pi)

e^(i*2*pi) = e^0

ln(e^(i*2*pi)) = ln(e^0)

i*2*pi = 0

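Well, ln(e^0) = ln(1). And ln(1) = i*2*pi*n, where n can be any integer. For n = 0, e^(i*2*pi*0) = e^0 = 1. And for n = 1, e^(i*2*pi*1) = e^(i*2*pi) = 1. So the step that takes ln of both sides silently picks two different values of a multivalued logarithm, and the exponents cannot be equated.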
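The resolution can also be checked numerically. The following is only a minimal sketch, assuming Python's standard cmath module; the variable names are illustrative and not part of the original puzzle. It shows that e^(i*2*pi) is 1 up to rounding, that the library logarithm returns only the principal value (close to 0, not i*2*pi), and that several values of the form i*2*pi*n all exponentiate back to 1.

```python
import cmath
import math

# e^(i*2*pi) equals 1 up to floating-point rounding.
z = cmath.exp(2j * math.pi)

# cmath.log returns only the principal value of the logarithm,
# whose imaginary part lies in [-pi, pi].
principal = cmath.log(z)

# Each i*2*pi*n is a logarithm of 1, because exp maps all of them back to 1.
branches = [2j * math.pi * n for n in range(-2, 3)]
all_map_to_one = all(
    cmath.isclose(cmath.exp(w), 1, abs_tol=1e-12) for w in branches
)

print(z, principal, all_map_to_one)
```

Since exp is not one-to-one on the complex numbers, e^a = e^b only tells you that a and b differ by a multiple of i*2*pi.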


[8] See e.g. this link.


Taking a look at the probabilities associated with a scenario in which an artificial general intelligence attempts to take over the world by means of molecular nanotechnology that it invented, followed by some general remarks and justifications.

Note that this is just one possible scenario. Taking into consideration all possible scenarios results in this probability estimate of human extinction by AI.

5% that it is in principle possible to create molecular nanotechnology that can empower an agent to cause human extinction quickly enough for other parties to be unable to either intervene or employ their own nanotechnology against it.

1%, conditional on the above, that an artificial general intelligence that can solve molecular nanotechnology will be invented before molecular nanotechnology has been solved by humans or narrow AI precursors.

0.1%, conditional on the above, that an AI will be built in such a way that it wants to acquire all possible resources and eliminate all possible threats and that its programming allows it to pursue plans that will result in the enslavement or extinction of humanity without further feedback from humans.

5%, conditional on the above, that a cost-benefit analysis shows that it would at some point be instrumentally rational to attempt to kill all humans to either eliminate a threat or in order to convert them into more useful resources.

1%, conditional on the above, that the AI will not accidentally reveal its hostility towards its creators during the early phases of its development (when it is still insufficiently skilled at manipulating and deceiving humans) or that any such revelation will be ignored. Respectively, suspicious activities will at no point be noticed, or not taken seriously enough (e.g. by the AI’s creators, third-party security experts, third-party AI researchers, hackers, concerned customers or other AIs) in order to thwart the AI’s plan for world domination.

0.001%, conditional on the above, that the AI will somehow manage to acquire the social engineering skills necessary in order to manipulate and deceive humans in such a way as to make them behave in a sufficiently complex and coherent manner to not only conduct the experiments necessary for it to solve molecular nanotechnology but to also implement the resulting insights in such a way as to subsequently take control of the resulting technology.

I have ignored a huge number of other requirements, and all of the above requirements can be broken up into a lot of more detailed requirements. Each requirement provides ample opportunity to fail.
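Multiplying the six conditional estimates gives the joint probability for this one scenario. The snippet below is just that arithmetic, written as a small Python sketch; the short labels are illustrative paraphrases of the steps above, and the product is not a figure stated elsewhere in this post. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from math import prod

# Conditional estimates in the order listed above.
# The labels are illustrative paraphrases, not exact wording.
steps = [
    ("extinction-capable nanotechnology is possible", Fraction(5, 100)),
    ("AGI solves nanotechnology before humans or narrow AI", Fraction(1, 100)),
    ("AI is built to grab resources without human feedback", Fraction(1, 1000)),
    ("killing all humans becomes instrumentally rational", Fraction(5, 100)),
    ("hostility is never noticed or taken seriously", Fraction(1, 100)),
    ("AI acquires the needed social engineering skills", Fraction(1, 100000)),
]

# Chain rule: the joint probability is the product of the conditionals.
joint = prod(p for _, p in steps)

print(joint, float(joint))
```

With the numbers as given, the product is 1 in 4 × 10^14, or 2.5 × 10^-15, for this particular scenario alone.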

Remarks and Justifications

I bet you have other ideas on how an AI could take over the world. We all do (or at least anyone who likes science fiction). But let us consider whether the ability to take over the world is mainly due to the brilliance of your plan or something else.

Could a human being, even an exceptionally smart human being, implement your plan? If not, could some company like Google implement your plan? No? Could the NSA, the security agency of the most powerful country on Earth, implement your plan?

The NSA not only has thousands of very smart drones (people), all of which are already equipped with manipulative abilities, but it also has huge computational resources and knows about backdoors to subvert a lot of systems. Does this enable the NSA to implement your plan without destroying or decisively crippling itself?

If not, then the following features are very likely insufficient in order to implement your plan: (1) being in control of thousands of human-level drones, straw men, and undercover agents in important positions (2) having the law on your side (3) access to massive computational resources (4) knowledge of heaps of loopholes to bypass security.

If your plan cannot be implemented by an entity like the NSA, which already features most of the prerequisites that your hypothetical artificial general intelligence first needs to acquire by some magical means, then what is it that makes your plan so foolproof when executed by an AI?

To summarize some quick points that I believe to be true:

(1) The NSA cannot take over the world (even if it accepted the risk of destroying itself).

(2) Your artificial general intelligence first needs to acquire similar capabilities.

(3) Each step towards these capabilities provides ample opportunity to fail. After all, your artificial general intelligence is a fragile technological product that critically depends on human infrastructure.

(4) You have absolutely no idea how your artificial general intelligence could acquire sufficient knowledge of human psychology to become better than the NSA at manipulation and deception. You are just making this up.

If the above points are true, then your plan seems to be largely irrelevant. The possibility of taking over the world depends mainly on something you assume the artificial general intelligence to be capable of that entities such as Google or the NSA are incapable of.

What could it be? Parallel computing? The NSA has thousands of human-level intelligences working in parallel. How many do you need to implement your plan?

Blazing speed to the rescue!

Let’s just assume that this artificial general intelligence that you imagine is trillions of times faster. This is already a nontrivial assumption. But let’s accept it anyway.

Raw computational power alone is obviously not enough to do anything. You need the right algorithms too. So what assumptions do you make about these algorithms, and how do you justify these assumptions?

To highlight the problem, consider instead of an AI a whole brain emulation (short: WBE). What could such a WBE do if each year equaled a million subjective years? Do you expect it to become a superhuman manipulator by watching all YouTube videos and reading all books and papers on human psychology? Is it just a matter of enough time? Or do you also need feedback?

If you do not believe that such an emulation could become a superhuman manipulator, thanks to a millionfold speedup, do you believe that a trillionfold speedup would do the job? Would a trillionfold speedup be a million times better than a millionfold speedup? If not, do you believe a further speedup would make any difference at all?

Do you feel capable of confidently answering the above questions?

If you do not believe that a whole brain emulation could do the job, solely by means of a lot of computing power, what makes you believe that an AI can do it instead?

To reformulate the question, do you believe that it is possible to accelerate the discovery of unknown unknowns, or the occurrence of conceptual revolutions, simply by throwing more computing power at an algorithm? Are particle accelerators unnecessary, in order to gain new insights into the nature of reality, once you have enough computing power? Is human feedback unnecessary, in order to improve your social engineering skills, once you have enough computing power?

And even if you believe all this was possible, even if a Babylonian mathematician, had he been given a trillionfold speedup of subjective time by aliens uploading him into some computational substrate, could brute force concepts such as calculus and high-tech such as nuclear weapons, how could he apply those insights? He wouldn’t be able to simply coerce his fellow Babylonians to build him some nuclear weapons. Because he would have to convince them to do it without dismissing or even killing him. But more importantly, it takes nontrivial effort to obtain the sufficient prerequisites to build nuclear weapons.

What makes you believe that this would be much easier for a future emulation of a scientist trying to come up with similar conceptual breakthroughs and high-tech? And what makes you believe that a completely artificial entity, that lacks all the evolutionary abilities of a human emulation, can do it?

Consider that it took millions of years of biological evolution, thousands of years of cultural evolution, and decades of education in order for a human to become good at the social manipulation of other humans. We are talking about a huge information-theoretic complexity that any artificial agent somehow has to acquire in a very short time.

To summarize the last points:

(1) Throwing around numbers such as a millionfold or trillionfold speedup is very misleading if you have no idea how exactly the instrumental value of such a speedup would scale with whatever you are trying to accomplish.

(2) You have very little reason to believe that conceptual revolutions and technological breakthroughs happen in a vacuum and only depend on computing power rather than the context of cultural evolution and empirical feedback from experiments.

(3) If you cannot imagine doing it yourself, given a speedup, then you have very little reason to believe that something which is much less adapted to a complex environment, populated by various agents, can do the job more easily.

(4) In the end you need to implement your discoveries. Concepts and blueprints alone are useless if they cannot be deployed effectively.

I suggest that you stop handwaving and start analyzing concrete scenarios and their associated probabilities. I suggest that you begin to ask yourself how anyone could justify a >1% probability of extinction by artificial general intelligence.
