Using Math to deal with Moral Uncertainty

There is a standard result that if a "rational" agent is uncertain about what the outcome of events will be (i.e. they have to choose between two "lotteries"), then they should maximize the expectation of some utility function. Formally, if we define a lottery as $L=\sum_i p_i O_i$ where $\{O_i\}$ are the outcomes and $\{p_i\}$ their associated probabilities, then for any "rational" preference ordering $\preceq$ there is a utility function $u$ such that
$$E\left[u(L)\right]\leq E\left[u(L')\right] \leftrightarrow L \preceq L'$$
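To make the idea concrete, here is a toy sketch (the square-root utility function and the dollar amounts are my own illustrative choices, not from the presentation):

```python
# A lottery is a list of (probability, outcome) pairs.
def expected_utility(lottery, u):
    """Compute E[u(L)] = sum_i p_i * u(O_i)."""
    return sum(p * u(o) for p, o in lottery)

u = lambda money: money ** 0.5   # an arbitrary concave (risk-averse) utility

L1 = [(1.0, 100)]                # a certain $100
L2 = [(0.5, 0), (0.5, 200)]      # a coin flip between $0 and $200

# With a concave u, the certain option has higher expected utility,
# so a "rational" agent with this u prefers L1:
assert expected_utility(L1, u) > expected_utility(L2, u)
```

The theorem says the agent's entire preference ordering over lotteries can be recovered from comparisons like this one.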
Traditionally, this is used when people aren't certain about what the outcomes of their actions will be. However, I recently attended an interesting presentation by Brian Hedden where he discussed using this in cases of normative uncertainty, i.e. in cases when we know what the outcome of our actions will be, but we just don't know what the correct thing to value is.

An analog to equation (1) in this case is to introduce ethical theories $T_1,\dots,T_n$ to which we might subscribe and $u_i(o)$ the value of an outcome $o$ under theory $T_i$ and then ask whether there is a utility function $u$ such that for $M(o) = \sum_i p(T_i)u_i(o)$ we have:
$$M(o)\leq M(o') \leftrightarrow o \preceq o'$$
Brian referred to this "meta-" theory as Maximize InterTheoretical Expectation or MITE. He believes that
There are moral theories which it can be rational to take seriously, such that if you do take them seriously, MITE cannot say anything about what you super-subjectively ought to do, given your normative uncertainty.
I show here that:
  1. Contrary to Brian's argument, a MITE function always exists.
  2. Furthermore, the output of this function is always just a vector of real numbers.


The basis of this post is the fact that we can generalize the above equation (2) to an arbitrary ordered group $G=(\Omega,+,\leq)$. Rather than bore the reader with a recitation of the group axioms, I will just point the reader to Wikipedia and point out that the possibly questionable assumption here is existence of inverses (i.e. the claim that for any lottery $L$ there is a lottery $L'$ such that the agent is indifferent between participating in both lotteries and neither).1

There are probably prettier ways of doing this, but here's a simple way of defining a group which is guaranteed to work. Let's say that:

  • Each theory $T_i$ has some set of possible values $V_i$ and that we can find the (intratheoretic) value of an outcome via $u_i:\mathcal{O}\to V_i$. Crucially, we are not claiming that these values are in any way comparable to each other. ($u_i$ is guaranteed to exist because it could just be the identity function.)
  • $\Omega_i = \mathbb R \times V_i$ is the set of pairs joining the probability of an outcome with its value. 
  • $\Omega =\prod_i \Omega_i$ and that $\pi_i:\Omega_i\hookrightarrow \Omega$ is the canonical embedding (i.e. $\pi_i(\omega)$ is zero everywhere except it puts $\omega$ into the $i$th position).
  • $G=(\Omega, +)$ with addition defined elementwise. 
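Here is a minimal sketch of this construction, assuming for simplicity that each $V_i$ is just $\mathbb R$ and there are two theories (all the names here are mine):

```python
N_THEORIES = 2  # illustrative; the construction works for any n

def zero():
    """The identity element of Ω = Π_i (ℝ × V_i)."""
    return tuple((0.0, 0.0) for _ in range(N_THEORIES))

def embed(i, p, v):
    """The canonical embedding π_i: put (p, v) in slot i, zeros elsewhere."""
    elem = list(zero())
    elem[i] = (p, v)
    return tuple(elem)

def add(a, b):
    """Group addition: componentwise on both probabilities and values."""
    return tuple((pa + pb, va + vb) for (pa, va), (pb, vb) in zip(a, b))

# M(o) = Σ_i π_i(p(T_i), u_i(o)) for hypothetical credences and values:
M_o = add(embed(0, 0.3, 5.0), embed(1, 0.7, -2.0))
```

Note that nothing here ever adds a value from $V_1$ to a value from $V_2$; the two theories' units never mix.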

Theorem 1: For any partial order $\preceq \subseteq \Omega\times \Omega$, $G$ satisfies (2).

Proof: It's clear that
$$M(o)=\sum_i \pi_i \left(p(T_i), u_i(o)\right)$$
will just embed the information into $G$, which can easily inherit the order. Of course, if we are really dedicated to the notation in (2), we can define $x\cdot y = \pi_i(x,y)$ (for the appropriate theory $i$) and then get
$$M(o)=\sum_i p(T_i) \cdot u_i(o)$$

So what?

So far we've managed to show that you can redefine addition to mean whatever you want, and therefore utility functions will basically always exist. But it will turn out that we are actually dealing with some pretty standard groups here.

First, a little commentary on terms. One of the major objections Brian raises is the notion of "options", i.e. the fact that certain moral theories distinguish "optional" things from "required" things. For example, we might say that donating to charity is optional, while not murdering people is required. Furthermore, these types of goods bear a non-Archimedean relationship to each other – that is, no amount of donating to charity can offset a murder.

For any ordered group $G$ there is a chain of subgroups $C_1\subset C_2\subset\dots\subset G$ such that each $C_i$ is "convex". Convex subgroups represent this notion of "optionality": $C_1$ represents all the "optional" things, $C_2$ is everything that is either required or optional, etc. Note that I am not assuming anything new here; it is a standard result that the set of all convex subgroups forms a chain in any ordered group (see Glass, Lemma 3.2.1).

Theorem 2: Our above group can be order-embedded into a subset of $\mathbb R ^n$ ordered lexically, i.e. we are just dealing with a set of vectors where each component of the vector is a real number. Furthermore, the number of components in the vector is identical to the number of "degrees" of optionality.
Proof: This is the Hahn embedding theorem. $\square$

Corollary: if (and only if!) none of our theories that we give credence to have "optionality", then we are just dealing with the real numbers.


The above was really abstract, so it's reasonable to ask for an example. But before I do that I would like to give a standard math joke:
(Prof. finishes proving Liouville's theorem that any bounded entire function is constant.)
Student: I'm not sure I really understand. Could you give an example?
Prof.: Sure. 7.
(Prof. goes back to writing on the blackboard.)
The joke here is that $f(x)=7$ is "obviously" a constant function whereas the student somehow wanted a more exotic example. But the professor had just proven that no such examples exist!

So I will give some examples which the astute reader will point out are "obviously" instances of lexically ordered vectors of real numbers. This is because I have just proven that there are no other examples. Hopefully it will still be useful.

First, let's discuss how satisficing consequentialism by itself can be represented as a lexically ordered vector. Consider the decision criterion that $(x_1,x_2)\leq (y_1,y_2)$ if and only if $x_2< y_2$ or both $(x_2 = y_2)$ and $(x_1\leq y_1)$ (i.e. it is lexically ordered from the right). So we could, for example, represent giving a thousand dollars to charity as $(1000,0)$ and murdering someone as $(0,-10000)$; this gives us our desired result that no amount of donations can offset a murder (i.e. $(x,-10000)\prec(0,0)$ for all $x$). And of course this is a lexically ordered vector of real numbers, in accordance with our theorem.
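Here is that right-to-left lexicographic criterion as a quick sketch, with the numbers from the text:

```python
def lex_leq(x, y):
    """(x1, x2) <= (y1, y2) iff x2 < y2, or x2 == y2 and x1 <= y1
    (i.e. lexicographic ordering from the right)."""
    (x1, x2), (y1, y2) = x, y
    return x2 < y2 or (x2 == y2 and x1 <= y1)

# No amount of charity (first coordinate) offsets a murder (second):
assert lex_leq((10**9, -10000), (0, 0))
# With the required coordinate equal, more charity is better:
assert lex_leq((0, 0), (1000, 0))
```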

Now let's contrast this with standard utilitarianism, which would say that murdering someone could be offset by donating enough money to charity to prevent someone from dying. Let's call that amount $\$$10,000 (i.e. murdering someone has -10,000 utils). There are no "optional" things in standard utilitarianism, so we can write this as $(0,u)$ where $u$ is the utility of the outcome. In this case we have that $(0,x-10,000)\succeq (0,0)$ if $x\geq 10,000$, i.e. donations of at least $\$$10,000 offset a murder.

Now let's ask about the inter-theoretic uncertainty case. We have to choose between either doing nothing or murdering someone and donating $\$$15,000 to charity. We believe in satisficing consequentialism with probability $p$ and in standard utilitarianism with probability $1-p$. Therefore we have
$$p(15000,-10000) + (1-p)(0, 5000) = (15000p,\, -10000p + 5000(1-p)) = (15000p,\, 5000-15000p)$$
This is strongly preferred to the $(0,0)$ option if $p< 1/3$; if $p=1/3$ exactly then it is weakly preferred.
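We can sanity-check that arithmetic numerically (a standalone sketch; the helper names are mine):

```python
def mix(p):
    """Credence-weighted combination of the two theories' verdicts on
    'murder someone and donate $15,000'."""
    satisficing = (15000, -10000)  # (optional, required) coordinates
    utilitarian = (0, 5000)        # -10000 utils + 15000 of donations
    return (p * satisficing[0] + (1 - p) * utilitarian[0],
            p * satisficing[1] + (1 - p) * utilitarian[1])

def strictly_better(x, y):
    """Strict preference on the dominant (rightmost) coordinate."""
    return x[1] > y[1]

# Preferred to doing nothing exactly when the required coordinate
# 5000 - 15000p is positive, i.e. when p < 1/3:
assert strictly_better(mix(0.2), (0, 0))   # p = 0.2: required coord ≈ 2000
assert strictly_better((0, 0), mix(0.4))   # p = 0.4: required coord ≈ -1000
```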

This isn't the only way we can make inter-theoretic comparisons. I actually don't even think it's the best way. But it is one example of using a lexically ordered vector of real numbers, and all other examples will be similar.

A Counterexample

It may be useful to construct a decision criterion which can't be represented using a MITE formula. (Obviously, it will have to disobey one of the ordered-group axioms due to theorem 1.)

Here's one example:
Let's say we represent an outcome having deontological value $d$ and utility $u$ as $(d,u)$ and we believe deontology with probability $p$. Then $(d_1,u_1)\preceq (d_2,u_2)$ if and only if $p(u_1\mod d_1)\leq p(u_2\mod d_2)$.
This is not order-preserving because sometimes increasing utility is good but other times increasing utility is bad. So it doesn't form an ordered group.
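A quick numeric illustration of the non-monotonicity (the particular $p$ and $d$ here are arbitrary choices of mine):

```python
p, d = 0.5, 7  # credence in deontology and a deontological value, chosen arbitrarily

def score(u):
    """The (deliberately pathological) criterion p * (u mod d)."""
    return p * (u % d)

# Increasing utility sometimes helps and sometimes hurts:
assert score(5) > score(4)   # 5 mod 7 = 5  >  4 mod 7 = 4
assert score(7) < score(6)   # 7 mod 7 = 0  <  6 mod 7 = 6
```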


Brian took as his definition of "rational" the standard von Neumann-Morgenstern axioms. This is of course a perfectly reasonable thing to do in general, but as he points out many individual moral theories fail these axioms. (Insert joke here about utilitarianism being the only "rational" moral system.)

I personally find the idea of optionality pretty stupid and think it causes all sorts of problems even without needing to compare it to other theories. But if you do want to give it some credence, then a MITE formula will work fine for you.


  1. Note that this also requires "modding out" by an indifference relation.

Ridiculous math things which Ethics shouldn't depend on but does

There is a scene in Gulliver's Travels where the protagonist calls up the ghosts of all the philosophers since Aristotle, and the ghosts all admit that Aristotle was way better than them at everything. Especially Descartes – Jonathan Swift wants to make very clear that Aristotle is a way better philosopher than Descartes, and that all of Descartes's ideas are stupid. (I think this was supposed to prove a point in some long-forgotten religious dispute.)

If I ever become a prominent philosopher and we develop the technology to call up ghosts in order to win points in literary holy wars (I will let the reader decide which of those two conditions is more likely), please reincarnate me to talk ethics with Aristotle. Basically all the problems I'm worried about deal with mathematical concepts which weren't developed until around a century ago, and I'm excited to hear whether a virtuous person would accept Zorn's Lemma.

Today I want to share two mathematical assumptions which are so esoteric that even most mathematicians don't bother worrying about them. Despite that, they actually critically influence what we think about ethics.

The Axiom of Choice

The Axiom of Choice is everyone's favorite example of something which seems like an innocuous assumption but isn't. (The Axiom of Choice is the axiom of choice for such situations, if you will.) Here's Wikipedia's informal description:
The axiom of choice says that given any collection of bins, each containing at least one object, it is possible to make a selection of exactly one object from each bin.
Seems pretty reasonable, right? Unfortunately, it leads to paradoxes such as the Banach–Tarski paradox, in which a ball can be decomposed and reassembled into two balls, each the same size as the original.

In many cases, a weaker assumption known as the "axiom of dependent choice" suffices and has the advantage of not leading to any (known) paradoxes. Sadly, this doesn't work for ethics.

Consider the two following reasonable assumptions:

  1. Weak Pareto: if we can make someone better off and no one worse off, we should.
  2. Intergenerational Equality: we should value the welfare of every generation equally.

Theorem (proven by Zame): we cannot prove the existence of an ethical system which satisfies both Weak Pareto and Intergenerational Equality without using the axiom of choice (i.e. the axiom of dependent choice doesn't work).

Sorry grandma, but unless you can make that ball double in size we're gonna have to start means-testing Medicare

Hyperreal numbers

The observant reader will note that the previous theorem showed only that we can prove the existence of a "good" ethical system if we use the axiom of choice; it said nothing about our actually being able to find it. To get that, we have to enter the exciting world of hyperreal numbers!

The founding fathers weren't as impressed with Thomas Jefferson's original nonconstructive proof that the Bill of Rights could, in theory, be created

I recently asked my girlfriend whether she would prefer:
  1. Having one unit of happiness every day, for the rest of eternity, or
  2. Having two units of happiness every day, for the rest of eternity
She told me that the answer was obvious: she's a total utilitarian and in the first circumstance she would have one unit of happiness for an infinite amount of time, i.e. one infinity's worth of happiness. But in the second case she would have two units for an infinite amount of time, i.e. two infinities of happiness. And clearly two infinities are bigger than one.

My guess is that how reasonable you think this statement is will depend in a U-shaped way on how much math you've learned:

To the average Joe, it's incredibly obvious that two infinities are bigger than one. More advanced readers will note that the above utility series don't converge, so it's not even meaningful to talk about one series being bigger than another. But those who've dealt with the bizarre world of nonstandard analysis know that notions like "convergence" and "limit" are conspiracies propagated by high school calculus teachers to hide the truth about infinitesimals. In fact, there is a perfectly well-defined sense in which two infinities are bigger than one, and the number system which this gives rise to is known as the "hyperreal numbers."
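To see the finite-horizon version of her intuition (a trivial sketch, and of course no substitute for actual nonstandard analysis): the partial sums never settle down, but the 2:1 ratio between them is exact at every step, and that is the regularity the hyperreals let you carry to infinity.

```python
def total_happiness(per_day, days):
    """Partial sum of a constant happiness stream: per_day units for `days` days."""
    return per_day * days

# The partial sums grow without bound (so no limits to compare)...
# ...but option 2 is exactly twice option 1 at every finite horizon:
for days in (10, 1000, 10**6):
    assert total_happiness(2, days) == 2 * total_happiness(1, days)
```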

From an ethical standpoint, here are the relevant things you need to know:

Theorem (proven by Basu and Mitra): if we use only our normal "real" numbers, then we can't construct an ethical system which obeys the above Weak Pareto and Intergenerational Equality assumptions.
Theorem (proven by Pivato): we can find such a system if we use the hyperreal numbers.

To any TV producers reading this: the success of the hyperreal approach over the "standard calculus" approach would make me an excellent soft-news-show guest. While most stations can drum up some old crotchety guy complaining about how schools are corrupting the minds of today's youths, only I can actually prove that calculus teaches kids to be unethical.

Conclusion / Apologies / Further Reading

As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality. - Einstein
It goes without saying that I've heavily simplified the arguments I've cited, and any mistakes are mine. If you are interested in using logical reasoning to improve the world, then you should check out Effective Altruism. If you are more of a "nonconstructive altruist" then you can do a Google scholar search for "sustainable development" or read the papers cited below to learn more.

And most importantly: if you are a student who is being punished for misbehaving in a calculus class, please 1) tell your teacher the Basu-Mitra-Pivato result about how calculus causes people to disrespect their elders and 2) film their reaction and put it on YouTube. (Now that's effective altruism!)

  • Basu, Kaushik, and Tapan Mitra. "Aggregating infinite utility streams with intergenerational equity: the impossibility of being Paretian." Econometrica 71.5 (2003): 1557-1563.
  • Pivato, Marcus. "Sustainable preferences via nondiscounted, hyperreal intergenerational welfare functions." (2008).
  • Zame, William R. "Can intergenerational equity be operationalized?" Theoretical Economics 2 (2007): 187-202.

If you want to start a startup, go work for someone else

When you look online for advice about entrepreneurship, you will see a lot of "just do it":
The best way to get experience... is to start a startup. So, paradoxically, if you're too inexperienced to start a startup, what you should do is start one. That's a way more efficient cure for inexperience than a normal job. - Paul Graham, Why to Not Not Start a Startup
There is very little you will learn in your current job as a {consultant, lawyer, business person, economist, programmer} that will make you better at starting your own startup. Even if you work at someone else’s startup right now, the rate at which you are learning useful things is way lower than if you were just starting your own. -  David Albert, When should you start a startup?
This advice almost never comes with citations to research or quantitative data, from which I have concluded:
The sort of person who jumps in and gives advice to the masses without doing a lot of research first generally believes that you should jump in and do things without doing a lot of research first. 
As readers of this blog know, I don't believe in doing anything without doing a ton of research first, and have therefore come to the surprising conclusion that the best way to start a startup is by doing a lot of background research first.

Specifically, I would make two claims:
  1. It's unclear whether the average person learns anything from a startup.
  2. It is clear that the average person learns something working in direct employment, and that they almost certainly will make more money working in direct employment (which can fund their later ventures).
I think these two theoretical claims lead to one empirical one:
If you want to start a successful startup, you should work in direct employment first.


Rather than bore you with a narrative, I will just present some choice quotes:

Even a stopped clock is right twice a day

It's interesting to think about what exactly the "people don't learn anything from a startup" hypothesis would look like. If we take the above cited numbers of everyone having a 20% chance of succeeding in a given startup, then even if each success is independent most people will have succeeded at least once by their fourth venture.
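The fourth-venture claim is just arithmetic: with an independent 20% chance per venture, the probability of at least one success first passes 50% on the fourth try.

```python
def p_at_least_one_success(n, p=0.2):
    """P(at least one success in n independent ventures), using the 20%
    per-venture figure quoted above."""
    return 1 - (1 - p) ** n

assert p_at_least_one_success(3) < 0.5   # ≈ 0.488 after three ventures
assert p_at_least_one_success(4) > 0.5   # ≈ 0.590 after four
```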

So the underlying message that many in the startup community say of "if you keep at it long enough, eventually you will succeed" is still completely true. I just think you could succeed quicker if you go work for someone else first.

But… Anecdata!

I am sure that there are a lot of people who sucked on their first startup, learned a ton, and then crushed it on their second startup. But those people probably also would've sucked at their first year of direct employment, learned a ton, and then crushed it even more when they did start a company.

There are probably people who learn better in a startup environment and you may be one of them, but the odds are against it.

Attribution errors

So if entrepreneurs don't learn anything in their startups, why do very smart people with a ton of experience like Paul Graham think they do? One explanation which has been advanced is the "Fundamental Attribution Error", which refers to "people's tendency to place an undue emphasis on internal characteristics to explain someone else's behavior in a given situation, rather than considering external factors." Wikipedia gives this example:
Subjects read essays for and against Fidel Castro, and were asked to rate the pro-Castro attitudes of the writers. When the subjects believed that the writers freely chose the positions they took (for or against Castro), they naturally rated the people who spoke in favor of Castro as having a more positive attitude towards Castro. However, contradicting Jones and Harris' initial hypothesis, when the subjects were told that the writer's positions were determined by a coin toss, they still rated writers who spoke in favor of Castro as having, on average, a more positive attitude towards Castro than those who spoke against him. In other words, the subjects were unable to properly see the influence of the situational constraints placed upon the writers; they could not refrain from attributing sincere belief to the writers.
Even in the extreme circumstance where people are explicitly told that an actor's performance is solely due to luck, they still believe that there must've been some internal characteristic involved. In the noisy world of startups where great ideas fail and bad ideas succeed it's no surprise that people greatly overestimate the effect of "skill". Baum and Silverman found that:
VCs... appear to make a common attribution error overemphasizing startups’ human capital when making their investment decisions. - Picking winners or building them? Alliance, intellectual, and human capital as selection criteria in venture financing and performance of biotechnology startups
And if venture capitalists, whose sole job consists of figuring out which startups will succeed, regularly make these errors, then imagine how much worse it must be for the rest of us.

(It also doesn't bode well for this essay – I'm sure that even after reading all the evidence I cited, most readers will still attribute their startup heroes' success to said heroes' skill, intelligence and perseverance.)


I wrote this because I've become annoyed with the "just do it" mentality of so many entrepreneurs who spout some perversion of Lean Startup methods at me. Yes, doing experiments is awesome but learning from people who have already done those experiments is usually far more efficient. (Academics joke that "a month in the lab can save you an hour in the library.")

If you just think a startup will be fun then by all means go ahead and start something from your dorm room. But if you really want to be successful then consider apprenticing yourself to someone else for a couple years first.

(NB: I am the founder of a company which I started after eight years of direct employment.)

Works cited 

  • Baum, Joel AC, and Brian S. Silverman. "Picking winners or building them? Alliance, intellectual, and human capital as selection criteria in venture financing and performance of biotechnology startups." Journal of business venturing 19.3 (2004): 411-436.
  • Gompers, Paul, et al. Skill vs. luck in entrepreneurship and venture capital: Evidence from serial entrepreneurs. No. w12592. National Bureau of Economic Research, 2006.
  • Kaiser, Ulrich, and Nikolaj Malchow-Møller. "Is self-employment really a bad experience?: The effects of previous self-employment on subsequent wage-employment wages." Journal of Business Venturing 26.5 (2011): 572-588.
  • Song, M., Podoynitsyna, K., Van Der Bij, H. and Halman, J. I. M. (2008), Success Factors in New Ventures: A Meta-analysis. Journal of Product Innovation Management, 25: 7–27. doi: 10.1111/j.1540-5885.2007.00280.x