### Summary

It is possible that our universe is infinite in both time
and space. We might therefore reasonably consider the following question: given
some sequences $u = (u_0, u_1,\dots)$ and $u' = (u_0', u_1',\dots)$ (where each $u_t$
represents the welfare of persons living at time $t$), how can we tell if $u$
is morally preferable to $u'$?

It has been demonstrated that there is no “reasonable”
ethical algorithm which can compare any two such sequences. Therefore, we want to
look for subsets of sequences which can be compared, and (perhaps retro-justified)
arguments for why these subsets are the only ones which practically matter.

Adam Jonsson has published a preprint of what seems to me to
be the first legitimate such ethical system. He considers the following: suppose
at any time $t$ we are choosing between a finite set of options. We have an
infinite number of times in which we make a choice (giving us an infinite
sequence), but at each time step we have only finitely many choices. (Formally,
he considers Markov Decision Processes.) He has shown that an ethical algorithm he
calls “limit-discounted utilitarianism” (LDU) can compare any two such sequences,
and moreover the outcome of LDU agrees with our ethical intuitions.

This is the first time that, to my knowledge, we have some justification for thinking that a certain algorithm is all we will "practically" need when comparing infinite utility streams.

### Limit-discounted Utilitarianism (LDU)

Given $u = (u_0, u_1,\dots)$ and $u' = (u_0', u_1',\dots)$ it seems
reasonable to say $u\geq u'$ if

$$\sum_{t = 0}^{\infty} (u_t - u_t') \geq 0$$

Of course, the problem is that this series may not converge
and then it’s unclear which sequence is preferable. A classic example is the
choice between $(0, 1, 0, 1,\dots)$ and $(1, 0, 1, 0,\dots)$. (See the example below.)

Intuitively, we might consider adding a discount factor $0 < \delta < 1$ like this:

$$\sum_{t = 0}^{\infty} \delta^t (u_t - u_t')$$

This modified series may converge even though the original
one doesn’t. Of course, this convergence is at the cost of us caring more about
people who are born earlier, which might not endear us to our children.
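Here is a minimal Python sketch of the discounted comparison, truncated to finitely many terms. The function name `discounted_sum` is hypothetical and does not come from Jonsson's paper. It takes the differences $u_t - u_t'$ as a finite list, so it only approximates the infinite series when $\delta^N$ is negligible.

```python
def discounted_sum(diffs, delta):
    """Truncated discounted sum: sum of delta**t * diffs[t] for t = 0..len(diffs)-1.

    `diffs` holds u_t - u'_t for the first len(diffs) time steps. The omitted
    tail is at most delta**len(diffs) / (1 - delta) times a bound on
    |u_t - u'_t| beyond the cutoff.
    """
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return sum(delta ** t * d for t, d in enumerate(diffs))


# One extra unit of welfare for someone at t = 0 versus the same unit at t = 10:
early = discounted_sum([1] + [0] * 10, 0.9)  # weight 1
late = discounted_sum([0] * 10 + [1], 0.9)   # weight 0.9**10, roughly 0.35
print(early, late)
```

With $\delta = 0.9$, the unit at $t = 0$ counts almost three times as much as the identical unit at $t = 10$. This is the bias toward earlier people described above.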

Therefore, we can take the limit case:

$$\liminf_{\delta\to 1^-} \sum_{t = 0}^{\infty} \delta^t (u_t - u_t')$$

This limit is what LDU uses: it ranks $u \geq u'$ exactly when the expression above is non-negative.
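When $u - u'$ is eventually periodic, the limit can be computed exactly. An eventually periodic stream is a finite prefix $a_0,\dots,a_{m-1}$ followed by a block $c_0,\dots,c_{p-1}$ repeated forever. The closed form below was derived here for illustration and is not taken from Jonsson's paper. The discounted sum equals $\sum_{t<m}\delta^t a_t + \delta^m \frac{\sum_k \delta^k c_k}{1-\delta^p}$. If $\sum_k c_k \neq 0$, this diverges to $\pm\infty$ as $\delta \to 1^-$. Otherwise L'Hôpital's rule gives a tail limit of $-\frac{1}{p}\sum_k k\,c_k$, so the $\liminf$ is an ordinary limit. The function names are hypothetical.

```python
import math
from fractions import Fraction


def ldu_value(prefix, cycle):
    """Limit as delta -> 1- of sum_t delta**t * d_t, where the difference
    stream d = u - u' is `prefix` followed by `cycle` repeated forever.

    Use ints or Fractions as entries so the result is exact.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    cycle_total = sum(cycle)
    # A non-zero net difference per cycle dominates: the sum diverges.
    if cycle_total > 0:
        return math.inf
    if cycle_total < 0:
        return -math.inf
    # Zero net difference per cycle: the prefix counts in full (delta**t -> 1)
    # and the repeating block contributes -(1/p) * sum_k k * c_k.
    p = len(cycle)
    head = sum((Fraction(a) for a in prefix), Fraction(0))
    tail = -Fraction(sum(k * c for k, c in enumerate(cycle)), p)
    return head + tail


def ldu_at_least_as_good(prefix, cycle):
    """u >= u' under LDU iff the limit above is non-negative."""
    return ldu_value(prefix, cycle) >= 0


print(ldu_value([], [1, -1]))             # 1/2
print(ldu_at_least_as_good([], [-1, 1]))  # False
```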

LDU has a number of desirable properties, which are
summarized on page 7 of a
paper by Jonsson and Voorneveld. I won’t go into them much here other than
to say that LDU generally extends our intuitions about what should happen in
the finite case to the infinite one.

#### Example

Suppose we want to compare $u = (1, 0, 1, 0,\dots)$ and $u' = (0, 1, 0, 1,\dots)$. Let's take the standard series:

$$\begin{aligned}
\sum_{i = 0}^\infty (u_i - u_i') & = (1-0) + (0-1) + (1-0) + (0-1) + \dots\\
& = 1-1+1-1+\dots\\
& = \sum_{i = 0}^\infty (-1)^i
\end{aligned}$$

This is Grandi’s series, which famously does not converge under the usual definitions of convergence.

LDU, however, inserts a discount factor $\delta$, giving:

$$\sum_{i = 0}^\infty (-1)^i \delta^i = \sum_{i = 0}^\infty (-\delta)^i$$

This is a geometric series with ratio $-\delta$. Since $|-\delta| < 1$ it converges, and
the standard formula for geometric series gives its value:

$$\sum_{i = 0} ^\infty (-\delta) ^ i = \frac {1} {1+\delta} $$

Taking the limit:

$$\liminf_{\delta\to 1 ^ -}\frac {1} {1+\delta} = 1/2$$
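This is a quick numerical check of the last two steps. It truncates the discounted series after 100,000 terms, an arbitrary cutoff at which $\delta^N$ is negligible for these values of $\delta$. It then compares the result with the closed form $1/(1+\delta)$. The helper name is hypothetical.

```python
def truncated_alternating(delta, n_terms=100_000):
    """Sum of (-delta)**i for i = 0, ..., n_terms - 1 (a truncation of the series)."""
    return sum((-delta) ** i for i in range(n_terms))


for delta in (0.5, 0.9, 0.99, 0.999):
    approx = truncated_alternating(delta)
    closed_form = 1 / (1 + delta)
    print(f"delta={delta}: truncated sum {approx:.6f}, 1/(1+delta) = {closed_form:.6f}")
```

The two columns agree, and both move toward $1/2$ as $\delta$ approaches 1.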

The limit $1/2$ is known as the Abel sum of the series. Since $1/2 > 0$, LDU judges $(1, 0, 1, 0,\dots)$ to be better than (morally preferable to) $(0, 1, 0, 1,\dots)$.

This seems fairly intuitive: as you add more and more
terms, the partial sums of the original series oscillate between one and zero, so in some
sense the limit of the series is one half.
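The oscillation of the partial sums can be seen directly. Averaging the partial sums also gives $1/2$ for this series, which is one way to make "in some sense" precise. That averaging is Cesàro summation, a different method from the discounting LDU uses. The helper names are hypothetical.

```python
from itertools import accumulate


def partial_sums(terms):
    """Running totals s_0, s_1, ... of the series."""
    return list(accumulate(terms))


def cesaro_means(terms):
    """Average of the first n partial sums, for n = 1, 2, ..."""
    sums = partial_sums(terms)
    return [total / n for n, total in enumerate(accumulate(sums), start=1)]


grandi = [1, -1] * 500           # 1 - 1 + 1 - 1 + ... (1000 terms)
print(partial_sums(grandi)[:6])  # [1, 0, 1, 0, 1, 0]
print(cesaro_means(grandi)[-1])  # 0.5
```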

### Markov Decision Processes (MDP)

Markov Decision Processes, according to Wikipedia,
are:

> At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$. The process responds at the next time step by randomly moving into a new state $s'$, and giving the decision maker a corresponding reward $R_a(s,s')$.
>
> The probability that the process moves into its new state $s'$ is influenced by the chosen action. Specifically, it is given by the state transition function $P_a(s,s')$. Thus, the next state $s'$ depends on the current state $s$ and the decision maker's action $a$.
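Below is a minimal Python encoding of the quoted definition. The two states, two actions, transition probabilities and rewards are all hypothetical. They are chosen only to show the shape of $P_a(s,s')$ and $R_a(s,s')$, and nothing here comes from Jonsson's paper.

```python
import random

# Hypothetical two-state, two-action MDP.
# P[a][s][s2] is P_a(s, s2); R[a][s][s2] is R_a(s, s2).
STATES = ("calm", "crisis")
ACTIONS = ("invest", "consume")

P = {
    "invest": {"calm": {"calm": 0.9, "crisis": 0.1},
               "crisis": {"calm": 0.6, "crisis": 0.4}},
    "consume": {"calm": {"calm": 0.7, "crisis": 0.3},
                "crisis": {"calm": 0.2, "crisis": 0.8}},
}
R = {
    "invest": {"calm": {"calm": 1, "crisis": -1},
               "crisis": {"calm": 0, "crisis": -2}},
    "consume": {"calm": {"calm": 2, "crisis": 0},
                "crisis": {"calm": 1, "crisis": -1}},
}

# Each P_a(s, .) must be a probability distribution over next states.
for a in ACTIONS:
    for s in STATES:
        assert abs(sum(P[a][s].values()) - 1) < 1e-9


def step(state, action, rng=random):
    """Sample s' with probability P_a(s, s') and return (s', R_a(s, s'))."""
    transitions = P[action][state]
    next_state = rng.choices(list(transitions), weights=list(transitions.values()))[0]
    return next_state, R[action][state][next_state]


rng = random.Random(0)
print(step("calm", "invest", rng))
```

Passing a seeded `random.Random` makes a run reproducible. The module-level `random` is used otherwise.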

At each time step the decision-maker chooses between a
finite number of options, which causes the universe to (probabilistically) move
into one of a finite number of states, giving the decision-maker a (finite)
payoff. By repeating this process an infinite number of times, we can construct
a sequence $u_1, u_2,\dots$ where $u_t$ is the payoff at time $t$.
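A special case shows why a fixed policy on finitely many states gives such structured streams. Assume, for illustration only, that transitions are deterministic; the MDPs above allow random transitions. A rule that picks the action from the current state alone must eventually revisit a state, after which everything repeats. So the payoff stream is a finite prefix followed by a cycle repeated forever. The sketch below finds that split. `payoff_stream` and its `policy`, `next_state` and `reward` arguments are hypothetical names.

```python
def payoff_stream(start, policy, next_state, reward):
    """Split the payoff stream of a deterministic, finite-state process under a
    stationary policy into (prefix, cycle): the stream is `prefix` followed by
    `cycle` repeated forever.

    policy(s) -> a, next_state(s, a) -> s', reward(s, a, s') -> payoff.
    States must be hashable.
    """
    first_visit = {}  # state -> time step at which it was first occupied
    payoffs = []
    state, t = start, 0
    while state not in first_visit:
        first_visit[state] = t
        action = policy(state)  # stationary: depends only on the current state
        following = next_state(state, action)
        payoffs.append(reward(state, action, following))
        state, t = following, t + 1
    # We are back in a state first seen at loop_start, so from here on the
    # trajectory (and hence the payoffs) repeats payoffs[loop_start:].
    loop_start = first_visit[state]
    return payoffs[:loop_start], payoffs[loop_start:]


# Hypothetical example: 0 -> 1 -> 2 -> 3 -> 2 -> 3 -> ..., payoff = state label.
transitions = {0: 1, 1: 2, 2: 3, 3: 2}
prefix, cycle = payoff_stream(
    start=0,
    policy=lambda s: "go",
    next_state=lambda s, a: transitions[s],
    reward=lambda s, a, s2: s,
)
print(prefix, cycle)  # [0, 1] [2, 3]
```

Two such streams can be unrolled to a common prefix length and a common period, the least common multiple of their cycle lengths. Their difference $u - u'$ is then also eventually periodic. In that case the LDU limit can be worked out in closed form.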

Jonsson considers the set of all sequences generated by a decision-maker who
follows a single, time-independent (i.e. stationary) policy. Crucially, he shows that

**LDU is able to compare any two streams generated by a stationary Markov decision process**. [1]

### Why This Matters

My immediate objection upon reading this paper was “of
course if you limit us to only finitely many choices then the problem is soluble
– the entire problem only occurs because we want to examine infinite things!”

After thinking about it more, though, I think this is
an important step forward, and MDPs represent an important and large class of
decision processes.

Even though the universe may be infinite in time and space,
in any time interval there are plausibly only finitely many states I could be in,
e.g. perhaps because there are only finitely many neurons in my brain.

(Someone who knows more about physics than I might be able
to comment on a stronger argument: if locality holds,
then perhaps it is a law of nature that only finitely many things can affect us
within a finite time window?)

Sequences generated by MDPs are therefore plausibly the
only set of sequences a decision-maker may need to practically consider.

### Outstanding Issues

My biggest outstanding concern with modeling our decisions
with an MDP is that the payoffs $R_a(s,s')$ have to remain fixed over time. It seems likely that,
as we learn more, we will discover that certain states are more or less
valuable than we had previously thought. E.g. we may learn that insects are more
conscious than previously expected, and therefore insect suffering affects our
payoffs more heavily than we had originally thought. It seems like maybe one
could have a “meta-MDP” which somehow models this, but I’m not familiar enough
with the area to say for sure.

A more theoretical question is: what sequences can be
generated via MDPs? My hope is that one day someone will show LDU (or a
similarly intuitive algorithm) can compare any two computable sequences, but I
don’t think that this is that proof.

Lastly, we have the standard problems of infinitarian fanaticism
and paralysis. E.g. even if our current best model of the universe predicted
that an MDP model was exactly correct, there would still be some positive probability
that it was wrong, and then our “meta-decision procedure” is unclear.

### Conclusion

*I would like to thank Adam Jonsson for discussing this with me. I have done my best to represent LDU, but any errors in the above are mine. Notably, the justification for why MDPs are all we need to consider is entirely mine, and I'm not sure what Adam thinks about it.*

1. This is not explicitly stated in Jonsson's paper, but it follows from the proof of Theorem 1. Jonsson confirmed this in email discussions with me.

Hey! Just found your infinite ethics posts. Neat stuff! Might be obvious, but wrt moral uncertainty, if you're willing to allow only finitely many moral positions $M$ then you can just use the state space $M \times S$ and the theorem still holds.

This does add the subtlety that your policy $\pi$ now has to include prescriptions for dealing with moral uncertainty, but it seems like a good policy should. Having only finite ethical perspectives is wonkier imo, but if you're okay with finite brainstates then maybe similar handwaves apply.
