Big Advance in Infinite Ethics


Summary

It is possible that our universe is infinite in both time and space. We might therefore reasonably consider the following question: given some sequences $u = (u_1, u_2,\dots)$ and $u' = (u_1', u_2',\dots)$ (where each $u_t$ represents the welfare of persons living at time $t$), how can we tell if $u$ is morally preferable to $u'$?

It has been demonstrated that there is no “reasonable” ethical algorithm which can compare any two such sequences. Therefore, we want to look for subsets of sequences which can be compared, and (perhaps retro-justified) arguments for why these subsets are the only ones which practically matter.

Adam Jonsson has published a preprint of what seems to me to be the first legitimate such ethical system. He considers the following: suppose at any time $t$ we are choosing between a finite set of options. We have an infinite number of times in which we make a choice (giving us an infinite sequence), but at each time step we have only finitely many choices. (Formally, he considers Markov Decision Processes.) He has shown that an ethical algorithm he calls “limit-discounted utilitarianism” (LDU) can compare any two such sequences, and moreover the outcome of LDU agrees with our ethical intuitions.

To my knowledge, this is the first time we have had some justification for thinking that a certain algorithm is all we will "practically" need when comparing infinite utility streams.

Limit-discounted Utilitarianism (LDU)

Given $u = (u_1, u_2,\dots)$ and $u' = (u_1', u_2',\dots)$ it seems reasonable to say $u\geq u'$ if
$$\sum_{t = 0} ^ {\infty} (u_t - u_t') \geq 0$$
Of course, the problem is that this series may not converge and then it’s unclear which sequence is preferable. A classic example is the choice between $(0, 1, 0, 1,\dots)$ and $(1, 0, 1, 0,\dots)$. (See the example below.) 

LDU handles this by using Abel summation. Here is a rough explanation of how that works. 

Intuitively, we might consider adding a discount factor $0< \delta< 1$ like this:
$$\sum_{t = 0} ^ {\infty} \delta ^ t (u_t - u_t') $$
This modified series may converge even though the original one doesn’t. Of course, this convergence is at the cost of us caring more about people who are born earlier, which might not endear us to our children.

Therefore, we can take the limit case:
$$\liminf_{\delta\to 1 ^ -} \sum_{t = 0} ^ {\infty} \delta ^ t (u_t - u_t') $$
LDU uses this limiting value in place of the plain sum: $u$ is judged at least as good as $u'$ when it is nonnegative.
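
To get a feel for the limit, here is a minimal Python sketch that evaluates the discounted sum for a few discount factors close to 1, using the alternating streams mentioned above. The function names, the truncation `horizon`, and the particular values of `deltas` are my own illustrative choices and do not come from Jonsson's paper. Truncating the series and sampling a handful of $\delta$ values can only suggest the value of the liminf, not establish it.

```python
from itertools import cycle, islice


def discounted_difference(u, u_prime, delta, horizon=20_000):
    """Approximate sum_t delta**t * (u_t - u'_t) using the first `horizon` terms.

    `u` and `u_prime` are iterables (possibly infinite) of per-period welfare.
    """
    total = 0.0
    weight = 1.0  # equals delta**t, starting at t = 0
    for a, b in islice(zip(u, u_prime), horizon):
        total += weight * (a - b)
        weight *= delta
    return total


def ldu_estimate(make_u, make_u_prime, deltas=(0.9, 0.99, 0.999)):
    """Evaluate the discounted difference for discount factors approaching 1.

    The factories return fresh iterators so each delta sees the full stream.
    """
    return [discounted_difference(make_u(), make_u_prime(), d) for d in deltas]


# u = (1, 0, 1, 0, ...) versus u' = (0, 1, 0, 1, ...)
estimates = ldu_estimate(lambda: cycle([1, 0]), lambda: cycle([0, 1]))
print(estimates)
```

The printed values stay positive and drift towards one half as $\delta$ approaches 1, which is the behaviour the limit is meant to capture.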

LDU has a number of desirable properties, which are summarized on page 7 of this paper by Jonsson and Voorneveld. I won’t go into them much here other than to say that LDU generally extends our intuitions about what should happen in the finite case to the infinite one.

Example

Suppose we want to compare $u = (1, 0, 1, 0,\dots)$ and $u' = (0, 1, 0, 1,\dots)$. Let's take the standard series:
$$\begin{align}
\sum_{i = 0} ^\infty (u_i - u_i') & = (1-0) + (0-1) + (1-0) + (0-1) +\dots\\
& = 1-1+1-1+\dots\\
& =\sum_{i = 0} ^\infty(-1) ^ i
\end{align}$$
This is Grandi’s series, which famously does not converge under the usual definitions of convergence.

LDU, however, inserts a discount factor $\delta$ to get:
$$\sum_{i = 0} ^\infty (-1) ^ i\delta ^ i =\sum_{i = 0} ^\infty (-\delta) ^ i $$
It is clear that this is simply a geometric series, and we can find its value using the standard formula for geometric series:
$$\sum_{i = 0} ^\infty (-\delta) ^ i = \frac {1} {1+\delta}  $$
Taking the limit:
$$\liminf_{\delta\to 1 ^ -}\frac {1} {1+\delta}  = 1/2$$
Therefore, the Abel sum of this series is one half, and, since $1/2 > 0$, we have determined that $(1, 0, 1, 0,\dots)$ is better than (morally preferable to) $(0, 1, 0, 1,\dots)$.


This seems fairly intuitive: as you add more and more terms, the partial sums of the series oscillate between one and zero, so in some sense the limit of the series is one half.
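
One way to make that "in some sense" precise, offered here as a side note rather than as part of LDU's definition, is to average the partial sums (the Cesàro mean). The symbols $s_n$ and $N$ are introduced only for this illustration:
$$\begin{align}
s_n & = \sum_{i = 0} ^ {n} (-1) ^ i = \begin{cases} 1 & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd} \end{cases}\\
\frac {1} {N} \sum_{n = 0} ^ {N-1} s_n & = \frac {\lceil N/2\rceil} {N} \to \frac {1} {2} \quad\text{as } N\to\infty
\end{align}$$
It is a standard result that whenever this average of partial sums converges, the Abel sum exists and takes the same value, so the two notions agree here.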


Markov Decision Processes (MDP)

According to Wikipedia, a Markov Decision Process works as follows:
"At each time step, the process is in some state $s$, and the decision maker may choose any action $a$ that is available in state $s$.  The process responds at the next time step by randomly moving into a new state $s'$, and giving the decision maker a corresponding reward $R_a(s,s')$.
"The probability that the process moves into its new state $s'$ is influenced by the chosen action.  Specifically, it is given by the state transition function $P_a(s,s')$.  Thus, the next state $s'$ depends on the current state $s$ and the decision maker's action $a$."
At each time step the decision-maker chooses between a finite number of options, which causes the universe to (probabilistically) move into one of a finite number of states, giving the decision-maker a (finite) payoff. By repeating this process an infinite number of times, we can construct a sequence $u_1, u_2,\dots$ where $u_t$ is the payoff at time $t$.

Jonsson considers the set of all sequences generated by a decision-maker who follows a single, time-independent (i.e. stationary) policy. Crucially, he shows that LDU is able to compare any two streams generated by a stationary Markov decision process. [1]
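
As a rough illustration of how a stationary policy produces a utility stream, here is a small Python sketch. The two-state MDP, its state and action names, its probabilities and its rewards are all invented for this example and are not taken from Jonsson's paper. The sketch samples a single realisation of the process and leaves aside how randomness should be treated when comparing streams.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (next_state, probability, reward)
TRANSITIONS = {
    "calm": {
        "rest": [("calm", 0.9, 1.0), ("storm", 0.1, 0.0)],
        "work": [("calm", 0.5, 2.0), ("storm", 0.5, -1.0)],
    },
    "storm": {
        "rest": [("calm", 0.3, 0.0), ("storm", 0.7, -1.0)],
        "work": [("calm", 0.6, 1.0), ("storm", 0.4, -2.0)],
    },
}


def utility_stream(policy, start_state, steps, seed=0):
    """Return the first `steps` payoffs under a stationary policy.

    `policy` maps each state to one action and never changes over time.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    state, stream = start_state, []
    for _ in range(steps):
        outcomes = TRANSITIONS[state][policy[state]]
        weights = [p for _, p, _ in outcomes]
        next_state, _, reward = rng.choices(outcomes, weights=weights)[0]
        stream.append(reward)
        state = next_state
    return stream


always_rest = {"calm": "rest", "storm": "rest"}
always_work = {"calm": "work", "storm": "work"}
u = utility_stream(always_rest, "calm", steps=10)
u_prime = utility_stream(always_work, "calm", steps=10)
print(u)
print(u_prime)
```

Each policy here is just a fixed lookup table from states to actions, which is what "stationary" means in this setting: the choice depends on the current state but not on the time step.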

Why This Matters

My immediate objection upon reading this paper was “of course if you limit us to only finitely many choices then the problem is soluble – the entire problem only occurs because we want to examine infinite things!”

After having thought about it more though, I think this is an important step forward, and MDPs represent an importantly large class of decision processes.

Even though the universe may be infinite in time and space, in any time interval there are plausibly only finitely many states I could be in, e.g. perhaps because there are only finitely many neurons in my brain.

(Someone who knows more about physics than I might be able to comment on a stronger argument: if locality holds, then perhaps it is a law of nature that only finitely many things can affect us within a finite time window?)

Sequences generated by MDPs are therefore plausibly the only set of sequences a decision-maker may need to practically consider.

Outstanding Issues

My biggest outstanding concern with modeling our decisions with an MDP is that the payoffs have to remain constant. It seems likely that, as we learn more, we will discover that certain states are more or less valuable than we had previously thought. E.g. we may learn that insects are more conscious than previously expected, and therefore that insect suffering should weigh more heavily in our payoffs than we had originally thought. It seems like maybe one could have a “meta-MDP” which somehow models this, but I’m not familiar enough with the area to say for sure.

A more theoretical question is: what sequences can be generated via MDPs? My hope is that one day someone will show LDU (or a similarly intuitive algorithm) can compare any two computable sequences, but I don’t think that this is that proof.

Lastly, we have the standard problems of infinitarian fanaticism and paralysis. E.g. even if our current best model of the universe predicted that the MDP model was exactly correct, there would still be some positive probability that it was wrong, and then our “meta-decision procedure” is unclear.

Conclusion

Overall, I don't think that this completely resolves the problem of comparing infinite utility streams, but it's a large step forward. Previous algorithms like the overtaking criterion had fairly "obvious" incomparable streams, with no real justification for why those streams would not be encountered by a decision-maker. LDU is not complete, but we at least have some reason to think that it may be all we "practically" need.

I would like to thank Adam Jonsson for discussing this with me. I have done my best to represent LDU, but any errors in the above are mine. Notably, the justification for why MDPs are all we need to consider is entirely mine, and I'm not sure what Adam thinks about it.

1. This is not explicitly stated in Jonsson's paper, but it follows from the proof of Theorem 1. Jonsson confirmed this in email discussions with me.
