|
Answer to question #14
You have 52 playing cards (26 red, 26 black). You draw
cards one by one. A red card pays you a dollar. A black one fines you a dollar.
You can stop any time you want. Cards are not returned to the deck after being
drawn. What is the optimal stopping rule in terms of maximizing expected payoff?
Also, what is the expected payoff following this optimal rule?
The solution to this problem is, in my
opinion the most difficult to understand of all the puzzles. Indeed I
was unable to solve it and didn't receive a complete solution until two
years after originally posting it. The final solution, in the form of the spreadsheet was sent to me by Han Zheng.
For this reason I have left on the page
the thoughts i had before I had the final solution as they represent an
easier to understand and more simplistic approach. Also the reasoning
may help you arrive at the final solution by yourself or help you
understand it. I would
recommend reading that answer before you dive into the full answer. But
an important thing to note are that as the player we can't lose this
game as we can gamble till all the cards are drawn and our net position
is zero.
From our earlier
analysis it is clear we need a dynamic quit rule. A singal value is not
sufficent. We must, at each stage consider what cards are remaining,
and therefor the probability of a positive or negative outcome from drawing again.
For the explanation
i will ask you first to consider a deck containing only 6 cards, 3 +ve
& 3 -ve (note i'm no longer calling the cards black and red, it
confuses me.)
|
Negative Cards Remaining (N)
|
3
|
2
|
1
|
0
|
+ve
Cards
remaining
(P)
|
3
|
.5 0 .5 |
.4 -1 .6
|
.25 -2 .75 |
0 -3 1
|
2
|
.6 1 .4
|
.5 0 .5
|
1/3 -1 2/3
|
0 -2 1
|
1
|
.75 2 .25
|
2/3 1 1/3
|
.5 0 .5
|
0 -1 1
|
0
|
1 3 0
|
1 2 0
|
1 1 0
|
0
|
Things to note about the table:
- We start the game with 3 of each card
remaining ie the top left
- The number in black represents our cash position at that point in the game
- the red number represents the fractional chance of us moving right ie drawing a -ve card
- the number in green represents the fractional chance of us moving down ie drawing a +ve card
The probability of moving down is given by P/(N+P),
where N+P gives the total number of remaining cards, the probability of
moving right is N/(N+P) and the cash value of a cell is N-P.
There are some cells we can intuitively speculate as to our course of action without too much trouble:
If
we are in cell 3,0 (ie 3 -ve remaining and 0 +ve remaining) we have
100% chance of negatively affecting our position so we "stick"
In cell 0,1 we have 100% chance of improving our position, indeed this is true of all the cells in the 0 column
Now consider the cell 1,1 (as an example,) from here we have a 50% chance of improving our
position to +1 and a 50% chance of harming it to -1. So the weighted
outcome of a gamble might appear to be even. However, if we were to win
from this gamble we would quit on +1 in cell 1,0, but if we were to
lose this gamble we would "gamble" again from cell 0,1.
This means we can consider the expected return from cell 0,1 to be 0 and from 1,0 to be 1
To calculate the expected return from 1,1 we then have 50% of 0 and 50% of 1 ie 0.5
We can now calculate the expected return from 2,1 it's 2/3 moving lright
plus 1/3 moving down or 2/3 of cell 1,1 plus 1/3 of cell 2,0 which
calculates as (2/3)*.5 + (1/3)*2 = 1
(note we used the expected return for (1,1)
that we calculated earlier.)
using the same procedure and our expected return from 2,1 we can do the same for cell 3,1
expected return is given by .25 * cell(3,0) + .75 * cell(2,1) = .25 * 3 + .75 * 1 = 1.5
So
we have that the expected return from cell 3,1 is 1.5. But in cell 3,1
we are holding 2 dollars in our hand, so gambling would not be in
our interest, as it would harm our expected return.
This is our quit rule!!!!
We quit when the mathematical expected return of gambling is less than the cash we are currently holding.
We calculate the expected return of gambling as being the probability
weighted outcome of the winning and losing cells, where the value
assigned to these cells is inturn their expected return.
This approach can be extented to a 52 card deck.
The spread sheet below does this calculation, the value in each cell is
the maximum of either what we are currently holding or the expected
return of a gamble. If the former is the maximum then we quit.....
My original Solution.....
Firstly I must concede I have nothing
approaching a complete solution for this, I've consulted many
Mathematicians and am still working on a complete solution. However, i
have been getting e-mails about this, so i thought I'd give you what i
have so far....
What do we mean by a stopping rule????
A Stopping rule could be anything, it could be 'I'll stop after just one game', 'I'll wont play the game...' etc...
In this section we will consider the stopping rule:
I will quit when I get 'x' Dollars ahead.
Where we will have to determine x.
The first thing to note:
We
can't lose!!!!! Since there are 26 black cards and 26 red if we are
behind we can just keep playing until the end and we will be even, not
up, not down.
My
first thought was that a good quit stratergy given this fact would be
to simply quit when we get ahead. This however does not yield the
maximum profit.
That is to say x is not equal to 1.
To determine that value of x i have written a simple computer program. It will allow you to enter various values for x
then use this value to play the game 1,000,000 times, (takes <
a minute on a newish computer.) You will be able to see what your
profit is as the game progresses....

(Click to download - 33k)
The source code for this program is available here, it's written in QBasic.
So we now have a value for x, and and expected return. Incase
you didn't do it yourself, you can highlight the line below for the
answer. which for a large part is the solution to this problem.
x = 4, expected return ~= 1.0917
However there may be more we can do:
Further thinking...
A clear limitation of our model is that, what i call the quit value, is static.
consider this:
We are using a value of x of 4, we have just drawn the 51st card, and we are one dollar ahead.
In this case we know that the next card will be such that
we will lose our one dollar. so in this instance we should change
our quit value to 1. Clearly this is the begining of a
model that allows us to change the value of depending on how the deck
is loaded....
I will continue to play with this idea and would welcome any
suggestions for a model along with their expected returns. I'll
happily post the results here.
BACK
|
|