C2C #1: Game Theory

Types of games

The most simple type of game is the one that we built in the setup section. Cards get flipped over and you either win or lose. Not much skill…and not much fun!

Other games like Bingo, Dreidel, and of course Snakes and Ladders can deceptively seem like you have strategic control of the game, but really this is just the “illusion of agency”, since you’re only, for example, rolling dice to randomize your next move. A disturbing fact:

There are three main classes of games that do have significant strategic agency: perfect information, perfect information with randomness, and imperfect (hidden) information. Perfect information means all game information is visible, while imperfect information meanas there is hidden/private information.

We can also think about games along the axis of fixed or random opponents and adversarial opponents.

(There are also other game classes like sequential vs. simultaneous action games.)

Game/Opponent	Fixed/Random Opponent	Adversarial Opponent
No Player Agency/Decisions	Candy Land, War, Dreidel, Bingo, Snakes and Ladders	Blackjack dealer
Perfect Info	Puzzles, Rubik’s Cube	Tictactoe, Checkers, Chess, Arimaa, Go, Connect Four
Perfect Info with Randomness	Monopoly, Risk	Backgammon
Imperfect Info	Wordle, Blackjack	Poker, Rock Paper Scissors, Figgie, StarCraft

Perfect information games

Perfect information games like Chess and Go are complex and have been the focus of recent AI research (along with poker). Solving these games can theoretically be done using backward induction, which means starting from possible end positions and working backwards.

Imperfect information games

Here we want to primarily focus on the bottom right of this chart: imperfect information games with an adversarial opponent.

These games cannot be solved in the same way as perfect information games because we don’t know always what true state of the game we are in because of the hidden information (we’ll look at this more in Game Trees).

Imperfect information games are representative of real world situations where information is usually incomplete, e.g. job interviews, investments, dating, and negotiations.

Winning games

The goal of a game is to maximize the utility, which is usually a score or money or some kind of value.

There are two fundamental strategies:

Exploiting opponent weaknesses: Capitalize on specific flaws or tendencies of opponents, while putting yourself at risk if the opponent adapts well.
Being unexploitable: Playing a balanced, theoretically sound strategy, also known as game theory optimal. This does not maximize against weaker opponents, but rather aims to minimize losses against any possible opponent.

Rock Paper Scissors

Rock defeats scissors, scissors defeats paper, and paper defeats rock
You get +1 point for a win, -1 for a loss, and 0 for ties

This simple game was the subject of a DeepMind paper in 2023, where they wrote:

In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors.

Let’s dig in to why RPS is a more interesting game than it might seem.

Payoff matrix

The core features of a game are its players, the actions of each player, and the payoffs. We can show this for RPS in the below payoff matrix, also known as normal-form.

Player 1/2	Rock	Paper	Scissors
Rock	(0, 0)	(-1, 1)	(1, -1)
Paper	(1, -1)	(0, 0)	(-1, 1)
Scissors	(-1, 1)	(1, -1)	(0, 0)

The payoffs for Player 1 are on the left and for Player 2 on the right in each payoff outcome of the game. For example, the bottom left payoff is when Player 1 plays Scissors and Player 2 plays Rock, resulting in -1 for P1 and +1 for P2.

A strategy says which actions you would take for every state of the game.

Expected value

Expected value in a game represents the average outcome of a strategy if it were repeated many times. It’s calculated by multiplying each possible outcome by its probability of occurrence and then summing these products.

Suppose that Player 1 plays the strategy:

\begin{cases} r = 0.5 \\ p = 0.25 \\ s = 0.25 \end{cases}

and Player 2 plays the strategy:

\begin{cases} r = 0.1 \\ p = 0.3 \\ s = 0.6 \end{cases}

Let’s add these to the matrix:

Player 1/2	Rock (r_2=0.1)	Paper (p_2=0.3)	Scissors (s_2=0.6)
Rock (r_1=0.5)	(0, 0)	(-1, 1)	(1, -1)
Paper (p_1=0.25)	(1, -1)	(0, 0)	(-1, 1)
Scissors (s_1=0.25)	(-1, 1)	(1, -1)	(0, 0)

To simplify, let’s just write the payoffs for Player 1 since payoffs for Player 2 will simply be the opposite:

Player 1/2	Rock (r_2=0.1)	Paper (p_2=0.3)	Scissors (s_2=0.6)
Rock (r_1=0.5)	0	-1	1
Paper (p_1=0.25)	1	0	-1
Scissors (s_1=0.25)	-1	1	0

Now we can multiply the player action strategies together to get a percentage occurrence for each payoff in the matrix:

Player 1/2	Rock (r_2=0.1)	Paper (p_2=0.3)	Scissors (s_2=0.6)
Rock (r_1=0.5)	Val: 0 Pr: 0.5(0.1) = 0.05	Val: -1 Pr: 0.5(0.3) = 0.15	Val: 1 Pr: 0.5(0.6) = 0.3
Paper (p_1=0.25)	Val: 1 Pr: 0.25(0.1) = 0.025	Val: 0 Pr: 0.25(0.3) = 0.075	Val: -1 Pr: 0.25(0.6) = .15
Scissors (s_1=0.25)	Val: -1 Pr: 0.25(0.1) = 0.025	Val: 1 Pr: 0.25(0.3) = 0.075	Val: 0 Pr: 0.25(0.6) = .15

Note that the total probabilities sum to 1 and each row and column sums to the probability of playing that row or column.

We can work out the expected value of the game to Player 1 (summing all payoffs multiplied by probabilities from top left to bottom right):

\mathbb{E}[P_1] = 0(0.05) + -1(0.15) + 1(0.3) + 1(0.025) + 0(0.075) + -1(0.15) + -1(0.025) + 1(0.075) + 0(0.15) = 0.075

Therefore P1 is expected to gain 0.075 per game given these strategies. Since payoffs are reversed for P2, P2’s expectation is -0.075 per game.

Zero-sum

We see in the matrix that every payoff is zero-sum, i.e. the sum of the payoffs to both players is 0. This means the game is one of pure competition. Any amount P1 wins is from P2 and vice versa.

Nash equilibrium

A Nash equilibrium means that no player can improve their expected payoff by unilaterally changing their strategy. That is, changing one’s strategy can only result in the same or worse payoff (assuming the other player does not change).

In RPS, the Nash equilibrium strategy is to play each action r = p = s = 1/3 of the time. I.e., to play totally randomly.

Played a combination of strategies is called a mixed strategy, as opposed to a pure strategy, which would select only one action. Mixed strategies are useful in games of imperfect information because it’s valuable to not be predictable and to conceal your private information. In perfect information games, the theoretically optimal play would not contain any mixing (i.e., if you could calculate all possible moves to the end of the game).

The equilibrium RPS strategy is worked out below:

Nash equilibrium strategy for RPS

Player 1/2	Rock	Paper	Scissors
Rock (r)	(0, 0)	(-1, 1)	(1, -1)
Paper (p)	(1, -1)	(0, 0)	(-1, 1)
Scissors (s)	(-1, 1)	(1, -1)	(0, 0)

If Player 1 plays Rock with probability r, Paper with probability p, and Scissors with probability s, we have the following expected value equations for Player 2:

\mathbb{E}[P_2(R)] = -1p + 1s

\mathbb{E}[P_2(P)] = 1r - 1s

\mathbb{E}[P_2(S)] = -1r + 1p

(To compute each of these, sum the payoffs for each column with P2 payoffs and P1 probabilities. P2 payoffs because these are the expected values for P2 and P1 probabiltiies because the payoffs depend on the strategy probabilties of P1 against each of P2’s actions.)

Since no action dominates, we know that the EV of every strategic action should be equal (since if a certain strategy was best, we’d want to always play that strategy).

To solve for r, p, and s, we can start by setting these EVs equal:

\mathbb{E}[P_2(R)] = \mathbb[P_2(P)]

-1p + 1s = 1r - 1s

2s = p + r

Then setting these equal:

\mathbb{E}[P_2(R)] = \mathbb{E}[P_2(S)]

-1p + 1s = -1r + 1p

s + r = 2p

And finally setting these equal:

\mathbb{E}[P_2(P)] = \mathbb{E}[P_2(S)]

1r - 1s = -1r + 1p

2r = s + p

Now we have these equations:

\begin{cases} 2s = p + r \\ s + r = 2p \\ 2r = s + p \end{cases}

We can rewrite the 1st:

r = 2s - p

And combine with the 2nd:

s + (2s - p) = 2p

3s = 3p

Resulting in:

s = p

Now we can go back to the 2nd equation:

s + r = 2p

And insert s = p:

s + r = 2s

And arrive at:

r = s

We now see that all are equal:

s = p = r

We also know that they must all sum to 1:

r + p + s = 1

Since they’re all equal and sum to 1, we can substitute p and s with r:

3r = 1

r = 1/3

So all actions are taken with probability 1/3:

r = p = s = 1/3 \quad \blacksquare

By symmetry, the same equilibrium strategy is true for Player 2.

Playing this strategy means that whatever your opponent does, you will breakeven! For example, think about an opponent that always plays Rock.

\begin{equation} \begin{split} \mathbb{E}(\text{Equilibrium vs. Rock}) &= 0(r) + 1(p) + -1(s) \\ &= 0(1/3) + 1(1/3) + -1(1/3) \\ &= 0 \end{split} \end{equation}

How about the case of the opponent playing 60% Rock, 20% Paper, 20% Scissors?

\begin{equation} \begin{split} \mathbb{E}(\text{Equilibrium vs. 622}) &= 0.6(\text{Equilibrium vs. Rock}) \\ &\quad{}+ 0.2(\text{Equilibrium vs. Paper}) \\ &\quad{}+ 0.2(\text{Equilibrium vs. Scissors}) \\ &= 0.6(0) + 0.2(0) + 0.2(0) \\ &= 0 \end{split} \end{equation}

The random equilibrium strategy will result in 0 against any pure strategy and any combination of strategies including 622 and the opponent playing the random strategy.

Exploiting vs. Nash

The equilibrium strategy vs. a pure Rock opponent is a useful illustration of the limitations of playing at equilibrium. The Rock opponent is playing the worst possible strategy, yet equilibrium is still breaking even!

What’s the best that we could do against Rock only? We could play purely paper. The payoffs are written for playing Paper and the probabilities indicate the opponent playing only Rock.

\begin{equation} \begin{split} \mathbb{E}(\text{Paper vs. Rock}) &= 1(r) + 0(p) + -1(s) \\ &= 1(1) + 0(0) + -1(0) \\ &= 1 \end{split} \end{equation}

We’d win 1 each game playing Paper vs. Rock.

How about against the opponent playing 60% Rock, 20% Paper, 20% Scissors? Here we can see that because they are overplaying Rock, our best strategy is again to always play Paper. We write the payoffs for playing Paper and the probabilities according to the 622 strategy.

\begin{equation} \begin{split} \mathbb{E}(\text{Paper vs. 622}) &= 1(r) + 0(p) + -1(s) \\ &= 1(0.6) + 0(0.2) + -1(0.2) \\ &= 0.6 + 0 - 0.2 \\ &= 0.4 \end{split} \end{equation}

Playing Paper vs. 622 results in an expected win of 0.4 per game.

Soccer Kicker vs. Goalie

Consider the Soccer Penalty Kick game where a Kicker is trying to score a goal and the Goalie is trying to block it.

Kicker/Goalie	Lean Left	Lean Right
Kick Left	(0, 0)	(2, -2)
Kick Right	(1, -1)	(0, 0)

The game setup is zero-sum. If Kicker and Goalie both go in one direction, then it’s assumed that the goal will miss and both get 0 payoffs. If the Kicker plays Kick Right when the Goalie plays Lean Left, then the Kicker is favored and gets a payoff of +1. If the Kicker plays Kick Left when the Goalie plays Lean Right, then the kicker is even more favored, because it’s easier to kick left than right, and gets +2.

Nash equilibrium?

Which of these, if any, is a Nash equilibrium? You can check by seeing if either player would benefit by changing their action.

Kicker	Goalie	Equilibrium or Change?
Left	Left
Left	Right
Right	Left
Right	Right

Solution

There are no pure Nash equilibrium solutions because when the actions match, the Kicker will always want to change, and when they don’t match, the Goalie will always want to change.

Kicker	Goalie	Equilibrium or Change?
Left	Left	Kicker changes to right
Left	Right	Goalie changes to left
Right	Left	Goalie changes to right
Right	Right	Kicker changes to left

Assume that they both play randomly – Left 50% and Right 50% – what is the expected value of the game?

Solution

Kicker/Goalie	Lean Left (0.5)	Lean Right (0.5)
Kick Left (0.5)	(0, 0)	(2, -2)
Kick Right (0.5)	(1, -1)	(0, 0)

We apply these probabilities to each of the 4 outcomes and write in terms of the Kicker payoffs:

Kicker/Goalie	Lean Left (0.5)	Lean Right (0.5)
Kick Left (0.5)	Val: 0 Pr: 0.25	Val: 2 Pr: 0.25
Kick Right (0.5)	Val: 1 Pr: 0.25	Val: 0 Pr: 0.25

Now for the Kicker, we have \mathbb{E}[K] = 0.25(0) + 0.25(2) + 0.25(1) + 0.25(0) = 0.75.

Since it’s zero-sum, we have \mathbb{E}[G] = -0.75 for the Goalie.

Probabilistic interpretations

Note that, for example, the Kicker playing 50% Left and 50% Right could be interpreted as a single player having these probabilities or a field of players averaging to these probabilities. So out of 100 players, this could mean:

100 players playing 50% Left and 50% Right
50 players playing 100% Left and 50 players playing 100% Right
50 players playing 75% Left/25% Right and 50 players playing 25% Left/75% right

Finding equilibrium

When the Goalie plays left with probability p and right with probability 1-p, we can find the expected value of the Kicker actions.

Kicker/Goalie	Lean Left (p)	Lean Right (1-p)
Kick Left	(0, 0)	(2, -2)
Kick Right	(1, -1)	(0, 0)

\mathbb{E}[K(L)] = p(0) + (1-p)(2) = 2 - 2p

\mathbb{E}[K(R)] = p(1) + (1-p)(0) = p

The Kicker is going to play the best response to the Goalie’s strategy. The Goalie wants to make the Kicker indifferent to Kick Left and Kick Right because if the Kicker was not going to be indifferent, then he would prefer one of the actions, meaning that action would be superior to the other. When you make your opponent indifferent, then you don’t give them any best play.

Therefore the Kicker will play a mixed strategy in response that will result in a Nash equilibrium where neither player benefits from unilaterally changing strategies. (Note that indifferent does not mean 50% each, but means the expected value is the same for each.)

By setting the values equal, we get 2 - 2p = p \Rightarrow p = \frac{2}{3} as shown in the plot.

This means that 1-p = 1 - \frac{2}{3} = \frac{1}{3}.

Therefore the Goalie should play Lean Left (p) \frac{2}{3} and Lean Right (1-p) \frac{1}{3}.

The value for the Kicker is \frac{2}{3}, or (0.67), for both actions, regardless of the Kicker’s mixing strategy.

Note that the Kicker is worse off now (0.67 now compared to 0.75) than when both players played 50% each action. Why?

If the Kicker plays Kick Left with probability q and Kick Right with probability 1-q, then the Goalie’s values are:

\mathbb{E}[G(L)] = 0(q) - 1(1-q) = -1 + q

\mathbb{E}[G(R)] = -2q + 0(1-q) = -2q

Setting equal,

\begin{equation} \begin{split} -1 + q &= -2q \\ -1 &= -3q \\ \frac{1}{3} &= q \end{split} \end{equation}

Therefore the Kicker should play Kick Left (q) \frac{1}{3} and Kick Right (1-q) \frac{2}{3}, giving a value of -\frac{2}{3} to the Goalie.

We can see this from the game table:

Kicker/Goalie	Lean Left (\frac{2}{3})	Lean Right (\frac{1}{3})
Kick Left (\frac{1}{3})	Val: (0, 0) Pr: \frac{2}{9}	Val: (2, -2) Pr: \frac{1}{9}
Kick Right (\frac{2}{3})	Val: (1, -1) Pr: \frac{4}{9}	Val: (0, 0) Pr: \frac{2}{9}

Therefore the expected payoffs in this game are:

\mathbb{E}[K] = \frac{2}{9}(0) + \frac{1}{9}(2) + \frac{4}{9}(1) + \frac{2}{9}(0) = \frac{6}{9} = 0.67 for the Kicker and -0.67 for the Goalie.

Switching out of equilibrium

In equilibrium, both players are playing a best response against each other, giving no incentive to deviate.

Therefore no player can unilaterally improve by changing their strategy. What if the Kicker switches to always Kick Left?

Kicker/Goalie	Lean Left (\frac{2}{3})	Lean Right (\frac{1}{3})
Kick Left (1)	Val: (0, 0) Pr: \frac{2}{3}	Val: (2, -2) Pr: \frac{1}{3}
Kick Right (0)	Val: (1, -1) Pr: 0	Val: (0, 0) Pr: 0

Now the Kicker’s payoff is still \frac{2}{3}(0) + \frac{1}{3}(2) = 0.67.

When a player makes their opponent indifferent, this means that any action the opponent takes (within the set of equilibrium actions) will result in the same payoff! But now they are no longer in equilibrium.

So if you know your opponent is playing the equilibrium strategy, then you can actually do whatever you want with no penalty with the mixing actions. Sort of.

The risk is that the opponent can now deviate from equilibrium and take advantage of your new strategy. For example, if the Goalie caught on and moved to always Lean Left, then the value is reduced to 0 for both players. Though if the Kicker then caught on and moved to always Kick Right, then the value goes up to 1 for the Kicker and -1 for the Goalie. Perhaps after a few cycles like this, the players would go back to equilibrium.

To summarize, you can only be penalized for not playing the equilibrium mixing strategy if your opponent plays a non-equilibrium strategy that exploits your strategy. You can only profit more than the equilibrium strategy if your opponent either plays off equilibrium and you do as well or if you play equilibrium and they play some action that is not part of the equilibrium strategy (also known as the support, or the set of pure strategies that are played with non-zero probability under the mixed strategy, which is not possible in this game since both possible actions are played at equilibrium).

Clairvoyance Poker

Back to poker. We previously built Face-up 3-card Poker where each player was dealt a card from the deck of {0, 1, 2} and the high card won.

Now we’re going to make a small change to the game, while keeping the same deck.

Player 1 will always be dealt card 1.
Player 2 will be dealt randomly from {0, 2}.
P1 knows P2 has either card 0 or 2, but not which. P2 knows that P1 has card 1.
This is representative of a real poker scenario where P1 has a mid-strength hand that can beat bluffs, but loses to value bets by strong hands. P2 has either a bluff (weak hand) or a strong hand.

Let’s modify the code to update the deal:

Update deal_cards function

    def deal_cards(self):
        self.cards[0] = 1
        remaining_cards = [card for card in self.deck if card != self.cards[0]]
        self.cards[1] = random.choice(remaining_cards)

Equity and Variance

Equity is defined as the percentage of the pot expected to win given the cards, assuming no additional betting. We see that in the original case where players were randomly dealt cards and in this case, both players have 50% equity before the cards are dealt. After the cards are dealt, the higher card has 100% equity.

In real poker games like Texas Hold’em, equity is more interesting because there are additional community cards that mean that even if one hand starts off as stronger, it won’t necessarily win. Below we have a special poker hand where the equities are shown as 53% and 47% and the hand currently losing actually has higher equity!

Variance measures the spread of the score differences. Higher variance means more volatility in the outcomes.

We can run a sample of games and get an empirical sample for:

Equity:
Variance
Winrate:

Standard Error: This measures the precision of our sample mean as an estimate of the population mean. Confidence Interval: This gives us a range where we expect the true population mean to lie with 95% confidence.

Betting Rules and Sequences

Let’s make this a real strategic poker game with betting. This game, Clairvoyance Poker, was first shown in The Mathematics of Poker (Bill Chen, Jared Ankenman) and then also in the excellent Play Optimal Poker (Andrew Brokos).

Both players ante 1 chip, so the starting pot is 2
Each player has 1 remaining chip to use for betting
Player 1 acts first and can either Bet 1 or Check (Pass)

Play continues until one of these betting sequences happens, which ends the hand:

Check Check: Hands go to showdown and player with highest card wins 1 (the ante)
Bet Call: Hands go to showdown and player with highest card wins 2 (the ante + bet)
Bet Fold: Player who bets wins 1 (the ante)

Here is a list of all possible betting sequences:

P1	P2	P1	Winner
Bet	Fold		P1 (+1)
Bet	Call		Higher Card (+2)
Check	Check		Higher Card (+1)
Check	Bet	Fold	P2 (+1)
Check	Bet	Call	Higher Card (+2)

Starting Strategies

There are six possible states of the game, two for P1 and four for P2. Let’s hard-code some initial strategies for both players:

P1[1, Starting]: Randomly Bet/Check
P2[0, After Bet]: Fold
P2[2, After Bet]: Call
P2[0, After Check]: Randomly Bet/Check
P2[2, After Check]: Bet
P1[1, After Check and Bet]: Randomly Bet/Check

This strategy randomizes uncertain decisions and takes clearly smart decisions when possible. What does that mean?

P2[0, After Bet]: Fold
- This is the worst card and can’t win
P2[2, After Bet]: Call
- This is the best hand and guarantees a win
P2[2, After Check]: Bet
- This is the best card and betting can only possibly win additional chips

Let’s put the strategies into our code and see how each strategy is doing. There are 9 total state-action pairs bolded below:

P1[1, Starting]: Randomly Bet/Check
- P1[1, Starting]: Bet
- P1[1, Starting]: Check
P2[0, After Bet]: Fold
P2[2, After Bet]: Call
P2[0, After Check]: Randomly Bet/Check
- P2[0, After Check]: Bet
- P2[0, After Check]: Check
P2[2, After Check]: Bet
P1[1, After Check and Bet]: Randomly Call/Fold
- P1[1, After Check and Bet]: Call
- P1[1, After Check and Bet]: Fold

For now, we’ll do this in a simple, but not-so-elegant way.

—> Logic for taking these actions

—> Logic for keeping track of frequency of each situation/action and results for each situation/action (and still output stuff from before)

Dominated Strategy

Note that P1 betting is strictly worse and why –> show math

–> Update logic to always check

P1[1, Starting]: Check
P2[0, After Bet]: Fold
P2[2, After Bet]: Call
P2[0, After Check]: Randomly Bet/Check
- P2[0, After Check]: Bet
- P2[0, After Check]: Check
P2[2, After Check]: Bet
P1[1, After Check and Bet]: Randomly Call/Fold
- P1[1, After Check and Bet]: Call
- P1[1, After Check and Bet]: Fold

Your Strategies

—> Try strat for P1 then for P2 See results

P1[1, Starting]: Check
P2[0, After Bet]: Fold
P2[2, After Bet]: Call
P2[0, After Check]: ??
P2[2, After Check]: Bet
P1[1, After Check and Bet]: ??

Indifference in Poker

There are two strategies for which we can find an equilibrium:

P2[0, After Check]: ??
P1[1, After Check and Bet]: ??

Let’s start by looking at the EVs for P2’s action with card 0 facing a Check.

\mathbb{E}[P2](0,Check) = 0 \mathbb{E}[P2](0,Bet) = 2(\Pr(\text{P1 Fold})) + -1(\Pr(\text{P1 Call}))

These should be equal in order for P2 to be indifferent. Let \Pr(\text{P1 Call}) = p and \Pr(\text{P1 Fold}) = 1-p.

Note that we set the Check action to have an EV of 0 because we are counting any chips already in the pot as already used.

We solve for the P1 strategy that makes P2 indifferent.

\begin{equation} \begin{split} 0 &= 2(1-p) - p \\ 0 &= 2 - 2p - p \\ 3p &= 2 \\ p &= 2/3 \end{split} \end{equation}

Therefore P1 should play $p =

Now let’s look at the EVs for P1’s action with card 1 facing a Bet after Checking.

\mathbb{E}[P1](1,Fold) = 0 \mathbb{E}[P2](1,Call) = 3(\Pr(\text{P2 Card 0 and Bets})) + -1(\Pr(\text{P2 Card 2 and Bets}))

We know that \Pr\text{P2 Card 2 and Bets} = \frac{1}{2}(1) since P2 has the card half the time and bets always with the card.

Let b represent how often P2 bluffs with card 0.

\Pr(\text{P2 Card 0 and Bets}) = (1/2)(b)

Now to equate the two equations above, we have:

0 = 3(1/2)(b) - 1/2 b = 1/3

Therefore P2 will bet 1/3 of the time with card 0 facing a Check, i.e. P2 will bluff 1/3 of the time with the Q.

Note that the indifference equation also accounts for how often P2 is betting with card 2 and we are really thinking about an overall strategy for playing the range of cards {0, 2}.

One additional step is that we can compute the overall ratio of bluffs vs. value bets that P1 is facing.

P2 has card 0 and 2 each \frac{1}{2} of the time.

P2 is betting card 0 \frac{1}{3} of the time (bluffing).

P2 is betting card 2 always (value betting).

Therefore for every 3 times with card 0 you will bet once and for every 3 times with card 2 you will bet 3 times. Out of the 4 bets, 1 of them is a bluff.

\Pr(\text{P2 Bluff after P1 Check}) = \frac{1}{4}

Overall we have this strategy for P2:

2/6: Give up (check) with card 0
1/6: Bluff with card 0
3/6: Value bet with card 2

If P2 bet \frac{2}{3} instead of \frac{1}{3} with card 0 after P1 checks and P1 is playing an equilibrium strategy, how would P2’s expected value change?

Solution

It wouldn’t! As long as P1 doesn’t modify their equilibrium strategy, then P2 can mix his strategy however he wants and have the same EV.

Equity vs. Expected Value

can tell opponent strategy to bet 1/3 0s but can’t stop call more catches more bluffs, but pays off more value bets

Sometimes have to bluff with bad cards because EV higher than if you always folded with bad cards.

Add uncertainty the opponent has to think about in counter-strategy If always folded bad cards, opponent knows you keep playing with good #

exploits

bet all good, check all bad opponent can always fold Then switch to always betting Then switch to awlasy calling Then move to betting strong, checking weak

bluffing combined with value betting is the profitable thing. players who fold too often lose to bluffs, call too often lose value to value bets. betting to bluff and call ratios adepend on pot size

Bonus: Bet Sizing

pot sizing affects strategy pot large etc. –> Add pot size stuff pot odds

1 - bet/(bet+pot) larger bet means call less, let them get away with bluffs the bigger they bet

bluff to value ratio bet/(bet+pot) bluffing frequency goes up as bet size gets larger relative to pot