C2C: About

About Us

We’re excited about learning through games. We developed Cards to Code as a foundational course on building a functional poker bot after running an AI Poker Camp course in San Francisco in summer 2024.

Check out Overbet.ai (under development) and Poker Camp.

We’ll also have more advanced courses available soon.

Why poker?

Let’s split this into two questions.

Why play poker?

Games of incomplete information like poker are closer to real-life settings than perfect-information games like chess. Incomplete information means that some information is hidden – in poker, this is our opponents’ private cards and, in games like Texas Hold’em, the board cards yet to be revealed.

This means we need to infer our opponents’ likely hands and reason about the probabilities of the cards we can’t see.
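One common way to reason about unseen cards is Monte Carlo sampling: simulate the hidden information many times and count the outcomes. Here’s a minimal sketch in a toy one-card game (the deck, function name, and numbers are illustrative assumptions for this sketch, not the course’s game):

```python
import random

# Toy one-card game: each player holds one card, higher card wins.
# The deck and function below are illustrative, not the course's game.
DECK = list(range(2, 15))  # ranks 2..14, where 14 plays as an ace

def estimate_win_prob(our_card: int, trials: int = 100_000) -> float:
    """Monte Carlo estimate of P(win): sample the opponent's hidden
    card uniformly from the cards we can't see and count our wins."""
    unseen = [c for c in DECK if c != our_card]
    wins = 0
    for _ in range(trials):
        wins += our_card > random.choice(unseen)
    return wins / trials

print(estimate_win_prob(10))  # ~0.67: 8 of the 12 unseen ranks lose to a 10
```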

Poker is interesting from a math perspective – probability, expected value, risk, and bankroll management are all important skills.
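As a small worked example (the numbers are made up for illustration), the expected value of calling a bet follows directly from our win probability and the pot odds:

```python
def call_ev(pot: float, bet: float, win_prob: float) -> float:
    """Expected value of calling: we risk `bet` to win `pot + bet`."""
    return win_prob * (pot + bet) - (1 - win_prob) * bet

# Facing a $10 bet into a $20 pot, we need more than 25% equity to
# break even (10 / (20 + 10 + 10) = 0.25); with 30% the call profits.
print(call_ev(pot=20, bet=10, win_prob=0.30))  # 2.0
```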

It’s also interesting from a psychological perspective – emotional control, reading opponents, and adapting to other players are all valuable skills.

We learn to make decisions and to judge them by their quality rather than by their results.

Why build a poker bot?

Poker has clear rules and structure, so we can apply reinforcement learning where the rewards are the profits in the game. Reinforcement learning, game theory, and Monte Carlo methods are applicable across many domains.

Here we mainly focus on a simplified poker game that can be solved in under a minute. The same core principles apply to scaled-up versions that are much larger and more complex, though those also involve approximating states of the game.
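A classic example of such a small game is Kuhn poker: three cards, one dealt to each player, and a single round of betting. Below is a compact, illustrative sketch of vanilla counterfactual regret minimization (CFR) solving it through self-play – our own minimal implementation and naming, not necessarily the exact game or code used in the course:

```python
import random

# Vanilla CFR for Kuhn poker: three cards (J=0, Q=1, K=2), one per
# player, one ante each, one betting round with pass (p) or bet (b).
ACTIONS = "pb"
nodes = {}  # information set -> Node

class Node:
    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def strategy(self, weight):
        # Regret matching: act in proportion to accumulated positive regret.
        s = [max(r, 0.0) for r in self.regret_sum]
        total = sum(s)
        s = [x / total for x in s] if total > 0 else [0.5, 0.5]
        for a in range(2):
            self.strategy_sum[a] += weight * s[a]
        return s

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [x / total for x in self.strategy_sum] if total else [0.5, 0.5]

def cfr(cards, history, p0, p1):
    """Expected utility for the player to act; p0/p1 are each player's
    probability of playing to reach this history."""
    player = len(history) % 2
    if len(history) >= 2:  # terminal states
        if history[-1] == "p":
            if history[-2:] == "pp":  # check-check: showdown for the antes
                return 1 if cards[player] > cards[1 - player] else -1
            return 1                  # a bet was folded to: bettor wins 1
        if history[-2:] == "bb":      # bet-call: showdown for 2 each
            return 2 if cards[player] > cards[1 - player] else -2
    info_set = str(cards[player]) + history
    node = nodes.setdefault(info_set, Node())
    strat = node.strategy(p0 if player == 0 else p1)
    util, node_util = [0.0, 0.0], 0.0
    for a in range(2):
        if player == 0:
            util[a] = -cfr(cards, history + ACTIONS[a], p0 * strat[a], p1)
        else:
            util[a] = -cfr(cards, history + ACTIONS[a], p0, p1 * strat[a])
        node_util += strat[a] * util[a]
    for a in range(2):  # regrets are weighted by the opponent's reach prob
        node.regret_sum[a] += (p1 if player == 0 else p0) * (util[a] - node_util)
    return node_util

cards = [0, 1, 2]
for _ in range(20_000):  # self-play: both seats share the same learner
    random.shuffle(cards)
    cfr(cards, "", 1.0, 1.0)

for info_set in sorted(nodes):  # e.g. "0" = holding J, first to act
    p_pass, p_bet = nodes[info_set].average_strategy()
    print(f"{info_set:>4}  pass={p_pass:.2f}  bet={p_bet:.2f}")
```

This runs in a few seconds. In the printed output, look at the information sets for the Jack (card 0): the bot bets with the worst possible card some of the time – for example, after the opponent checks (“0p”), it learns to bluff-bet about a third of the time – a bluff it discovers on its own, which is exactly the insight discussed below.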

The game’s richness makes it an ideal testbed for decision-making under uncertainty, probabilistic reasoning, and taking actions that have both immediate and longer-horizon consequences.

We can also use the bots’ solutions to gain insights into the underlying game itself. For example, you’ll see that bots automatically learn to bluff, showing that bluffing is a theoretically correct play and not a “loss leader” just to get more action later.

Beyond building a basic bot, there is a broad area for further research, including agent evaluation, opponent modeling, layering an LLM on top of the traditional poker agent, and interpretability of the agent.

Modern AI agents

LLM-based AI agents are very popular in 2024. The poker bot we’re building here is different: it uses the classic AI method of reinforcement learning, where the goal is to maximize long-term reward given feedback from the environment (i.e., maximizing winnings in a poker game).

The poker bot learns to make optimal decisions in any given game state by repeatedly playing against itself and refining its strategy over time.

LLM agents, by contrast, generally learn from large amounts of text data to understand and generate human-like language based on patterns in that data, which can include strategy for a game like poker.

So is RL still relevant? Yes! OpenAI recently released its o1 chain-of-thought model:

“Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process.”

In chain-of-thought reasoning, each step can be seen as a state with possible actions. RLHF (Reinforcement Learning from Human Feedback) is another popular technique for LLM agents, where human feedback acts as the reward signal that shapes a model’s outputs.

Hybrid approaches seem promising and may be a future direction for poker agents as well.

Inspiration

Some inspiration for this course: