AI Devspace

Why did AlphaGo lose its Go game?

asked 1 week ago · 1 answer · 2.1K views

We can read on the Wikipedia page that in March 2016 AlphaGo lost one of its five games to Lee Sedol, a professional Go player. One article quotes a researcher:

AlphaGo lost a game and we as researchers want to explore that and find out what went wrong. We need to figure out what its weaknesses are and try to improve it.

Have researchers already figured out what went wrong?

1 Answer

We know what Lee's strategy was during the game, and it seems like the sort of thing that should work. Here's an article explaining it. Short version: yes, we know what went wrong, but probably not how to fix it yet.

Basically, AlphaGo is good at making lots of small decisions well, and at managing risk and uncertainty better than humans can. One of the surprising things about it, relative to previous Go bots, is how good it is at tactical fights: in previous games, Lee had built positions that AlphaGo needed to attack, and AlphaGo successfully attacked them.

So in this game, Lee reversed the strategy. Instead of trying to win many different influence battles, where AlphaGo had already shown it was stronger than him, he set up one critical battle (incurring minor losses along the way) and aimed to defeat AlphaGo there, with ripple effects that would settle the match in his favor.

So what weakness of AlphaGo allowed that to work? As I understand it, it's a fundamental limitation of Monte Carlo Tree Search (MCTS). MCTS works by randomly sampling continuations of the game and averaging the results: if 70% of sampled games from one candidate move go well and only 30% from another, then you should probably play the first move instead of the second.
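The averaging idea can be sketched in a few lines of Python. This is a toy illustration of my own, not AlphaGo's code; the win probabilities are made-up stand-ins for the results of real playouts:

```python
import random

def playout_win_rate(true_win_prob, n_samples=10_000, seed=0):
    """Estimate a move's value by averaging random playouts.
    `true_win_prob` is a made-up stand-in for the fraction of random
    continuations from that move that end in a win."""
    rng = random.Random(seed)
    wins = sum(rng.random() < true_win_prob for _ in range(n_samples))
    return wins / n_samples

def choose_move(candidates):
    """Pick the candidate move with the highest sampled win rate."""
    return max(candidates, key=lambda move: playout_win_rate(candidates[move]))

# Move A: ~70% of sampled games go well; move B: ~30%.
print(choose_move({"A": 0.70, "B": 0.30}))  # prints A
```

Averaging like this works well when a move's value is spread over many continuations, which is exactly the regime the next paragraph steps outside of.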

But when success depends on one specific sequence of plays (say, White has a winning path that requires playing exactly the right stone at every step, and Black has no good response to it), MCTS breaks down. A path that narrow can only be found through minimax reasoning, and moving from slower minimax reasoning to faster MCTS sampling is one of the big reasons bots are better now than they were in the past.
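A toy model makes the failure concrete. In the sketch below (my own illustration, assuming a tree where only White's choices matter and Black's replies are forced), exhaustive search proves the forced win, while uniformly random playouts, standing in crudely for MCTS averaging, rate the position as nearly lost because almost every sampled line misses the one correct stone:

```python
import random

def make_narrow_tree(depth, branching=3):
    """A game tree where White has exactly one winning move at each step.
    Internal nodes are lists of children; leaves are +1 (White wins) or -1.
    Black's replies are treated as forced, so only White's choices appear."""
    if depth == 0:
        return +1  # the end of the single precise winning path
    children = [make_narrow_tree(depth - 1, branching)]
    children += [-1] * (branching - 1)  # every other stone loses outright
    return children

def exact_best(node):
    """Exhaustive minimax-style search (only max nodes in this toy model)."""
    if not isinstance(node, list):
        return node
    return max(exact_best(child) for child in node)

def random_playout(node, rng):
    """Follow uniformly random moves to a leaf, as a crude playout."""
    while isinstance(node, list):
        node = rng.choice(node)
    return node

tree = make_narrow_tree(depth=4)
rng = random.Random(0)
avg = sum(random_playout(tree, rng) for _ in range(10_000)) / 10_000

print(exact_best(tree))  # prints 1: exhaustive search finds the forced win
print(avg < -0.9)        # prints True: sampling rates the position as lost
```

Only about one playout in 81 follows the winning path here, so the average is close to -1 even though the position is, in fact, won.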

It's unclear how to get around this. There may be a way to notice this sort of threat and temporarily switch from MCTS reasoning to minimax reasoning, or to keep particular dangerous trajectories in memory for consideration on future moves.
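One way to picture that hybrid idea, purely as a sketch of my own (the trigger condition is an assumption for illustration, not anything AlphaGo actually does): sample playouts as usual, but if a rare playout stumbles onto a win while the average looks bad, treat that as a sign of a narrow tactical line and re-evaluate with exhaustive search. Using the same toy narrow-path tree as above:

```python
import random

def make_narrow_tree(depth, branching=3):
    """Toy tree: exactly one winning move per step; every other move loses."""
    if depth == 0:
        return +1
    return [make_narrow_tree(depth - 1, branching)] + [-1] * (branching - 1)

def exact_best(node):
    """Exhaustive search over the toy tree (max nodes only)."""
    if not isinstance(node, list):
        return node
    return max(exact_best(child) for child in node)

def random_playout(node, rng):
    """Follow uniformly random moves to a leaf."""
    while isinstance(node, list):
        node = rng.choice(node)
    return node

def hybrid_value(node, rng, n_samples=1_000):
    """Average random playouts, but fall back to exhaustive search when a
    rare winning playout contradicts a bad average (an assumed trigger)."""
    samples = [random_playout(node, rng) for _ in range(n_samples)]
    avg = sum(samples) / n_samples
    if avg < 0 and max(samples) == 1:
        return exact_best(node)  # suspected narrow line: search it exactly
    return avg

tree = make_narrow_tree(depth=3)
print(hybrid_value(tree, random.Random(0)))  # prints 1
```

The catch, of course, is that real Go trees are far too large for the exhaustive fallback, which is why this remains an open problem rather than a fix.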


