Monday, October 31, 2016

Why AlphaGo?

Why AlphaGo is really such a big deal

Knight = bishop = 3 pawns, rook = 5 pawns, queen = 9 pawns, king = $\infty$ pawns
The notion of value is crucial in computer chess. The goal is for the program to find a sequence of moves that maximizes the final value of the program’s board position, no matter what the opponent does.
Ideas like this (a pawn blocking a rook devalues the rook) depend on detailed knowledge of chess and were crucial to Deep Blue’s success.
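To make that concrete, here is a minimal sketch of this style of evaluation: a material score built from the piece values above, plus a small minimax search that assumes the opponent always picks the reply that is worst for us. The `Board` class and its methods are hypothetical placeholders, not a real chess library.

```python
# Toy Deep Blue-style approach: score a position by summing material,
# then search a few moves ahead with minimax.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}
# The king's "infinite" value is enforced by the game-over check, not by the sum.

def material_score(board):
    """Sum piece values; positive favors the maximizing side."""
    score = 0
    for piece, color in board.pieces():          # hypothetical iterator over pieces
        sign = 1 if color == "white" else -1
        score += sign * PIECE_VALUES[piece]
    return score

def minimax(board, depth, maximizing):
    """Pick the line that maximizes the final evaluation, assuming the
    opponent always replies with the move that is worst for us."""
    if depth == 0 or board.game_over():
        return material_score(board)
    best = float("-inf") if maximizing else float("inf")
    for move in board.legal_moves():
        value = minimax(board.apply(move), depth - 1, not maximizing)
        best = max(best, value) if maximizing else min(best, value)
    return best
```

Real programs refine the raw material count with many hand-written rules (like devaluing a blocked rook), which is exactly where the detailed chess knowledge comes in.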
What happens if you apply this strategy to Go? Top Go players use a lot of intuition in judging how good a particular board position is, and it’s not immediately clear how to express this intuition in simple, well-defined systems like the valuation of chess pieces. In 2006, Monte Carlo tree search algorithms were introduced, based on a clever way of randomly simulating games. But they still fell far short of human players.
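A minimal sketch of the Monte Carlo idea, assuming a hypothetical `GoPosition` class: estimate how promising a position is by playing many purely random games to the end and counting wins. The full MCTS algorithm then builds a search tree guided by these playout statistics.

```python
import random

def rollout_value(position, num_playouts=100):
    """Estimate a position's value as the fraction of random playouts
    won by the player to move."""
    wins = 0
    for _ in range(num_playouts):
        node = position.copy()
        while not node.game_over():                       # play random moves to the end
            node = node.apply(random.choice(node.legal_moves()))
        if node.winner() == position.player_to_move():
            wins += 1
    return wins / num_playouts
```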
The mechanics behind AlphaGo were published in Nature in January 2016.
AlphaGo learned in two stages:
  1. AlphaGo was trained on 150,000 games played by strong human players (6–9 dan), and used an artificial neural network to find patterns in those games. It learned to predict with high probability what move a human player would make in any given position (a toy sketch of this stage follows the list).
  2. The neural network was then improved by repeatedly playing it against earlier versions of itself, adjusting the network so that it gradually improved its chances of winning.
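As a rough illustration of stage 1, here is a toy move-prediction setup in PyTorch. The tiny two-layer network and the randomly generated boards and moves are stand-ins; the real policy network was a much deeper convolutional network trained on positions from the 150,000 expert games.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPolicyNet(nn.Module):
    """Toy policy network: board features in, a score for each of the 361 points out."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, boards):                    # boards: (batch, 1, 19, 19)
        h = F.relu(self.conv1(boards))
        return self.conv2(h).flatten(1)           # logits over 19 x 19 = 361 points

net = TinyPolicyNet()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

for step in range(100):                           # stand-in for many expert positions
    boards = torch.randn(32, 1, 19, 19)           # fake board features
    human_moves = torch.randint(0, 361, (32,))    # fake expert moves
    loss = F.cross_entropy(net(boards), human_moves)   # penalize missing the human move
    optimizer.zero_grad()
    loss.backward()                               # compute the tiny adjustments
    optimizer.step()                              # apply them to the parameters
```

Stage 2 keeps the same kind of update, but the training signal comes from whether the network's own moves eventually win self-play games rather than from matching a human move.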
The neural network is a very complicated mathematical model, with millions of parameters to tune. As the network learned, it kept making tiny adjustments to those parameters, trying to find corresponding tiny improvements in its play. This sounds like a crazy strategy: repeated tiny tweaks to an enormously complicated function. But if you do this for long enough, with enough computing power, the network gets pretty good. And here’s the strange thing: it gets good for reasons no one really understands, since the improvements are a consequence of billions of tiny adjustments made automatically.
The core idea, however, is how to get a valuation of the position. While Deep Blue’s valuation system was based on lots of detailed chess knowledge, AlphaGo got there by analyzing thousands of prior games and engaging in a lot of self-play. Through billions of tiny adjustments, AlphaGo created a policy network and built a valuation system that resembles a good player’s intuition about the value of different board positions.
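As a sketch of how such a learned valuation might be used in place of hand-written rules (the function names and the mixing weight here are illustrative assumptions, not the published details), a position can be scored by blending the value network’s "intuition" with a quick random-playout estimate:

```python
def evaluate_position(position, value_net, rollout_fn, mixing=0.5):
    """Blend a learned value estimate with a quick playout result."""
    learned = value_net(position)      # hypothetical value network: win probability in [0, 1]
    simulated = rollout_fn(position)   # e.g. the rollout_value sketch above
    return (1 - mixing) * learned + mixing * simulated
```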
However, neural networks have drawbacks. They can be fooled, and they need far more training data than human players do.
