Monday, July 10, 2017

Artificial Intelligence ND A2: Isolation

15 Lessons, including 4 Projects:
  1. sudoku game
  2. isolation game: due in 2017.5.15
  3. planning: due in 2017.6.12
  4. hidden Markov models: due in 2017.7.10
Project 2 was much more difficult than I originally expected, but once I finished it, I felt great. I really enjoy the beauty of the minimax algorithm and alpha-beta pruning!

5 Intro to AI

Examples of AI application:
  • Navigation
  • tic tac toe game
  • Monty Hall problem: prior vs posterior probability
We often tend to use the word intelligence to describe things that we don’t know how to explain. Once we can explain them, we just say that they’re algorithms or formulas. They’re only intelligent if we can’t explain them.
We consider ourselves intelligent because right now we don’t fully understand how our brains work.
Equating intelligence with human behavior will pretty much rule out all artificial systems. Besides, even humans are not great at all kinds of tasks.
Intelligence should be defined within the context of a task.
  • agent (only the software part)
  • environment (including the physical body of the agent: sensors, motors)
Cognition is perception plus action.
Environment states:
  • fully observable vs partially observable
  • deterministic vs stochastic
  • discrete vs continuous
  • benign vs adversarial
Bounded Optimality. An intelligent agent is one that takes actions to maximize its expected utility given the desired goal. Under real-world constraints, we cannot expect an agent to always behave optimally, but we can come up with a level of performance, or bound, that we desire the agent to meet.

6 Intro to Game playing

Course teachers:
  • Peter Norvig: search and logical planning
  • Sebastian Thrun: probability and Bayes nets
  • Thad Starner: the rest
Main topics:
  • Adversarial search
  • minimax algorithm
  • alpha-beta pruning
  • evaluation function
  • isolation game
  • multi-player, probabilistic

7 Isolation game

Depth-limited search:
  • Limit time to 2 seconds per move. If the computer speed is 2 GHz and the average branching factor is 8, we get a depth limit of about 9.
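As a rough check of this arithmetic, the affordable depth is the largest d with b^d inside the node budget. This sketch assumes the machine evaluates on the order of 10^8 boards per second (an assumption; each board evaluation takes many CPU cycles, so it is well below the 2 GHz clock rate):

```python
import math

def max_search_depth(nodes_per_sec, time_limit_s, branching_factor):
    """Largest depth d such that branching_factor**d fits in the node budget."""
    budget = nodes_per_sec * time_limit_s
    return int(math.log(budget, branching_factor))

# Assuming ~10**8 board evaluations per second and 2 seconds per move:
print(max_search_depth(10**8, 2, 8))  # -> 9
```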
I always like to test my assumptions because they are often wrong.
Quiescent search
24-27: search depth = 3; the first moves on the edge get the best score.
28-29: search depth = 2; the first moves in the center get the best score.
A quiescent state: the recommended branches are no longer changing much as the search deepens.
Iterative deepening means the computer always has an answer ready in case it runs out of time, and it can search as far as possible within its time constraints.
It doesn’t waste much time: because of the exponential nature of the problem, the total time is dominated by the last level of the search.
  • If we limit the total time a player can take for the entire game, a player will want to search deeper in some parts of the game and shallower in other parts of the game.
  • A strategy for how deep we want to search: standard initial moves for the beginning of the game, then search deeper in the middle, use less time toward to the end.
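The iterative-deepening idea above can be sketched with a simple time budget (the `search` callable and its timeout behavior are illustrative, not the project's API):

```python
import time

def iterative_deepening(game, legal_moves, search, time_limit_s=2.0):
    """Always keep a best answer ready; deepen until time runs out.
    `search(game, depth)` is assumed to return the best move found at
    that fixed depth, raising TimeoutError when the clock expires."""
    deadline = time.time() + time_limit_s
    best_move = legal_moves[0] if legal_moves else (-1, -1)
    depth = 1
    while time.time() < deadline:
        try:
            best_move = search(game, depth)
        except TimeoutError:
            break  # the previous depth's answer is still available
        depth += 1
    return best_move
```

Because the cost per level grows exponentially, re-searching the shallow levels on every iteration adds comparatively little overhead, which is exactly the point made above.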
  • Horizon effect: the computer may not see deep enough, and so settles on a local optimum.
Changed evaluation function: (number of my moves) - 2 * (number of opponent moves).
Shelly: the only way to really know is to try lots of variants of evaluation functions and see which ones are the best.
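The "my moves minus twice the opponent's moves" heuristic above can be sketched as follows, assuming only a `get_legal_moves(player)` method on the game object:

```python
def aggressive_score(game, player, opponent):
    """Penalize opponent mobility twice as heavily as our own mobility
    is rewarded: my_moves - 2 * opp_moves."""
    my_moves = len(game.get_legal_moves(player))
    opp_moves = len(game.get_legal_moves(opponent))
    return float(my_moves - 2 * opp_moves)
```

As the note says, the only way to know whether this weighting beats the plain difference is to run many games with each variant and compare win rates.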
The standard strategy for an adversarial game is the minimax algorithm, which has a running time of O(b^d), where b is the branching factor and d is the search depth.
Alpha-beta pruning:
  • Our goal is to play the game, not to keep a list of all the equally good moves at a given level, which means we don’t need to look at most of the nodes.
  • The running time is O(b^{d/2}) if the nodes are ordered optimally with the best moves first, or O(b^{3d/4}) for random ordering.
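The same explicit-tree search with alpha-beta pruning added (a sketch; the `break` lines are where whole subtrees get skipped):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning: numeric leaves, lists for
    internal nodes. Returns the same value as plain minimax."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # remaining siblings cannot change the result
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # cutoff for the minimizer
    return value

print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True))  # -> 3
```

The cutoffs are what buy the O(b^{d/2}) best case: with good move ordering, most siblings fail the alpha >= beta test early and are never expanded.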
Many problems in artificial intelligence are exponential in time or space. In fact, NP-hard problems are so common in AI that researchers joke that AI is the study of finding clever hacks to exponential problems. Once a solution is found or computers become fast enough to address a particular problem, the world no longer thinks the problem belongs to artificial intelligence.
AI consists of all the NP hard problems that have not yet been solved.
What is exciting about AI is you get a chance to work on everyday problems that can improve people’s lives.
Searching complex games: read AIMA 5.3-5.4.
Tricks for 5x5 isolation:
  • The 2nd player can always win.
  • The 1st player’s best strategy is to play the center and then mirror the opponent’s moves (reflection).
  • There are 8 positions from which you can’t play the reflection.
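The mirror strategy on a 5x5 board is a point reflection through the center square. A small sketch with 0-based (row, col) coordinates:

```python
def reflect(move, width=5, height=5):
    """Point-reflect a move through the board center."""
    r, c = move
    return (height - 1 - r, width - 1 - c)

print(reflect((2, 2)))  # the center square maps to itself: (2, 2)
print(reflect((0, 1)))  # -> (4, 3)
```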


Starting code breakdown: 1 class Board, 18 class methods.
Note that the board index is numbered vertically (column-major: idx = row + col * height).
class Board(Player p1, Player p2, int width=7, int height=7)
    vector _board_state(width*height + 3, 0); // last 3 entries reserved for player/move state
bool is_winner(Player p)
    return (p == inactive_player && active_player has no legal move);
bool is_loser(Player p)
    return (p == active_player && p has no legal move);
vector<array<int,2>> get_legal_moves(Player p)
    return __get_moves(get_player_location(p));
array<int,2> get_player_location(Player p)
    int index = ...; // find this player's marker in _board_state
    return (index % height, index / height);
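The column-major indexing used by get_player_location and move_is_legal can be verified with a small round-trip sketch (HEIGHT mirrors the default board height of 7):

```python
HEIGHT = 7  # board height from the Board constructor above

def to_index(row, col):
    """Flatten (row, col) into the vertically numbered board index."""
    return row + col * HEIGHT

def to_coords(idx):
    """Invert the flattening: idx % height gives the row."""
    return (idx % HEIGHT, idx // HEIGHT)

print(to_index(3, 5))   # -> 38
print(to_coords(38))    # -> (3, 5)
```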
vector<array<int,2>> __get_moves(array<int,2> loc)
    // propose the 8 L-shaped target positions, like a knight
    // return only the legal ones
bool move_is_legal(array<int,2> move)
    int idx = move[0] + move[1] * height;
    return (0 <= move[0] < height && 0 <= move[1] < width && _board_state[idx] == None);
void apply_move(array<int,2> move)
    int idx = move[0] + move[1] * height;
    // record the move in _board_state, update the last player and its move
    // swap active_player and inactive_player
tuple<Player, vector<array<int,2>>, string> play(int time_limit=150)
    curr_move = _active_player.get_move(game, time_left);
    move_end = time_left();
    if (move_end < 0)
        return (_inactive_player, move_history, "timeout");
    if (curr_move not in legal_player_moves)
        if (legal_player_moves.size() > 0)
            return (_inactive_player, move_history, "forfeit");
        return (_inactive_player, move_history, "illegal move");

Sample players: 4 methods, 3 classes (RandomPlayer, GreedyPlayer, HumanPlayer), 1 main method.
float null_score(Board game, Player p)
    if (p wins) return float("inf");
    if (p loses) return float("-inf");
    return 0.;
float open_move_score(Board game, Player p)
  return game.get_legal_moves(p).size();
float improved_score(Board game, Player p)
    int own_moves = game.get_legal_moves(p).size();
    Player opp = (p == _active_player) ? _inactive_player : _active_player;
    int opp_moves = game.get_legal_moves(opp).size();
    return float(own_moves - opp_moves);
float center_score(Board game, Player p)
    float w = game.width / 2., h = game.height / 2.;
    auto (y, x) = game.get_player_location(p);
    return float((h - y)**2 + (w - x)**2);
class RandomPlayer
    array<int,2> get_move(Board game, Player p)
        auto legal_moves = game.get_legal_moves();
        if (legal_moves.size() == 0) return (-1, -1);
        return legal_moves[randint(0, legal_moves.size() - 1)];
class GreedyPlayer
    array<int,2> get_move(Board game, Player p)
        legal_moves = game.get_legal_moves();
        // score each forecasted move with the one-ply heuristic, keep the max
        _, move = max([(score(game.forecast_move(m), this), m) for m in legal_moves]);
        return move;
class HumanPlayer
    array<int,2> get_move(Board game, Player p)
        legal_moves = game.get_legal_moves();
        // print the board state and possible moves, choose manually
int main()
    // RandomPlayer and GreedyPlayer play against each other

Tournament: 3 methods, 1 main method.
Agent = namedtuple("Agent", ["player", "name"])
int main()
    array<Agent,4> test_agents;
    array<Agent,7> cpu_agents;
    return 0;
array<int,2> play_round(Agent cpu_agent, test_agents, map<Agent,int> wins, int num_matches)
    // each round is num_matches * 2 * test_agents.size() games
    return {timeout_count, forfeit_count};
map<Agent,int> update(map total_wins, map new_wins)
    // absorb new wins into total wins
    return total_wins; // no need to return if passed by reference
void play_matches(cpu_agents, test_agents, num_matches)

Game agent: 3 custom_score methods, 3 classes (IsolationPlayer, MinimaxPlayer, AlphaBetaPlayer).
  • The dirty work is done by each class’s get_move method.
  • RandomPlayer and GreedyPlayer only deal with the first level of search, so there is no timeout issue.
  • For depth-limited search, we have to check the time before descending to the next level. TIME_LIMIT_MILLIS is set to 150; the time_left function is a lambda defined in the Board.play method and passed into Agent.get_move. In the parent class IsolationPlayer, TIMER_THRESHOLD is set to 10, so an answer should be returned once time_left falls below this value.
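The timer pattern described here can be sketched as follows (the exception name and helper are illustrative; `time_left` is assumed to be a zero-argument callable returning remaining milliseconds):

```python
TIMER_THRESHOLD = 10.0  # milliseconds, mirroring the value described above

class SearchTimeout(Exception):
    """Raised to unwind the recursive search when time is nearly up."""

def check_timer(time_left):
    """Call at the top of every recursive search step; abort when
    the remaining time drops under the safety threshold."""
    if time_left() < TIMER_THRESHOLD:
        raise SearchTimeout()
```

The threshold leaves a margin so the search can unwind and return its current best move before the hard 150 ms limit expires.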
Function nesting relations:
play_matches(cpu_agents, test_agents, NUM_MATCHES):
    play_round(cpu_agent, test_agents, win, NUM_MATCHES):
                player.minimax(game, depth):