
Saturday, August 26, 2017

Artificial Intelligence A3, planning

Note: Due to time constraints, I have to pause AIND at the planning section. If I have time in the future, I may resume my study with the following:
  1. pac-man lab
  2. simulated annealing lab
  3. airport transportation planning project.

Peter Norvig: AI is all about figuring out what to do when you don’t know what to do. Regular programming is writing instructions to make the computer do what you want when you do know what to do. In AI, it’s for when you don’t. Search is one of the key areas where we can figure out what to do by planning a sequence of steps, even when we have no idea what the first step should be until we solve the search problem.
  • breadth first search (shortest)
  • uniform cost search (cheapest)
  • depth first search (least storage space, fastest computing time)
A* search: also called best estimated total path cost first; always expand the node with the minimum value of f = g + h
g(path) = path cost so far
h(path) = h(s) = estimated distance to goal, e.g. straight-line distance
g keeps the path short, h keeps us focused on the goal.
A* search finds the lowest-cost path if h(s) ≤ the true cost, i.e. h is optimistic (admissible).
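A minimal sketch of A* on a toy graph; the graph and heuristic values below are made up for illustration, with h chosen to be optimistic (it never overestimates the true remaining cost):

```python
import heapq

def a_star(graph, h, start, goal):
    """Best-first search expanding the frontier node with minimum f = g + h."""
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    explored = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in explored:
            continue
        explored.add(node)
        for nbr, step in graph.get(node, []):
            heapq.heappush(frontier,
                           (g + step + h[nbr], g + step, nbr, path + [nbr]))
    return None, float("inf")

# toy graph (edge costs) and an optimistic straight-line-style heuristic
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)]}
h = {"A": 3, "B": 2, "C": 1, "D": 0}
path, cost = a_star(graph, h, "A", "D")
```

Here A* returns the path A→B→C→D with cost 4, preferring the cheap detour over the direct but costlier edges.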
A problem solver that uses a heuristic function given to it really is not intelligent at all.
crossing out parts of the rules is called generating a relaxed problem.
searching is one of the most fundamental methods of solving problems in AI.
Peter: To me, AI is programming a computer to do the right thing when you don’t know for sure what the right thing is.
The book was successful because of a fortuitous accident of timing. We happened to come along just as AI was making the transition from Boolean logic to probability and from handcrafted knowledge to machine learning.

lab: pac man

simulated annealing

do the stupid thing first, and only add intelligence when necessary.
global optimum: an element of randomness is what you need
  • random restart
  • genetic algorithms
  • simulated annealing
tabu (taboo) search: a move is tabu if it takes you somewhere you have already been.
stage algorithm: keep a list of all local maxima to get the general shape of the problem, then try to predict where a new maximum might be in the areas not yet explored.
local beam search / stochastic beam search: keep track of k particles. At each time frame, we randomly generate neighbors of these particles and keep the best ones for the next iteration. The particles share information with each other.
genetic algorithm is a fancy version of beam search that happens to make a nice analogy to biology. Some people call it the 2nd best solution to any problem.
We’ll see these ideas repeatedly in AI, in techniques like particle filters, pattern recognition with hidden Markov models, and Markov chain Monte Carlo. Sometimes letting go and embracing randomness is the key to the solution.

lab

Follow the instructions to implement Simulated Annealing from AIMA
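A sketch of the AIMA-style algorithm on a made-up 1-D toy problem; the cooling schedule, neighbor function, and value function here are illustrative assumptions, not the lab’s actual code:

```python
import math, random

def simulated_annealing(initial, neighbors, value, schedule, seed=0):
    """Hill-climb, but accept a downhill move with probability
    exp(dE / T); T cools over time so late search becomes greedy."""
    rng = random.Random(seed)
    current = initial
    for t in range(1, 100000):
        T = schedule(t)
        if T < 1e-9:                     # effectively frozen: stop
            break
        nxt = rng.choice(neighbors(current))
        dE = value(nxt) - value(current)
        if dE > 0 or rng.random() < math.exp(dE / T):
            current = nxt
    return current

# toy 1-D landscape: maximize -(x - 3)^2 on the integers 0..6
value = lambda x: -(x - 3) ** 2
neighbors = lambda x: [max(0, x - 1), min(6, x + 1)]
best = simulated_annealing(0, neighbors, value, schedule=lambda t: 0.9 ** t)
```

Early on, high T lets the walk wander (the element of randomness); as T decays the acceptance test degenerates into pure hill climbing.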

constraint satisfaction

techniques:
  • backtracking
  • forward checking
  • minimum remaining values
  • least constraining value
problems with preference constraints are called constraint optimization problems and are often solved with linear programming.
Examples:
  • car assembly
  • job scheduling in factories
  • class scheduling
  • spreadsheets
  • transportation scheduling
  • floor planning
Constraint Satisfaction Problems by solving the N-queens problem using symbolic constraints and backtracking search: https://github.com/udacity/AIND-Constraint_Satisfaction
complete the constraints for the N-Queens CSP and implement Backtracking-Search from AIMA (chapter 6)
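As a minimal sketch of backtracking search on N-queens in plain Python (not the repo’s symbolic-constraint version; `backtracking_nqueens` is a hypothetical helper):

```python
def backtracking_nqueens(n):
    """Backtracking search: place one queen per column; a partial
    assignment is extended only if it violates no constraint."""
    def consistent(rows, col, row):
        # no shared row, no shared diagonal with any earlier column
        return all(r != row and abs(r - row) != col - c
                   for c, r in enumerate(rows))

    def backtrack(rows):
        col = len(rows)
        if col == n:
            return rows
        for row in range(n):
            if consistent(rows, col, row):
                solution = backtrack(rows + [row])
                if solution is not None:
                    return solution
        return None                      # dead end: undo and try next row

    return backtrack([])

solution = backtracking_nqueens(8)       # row index of the queen in each column
```

The consistency check plays the role of forward checking in miniature: inconsistent extensions are pruned before recursing.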

logic

AI started in the 50s–70s with the idea that what we needed was a better language and systems for writing down logical statements. Once we had written down enough of them, we should be able to do almost anything. There was some progress, but 2 major issues:
  1. the world is uncertain. We didn’t get good algorithms for dealing with probability until the mid 1980s.
  2. it is too laborious to write down everything by hand.
Today, logic is still useful in many domains where we can eliminate much of the uncertainty of real world and solve significant problems. Like FedEx planning all the deliveries of its packages.
Expert system.
future of planning algorithms:
  • Learning from examples would be an important area.
  • transfer learning across domains
  • interactive planning
  • explainable, adaptable.
value iteration is a key concept for Markov decision processes.
  • first order logic: relation , object,function
  • propositional logic: facts
  • probability theory
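The value iteration mentioned above can be sketched with the Bellman update on a made-up two-state MDP (states, transitions, and rewards below are illustrative assumptions):

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Bellman update: V(s) <- max_a sum_s' P(s'|s,a) (R(s,a,s') + gamma V(s'))."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(sum(p * (R[(s, a, s2)] + gamma * V[s2])
                           for s2, p in P[(s, a)].items())
                       for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# two states: taking "go" in A reaches B and pays reward 1; B is absorbing
P = {("A", "stay"): {"A": 1.0}, ("A", "go"): {"B": 1.0},
     ("B", "stay"): {"B": 1.0}, ("B", "go"): {"B": 1.0}}
R = {("A", "stay", "A"): 0.0, ("A", "go", "B"): 1.0,
     ("B", "stay", "B"): 0.0, ("B", "go", "B"): 0.0}
V = value_iteration(["A", "B"], lambda s: ["stay", "go"], P, R)
```

The values converge to V(A) = 1 (take "go" once) and V(B) = 0 (no reward is reachable from B).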

planning

Blindfold Hiker Test: We can’t just plan ahead and come up with a whole plan. We need feedback from environment. We have to interweave planning and execution:
  • environment is stochastic.
  • multiagent
  • partial observability
  • unknown/incomplete knowledge
  • hierarchical
When we start, we only have the high-level parts of the plan; then during execution we schedule the rest of the low-level parts of the plan.
sensorless planning in a deterministic world.
observation updates our belief state. Observations alone can’t create new states, but they can partition the belief state into pieces.

project

implement my_air_cargo_problems.py and test it with python run_search.py -p 1 -s 5, in which 5 is uniform cost search / best_first_graph_search.
my_planning_graph.py
hierarchy structure of code:
void main (vector<int> p, vector<int> s){
  void run_search(Problem p, Search_func(),para){
    void show_solution()
  }
}
FluentState(vector<Expr>,vector<Expr>)

string encode_state(FluentState fs,vector<Expr> fluent_map){}  // state 

AirCargoProblem(vector<string> cargos, vector<string> planes, vector<string> airports, FluentState init, vector<Expr> goal){
  vector<Action> get_action()
}

Action(Expr action,array<vector<Expr>,2> precond, array<vector<Expr>,2> effect)
-> represents the return type, : represents the input type.
expr('At(C1, SFO)') is an Expr object.
precondition is only a very small part of the whole initial state.
what next?:
  • try a single Action and Expr, see how it works
  • try to understand various planning algorithms
files required for udacity submission:
  • my_air_cargo_problems.py
  • my_planning_graph.py
  • heuristic_analysis.pdf (if present)
  • research_review.pdf (one-page report on three of these developments, highlighting the relationships between the developments and their impact on the field of AI as a whole. Appropriate sources (such as books or magazine or journal articles) should be cited. There’s a lot of resources at the end of AIMA chapter 10 )

AIMA

codes: aima.cs.berkeley.edu

AIMA chapter 2, Intelligent Agents

In a sense, all areas of engineering can be seen as designing artifacts that interact with the world; AI operates at the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making.
As a general rule, it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave.
which is better— a reckless life of highs and lows, or a safe but humdrum existence?
rational agent: given percept sequence and built-in knowledge it has, select an action that is expected to maximize its performance measure.
Rationality maximizes expected performance, while perfection maximizes actual performance. The latter is impossible unless you have a crystal ball or time machine.
autonomous: an agent relies not only on the prior knowledge of its designer; it can use its own percepts to learn and compensate for partial or incorrect prior knowledge.
Task environment: performance, environment, actuators, sensors.
the “percept, action” table-driven approach is doomed to failure due to the super large number of entries, say, 10^{250B}, much larger than 10^{80} (the number of atoms in the observable universe).
4 kinds of agent programs:
  1. Simple reflex agents: condition-action rule
  2. Model-based. handle partial observability: maintain internal state. The model of “how the world works”: how the world evolves independently of the agent, how the agent’s own actions affect the world. Uncertainty about the current state is unavoidable, but the agent still has to make a decision—best guess.
  3. Goal-based: external performance measure, crude binary distinction between happy and unhappy states.
  4. Utility-based: use an internal utility function to measure goals in greater detail. If there are conflicting goals (e.g. speed and safety), the utility function specifies the appropriate tradeoff. Under uncertainty, it provides a way in which the likelihood of success can be weighed against the importance of the goals.
Any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize.
learning agent:
  1. performance element: responsible for selecting external actions.
  2. learning element: responsible for making improvements
  3. critic: a fixed performance standard. It can be thought of as outside the agent because the agent must not modify it to fit its own standard.
  4. problem generator: suggests new exploratory actions to discover much better actions in the long run.
state representations:
  • atomic: the inside is a black box with no internal structure. Algorithms: search, game playing, hidden Markov models, Markov decision processes.
  • factored: splits up each state into a fixed set of key-value pairs. Algorithms: constraint satisfaction, propositional logic, planning, Bayesian networks, machine learning.
  • structured: has things in it that are related to each other, like a class object. Algorithms: relational databases, first-order logic, first-order probability models, knowledge-based learning, natural language processing.
A more expressive representation is much more concise.

AIMA chapter 10, classical planning

fully observable, deterministic, static environments with single agents
factored representation.
a fluent is a condition that can change over time. In logical approaches to reasoning about actions, fluents can be represented in first-order logic by predicates having an argument that depends on time.
To describe the planning problem concisely, we use PDDL (Planning Domain Definition Language):
  • initial state: a conjunction of fluents that are ground, functionless atoms.
  • actions: a set of action schemas that consists of action name, a list of all the variables, a precondition and an effect.
  • result
  • goal test
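As a sketch, the classic Fly action schema from AIMA’s air-cargo domain can be encoded with plain Python sets; the string-based fluent encoding below is an illustrative assumption, not the project’s representation:

```python
def fly(p, frm, to):
    """Ground instance of the Fly(p, from, to) action schema from the
    AIMA air-cargo domain (fluents encoded as plain strings)."""
    return {
        "precond": {f"At({p}, {frm})", f"Plane({p})",
                    f"Airport({frm})", f"Airport({to})"},
        "add":    {f"At({p}, {to})"},
        "delete": {f"At({p}, {frm})"},
    }

def result(state, action):
    # an action is applicable when its preconditions hold in the state;
    # the result removes the delete list and adds the add list
    assert action["precond"] <= state, "preconditions not satisfied"
    return (state - action["delete"]) | action["add"]

init = {"At(P1, SFO)", "Plane(P1)", "Airport(SFO)", "Airport(JFK)"}
new_state = result(init, fly("P1", "SFO", "JFK"))
```

Flying P1 from SFO to JFK deletes At(P1, SFO) and adds At(P1, JFK); all other fluents carry over unchanged.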

Friday, August 25, 2017

Artificial Intelligence A4, Hidden Markov Model

Probability is the cornerstone of AI. Expressing uncertainty and managing uncertainty is the key to many, many things in AI.
naive Bayes: the effects are independent of each other given the cause.
Need 3 parameters to determine the joint probability: P(A), P(B|A), P(B|¬A).
The caveat is that B depends on the unobservable A.
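With made-up numbers, those 3 parameters determine everything else via total probability and Bayes’ rule:

```python
# hypothetical numbers: a rare cause A and a noisy observation B
p_a = 0.1
p_b_given_a = 0.9
p_b_given_not_a = 0.2

# P(A), P(B|A), P(B|~A) fix the full joint over (A, B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # total probability
p_a_given_b = p_b_given_a * p_a / p_b                  # Bayes' rule
```

Here p_b = 0.27 and p_a_given_b = 1/3, so observing B more than triples the belief in the unobservable A.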
reading: AIMA: Chapter 13.
1st order Markov models only depend on the state immediately preceding them and not a history of states. We don’t necessarily know which state matches which physical event. Instead, each state can yield one or more outputs. We observe the output over time and determine a sequence of states, based on how likely they were to produce the output.
Because the base frequency may change, we use delta frequency and euclidean distance to compare the 2 signals.
transition probability x output probability.
AIMA: Chapter 15.1-15.3
  • Please read Chapter 1 The Fundamentals of HTK (pages 3-13) in The HTK Book (version 3.4) [PDF | HTML].
AIMA: Chapter 15.4-15.6 (provides another viewpoint on HMMs with a natural extension to Kalman filters, particle filtering, and Dynamic Bayes Nets), Chapter 20.3 (hidden variables, EM algorithm)
Huang, Ariki, and Jack’s book Hidden Markov Models for Speech Recognition.
Yechiam Yemini’s slides on HMMs used in genetics (gene sequencing, decoding).
Sebastian Thrun and Peter Norvig’s AI course:
Resources for Segmentally Boosted HMMs
HMMs for Speech Synthesis
DeepMind’s WaveNet

project: build a sign language recognizer

In this project, you will build a system that can recognize words communicated using the American Sign Language (ASL). You will be provided a preprocessed dataset of tracked hand and nose positions extracted from video. Your goal would be to train a set of Hidden Markov Models (HMMs) using part of this dataset to try and identify individual words from test sequences.

learning notes

It is best to dig into the original asl_data.py to see how the data is wrangled.
asl.df is a dataframe with ~15,746 entries and 7 columns. The row index runs from (98,0) to (125,56). 6 columns are from hands_condensed.csv, 1 column is from speaker.csv.
asl.build_training() gives an asl_data.WordsData object. Each word in the training set has multiple examples from various videos.
feature selection: features_ground, features_norm, features_polar, features_delta.
The base model is GaussianHMM: https://hmmlearn.readthedocs.io/
Each time the model is trained on a single word. We are mostly interested in the number of hidden states.
from hmmlearn.hmm import GaussianHMM  # base model
training = asl.build_training(features)  # WordsData object
X, lengths = training.get_word_Xlengths(word)
model = GaussianHMM(n_components=num_hidden_states, n_iter=1000).fit(X, lengths)
logL = model.score(X, lengths)
Note that the training set and test set are the same in the above example.

Submission

Once you have completed the project and met all the requirements set in the rubric (see below), please save the notebook as an HTML file. You can do this by going to the File menu in the notebook and choosing “Download as” > HTML. Submit the following files (and only these files) in a .zip archive:
  • asl_recognizer.ipynb
  • asl_recognizer.html
  • my_model_selectors.py
  • my_recognizer.py
the goal is to get 40% correct, or 72 of 178.
features     selector          correct/178  time
ground       SelectorConstant  59           24
ground       SelectorCV        59           82
ground       SelectorBIC       67           63
ground       SelectorDIC       71           174
polar        SelectorBIC       69           64
polar        SelectorDIC       75           187
delta        SelectorDIC       63           176
norm         SelectorDIC       68           200
norm         SelectorBIC       67           66
norm         SelectorCV        67           81
norm-delta   SelectorBIC       78           71
norm-delta   SelectorDIC       78           201

number of free params

In this project, however, we are using the “diag” covariance type in the hmmlearn model and we are not specifying starting probabilities. Therefore, if we say that m = num_components and f = num_features…
The free parameters are a sum of:
  • free transition probabilities: the size of the transmat matrix less one entry per row, because each row adds up to 1 and its final entry is therefore determined, so m*(m-1)
  • free starting probabilities: the size of startprob minus 1, because it adds up to 1.0 and the last one can be calculated, so m-1
  • number of means: m*f
  • number of covariances, the size of the covars matrix, which for “diag” is m*f
which equals: m*(m-1) + (m-1) + m*f + m*f = m^2 + 2*m*f - 1
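A quick sanity check of the formula (`hmm_free_params` is a hypothetical helper, not one of the submission files):

```python
def hmm_free_params(m, f):
    """Free parameters of a GaussianHMM with 'diag' covariance:
    transitions m*(m-1) + start probs (m-1) + means m*f + covars m*f."""
    return m * (m - 1) + (m - 1) + m * f + m * f

# algebraically identical to the closed form m**2 + 2*m*f - 1
```

For example, m = 3 hidden states over f = 4 features gives 32 free parameters.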

Monday, July 10, 2017

Artificial intelligence ND A2, isolation



15 Lessons, including 4 Projects:
  1. sudoku game
  2. isolation game: due in 2017.5.15
  3. planning: due in 2017.6.12
  4. hidden Markov models: due in 2017.7.10
This project 2 is much more difficult than I originally expected. But once I accomplished it, I felt so great. I really enjoy the beauty of the minimax algorithm and alpha-beta pruning!

5 Intro to AI

Examples of AI application:
  • Navigation
  • tic tac toe game
  • Monty Hall problem: prior vs posterior probability
We often tend to use the word intelligence to describe things that we don’t know how to explain. Once we can explain them, we just say that they’re algorithms or formulas. They’re only intelligent if we can’t explain them.
We consider ourselves intelligent because right now we don’t fully understand how our brains work.
Equating intelligence with human behavior will pretty much rule out all artificial systems. Besides, even humans are not great at all kinds of tasks.
Intelligence should be defined within the context of a task.
  • agent (only the software part)
  • environment (including the physical body of the agent: sensors, motors)
cognition is perception and actions.
Environment states:
  • fully observable vs partially observable
  • deterministic vs stochastic
  • discrete vs continuous
  • benign vs adversarial
Bounded Optimality. An intelligent agent is one that takes actions to maximize its expected utility given the desired goal. Given these constraints, we can not expect an agent to always behave optimally. But we can come up with a level of performance or bound that we desire the agent to meet.

6 Intro to Game playing

Course teachers:
  • Peter Norvig: search and logical planning
  • Sebastian: probability and Bayes nets
  • Thad Starner: the rest
Main topics:
  • Adversarial search
  • minimax algorithm
  • alpha-beta pruning
  • evaluation function
  • isolation game
  • multi-player, probabilistic

7 isolation game

depth limit search.
  • limit time to 2 seconds per move. If the computer speed is 2 GHz and the average branching factor is 8, then we have a depth limit of 9.
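The arithmetic behind that depth limit can be sketched as follows, assuming (hypothetically) about 10 machine cycles per node visited — the lesson does not state the per-node cost, so this constant is made up to make the numbers concrete:

```python
cycles = 2 * 10 ** 9 * 2    # 2 GHz machine, 2 seconds per move
budget = cycles // 10       # assume ~10 cycles per node (made-up constant)
b = 8                       # average branching factor
depth, total = 0, 1         # total nodes in a complete tree of this depth
while total + b ** (depth + 1) <= budget:
    depth += 1
    total += b ** depth
```

Under these assumptions the loop stops at depth 9; since the true per-node cost is unknown, treat 9 as an order-of-magnitude estimate.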
Prof.: I always like to test my assumptions because they are often wrong.
quiescent search
24-27: search depth=3. The first moves on the edge have the best score.
28-29: search depth=2. The first moves on the center have the best score.
A state of quiescent: the recommended branches are not changing much.
iterative deepening means the computer always has an answer ready in case it runs out of time, and it can search as far as possible within its time constraints.
It doesn’t waste that much time: because of the exponential nature of the problem, the total time is dominated by the last level of the search.
  • If we limit the total time a player can take for the entire game, a player will want to search deeper in some parts of the game and shallower in other parts of the game.
  • A strategy for how deep we want to search: standard initial moves for the beginning of the game, then search deeper in the middle, and use less time toward the end.
  • Horizon effect: the computer may not see deep enough and settles for a local optimum.
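The claim that the last level dominates can be checked numerically (b = 8 is the branching factor from above, D = 9 the final depth):

```python
b, D = 8, 9                          # branching factor, final search depth
last_level = b ** D                  # nodes at the deepest level alone
all_levels = sum(b ** d for d in range(D + 1))  # iterative deepening total
overhead = all_levels / last_level   # ratio of total work to last-level work
```

The ratio comes out to about b/(b−1) ≈ 8/7 ≈ 1.14, so re-searching every shallower level costs only ~14% extra work.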
change the evaluation function: number of my moves − 2 × number of opponent moves
Shelly: the only way to really know is to try lots of variants of evaluation functions and see which ones are the best.
The strategy for an adversarial game is the minimax algorithm, which has a run time of branch^{depth}.
Alpha-Beta Pruning:
  • our goal is to play the game, not keep a list of all the equally good moves on a given level. It means we don’t need to look at most of the nodes.
  • run time is branch^{depth/2} if the nodes are ordered optimally with the best moves being first, or branch^{0.75depth} for random ordering.
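A compact sketch of minimax with alpha-beta pruning over a generic game interface; `moves`, `result`, and `score` are assumed callbacks for this illustration, not the project’s actual API:

```python
def alphabeta(state, depth, alpha, beta, maximizing, moves, result, score):
    """Minimax with alpha-beta pruning; prunes branches that cannot
    change the decision at an ancestor node."""
    legal = list(moves(state))
    if depth == 0 or not legal:
        return score(state)
    if maximizing:
        value = float("-inf")
        for m in legal:
            value = max(value, alphabeta(result(state, m), depth - 1,
                                         alpha, beta, False, moves, result, score))
            alpha = max(alpha, value)
            if alpha >= beta:       # MIN above will never let us get here
                break
        return value
    value = float("inf")
    for m in legal:
        value = min(value, alphabeta(result(state, m), depth - 1,
                                     alpha, beta, True, moves, result, score))
        beta = min(beta, value)
        if beta <= alpha:           # MAX above already has something better
            break
    return value

# toy game tree as nested lists; leaves are utilities for MAX
tree = [[3, 5], [2, 9]]
root_value = alphabeta(tree, 2,
                       float("-inf"), float("inf"), True,
                       moves=lambda s: range(len(s)) if isinstance(s, list) else [],
                       result=lambda s, m: s[m],
                       score=lambda s: s)
```

MAX picks the left branch here: min(3, 5) = 3 beats min(2, 9) = 2, and the 9 leaf is pruned without ever being examined.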
Many problems in artificial intelligence are exponential in time or space. In fact, NP-hard problems are so common in AI that researchers joke that AI is the study of finding clever hacks to exponential problems. Once a solution is found or computers become fast enough to address a particular problem, the world no longer thinks the problem belongs to artificial intelligence.
AI consists of all the NP hard problems that have not yet been solved.
What is exciting about AI is you get a chance to work on everyday problems that can improve people’s lives.
searching complex games: reading AIMA 5.3-5.4
tricks for 5x5 isolation:
  • the 2nd player can always win
  • the 1st player’s best strategy is to play the center and then play reflections
  • there are 8 positions where you can’t play reflection.

project

starting code breakdown:
isolation.py: 1 class Board, 18 class methods.
Note the board index is numbered vertically.
class Board(Player p1,Player p2, int width=7, int height=7)
    vector<mix> _board_state(width*height+3,0); // reserve 3 for player state
bool is_winner(Player p)
    return (p == inactive_player && active player has no legal move);
bool is_loser(Player p)
    return (p==active player && has no legal move);
vector<array<int,2>> get_legal_moves(Player p)
    return __get_moves(get_player_location(p));
array<int,2> get_player_location(Player p)
  int index = ...; //identify player and the move
  return (index%height, index/height);
vector<array<int,2>> __get_moves(array<int,2> loc)
  // propose 8 positions for L-shaped motion, like a knight
  // only return the legal one
bool move_is_legal(array<int,2> move)
  int idx = move[0] + move[1] * height;
   return (0 <= move[0] < height && 0 <= move[1] < width && _board_state[idx] == None);
void apply_move(array<int,2>)
  int idx = move[0] + move[1] * height;
 // add board state, who's last player, last player's move
// swap active_player and inactive player
tuple<Player,vector<array<int,2>>,string> play(time_limit=150)
    curr_move=_active_player.get_move(game, time_left);
    move_end = time_left();
    if move_end < 0:
      return _inactive_player, move_history, "timeout";
    if curr_move not in legal_player_moves:
      if len(legal_player_moves) > 0:
          return _inactive_player, move_history, "forfeit";
      return _inactive_player, move_history, "illegal move";
sample_player.py: 4 methods, 3 classes (RandomPlayer, GreedyPlayer, HumanPlayer), 1 main method
float null_score(Board game, Player p)
    if (p win) return float("inf");
    else if (lose) return float("-inf"); 
    else return 0.;
float open_move_score(Board game, Player p)
  return game.get_legal_moves(p).size();
float improved_score(Board game, Player p)
  int own_moves = game.get_legal_moves(p).size();
  int opp_moves = (p == _active_player)
      ? game.get_legal_moves(_inactive_player).size()
      : game.get_legal_moves(_active_player).size();
  return float(own_moves - opp_moves);
float center_score(Board game, Player p)
    float w = game.width / 2., h = game.height / 2.;
    auto [y, x] = game.get_player_location(p);
    return float((h - y)**2 + (w - x)**2);
class RandomPlayer()
  array<int,2> get_move(Board game, Player p)
    auto legal_moves = game.get_legal_moves();
    if (legal_moves.size()==0) return make_tuple(-1,-1);
    return legal_moves[randint(0, len(legal_moves) - 1)];
class GreedyPlayer()
  array<int,2> get_move(Board game, Player p)
       legal_moves = game.get_legal_moves()
        _, move = max([(self.score(game.forecast_move(m), self), m) for m in legal_moves]) // ? 
        return move;
class HumanPlayer()
  array<int,2> get_move(Board game, Player p)
    legal_moves = game.get_legal_moves()
    // print board state and possible move, manually choose
int main()
    //RandomPlayer and GreedyPlayer play against each other
tournament.py: 3 methods, 1 main method
Agent = namedtuple("Agent", ["player", "name"])
int main()
  array<Agent,4> test_agents;
  array<Agent,7> cpu_agents;
  play_matches(cpu_agents,test_agents,5);
    return 0;
array<int,2> play_round(cpu_agent, test_agents, map<Agent,int> wins, int num_matches)
     // each round is num_matches*2*len(test_agents) games  
     return {timeout_count, forfeit_count};                  
int update(map total_wins, map new_wins)
    // absorb new wins into total wins
      return total_wins; // no need to return if use &
void play_matches(cpu_agents, test_agents, num_matches)
game_agent.py: 3 custom_score methods, 3 classes: IsolationPlayer, MinimaxPlayer, AlphaBetaPlayer.
Notes:
  • dirty jobs are done by get_move of individual class.
  • RandomPlayer and GreedyPlayer from sample_player.py only deal with the first level search, so there is no timeout issue.
  • For depth-limited search, we have to check the time before going to the next level. TIME_LIMIT_MILLIS is set to 150 in isolation.py. time_left is a lambda defined in the Board.play method and passed into Agent.get_move. In the parent class IsolationPlayer, TIMER_THRESHOLD is set to 10, so an answer should be returned once time_left() falls below this value.
function nesting relations:
play_matches(cpu_agents, test_agents, NUM_MATCHES):
    play_round(cpu_agent, test_agents, win, NUM_MATCHES):
        game.play(time_limit=TIME_LIMIT):
            game._active_player.get_move():
                player.minimax(game, depth):
                    game.get_legal_moves():
                        game.score(game.forecast_move(m))