Friday, August 25, 2017

Artificial Intelligence A4, Hidden Markov Model

Probability is the cornerstone of AI. Expressing uncertainty, and managing it, is the key to many, many things in AI.
naive Bayes: observations are conditionally independent of each other given the class.
Three parameters determine the joint probability: P(A), P(B|A), P(B|~A).
The caveat is that B depends on the unobservable A.
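As a tiny worked example (with made-up numbers), the three parameters above fully determine the joint distribution, and Bayes' rule then recovers the unobservable P(A|B) from the observed B:

```python
# Made-up numbers for illustration only.
p_a = 0.3              # prior P(A)
p_b_given_a = 0.9      # P(B|A)
p_b_given_not_a = 0.2  # P(B|~A)

# Chain rule gives the joint probabilities: P(A,B) = P(A) * P(B|A).
p_ab = p_a * p_b_given_a
p_not_a_b = (1 - p_a) * p_b_given_not_a

# Bayes' rule inverts the conditioning: P(A|B) = P(A,B) / P(B).
p_b = p_ab + p_not_a_b
p_a_given_b = p_ab / p_b
print(round(p_a_given_b, 3))  # 0.659
```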
1st-order Markov models depend only on the state immediately preceding them, not on a history of states. We don’t necessarily know which state matches which physical event. Instead, each state can yield one or more outputs; we observe the outputs over time and infer a sequence of states based on how likely each state was to produce those outputs.
Because the base frequency may differ between the two signals, we compare their deltas (frame-to-frame changes) by Euclidean distance instead of the raw values.
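A minimal numpy sketch of why the delta comparison helps (signal values are made up): a constant offset in base level dominates the raw distance but vanishes in the first difference.

```python
import numpy as np

sig_a = np.array([10.0, 12.0, 15.0, 14.0])
sig_b = sig_a + 100.0  # same shape, shifted base level

# Raw euclidean distance is large purely because of the offset...
raw_dist = np.linalg.norm(sig_a - sig_b)
# ...but the delta (first difference) cancels any constant offset.
delta_dist = np.linalg.norm(np.diff(sig_a) - np.diff(sig_b))
print(raw_dist, delta_dist)  # 200.0 0.0
```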
The likelihood of a state path is the product of transition probability × output probability at each step.
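A toy two-state example of that product (all probabilities below are invented for illustration):

```python
import numpy as np

start = np.array([0.6, 0.4])    # P(first state)
trans = np.array([[0.7, 0.3],   # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],    # P(output | state)
                 [0.2, 0.8]])

def path_likelihood(states, outputs):
    """Probability that one hidden-state path produced the observed outputs:
    start prob, then transition prob x output prob at every step."""
    p = start[states[0]] * emit[states[0], outputs[0]]
    for prev, cur, out in zip(states, states[1:], outputs[1:]):
        p *= trans[prev, cur] * emit[cur, out]
    return p

print(path_likelihood([0, 0, 1], [0, 0, 1]))
```

Summing this quantity over all possible state paths gives the sequence likelihood that `GaussianHMM.score` returns (as a log), and maximizing it over paths is what the Viterbi algorithm does.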
AIMA: Chapter 15.1-15.3
• Please read Chapter 1 The Fundamentals of HTK (pages 3-13) in The HTK Book (version 3.4) [PDF | HTML].
AIMA: Chapter 15.4-15.6 (provides another viewpoint on HMMs with a natural extension to Kalman filters, particle filtering, and Dynamic Bayes Nets), Chapter 20.3 (hidden variables, EM algorithm)
Huang, Ariki, and Jack’s book Hidden Markov Models for Speech Recognition.
Yechiam Yemini’s slides on HMMs used in genetics (gene sequencing, decoding).
Sebastian Thrun and Peter Norvig’s AI course:
Resources for Segmentally Boosted HMMs
HMMs for Speech Synthesis
DeepMind’s WaveNet

project: build a sign language recognizer

In this project, you will build a system that can recognize words communicated in American Sign Language (ASL). You will be provided a preprocessed dataset of tracked hand and nose positions extracted from video. Your goal is to train a set of Hidden Markov Models (HMMs) on part of this dataset and use them to identify individual words in test sequences.

learning notes

It is best to dig into the original `asl_data.py` to see how the data is wrangled.
`asl.df` is a dataframe with 15,746 rows and 7 columns, indexed by (video, frame) from (98, 0) to (125, 56). Six columns come from `hands_condensed.csv`, and one column comes from `speaker.csv`.
`asl.build_training()` returns an `asl_data.WordsData` object. Each word in the training set has multiple examples from various videos.
feature selection: `features_ground`, `features_norm`, `features_polar`, `features_delta`.
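A sketch of how the four feature families can be derived with pandas. The column names (`right-x`, `nose-x`, `speaker`, …) and the toy values are assumptions here, not taken from the actual dataset:

```python
import numpy as np
import pandas as pd

# Toy stand-in for asl.df; only the right hand's x/y are shown.
df = pd.DataFrame({
    'right-x': [520, 522, 530], 'right-y': [400, 398, 390],
    'nose-x':  [480, 480, 481], 'nose-y':  [300, 300, 301],
    'speaker': ['man-1'] * 3,
})

# ground: hand position relative to the nose
df['grnd-rx'] = df['right-x'] - df['nose-x']
df['grnd-ry'] = df['right-y'] - df['nose-y']

# norm: z-score per speaker, to cancel out body-size differences
grp = df.groupby('speaker')['right-x']
df['norm-rx'] = (df['right-x'] - grp.transform('mean')) / grp.transform('std')

# polar: radius and angle of the ground coordinates
df['polar-rr'] = np.sqrt(df['grnd-rx'] ** 2 + df['grnd-ry'] ** 2)
df['polar-rtheta'] = np.arctan2(df['grnd-rx'], df['grnd-ry'])

# delta: frame-to-frame difference, with the first frame filled with 0
df['delta-rx'] = df['right-x'].diff().fillna(0)
```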
The base model is GaussianHMM: https://hmmlearn.readthedocs.io/
Each model is trained on a single word. We are mostly interested in choosing the number of hidden states.
```python
from hmmlearn.hmm import GaussianHMM

training = asl.build_training(features)  # asl_data.WordsData object
X, lengths = training.get_word_Xlengths(word)  # concatenated sequences + per-example lengths
model = GaussianHMM(n_components=num_hidden_states, n_iter=1000).fit(X, lengths)
logL = model.score(X, lengths)  # log likelihood of the data under the model
```
Note that the training set and test set are the same in the above example.

Submission

Once you have completed the project and met all the requirements set in the rubric (see below), please save the notebook as an HTML file. You can do this by going to the File menu in the notebook and choosing “Download as” > HTML. Submit the following files (and only these files) in a .zip archive:
• `asl_recognizer.ipynb`
• `asl_recognizer.html`
• `my_model_selectors.py`
• `my_recognizer.py`
The goal is to get at least 40% correct, i.e. 72 of the 178 test words.
| features   | selector         | correct/178 | time |
|------------|------------------|-------------|------|
| ground     | SelectorConstant | 59          | 24   |
| ground     | SelectorCV       | 59          | 82   |
| ground     | SelectorBIC      | 67          | 63   |
| ground     | SelectorDIC      | 71          | 174  |
| polar      | SelectorBIC      | 69          | 64   |
| polar      | SelectorDIC      | 75          | 187  |
| delta      | SelectorDIC      | 63          | 176  |
| norm       | SelectorDIC      | 68          | 200  |
| norm       | SelectorBIC      | 67          | 66   |
| norm       | SelectorCV       | 67          | 81   |
| norm-delta | SelectorBIC      | 78          | 71   |
| norm-delta | SelectorDIC      | 78          | 201  |

number of free params

In this project, however, we are using the “diag” covariance type in the hmmlearn model, and we are not fixing the starting probabilities, so they are learned too. Therefore, if we say that m = num_components and f = num_features…
The free parameters are a sum of:
• the free transition probability parameters: the transmat matrix has `m*m` entries, but each row sums to 1, so one entry per row is determined, leaving `m*(m-1)`
• PLUS the free starting probabilities: the startprob vector minus 1, because it sums to 1.0 and the last entry can be calculated, so `m-1`
• PLUS the number of means, which is `m*f`
• PLUS the number of covariances, which is the size of the covars matrix — for “diag”, `m*f`
WHICH EQUALS:
`m*(m-1) + (m-1) + m*f + m*f = m^2 + 2*m*f - 1`
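The sum above can be wrapped in a small helper, together with the standard BIC formula it feeds into (the function names here are my own, not the project’s `SelectorBIC` API):

```python
from math import log

def n_free_params(m, f):
    """Free parameters of a GaussianHMM with 'diag' covariance:
    transitions m*(m-1) + start probs (m-1) + means m*f + diag covars m*f."""
    return m * (m - 1) + (m - 1) + m * f + m * f  # = m**2 + 2*m*f - 1

def bic(logL, m, f, n_points):
    """BIC = -2 * logL + p * log(N); lower is better."""
    return -2 * logL + n_free_params(m, f) * log(n_points)
```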