Probability is the cornerstone of AI. Expressing uncertainty, and managing it, is the key to many, many things in AI.
naive Bayes: the pieces of evidence are assumed to be conditionally independent of each other given the class.
Three parameters are needed to determine the joint probability: P(A), P(B|A), P(B|~A).
The caveat is that B depends on A, which is not observable.
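A minimal numeric sketch (made-up probabilities, not from the lecture) of how the three parameters pin down the full joint table, and how Bayes' rule then recovers the unobservable A:

p_a = 0.1  # P(A)
p_b_given_a = 0.9  # P(B|A)
p_b_given_not_a = 0.2  # P(B|~A)

# the three parameters determine all four joint probabilities
p_joint = {
    ('A', 'B'): p_a * p_b_given_a,
    ('A', '~B'): p_a * (1 - p_b_given_a),
    ('~A', 'B'): (1 - p_a) * p_b_given_not_a,
    ('~A', '~B'): (1 - p_a) * (1 - p_b_given_not_a),
}

# Bayes' rule: posterior belief in the hidden A after observing B
p_b = p_joint[('A', 'B')] + p_joint[('~A', 'B')]
p_a_given_b = p_joint[('A', 'B')] / p_b  # = 0.09 / 0.27 = 1/3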
reading: AIMA: Chapter 13.
1st-order Markov models depend only on the state immediately preceding, not on a history of states. We don't necessarily know which state corresponds to which physical event; instead, each state can yield one or more outputs. We observe the outputs over time and infer a sequence of states based on how likely each state was to produce those outputs.
Because the base frequency may change, we compare the two signals using delta frequency and Euclidean distance.
The likelihood of a state path is a product of transition probability × output probability at each step.
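A minimal sketch (made-up two-state model, not from the lecture) of that product: the likelihood that one hypothesized state path produced an observed output sequence is one transition probability times one output probability per step:

import numpy as np

trans = np.array([[0.7, 0.3],  # trans[i, j] = P(next state j | current state i)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],  # emit[i, k] = P(output k | state i)
                 [0.2, 0.8]])
start = np.array([0.5, 0.5])  # P(initial state)

states = [0, 0, 1]  # a hypothesized state path
outputs = [0, 0, 1]  # the observed outputs

p = start[states[0]] * emit[states[0], outputs[0]]
for prev, cur, out in zip(states, states[1:], outputs[1:]):
    p *= trans[prev, cur] * emit[cur, out]
# summing p over all possible paths (the forward algorithm) gives the
# total likelihood of the observation sequence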
AIMA: Chapter 15.1-15.3
Rabiner’s famous tutorial: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition [errata]
Thad Starner’s MS thesis: Visual Recognition of American Sign Language Using Hidden Markov Models [PDF]
- Please read Chapter 1 The Fundamentals of HTK (pages 3-13) in The HTK Book (version 3.4) [PDF | HTML].
AIMA: Chapter 15.4-15.6 (provides another viewpoint on HMMs with a natural extension to Kalman filters, particle filtering, and Dynamic Bayes Nets), Chapter 20.3 (hidden variables, EM algorithm)
Huang, Ariki, and Jack’s book Hidden Markov Models for Speech Recognition.
Yechiam Yemini’s slides on HMMs used in genetics (gene sequencing, decoding).
Sebastian Thrun and Peter Norvig’s AI course
Resources for Segmentally Boosted HMMs
- SBHMM project at Georgia Tech
- HMM Tool Kit (HTK)
- Gesture and Activity Recognition Toolkit (GART; formerly Georgia Tech Gesture Toolkit)
Pei Yin’s dissertation: Segmental discriminative analysis for American Sign Language recognition and verification
HMMs for Speech Synthesis
Junichi Yamagishi’s An Introduction to HMM-Based Speech Synthesis
Heiga Zen’s Deep Learning in Speech Synthesis
DeepMind’s WaveNet
project: build a sign language recognizer
In this project, you will build a system that can recognize words communicated using American Sign Language (ASL). You will be provided a preprocessed dataset of tracked hand and nose positions extracted from video. Your goal is to train a set of Hidden Markov Models (HMMs) on part of this dataset to identify individual words from test sequences.
learning notes
It is best to dig into the original asl_data.py to see how the data is wrangled. asl.df is a DataFrame with 15,746 entries and 7 columns; the rows are indexed from (98, 0) to (125, 56). Six columns come from hands_condensed.csv and one from speaker.csv.
asl.build_training() gives an asl_data.WordsData object. Each word in the training set has multiple examples from various videos.
feature selection: features_ground, features_norm, features_polar, features_delta.
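A sketch of how two of these feature sets can be derived (the raw column names like right-x and nose-x are from the project's DataFrame; treat them as assumptions if your copy differs):

# ground features: hand position relative to the nose, to factor out
# where the signer stands in the frame
asl.df['grnd-rx'] = asl.df['right-x'] - asl.df['nose-x']
asl.df['grnd-ry'] = asl.df['right-y'] - asl.df['nose-y']
asl.df['grnd-lx'] = asl.df['left-x'] - asl.df['nose-x']
asl.df['grnd-ly'] = asl.df['left-y'] - asl.df['nose-y']
features_ground = ['grnd-rx', 'grnd-ry', 'grnd-lx', 'grnd-ly']

# delta features: frame-to-frame differences of the raw coordinates
raw = {'rx': 'right-x', 'ry': 'right-y', 'lx': 'left-x', 'ly': 'left-y'}
for k, col in raw.items():
    asl.df['delta-' + k] = asl.df[col].diff().fillna(0)
features_delta = ['delta-rx', 'delta-ry', 'delta-lx', 'delta-ly']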
The base model is GaussianHMM: https://hmmlearn.readthedocs.io/
Each model is trained on a single word. The hyperparameter we are mostly interested in is the number of hidden states.
from hmmlearn.hmm import GaussianHMM

training = asl.build_training(features)  # asl_data.WordsData object
X, lengths = training.get_word_Xlengths(word)  # concatenated sequences and per-sequence lengths
model = GaussianHMM(n_components=num_hidden_states, n_iter=1000).fit(X, lengths)
logL = model.score(X, lengths)  # log likelihood of the data under the fitted model
Note that in the example above the model is scored on the same data it was trained on, i.e. the training set doubles as the test set.
Submission
Once you have completed the project and met all the requirements set in the rubric (see below), please save the notebook as an HTML file. You can do this by going to the File menu in the notebook and choosing “Download as” > HTML. Submit the following files (and only these files) in a .zip archive:
asl_recognizer.ipynb
asl_recognizer.html
my_model_selectors.py
my_recognizer.py
The goal is to get at least 40% correct, i.e. 72 of 178 test words.
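The recognition step behind the numbers in the table below can be sketched as follows (models is a hypothetical dict mapping each word to its trained GaussianHMM, and test_set is the project's asl_data.SinglesData object; adjust names to your code):

probabilities, guesses = [], []
for X, lengths in test_set.get_all_Xlengths().values():
    scores = {}
    for word, model in models.items():
        try:
            scores[word] = model.score(X, lengths)  # logL of this sequence under this word's model
        except Exception:
            scores[word] = float('-inf')  # some models fail on some sequences
    probabilities.append(scores)
    guesses.append(max(scores, key=scores.get))  # best-scoring word is the guess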
features | selector | correct/178 | time
---|---|---|---
ground | SelectorConstant | 59 | 24
ground | SelectorCV | 59 | 82
ground | SelectorBIC | 67 | 63
ground | SelectorDIC | 71 | 174
polar | SelectorBIC | 69 | 64
polar | SelectorDIC | 75 | 187
delta | SelectorDIC | 63 | 176
norm | SelectorDIC | 68 | 200
norm | SelectorBIC | 67 | 66
norm | SelectorCV | 67 | 81
norm-delta | SelectorBIC | 78 | 71
norm-delta | SelectorDIC | 78 | 201
number of free parameters
In this project, however, we use “diag” covariances in the hmmlearn model, and we do not fix the starting probabilities, so they are estimated too. If we let m = num_components and f = num_features…
The free parameters are a sum of:
- the free transition probabilities: the transmat matrix has m rows that each sum to 1, so the last entry of each row is determined, giving m*(m-1)
- the free starting probabilities: startprob sums to 1.0, so its last entry is determined, giving m-1
- the means: one per state per feature, giving m*f
- the covariances: the size of the covars matrix, which for “diag” is m*f

which equals m*(m-1) + (m-1) + m*f + m*f = m^2 + 2*m*f - 1
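A sketch (assuming hmmlearn's GaussianHMM with covariance_type="diag", as above) of turning this parameter count into a BIC score, where a lower score indicates a better size/fit trade-off:

import numpy as np

def bic_score(model, X, lengths):
    logL = model.score(X, lengths)
    m = model.n_components
    f = X.shape[1]  # number of features
    p = m ** 2 + 2 * m * f - 1  # free parameters, as derived above
    n = len(X)  # number of data points
    return -2 * logL + p * np.log(n)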