Friday, February 10, 2017

Machine Learning ND 2, unsupervised learning

I am not satisfied with the unsupervised learning courses at Udacity. The material is just not well organized! It seems like a random collection of the “Intro” course and the “Gatech” course, and I lost focus several times while studying.
To me, unsupervised learning is actually more important than supervised learning, because all human knowledge begins with unlabelled data. Only after humans discover the natural patterns behind a phenomenon do they begin to label things, accumulate knowledge and gain further insights. Unsupervised learning is difficult to teach because, in the first place, you don’t even know whether there is a pattern to look for, let alone which features are important.

Unsupervised algorithms

  1. K-means clustering. Cons: a bad starting point may lead to a bad local minimum.
  2. Single-linkage clustering. Start with each object as its own cluster, then repeatedly merge the two closest clusters.
  3. Expectation maximization. Soft clustering: each point gets a probability of belonging to each cluster.
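The local-minimum problem of K-means can be seen on a toy example. The following is a minimal sketch, not a reference implementation: it runs plain Lloyd iterations on 1-D points, and the function names (`kmeans`, `kmeans_restarts`), the data and the number of restarts are all made up for illustration. Restarting from several random initial centers and keeping the lowest-inertia run is one common way to soften the problem.

```python
import random

def kmeans(points, init_centers, max_iters=100):
    """Lloyd's algorithm on 1-D points. Returns (sorted centers, inertia)."""
    centers = list(init_centers)
    k = len(centers)
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    # Inertia: sum of squared distances to the nearest center.
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return sorted(centers), inertia

def kmeans_restarts(points, k, n_init=20, seed=0):
    """Run K-means from n_init random starts and keep the best run."""
    rng = random.Random(seed)
    runs = [kmeans(points, rng.sample(points, k)) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])

# Hypothetical toy data: three obvious groups around 1, 11 and 21.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

bad = kmeans(points, [0, 1, 2])     # all starts in one group -> ([0.0, 1.5, 16.0], 154.5)
good = kmeans(points, [0, 10, 20])  # one start per group     -> ([1.0, 11.0, 21.0], 6.0)
best = kmeans_restarts(points, k=3)
```

The bad start gets stuck with two centers inside the first group and one center shared by the other two groups, even though a much better solution exists.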

Feature selection

Selecting the best n features out of m is an NP-hard problem: there are C(m, n) subsets of size n to compare, and 2^m subsets in total if n is not fixed in advance.
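To get a feeling for how fast the search space grows, here is a quick count, assuming an exhaustive search that evaluates every candidate subset once. The helper names and the example sizes (m = 20, n = 5) are made up for illustration.

```python
from math import comb

def count_subsets_of_size(m, n):
    """Number of ways to pick exactly n features out of m."""
    return comb(m, n)

def count_all_subsets(m):
    """Number of subsets of m features when the size is not fixed."""
    return 2 ** m

m, n = 20, 5
fixed_size = count_subsets_of_size(m, n)  # 15504
any_size = count_all_subsets(m)           # 1048576
```

With only 20 features there are already about a million subsets to try, which is why filtering and wrapping use heuristics instead of exhaustive search.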
method      speed   main characteristics                          implementation
filtering   fast    ignores the learner, no feedback              information gain
wrapping    slow    takes into account model bias and learning    forward (adding) or backward (subtracting) search
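  • Relevance: a feature is relevant if it adds information for the Bayes optimal classifier (no learner bias involved).
  • Usefulness: a feature is useful if it reduces the error of a particular learner; the learner's bias helps break the tie.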
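Filtering by information gain can be sketched in a few lines. This is a minimal illustration that assumes discrete feature values and discrete labels; the function names and the toy columns `a`, `b`, `c` are hypothetical. Note that the learner never appears: features are ranked only by how much they reduce the entropy of the labels.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def filter_features(columns, labels, n):
    """columns maps a feature name to its list of values. Returns the top-n names."""
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:n]

# Hypothetical toy data: "a" predicts the label perfectly, "b" is pure noise.
labels = [1, 1, 0, 0]
columns = {
    "a": [1, 1, 0, 0],
    "b": [1, 0, 1, 0],
    "c": [1, 1, 1, 0],
}
selected = filter_features(columns, labels, n=2)  # ['a', 'c']
```

A wrapper method would instead retrain the learner on each candidate subset, which is why it is slower but can account for the learner's bias.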


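Principal component analysis (PCA)

  • A systematic way to transform input features into principal components (PCs).
  • Use the PCs as new features.
  • The first PC is the direction of maximum variance in the data, so that projecting onto it loses as little information as possible.
  • PCs are mutually orthogonal, so the new features are uncorrelated (not necessarily independent).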
When to use:
  • When you suspect latent features are driving the patterns in the data.
  • For dimensionality reduction, e.g. for visualization (humans can only draw 2D scatterplots!).
  • As a data preprocessing step. It can be used in both supervised and unsupervised learning to reduce noise and overfitting.
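The idea can be sketched with plain NumPy. This is a minimal illustration via the singular value decomposition of the centered data, not the course's code; the function name `pca` and the toy data (one hidden factor `t` driving two observed features) are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Return (projected data, principal directions, explained variance ratio)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ components.T, components, ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical toy data: one latent factor t drives both observed features.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t]) + rng.normal(scale=0.1, size=(200, 2))

# Keep one PC: 2 features are reduced to 1 with little information loss.
Z, components, ratio = pca(X, n_components=1)

# Keep both PCs: the new features are uncorrelated.
Z2, _, _ = pca(X, n_components=2)
off_diagonal = np.cov(Z2.T)[0, 1]
```

Here the first PC points (up to sign) along the direction (1, 2), which is the latent factor, and it carries almost all of the variance.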

Facial recognition

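How many PCs to use? (measured by F1 score because the labels are multi-class)

No. of PCs   F1 score
15           0.65
25           0.74
50           0.81
100          0.85
250          0.82

In this run the score peaks at around 100 PCs and drops again at 250, which suggests that keeping too many components starts to overfit.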
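For reference, the F1 score for one class is 2·TP / (2·TP + FP + FN), treating that class as "positive" and everything else as "negative". The sketch below computes it per class and then takes the unweighted (macro) average. Whether the numbers above are per-class or averaged is not stated in these notes, so the averaging step is an assumption, and the labels are made up.

```python
def f1_per_class(y_true, y_pred):
    """F1 score of each label, treating it as the positive class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Hypothetical labels for three people "a", "b" and "c".
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

per_class = f1_per_class(y_true, y_pred)  # a: 0.5, b: 0.8, c: 2/3
average = macro_f1(y_true, y_pred)
```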

Feature transformation

Feature transformation overlaps with feature selection.
Example: independent component analysis (ICA).

Customer segments

The content has been merged into p3_Customer Segments.