Monday, February 6, 2017

The ever-changing landscape

Update on 2017.2.16: TensorFlow 1.0
I started learning machine learning in September 2016, and recently found that some of the Python functions from the machine learning libraries I use were already deprecated. I now realize that machine learning is growing so fast that the APIs are updated non-stop to better suit real-world requirements.
Here are some notes to track the API changes that affect my projects.

TensorFlow

tf.initialize_all_variables()      # 0.11 and earlier; deprecated
tf.global_variables_initializer()  # 0.12 and later
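
For reference, a minimal sketch of how this looks in a TensorFlow 1.0-style session (the variable here is just a placeholder):

import tensorflow as tf

w = tf.Variable(tf.zeros([2, 2]), name="w")    # any variable will do

# init = tf.initialize_all_variables()         # 0.11 and earlier
init = tf.global_variables_initializer()       # 0.12 and later

with tf.Session() as sess:
    sess.run(init)                             # actually assigns the initial values
    print(sess.run(w))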

scikit-learn

from sklearn.model_selection import validation_curve, train_test_split, GridSearchCV, KFold, cross_val_score  # 0.18
from sklearn.cross_validation import train_test_split  # 0.17; deprecated in 0.18
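
For example, the same split works from either module; a quick sketch with made-up placeholder data:

import numpy as np
from sklearn.model_selection import train_test_split   # 0.18; use sklearn.cross_validation on 0.17

X = np.arange(20).reshape(10, 2)   # placeholder features
y = np.arange(10)                  # placeholder labels

# 30% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)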

Note:

model_selection is a new module, which groups several functionalities together:
  • cross_val_score(svc, X, y, cv=KFold(n_splits=3), n_jobs=-1) is very convenient: you get fitting, prediction, and scoring on 3 folds in one line of code, so you can easily see the variation caused by data fluctuation (a fuller runnable sketch follows this list).
  • A fancier option is validation_curve, which gives you both the training score and the test score over a range of values for one hyperparameter, e.g. train_scores, test_scores = validation_curve(SVC(), X, y, param_name="gamma", param_range=np.logspace(-6, -1, 5), cv=10, scoring="accuracy", n_jobs=1)
  • Even fancier is learning_curve, which shows how the score changes with the size of the training set, e.g. train_sizes, train_scores, valid_scores = learning_curve(SVC(kernel='linear'), X, y, train_sizes=[50, 80, 110], cv=5)
  • Don't forget that GridSearchCV, now also imported from model_selection, is still as powerful as before.
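
Putting these together, here is a rough sketch on the iris data (my own toy choice; any classification set works). It is only meant as a reminder of the 0.18 call signatures, not a tuned example:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import (KFold, cross_val_score, validation_curve,
                                     learning_curve, GridSearchCV)

iris = load_iris()
X, y = iris.data, iris.target      # 150 samples, 3 classes

svc = SVC(kernel='linear')

# One line gives fit + predict + score on each of the 3 folds.
# shuffle=True because the iris rows are ordered by class.
scores = cross_val_score(svc, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=0), n_jobs=-1)

# Training score and test score for each value of gamma.
train_scores, test_scores = validation_curve(
    SVC(), X, y, param_name="gamma", param_range=np.logspace(-6, -1, 5),
    cv=10, scoring="accuracy", n_jobs=1)

# Score as a function of training-set size.
train_sizes, lc_train_scores, lc_valid_scores = learning_curve(
    SVC(kernel='linear'), X, y, train_sizes=[50, 80, 110], cv=5)

# GridSearchCV also moved into model_selection.
grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(scores.mean(), grid.best_params_)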