Monday, March 27, 2017

Deep Learning ND 2, sentiment analysis, image classification

Course schedule: week 3-6, lesson 10-23, project 2
Recap:

section 2

2 Sentiment Analysis with Andrew Trask

Andrew Trask is a PhD student at the University of Oxford. He is currently writing a book, Grokking Deep Learning (40% off with code traskud17). It is an in-progress book: you prepay and then read each chapter as he finishes it.
Course material is a few notebooks: Sentiment Network.
Project end goal: analyze IMDB reviews to infer “positive” or “negative” sentiment. The basic flow is:
  1. You have 25 k reviews with binary target labels. The reviews decompose into a vocabulary of 74 k words.
  2. Write a home-made class called SentimentNetwork that preprocesses the data and builds a network with a 10-node hidden layer, a sigmoid output, and backpropagation. The input layer has the size of the vocabulary (74 k).
  3. The last 1 k reviews are used for testing.
Dataset documentation: here.
miniproject 1
  1. Use Counter() to build 3 vocabulary counters that tally word occurrences in positive reviews, negative reviews, and all reviews.
  2. Because the most common words are connectives/prepositions that appear in both positive and negative reviews, we use another counter to store the ratio of each word's positive count to its negative count, and apply np.log to rescale the very large and very small ratios (a sketch follows this list).
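A minimal sketch of the ratio idea (assuming positive_counts, negative_counts, and total_counts are the three Counters from step 1; the 50-occurrence floor is my own choice, not necessarily the notebook's):
import numpy as np
from collections import Counter
pos_neg_ratios = Counter()
for word, count in total_counts.items():
    if count >= 50:  # skip rare words, which give noisy ratios
        ratio = positive_counts[word] / float(negative_counts[word] + 1.0)
        # np.log centers the scale at 0: strongly positive words end up > 0,
        # strongly negative words < 0, neutral words near 0
        pos_neg_ratios[word] = np.log(ratio + 0.01)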
miniproject 2
  1. Use set(total_counts.keys()) to build a vocabulary, i.e., a list of unique words.
  2. Use a word2index dictionary to assign an index to each word.
  3. Vectorize each review based on this vocabulary (see the sketch below).
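A minimal sketch of miniproject 2's three steps (the layer_0 name is mine; the notebook may differ):
import numpy as np
vocab = list(set(total_counts.keys()))                  # step 1: unique words
word2index = {word: i for i, word in enumerate(vocab)}  # step 2: word -> index
def vectorize(review):
    # step 3: turn one review string into a count vector over the vocabulary
    layer_0 = np.zeros(len(vocab))
    for word in review.split(' '):
        if word in word2index:
            layer_0[word2index[word]] += 1
    return layer_0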
miniproject 3
  1. Construct a class named SentimentNetwork, initialized with a 10-node hidden layer (a rough skeleton follows this list).
  2. Use 24 k reviews for the training set and 1 k for the test set; this first version gets about 60% accuracy.
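A rough skeleton of such a class, just to show the shape of the network (a sketch: the zero/random weight initialization and the 0.1 learning rate are assumptions, not the notebook's exact code):
import numpy as np
class SentimentNetwork:
    def __init__(self, input_nodes, hidden_nodes=10, output_nodes=1, learning_rate=0.1):
        self.lr = learning_rate
        self.weights_0_1 = np.zeros((input_nodes, hidden_nodes))            # input -> hidden
        self.weights_1_2 = np.random.normal(0.0, hidden_nodes ** -0.5,
                                            (hidden_nodes, output_nodes))   # hidden -> output
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    def forward(self, layer_0):
        layer_1 = layer_0.dot(self.weights_0_1)                  # linear hidden layer
        layer_2 = self.sigmoid(layer_1.dot(self.weights_1_2))    # sigmoid output
        return layer_1, layer_2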
miniproject 4
By setting self.layer_0[0][self.word2index[word]] = 1 (instead of incrementing counts), very common tokens such as the space character and prepositions are capped at a value of 1, so they no longer dominate the input. The neural network trains more effectively, and a test accuracy of 85% is obtained. A sketch of the change is below.
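In code, the only change from the counting version of the input layer is the assignment (a sketch; update_input_layer is my name for the helper, reconstructed from the line above):
def update_input_layer(self, review):
    self.layer_0 *= 0                     # reset the input layer
    for word in review.split(' '):
        if word in self.word2index:
            # '= 1' instead of '+= 1': presence only, so frequent filler
            # words no longer dominate the weighted sums
            self.layer_0[0][self.word2index[word]] = 1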
miniproject 5
Taking advantage of the sparsity of layer_0, only the few nodes with non-zero values are used to calculate the weighted sum (sketch below). This speeds up training by about 10 times.
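A minimal sketch of the sparse trick (function and argument names are illustrative, not the notebook's):
import numpy as np
def sparse_hidden_layer(weights_0_1, review_word_indices):
    # weights_0_1: (vocab_size, hidden_nodes) input->hidden weight matrix
    # review_word_indices: indices of the words present in one review
    # dense version would be layer_0.dot(weights_0_1) over all 74 k inputs;
    # here we add only the weight rows for words that actually appear
    layer_1 = np.zeros(weights_0_1.shape[1])
    for index in review_word_indices:
        layer_1 += weights_0_1[index]
    return layer_1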
miniproject 6
  1. Use the bokeh module to plot a D3-style histogram.
  2. Use min_count = 10 and polarity_cutoff = 0.1 to keep only the informative words in the vocabulary (sketch below). This further increases training speed by about 4 times, although the accuracy is slightly reduced to 82%.
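A sketch of how such a vocabulary filter might look (pos_neg_ratios is the ratio counter from miniproject 1; the exact notebook logic may differ):
min_count, polarity_cutoff = 10, 0.1
reduced_vocab = set()
for word, count in total_counts.items():
    if count >= min_count and abs(pos_neg_ratios[word]) >= polarity_cutoff:
        reduced_vocab.add(word)   # keep only frequent, strongly polarized words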

Analysis

Use the learned weights to see word similarity under the positive/negative context:
def get_most_similar_words(focus="horrible"):
    # dot products between rows of the input->hidden weights (weights_0_1)
    # measure how similarly the trained network (mlp_full) treats two words
    most_similar = Counter()
    weights_b = mlp_full.weights_0_1[mlp_full.word2index[focus]]
    for word in mlp_full.word2index.keys():
        weights_a = mlp_full.weights_0_1[mlp_full.word2index[word]]
        most_similar[word] = np.dot(weights_a, weights_b)
    return most_similar.most_common()
Use sklearn.manifold.TSNE to cluster the words and visualize the result (sketch below).
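A sketch of that visualization step (matplotlib is used here instead of the notebook's bokeh plot; mlp_full is the trained network from above, and running T-SNE on the full vocabulary is slow):
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
words = list(mlp_full.word2index.keys())
vectors = mlp_full.weights_0_1[[mlp_full.word2index[w] for w in words]]
embedded = TSNE(n_components=2).fit_transform(vectors)   # project the word weights to 2-D
plt.scatter(embedded[:, 0], embedded[:, 1], s=2)
plt.show()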

3 Intro to TFLearn

This lesson begins with a comparison of different activation functions:
  • The sigmoid's derivative has a maximum value of 0.25, so the gradient shrinks by at least a factor of 4 per layer, which makes deep networks difficult to train.
  • ReLU is better, but the learning rate should be fine-tuned to avoid units getting stuck at 0.
  • Softmax is good for multi-class learning. Consequently, the cost function is changed from the sum of squared errors to cross-entropy. (A quick numerical check follows this list.)
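A quick numerical check of the 0.25 claim and of the cross-entropy cost (my own sketch, not course code):
import numpy as np
sigmoid = lambda x: 1 / (1 + np.exp(-x))
x = np.linspace(-5, 5, 1001)
print((sigmoid(x) * (1 - sigmoid(x))).max())   # ~0.25, the derivative's peak at x = 0
def cross_entropy(y_true, y_pred):
    # y_true is one-hot, y_pred is a softmax output
    return -np.sum(y_true * np.log(y_pred))
print(cross_entropy(np.array([0, 1]), np.array([0.2, 0.8])))  # ~0.223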
TFLearn does a lot of things for you such as initializing weights, running the forward pass, and performing backpropagation to update the weights. You end up just defining the architecture of the network (number and type of layers, number of units, etc.) and how it is trained.
import pandas as pd
import numpy as np
import tensorflow as tf
import tflearn
from tflearn.data_utils import to_categorical
reviews = pd.read_csv('reviews.txt', header=None) 
labels = pd.read_csv('labels.txt', header=None) # 25 k
from collections import Counter
total_counts = Counter()
for _, row in reviews.iterrows():
    total_counts.update(row[0].split(' ')) #have 74 k keys
vocab = sorted(total_counts, key=total_counts.get, reverse=True)[:10000] # keep the 10 k most common words
word2idx = {word: i for i,word in enumerate(vocab)} # used to vectorize the word
def text_to_vector(text):
    word_vector = np.zeros(len(vocab), dtype=np.int_)
    for word in text.split(' '):
        idx = word2idx.get(word,None) # get index or None
        if idx is None:    
            continue
        else:    
            word_vector[idx] += 1
    return np.array(word_vector)
word_vectors = np.zeros((len(reviews), len(vocab)), dtype=np.int_)
for i, (_, text) in enumerate(reviews.iterrows()):
    word_vectors[i] = text_to_vector(text[0]) # vectorize all reviews
Y = (labels == 'positive').astype(np.int_)
records = len(labels)
y = to_categorical(Y, 2)  # one-hot encode: a single label column becomes 2 columns
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(word_vectors, y, test_size=0.1)
Build and train model
def build_model():
    tf.reset_default_graph()
    net = tflearn.input_data([None,10000]) # unknown instances, 10000 nodes
    net = tflearn.fully_connected(net, 200, activation='ReLU')
    net = tflearn.fully_connected(net, 25, activation='ReLU')
    net = tflearn.fully_connected(net, 2, activation='softmax')
    net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1, loss='categorical_crossentropy')
    model = tflearn.DNN(net)
    return model
model = build_model()
model.fit(X_train, y_train, validation_set=0.1, show_metric=True, batch_size=128, n_epoch=50)
predictions = (np.array(model.predict(X_test))[:, 0] >= 0.5).astype(np.int_)
test_accuracy = np.mean(predictions == y_test[:, 0], axis=0)
print("Test accuracy: ", test_accuracy)
The tricky thing here is that TFLearn does not fully support TensorFlow.

Resources

  • Christopher Olah’s blog post on RNNs and LSTMs. This is the shortest and most accessible read.
  • Deep Learning Book chapter on RNNs. This is a very technical read, recommended for students who are comfortable with advanced mathematical notation and scientific papers.
  • Andrej Karpathy’s lecture on Recurrent Neural Networks. This is a fairly long lecture (around an hour), but as always with Karpathy, it covers the content quite well.

7 MiniFlow

MiniFlow aims to give you practice with the architecture before everything is encapsulated in TensorFlow (a tiny sketch of the idea is below). My implementation is in this gist. The dataset used in the quiz is sklearn.datasets.load_boston.
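The core idea is a tiny computational graph with a forward pass; a minimal sketch (not the gist's code, node names are illustrative):
class Node:
    def __init__(self, inbound=None):
        self.inbound = inbound or []   # nodes this node depends on
        self.value = None
class Input(Node):
    def forward(self):
        pass                           # value is set externally
class Add(Node):
    def forward(self):
        self.value = sum(n.value for n in self.inbound)
# usage: x, y = Input(), Input(); add = Add([x, y])
# x.value, y.value = 5, 10; add.forward()  ->  add.value == 15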

9,11,12 TensorFlow

These 3 lessons repackage Vincent’s previous deep learning course with more illustrative animations and more quizzes. Although I had watched Vincent’s previous course several times, I didn’t fully understand what he meant until this time. Now I realize why a picture is worth a thousand words.

Keras

The previous course seems to have been moved somewhere else.

Project 2: classify images from CIFAR-10

The CIFAR-10 dataset is originally hosted at http://www.cs.toronto.edu/~kriz/cifar.html.
  • 163 MB
  • 60 k instances (50 k training + 10 k test); each group of 10 k instances is pickled into a batch
  • input features are 32x32 color images; the target feature is one of 10 classes, corresponding to ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
  • use TensorFlow to build a neural net: 1 conv layer (32 filters, 5x5) + max-pool + flatten + fully connected layer (1024 nodes) + dropout + a 10-node output trained with softmax_cross_entropy_with_logits (sketch below)
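A rough TensorFlow (1.x-era, matching the course) sketch of that architecture; the layer helpers and keep_prob argument are my assumptions, not the project's exact solution:
import tensorflow as tf
def conv_net(x, keep_prob):
    # x: [batch, 32, 32, 3] images; keep_prob: dropout keep probability
    conv = tf.layers.conv2d(x, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu)
    pool = tf.layers.max_pooling2d(conv, pool_size=2, strides=2)
    flat = tf.contrib.layers.flatten(pool)
    fc = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
    fc = tf.nn.dropout(fc, keep_prob)
    logits = tf.layers.dense(fc, 10)      # one logit per CIFAR-10 class
    return logits
# the cost takes the raw logits, as in the project:
# cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))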