Predicting the long-term future is very difficult. Nobody can really do it. The greedy algorithm takes whatever's working best now and assumes the future is going to be like that forever.
TensorFlow is an open-source library for deep learning, developed by Google Brain and released on November 9, 2015.
TensorFlow computations are expressed in structures called graphs. The inner core of TensorFlow is actually written in C++ to speed up the calculation.
Major improvements in machine learning research result from advances in learning algorithms, computer hardware, and high-quality training datasets. MNIST (Modified National Institute of Standards and Technology) is a large database of handwritten digits for training image-processing systems. It contains 60,000 training images and 10,000 testing images. Currently, the lowest error rate is 0.23%, achieved by a hierarchical system of convolutional neural networks.
Here is an interesting animation that illustrates the data flow in a neural network: http://playground.tensorflow.org/
TensorFlow is a programming system in which you represent computations as graphs. Nodes in the graph are called ops (short for operations). An op takes zero or more `Tensor`s and performs some computation, where a `Tensor` is a typed multi-dimensional array.
A TensorFlow graph is a description of computations. To compute anything, a graph must be launched in a `Session`. A `Session` places the graph ops onto `Device`s, such as CPUs or GPUs, and provides methods to execute them. These methods return the tensors produced by ops as numpy `ndarray` objects in Python.
TensorFlow programs are usually structured into two phases:
- a construction phase, which assembles a graph;
- an execution phase, which uses a session to execute ops in the graph.
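The two phases can be mimicked in plain Python (a toy analogy of my own, not TensorFlow's actual implementation): during construction, each call only records an operation in a graph; nothing is computed until `run` is called.

```python
# Toy analogy of TensorFlow's two phases (plain Python, not the real API).

class Node:
    """A graph node: records an operation and its inputs, computes nothing yet."""
    def __init__(self, fn, inputs):
        self.fn, self.inputs = fn, inputs

def constant(v):
    return Node(lambda: v, [])

def add(a, b):
    return Node(lambda x, y: x + y, [a, b])

def mul(a, b):
    return Node(lambda x, y: x * y, [a, b])

def run(node):
    """Execution phase: recursively evaluate the graph."""
    return node.fn(*[run(i) for i in node.inputs])

# Construction phase: assemble the graph for (2 + 3) * 4.
g = mul(add(constant(2), constant(3)), constant(4))
# Execution phase: actually compute it.
print(run(g))  # 20
```

Until `run(g)` is called, `g` is just a description of the computation, which is exactly the point of the construction/execution split.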
Note: due to an incompatibility, turning on the GPU led to painful errors for me, so don't do that.
Usually, we use the `Session.run()` method to execute operations. But in Python, we can instead use the `InteractiveSession` class together with the `Tensor.eval()` and `Operation.run()` methods. This avoids having to keep a variable holding the session.
fetch and feed
Usually, we use fetch, where the values are preloaded in the graph. Otherwise, we use feed: create ops with `placeholder()`, then use `feed_dict` to specify the values when we actually run.
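Feeding can be sketched the same way in plain Python (again a toy analogy of my own, not TensorFlow's API): a placeholder is a graph node that has no value until run time, when a feed dict supplies it.

```python
# Toy analogy of placeholder/feed_dict (plain Python, not TensorFlow).
import operator

def run(node, feed_dict=None):
    """Evaluate a nested-tuple 'graph'; strings act as placeholders."""
    feed_dict = feed_dict or {}
    if isinstance(node, str):          # a placeholder: its value must be fed
        return feed_dict[node]
    if isinstance(node, tuple):        # an op: (function, *inputs)
        fn, *inputs = node
        return fn(*[run(i, feed_dict) for i in inputs])
    return node                        # a constant, preloaded in the graph

# output = a * b, where both operands are placeholders
output = (operator.mul, "a", "b")
print(run(output, feed_dict={"a": 7.0, "b": 2.0}))  # 14.0
```

Fetching a constant node needs no feed dict; running a graph that contains a placeholder without feeding it fails, just as in TensorFlow.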
- An example of how TensorFlow solves a linear regression problem.
- The trick is to initialize the weight and bias to reasonable values. The network doesn't have a magic bullet that gets arbitrary things right.
- The construction phase is actually writing symbolic math equations. Once I understood that, it was super easy to use.
```python
import numpy as np
import tensorflow as tf

# sample training data for y = 0.1 * x + 0.3
# (the data definition was not shown in the original snippet)
X_train = np.random.rand(100).astype(np.float32)
y_train = 0.1 * X_train + 0.3

# define weight, bias
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y_pred = W * X_train + b

# define loss, optimizer
loss = tf.reduce_mean(tf.square(y_pred - y_train))
optimizer = tf.train.GradientDescentOptimizer(0.1)
train_step = optimizer.minimize(loss)

# initialize the variables, build Session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(101):
        sess.run(train_step)
    print(W.eval(), b.eval())
```
- `train_step` and `init` are `Operation`s; they can only be run (`Operation.run()`).
- `W` and `b` are `Variable`s; they can be evaluated (`eval()`).
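The same fit can be checked without TensorFlow. This is my own plain-Python sketch of gradient descent on the identical mean-squared-error loss (the true values 0.1 and 0.3 are chosen here for illustration); it recovers the slope and intercept:

```python
# Plain-Python gradient descent for y = W*x + b with MSE loss.
import random

random.seed(0)
X_train = [random.random() for _ in range(100)]
y_train = [0.1 * x + 0.3 for x in X_train]   # true W = 0.1, b = 0.3

W = random.uniform(-1.0, 1.0)
b = 0.0
lr = 0.1
n = len(X_train)

for step in range(1000):
    # loss = mean((W*x + b - y)^2); dW and db below are its exact derivatives
    err = [W * x + b - y for x, y in zip(X_train, y_train)]
    dW = 2.0 / n * sum(e * x for e, x in zip(err, X_train))
    db = 2.0 / n * sum(err)
    W -= lr * dW
    b -= lr * db

print(round(W, 2), round(b, 2))  # converges to roughly 0.1 and 0.3
```

This is exactly what `GradientDescentOptimizer(0.1).minimize(loss)` does for us automatically: TensorFlow derives the gradients from the graph instead of us writing them by hand.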
Data extraction and preprocessing are done in `input_data.py`. But that file is just a transfer center; the real work is done in `tensorflow.contrib.learn.python.learn.datasets.mnist.read_data_sets`, just in case you are interested in the data source.
This is a 2-layer (784, 10) neural network. Note that the loss function is `tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)`. WARNING: this op expects unscaled logits, since it performs a `softmax` on `logits` internally for efficiency. Do not call this op with the output of `softmax`, as it will produce incorrect results.
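The warning can be verified with a small plain-Python calculation (my own sketch, not TensorFlow code): the cross-entropy computed from raw logits differs from the value obtained when softmax is mistakenly applied before passing them in, because the op then softmaxes the inputs a second time.

```python
# Why softmax_cross_entropy_with_logits must receive raw logits.
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(probs, labels):
    return -sum(l * math.log(p) for l, p in zip(labels, probs))

logits = [2.0, 1.0, 0.1]
labels = [1.0, 0.0, 0.0]  # one-hot: the true class is index 0

correct = cross_entropy(softmax(logits), labels)          # softmax applied once
wrong = cross_entropy(softmax(softmax(logits)), labels)   # softmax applied twice
print(correct, wrong)  # the two values clearly disagree
```

The double softmax squashes the probabilities toward uniform, so the reported loss is wrong and its gradients are wrong too.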
With batch size = 100, gradient descent, and learning rate = 0.5, the accuracy is 92%.
Actually, I recommend that beginners learn TensorBoard in the first place. See here.
| layer name | size | neuron type |
The loss still uses `softmax_cross_entropy_with_logits`.
It's a 4-layer network (784, 128, 32, 10), with the two hidden layers using ReLU. It seems more complicated than the 'experts' version just because it tries to formulate every step, especially `inference()`, which defines the graph structure.
The precision is only about 90% after 2000 steps with batch size 100, worse than the beginner version's 92% after 1000 steps with batch size 100. Why does it behave so poorly? It's partially because the default learning rate is 0.01. I got 96% once I increased the learning rate to 0.1, and 97% at 0.2.
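The learning-rate effect is easy to reproduce on a toy problem (my own sketch, unrelated to the MNIST code): with the same number of steps, gradient descent with lr = 0.01 is still well short of the optimum, while lr = 0.1 has essentially converged.

```python
# Toy illustration: learning rate vs. convergence speed on f(w) = (w - 3)^2.
def descend(lr, steps=100, w=0.0):
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)   # gradient of (w - 3)^2 is 2*(w - 3)
    return w

slow = descend(0.01)   # still noticeably below the optimum w = 3
fast = descend(0.1)    # essentially converged to w = 3
print(slow, fast)
```

The same step budget buys much more progress at the larger rate, which matches why bumping 0.01 to 0.1 lifted the MNIST precision so much.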
A useful trick I learned is to print the loss, which is more direct than the precision. Although the loss is only over a small batch of data, it is exactly what is happening during the training process.