[TOC]

# Trailer

**Predicting the long-term future is very difficult**. Nobody can really do it.

*The greedy algorithm* takes whatever’s working best now and assumes the future is going to be like that forever.

TensorFlow is an open source library for deep learning, developed by Google Brain and released on 2015-11-09.

TensorFlow computation is done in a structure called a **graph**. The inner core of TensorFlow is actually written in C++ to speed up the calculation.

Major improvements in machine learning research result from advances in **learning algorithms, computer hardware and high-quality training datasets**. MNIST (Modified National Institute of Standards and Technology) is a large database of handwritten digits for training image processing systems. It contains 60,000 training images and 10,000 testing images. At the time of writing, the lowest reported error rate is 0.23%, achieved by a hierarchical system of convolutional neural networks.

Here is an interesting animation that illustrates the data flow in a neural network: http://playground.tensorflow.org/

# Basic Usage

TensorFlow is a programming system in which you represent computations as graphs.

**Nodes in the graph are called *ops*** (short for operations). An op takes `Tensors` and performs some computation on them, where a `Tensor` is a typed multi-dimensional array. **A TensorFlow graph is a *description* of computations.** To compute anything, a graph must be launched in a `Session`. A `Session` places the graph ops onto `Devices`, such as CPUs or GPUs, and provides methods to execute them. These methods return tensors produced by ops as numpy `ndarray` objects in Python.

TensorFlow programs are usually structured into 2 phases:

- **construction phase**, which assembles a graph,
- **execution phase**, which uses a session to execute ops in the graph, e.g. `tf.Session().run()`
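To make the two phases concrete, here is a minimal pure-Python sketch of a deferred computation graph. It is not TensorFlow code: the classes `Const`, `Add`, `Mul` and the function `run` are hypothetical stand-ins, and only illustrate that building the graph computes nothing until it is executed.

```python
# Toy illustration only: these classes are hypothetical, not TensorFlow's API.

class Const:
    """Graph node holding a fixed value."""
    def __init__(self, value):
        self.value = value
        self.inputs = []

    def compute(self):
        return self.value


class Add:
    """Graph node that adds the results of its two input nodes."""
    def __init__(self, a, b):
        self.inputs = [a, b]

    def compute(self, x, y):
        return x + y


class Mul:
    """Graph node that multiplies the results of its two input nodes."""
    def __init__(self, a, b):
        self.inputs = [a, b]

    def compute(self, x, y):
        return x * y


def run(node):
    """Execution phase: evaluate a node by first evaluating its inputs."""
    args = [run(inp) for inp in node.inputs]
    return node.compute(*args)


# Construction phase: nothing is computed here, we only wire nodes together.
a = Const(3.0)
b = Const(4.0)
total = Add(a, b)
product = Mul(total, Const(2.0))

# Execution phase: values are produced only when we ask for them.
result = run(product)
print(result)  # 14.0
```

In this sketch `run` plays the role of `Session.run()`: you name the node you want, and only the nodes it depends on are evaluated.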

Note: due to an incompatibility, turning on the GPU leads to painful errors, so don’t do this:

`with tf.device("/cpu:0")`

### interactive session

Usually, we use the `Session.run()` method to execute operations. But in Python, we can also use the `InteractiveSession` class together with the `Tensor.eval()` and `Operation.run()` methods. This avoids having to keep a variable holding the session.

### fetch and feed

Usually we fetch: the values are already loaded in the graph, and `run()` returns the tensors we ask for. Otherwise we feed: we create the ops with `placeholder()`, then use `feed_dict` to specify the values when we actually run.

# Introduction

- An example of how TensorFlow solves a linear regression problem. **The trick is to initialize the weight and bias to reasonable values**. The network has no magic bullet for getting arbitrary things right.
- The construction phase is really just writing symbolic math equations. Once I understood that, it became super easy to use.

```
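# TensorFlow 1.x API (on 2.x: import tensorflow.compat.v1 as tf; tf.disable_v2_behavior())
import numpy as np
import tensorflow as tf

# toy data (assumed here, not given in these notes): y = 0.1 * x + 0.3
X_train = np.random.rand(100).astype(np.float32)
y_train = 0.1 * X_train + 0.3

# define weight, bias
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y_pred = W * X_train + b
# define loss, optimizer
loss = tf.reduce_mean(tf.square(y_pred - y_train))
optimizer = tf.train.GradientDescentOptimizer(0.1)
train_step = optimizer.minimize(loss)
# initialize the variables, build Session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(101):
        sess.run(train_step)
    # eval() needs a default session, so it must stay inside the with block
    print(W.eval(), b.eval())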
```

note:

- `train_step` and `init` are Operations, so they can only be `run()`.
- `W` and `b` are Variables, so they can be `run()` or `eval()`.
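For intuition about what a gradient descent `train_step` does on each `run()`, here is a minimal pure-Python sketch that writes out the gradients of the mean squared error by hand. The toy data (`y = 0.1 * x + 0.3`), the seed and the helper `mse` are assumptions made for illustration. TensorFlow derives these gradients from the graph automatically; this only mirrors the arithmetic.

```python
import random

random.seed(0)

# Toy data, assumed for illustration: y = 0.1 * x + 0.3
xs = [random.random() for _ in range(100)]
ys = [0.1 * x + 0.3 for x in xs]

# Same spirit as the TensorFlow version: random weight, zero bias.
w, b = random.uniform(-1.0, 1.0), 0.0
learning_rate = 0.1


def mse(w, b):
    """Mean squared error of the line w * x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mse(w, b)
for step in range(101):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # One gradient descent update, the role played by `train_step`.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, final_loss)
```

With this step size and step count the loss goes down steadily, but `w` and `b` may not have fully reached 0.1 and 0.3 yet; more steps or a larger learning rate would bring them closer.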

# Beginners

Data extraction and preprocessing are done in `input_data.py`. But that file is just a transfer center. The real work is done by `from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets`, just in case you are interested in the data source.

**This is a 2-layer (784,10) neural network**. Note that the loss function is `tf.nn.softmax_cross_entropy_with_logits(logits=y,labels=y_)`.

**WARNING:** This op expects unscaled logits, since it performs a `softmax` on `logits` internally for efficiency. Do not call this op with the output of `softmax`, as it will produce incorrect results.

With batch size = 100, gradient descent and learning rate = 0.5, the accuracy is 92%.
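The warning above can be checked with a few lines of plain Python. This is a sketch of the math, not the library's implementation: `softmax` and `cross_entropy_from_logits` are hypothetical helper names, and the example logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, labels):
    """Cross-entropy between `labels` and softmax(`logits`), via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(labels, log_probs))


logits = [2.0, 1.0, 0.1]   # unscaled scores for 3 classes (made-up values)
labels = [1.0, 0.0, 0.0]   # one-hot target

correct = cross_entropy_from_logits(logits, labels)
# Mistake: passing probabilities in, so softmax is effectively applied twice.
wrong = cross_entropy_from_logits(softmax(logits), labels)

print(correct, wrong)
```

The second call still returns a number, which is what makes the mistake easy to miss: the loss is silently computed on a flattened distribution rather than raising an error.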

Actually, I recommend that beginners learn TensorBoard first.

# Experts

layer name | size | neuron type |
---|---|---|
input | 784 | |
h_pool1 | 32 | conv+relu+max_pool |
h_pool2 | 64 | conv+relu+max_pool |
h_fc1_drop | 1024 | relu+dropout |
y_conv | 10 | logits |

The loss still uses `tf.nn.softmax_cross_entropy_with_logits`, and the optimizer is `AdamOptimizer(1e-4)`.
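The sizes 32 and 64 in the table are feature-map counts, so it can help to trace the full tensor shapes. The sketch below assumes the usual tutorial settings, which these notes do not state: 28x28 grayscale input, stride-1 convolutions with SAME padding, and 2x2 max pooling with stride 2. The helpers `conv_same` and `max_pool_2x2` are hypothetical and only do shape arithmetic.

```python
def conv_same(height, width, out_channels):
    """Shape after a stride-1 convolution with SAME padding (assumed setting)."""
    return height, width, out_channels


def max_pool_2x2(height, width, channels):
    """Shape after 2x2 max pooling with stride 2 (assumed setting, even sizes)."""
    return height // 2, width // 2, channels


image = (28, 28, 1)                                        # 784 pixels as an image
h_pool1 = max_pool_2x2(*conv_same(image[0], image[1], 32))
h_pool2 = max_pool_2x2(*conv_same(h_pool1[0], h_pool1[1], 64))
flat = h_pool2[0] * h_pool2[1] * h_pool2[2]                # input size of the 1024-unit layer

print(h_pool1, h_pool2, flat)  # (14, 14, 32) (7, 7, 64) 3136
```

Under these assumptions the fully connected layer maps 3136 inputs to 1024 units before the final 10 logits.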

# Mechanics 101

The main file is `fully_connected_feed.py`, in which the graph is built by `mnist.py`. **It’s a 4-layer network (784,128,32,10), the 2 hidden layers being ReLU**. It seems more complicated than the ‘experts’ version just because it tries to formalize every step, especially the `inference()` function that defines the graph structure.

The precision is only about 90% after 2000 steps × 100 batch size, worse than the beginner version’s 92% after 1000 steps × 100 batch size. Why does it perform so poorly? Partly because the **default learning rate is 0.01**. I get 96% once I increase the learning rate to 0.1, and 97% at 0.2.

A useful trick I learned is to print the **loss**, which is a more direct signal than the precision. Although this loss is computed on only a small batch of data, it is exactly what is happening during the training process.
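The effect of the learning rate, and the habit of printing the loss, can be sketched on a toy problem. This is not the MNIST network: the one-parameter loss `(w - 3)^2`, the function `train` and its arguments are assumptions chosen so the example runs in plain Python.

```python
def train(learning_rate, steps=100, log_every=25):
    """Minimize the toy loss (w - 3)^2 by gradient descent, logging the loss."""
    w = 0.0
    for step in range(steps):
        loss = (w - 3.0) ** 2
        if step % log_every == 0:
            # Printing the loss shows directly how fast training is progressing.
            print(f"lr={learning_rate} step={step} loss={loss:.6f}")
        w -= learning_rate * 2.0 * (w - 3.0)  # gradient of (w - 3)^2 is 2 * (w - 3)
    return (w - 3.0) ** 2


final_losses = {lr: train(lr) for lr in (0.01, 0.1, 0.2)}
```

With the same number of steps, the run at 0.01 is still far from the minimum while 0.1 and 0.2 have essentially converged. That matches the pattern seen above with the default learning rate, though a real network can also diverge if the rate is pushed too high.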