Thursday, January 26, 2017

AMPL

AMPL, short for “A Mathematical Programming Language”, is an algebraic modeling language and one of the most powerful tools for linear programming and operations research. One advantage of AMPL is that its syntax closely mirrors the mathematical notation of optimization problems. It is also the most popular input format on the NEOS Server:
[chart: market share of input formats on NEOS]
AMPL itself is free, but it doesn’t solve problems directly. Instead, different algorithms have been developed to solve application-oriented problems, and these algorithms are packaged as so-called solvers. Solvers are thus analogous to a Matlab toolbox, an R library, or the API of a general-purpose programming language. Among AMPL solvers, CPLEX by IBM is the most widely used, though solver performance is highly application-dependent.
To get started, the AMPL IDE can be downloaded from the official website http://ampl.com/. Basically, there are 4 ways to get a free lunch:
  1. demo version, problem-size limited (under 300 variables/constraints, etc.)
  2. 30-day trial, full-featured, requested individually
  3. course version, requested by a teacher
  4. cloud service via the NEOS Server
If you look at the pricing, you will see that solvers are sold on an annual basis, because the solvers are the core part. Anyway, the demo version is enough for personal use.

AMPL grammar

  • comments start with #
  • variables are declared by var
  • parameters are declared by param
  • each line of code ends with a semicolon; otherwise, you get ampl? in the console window because the interpreter thinks you have not finished.
  • objective format: maximize or minimize, a name, and a colon, then the expression
  • constraint format: subject to, a name, and a colon, then the expression.
  • keywords are in lowercase.
  • \sum_{i=1}^n is written as sum{i in 1..n}
  • output variables in the console with display. This is a sharp contrast with Matlab, where you can output anything directly, without a keyword and without the semicolon. In AMPL, you must write display 1+1, sqrt(2), 2^3;
A typical input in console is:
reset;
model example.mod;
data example.dat;
solve;
display x;
The separation of model and data is the key to describing complex problems.

key concepts

  • decision variables: the variables whose values are chosen to maximize profit or minimize loss
  • feasible solutions: those that satisfy all constraints.

the hardest tasks

  1. formulating a correct model
  2. providing accurate data

examples

A simple 2-variable example

# prod0.mod
var XB;
var XC;
maximize Profit: 25 * XB + 30 * XC;
subject to Time: (1/200) * XB + (1/140) * XC <= 40;
subject to B_limit: 0 <= XB <= 6000;
subject to C_limit: 0 <= XC <= 4000;

solve;
display XB, XC, Profit;
A shortcut to execute is to save first, then right-click in the script window -> AMPL command -> model. As the console window shows, the default solver, MINOS 5.51, finds an optimal solution for you.

multi-parameter example

# prod.mod
reset;
set P;
param a {j in P};  # tons per hour of product j
param b;           # hours available
param c {j in P};  # profit per ton of product j
param u {j in P};  # max tons of product j
var X {j in P};    # tons of product j
maximize Total_Profit: sum {j in P} c[j] * X[j];
subject to Time: sum {j in P} (1/a[j]) * X[j] <= b;
subject to Limit {j in P}: 0 <= X[j] <= u[j];

data /Users/yuchaojiang/Downloads/amplide.macosx64/models/prod.dat;
solve;
display X, Total_Profit;
# prod.dat
data;
set P := bands coils;
param:     a     c     u  :=
  bands   200   25   6000
  coils   140   30   4000 ;
param b := 40;
Actually, you can also combine the two files into one. This saves the call to data prod.dat and keeps all the data on one page.

some tricks

  • prefer longer, more meaningful names
  • use cd; or display _cd; to show the working directory, and cd path; to change it

Wednesday, January 25, 2017

Text Analytics

Text mining, or text analytics, is the process of deriving high-quality information from text.
  • devising patterns and trends
  • 80% of enterprise information originates and stays locked in unstructured form, rather than as numerical data.
  • derive information from unstructured sources.
typical examples:
  • detect terrorists
  • find a protein in the biomedical literature that may lead to a cancer.
High quality is the first requirement for text data:
  • choose the right source
  • That a source is available doesn’t mean it’s right for the job
  • source selection criteria include topicality, focus (high signal-to-noise ratio), currency, authority, your processing capabilities, and your analytics needs.
Three types of approaches:
  • co-occurrence based
  • rule-based
  • machine learning based

Natural Language ToolKit (NLTK)

The toolkit is powerful. However, after a few hours, I think this is the wrong direction to look at, because disassembling the text into words loses the context. Without context, the understanding is quite superficial. Machines are only good at “literal meaning”. Statistical learning is meant for numerical data, not text data.
To me, the nice thing about nltk is that it ships with some great datasets drawn from masterpieces of English literature.
# install the toolkit and download the corpora (shell commands)
conda install nltk
python -m nltk.downloader all
# parse-tree example from the Penn Treebank sample
import nltk
import IPython
from nltk.corpus import treebank
t = treebank.parsed_sents('wsj_0001.mrg')[0]
IPython.core.display.display(t)
# explore the bundled texts
from nltk.book import *
text1.concordance("monstrous")   # occurrences of a word in Moby Dick
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])   # word positions in the inaugural addresses
nltk.corpus.gutenberg.fileids()  # list the Project Gutenberg texts
from nltk.corpus import brown
brown.categories()               # categories of the Brown corpus
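As a quick example of the kind of statistical result this gives, here is a minimal sketch (assuming the Brown corpus has already been downloaded) that counts the most frequent words in one category:
import nltk
from nltk.corpus import brown

# frequency distribution over the 'news' category, keeping alphabetic tokens only
words = [w.lower() for w in brown.words(categories='news') if w.isalpha()]
fdist = nltk.FreqDist(words)
print(fdist.most_common(10))   # mostly high-frequency function words like 'the', 'of', 'and'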
Statistical results may be interesting, but not as much as reading a real book from cover to cover.

Monday, January 16, 2017

TensorFlow, learning guide

[TOC]

Trailer

Predicting the long-term future is very difficult. Nobody can really do it. The greedy algorithm takes whatever’s working best now and assumes the future is going to be like that forever.
TensorFlow is an open-source API for deep learning, developed by Google Brain and released on 2015-11-09.
TensorFlow computation is organized in structures called graphs. The inner core of TensorFlow is written in C++ to speed up the calculation.
Major improvements in machine learning research result from advances in learning algorithms, computer hardware, and high-quality training datasets. MNIST (Mixed National Institute of Standards and Technology) is a large database of handwritten digits for training image-processing systems. It contains 60,000 training images and 10,000 test images. Currently, the lowest error rate is 0.23%, achieved by a hierarchical system of convolutional neural networks.
Here is an interesting animation to illustrate the data flow in neural network: http://playground.tensorflow.org/

Basic Usage

TensorFlow is a programming system in which you represent computations as graphs. Nodes in the graph are called ops (short for operations). An op takes Tensors and performs some computation, where a Tensor is a typed multi-dimensional array.
A TensorFlow graph is a description of computations. To compute anything, a graph must be launched in a Session. A Session places the graph ops onto Devices, such as CPUs or GPUs, and provides methods to execute them. These methods return tensors produced by ops as numpy ndarray objects in Python.
TensorFlow programs are usually structured into 2 phases (a minimal sketch follows the note below):
  1. a construction phase, which assembles a graph
  2. an execution phase, which uses a session to execute the ops in the graph, e.g. tf.Session().run()
Note: due to incompatibility on my machine, turning on the GPU leads to painful errors, so don’t pin ops to the GPU; if needed, force the CPU with tf.device("/cpu:0").
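To make the two phases concrete, here is a minimal sketch (the constants and the matrix multiplication are illustrative, not from the original tutorial code):
import tensorflow as tf

# construction phase: assemble the graph
a = tf.constant([[3.0, 3.0]])        # 1x2 matrix
b = tf.constant([[2.0], [2.0]])      # 2x1 matrix
product = tf.matmul(a, b)            # op that multiplies the two matrices

# execution phase: launch the graph in a Session and run the op
with tf.Session() as sess:
    result = sess.run(product)       # returned as a numpy ndarray
    print(result)                    # [[ 12.]]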

interactive session

Usually, we use the Session.run() method to execute operations.
But in Python, we can use the InteractiveSession class together with the Tensor.eval() and Operation.run() methods. This avoids having to keep a variable holding the session.
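A minimal sketch of the interactive style (the variable and constant values are just for illustration):
import tensorflow as tf

sess = tf.InteractiveSession()       # installs itself as the default session
x = tf.Variable([1.0, 2.0])
a = tf.constant([3.0, 3.0])
x.initializer.run()                  # Operation.run() picks up the default session
print((x + a).eval())                # Tensor.eval() likewise, prints [ 4.  5.]
sess.close()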

fetch and feed

Usually we fetch, where the values are already part of the graph. Otherwise it is a feed: we create ops with placeholder(), then use feed_dict to specify their values when we actually run the graph.
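A minimal sketch of both, with made-up values:
import tensorflow as tf

# fetch: the values are already stored in the graph
a = tf.constant(3.0)
b = tf.constant(4.0)
total = a + b

# feed: the values are supplied at run time through placeholders
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
product = x * y

with tf.Session() as sess:
    print(sess.run(total))                                  # fetch -> 7.0
    print(sess.run(product, feed_dict={x: 2.0, y: 5.0}))    # feed  -> 10.0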

Introduction

  • an example of how TensorFlow solves a linear regression problem.
  • The trick is to initialize the weight and bias to reasonable values. The network doesn’t have a magic bullet to get arbitrary things right.
  • the construction phase is really just writing the symbolic math equations. Once I understood that, it was super easy to use.
import numpy as np
import tensorflow as tf

# synthetic training data for illustration (the original snippet doesn't show X_train/y_train)
X_train = np.random.rand(100).astype(np.float32)
y_train = 0.1 * X_train + 0.3

# define weight, bias
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y_pred = W * X_train + b
# define loss, optimizer
loss = tf.reduce_mean(tf.square(y_pred - y_train))
optimizer = tf.train.GradientDescentOptimizer(0.1)
train_step = optimizer.minimize(loss)
# initialize the variables, build Session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(101):
        sess.run(train_step)
    print(W.eval(), b.eval())
note:
  • train_step and init are Operations; they can only be run()
  • W and b are Variables; they can be run() or eval()

Beginners

Data extraction and preprocessing are done in input_data.py, but that file is just a transfer point. The real work is done by from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets, in case you are interested in the data source.
This is a 2-layer (784, 10) neural network. Note that the loss function is tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_). WARNING: this op expects unscaled logits, since it performs a softmax on the logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
With batch size = 100, gradient descent, and learning rate = 0.5, the accuracy is 92%.
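A minimal sketch of this model, following the standard MNIST-for-beginners tutorial (the data directory and step count are illustrative choices):
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b          # unscaled logits: no softmax here

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))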
Actually, I recommend beginners learn TensorBoard first. See here.

Experts

layer name    size    neuron type
input         784     -
h_pool1       32      conv + relu + max_pool
h_pool2       64      conv + relu + max_pool
h_fc1_drop    1024    relu + dropout
y_conv        10      logits
The loss still uses tf.nn.softmax_cross_entropy_with_logits; the optimizer is AdamOptimizer(1e-4).
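A hedged sketch of one conv + relu + max_pool block of the kind the table describes (the 5x5 patches and the helper name conv_pool are my own illustrative choices):
import tensorflow as tf

def conv_pool(x, in_channels, out_channels):
    """One conv + relu + 2x2 max_pool block with 5x5 patches, stride 1."""
    W = tf.Variable(tf.truncated_normal([5, 5, in_channels, out_channels], stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[out_channels]))
    conv = tf.nn.relu(tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b)
    return tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# usage: reshape the flat MNIST batch to an image and stack two blocks
x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_pool1 = conv_pool(x_image, 1, 32)    # first block, 32 feature maps
h_pool2 = conv_pool(h_pool1, 32, 64)   # second block, 64 feature maps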

Mechanics 101

The main file is fully_connected_feed.py, in which the graph is built by mnist.py.
It’s a 4-layer network (784, 128, 32, 10), with the 2 hidden layers being ReLU. It seems more complicated than the ‘experts’ version only because it tries to formalize every step, especially the inference() function that defines the graph structure.
The precision is only about 90% after 2000 steps with batch size 100, worse than the beginner version’s 92% after 1000 steps with batch size 100. Why does it behave so poorly? It’s partially because the default learning rate is 0.01. I get 96% once I increase the learning rate to 0.1, and 97% at 0.2.
A useful trick I learned is to print the loss, which is more direct than the precision. Although that loss is computed only over a small batch of data, it is exactly what is happening during training.
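As a toy illustration of that trick (the quadratic loss here is a stand-in, not the MNIST model):
import tensorflow as tf

# dummy model: minimize (w - 3)^2, printing the loss every 100 steps
w = tf.Variable(0.0)
loss = tf.square(w - 3.0)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2000):
        _, loss_value = sess.run([train_op, loss])   # fetch the loss alongside the train op
        if step % 100 == 0:
            print('Step %d: loss = %.2f' % (step, loss_value))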

Sunday, January 15, 2017

Elon Musk's Quotes

  • The problem is that at a lot of big companies, process becomes a substitute for thinking. You’re encouraged to behave like a little gear in a complex machine. Frankly, it allows you to keep people who aren’t that smart, who aren’t that creative.
  • I think that’s the single best piece of advice: constantly think about how you could be doing things better and questioning yourself.
  • I think it’s very important to have a feedback loop, where you’re constantly thinking about what you’ve done and how you could be doing it better.
  • When something is important enough, you do it even if the odds are not in your favor.
  • If you go back a few hundred years, what we take for granted today would seem like magic - being able to talk to people over long distances, to transmit images, flying, accessing vast amounts of data like an oracle. These are all things that would have been considered magic a few hundred years ago.
  • I’ve actually made a prediction that within 30 years a majority of new cars made in the United States will be electric. And I don’t mean hybrid, I mean fully electric.
  • When I was in college, I wanted to be involved in things that would change the world.
  • There have only been about a half dozen genuinely important events in the four-billion-year saga of life on Earth: single-celled life, multicelled life, differentiation into plants and animals, movement of animals from water to land, and the advent of mammals and consciousness.
  • The reality is gas prices should be much more expensive than they are because we’re not incorporating the true damage to the environment and the hidden costs of mining oil and transporting it to the U.S. Whenever you have an unpriced externality, you have a bit of a market failure, to the degree that externality remains unpriced.
  • I don’t spend my time pontificating about high-concept things; I spend my time solving engineering and manufacturing problems.
  • Patience is a virtue, and I’m learning patience. It’s a tough lesson.
  • Life is too short for long-term grudges.
  • I’ve actually not read any books on time management.
  • I do think there is a lot of potential if you have a compelling product and people are willing to pay a premium for that. I think that is what Apple has shown. You can buy a much cheaper cell phone or laptop, but Apple’s product is so much better than the alternative, and people are willing to pay that premium.
  • There are some important differences between me and Tony Stark, like I have five kids, so I spend more time going to Disneyland than parties.
  • I tend to approach things from a physics framework. And physics teaches you to reason from first principles rather than by analogy.
  • If humanity doesn’t land on Mars in my lifetime, I would be very disappointed.
  • I would like to fly in space. Absolutely. That would be cool. I used to just do personally risky things, but now I’ve got kids and responsibilities, so I can’t be my own test pilot. That wouldn’t be a good idea. But I definitely want to fly as soon as it’s a sensible thing to do.
  • People work better when they know what the goal is and why. It is important that people look forward to coming to work in the morning and enjoy working.
  • I wouldn’t say I have a lack of fear. In fact, I’d like my fear emotion to be less because it’s very distracting and fries my nervous system.
  • I always invest my own money in the companies that I create. I don’t believe in the whole thing of just using other people’s money. I don’t think that’s right. I’m not going to ask other people to invest in something if I’m not prepared to do so myself.
  • The United States is definitely ahead in culture of innovation. If someone wants to accomplish great things, there is no better place than the U.S.
  • Physics is really figuring out how to discover new things that are counterintuitive, like quantum mechanics. It’s really counterintuitive.

Saturday, January 14, 2017

Theano with GPU, setup and trouble shooting

Environment setup is always a headache for me. The only way to fight back is to document every step I take in as much detail as possible.
I will find time to finish this Udacity course on the Linux Command Line.
I am now using a MacBook Pro (2013) with OS X El Capitan, 10.11.6:
CPU: 2.4 GHz Intel Core i7
Memory: 8 GB
Graphics: NVIDIA GeForce GT 650M, 1024 MB
[TOC]

Background knowledge refresh

For changes to take effect, you need to restart the terminal. Alternatively, you can execute source ~/.bash_profile to reload your settings.

Bash file

There may or may not be a default file in your home directory named .bash_profile.
I configured the git workspace following this Udacity video about tab completion, prompt color, and the git editor. Because I ran git config --global core.editor "atom --wait", each time I git commit and a new file opens to document the changes, I have to close the file before I can continue.

Set path

to check your working directory: pwd
to check path: echo $PATH
The paths shown are actually written in two files:
/etc/paths
~/.bash_profile
in the first file, you can write the path directly; in the second file, you write your path in the format
export PATH="xxx:$PATH"
where xxx represents your actual path.
The nice thing about .bash_profile is that you can use an alias such as alias mlnd="cd Desktop/Udacity/MLND/course_material/projects/" to save a long path that you use frequently.

Theano

Theano is a machine learning library, http://deeplearning.net/software/theano/
  • easy to implement backpropagation for convolutional neural networks
  • can run code on either a CPU or a GPU.
Basically, there are 3 steps to use Theano with GPU enabled:
  1. sudo pip install Theano
  2. download and install the CUDA toolkit, which includes a compiler for NVIDIA GPUs: https://developer.nvidia.com/cuda-toolkit
  3. configure Theano and CUDA

enable GPU

After running nosetests theano, I found something was missing, so I ran pip install nose_parameterized.
Write these lines into .bash_profile:
# Theano and CUDA
export PATH="/Developer/NVIDIA/CUDA-8.0/bin/:$PATH"
export LD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-8.0/lib/
export CUDA_ROOT=/Developer/NVIDIA/CUDA-8.0/
export THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32'
Test GPU vs CPU with the following script:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
Using the CPU took 2.06 seconds; using the GPU took 1.22 seconds.

Another guide

http://daoyuan.li/installing-theano-and-cuda-on-mac-os-x/ provides detailed guidance and 2 comparison examples of GPU/CPU
Write a python file, check.py, containing the same GPU/CPU test script shown above,
then run it twice from the command line, once per device:
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 time python check.py 
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 time python check.py
The results are 8.44 seconds vs 6.3 seconds.
But another test file, lr.py, shows the opposite.
I also tried the example code from chapter 6 of “Neural Networks and Deep Learning” by Michael Nielsen and got a 60% reduction in running time. However, my excitement didn’t last long: I soon got “Initialisation of device gpu failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY”. I tried setting cnmem = 1.0 or 0.9, but it didn’t help.
At last, I resorted to https://github.com/phvu/cuda-smi to check my GPU. Still no idea.
I had to close everything and restart. It still failed! Then I changed cnmem = 0.1 and it succeeded. What a joke~ I realize my poor GPU is not meant for this kind of deep learning computation, and the memory management is far from optimized. Remember: Theano is only at version 0.8, and its GPU support is still at an early stage. Note that NVIDIA just built its first AI supercomputer, which costs $129 k!

life is better with cloud service

The author of http://www.pyimagesearch.com/2014/10/06/experience-cudamat-deep-belief-networks-python/ didn’t get an improvement after using the GPU and suggested using a cloud service: