Saturday, January 14, 2017

Theano with GPU, setup and trouble shooting

Environment setup is always a headache for me. The only way to fight back is to document every step in as much detail as possible.
I will find time to finish this Udacity course on the Linux command line.
I am now using a MacBook Pro (2013) with OS X El Capitan 10.11.6.
CPU: 2.4 GHz Intel Core i7
Memory: 8 GB
Graphics: NVIDIA GeForce GT 650M, 1024 MB
[TOC]

Background knowledge refresh

For changes to take effect, you need to restart the terminal. Alternatively, you could execute source ~/.bash_profile to reload your settings.

Bash file

There may or may not be a default file in your home directory named .bash_profile.
I configured my git workspace following this Udacity video about tab completion, prompt color, and the git editor. Because I ran git config --global core.editor "atom --wait", each time I git commit and a new file opens to document the changes, I have to close the file before I can continue.

Set path

To check your working directory: pwd
To check your path: echo $PATH
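The value of $PATH is just a colon-separated list of directories. A small Python sketch to print them one per line:

```python
import os

# PATH entries are separated by os.pathsep (":" on macOS and Linux)
for directory in os.environ["PATH"].split(os.pathsep):
    print(directory)
```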
The paths shown are actually written in two files:
/etc/paths
~/.bash_profile
In the first file, you can write the path directly; in the second file, you write your path in the format
export PATH="xxx:$PATH"
where xxx represents your actual path.
The nice thing about .bash_profile is that you can use an alias, e.g. alias mlnd="cd Desktop/Udacity/MLND/course_material/projects/", to save a long path that you use frequently.

Theano

Theano is a machine learning library (http://deeplearning.net/software/theano/) that:
  • makes it easy to implement backpropagation for convolutional neural networks
  • can run code on either a CPU or a GPU
Basically, there are three steps to use Theano with the GPU enabled:
  1. sudo pip install Theano
  2. Download and install the CUDA toolkit, which includes a compiler for NVIDIA GPUs: https://developer.nvidia.com/cuda-toolkit
  3. Configure Theano and CUDA
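For step 3, one option besides exporting THEANO_FLAGS (as done below) is to put the same settings in a ~/.theanorc file. A sketch matching the flags used later in this post:

```ini
[global]
mode = FAST_RUN
device = gpu
floatX = float32
```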

Enable GPU

After running nosetests theano, I found something was missing, so I ran pip install nose_parameterized.
Write these lines into .bash_profile:
# Theano and CUDA
export PATH="/Developer/NVIDIA/CUDA-8.0/bin/:$PATH"
export LD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-8.0/lib/
export CUDA_ROOT=/Developer/NVIDIA/CUDA-8.0/
export THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32'
To test whether the GPU or the CPU is used, run the following script:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
Using the CPU took 2.06 seconds; using the GPU took 1.22 seconds.

Another guide

http://daoyuan.li/installing-theano-and-cuda-on-mac-os-x/ provides detailed guidance and two comparison examples of GPU vs. CPU.
Write a Python file, check.py, containing the same test script shown in the previous section (with xrange in place of range).
In the command line:
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 time python check.py 
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 time python check.py
The results are 8.44 s (CPU) vs. 6.3 s (GPU).
But another test file, lr.py, shows the opposite.
I also tried the example code from chapter 6 of "Neural Networks and Deep Learning" by Michael Nielsen, and got a 60% reduction in running time. However, my excitement didn't last long. I soon got "Initialisation of device gpu failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY". I tried setting cnmem=1.0 or 0.9, but it didn't help.
At last, I resorted to https://github.com/phvu/cuda-smi to check my GPU. Still no idea.
I had to close everything and restart. It still failed! Then I changed cnmem=0.1 and it succeeded. What a joke! I realize my poor GPU is not meant for such deep learning computation; the memory management is far from optimized. Remember: Theano is at version 0.8, and CUDA is at version 8.0. They are still at an early stage. Note that NVIDIA just built the first AI supercomputer, which costs $129k!
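For context, cnmem (as I understand it) is the fraction of GPU memory that Theano pre-allocates through the CNMeM library, so on my 1024 MB card the settings I tried reserve roughly the following (a quick back-of-the-envelope calculation):

```python
# rough estimate of how much GPU memory each cnmem fraction pre-allocates
total_mb = 1024  # GeForce GT 650M, from "About This Mac"
for frac in (1.0, 0.9, 0.1):
    print("cnmem=%.1f reserves about %d MB" % (frac, total_mb * frac))
# e.g. cnmem=0.1 reserves about 102 MB
```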

Life is better with cloud services

The author of http://www.pyimagesearch.com/2014/10/06/experience-cudamat-deep-belief-networks-python/ didn't get an improvement after using the GPU, and suggested using a cloud service: