Monday, May 1, 2017

Tournament project revisited, SQLite, vagrant

format SQL online:
format SQL in python:
Recap1 is when I started to learn SQL 10 months ago by doing the Tournament project for the Intro to Programming Nanodegree.
However, I have to admit I almost forget everything, such as why I use vagrant and virtual box, why the reviewer insists not letting me using the view. Now I am using SQL again for Business Analyst Nanodegree, and it is a good time to look back and bridge my knowledge gap.
Recap 2 is in Data Analyst Nanodegree, where compared different SQL systems. Basically, SQL Server is sold by Microsoft and MySQL by Oracle, while the free meals are SQLite by D. Richard Hipp and PostgreSQL by UC Berkeley.
In my opinion, the starting point to learn should be SQLite. You just type sqlite3 in the terminal to have fun. But Tournament project from IPND does it the hard way by using PostgreSQL. This requires you to build a virtual machine to run the server, which is through Vagrant and Virtual Box.


The difference between Virtual machine and Docker is here.
Vagrant is an open-source software released in 2010. It is a configuration tool to help you set up the virtual environment. It was originally tied to VirtualBox and now has added support for VMware, KVM, Amazon EC2 or even Docker. The configuration information is written in “Vagrantfile” by Ruby.
You only need to remember 3 simple commands:
vagrant up # execute Vagrantfile to run virtual machine
vagrant ssh # enter the virtual environment
logout  # exit
Follow project description and get the newest Vagrant file. With my Virtual box installed, vagrant up will automatically check the environment and download 1GB .vmdk file to my Virtual box folder.
After a few minutes, the virtual machine is installed and vagrant ssh brings me to an exciting world. The welcome message shows a Ubuntu 16 and shared directory is at /vagrant.
Play around. I find python 2.7.12 and psql 9.5.6. Sadly no jupyter and it refuse to install that. No ipython but it suggest sudo apt install ipython But still impossible to use my favorite notebook even if I get ipython-notebook and firefox installed.
Get back to work by psql. Once inside, you will need:
\c tournament  # connect to xxx database
\i tournament.sql  # import sql commands in .sql file
\d matches   # describe xxx table 
\dt      # describe all the tables 
\q   # quit and have fun
Note this psql environment is a perfect place for experimenting sql queries where you can get immediate feedback. tournament.sql is used for setting up schema and database. While is where you define python function by SQL queries, and is testing these function and outputting results.
Because python needs to interact with the database, so both psql and python must in the same Ubuntu shared folders. In reality, you use 2 terminal windows to do the job.
Once you quit psql and logout , check vagrant status. You will see the virtual machine is still running and your python SQL query still works! If you open Virtual Box, you will see it runs happily.
To end these, type vagrant suspend. You will see Virtual Box simultaneously close this virtual machine and the python side also quit automatically.

sqlite implementation of tournament project

To refresh my memory and get it straight, I reimplement the project by SQLite in jupyter notebook. This step-by-step learning process will help clarify the concepts and expose the difficulties.
Codes are in my github.
Basically, there are 2 points worth mentioning in the refactoring:
  1. SQLite doesn’t have serial data type, but you can use integer autoincrement
  2. the placeholder in the insert command in SQLite is also slightly different
  3. the first difficulty is in the playStandings(). There are actually two aggregations: one count for wins, another count for games. You either use a view/subquery or use case when xx then x else x end grammar to do the job
  4. the last difficulty is in the swissPairings. First you need to create a view to get the ranking for these players, and then based the ranking on pairing them. The tricky thing is to use the hidden field rowid provided by SQLite, while in psql it is Row_number something.
Other interesring tricks:
import sqlite3
conn = sqlite3.connect(":memory:")
c = conn.cursor()
# sqlite_master to get table and schema
c.execute("select name from sqlite_master where type = 'table';")
print c.fetchall()
c.execute("select sql from sqlite_master where type = 'table' and name = 'matches';")
print c.fetchall()
# description to get table column names
c.execute("select * from players;")
print [i[0] for i in c.description]
# use sqlparse to get beautiful format

import sqlparse
query = "create view rank as select id, name, count(winner) as wins from players left join matches on id=winner group by id order by wins DESC;"
print sqlparse.format(query,reindent=True)
Last note is the fetchall() returns a list of tuples and fetchone() return one tuple.