practice SQL at http://sqlzoo.net/wiki/SELECT_basics
format SQL online: https://sqlformat.org/
format SQL in python: https://sqlparse.readthedocs.io/
SQLite tutorial: https://www.tutorialspoint.com/sqlite/index.htm
Recap1 is when I started to learn SQL 10 months ago by doing the Tournament project for the Intro to Programming Nanodegree.
However, I have to admit I almost forget everything, such as why I use vagrant and virtual box, why the reviewer insists not letting me using the view. Now I am using SQL again for Business Analyst Nanodegree, and it is a good time to look back and bridge my knowledge gap.
Recap 2 is in Data Analyst Nanodegree, where compared different SQL systems. Basically, SQL Server is sold by Microsoft and MySQL by Oracle, while the free meals are SQLite by D. Richard Hipp and PostgreSQL by UC Berkeley.
In my opinion, the starting point to learn should be SQLite. You just type
sqlite3
in the terminal to have fun. But Tournament project from IPND does it the hard way by using PostgreSQL. This requires you to build a virtual machine to run the server, which is through Vagrant and Virtual Box.Vagrant
The difference between Virtual machine and Docker is here.
Vagrant is an open-source software released in 2010. It is a configuration tool to help you set up the virtual environment. It was originally tied to VirtualBox and now has added support for VMware, KVM, Amazon EC2 or even Docker. The configuration information is written in “Vagrantfile” by Ruby.
You only need to remember 3 simple commands:
vagrant up # execute Vagrantfile to run virtual machine
vagrant ssh # enter the virtual environment
logout # exit
Follow project description and get the newest Vagrant file. With my Virtual box installed,
vagrant up
will automatically check the environment and download 1GB .vmdk file to my Virtual box folder.
After a few minutes, the virtual machine is installed and
vagrant ssh
brings me to an exciting world. The welcome message shows a Ubuntu 16 and shared directory is at /vagrant
.
Play around. I find python 2.7.12 and psql 9.5.6. Sadly no jupyter and it refuse to install that. No ipython but it suggest
sudo apt install ipython
But still impossible to use my favorite notebook even if I get ipython-notebook and firefox installed.
Get back to work by
psql
. Once inside, you will need:\c tournament # connect to xxx database
\i tournament.sql # import sql commands in .sql file
\d matches # describe xxx table
\dt # describe all the tables
\q # quit and have fun
Note this psql environment is a perfect place for experimenting sql queries where you can get immediate feedback.
tournament.sql
is used for setting up schema and database. While tournament.py
is where you define python function by SQL queries, and tournament_test.py
is testing these function and outputting results.
Because python needs to interact with the database, so both psql and python must in the same Ubuntu shared folders. In reality, you use 2 terminal windows to do the job.
Once you quit psql and
logout
, check vagrant status
. You will see the virtual machine is still running and your python SQL query still works! If you open Virtual Box, you will see it runs happily.
To end these, type
vagrant suspend
. You will see Virtual Box simultaneously close this virtual machine and the python side also quit automatically.sqlite implementation of tournament project
To refresh my memory and get it straight, I reimplement the project by SQLite in jupyter notebook. This step-by-step learning process will help clarify the concepts and expose the difficulties.
Codes are in my github.
Basically, there are 2 points worth mentioning in the refactoring:
- SQLite doesn’t have serial data type, but you can use
integer autoincrement
- the placeholder in the insert command in SQLite is also slightly different
- the first difficulty is in the
playStandings()
. There are actually two aggregations: one count for wins, another count for games. You either use a view/subquery or usecase when xx then x else x end
grammar to do the job - the last difficulty is in the
swissPairings
. First you need to create a view to get the ranking for these players, and then based the ranking on pairing them. The tricky thing is to use the hidden fieldrowid
provided by SQLite, while in psql it is Row_number something.
Other interesring tricks:
import sqlite3
conn = sqlite3.connect(":memory:")
c = conn.cursor()
# sqlite_master to get table and schema
c.execute("select name from sqlite_master where type = 'table';")
print c.fetchall()
c.execute("select sql from sqlite_master where type = 'table' and name = 'matches';")
print c.fetchall()
# description to get table column names
c.execute("select * from players;")
print [i[0] for i in c.description]
c.fetchall()
# use sqlparse to get beautiful format
import sqlparse
query = "create view rank as select id, name, count(winner) as wins from players left join matches on id=winner group by id order by wins DESC;"
print sqlparse.format(query,reindent=True)
Last note is the
fetchall()
returns a list of tuples and fetchone()
return one tuple.