Tuesday, March 14, 2017

Plotly is eating the data science market

Plotly is my new love after dimple failed to draw a simple pie chart. It turns out that Plotly does the best job of making every data analytics tool (e.g., Python, R, MATLAB) immediately ready for web-based data visualization. If you have complicated data and want to publish your findings online, look no further: Python + Plotly is your best choice.
Alex Johnson, a co-founder of Plotly, has an interesting career path. He got his PhD in Physics from Harvard in 2005, with research on “Charge Sensing and Spin Dynamics in GaAs Quantum Dots” supported by a National Science Foundation graduate research fellowship. After one year as a postdoc, he went to the Harvard Environment Center for 3 years, where he tried to develop novel thin-film solid oxide fuel cells by applying semiconductor techniques such as microfabrication. He then spent 1 year at C12 Energy, where he designed and built a database and web interface for screening and forecasting enhanced oil recovery projects.
In 2012, he co-founded Plotly. Its core, plotly.js, is a JavaScript graphing library:
  • It is comparable in scope and features to MATLAB or Python’s matplotlib.
  • It uses D3 and WebGL as its rendering backend; jQuery is not needed.
  • It uses a JSON schema that focuses on the chart’s physical attributes and attempts to keep the chart data separate.
  • In contrast, the Vega and Vega-Lite schemas are more opinionated in prescribing how the chart data is grouped, sliced, or statistically processed before graphical display. This allows for complicated chart display with a concise JSON description, but leaves less control to the user. Neither approach is more “correct”; they’re just different.
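As a rough illustration of that difference, both styles can be written as plain Python dictionaries and serialized with the standard json module. The field names follow the public plotly.js and Vega-Lite schemas as I understand them; the data values and the helper name plotly_equivalent are made up for this sketch.

```python
import json

# plotly.js style: the data arrays live in "data" traces, while the title,
# axes and other physical attributes of the chart live in "layout".
plotly_figure = {
    "data": [
        {"type": "bar", "x": ["a", "b", "c"], "y": [2, 5, 3]},
    ],
    "layout": {"title": "hello world", "yaxis": {"title": "count"}},
}

# Vega-Lite style: the spec itself states how the raw rows are aggregated
# (here, the mean of "value" per "group") before anything is drawn.
vega_lite_spec = {
    "data": {"values": [
        {"group": "a", "value": 1},
        {"group": "a", "value": 3},
        {"group": "b", "value": 5},
    ]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "group", "type": "nominal"},
        "y": {"aggregate": "mean", "field": "value", "type": "quantitative"},
    },
}


def plotly_equivalent(rows):
    """Aggregate by hand (hypothetical helper), then build a plotly-style figure."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["value"])
    names = sorted(groups)
    means = [sum(groups[g]) / len(groups[g]) for g in names]
    return {"data": [{"type": "bar", "x": names, "y": means}], "layout": {}}


print(json.dumps(plotly_equivalent(vega_lite_spec["data"]["values"])))
```

With the Plotly schema the averaging step is the user's job, which is more typing but also more control over exactly what gets plotted.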
Since November 2015, plotly.js has been open source at https://github.com/plotly/plotly.js. Plotly's business model is to charge for the API used from Python, MATLAB, and R, similar to how the Google Maps API is charged. The community version is free for up to 50 API calls per day. More advanced plotting and more API calls are charged. The personal plan is $33 per month and the student plan is $5 per month.
Due to time constraints, I only had a quick practice with the “getting started” guide for each language. Many more APIs can be found here: https://plot.ly/api/


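<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<div id="myDiv"></div>
Basic Box plot
// two samples of 50 random values; the second is shifted up by 1
var y0 = [], y1 = [];
for (var i = 0; i < 50; i++) {
    y0[i] = Math.random();
    y1[i] = Math.random() + 1;
}
var trace1 = {
  y: y0,
  type: 'box'
};
var trace2 = {
  y: y1,
  type: 'box'
};
// 'myDiv' is the id of the <div> that holds the chart
Plotly.newPlot('myDiv', [trace1, trace2]);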

Python API

Install the Python API: pip install plotly
import plotly
# I use a free API key for now; the credentials are stored in the .plotly/.credentials file in the home directory
plotly.tools.set_credentials_file(username='jychstar', api_key='1GPp9Dwmnsf897Z3kX8Q')
plotly.tools.set_config_file(world_readable=False,
                             sharing='private')  # public, private, or secret
import plotly.plotly as py
import plotly.graph_objs as go
trace0 = go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 15, 13, 17]
)
trace1 = go.Scatter(
    x=[1, 2, 3, 4],
    y=[16, 5, 11, 9]
)
figure = go.Figure(
    data=[trace0, trace1],
    layout=go.Layout(title="hello world")
)

py.plot(figure, filename='basic-line')  # plot online
py.iplot(figure, filename='basic-line')  # plot inline
plotly.offline.plot(figure)  # plot offline in browser
plotly.offline.init_notebook_mode(connected=True)  # needed once per notebook before offline iplot
plotly.offline.iplot(figure)  # plot offline in notebook
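For a take-away file without the Python client at all, the same figure can be written as JSON into an HTML page that loads plotly.js from the CDN. This is only a sketch using the standard library, not what plotly.offline.plot does internally; the helper name write_plotly_html and the output file name are made up for this example.

```python
import json
import os
import tempfile


def write_plotly_html(figure, path, div_id="myDiv"):
    """Write a standalone HTML page that draws `figure` with plotly.js (hypothetical helper)."""
    html = (
        "<html><head>"
        '<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>'
        "</head><body>"
        '<div id="{div}"></div>'
        "<script>var fig = {fig}; "
        'Plotly.newPlot("{div}", fig.data, fig.layout);</script>'
        "</body></html>"
    ).format(div=div_id, fig=json.dumps(figure))
    with open(path, "w") as f:
        f.write(html)
    return path


# The same two line traces as above, written as plain dictionaries.
figure = {
    "data": [
        {"type": "scatter", "x": [1, 2, 3, 4], "y": [10, 15, 13, 17]},
        {"type": "scatter", "x": [1, 2, 3, 4], "y": [16, 5, 11, 9]},
    ],
    "layout": {"title": "hello world"},
}

out_path = write_plotly_html(
    figure, os.path.join(tempfile.gettempdir(), "basic-line.html")
)
print(out_path)  # open this file in a browser to see the chart
```

The page still needs network access to fetch plotly.js from the CDN; no Plotly account or API call is involved.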
If you are interested in what can be called from go, dir(plotly.graph_objs) gives 58 items such as Area, Bar, Box, Candlestick, Heatmap, Histogram, Line, Pie, Scatter, Trace.
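# placeholder labels and values for illustration; substitute your own data
labels = ['A', 'B', 'C', 'D']
values = [40, 30, 20, 10]
colors = ['#FAEE1C', '#F3558E', '#9C1DE7', '#581B98']
trace = go.Pie(labels=labels, values=values, marker={'colors': colors})
url_2 = py.plot([trace], filename='pie-for-dashboard', auto_open=True)  # returns the URL of the online plot
py.iplot([trace], filename='pie-for-dashboard')  # show inline in a notebook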


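With the fresh-baked module cufflinks, a pandas DataFrame can call Plotly methods directly.
import cufflinks as cf
import numpy as np
import pandas as pd
# placeholder DataFrame for illustration; substitute your own data
df = pd.DataFrame(np.random.rand(100, 4), columns=['A', 'B', 'C', 'D'])
df.iplot(kind='box', filename='box-plots')
df.iplot(kind='histogram', barmode='stack', bins=100, histnorm='probability', filename='histogram-binning')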
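To see roughly what such a one-line call saves, here is a sketch of the same idea by hand: one box trace per table column. It uses plain dictionaries instead of pandas, the function name box_traces is made up, and it is not how cufflinks is actually implemented.

```python
import random


def box_traces(table):
    """Build one plotly-style box trace per column of a dict-of-lists table."""
    return [
        {"type": "box", "y": list(values), "name": name}
        for name, values in table.items()
    ]


random.seed(0)  # fixed seed so the sketch is reproducible
table = {
    "A": [random.random() for _ in range(50)],
    "B": [random.random() + 1 for _ in range(50)],
}

figure = {"data": box_traces(table), "layout": {"title": "box-plots"}}
print([trace["name"] for trace in figure["data"]])
```

The resulting dictionary has the same data/layout shape that the Plotly clients send to plotly.js.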

matplotlib, seaborn

Use an existing plotting library and pass the figure handle to Plotly.
import matplotlib.pyplot as plt
import numpy as np
import plotly.plotly as py
py.sign_in('jychstar', '1GPp9Dwmnsf897Z3kX8Q')
x = np.array(range(1, 10))
y = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y)  # draw the curve on the axes before converting
plot_url = py.plot_mpl(fig)  # convert the matplotlib figure and publish it online
# try seaborn: I found that the filled areas are missing after conversion
import seaborn as sns
tips = sns.load_dataset("tips")
fig, ax = plt.subplots()
ax = sns.boxplot(x="day", y="total_bill", data=tips)
plot_url = py.plot_mpl(fig)

MATLAB API

Download and uncompress the Plotly MATLAB library (about 210 KB).
cd ~/Downloads/matlab_api/
plotlysetup('jychstar', '1GPp9Dwmnsf897Z3kX8Q')
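[X,Y,Z] = peaks;
contour(X,Y,Z,20);  % draw a figure first; a contour plot is assumed here, any figure works
fig2plotly()  % push the current figure online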
getplotlyoffline('https://cdn.plot.ly/plotly-latest.min.js') % download the offline plotly bundle
fig2plotly(gcf, 'offline', true) % generate html in current working directory
% try a simple plot
x = linspace(0, 5, 101);
plot(x, sin(x.^2), '-pb', 'linewidth', 2)


R API

Download the latest R from CRAN.
I happened to have R version 3.3.1 (2016.6.21). Because R 3.3.2 was released on 2016.10.31, I guess the R API was released after that.
R.version  # check the installed version; plotly requires R version 3.3.2
install.packages("ggplot2")  # plotly requires ggplot2 version > 2.1.0
install.packages("plotly")
library(plotly)  # also attaches ggplot2, which provides midwest and qplot
fig = plot_ly(midwest, x = ~percollege, color = ~state, type = "box")
plotly_POST(fig, filename = "midwest-boxplots")
# try a plotting library other than plotly
t = seq(0, 10, 0.1)
fig = qplot(t, sin(t), geom = "path", xlab = "time", ylab = "Sine wave")
plotly_POST(fig, filename = "sine_wave")

Plotly vs Xap

Feature                                      | Plotly                                | Xap
web-based visualization                      | yes                                   | yes
hosting                                      | final graph                           | data pipeline
web presentation layout                      | one full-size graph                   | customized
plotting tool                                | plotly.js + native plotting library   | Component + JS library
generate take-away html, png files           | yes                                   | no
Python, R, MATLAB work in native environment | yes                                   | no
open source                                  | yes                                   | ?
business model                               | community/advanced                    | consulting