Dashboard for Iris dataset clustering
This tutorial will guide you through using Stipple's low-code API to develop a web app for the classification of flower samples belonging to several species. We’ll be using the Iris Dataset, which is commonly used to showcase machine learning applications. In this dataset, we have tabular data describing 150 flower samples in terms of their petal length, petal width, sepal length, sepal width and species.
Our objective is to, based only on the petal and sepal dimensions, group together those samples belonging to the same species. This is an unsupervised learning problem that we’ll solve through clustering with the k-means algorithm. Further, we will deliver the solution as an interactive web app built with Genie Framework.
A Genie app is typically developed in three parts:
- Data processing logic: the code a data scientist would write to extract insights from data.
- Dashboard logic: the layer that connects data structures with the user interface.
- User interface: the interactive user-facing part of the app.
The tutorial will go into each of these steps and, by the end, we’ll have a beautiful dashboard like the one in the figure below. The full code can be found here.
Setup
Regarding the file structure of the project, it is very simple: inside
the project folder IrisClustering
we’ll have the main executable file
dashboard.jl
containing the data and dashboard logic, and ui.jl
containing the UI.
Project file structure
IrisClustering
├── dashboard.jl
└── ui.jl
Besides GenieFramework
, we require the following packages:
RDatasets
, DataFrames
and Clustering
. To add them to the Julia
install, enter package mode pressing “]” and input
add RDatasets DataFrames Clustering GenieFramework
Once the install is finished, we’re ready to start adding code.
Data processing logic
This first part implements the data gathering and analysis in our app.
The Iris dataset is readily available in the RDatasets.jl
package, so
let’s take a quick look at its contents with the code below.
#| echo: true
using DataFrames
using RDatasets: dataset
irisdata = dataset("datasets", "iris")
# show 5 random rows
idx = rand(1:nrow(irisdata), 5)
irisdata[idx,:]
Notice that the first four columns in the table have numerical values, which constitute the features in the dataset. The last column has categorical values, corresponding to the data labels. These take one of three values: virginica, setosa or versicolor.
We’ll use the k-means algorithm to classify the flower samples. This
algorithm takes as input the features of each sample, that is, the first
four columns in the data matrix, and groups together those flower
samples with similar features to obtain k=3
separate clusters. We hope
that k-means will give samples from the same species the same cluster
number, hence resulting in species classification.
For a ready-made implementation using the Clustering.jl
package,
create the script dashboard.jl
with the following code:
using RDatasets: dataset
using DataFrames, Clustering
const data = DataFrames.insertcols!(dataset("datasets", "iris"), :Cluster => zeros(Int, 150))
const features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]
function cluster(no_of_clusters = 3, no_of_iterations = 10)
# Put all feature vectors in a matrix, call kmeans to obtain cluster
# assignments and store assignments in data matrix
feats = Matrix(data[:, [c for c in features]])' |> collect
result = kmeans(feats, no_of_clusters; maxiter = no_of_iterations)
data[!, :Cluster] = assignments(result)
end
Note that we have added a column to the data matrix to store the result
of k-means. If we execute the cluster
function, we obtain
cluster()
data[idx,:]
The column data
now contains the cluster assignments made by k-means.
It seems that, for the shown samples, each species has a different
cluster assignment; this indicates a successful classification.
To explore the algorithm and the accuracy of its results, we will build a dashboard with the following: visualization of the clusters, and controls to rerun the clustering with different parameters.
Dashboard logic
The dashboard logic controls which variables are exposed to the user interface, and what code is executed on user interaction. The UI elements in our app, and the variables or arguments they depend on are:
- Plot showing the real species of each sample:
data
,features
. - Plot showing the assigned cluster number for each sample:
data
,features
. - Table showing the contents of the dataset:
data
. - Two selectors to choose the x and y axes of the plots:
features
. - Sliders to change the number of clusters and number of iterations of
the algorithm:
no_of_clusters
,no_of_iterations
.
Therefore, we’ll need to expose the listed variables to the UI. There are two macros to set the access properties of a variable:
@in
: the variable takes its value from an element in the UI.@out
: the value of the variable is shown in a UI element.
Furthermore, we have two other macros to control interaction:
@handlers
: sets up the server-side code that is run when values change.@onchangeany
: detects changes on a variable tagged with@in
and executes some code.
Putting all of this together, we implement the dashboard logic as
using GenieFramework
# set up automatic reload and other dev tools
@genietools
# make feature names readable from the UI
@out const features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]
# set up reactive code
@handlers begin
# variables taking their value from sliders and selectors
@in no_of_clusters = 3
@in no_of_iterations = 10
@in xfeature = :SepalLength
@in yfeature = :SepalWidth
# variables that hold plot and table data
@out datatable = DataTable()
@out datatablepagination = DataTablePagination(rows_per_page=50)
@out irisplot = PlotData[]
@out clusterplot = PlotData[]
@out title = "My Iris Dashboard"
# when any of the variables tagged with @in change, re-execute
@onchangeany isready, xfeature, yfeature, no_of_clusters, no_of_iterations begin
cluster(no_of_clusters, no_of_iterations)
datatable = DataTable(data)
irisplot = plotdata(data, xfeature, yfeature; groupfeature = :Species)
clusterplot = plotdata(data, xfeature, yfeature; groupfeature = :Cluster)
end
end
Besides re-calculating the cluster assignments when the parameters
change, the code in the @onchangeany
block updates the plots. Each
call to plotdata
returns a struct of type PlotData
, which is the
type accepted by the plotting functions in the UI code we’ll write
later.
This tutorial uses PlotData
objects for plotting. The new recommended way is to directly use PlotlyJS traces as explained here. You can find the updated code for this tutorial here
Now that the dashboard logic is implemented, we just need to add two more instructions to render the UI, and run the server:
@page("/", "ui.jl")
Server.isrunning() || Server.up()
The @page
macro takes as input the path to the web app, and the file
containing the UI definition. The final code in dashboard.jl
combining
the data and dashboard logic can be viewed in the code fold below.
dashboard.jl
+Â ÂFinally, we proceed to define the UI.
User interface
To implement the UI we don't need to write a single line of HTML or Javascript, it can be entirely done in Julia. The static UI elements are created by functions that generate the corresponding HTML code, as in
using GenieFramework
row(h5("Hello"))
"<div class=\"row\"><h5>Hello</h5></div>"
Furthermore, we have functions that generate a combination of HTML and Javascript for interactive UI elements. As a first example, let us look at the code for one of the plots:
cell(class="st-module", [
h5("k-means clusters")
plot(:clusterplot)
])
The code first defines a cell divider with CSS class st-module
; the
CSS code is automatically generated. The cell
generator function takes
as an argument the list of elements that it will hold; in this case a
header and the plot. The plot function’s only argument is the symbol
name of the variable of type PlotData
to be plotted.
As a second example, for one of the sliders we have:
cell(class="st-module", [
h6("Number of clusters")
slider( 1:1:20,
:no_of_clusters;
label=true)
])
The slider
function takes as arguments the number range, the symbol
name of variable to be changed, and the popup shown on user click.
The full UI is built by stacking the UI elements into an array and
converting the result into a string, as in the full code of ui.jl
shown below.
ui.jl
[
# insert contents of string var. using double mustache {{}} syntax
heading("{{title}}")
row([
# sliders
cell(class="st-module", [
h6("Number of clusters")
slider( 1:1:20,
:no_of_clusters;
label=true)
])
cell(class="st-module", [
h6("Number of iterations")
slider( 10:10:200,
:no_of_iterations;
label=true)
])
# selectors
cell(class="st-module", [
h6("X feature")
Stipple.select(:xfeature; options=:features)
])
cell(class="st-module", [
h6("Y feature")
Stipple.select(:yfeature; options=:features)
])
])
# plots
row([
cell(class="st-module", [
h5("Species clusters")
plot(:irisplot)
])
cell(class="st-module", [
h5("k-means clusters")
plot(:clusterplot)
])
])
# table
row([
cell(class="st-module", [
h5("Iris data")
table(:datatable; dense=true, flat=true, style="height: 350px;", pagination=:datatablepagination)
])
])
] |> string
Running the app
Now that we have finished coding the app, all is left to do is run it. Open the Julia REPL in the app folder and type in the following:
include("dashboard.jl")
Genie will render and serve the app, which can be accessed via http://localhost:8000.