Dashboard for Iris dataset clustering

This tutorial will guide you through using the low-code API to develop a web app for the classification of flower samples belonging to several species. We’ll be using the Iris Dataset, which is commonly used to showcase machine learning applications. In this dataset, we have tabular data describing 150 flower samples in terms of their petal length, petal width, sepal length, sepal width and species.

Our objective is to, based only on the petal and sepal dimensions, group together those samples belonging to the same species. This is an unsupervised learning problem that we’ll solve through clustering with the k-means algorithm. Further, we will deliver the solution as an interactive web app built with Genie Framework.

A Genie app is typically developed in three parts:

  1. Data processing logic: the code a data scientist would write to extract insights from data.
  2. Dashboard logic: the layer that connects data structures with the user interface.
  3. User interface: the interactive user-facing part of the app.

The tutorial will go into each of these steps and, by the end, we’ll have a beautiful dashboard like the one in the figure below. The full code can be found here.

Setup

Regarding the file structure of the project, it is very simple: inside the project folder IrisClustering we’ll have the main executable file dashboard.jl containing the data and dashboard logic, and ui.jl containing the UI.

Project file structure

IrisClustering
├── dashboard.jl
└── ui.jl

Besides GenieFramework, we require the following packages: RDatasets, DataFrames and Clustering. To add them to the Julia install, enter package mode pressing “]” and input

add RDatasets DataFrames Clustering GenieFramework

Once the install is finished, we’re ready to start adding code.

Data processing logic

This first part implements the data gathering and analysis in our app. The Iris dataset is readily available in the RDatasets.jl package, so let’s take a quick look at its contents with the code below.

#| echo: true
using DataFrames
using RDatasets: dataset

irisdata = dataset("datasets", "iris")
# show 5 random rows
idx = rand(1:nrow(irisdata), 5)
irisdata[idx,:]

Notice that the first four columns in the table have numerical values, which constitute the features in the dataset. The last column has categorical values, corresponding to the data labels. These take one of three values: virginica, setosa or versicolor.

We’ll use the k-means algorithm to classify the flower samples. This algorithm takes as input the features of each sample, that is, the first four columns in the data matrix, and groups together those flower samples with similar features to obtain k=3 separate clusters. We hope that k-means will give samples from the same species the same cluster number, hence resulting in species classification.

For a ready-made implementation using the Clustering.jl package, create the script dashboard.jl with the following code:

using RDatasets: dataset
using DataFrames, Clustering
const data = DataFrames.insertcols!(dataset("datasets", "iris"), :Cluster => zeros(Int, 150))
const features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]

function cluster(no_of_clusters = 3, no_of_iterations = 10)
  # Put all feature vectors in a matrix, call kmeans to obtain cluster 
  # assignments and store assignments in data matrix
  feats = Matrix(data[:, [c for c in features]])' |> collect
  result = kmeans(feats, no_of_clusters; maxiter = no_of_iterations)
  data[!, :Cluster] = assignments(result)
end

Note that we have added a column to the data matrix to store the result of k-means. If we execute the cluster function, we obtain

cluster()
data[idx,:]

The column data now contains the cluster assignments made by k-means. It seems that, for the shown samples, each species has a different cluster assignment; this indicates a successful classification.

To explore the algorithm and the accuracy of its results, we will build a dashboard with the following: visualization of the clusters, and controls to rerun the clustering with different parameters.

Dashboard logic

The dashboard logic controls which variables are exposed to the user interface, and what code is executed on user interaction. The UI elements in our app, and the variables or arguments they depend on are:

  • Plot showing the real species of each sample: data, features.
  • Plot showing the assigned cluster number for each sample: data, features.
  • Table showing the contents of the dataset: data.
  • Two selectors to choose the x and y axes of the plots: features.
  • Sliders to change the number of clusters and number of iterations of the algorithm: no_of_clusters, no_of_iterations.

Therefore, we’ll need to expose the listed variables to the UI. There are two macros to set the access properties of a variable:

  • @in: the variable takes its value from an element in the UI.
  • @out: the value of the variable is shown in a UI element.

Furthermore, we have two other macros to control interaction:

  • @handlers: sets up the server-side code that is run when values change.
  • @onchangeany: detects changes on a variable tagged with @in and executes some code.

Putting all of this together, we implement the dashboard logic as


using GenieFramework
# set up automatic reload and other dev tools
@genietools

# make feature names readable from the UI
@out const features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]

# set up reactive code
@handlers begin
  # variables taking their value from sliders and selectors
  @in no_of_clusters = 3
  @in no_of_iterations = 10
  @in xfeature = :SepalLength
  @in yfeature = :SepalWidth
  # variables that hold plot and table data
  @out datatable = DataTable()
  @out datatablepagination = DataTablePagination(rows_per_page=50)
  @out irisplot = PlotData[]
  @out clusterplot = PlotData[]
  @out title = "My Iris Dashboard"

  # when any of the variables tagged with @in change, re-execute 
  @onchangeany isready, xfeature, yfeature, no_of_clusters, no_of_iterations begin
    cluster(no_of_clusters, no_of_iterations)
    datatable = DataTable(data)
    irisplot = plotdata(data, xfeature, yfeature; groupfeature = :Species)
    clusterplot = plotdata(data, xfeature, yfeature; groupfeature = :Cluster)
  end
end

Besides re-calculating the cluster assignments when the parameters change, the code in the @onchangeany block updates the plots. Each call to plotdata returns a struct of type PlotData, which is the type accepted by the plotting functions in the UI code we’ll write later.

This tutorial uses PlotData objects for plotting. The new recommended way is to directly use PlotlyJS traces as explained here. You can find the updated code for this tutorial here

Now that the dashboard logic is implemented, we just need to add two more instructions to render the UI, and run the server:

@page("/", "ui.jl")
Server.isrunning() || Server.up()

The @page macro takes as input the path to the web app, and the file containing the UI definition. The final code in dashboard.jl combining the data and dashboard logic can be viewed in the code fold below.

dashboard.jl

+  

Finally, we proceed to define the UI.

User interface

To implement the UI we don't need to write a single line of HTML or Javascript, it can be entirely done in Julia. The static UI elements are created by functions that generate the corresponding HTML code, as in

using GenieFramework
row(h5("Hello"))
output
"<div class=\"row\"><h5>Hello</h5></div>"

Furthermore, we have functions that generate a combination of HTML and Javascript for interactive UI elements. As a first example, let us look at the code for one of the plots:

cell(class="st-module", [
  h5("k-means clusters")
  plot(:clusterplot)
])

The code first defines a cell divider with CSS class st-module; the CSS code is automatically generated. The cell generator function takes as an argument the list of elements that it will hold; in this case a header and the plot. The plot function’s only argument is the symbol name of the variable of type PlotData to be plotted.

As a second example, for one of the sliders we have:

cell(class="st-module", [
  h6("Number of clusters")
  slider( 1:1:20,
          :no_of_clusters;
          label=true)
])

The slider function takes as arguments the number range, the symbol name of variable to be changed, and the popup shown on user click.

The full UI is built by stacking the UI elements into an array and converting the result into a string, as in the full code of ui.jl shown below.

ui.jl

[
  # insert contents of string var. using double mustache {{}} syntax
  heading("{{title}}")

  row([
    # sliders
    cell(class="st-module", [
      h6("Number of clusters")
      slider( 1:1:20,
              :no_of_clusters;
              label=true)
    ])
    cell(class="st-module", [
      h6("Number of iterations")
      slider( 10:10:200,
              :no_of_iterations;
              label=true)
    ])
    # selectors
    cell(class="st-module", [
      h6("X feature")
      Stipple.select(:xfeature; options=:features)
    ])

    cell(class="st-module", [
      h6("Y feature")
      Stipple.select(:yfeature; options=:features)
    ])
  ])
  # plots
  row([
    cell(class="st-module", [
      h5("Species clusters")
      plot(:irisplot)
    ])

    cell(class="st-module", [
      h5("k-means clusters")
      plot(:clusterplot)
    ])
  ])
  # table
  row([
    cell(class="st-module", [
      h5("Iris data")
      table(:datatable; dense=true, flat=true, style="height: 350px;", pagination=:datatablepagination)
    ])
  ])
] |> string

Running the app

Now that we have finished coding the app, all is left to do is run it. Open the Julia REPL in the app folder and type in the following:

include("dashboard.jl")

Genie will render and serve the app, which can be accessed via http://localhost:8000.