CSV uploader app

In this tutorial, we'll learn to build a simple app for exploratory data analysis. The app will let us upload CSV files and visualize the data distribution in a histogram, and we'll interact through selectors to choose the CSV file and column to be plotted. Take a look at the final app:

The full code and some sample CSV files can be downloaded from this GitHub repository.

Setup and app structure

We'll use GenieFramework.jl, DataFrames.jl, CSV.jl and PlotlyBase.jl to build the app. To install them, open the Julia REPL, enter Pkg mode with ] and run the following command:

pkg> add GenieFramework DataFrames CSV PlotlyBase

The app's code will have a main file app.jl for the logic, ui.jl for the UI, and a public/uploads folder to store the datasets:

CSVAnalysis
├── app.jl
├── ui.jl
└── public/uploads/

Implementation

Let's start by opening the app.jl file and adding the following skeleton code:

using GenieFramework, DataFrames, CSV, PlotlyBase
@genietools

@app begin
    # Logic goes here
end

@page("/", "ui.jl")

The @genietools macro sets up the Genie environment, the @app block holds the app's logic, and the @page macro serves the UI defined in ui.jl at the / route. For the ui.jl file, this will be the skeleton code:

 heading("CSV Analysis")
 row([ ])
 row([ ])
 row([ ])

The app has four components: an upload box, two selectors, and a histogram plot. To implement each component, we'll need to:

  • Define the component's UI in ui.jl.
  • Define the necessary variables and expose them to the UI.
  • Implement the component's behavior.

Upload box

The upload box lets us select a .csv file and store it on the server side. Add one to the page by editing ui.jl as follows:

 heading("CSV Analysis")
 row([
      cell(class="col-md-12", 
          [
          uploader( multiple = true,
              accept = ".csv",
              maxfilesize = 1024*1024*1, # bytes
              maxfiles = 3,
              autoupload = true,
              hideuploadbtn = true,
              label = "Upload datasets",
              nothumbnails = true,
              style="max-width: 95%; width: 95%; margin: 0 auto;",

              @on("rejected", :rejected),
              @on("uploaded", :uploaded)
              )

          ])
 ])
. . .

The uploader component takes many parameters to control its behavior; see its docstring for more information. It also emits several events when actions are performed, and these can be intercepted in the backend. We'll use the uploaded and rejected events to notify the user when the upload has finished or when the file is invalid.

When a file is uploaded, it is stored in a temporary folder on the server. The path to this file, along with other information, is stored in the reactive variable fileuploads. This variable is created automatically when the user loads the page. We'll attach a handler to fileuploads in order to manage the files. Update app.jl as follows, defining FILE_PATH above the @app block and the handlers inside it:


const FILE_PATH = joinpath("public", "uploads")
mkpath(FILE_PATH)
@app begin
    @out upfiles = readdir(FILE_PATH)
    @onchange fileuploads begin
        if ! isempty(fileuploads)
            @info "File was uploaded: " fileuploads
            filename = fileuploads["name"]

            try
                isdir(FILE_PATH) || mkpath(FILE_PATH)
                mv(fileuploads["path"], joinpath(FILE_PATH, filename), force=true)
            catch e
                @error "Error processing file: $e"
                notify(__model__,"Error processing file: $(fileuploads["name"])")
            end

            fileuploads = Dict{AbstractString,AbstractString}()
        end
        upfiles = readdir(FILE_PATH)
    end
    @event uploaded begin
        @info "uploaded"
        notify(__model__, "File was uploaded")
    end
    @event rejected begin
        @info "rejected"
        notify(__model__, "Please upload a valid file")
    end
end

When activated, the handler will move the uploaded file from its temporary location to FILE_PATH and notify the user. The __model__ passed to notify is the reactive model of the app, which maintains a representation of the reactive elements in the app. We've also added the upfiles variable to store the list of uploaded files, which will then be used to populate the file selector. Finally, the @event macro defines handlers for the received events.
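The file-handling step can also be tried outside the app. The sketch below is a minimal, standalone version under a few assumptions: store_upload is a hypothetical helper name (not part of GenieFramework), a temporary directory stands in for public/uploads, and the client-supplied name is passed through basename to drop any directory components, which is a precaution the handler above does not take.

```julia
# Hypothetical helper (not part of GenieFramework): move an uploaded temp file
# into dest_dir, keeping only the file-name part of the client-supplied name.
function store_upload(tmp_path::AbstractString, name::AbstractString, dest_dir::AbstractString)
    mkpath(dest_dir)                      # no-op if the folder already exists
    target = joinpath(dest_dir, basename(name))
    mv(tmp_path, target; force=true)      # overwrite an earlier upload with the same name
    return target
end

# Simulate an upload with temporary files
uploads_dir = mktempdir()
tmp_file = tempname()
write(tmp_file, "a,b\n1,2\n")
stored = store_upload(tmp_file, "data.csv", uploads_dir)
println(readdir(uploads_dir))
```

In the handler, the equivalent call would take fileuploads["path"] and fileuploads["name"] as its first two arguments.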

With this, you can already run the app to test it. Start Julia in the app's folder and enter:

using GenieFramework; Genie.loadapp(); up()

This will start a server and you can access the app at localhost:8000.

File and column selectors

Once we have uploaded a .csv file, we want to access its contents and choose one column for plotting. We'll make the choice through two selectors, which you can add to the second row in ui.jl as follows:

row([
     cell(
          class="st-module",
          [
           h6("File"),
           Stipple.select(:selected_file; options=:upfiles)
          ]
         ),
     cell(
          class="st-module",
          [
           h6("Column"),
           Stipple.select(:selected_column; options=:columns)
          ]
         )
    ])

The selectors are generated by Stipple.select, which takes as its first argument the symbol naming the variable where the selected value will be stored. The options keyword names the variable that holds the list of choices.

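Assuming the app comes with the Iris dataset preloaded, now define the reactive variables for each selector, with default values, inside the @app block (upfiles was already defined in the previous step and is repeated here for context):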

@in selected_file = "iris.csv"
@in selected_column = "petal.length"
@out upfiles = readdir(FILE_PATH)
@out columns = ["petal.length", "petal.width", "sepal.length", "sepal.width", "variety"]

Note that we tag the selection variables with @in as they'll be storing values sent from the UI.

When a new file is selected, the dataset must be loaded into memory and the list of columns that can be plotted must be updated. First, add a new private variable to store the dataset:

    @private data = DataFrame()

This variable is tagged with @private since each user needs their own copy, but the variable doesn't need to be sent to the browser.

Now, add a handler to load the data and column list when the file selector changes:

    @onchange isready, selected_file begin
        data = CSV.read(joinpath(FILE_PATH, selected_file), DataFrame)
        columns = names(data)
    end

We've also attached the handler to the isready variable, which becomes true when the page is finished loading, so that we won't start with an empty page with no selection.
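The loading step can be checked in the REPL without running the app. A minimal sketch, assuming CSV.jl and DataFrames.jl are installed and using a small made-up file written to a temporary folder instead of public/uploads:

```julia
using CSV, DataFrames

# Write a small, made-up CSV file to a temporary folder
dir = mktempdir()
path = joinpath(dir, "sample.csv")
write(path, "height,weight\n1.70,65\n1.82,80\n")

# Same two steps as in the handler: read the file, then list its columns
data = CSV.read(path, DataFrame)
columns = names(data)   # column names as a Vector{String}
println(columns)
```

Because names returns plain strings, the result can be assigned directly to the columns variable that feeds the column selector.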

Histogram

After selecting a dataset, we want to plot a histogram of one of its columns, and update it every time we make a new selection with the column selector. Genie uses Plotly for plots, which requires a trace to hold the plotted data and a layout to define the plot's aesthetics. See the plotting reference for more info.

Add the trace and layout via reactive variables as:

    @out trace = [histogram()]
    @out layout = PlotlyBase.Layout(yaxis_title_text="Count",xaxis_title_text="Value")

The PlotlyBase prefix is necessary to avoid clashes with GenieFramework.Layout.

Then, add a handler to update the trace when the column selector changes:

    @onchange isready, selected_column begin
        trace = [histogram(x=data[!, selected_column])]
    end
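Note that selected_column may briefly name a column that the loaded dataset does not have, for example right after switching to a different file. One possible guard is sketched below. Here histogram_trace is a hypothetical helper, not part of Genie or PlotlyBase, and falling back to an empty histogram for an unknown column is an assumption about the desired behavior.

```julia
using DataFrames, PlotlyBase

# Hypothetical helper: build the histogram trace for one column, or an
# empty histogram when the column is not present in the dataset.
function histogram_trace(df::DataFrame, column::AbstractString)
    column in names(df) || return histogram()
    return histogram(x=df[!, column])
end

# Made-up data for illustration
df = DataFrame("petal.length" => [1.4, 1.3, 4.7, 5.1])
trace = [histogram_trace(df, "petal.length")]       # trace holding four x values
fallback = [histogram_trace(df, "no.such.column")]  # empty histogram, no error
```

Inside the handler, this would amount to writing trace = [histogram_trace(data, selected_column)].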

Finally, add the plot to the last row in ui.jl as:

row([
     cell(
          class="st-module",
          [
           h5("Histogram"),
           plot(:trace, layout=:layout)
          ]
         )
    ])

That's it, we're done! Now you can run the full app and add some more functionality if you'd like. Here are some ideas:

  • Add a scatter plot.
  • Add other interactive elements such as sliders to change the parameters of the plot.
  • Add the clustering algorithm from the Iris tutorial and try it on different datasets.
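
As a starting point for the first idea, the sketch below builds a scatter trace from two columns with PlotlyBase. The data are made up, the variable names scatter_trace and scatter_layout are only illustrative, and wiring the trace into the app (a second column selector and another plot in ui.jl) is left out.

```julia
using DataFrames, PlotlyBase

# Made-up data with two numeric columns
df = DataFrame("petal.length" => [1.4, 4.7, 5.1], "petal.width" => [0.2, 1.4, 1.8])

# mode="markers" draws points only, without connecting lines
scatter_trace = scatter(x=df[!, "petal.length"], y=df[!, "petal.width"], mode="markers")
scatter_layout = PlotlyBase.Layout(xaxis_title_text="petal.length", yaxis_title_text="petal.width")
```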