--- title: "Tour in Loon" author: "Wayne Oldford and Zehao Xu" date: "`r Sys.Date()`" bibliography: references.bib fontsize: 12pt link-citations: yes linkcolor: blue output: rmarkdown::html_vignette: toc: true geometry: margin=.75in urlcolor: blue graphics: yes vignette: > %\VignetteIndexEntry{Tour in Loon} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} header-includes: - \usepackage{graphicx} - \usepackage{epic} - \usepackage{color} - \usepackage{hyperref} - \usepackage{multimedia} - \PassOptionsToPackage{pdfmark}{hyperref}\RequirePackage{hyperref} - \newcommand{\code}[1]{\texttt{#1}} - \newcommand{\ve}[1]{\mathbf{#1}} - \newcommand{\pop}[1]{\mathcal{#1}} - \newcommand{\samp}[1]{\mathcal{#1}} - \newcommand{\subspace}[1]{\mathcal{#1}} - \newcommand{\sv}[1]{\boldsymbol{#1}} - \newcommand{\sm}[1]{\boldsymbol{#1}} - \newcommand{\tr}[1]{{#1}^{\mkern-1.5mu\mathsf{T}}} - \newcommand{\abs}[1]{\left\lvert ~{#1} ~\right\rvert} - \newcommand{\size}[1]{\left\lvert {#1} \right\rvert} - \newcommand{\norm}[1]{\left|\left|{#1}\right|\right|} - \newcommand{\field}[1]{\mathbb{#1}} - \newcommand{\Reals}{\field{R}} - \newcommand{\Integers}{\field{Z}} - \newcommand{\Naturals}{\field{N}} - \newcommand{\Complex}{\field{C}} - \newcommand{\Rationals}{\field{Q}} - \newcommand{\widebar}[1]{\overline{#1}} - \newcommand{\wig}[1]{\tilde{#1}} - \newcommand{\bigwig}[1]{\widetilde{#1}} - \newcommand{\leftgiven}{~\left\lvert~} - \newcommand{\given}{~\vert~} - \newcommand{\indep}{\bot\hspace{-.6em}\bot} - \newcommand{\notindep}{\bot\hspace{-.6em}\bot\hspace{-0.75em}/\hspace{.4em}} - \newcommand{\depend}{\Join} - \newcommand{\notdepend}{\Join\hspace{-0.9 em}/\hspace{.4em}} - \newcommand{\imply}{\Longrightarrow} - \newcommand{\notimply}{\Longrightarrow \hspace{-1.5em}/ \hspace{0.8em}} - \newcommand*{\intersect}{\cap} - \newcommand*{\union}{\cup} - \DeclareMathOperator*{\argmin}{arg\,min} - \DeclareMathOperator*{\argmax}{arg\,max} - \DeclareMathOperator*{\Ave}{Ave\,} - \newcommand{\permpause}{\pause} - \newcommand{\suchthat}{~:~} - \newcommand{\st}{~:~} --- ```{r setup, include=FALSE, warning=FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center", fig.width = 6, fig.height = 5, out.width = "80%", collapse = TRUE, comment = "#>", tidy.opts = list(width.cutoff = 65), tidy = FALSE) library(knitr) library(magrittr) library(loon.tourr, quietly = TRUE) library(tidyverse, quietly = TRUE) library(MASS) set.seed(12314159) imageDirectory <- "./images/tours" dataDirectory <- "./data/tours" path_concat <- function(path1, ..., sep="/") { # The "/" is standard unix directory separator and so will # work on Macs and Linux. # In windows the separator might have to be sep = "\" or # even sep = "\\" or possibly something else. paste(path1, ..., sep = sep) } ``` ```{r library loon.tourr, warning=FALSE, message=FALSE, error=FALSE} library(loon.tourr) ``` ## Introduction A tour is a motion graphic designed to study the joint distribution of multivariate data [@asimov1985grand][@buja1986grand]. A sequence of low-dimensional projections is created by a high dimensional data set. Tours are thus used to find interesting projections. Between each plane, interpolation along a geodesic path is provided so that the points in a sub space (e.g. 2D) can be rotated smoothly. In mathematics, $\sm{X}_{n \times p}$ represents the original data set; $\sm{P}_{p \times d}$ is the matrix of projection vectors where $d < p$. \[\sm{Y} = \sm{X} \sm{P}\] where $\sm{Y}$ is the lower dimensional sub-space. The tour was **first** implemented in the software `Dataviewer` [@buja1986elements][@hurley1987data][@buja1987data] in Symbolics Lisp machine. A smoothly moving scatterplot could be produced to visualize the tour paths. Then, @swayne1991xgobi implemented software `XGobi` in the **X** Window System, providing portability across **a wide variety of workstations** i.e. X terminals, personal computers, even across a network. Software `GGobi` [@swayne2001ggobi][@cook2007interactive], **redesigned and extended** its ancestor `XGobi`. It can be embedded in other software, like environment `R`. Package `rggobi` [@rggobi] is an `R` interface of `GGobi`, however, it has been removed from the CRAN and can only be accessed from the [archive](https://cran.r-project.org/src/contrib/Archive/rggobi/). Package `tourr` [@tourr] implements geodesic interpolation and variety tour generation functions (i.e. grand tour, guided tour, etc) in `R` language. The function `animate` provides tour animation as a kinematic sequence of static displays -- plots are generated and then displayed quickly in order. In an RStudio display, for example, the user can cycle back and forth through the sequence ... but that is all. Unlike earlier tour implementations (`rggobi`) no interactive manipulation of the plot elements is possible. The loon package [@loon] is a toolkit that enables highly interactive data visualization. The package **loon.tourr** adds the full functionality of loon's interactive graphics to `tourr`. For example, this allows interactive selection, colouring, and deactivating of points in the tour display and linking that display with any other loon plot. Interesting projections discovered during the tour can be accessed at any point in the tour. In addition, random tours displaying more than 2 dimensions are also provided using parallel or radial coordinates and scatterplot matrices. ## Basics The `crabs` [@campbell1974multivariate] data frame (stored in package `MASS`) contains 200 observations and 8 features, describing 5 morphological measurements on 50 crabs each of two colour forms and both sexes. GIF 1 shows the tours and color represents the species, "B" (blue) or "O" (orange). ```{r data, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} library(MASS, quietly = TRUE) kable(head(crabs, 6)) ``` This plot is interactive. Users can pan, zoom or select on this plot. The default number of random bases is 30 and the steps between two serial projections are 40. So, we have 1200 + 1 (start position) projections in total. To navigate the tour, scroll the rightmost bar down, the projection is transformed from one to another. If, unfortunately, none of the projections is interesting, click the "refresh" button at the left-bottom corner. New random tours are created. Immediately below the plot, behind the "refresh" button, there are several radio buttons, "data", "variable", "observation" and "sphere" that represent the scaling methods $f$ of $\sm{X}$. * The first of which is "data", where $f(\sm{X}) = \sm{X}$; * If `scaling = "variable"`, $f(\ve{x}_j) = \frac{1}{a - b}(\ve{x}_j - b\ve{1})$ where $\sm{X} = [\ve{x}_{1}, ..., \ve{x}_{p}]$, $\ve{x}_{j}$ is a $n \times 1$ vector, $a = \max{(\ve{x}_j)}$ and $b = \min{(\ve{x}_j)}$; * The third is "observation", where $f(\ve{x}_i) = \frac{1}{c - d}(\ve{x}_i - d\ve{1})$, $\sm{X} = \tr{[\tr{\ve{x}_{1}}, ..., \tr{\ve{x}_{n}}]}$, $\ve{x}_{i}$ is a $p \times 1$ vector, $c = \max{(\ve{x}_i)}$ and $d = \min{(\ve{x}_i)}$; * If the scaling method is "sphere", where $f(\sm{X}) = \sm{X}^\star \sm{V}$ where $\sm{X}^\star = (\sm{I} - \frac{1}{n}\ve{1}\tr{\ve{1}})\sm{X} = \sm{U} \sm{D} \sm{V}$. Notice that the tour is based on the transformed data set \[\sm{Y} = f(\sm{X}) \sm{P}\] ```{r crabs 2D, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} color <- rep("skyblue", nrow(crabs)) color[crabs$sp == "O"] <- "orange" cr <- crabs[, c("FL", "RW", "CL", "CW", "BD")] p0 <- l_tour(cr, color = color) ``` ```{r crabs 2D gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 1: 2D grand tour"} include_graphics(path_concat(imageDirectory, "crab2D.gif")) ``` The projection dimension (the dimension of the $\sm{P}$) is controlled by `tour_path = grand_tour(d)` where $d$ represents the dimensions and the default is $2$. Unlike the 2 dimensional Cartesian coordinate, higher dimensional space ($>2$) will be embedded in the parallel coordinate or radial coordinate. In GIF 2, the $d$ is set as 4. ```{r crabs 4D, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} p1 <- l_tour(cr, tour_path = grand_tour(4L), color = color) ``` ```{r crabs 4D gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 2: 4D grand tour"} include_graphics(path_concat(imageDirectory, "crab4D.gif")) ``` An `l_tour` object is returned by `p1` (or `p0`). ```{r `l_tour` class, eval = FALSE} class(p1) # class(p0) # > [1] "l_tour" "loon" ``` The loon serialaxes plot for `p1` (or scatterplot for `p0`) can be accessed by calling `l_getPlots` ```{r `l_tour` getLoon, eval = FALSE} w <- l_getPlots(p1) class(w) # > [1] "l_serialaxes" "loon" ``` The matrix of projection vectors $\sm{P}_{4 \times 4}$ can be returned by function `l_cget` or a simple `[` ```{r query matrix of projection vectors, eval = FALSE} round(p1["projection"], 2) # > # [,1] [,2] [,3] [,4] # [1,] -0.12 -0.93 -0.01 -0.35 # [2,] -0.57 0.21 0.63 -0.36 # [3,] -0.06 0.14 0.14 -0.38 # [4,] -0.37 0.23 -0.76 -0.47 # [5,] -0.72 -0.14 -0.12 0.62 ``` The radial coordinate can be converted to a parallel coordinate by calling `[<-` in the console or trigger the "parallel" radio button of "axes layout" on `p1`'s inspector. ```{r parallel, eval = FALSE} p1["axesLayout"] <- "parallel" ``` Additionally, Andrews plot can be shown by runing the following code in console ```{r andrews, eval = FALSE} p1["andrews"] <- TRUE ``` The interactive graphics can be turned to static either by ```{r static grid, eval = FALSE} plot(p1) ``` or ```{r static ggplot, eval = FALSE} loon.ggplot::loon.ggplot(p1) ``` ```{r crabs andrews, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "Figure 1: Andrews curve"} include_graphics(path_concat(imageDirectory, "andrews.png")) ``` To investigate more about `loon`, please visit [great-northern-diver-loon](https://great-northern-diver.github.io/loon/index.html). ## Tour techniques @cook2007interactive introduced several methods for choosing projections. In `loon.tourr`, by modifying the argument `tour_path`, all can be realized. * Grand tour: a sequence of projections is chosen randomly ```{r gt, eval = FALSE} # Default, 2D grand tour pg <- l_tour(cr, tour_path = grand_tour(2L)) ``` * Projection pursuit guided tour: a sequence of projections is guided by an algorithm in search of "interesting" projections by optimizing a criterion function \[\argmax g(\sm{Y}), \forall \sm{P}\] where $\sm{Y} = \tr{[\tr{\ve{y}}_1, ..., \tr{\ve{y}}_n]} = \sm{X} \sm{P}$, $\ve{y}_j$ is a $p \times 1$ vector. + Holes: \[g(\sm{Y}) = \frac{1 - \frac{1}{n}\sum_{i=1}^n \exp(-\frac{1}{2}\tr{\ve{y}}_i\ve{y}_i)}{1 - \exp{(-\frac{p}{2})}}\] ```{r holes, eval = FALSE} # 2D holes projection pursuit indexes pp_holes <- l_tour(cr, tour_path = guided_tour(holes(), 2L)) ``` + Central Mass: \[g(\sm{Y}) = \frac{\frac{1}{n}\sum_{i=1}^n \exp(-\frac{1}{2}\tr{\ve{y}}_i \ve{y}_i) - \exp{(-\frac{p}{2})}}{1 - \exp{(-\frac{p}{2})}}\] ```{r cmass, eval = FALSE} # 2D CM projection pursuit indexes pp_CM <- l_tour(cr, tour_path = guided_tour(cmass(), 2L)) ``` + LDA \[g(\sm{Y}) = 1 - \frac{|\tr{\sm{P}}\sm{W}\sm{P}|}{|\tr{\sm{P}}(\sm{W} + \sm{B})\sm{P}|}\] where $\sm{B} = \sum_{i=1}^k n_i (\bar{y}_{i.} - \bar{y}_{..}) \tr{(\bar{y}_{i.} - \bar{y}_{..})}$, $\sm{W} = \sum_{i=1}^k \sum_{j=1}^{n_i} n_i (\bar{y}_{ij} - \bar{y}_{i.}) \tr{(\bar{y}_{ij} - \bar{y}_{i.})}$ and $k$ is the number of groups. ```{r PCA, eval = FALSE} # 2D LDA projection pursuit indexes pp_LDA <- l_tour(cr, color = crabs$sex, tour_path = guided_tour(lda_pp(crabs$sex), 2L)) ``` * Expect these, `tourr` also provides some other tour methods, for example: + `frozen_tour`, one variable is designated as the "manipulation variable", and the projection coefficient for this variable is controlled; + `little_tour`, a planned tour that travels between all axis parallel projections; + `local_tour`, alternates between the starting position and a nearby random projection and etc. ## Compound plot ### Facets GIF 3 shows the tour plot separated by variable sex into two panels. ```{r crabs facets, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} pf <- l_tour(cr, by = crabs$sex, color = color) ``` ```{r crabs facets gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 3: facets"} include_graphics(path_concat(imageDirectory, "facets.gif")) ``` ### Pairs plot GIF 4 shows the tour pairs plot (the upper triangle). If we set `showSerialAxes = TRUE`, a serial axes plot (parallel or radial axes) is displayed at the lower triangle. ```{r crabs pairs, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} pp <- l_tour_pairs(cr, tour_path = grand_tour(4L), color = color, showSerialAxes = TRUE) ``` ```{r crabs pairs gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 4: pairs plot"} include_graphics(path_concat(imageDirectory, "pairs.gif")) ``` Tour compound plot is a `l_tour_compound` object ```{r `l_tour_compound` class, eval = FALSE} class(pp) # or class(pf) # > [1] "l_tour_compound" "loon" ``` The loon pairs plot can be accessed by calling `l_getPlots`. It will return a `l_compound` widget that a list contains seven `loon` widgets. ```{r pairs objects, eval = FALSE} wp <- l_getPlots(pp) wp # > # $x2y1 # [1] ".l0.pairs.plot" # attr(,"class") # [1] "l_plot" "loon" # # $x3y1 # [1] ".l0.pairs.plot1" # attr(,"class") # [1] "l_plot" "loon" # # $x4y1 # [1] ".l0.pairs.plot2" # attr(,"class") # [1] "l_plot" "loon" # # $x3y2 # [1] ".l0.pairs.plot3" # attr(,"class") # [1] "l_plot" "loon" # # $x4y2 # [1] ".l0.pairs.plot4" # attr(,"class") # [1] "l_plot" "loon" # # $x4y3 # [1] ".l0.pairs.plot5" # attr(,"class") # [1] "l_plot" "loon" # # $serialAxes # [1] ".l0.pairs.serialaxes" # attr(,"class") # [1] "l_serialaxes" "loon" # # attr(,"class") # [1] "l_pairs" "l_compound" "loon" ``` Note that all the plots in a single compound widget share the same projection (e.g `pp['projection']`). ## Layers Sometimes, layering visuals in a tour is helpful to find an interesting "pattern". ### `l_layer_hull` A convex hull is the smallest convex set that contains it. We can add such layer by `l_layer_hull` ```{r crabs hull, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} # pack layer on top of `p0` l0 <- l_layer_hull(p0, group = crabs$sp) ``` The argument `group` is used to clarify the group of the set in each hull. Suppose we want to find a projection that splits the crab species well, layering the hull could be extremely useful. If two hulls have no intersections, this projection could be the one we are looking for. ```{r hull gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 5: layer hull"} include_graphics(path_concat(imageDirectory, "hull.gif")) ``` ### `l_layer_density2d` Two dimensional kernel density estimation can tell the dense of a distribution. ```{r crabs density2D, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} # hide the hull l_layer_hide(l0) # density2D l1 <- l_layer_density2d(p0) ``` ```{r density 2D gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 5: layer density"} include_graphics(path_concat(imageDirectory, "density2D.gif")) ``` ### `l_layer_trails` Display trails in tours ```{r crabs trails, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} # hide the density2D l_layer_hide(l1) # density2D l2 <- l_layer_trails(p0) ``` ```{r crabs trails gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 5: layer density"} include_graphics(path_concat(imageDirectory, "trails.gif")) ``` The trails can show the direction of the projection. Meanwhile, the lengths of tails represent the steps of each transformation. ### Build your own layers The interactive layers are realized by modifying function `l_layer_callback`. **It is a generic function and the class is determined by `label`**. For example, we want to create a static point layer and make it as a background of the object `pf`. ```{r non-interactive layer, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} allx <- unlist(pf['x']) ally <- unlist(pf['y']) layers <- lapply(l_getPlots(pf), function(p) { l <- loon::l_layer_points(p, x = allx, y = ally, color = "grey80", label = "background") # set the layer as the background loon::l_layer_lower(p, l) }) ``` ```{r non-interactive layer gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "Figure 2: non-interactive layer"} include_graphics(path_concat(imageDirectory, "tour_layer_non_interactive.PNG")) ``` When the tour changes, this static layer does not change correspondingly (in GIF 6). To make it interactive, we have to create the following `S3` function, by setting its class as **background**. The parameters of this function are * `target`: a **`l_tour`** object. * `layer`: a layer to be turned into interactive. * `...` includes that: + start: the start matrix of projection vectors $\sm{P}_s$ + initialTour: the activated panel's initial position $\sm{Y}_s$ where $\sm{Y}_s = \sm{X} \sm{P}_s$. + projections: A list of the activated panel's projection matrices. + tours: A list of the activated panel's tour path ($\sm{Y}$). + group: a factor defines the grouping of the activated panel. + var: scroll bar current variable. + varOld: scroll bar previous variable. + color: the activated panel's objects' (points' or lines') color. + axes: only for scatterplot, the guided axes. + labels: only for scatterplot, the guided labels. + axesLength: only for scatterplot, the guided axes length. + *If the layer is added for a `l_tour_compound` object* - l_compound: a `l_compound` widget - allTours: returns the tour path for all panels. - allInitialTour: returns the initial position for all panels. - allProjections: returns the projection matrices for all panels. - allColor: returns the color for all panels Notice that, there is no need to use all these to build an interactive statistical layer. For example, in function `l_layer_callback.background`, we only extract two parameters `allTours` (the tours for all panels) and `var` (the current scroll bar variable). ```{r interactive layer, eval = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center"} l_layer_callback.background <- function(target, layer, ...) { widget <- l_getPlots(target) layer <- loon::l_create_handle(c(widget, layer)) args <- list(...) # the overall tour paths allTours <- args$allTours # the scale bar variable var <- args$var # the current projection (bind both facets) proj <- do.call(rbind, allTours[[var]]) loon::l_configure(layer, x = proj[, 1], y = proj[, 2]) } ``` ```{r interactive layer gif, echo = FALSE, warning=FALSE, message=FALSE, error=FALSE, fig.width=4, fig.height=3, fig.align="center", fig.cap = "GIF 7: interactive layer"} include_graphics(path_concat(imageDirectory, "tour_layer_interactive.gif")) ``` After function `l_layer_callback.background` is created and executed, the layer is interactive. ## Reference