%matplotlib inline
from data_vis import load, run

load()

Automatic Parallel Coordinates Axes Sorting for Binary Classification

by Luke Plewa - June 11th, 2015

Introduction

The visualization community is divided over parallel coordinates. How can we demonstrate an effective use for them, supported by a metric?

In binary classification, we use a set of features to assign each sample in a dataset to one of two classes.

Parallel coordinates can help visualize this problem. The goal here is to identify features that, taken together, separate the two classes particularly well. In this way, we can remove unhelpful features and keep only the helpful ones.

This project uses a dataset of heart rates. Normal sinus rhythm (healthy) heart rates are shown in blue. Heart rates at the onset of sudden cardiac arrest (unhealthy) are shown in red. The dataset contains almost 300 samples, each with 10 derived features, stored in Python pickle files. All the features are normalized beforehand.

Binary Classification

Machine learning problem
Two classes (healthy and unhealthy heart rates)
Dataset = set of heart rate variability (HRV) derived features and class labels
Assign each sample to one of the two classes

Parallel Coordinates

Previous work uses clustering to derive the classes, but does not cover the case where the labels are already given
Visualize binary classification problem
Each axis represents a heart rate derived feature
Keep each class grouped together and the two classes as far apart as possible
Visualize helpful and unhelpful features

Examples

Red = onset of sudden cardiac arrest (unhealthy) heart rates
Blue = normal sinus rhythm (healthy) heart rates
Almost 300 samples
10 heart rate variability features
Sample features stored in Python pickle files for fast loading
Features are normalized beforehand
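The storage and normalization steps listed above can be sketched roughly as follows. This is only an illustration: the notebook does not say which normalization was used, so min-max scaling is an assumption here, and `min_max_normalize`, the feature values, and the in-memory buffer are all hypothetical stand-ins for the project's actual pickle files.

```python
import io
import pickle


def min_max_normalize(values):
    """Scale a list of numbers into [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw feature columns, one list per feature.
raw = {"mean": [60.0, 75.0, 90.0], "max": [100.0, 150.0, 125.0]}
normalized = {name: min_max_normalize(col) for name, col in raw.items()}

# Round-trip through pickle, using an in-memory buffer instead of a file.
buffer = io.BytesIO()
pickle.dump(normalized, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```

Normalizing once and pickling the result means each plotting run only pays for deserialization, which is presumably the "fast loading" benefit mentioned above.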

Contributions

The main complaint about parallel coordinates is how strongly the ordering of the axes affects the graph. Addressing this complaint, this project provides the following contributions:

A scoring metric for parallel coordinate plots made for binary classification.
Automatic sorting methods, three of four of which score higher than random sorting by this metric.
The use of cubic splines to show the average plot for each class.

The Scoring Algorithm

Background

Take the average of each feature for both classes
Each subplot (the area between two axes) is represented by a trapezoid between these average features
Scoring algorithm is modeled after the area of a trapezoid, because we want to maximize the area between these two classes
Area of a trapezoid = (height_a + height_b) / 2.0 * base
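As a small worked example of the background above, the sketch below averages each feature per class and computes the trapezoid area for one subplot. The helper names, the sample values, and the axis spacing of 1.0 are assumptions made for illustration; they are not taken from the project code.

```python
def class_means(samples, labels, target):
    """Average each feature over the samples whose label equals target."""
    rows = [s for s, label in zip(samples, labels) if label == target]
    return [sum(column) / len(rows) for column in zip(*rows)]


def trapezoid_area(height_a, height_b, base=1.0):
    """Area of a trapezoid with parallel sides height_a and height_b."""
    return (height_a + height_b) / 2.0 * base


# Hypothetical normalized data: two features, two samples per class.
samples = [[0.2, 0.4], [0.4, 0.6], [0.8, 0.5], [0.6, 0.9]]
labels = ["healthy", "healthy", "unhealthy", "unhealthy"]

healthy = class_means(samples, labels, "healthy")
unhealthy = class_means(samples, labels, "unhealthy")

# Gap between the class averages on the left and right axes of the subplot.
height_a = abs(unhealthy[0] - healthy[0])
height_b = abs(unhealthy[1] - healthy[1])
area = trapezoid_area(height_a, height_b)
```

With these values the class averages are roughly 0.4 apart on the left axis and 0.2 apart on the right axis, so the area between them is about 0.3.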

Implementation

For every subplot:

Let height_a be the difference of the average features between both classes on the left feature.
Let height_b be the difference of the average features between both classes on the right feature.
score += abs(height_a * height_b)

For any given graph, we want the highest score possible.
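The steps above can be sketched as a short function. This is a minimal reading of the description, not the project's own code: `ordering_score` and the per-class average values are hypothetical, and the feature names are borrowed from the outputs shown later only for readability.

```python
def ordering_score(order, mean_a, mean_b):
    """Sum abs(height_a * height_b) over every pair of adjacent axes."""
    score = 0.0
    for left, right in zip(order, order[1:]):
        height_a = mean_a[left] - mean_b[left]    # class gap on the left axis
        height_b = mean_a[right] - mean_b[right]  # class gap on the right axis
        score += abs(height_a * height_b)
    return score


# Hypothetical per-class averages of three normalized features.
scd_means = {"mean": 0.7, "sdsd": 0.2, "nn_50": 0.6}
nsr_means = {"mean": 0.3, "sdsd": 0.3, "nn_50": 0.1}

score_first = ordering_score(["mean", "sdsd", "nn_50"], scd_means, nsr_means)
score_second = ordering_score(["sdsd", "mean", "nn_50"], scd_means, nsr_means)
```

Here the second ordering scores higher (about 0.24 against about 0.09) because it places the two features with the largest class gaps, `mean` and `nn_50`, next to each other. Note that the score multiplies the two heights, whereas the trapezoid area adds them, so the metric rewards subplots where both axes separate the classes.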

Results

The average score for random sorting of the axes (based on 10 iterations) is 0.1550.
3 of the 4 sorting algorithms scored higher than this:
SCD increasing order = 0.2225
Absolute difference between averages increasing order = 0.1824
Normal sinus rhythm increasing order = 0.1750
Non-absolute differences between averages increasing order = 0.1046
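To make the four orderings and the random baseline concrete, here is a hedged sketch of how they might be computed. It is not the `run` function from `data_vis`: `sort_axes`, `random_baseline`, the sign convention for the non-absolute difference (SCD minus normal sinus rhythm), the fixed seed, and the feature averages are all assumptions for illustration.

```python
import random


def ordering_score(order, mean_a, mean_b):
    """Sum abs(height_a * height_b) over every pair of adjacent axes."""
    return sum(
        abs((mean_a[l] - mean_b[l]) * (mean_a[r] - mean_b[r]))
        for l, r in zip(order, order[1:])
    )


def sort_axes(method, scd, nsr):
    """Return feature names in increasing order of the chosen key."""
    keys = {
        "scd": lambda f: scd[f],                     # SCD class average
        "norm": lambda f: nsr[f],                    # normal sinus rhythm average
        "absolute": lambda f: abs(scd[f] - nsr[f]),  # absolute difference
        "normal": lambda f: scd[f] - nsr[f],         # signed difference (assumed sign)
    }
    return sorted(scd, key=keys[method])


def random_baseline(scd, nsr, iterations=10, seed=0):
    """Average score over several random axis orderings."""
    rng = random.Random(seed)
    features = list(scd)
    total = 0.0
    for _ in range(iterations):
        rng.shuffle(features)
        total += ordering_score(features, scd, nsr)
    return total / iterations


# Hypothetical per-class averages of four normalized features.
scd = {"mean": 0.7, "sdsd": 0.2, "nn_50": 0.6, "max": 0.9}
nsr = {"mean": 0.3, "sdsd": 0.35, "nn_50": 0.1, "max": 0.8}

scores = {m: ordering_score(sort_axes(m, scd, nsr), scd, nsr)
          for m in ("scd", "absolute", "norm", "normal")}
baseline = random_baseline(scd, nsr)
```

Comparing each entry of `scores` with `baseline` mirrors the comparison reported above, although the toy numbers here will not reproduce the notebook's values.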

# sorts the parallel coordinates in increasing order based on the average SCD class normalized feature values
run("scd")

['std_dev', 'min', 'mean', 'sdsd', 'median', 'rmsdd', 'max', 'outlier', 'nn_50', 'sdhr']
score: 0.22249022561954082

# Absolute sorts the deltas by absolute value,
# giving us the features in order of highest to lowest difference by average

run("absolute")

['sdhr', 'nn_50', 'outlier', 'max', 'min', 'median', 'sdsd', 'mean', 'rmsdd', 'std_dev']
score: 0.18245909806126992

# sorts the parallel coordinates in increasing order based on the average normal sinus rhythm class normalized feature values
run("norm")

['std_dev', 'min', 'median', 'sdsd', 'mean', 'rmsdd', 'max', 'outlier', 'nn_50', 'sdhr']
score: 0.17508011502406648

# normal gives us the sorting by delta averages,
# starting with red on bottom and ending up with blue on top
run("normal")

['sdhr', 'std_dev', 'rmsdd', 'sdsd', 'mean', 'median', 'min', 'max', 'outlier', 'nn_50']
score: 0.1046859827814546

# Randomly sort the axes
run(spline=True)
run()
run()
run()

['nn_50', 'max', 'sdhr', 'sdsd', 'min', 'outlier', 'median', 'rmsdd', 'std_dev', 'mean']
error: 0.1757012441236637

['std_dev', 'sdsd', 'median', 'nn_50', 'mean', 'rmsdd', 'max', 'sdhr', 'min', 'outlier']
error: 0.20959513422663908

['median', 'mean', 'outlier', 'sdhr', 'rmsdd', 'nn_50', 'std_dev', 'max', 'min', 'sdsd']
error: 0.17808626566417704

['sdhr', 'sdsd', 'rmsdd', 'nn_50', 'max', 'outlier', 'mean', 'min', 'median', 'std_dev']
error: 0.2199287998665183

run()
run()
run()
run()

['max', 'median', 'sdhr', 'mean', 'nn_50', 'std_dev', 'sdsd', 'min', 'outlier', 'rmsdd']
error: 0.15537826113197323

['outlier', 'std_dev', 'sdsd', 'nn_50', 'max', 'mean', 'rmsdd', 'min', 'sdhr', 'median']
error: 0.05961839227164985

['sdsd', 'outlier', 'sdhr', 'median', 'rmsdd', 'nn_50', 'std_dev', 'min', 'max', 'mean']
error: 0.05510635864324331

['rmsdd', 'std_dev', 'nn_50', 'median', 'min', 'sdhr', 'max', 'mean', 'outlier', 'sdsd']
error: 0.1864923946557482

Works Cited

Heinrich and Weiskopf. “State of the Art of Parallel Coordinates.” 2013. University of Stuttgart. https://classes.soe.ucsc.edu/cmps261/Fall13/papers/hcmarsh/StateXofXtheXArtXofXParallelXCoordinates.pdf

Liang Fu Lu, Mao Lin Huang, and Tze-Haw Huang. "A New Axes Re-ordering Method in Parallel Coordinates Visualization." School of Software, University of Technology, Sydney, NSW, Australia. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6406759&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6406759

Makwana, Tanwani, and Jain. "Axes Re-ordering in Parallel Coordinate for Pattern Optimization." Institute of Engineering & Technology Devi Ahilya University Indore, India. 2012. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.259.2315&rep=rep1&type=pdf

Hong Zhou, Xiaoru Yuan, Huamin Qu, Weiwei Cui, Baoquan Chen. "Visual Clustering in Parallel Coordinates." Computer Science & Engineering Department, The Hong Kong University of Science and Technology, Hong Kong. 2008. http://www.cse.ust.hk/~huamin/eurovis08_zhou.pdf

Ev-br. Parallel Coordinates Implementation. StackOverflow. 2011. http://stackoverflow.com/questions/8230638/parallel-coordinates-plot-in-matplotlib

Implementation

GitHub implementation available at:

https://github.com/luke-plewa/zagreus

Related Works

https://syntagmatic.github.io/parallel-coordinates/
https://eagereyes.org/techniques/parallel-coordinates
http://bl.ocks.org/jasondavies/1341281
