{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**CSC 466: Knowledge Discovery in Data**\n", "**Individual Test**\n", "\n", "**Task 1**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Name**: \n", "\n", "**Cal Poly Email**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The program below performs hierarchical clustering using Complete Link distance.\n", "\n", "**Your Task**: Perform the following:\n", "\n", " 1. Add to the notebook the computation of the Centroid Distance method for computing the distance between two clusters.\n", " The function getCentroid() is defined and stubbed below for your convenience.\n", " \n", " 2. Change the code of the hierarchical clustering function to use the Centroid distance. \n", " \n", " 3. Display the vector of cluster assignments constructed by the function flatten().\n", " \n", " 4. Plot the dataset colored by cluster (marking outliers with a separate color).\n", " \n", " 5. Write a function computeClusterRadius() which, given a cluster, returns its radius.\n", " \n", " 6. Report the radius of each cluster." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Notes:**\n", "\n", "Please read the comments in the existing code carefully: they may explain things you need to know in order to complete some of the tasks.\n", "\n", "I have created placeholders for the main functions you need to create and Jupyter cells at the bottom of the notebook for you to produce the desired output. You may need/want/find it convenient to define additional functions, or to change the parameters in the function definitions provided to you. Feel free to do both/either as you see fit." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Imports\n", "\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import seaborn\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Distance Metrics**\n", "\n", "We use Euclidean distance between two points for this assignment. For simplicity, we declare our own function." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def eucledian(x,y):\n", "    return np.sqrt(np.sum((x - y)*(x-y)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Complete Link Distance Computation**\n", "\n", "The code below is needed only to make the original hierarchical clustering code run." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Parameters\n", "## m: matrix of cluster distances\n", "## i, j: clusters that are being merged (new cluster is {i,j})\n", "## k: cluster to which the new distance needs to be computed\n", "\n", "## complete link is the largest distance between two points belonging to two different clusters\n", "## if a cluster i = {x1,...,xn} and cluster j = {y1,...,ym} are merged, then \n", "## the complete link distance from the new cluster ij = {x1,..,xn,y1,..,ym} to a cluster \n", "## k = {z1,...,zp} is the larger of the complete link distances between clusters k and i and clusters k and j\n", "\n", "def completeLink(m, i,j,k):\n", " # assume i maxHeight:\n", " maxHeight = clusters[i][0]\n", " idx = i\n", "    s = clusters[idx]\n", "    left = s[1]\n", "    right = s[2]\n", "    l = []\n", "    if len(left)==3: ## internal nodes have 3 entries (height, left, right)\n", "        l.append(left)\n", "    if len(right)==3:\n", "        l.append(right)\n", "    del clusters[idx]\n", "    clusters.extend(l)\n", "    \n", "    return clusters\n", " \n", " \n", "def getClusters(clusters, k):\n", "\n", "    s = clusters\n", "    while len(clusters) < k:\n", "        s = 
split(s)\n", "    \n", "    return s\n", " \n", "def flatten(cluster, assignment, label):\n", "    left = cluster[1]\n", "    right = cluster[2]\n", "    if len(left)== 3:\n", "        flatten(left, assignment,label)\n", "    else:\n", "        assignment[left[0]] = label\n", "    if len(right) == 3:\n", "        flatten(right,assignment, label)\n", "    else:\n", "        assignment[right[0]]= label\n", "    \n", "    return\n", " \n", " \n", "\n", "## takes the output of getClusters() function and assigns a cluster label to each data point\n", "## n: number of data points in the dataset\n", "def flattenClusters(clusters,n): \n", "\n", "    k = len(clusters) ## number of clusters.\n", "    \n", "    assignment = np.full((n), -1)\n", "    \n", "    for i in range(k):\n", "        flatten(clusters[i], assignment,i)\n", "    \n", "    return assignment\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Reading in the Dataset**\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "filename = \"data1.csv\"\n", "\n", "rawData = np.loadtxt(filename, delimiter = \",\")\n", "\n", "## let's keep only the two columns with the data attributes\n", "\n", "data = rawData[:,0:2]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's visualize the dataset**" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(data[:,0],data[:,1])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Perform Clustering\n", "\n", "distanceMatrix = computeDistanceMatrix(data, eucledian)\n", "clusters = hierarchical(data,distanceMatrix)\n", "\n", "# Get five clusters \n", "five = getClusters(clusters, 5)\n", "\n", "# Flatten those clusters into a vector of per-point cluster labels (-1 marks an outlier)\n", "flattened = flattenClusters(five, len(data))\n", "\n", "# Create a dictionary mapping each cluster label to the list of its data points\n", "clusters = {}\n", "for i, point in enumerate(flattened):\n", "    if point not in clusters:\n", "        clusters[point] = [data[i]]\n", "    else:\n", "        clusters[point].append(data[i])\n", "\n", "# Stack each cluster's list of points into a single 2-D array\n", "for c in clusters:\n", "    clusters[c] = np.stack(clusters[c], axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Task**: output the cluster assignment. 
\n", "\n", "We are looking for **five (5)** clusters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Task**: visualize the scatterplot of your cluster assignments" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cluster 1 with radius 1.5050503818586876: \n", "[[ 1.13914451 1.42127638]\n", " [ 2.05490501 0.94506381]\n", " [ 1.68718074 1.1498139 ]\n", " [ 1.84522469 1.83389662]\n", " [ 0.34916704 0.20757109]\n", " [ 0.71312273 2.29592454]\n", " [ 1.45120283 1.30024144]\n", " [ 1.34923173 1.50134826]\n", " [ 0.68673661 0.62518478]\n", " [ 1.00956961 0.73643864]\n", " [ 0.81778376 0.80011884]\n", " [ 0.59629252 1.25168355]\n", " [ 0.45972192 0.15943312]\n", " [ 0.70812176 0.12705347]\n", " [ 0.9877163 0.88087237]\n", " [ 2.13006473 0.58668393]\n", " [ 1.63906411 0.27282683]\n", " [-0.07439339 0.39514351]\n", " [ 0.53158587 0.30863246]\n", " [ 0.1284272 1.21343064]\n", " [-0.01671219 0.10049069]\n", " [ 1.44887643 2.36205833]]\n", "Cluster 4 with radius 1.0605860542418464: \n", "[[ 3.35485443 2.93165325]\n", " [ 4.14035686 3.33299415]\n", " [ 4.09297082 2.91941295]\n", " [ 4.0255524 2.0603712 ]\n", " [ 4.34792055 2.52103225]\n", " [ 3.97263885 3.35778194]\n", " [ 3.4400606 3.10399365]\n", " [ 4.39475248 1.84578503]\n", " [ 3.30182989 2.46335266]\n", " [ 3.24289466 3.4846477 ]\n", " [ 4.94876467 3.25362572]\n", " [ 4.28558271 2.65092888]\n", " [ 3.73850811 1.93814949]\n", " [ 3.96621709 2.54236809]\n", " [ 3.4944046 2.88808942]\n", " [ 3.65815333 3.01243996]\n", " [ 4.38450682 2.64807034]\n", " [ 4.16127113 3.18042862]\n", " [ 3.57659967 3.1872377 ]\n", " [ 4.42818966 1.85642644]\n", " [ 4.27975805 3.15859511]\n", " [ 4.72411876 3.31727824]\n", " [ 4.39687162 3.38838496]\n", " [ 3.32977397 2.65770565]]\n", "Cluster 5 with radius 1.5105212286442755: \n", "[[ 7.74178165 2.98820786]\n", " [ 7.23877304 3.11233171]\n", " [ 7.13997585 
2.19156537]\n", " [ 7.0069634 3.02368756]\n", " [ 7.05139949 2.63671635]\n", " [ 7.41742855 2.61724531]\n", " [ 6.48432308 1.81779005]\n", " [ 6.86508344 1.6115423 ]\n", " [ 6.2032856 3.50225275]\n", " [ 6.94687453 2.76920293]\n", " [ 7.04917701 2.89365498]\n", " [ 6.02489295 4.07673433]\n", " [ 7.44888086 2.53461879]\n", " [ 5.65779909 2.69738454]\n", " [ 6.28670454 3.03973959]\n", " [ 6.33782706 3.11854403]\n", " [ 6.93046716 3.36804613]\n", " [ 6.53349031 2.91376934]\n", " [ 7.10903004 2.07571773]\n", " [ 6.33939021 2.74087906]\n", " [ 7.03391515 2.41297226]\n", " [ 7.36649548 3.31924822]\n", " [ 6.30653573 3.01781923]\n", " [ 7.719528 3.05130484]\n", " [ 6.76350196 3.36047975]\n", " [ 7.15592462 3.05347983]\n", " [ 6.66273713 2.84304108]\n", " [ 7.33364818 3.12884973]\n", " [ 6.21583128 2.96251341]\n", " [ 7.5728735 2.78429215]\n", " [ 5.99485326 3.69010212]\n", " [ 7.42802257 2.88418269]\n", " [ 7.13471635 3.84256047]\n", " [ 7.53605744 3.32103687]\n", " [ 7.1819572 2.95003399]\n", " [ 6.86121822 2.37042466]\n", " [ 7.63642691 3.10655575]\n", " [ 7.2747744 3.09505402]\n", " [ 7.02453215 2.60155743]\n", " [ 7.76509682 1.94725476]\n", " [ 7.74914167 3.17197454]]\n", "Cluster 2 with radius 1.430299037770285: \n", "[[ 2.45473115 4.30576896]\n", " [ 1.36845274 6.10263427]\n", " [ 2.35347325 5.44650425]\n", " [ 2.35967528 5.57684932]\n", " [ 2.4758371 5.52198907]\n", " [ 1.68981927 5.08565087]\n", " [ 2.51901262 5.3235015 ]\n", " [ 3.13287714 6.61450036]\n", " [ 2.75038807 6.46789207]\n", " [ 2.29412747 6.86917004]]\n", "Cluster 3 with radius 1.6847625897254654: \n", "[[ 8.54991328 6.58307743]\n", " [ 7.78045274 5.54953381]\n", " [ 8.52654346 6.05561829]\n", " [ 8.43657199 6.21754073]\n", " [ 7.52139659 7.30566686]\n", " [ 8.92301108 5.84377781]\n", " [ 8.60768773 5.2297361 ]\n", " [ 8.00818768 6.31721167]\n", " [ 8.57580943 6.21368595]\n", " [ 8.25585671 5.20895508]\n", " [ 8.02082345 5.95926999]\n", " [ 7.71764847 5.72116412]\n", " [ 8.05906408 5.97456223]\n", " [ 
8.21513793 5.87175774]\n", " [ 7.76486087 7.01476363]\n", " [ 8.29286791 6.14634376]\n", " [ 8.59056135 4.89706758]\n", " [ 8.06248476 6.34177293]\n", " [ 6.84885249 7.05820496]\n", " [ 6.39545529 5.82832459]\n", " [ 8.19700245 5.666758 ]\n", " [ 8.05345784 6.58039393]\n", " [ 7.99653757 6.32082644]]\n" ] } ], "source": [ "def computeClusterRadius(cluster):\n", "    centroid = sum(cluster) / len(cluster)\n", "    \n", "    radius = 0\n", "    for point in cluster:\n", "        dist = eucledian(centroid, point)\n", "        if dist > radius:\n", "            radius = dist\n", "    return radius\n", "\n", "# This outputs all the data points and which clusters they belong to\n", "for c in clusters:\n", "    if c == -1:\n", "        continue\n", "    else:\n", "        radius = computeClusterRadius(clusters[c])\n", "        print(\"Cluster {} with radius {}: \\n{}\".format(c+1, radius, clusters[c]))\n", "if -1 in clusters:\n", "    print(\"Outliers:\\n{}\".format(clusters[-1])) " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## Use the array of colors below. Use black for outliers, other colors for the five clusters you constructed.\n", "\n", "colors=['black', 'blue', 'green', 'red', 'purple', 'brown']\n", "\n", "# Plot all the clusters: a distinct color for each cluster,\n", "# black for outliers\n", "color_index = 1\n", "for c in clusters:\n", "    if c == -1:\n", "        plt.scatter(clusters[c][:, 0], clusters[c][:, 1], color='black')\n", "    else:\n", "        plt.scatter(clusters[c][:, 0], clusters[c][:, 1], color=colors[color_index])\n", "        color_index += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**When you are done**: download the notebook and submit it using the following command:\n", "\n", "    handin dekhtyar 466-test " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }