{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**CSC 466: Knowledge Discovery in Data **\n", "** Individual Test**\n", "\n", "**Task 1 **" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Name **:\n", "\n", "**Cal Poly Email**:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The program below performs hierarchical clustering using Complete Link distance.\n", "\n", "**Your Task**: Perform the following:\n", "\n", " 1. Add to the notebook the computation of the Centroid Distance method for computing the distance between two data points.\n", " The function getCentroid() is defined and stubbed below for your convenience\n", " \n", " 2. Change the code of the hierarchical clustering function to use the Centroid distance. \n", " \n", " 3. Display the vector of cluster assignments constructed by function flatten()\n", " \n", " 4. Plot the dataset colored by cluster (marking outliers with a separate color)\n", " \n", " 5. Write a function computeClusterRadius(), which, given a cluster returns its radius.\n", " \n", " 6. Report the radii for each cluster" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Notes:**\n", "\n", "Please read carefully the comments to the existing code - they may explain things you need to know in order to complete some of the tasks.\n", "\n", "I have created placeholders for the main functions you need to create and Jupyter cells at the bottom of the notebook for you to produce the desired output. You may need/want/find it convenient to define additional functions, or to change the parameters in the function definitions provided to you. Feel free to do both/either as you see fit." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "## Imports\n", "\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import seaborn\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Distance Metrics**\n", "\n", "We use Eucledian distance between two points for this assignment. For simplicity, we declare our own function." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "def eucledian(x,y):\n", " return np.sqrt(np.sum((x - y)*(x-y)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Complete Link Distance Computation **\n", "\n", "The code below is simply to make the original Hierarchical code run." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "## Parameters\n", "## m: matrix of cluster distances\n", "## i, j: clusters that are being merged (new cluster is {i,j})\n", "## k: cluster to which the new distance needs to be computed\n", "\n", "## complete link is the largest distance between two points belonging to two different clusters\n", "## if a cluster i = {x1,...,xn} and cluster j = {y1,...,ym} are merged, then \n", "## the complete link distance from the new cluster ij = {x1,..,xn,y1,..,ym} to a cluster \n", "## k = {z1,...,zp} is the larger of the complete link distances between clusters k and i and clusters k and j\n", "\n", "def completeLink(m, i,j,k):\n", " # assume i<j\n", " sim = min(m[i,k], m[j,k])\n", " \n", " return sim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Assignment**: develop the centroidDistance() function in the cell below." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "## Parameters\n", "## data: the dataset (expect a numPy array or matrix)\n", "## cluster1, cluster2 - dendrograms representing the two clusters \n", "## \n", "## Output: Centroid distance between the two clusters represented by the dendrograms.\n", "## Note: leaf nodes in the dendrograms are indexes of points in the data array/matrix\n", "\n", "def centroidDistance(cluster1, cluster2, data):\n", " \n", " return 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Compute Distance Matrix **\n", "\n", "The computeDistanceMatrix() function below simply computes the pairwise distances between all data points.\n", "To enable proper work of hierarchical, the distance between all pairs [i,i] is set to 'NaN'\n", "\n", "This function is parameterized by the distance computation method. We are using Eucledian distance." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "## Parameters\n", "## data: numPy array of data points\n", "## distanceMetric: name of the function for computing the distance between two data points\n", "\n", "def computeDistanceMatrix(data, distanceMetric):\n", " ## size of distance matrix\n", " n = data.shape[0]\n", " \n", " distances = np.matrix([distanceMetric(x,y) for x in data for y in data]).reshape((n,n))\n", " \n", " for i in range(n):\n", " distances[i,i] = 'Nan' # make sure that diagonal distances to not interfere with computation\n", " \n", " return distances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hierarchical Clustering**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hierarchical clustering function hierarchical() below performs the hierarchical clustering using the hard-coded Complete Link distance between the clusters.\n", "\n", "It creates a dendrogram of the clusters by starting with a list [[1],[2],[3]....[n]], where each element of the list represents a singleton cluster, and then, on every step merging the two clusters deemed closest to each other and labeling the merged cluster with the intercluster distance. \n", "\n", "For example, if clusters [1] and [2] in the list above have a distence 0.2, and this is the shortest intercluster distance, on the next step the list of clusters will look as follows:\n", "\n", " [[0.2, [1],[2]],[3],...,[n]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Your task:**\n", " Change the function hierarchical() to work with the centroid distance function you developed.A" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "## canonize() ensures that the values in the s= (i,j) pair satisfy i<j\n", "def canonize(s):\n", " if s[0] < s[1]:\n", " return s\n", " else:\n", " return (s[1],s[0])\n", "\n", "## Paremeters\n", "## distances: matrix of distances\n", "## data: the raw dataset. Passed in case needed for distance computations\n", "\n", "def hierarchical(data, distances):\n", " \n", " size = distances.shape[0]\n", " m = np.matrix(np.copy(distances)) ## make a copy of the original distance matrix\n", " ## some distance computations may require it\n", "\n", " ## clusters is the data structure storing the dendrogram \n", " clusters = [[i] for i in range(size)] ## leaf level of the dendrogram\n", " \n", " for iteration in range(size-1): ## we merge two clusters size-1 times\n", " \n", " s = np.unravel_index(np.nanargmin(m),m.shape)\n", " s = canonize(s)\n", " \n", " height = m[s[0],s[1]] ## smallest distance between two clusters.\n", " \n", " clusters[s[0]] = [height, clusters[s[0]],clusters[s[1]]] ## create a merged node in the\n", " ## dendrogram in-place \n", " \n", " for i in range(m.shape[0]):\n", " if i != s[0] and i != s[1]:\n", " newSim = completeLink(m, s[0],s[1],i)\n", " m[s[0],i] = newSim\n", " m[i,s[0]] = newSim\n", " \n", " ## modify data structures: delete row and column from the distance matrix\n", " ## delete merged entry from the dendrogram\n", " m= np.delete(m,s[1],0)\n", " m= np.delete(m,s[1],1)\n", " del clusters[s[1]]\n", " return clusters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Cluster Output **\n", "\n", "The functions below are used to process the dendrogram created by hierarchical() and to retrieve from this dendrogram the assignment of points to clusters.\n", "\n", "getClusters() takes as input a dendrogram and a number of clusters, and produces as a result the list of dendrograms representing the requested number of clusters. \n", "\n", "**Note:** this method performs outlier removal - any singleton \"clusters\" detected during the process are removed\n", "\n", "split() is the helper function for getClusters(), it finds, in a forest of dendrograms, the dendrogram with the highest height, and splits that dendrogram into two parts, performing outlier removal as needed.\n", "\n", "flattenClusters() and its helper function flatten() work with the output produced by getClusters(). \n", "\n", "flattenClusters() returns an array of cluster assignments - the value in position j represents the cluster label for the jth point in the dataset. If this value is -1, the point is an outlier. Cluster labels for k clusters are 0,1,.., k-1." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "## Parameters\n", "## clusters: the dendrogram from Hierarchical clustering method\n", "## k: number of clusters to extract\n", "\n", "def split(clusters):\n", " \n", " size = len(clusters)\n", " \n", " maxHeight = clusters[0][0]\n", " idx = 0\n", " for i in range(size):\n", " if clusters[i][0] > maxHeight:\n", " maxHeight = clusters[i][0]\n", " idx = i\n", " s = clusters[idx]\n", " left = s[1]\n", " right = s[2]\n", " l = []\n", " if len(left)==3: ## check if leaf\n", " l.append(left)\n", " if len(right)==3:\n", " l.append(right)\n", " del clusters[idx]\n", " clusters.extend(l)\n", " \n", " return clusters\n", " \n", " \n", "def getClusters(clusters, k):\n", "\n", " s = clusters\n", " while len(clusters) < k:\n", " s = split(s)\n", " \n", " return s\n", " \n", "def flatten(cluster, assignment, label):\n", " left = cluster[1]\n", " right = cluster[2]\n", " if len(left)== 3:\n", " flatten(left, assignment,label)\n", " else:\n", " assignment[left[0]] = label\n", " if len(right) == 3:\n", " flatten(right,assignment, label)\n", " else:\n", " assignment[right[0]]= label\n", " \n", " return\n", " \n", " \n", "\n", "## takes the output of getClusters() function and assigns a cluster label to each data label\n", "## n: number of data points in the dataset\n", "def flattenClusters(clusters,n): \n", "\n", " k = len(clusters) ## number of clusters.\n", " \n", " assignment = np.full((n), -1)\n", " \n", " for i in range(k):\n", " flatten(clusters[i], assignment,i)\n", " \n", " return assignment\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Reading in the Dataset **\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "filename = \"/share/466-test/data1.csv\"\n", "\n", "rawData = np.loadtxt(filename, delimiter = \",\")\n", "\n", "## let's keep only the two columns with the data attributes\n", "\n", "data = rawData[:,0:2]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's visualize the dataset**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "<matplotlib.collections.PathCollection at 0x7f88a2b265c0>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X+QHOV5J/DvzszOzI52VjtajYyklSyQ0GuDkBDoQLIi\nMGJ9BAdxSsmHgLJTPnKEuos5cpWqlO344lzKcSV1qZSpulQqYBObC/iUApuSUz6TyEJYBgQYCQlh\neIVWxkJCeEer2V+aXzuze3+sZjU7293TPdM9/Xb391NFIe3M7rx6p/eZt5/3ed+3Y3p6GkREpK6Q\n2w0gIiJjDNRERIpjoCYiUhwDNRGR4hioiYgUF3Hih2Yy47aUkqRSCWSzOTt+lOexL+Zif1zGvpjL\nq/2RTic79B5TekQdiYTdboIy2BdzsT8uY1/M5cf+UDpQExERAzURkfIYqImIFMdATUSkOAZqIiLF\nMVATkTKKkxUMZXMoTlbcbopSHKmjJiKyojI1hT37T+LIiQwujBWxqCeGjWvT2L19DcIhjicZqInI\ndXv2n8S+X5yZ/fvwWHH27/cPrHWrWcrgRxURuao4WcGRExnNx46cOM80CBioichloxNFXBgraj6W\nHS9gdEL7sSBhoCYiVy3sjmFRT0zzsVQyjoXd2o8FCQM1EVlid2VGrDOMjWvTmo9tXLsYsU7/7d1h\nFScTicgUJyszdm9fA2AmJ50dLyCVjGPj2sWzXw86BmoiMsXJyoxwKIT7B9Zi162rMTpRxMLumOZI\nujhZMXzcrxoGaiGEALCn5ktXAfgzKeW3HGsVESmlUWXGrltX2xI4Y51hLEkl5n096HXWDQO1lFIC\nuB4AhBBhAGcB/NDhdhGRQsxUZmgFWLsEvc7a6kfR7QAGpZS/dqIxRKQmNyszWGdtPUd9L4DvN3pS\nKpWw7ZSFdDppy8/xg0Z9USiVkR0rItUTQzzq/+kHXhuXtaMvtm5Yjr0HT2l8fRn6l/U69rrnzl/E\nhXH90Xw42on04gVzvu63a8P0b7MQIgrgbgBfafRcu84rS6eTyGTGbflZXmfUF0HM3/HauKxdfbFj\ny0rk8qV5lRk7tqx09PUrkxUsSsYwrJF6SSXjqJQm57y+V68Now8XK8OuOwEcllL+puUWka2Cnr+j\n9jBbmWG3ap117TVeFZQ6ayvDrftgIu1B7cX8HbVbtTKjnQFy9/Y1GNjUj76eOEIdQF9PHAOb+gNT\nZ21qRC2ESAD4DICHnG0OWeX2bDxRO7g1mleFqUAtpcwB6HO4LdSE6my8Xv6O+ySQn+jVWfudP2ea\nAoT7JBD5n/9ruALArX0Sgrqcl9RWKJUxlM21dF2qdm0zUPtAu/N3QSwHJPVVr8tjg8PIZPNNXZeq\nXtsM1D7SrvwdywFJRXZcl6pe2xz+kCUsByQV2XFdqnxtM1ATAPObwfPYJFKRHdelytc2Ux8BZzUn\nx3JAUpEd16XK1zZH1AFXzckNjxUxjcs5uT37T2o+n+WApCI7rkuVr22OqAOs2c3geWwSqah6/R0b\nHEZmJI/eBTFcb/G6VPXaZqAOsGaXnwd9OS+pKRwK4bObP46x3CSKpTKyE0UcO3ke4VCH6fI6Va9t\nBuoAazUnF9TlvKSeUrmMv3zyMD4Ympjz9WbL61S7tpmjDjCVc3JEVmgF6VqNyuvMVj25hSPqgFM1\nJ0dk1niuhLMZ/SAN6KfyjKqeypVpZdIfDNQBp2pOjsisM0MTmJo2fk59Kq+6l8fzr53GC0c+nP16\nNVUiT48gV5hUZhk5AzUBUC8nR2RW/5JuhDpgGKyrqbz6EXRHh/bza9MoKiwjZ47aZqrnuoj8JpmI\nYnm6W/OxcAhzToKpXzfQaCRey81l5BxR20TVXbeIguBPf+8G/OWTh3E2M5MGCXUAH0sl8OUv3IBk\nVxSA8boBM9w8MYmB2iaq7rpFFATRSAT/84GbMJ4rYbw0hWQ0hGQiOuc5RusGzHBzGbmpoZ4QolcI\n8YwQ4l0hxDtCiC1ON8xLVN51iyhIojVzLfUpyOq6AS0hnVx1LTdLVs2OqB8F8BMp5eeEEFEAnHWq\nwQNmidxVTT0elkO4MF6anVzsq0lBVtcN1N75Vm3bsAxHT57HyERp3mOhDuDWjctdLVltGKiFED0A\nbgHwRQCQUpYAzP/XBJjKu24RBUF96rE6SVifgtRbN3DbxuX42Zsfzvu5ADA9Ddzx71Yof8LLVQAy\nAP5RCLEBwBsAHpFSXtT7hlQqgUjEnluEdDppy89x2tYNy7H34CmNry9D/7JeW17DK33RLuyPy4Lc\nF4VSGccGhw2fc2xwGF/cEUWuUMZDuzYAALJjRaR6YohHIyiUykinujCUzc/73nSqC6tX9SEedW9K\nz8wrRwDcAOBhKeWrQohHAXwZwP/Q+4ZsNmdL49LpJDKZcVt+ltN2bFmJXL4075N6x5aVtvwbvNQX\n7cD+uKyZvlDt8NZWDGVzyGgE2LnPyePh//UCRibmVmSNj06h2nPrV/dppkXWr+7D+GgejXq41T41\n+rA1E6jPADgjpXz10t+fwUygphpc4Ude4McyUqPUY63spRNa9Cqymt1OoR192jBQSyk/EkJ8IIQQ\nUkoJ4HYAv7Tl1X2IK/xIZX4sIzWaJDRSv+d6s4OtdvSp2XD/MICnhBDHAFwP4Ju2vDoRtY2Xy0gb\nrfjdvX0NBjb1Y1FyZuK+Wm7X2x3VfD6gfw5idbBlJki3q09NZcellG8C2GTLKxKRK7xYRmo2rVA7\nGg5HO5G/WEC+WEZXLIK/+O7rjlVktatPvZmUIiLLjBZ8qFpG2syZnksXL0AyEcWSVAKJeASJeKfm\nc+1YwNKuPmWgJgoIrx0UYUdaYc/+k5oHCqxY0m3LApZ29Sn3+vAhP5Vekb28dFBEq2kFo0CfK5RR\nrkwjbMNQtR19ykDtI3r5vJ3brsREbpKBmzxVRmq84jfWMK3QrvxxO/qUgdpH9MqEfn7sHIqlii9q\nZskeXigjNSq7u1iYxLMvDhpey+3e2sHJPuVvq08Y3eYVShVTEzFEqqmW3cWjc0eohdJUw2vZazl5\nIwzUPmFlr13Va2aJqsKhEHbduhqJmHZQbXQtVwN9X08coQ6gryc+58QXr2DqwycWdscQi4ZRKDUO\nwKrWzBJpGZ0oIjuuvWFno2vZSzl5IxxR+4q5A+BUrZkl0mJHrbKV1YYqYqD2idGJIgqlKVPPXb96\nkWcvWAoeP+Wam8VA7RMLu2Po0xl1VE8Zqu5/cGxwGE/vO4HKlLnATuQ2v+Sam8UctU8YlTItTy/A\nmcxF3VMviFTnVq5ZlcVjDNQ+orVCav3qRbqnX9Rv80ikulZrlc0GXtX27Wag9hGtUcfoRBEHjmif\nBcfqDwoKq4FXtX27maP2odoZbi/umEZkNyu78Km4bzcDtYsabYZuB86YU9BZDbxm9ghpN6Y+XNDu\n/JeXdkwjspvVzZnavUeIGQzULmh3/ssvq7OImmE18BpVULl1F2oqUAsh3gcwDqACoCyl5LFcTWp0\nG+ZkFYYXdkwjslszgVe1u1ArI+rbpJTnHWtJQHjx3Doir7MaeFW7C2Xqo81UzH8R+Z1qgdcqs4F6\nGsC/CiGmAfyDlPIxB9vkayrmv4iCwmz6T7UFLx3T0413XBNCLJNSfiiEWALg3wA8LKX8md7zy+XK\ndCTCgKOnUpnCEz96G4eOn8P5kTwW93Zh87qleGDHtQjbcYgbEbXk8efewt6Dp+Z9/e5tV+HBndc5\n9bIdug+YCdS1hBB/DmBCSvk3es/JZMat/VAd6XQSmcy4HT9KSVb2EfB7X1jF/riMfTFXq/1RnKzg\na48f0kxP9vXE8Y0Hb3bkzjedTuoG6obDNyHEAiFEsvpnAP8ewHH7mhdcXt8jl8iPvLrg5WMAfiiE\nqD7/aSnlTxxtFRGRS1Sc8G8YqKWUpwBsaENbiIhcp+KEP8vziIjqeHnBCxFRIKhWd81ATUSkQ5Vt\nF1i0S0SkOAZqIiLFMVATESmOgZqISHEM1EREimOgJiJSHAM1EZHiGKgDoB2nnRORc7jgxcdU2/yc\niJrDQO1j7T7tnIicwWGVTzU67ZxpECLvYKD2KRU3Pyei5jBQ+1R183MtPO2cyFsYqH2quvm5Fp52\n7i+FUplVPT7HyUQfU23zc7JXtarn2OAwMtk8q3p8jIHax1Tb/Jzsxaqe4DD9sSuECAshjggh/sXJ\nBpH9eNq5/7CqJ1is3B89AuAdpxpCROaxqidYTAVqIUQ/gN8B8G1nm0NEZrCqJ1jM5qi/BeBPACTN\nPDmVSiASsec2O5029ZKBwL6YK+j9sXXDcuw9eErj68vQv6zXhRZdViiVkR0rItUTQzza/qkwv10b\nDXtQCHEXgCEp5RtCiE+b+aHZbK7VdgGY6exMZtyWn+V17Iu52B/Aji0rkcuXcGxwGOdH8rNVPTu2\nrHStb1TYX8ar14bRh4uZj7qtAO4WQnwWQBxAjxDin6SUn7epfUTUhGpVz0O7ujD4/rASVT2sRHFG\nw484KeVXpJT9UspVAO4FsJ9Bmkgd8WhEiaoeVqI4h1XxRB5WnKzg3PmLSgRBVqI4x1KWX0p5AMAB\nR1pCRKbNyQWPF7Eo6f6qxGolyrBGsGYlSms4oibyoGoueHisiOnpy7ngPftPutYm7i/jHAZqIo9R\nORe8e/saDGzqR19PHKEOoK8njoFN/dxfpkXc64PIY8zkgpekEm1u1QzuL+MMjqiJPMYLqxK5v4y9\nGKjJNjztvD3M5oL5fvgHUx/UMhVWowWN0V7jfD/8h4GaWsbVaO1XmwsORztRKU3OjqSf3neC74fP\n8OOVWqJyBUItv6YBYp1hLF28YE66w4n3Q6///NqvquGImlrS7gqE4mQFoxNFJBd2zfuaVoVB0NIA\ndr8fev33uU9fhWcOnApMv7qNgZpa0q7VaPUBI53qwnVXLcI0gKPvndcNFs2kZYwCv+rsfj/0+k+e\nHsEHQxPzvg6YS6/o9bGX+95JDNTUkmoFQu0vc5Wdq9HqA8ZQNo+fvnF2znPqg0WjNMCuW1fPaZ/Z\n0bfKwcTO98Oo/85mJjS/rtWvtThCbw4DNbXM6dPOjQKGlmqwsJoGaDT69koaxa73w6j/pqa1v6dR\nesWpEbrfMVBTy5pZjWZlVGoUMLRUg4WVNICZ0fezLw56oprCrtWBRv0X6tAO1kbpFSdG6EGhzjCA\nPM/MarTK1BSe3ncCX3v8EL7yD4fwtccP4el9J1CZmtL9HqOVeFqinWFEO8MYnShi/ZrFms+pTwM0\nGn1nRvKeqG6p1erqQKOFNcvT3ZpfN0qvtDJCB4JdYcIRNbVFdQT9/Osf4IXDl3PLZkalRnlXLYVS\nBV997BCKpQpSyShWLOlGrjCJ7HhRNw3QaPSN6Wll99ewwmp+XS+NcjmnbC69UpysoDRZsTxC71kQ\nRbQzjKf3nVA+5eQkBmpyVG1ed3isiFCH9vMa3eLWB4zFvV24ZlUvXnn7NyiW5o/GC6WZUdeF8RIu\njJdw28ZluOOmlboBqtEkXDqV8PRey83m143SKGbSK2bf/+Xp7jk56qqRiRK++tgrKNS8x6qmnJzE\nQE2Oqp88anYSqj5grF7Vh8H3h/HikXOm2nFs8ALu2X614SjSaBIuHAq1pbrFKU/ve8/ynUytahrF\n7NeBmVH0/3le4uXjH81+Tev9X7GkG3/6ezfMjtCHxwpzHi9ofBADwcpfM1CTY6xUayxcEENXrPHl\nWA0M8WjEMF1Rz0x6otEknNPVLU6oTE3h6X87gQNHPtR83K5gV5tSiYQ75oyiG8kVypie7sD9A2ux\n41Or8OdPvI6siWO7vJRyalXD3wwhRBzAzwDELj3/GSnl151uGHmflWqN7EQRf/Hd1y3lHq3krq2k\nJ/RGiV7ba7k4WcH3nn8Xh47/Rvc5rQY7rZRKIt6pmcYw04Z8sYwRk2creiHlZBczI+oigO1Sygkh\nRCeAnwsh/p+U8pDDbSOP64pF0Nsd0xwdaU0eNZN73LntKuQLZbx7OovseBHRzvBsfrqWnekJo9t9\nFdTnhY20Guy06qLNjKL12mDlLum61YuU/qC0U8NALaWcBlD9eOy89J9OppFobqDQu4Xdsu5jePtX\nWYxMlOY9ZuZ2vFKZmlMJkEpGsfnaK7D79jX40Uvveyo9Ybf64Gmk+gHWzGpLqwuR9IiVvbN/tnKX\nVNT4QPYrUzlqIUQYwBsA1gD4Oynlq0bPT6USiETs+aRLp5O2/Bw/8EpfPP7cW7q/aKEQMDUFvPPr\nEc0gDczcCoejnUgvXmD6NS6Ml/Dy8Y+wOJXAI/fdiEKpjOxYEameGOJR/0/FVK+NQqmMY4PDpr7n\n1huW47/s2oDv/fgdHDp+DpmRPNK9Xdi8bike2HEtwmHj9NO58xdxYdza6LlWOAREOyN45e2PcPLs\n6OzrfumejUh0RXHo+DkMZfO633/y7CiSC7s031+v/K6YZeoKllJWAFwvhOgF8EMhxDop5XG952ez\nOVsal04nkcmM2/KzvM4rfVGcrOClo2d1H6+uazHKXaeScVRKk7r/3uJkBYeOa1d7vHT0Q9x50wrE\nOsOIABgfzUP9XmtN7bUxlM0hYxDcqmKdIdx72xr8/bNH5+2hsvfgKeTypcald5MVLEqaS1PUWtQT\nQywSxrkLOeSL5Xmve//AWuzcugp33rQCj+99G4ffO6/5c4ZHCxh8f3heGsorvyv1jD5cLFWLSylH\nABwA8NutNYn8yupyby2N8smjE0VkRrSDUe1KtiAyu4rzt9YvBQDd1MXPj51ruHrUaOWing4A//V3\n16FU1k5bHDlxHuO5EoYuDfYeuOsaxKPaYYqTiTWEEGkAk1LKESFEF4ABAH/teMvIk6xMBlWlumMY\nvai/alDrNdK9XZq3xUH65dXSKMe7qCeGT65M4XdvMd60qlCqzE7KGk3y7tx2JS6MFnRHvfNfP45o\nOKT7usNjBXz9idcwOlGaXZTzqeuWYv8b8+/SvFC/bhczqY+lAL53KU8dAvDPUsp/cbZZ5FVWl3v3\n9cTxZ1/chHyxbHoiK9YZxuZ1S7H34Kl5jwXpl1ePVr33dasXoViqQJ7O4uXjH+Hd01msX7MYqWQU\nF8a15wrq1U7yaq041FvMVKvRKk8As3MX1Q+I229cjoFN/YGeIDZT9XEMwMY2tIV8QitQJOIRzdra\njWsXI5mIIpmIWnqNB3Zci1y+5Mtf3lb3u9aq9372xUEcePvyopfhsSJeOHwWK5Z0mw7UtfXOZlec\nxqNhlCYrpld5annzvWF848GbPVO/7gT/T4dT24VDIey6dTVu2bAMmJ5GOpWoWa1mT2ANh40Xn9QH\nO5U3+6+ye7/rar23URldrjCJ2zYuw7HBC8iOF9DbHUOuWNasRa+mlYx+XqgDmJ7GbNt3brsSE7nJ\nhqs8Fy7QrrcH5n5AqFy/7iQGarKVUbBxYlVf/eKT+tdPJaNY0BVFrjCp/M5rTp3mbryFaxF33LQS\n92y/es7o22hPk6FsruF2pdPTM3+IdYaRSHXOe1511L/jU6twZmgCS1Jd+KunDnt20yunMVCTrRoF\nG6dX9dW/fnX3PL32qMLqsWFWmDlAofZ9abSniZkJ4wvjJcN+1lt6rvUzOe/AgwPIRo2CjR0bvlc3\njy+UypZe36n22MXMsWHNMiqj0wqC1dHuNx68Gd/8g834xoM34/6BtbN3IFbK8vT6ufqBOjxWxDRm\nPkA/GJrAiiXd6OuJI9QxM9E8sKnfF/MOreKImmxj9YxCK7ROIV+/um9OCsPSJlCK7bzm9Gnuzez8\nZ3T3U/2+wzJjuDrxwtj8fjb6QM2M5PHNP9iM0mRF6fmEdmOgJts4GWy0TiGvv7W2UsOtWt7T6dPc\n7d75r/rzbtmwDF//zmu6m/8s7I7O6+dG9dvPHhjE7991TdNt8yOmPsg2kXAHEvH5E0dAa8HGbErF\nyi25innP3dvXYGBTv6O3/q2eo1gv3dtluBJy49Xz+3lhdwyppH455runs0qlpVTAETXZZs/+k5q1\n0iuWdLcUbKykVHZvX4N8oYyXak4Vqbd13RVK5j29st91fanj9Vcvxk81Vg72L1mA+z8zfyIx1hnG\nJz6+aM7JL7Wy40Wl0lIqYKB2kBdqd+1iXKtbRrkyjQabsemyklIJh0L4/B0C7/z6guZCjkXJGD5/\nh1CuNK+Wqvtd65VeVqa1Ex9rV/Tq9vP9n7kah09kDOu16TIGagfYvXDBC5ycSLSav411hnGDWKL5\n/BtE2vcfmk7RK73U2zTp6HvD+I+frmj2dyLWid9av9SzZ1C2W2ADtZOjXacWLqis3VULi3svV32Y\neb6flpi7weiOSe/w2UYf0HyPzAtcoHZ6tOvkwgWVtbtqYfWqPoyP6u+77JV8r1c0s31tow9ovkfm\nBS5QOz3adTIFoLp2jJBqTyE3szW8qvlerzG6Y4pHtc+pXL+mz1QA5nvUWKACdTtGu06nAFTGEZJ/\nGd0xbb3uCnR0dNR8QM8sBz/6XgYHDp8NxByN0wIVqNsx2nU6BeAFHCH5k9EdU3XHxNGJIp5/7TRe\nODJ3S1W/z9E4LVCBul2jXU6SkB81umOKdYaxsDume7iun+donBaoQN2u0S5TAM4rlMoYyubYty4w\numMK8hyNkwIVqIH2jnaZArBftWrn2OAwMtk885+K6U5EEYuGNEv2/D5H4yQzh9uuAPAkgCsATAF4\nTEr5qNMNcwpHu94WxBp1L3nu4CnduuqgzNE4wcwQpAzgj6WUnwSwGcAfCiE8v7WV3ZvTkD2q+01r\nbcrTjv2uqXlG7088GsbObVfa8hp614efmTnc9hyAc5f+PC6EeAfAcgC/dLhtFCBmFiIx/6k2o/en\nNFnBRG4SiZj27oqNBHFbhlqW/oVCiFWYOZH8VUdaQ4GldeLHvl+cwZ79J2efU63a0cL8p/ucfH/M\nXB9+ZnoyUQjRDeBZAH8kpRwzem4qlUAkYtN+t+mkLT/HKwqlMrJjRaR6YohH5749fu2LQqmsW9J1\nbHAYD+3qmu2LrRuWY+/BU/Oet3XDMvQv6zV8jY+GcwCmcUXfgnl963WqXBvNvj9GrFwfVar0h11M\nXa1CiE7MBOmnpJQ/aPT8bDbXarsAzHR2JmNmobD3Nbq183NfDGVzyGS19+04P5LH4PvDsymNHVtW\nIpcv4djgMM6P5GerdnZsWanZP5WpKXz/p+/h5bfOzU5yxaNhbL3uCtx7+9W+uG1u5dqwe3Oy6vtT\nX1Wl9/6YYeX6ALwbN4w+XMxUfXQA+A6Ad6SUf2tju6hGkKsZrO43ff/AWjy0qwuD7w83DDB79p/E\n/rpN7QulCn76xll0dHT4vm/1OJXzdaKqKsjbMlSZeUe2AvgCgO1CiDcv/fdZh9sVKEGvZrB6SjYA\nxKORhlU7xckKDssh3cePnMj4vm/1OJ3ztbOqqpnrw2/MVH38HEBHG9oSWGaqGfrb3KZ2c2Ih0uhE\nUfOUl6oLAT3yyYtb8QZ9WwZ/zah4FG/tHLxlTkZ1g/WiZCwQfVvPi2WOQV+o5v2ZFB/grd1ldt8y\n3yCW6D6+cW0wj+VqtYzOzUUnQV2oxhG1IoJ+a+eU3dvXYGp6Gi+/9dHs5vbVqo+g9m2zm5MFfdGJ\nmzqmdU4QbkUmM27LD/VqmU0r9MqlgtgXRqz2R3Gygkw2B3R0IN3b5asRWTPXxuWgq723tJan953Q\nDO4Dm/qVqp7x6u9KOp3UnQvkiNpGdtSkcsc9Z8Q6w+hf4q9FEK2wmvP14gSknzBQ24C3hORVZgcG\nXpyA9BNGEYu0JlKCvg8B+R/3WXEXR9Qm6Y2ad267kreE5Hs8C9RdDNQm6S3xzhXKvCWkQGBlknsY\nqE0wmkh599fZwC9WoWAI+qITNzFHbYLRRMrIRBGfWJnSfGz9mj6MThQDu58E+VNQF524KTAj6lZK\n5xot8b7vM2vRFY/U3BLGkIh34uh7GRw4fJZVIETUEs8E6mYDrR2lc40mUhKxyJxbwudfO40Xjnw4\n+5wgbVlKRPZTPlC3Gmjt2ufZzERKrDOMhd0x3dMoWAVCRM1QPlC3EmjtXE1ldiLFysIAu0/XICJ/\nUjpQF0rllgKtE6upGq3kMrNlKVcyEpEVSkeF7FjjQGvEjdVUZrYs5UpGIrJC6UCd6mkt0Lq1z/Pu\n7WswsKkffT1xhDqAvp44Bjb1Y/f2NYE/douIrFM69RGPRlpeturGaiqjfPbwaI4rGYnIEjOnkD8B\n4C4AQ1LKdc43aa5WA62bq6m08tmtHLvFyUeiYDIzov4ugP8N4Elnm6LNrkCryj7PzWxuU5mawuPP\nvYWXjp7l5CNRAJk5hfxnQohVbWiLITcCrVMjWKt3CXbVghORNzmSo06lEohE7Als6XT7T+WoVKbw\nxI/exqHj55AZySPd24XN65bigR3XIhy2ZwT7yH03olAqIztWRKonhnhU+60olMq6C2iODQ7joV1d\nut/rd25cG6piX8zlt/5w5Dc8m83Z8nPadfZZ/ci5/my4oWweew+eQi5fsn0EGwEwPpqH3r9yKJtD\nJpvXfOz8SB6D7w8rkdJpN6+ei+cE9sVcXu0Pow+XYA7FLtFaeLJ+dZ9SS8BbmXwkIn8I9EyU1sKT\nF458qBkUAXOLbOzmVi04EamjYaAWQnwfwCszfxRnhBC/73yznGe08CSkc2h7tDOM7kSng63Stnv7\nGty97SrNBTRE5H9mqj7ua0dD2s1oH5Cpae3vKZQqeO7gr9peaREOhfDgzutw500rWEdNFECBTH0U\nJysoTVZ0l6cvSkYR69TuGjeXefNkDaJgCtRkYv3kYSyqHYw/+fFFePn4R5qPqbLMm6sUiYIjUIG6\nfuFIoTQFAIhHwyhNVmYXnuzcdhXePZ1VstKCW6QSBU9gArXR5GEiFsFXv3Aj0r1ds6PTVjeDcgpX\nKRIFT2CGYI1OEo9GQnMCsNFWpW7hFqlEwRSYEbXVhSNu7rqnx4kTa4hIfUqNqIuTFQxlc46MDJtd\nOKJSpYUbJ9YQkfuUGFHrTZB96Z6Ntr6OG4cI2KmZLVKJyPuUCNR6E2SJrih2bl1l2+uomM6wyusf\nNkRknevjukmhAAAFX0lEQVSB2miC7NDxc7jzphUtBVOtemNVDhFohh8+bIjIGtcDtdEE2fmRfNMT\nZCrXG9uxWMXLHzZEZI3rgdqoGmNxb1fTE2Qq1hur/OFBROpyPToYVWNsXrfU1IizvlpE1XpjrW1V\n9/3iDPbsP+lKe4jIG1wfUQP6E2QP7LgWFy5c1P0+vRHq1uuuUK7euNGHR7sPJCAi71AiUNdPkHXF\nIsgXy5isTBl+n1564+DRD6GzU+lsvXG7NzXiYhUiapYSgboqEu7AvjfOzI6Q06kurF/dp5nDNRqh\nFif1A/z1V/fh2RcH254n5pFaRNQs13PUtepzuEPZvG4O12iEqqWvJ4aBTf2YBlzJE/NILSJqljKB\n2uoEoNFy6nodAB753HrsunU1jr533vRr2E3FjZ6ISH2mUh9CiN8G8CiAMIBvSyn/yu6GWM3hGi2n\nrreoJ450KuF6npiLVYioGWYOtw0D+DsAdwK4BsB9Qohr7G5IMxsO1Y9Q41HtoFdNLaiyqZFKGz0R\nkfrMjKhvAnBSSnkKAIQQ/xfAfwDwSzsb0syGQ/Uj1O5EFM8dPKW7DwY3NSIiLzITqJcD+KDm72cA\n3Gz0DalUApGI9aD3pXs2ItEVxaHj53B+JI/FvV3YvG4pHthxLcJh48F//6X/P3LfjSiUysiOFZHq\niSEenftPbOU13JZOJ91uglLYH5exL+byW3+YCdQdGl/TK1MGAGSzueZaA2Dn1lW486YVGJ0oYvWq\nPoyP5g0XveiJABgfzWO8wWtU88TNvEY7pdNJZDJa/5pgYn9cxr6Yy6v9YfThYiZQnwGwoubv/QA+\nbLFNhqo53Hg0ohlo7XwNIiLVmQnUrwO4WghxJYCzAO4FcL+jrSIiolkNk7JSyjKALwF4HsA7AP5Z\nSvm20w0jIqIZpuqopZQ/BvBjh9tCREQa1C5zICIiBmoiItV1TE8bVtoREZHLOKImIlIcAzURkeIY\nqImIFMdATUSkOAZqIiLFMVATESmOgZqISHFKnUJe1Y6jv7xCCLECwJMArgAwBeAxKeWj7rbKXZdO\nHfoFgLNSyrvcbo+bhBC9AL4NYB1mth9+QEr5irutcocQ4r8D+M+Y6Ye3APwnKWXB3VbZQ7kRdbuO\n/vKQMoA/llJ+EsBmAH8Y8P4AgEcws0EYzQxofiKl/ASADQhovwghlgP4bwA2SSnXYWaQd6+7rbKP\ncoEaNUd/SSlLAKpHfwWSlPKclPLwpT+PY+YXcbm7rXKPEKIfwO9gZhQZaEKIHgC3APgOAEgpS1LK\nEXdb5aoIgC4hRARAAg7vm99OKgZqraO/AhuYagkhVgHYCOBVl5vipm8B+BPMpIGC7ioAGQD/KIQ4\nIoT4thBigduNcoOU8iyAvwFwGsA5AKNSyn91t1X2UTFQWz76KwiEEN0AngXwR1LKMbfb4wYhxF0A\nhqSUb7jdFkVEANwA4O+llBsBXATwZXeb5A4hRAozd95XAlgGYIEQ4vPutso+Kgbqth/9pTohRCdm\ngvRTUsofuN0eF20FcLcQ4n3MpMS2CyH+ydUWuesMgDNSyuod1jOYCdxBNADgV1LKjJRyEsAPAHzK\n5TbZRsVAPXv0lxAiipkJgb0ut8k1QogOzOQg35FS/q3b7XGTlPIrUsp+KeUqzFwX+6WUvhk1WSWl\n/AjAB0IIcelLtwP4pYtNctNpAJuFEIlLvzO3w0cTq8oFah79Nc9WAF/AzOjxzUv/fdbtRpEyHgbw\nlBDiGIDrAXzT5fa44tJdxTMADmOmNC8E4DFXG2Uj7kdNRKQ45UbUREQ0FwM1EZHiGKiJiBTHQE1E\npDgGaiIixTFQExEpjoGaiEhx/x819Pvbp+ZklwAAAABJRU5ErkJggg==\n", "text/plain": [ "<matplotlib.figure.Figure at 0x7f88a2738400>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(data[:,0],data[:,1])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "## Perform Clustering\n", "\n", "distanceMatrix = computeDistanceMatrix(data, eucledian)\n", "clusters = hierarchical(data,distanceMatrix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Task**: output the cluster assignment. \n", "\n", "We are looking for **five (5)** clusters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Your Task**: visualize the scatterplot of your cluster assignments" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "## Use the array of colors below. Use black for outliers, other colors for the five clusters you constructed.\n", "\n", "colors=['black', 'blue', 'green', 'red', 'purple', 'brown']\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**When you are done**: download the notebook, submit using the following command:\n", "\n", " handin dekhtyar 466-test <notebook>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }