{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**CSC 466: Knowledge Discovery in Data **\n", "** Individual Test**\n", "\n", "**Task 1 **" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Name **:\n", "\n", "**Cal Poly Email**:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The program below performs hierarchical clustering using Complete Link distance.\n", "\n", "**Your Task**: Perform the following:\n", "\n", " 1. Add to the notebook the computation of the Centroid Distance method for computing the distance between two data points.\n", " The function getCentroid() is defined and stubbed below for your convenience\n", " \n", " 2. Change the code of the hierarchical clustering function to use the Centroid distance. \n", " \n", " 3. Display the vector of cluster assignments constructed by function flatten()\n", " \n", " 4. Plot the dataset colored by cluster (marking outliers with a separate color)\n", " \n", " 5. Write a function computeClusterRadius(), which, given a cluster returns its radius.\n", " \n", " 6. Report the radii for each cluster" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Notes:**\n", "\n", "Please read carefully the comments to the existing code - they may explain things you need to know in order to complete some of the tasks.\n", "\n", "I have created placeholders for the main functions you need to create and Jupyter cells at the bottom of the notebook for you to produce the desired output. You may need/want/find it convenient to define additional functions, or to change the parameters in the function definitions provided to you. Feel free to do both/either as you see fit." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Imports\n", "\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import seaborn\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Distance Metrics**\n", "\n", "We use Eucledian distance between two points for this assignment. For simplicity, we declare our own function." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def eucledian(x,y):\n", " return np.sqrt(np.sum((x - y)*(x-y)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Complete Link Distance Computation **\n", "\n", "The code below is simply to make the original Hierarchical code run." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Parameters\n", "## m: matrix of cluster distances\n", "## i, j: clusters that are being merged (new cluster is {i,j})\n", "## k: cluster to which the new distance needs to be computed\n", "\n", "## complete link is the largest distance between two points belonging to two different clusters\n", "## if a cluster i = {x1,...,xn} and cluster j = {y1,...,ym} are merged, then \n", "## the complete link distance from the new cluster ij = {x1,..,xn,y1,..,ym} to a cluster \n", "## k = {z1,...,zp} is the larger of the complete link distances between clusters k and i and clusters k and j\n", "\n", "def completeLink(m, i,j,k):\n", " # assume i maxHeight:\n", " maxHeight = clusters[i][0]\n", " idx = i\n", " s = clusters[idx]\n", " left = s[1]\n", " right = s[2]\n", " l = []\n", " if len(left)==3: ## check if leaf\n", " l.append(left)\n", " if len(right)==3:\n", " l.append(right)\n", " del clusters[idx]\n", " clusters.extend(l)\n", " \n", " return clusters\n", " \n", " \n", "def getClusters(clusters, k):\n", "\n", " s = clusters\n", " while len(clusters) < k:\n", " s = split(s)\n", " \n", " return s\n", " \n", "def flatten(cluster, assignment, label):\n", " left = cluster[1]\n", " right = cluster[2]\n", " if len(left)== 3:\n", " flatten(left, assignment,label)\n", " else:\n", " assignment[left[0]] = label\n", " if len(right) == 3:\n", " flatten(right,assignment, label)\n", " else:\n", " assignment[right[0]]= label\n", " \n", " return\n", " \n", " \n", "\n", "## takes the output of getClusters() function and assigns a cluster label to each data label\n", "## n: number of data points in the dataset\n", "def flattenClusters(clusters,n): \n", "\n", " k = len(clusters) ## number of clusters.\n", " \n", " assignment = np.full((n), -1)\n", " \n", " for i in range(k):\n", " flatten(clusters[i], assignment,i)\n", " \n", " return assignment\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Reading in the Dataset **\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "filename = \"/share/466-test/data1.csv\"\n", "\n", "rawData = np.loadtxt(filename, delimiter = \",\")\n", "\n", "## let's keep only the two columns with the data attributes\n", "\n", "data = rawData[:,0:2]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's visualize the dataset**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X+QHOV5J/DvzszOzI52VjtajYyklSyQ0GuDkBDoQLIi\nMGJ9BAdxSsmHgLJTPnKEuos5cpWqlO344lzKcSV1qZSpulQqYBObC/iUApuSUz6TyEJYBgQYCQlh\neIVWxkJCeEer2V+aXzuze3+sZjU7293TPdM9/Xb391NFIe3M7rx6p/eZt5/3ed+3Y3p6GkREpK6Q\n2w0gIiJjDNRERIpjoCYiUhwDNRGR4hioiYgUF3Hih2Yy47aUkqRSCWSzOTt+lOexL+Zif1zGvpjL\nq/2RTic79B5TekQdiYTdboIy2BdzsT8uY1/M5cf+UDpQExERAzURkfIYqImIFMdATUSkOAZqIiLF\nMVATkTKKkxUMZXMoTlbcbopSHKmjJiKyojI1hT37T+LIiQwujBWxqCeGjWvT2L19DcIhjicZqInI\ndXv2n8S+X5yZ/fvwWHH27/cPrHWrWcrgRxURuao4WcGRExnNx46cOM80CBioichloxNFXBgraj6W\nHS9gdEL7sSBhoCYiVy3sjmFRT0zzsVQyjoXd2o8FCQM1EVlid2VGrDOMjWvTmo9tXLsYsU7/7d1h\nFScTicgUJyszdm9fA2AmJ50dLyCVjGPj2sWzXw86BmoiMsXJyoxwKIT7B9Zi162rMTpRxMLumOZI\nujhZMXzcrxoGaiGEALCn5ktXAfgzKeW3HGsVESmlUWXGrltX2xI4Y51hLEkl5n096HXWDQO1lFIC\nuB4AhBBhAGcB/NDhdhGRQsxUZmgFWLsEvc7a6kfR7QAGpZS/dqIxRKQmNyszWGdtPUd9L4DvN3pS\nKpWw7ZSFdDppy8/xg0Z9USiVkR0rItUTQzzq/+kHXhuXtaMvtm5Yjr0HT2l8fRn6l/U69rrnzl/E\nhXH90Xw42on04gVzvu63a8P0b7MQIgrgbgBfafRcu84rS6eTyGTGbflZXmfUF0HM3/HauKxdfbFj\ny0rk8qV5lRk7tqx09PUrkxUsSsYwrJF6SSXjqJQm57y+V68Now8XK8OuOwEcllL+puUWka2Cnr+j\n9jBbmWG3ap117TVeFZQ6ayvDrftgIu1B7cX8HbVbtTKjnQFy9/Y1GNjUj76eOEIdQF9PHAOb+gNT\nZ21qRC2ESAD4DICHnG0OWeX2bDxRO7g1mleFqUAtpcwB6HO4LdSE6my8Xv6O+ySQn+jVWfudP2ea\nAoT7JBD5n/9ruALArX0Sgrqcl9RWKJUxlM21dF2qdm0zUPtAu/N3QSwHJPVVr8tjg8PIZPNNXZeq\nXtsM1D7SrvwdywFJRXZcl6pe2xz+kCUsByQV2XFdqnxtM1ATAPObwfPYJFKRHdelytc2Ux8BZzUn\nx3JAUpEd16XK1zZH1AFXzckNjxUxjcs5uT37T2o+n+WApCI7rkuVr22OqAOs2c3geWwSqah6/R0b\nHEZmJI/eBTFcb/G6VPXaZqAOsGaXnwd9OS+pKRwK4bObP46x3CSKpTKyE0UcO3ke4VCH6fI6Va9t\nBuoAazUnF9TlvKSeUrmMv3zyMD4Ympjz9WbL61S7tpmjDjCVc3JEVmgF6VqNyuvMVj25hSPqgFM1\nJ0dk1niuhLMZ/SAN6KfyjKqeypVpZdIfDNQBp2pOjsisM0MTmJo2fk59Kq+6l8fzr53GC0c+nP16\nNVUiT48gV5hUZhk5AzUBUC8nR2RW/5JuhDpgGKyrqbz6EXRHh/bza9MoKiwjZ47aZqrnuoj8JpmI\nYnm6W/OxcAhzToKpXzfQaCRey81l5BxR20TVXbeIguBPf+8G/OWTh3E2M5MGCXUAH0sl8OUv3IBk\nVxSA8boBM9w8MYmB2iaq7rpFFATRSAT/84GbMJ4rYbw0hWQ0hGQiOuc5RusGzHBzGbmpoZ4QolcI\n8YwQ4l0hxDtCiC1ON8xLVN51iyhIojVzLfUpyOq6AS0hnVx1LTdLVs2OqB8F8BMp5eeEEFEAnHWq\nwQNmidxVTT0elkO4MF6anVzsq0lBVtcN1N75Vm3bsAxHT57HyERp3mOhDuDWjctdLVltGKiFED0A\nbgHwRQCQUpYAzP/XBJjKu24RBUF96rE6SVifgtRbN3DbxuX42Zsfzvu5ADA9Ddzx71Yof8LLVQAy\nAP5RCLEBwBsAHpFSXtT7hlQqgUjEnluEdDppy89x2tYNy7H34CmNry9D/7JeW17DK33RLuyPy4Lc\nF4VSGccGhw2fc2xwGF/cEUWuUMZDuzYAALJjRaR6YohHIyiUykinujCUzc/73nSqC6tX9SEedW9K\nz8wrRwDcAOBhKeWrQohHAXwZwP/Q+4ZsNmdL49LpJDKZcVt+ltN2bFmJXL4075N6x5aVtvwbvNQX\n7cD+uKyZvlDt8NZWDGVzyGgE2LnPyePh//UCRibmVmSNj06h2nPrV/dppkXWr+7D+GgejXq41T41\n+rA1E6jPADgjpXz10t+fwUygphpc4Ude4McyUqPUY63spRNa9Cqymt1OoR192jBQSyk/EkJ8IIQQ\nUkoJ4HYAv7Tl1X2IK/xIZX4sIzWaJDRSv+d6s4OtdvSp2XD/MICnhBDHAFwP4Ju2vDoRtY2Xy0gb\nrfjdvX0NBjb1Y1FyZuK+Wm7X2x3VfD6gfw5idbBlJki3q09NZcellG8C2GTLKxKRK7xYRmo2rVA7\nGg5HO5G/WEC+WEZXLIK/+O7rjlVktatPvZmUIiLLjBZ8qFpG2syZnksXL0AyEcWSVAKJeASJeKfm\nc+1YwNKuPmWgJgoIrx0UYUdaYc/+k5oHCqxY0m3LApZ29Sn3+vAhP5Vekb28dFBEq2kFo0CfK5RR\nrkwjbMNQtR19ykDtI3r5vJ3brsREbpKBmzxVRmq84jfWMK3QrvxxO/qUgdpH9MqEfn7sHIqlii9q\nZskeXigjNSq7u1iYxLMvDhpey+3e2sHJPuVvq08Y3eYVShVTEzFEqqmW3cWjc0eohdJUw2vZazl5\nIwzUPmFlr13Va2aJqsKhEHbduhqJmHZQbXQtVwN9X08coQ6gryc+58QXr2DqwycWdscQi4ZRKDUO\nwKrWzBJpGZ0oIjuuvWFno2vZSzl5IxxR+4q5A+BUrZkl0mJHrbKV1YYqYqD2idGJIgqlKVPPXb96\nkWcvWAoeP+Wam8VA7RMLu2Po0xl1VE8Zqu5/cGxwGE/vO4HKlLnATuQ2v+Sam8UctU8YlTItTy/A\nmcxF3VMviFTnVq5ZlcVjDNQ+orVCav3qRbqnX9Rv80ikulZrlc0GXtX27Wag9hGtUcfoRBEHjmif\nBcfqDwoKq4FXtX27maP2odoZbi/umEZkNyu78Km4bzcDtYsabYZuB86YU9BZDbxm9ghpN6Y+XNDu\n/JeXdkwjspvVzZnavUeIGQzULmh3/ssvq7OImmE18BpVULl1F2oqUAsh3gcwDqACoCyl5LFcTWp0\nG+ZkFYYXdkwjslszgVe1u1ArI+rbpJTnHWtJQHjx3Doir7MaeFW7C2Xqo81UzH8R+Z1qgdcqs4F6\nGsC/CiGmAfyDlPIxB9vkayrmv4iCwmz6T7UFLx3T0413XBNCLJNSfiiEWALg3wA8LKX8md7zy+XK\ndCTCgKOnUpnCEz96G4eOn8P5kTwW93Zh87qleGDHtQjbcYgbEbXk8efewt6Dp+Z9/e5tV+HBndc5\n9bIdug+YCdS1hBB/DmBCSvk3es/JZMat/VAd6XQSmcy4HT9KSVb2EfB7X1jF/riMfTFXq/1RnKzg\na48f0kxP9vXE8Y0Hb3bkzjedTuoG6obDNyHEAiFEsvpnAP8ewHH7mhdcXt8jl8iPvLrg5WMAfiiE\nqD7/aSnlTxxtFRGRS1Sc8G8YqKWUpwBsaENbiIhcp+KEP8vziIjqeHnBCxFRIKhWd81ATUSkQ5Vt\nF1i0S0SkOAZqIiLFMVATESmOgZqISHEM1EREimOgJiJSHAM1EZHiGKgDoB2nnRORc7jgxcdU2/yc\niJrDQO1j7T7tnIicwWGVTzU67ZxpECLvYKD2KRU3Pyei5jBQ+1R183MtPO2cyFsYqH2quvm5Fp52\n7i+FUplVPT7HyUQfU23zc7JXtarn2OAwMtk8q3p8jIHax1Tb/Jzsxaqe4DD9sSuECAshjggh/sXJ\nBpH9eNq5/7CqJ1is3B89AuAdpxpCROaxqidYTAVqIUQ/gN8B8G1nm0NEZrCqJ1jM5qi/BeBPACTN\nPDmVSiASsec2O5029ZKBwL6YK+j9sXXDcuw9eErj68vQv6zXhRZdViiVkR0rItUTQzza/qkwv10b\nDXtQCHEXgCEp5RtCiE+b+aHZbK7VdgGY6exMZtyWn+V17Iu52B/Aji0rkcuXcGxwGOdH8rNVPTu2\nrHStb1TYX8ar14bRh4uZj7qtAO4WQnwWQBxAjxDin6SUn7epfUTUhGpVz0O7ujD4/rASVT2sRHFG\nw484KeVXpJT9UspVAO4FsJ9Bmkgd8WhEiaoeVqI4h1XxRB5WnKzg3PmLSgRBVqI4x1KWX0p5AMAB\nR1pCRKbNyQWPF7Eo6f6qxGolyrBGsGYlSms4oibyoGoueHisiOnpy7ngPftPutYm7i/jHAZqIo9R\nORe8e/saDGzqR19PHKEOoK8njoFN/dxfpkXc64PIY8zkgpekEm1u1QzuL+MMjqiJPMYLqxK5v4y9\nGKjJNjztvD3M5oL5fvgHUx/UMhVWowWN0V7jfD/8h4GaWsbVaO1XmwsORztRKU3OjqSf3neC74fP\n8OOVWqJyBUItv6YBYp1hLF28YE66w4n3Q6///NqvquGImlrS7gqE4mQFoxNFJBd2zfuaVoVB0NIA\ndr8fev33uU9fhWcOnApMv7qNgZpa0q7VaPUBI53qwnVXLcI0gKPvndcNFs2kZYwCv+rsfj/0+k+e\nHsEHQxPzvg6YS6/o9bGX+95JDNTUkmoFQu0vc5Wdq9HqA8ZQNo+fvnF2znPqg0WjNMCuW1fPaZ/Z\n0bfKwcTO98Oo/85mJjS/rtWvtThCbw4DNbXM6dPOjQKGlmqwsJoGaDT69koaxa73w6j/pqa1v6dR\nesWpEbrfMVBTy5pZjWZlVGoUMLRUg4WVNICZ0fezLw56oprCrtWBRv0X6tAO1kbpFSdG6EGhzjCA\nPM/MarTK1BSe3ncCX3v8EL7yD4fwtccP4el9J1CZmtL9HqOVeFqinWFEO8MYnShi/ZrFms+pTwM0\nGn1nRvKeqG6p1erqQKOFNcvT3ZpfN0qvtDJCB4JdYcIRNbVFdQT9/Osf4IXDl3PLZkalRnlXLYVS\nBV997BCKpQpSyShWLOlGrjCJ7HhRNw3QaPSN6Wll99ewwmp+XS+NcjmnbC69UpysoDRZsTxC71kQ\nRbQzjKf3nVA+5eQkBmpyVG1ed3isiFCH9vMa3eLWB4zFvV24ZlUvXnn7NyiW5o/GC6WZUdeF8RIu\njJdw28ZluOOmlboBqtEkXDqV8PRey83m143SKGbSK2bf/+Xp7jk56qqRiRK++tgrKNS8x6qmnJzE\nQE2Oqp88anYSqj5grF7Vh8H3h/HikXOm2nFs8ALu2X614SjSaBIuHAq1pbrFKU/ve8/ynUytahrF\n7NeBmVH0/3le4uXjH81+Tev9X7GkG3/6ezfMjtCHxwpzHi9ofBADwcpfM1CTY6xUayxcEENXrPHl\nWA0M8WjEMF1Rz0x6otEknNPVLU6oTE3h6X87gQNHPtR83K5gV5tSiYQ75oyiG8kVypie7sD9A2ux\n41Or8OdPvI6siWO7vJRyalXD3wwhRBzAzwDELj3/GSnl151uGHmflWqN7EQRf/Hd1y3lHq3krq2k\nJ/RGiV7ba7k4WcH3nn8Xh47/Rvc5rQY7rZRKIt6pmcYw04Z8sYwRk2creiHlZBczI+oigO1Sygkh\nRCeAnwsh/p+U8pDDbSOP64pF0Nsd0xwdaU0eNZN73LntKuQLZbx7OovseBHRzvBsfrqWnekJo9t9\nFdTnhY20Guy06qLNjKL12mDlLum61YuU/qC0U8NALaWcBlD9eOy89J9OppFobqDQu4Xdsu5jePtX\nWYxMlOY9ZuZ2vFKZmlMJkEpGsfnaK7D79jX40Uvveyo9Ybf64Gmk+gHWzGpLqwuR9IiVvbN/tnKX\nVNT4QPYrUzlqIUQYwBsA1gD4Oynlq0bPT6USiETs+aRLp5O2/Bw/8EpfPP7cW7q/aKEQMDUFvPPr\nEc0gDczcCoejnUgvXmD6NS6Ml/Dy8Y+wOJXAI/fdiEKpjOxYEameGOJR/0/FVK+NQqmMY4PDpr7n\n1huW47/s2oDv/fgdHDp+DpmRPNK9Xdi8bike2HEtwmHj9NO58xdxYdza6LlWOAREOyN45e2PcPLs\n6OzrfumejUh0RXHo+DkMZfO633/y7CiSC7s031+v/K6YZeoKllJWAFwvhOgF8EMhxDop5XG952ez\nOVsal04nkcmM2/KzvM4rfVGcrOClo2d1H6+uazHKXaeScVRKk7r/3uJkBYeOa1d7vHT0Q9x50wrE\nOsOIABgfzUP9XmtN7bUxlM0hYxDcqmKdIdx72xr8/bNH5+2hsvfgKeTypcald5MVLEqaS1PUWtQT\nQywSxrkLOeSL5Xmve//AWuzcugp33rQCj+99G4ffO6/5c4ZHCxh8f3heGsorvyv1jD5cLFWLSylH\nABwA8NutNYn8yupyby2N8smjE0VkRrSDUe1KtiAyu4rzt9YvBQDd1MXPj51ruHrUaOWing4A//V3\n16FU1k5bHDlxHuO5EoYuDfYeuOsaxKPaYYqTiTWEEGkAk1LKESFEF4ABAH/teMvIk6xMBlWlumMY\nvai/alDrNdK9XZq3xUH65dXSKMe7qCeGT65M4XdvMd60qlCqzE7KGk3y7tx2JS6MFnRHvfNfP45o\nOKT7usNjBXz9idcwOlGaXZTzqeuWYv8b8+/SvFC/bhczqY+lAL53KU8dAvDPUsp/cbZZ5FVWl3v3\n9cTxZ1/chHyxbHoiK9YZxuZ1S7H34Kl5jwXpl1ePVr33dasXoViqQJ7O4uXjH+Hd01msX7MYqWQU\nF8a15wrq1U7yaq041FvMVKvRKk8As3MX1Q+I229cjoFN/YGeIDZT9XEMwMY2tIV8QitQJOIRzdra\njWsXI5mIIpmIWnqNB3Zci1y+5Mtf3lb3u9aq9372xUEcePvyopfhsSJeOHwWK5Z0mw7UtfXOZlec\nxqNhlCYrpld5annzvWF848GbPVO/7gT/T4dT24VDIey6dTVu2bAMmJ5GOpWoWa1mT2ANh40Xn9QH\nO5U3+6+ye7/rar23URldrjCJ2zYuw7HBC8iOF9DbHUOuWNasRa+mlYx+XqgDmJ7GbNt3brsSE7nJ\nhqs8Fy7QrrcH5n5AqFy/7iQGarKVUbBxYlVf/eKT+tdPJaNY0BVFrjCp/M5rTp3mbryFaxF33LQS\n92y/es7o22hPk6FsruF2pdPTM3+IdYaRSHXOe1511L/jU6twZmgCS1Jd+KunDnt20yunMVCTrRoF\nG6dX9dW/fnX3PL32qMLqsWFWmDlAofZ9abSniZkJ4wvjJcN+1lt6rvUzOe/AgwPIRo2CjR0bvlc3\njy+UypZe36n22MXMsWHNMiqj0wqC1dHuNx68Gd/8g834xoM34/6BtbN3IFbK8vT6ufqBOjxWxDRm\nPkA/GJrAiiXd6OuJI9QxM9E8sKnfF/MOreKImmxj9YxCK7ROIV+/um9OCsPSJlCK7bzm9Gnuzez8\nZ3T3U/2+wzJjuDrxwtj8fjb6QM2M5PHNP9iM0mRF6fmEdmOgJts4GWy0TiGvv7W2UsOtWt7T6dPc\n7d75r/rzbtmwDF//zmu6m/8s7I7O6+dG9dvPHhjE7991TdNt8yOmPsg2kXAHEvH5E0dAa8HGbErF\nyi25innP3dvXYGBTv6O3/q2eo1gv3dtluBJy49Xz+3lhdwyppH455runs0qlpVTAETXZZs/+k5q1\n0iuWdLcUbKykVHZvX4N8oYyXak4Vqbd13RVK5j29st91fanj9Vcvxk81Vg72L1mA+z8zfyIx1hnG\nJz6+aM7JL7Wy40Wl0lIqYKB2kBdqd+1iXKtbRrkyjQabsemyklIJh0L4/B0C7/z6guZCjkXJGD5/\nh1CuNK+Wqvtd65VeVqa1Ex9rV/Tq9vP9n7kah09kDOu16TIGagfYvXDBC5ycSLSav411hnGDWKL5\n/BtE2vcfmk7RK73U2zTp6HvD+I+frmj2dyLWid9av9SzZ1C2W2ADtZOjXacWLqis3VULi3svV32Y\neb6flpi7weiOSe/w2UYf0HyPzAtcoHZ6tOvkwgWVtbtqYfWqPoyP6u+77JV8r1c0s31tow9ovkfm\nBS5QOz3adTIFoLp2jJBqTyE3szW8qvlerzG6Y4pHtc+pXL+mz1QA5nvUWKACdTtGu06nAFTGEZJ/\nGd0xbb3uCnR0dNR8QM8sBz/6XgYHDp8NxByN0wIVqNsx2nU6BeAFHCH5k9EdU3XHxNGJIp5/7TRe\nODJ3S1W/z9E4LVCBul2jXU6SkB81umOKdYaxsDume7iun+donBaoQN2u0S5TAM4rlMoYyubYty4w\numMK8hyNkwIVqIH2jnaZArBftWrn2OAwMtk885+K6U5EEYuGNEv2/D5H4yQzh9uuAPAkgCsATAF4\nTEr5qNMNcwpHu94WxBp1L3nu4CnduuqgzNE4wcwQpAzgj6WUnwSwGcAfCiE8v7WV3ZvTkD2q+01r\nbcrTjv2uqXlG7088GsbObVfa8hp614efmTnc9hyAc5f+PC6EeAfAcgC/dLhtFCBmFiIx/6k2o/en\nNFnBRG4SiZj27oqNBHFbhlqW/oVCiFWYOZH8VUdaQ4GldeLHvl+cwZ79J2efU63a0cL8p/ucfH/M\nXB9+ZnoyUQjRDeBZAH8kpRwzem4qlUAkYtN+t+mkLT/HKwqlMrJjRaR6YohH5749fu2LQqmsW9J1\nbHAYD+3qmu2LrRuWY+/BU/Oet3XDMvQv6zV8jY+GcwCmcUXfgnl963WqXBvNvj9GrFwfVar0h11M\nXa1CiE7MBOmnpJQ/aPT8bDbXarsAzHR2JmNmobD3Nbq183NfDGVzyGS19+04P5LH4PvDsymNHVtW\nIpcv4djgMM6P5GerdnZsWanZP5WpKXz/p+/h5bfOzU5yxaNhbL3uCtx7+9W+uG1u5dqwe3Oy6vtT\nX1Wl9/6YYeX6ALwbN4w+XMxUfXQA+A6Ad6SUf2tju6hGkKsZrO43ff/AWjy0qwuD7w83DDB79p/E\n/rpN7QulCn76xll0dHT4vm/1OJXzdaKqKsjbMlSZeUe2AvgCgO1CiDcv/fdZh9sVKEGvZrB6SjYA\nxKORhlU7xckKDssh3cePnMj4vm/1OJ3ztbOqqpnrw2/MVH38HEBHG9oSWGaqGfrb3KZ2c2Ih0uhE\nUfOUl6oLAT3yyYtb8QZ9WwZ/zah4FG/tHLxlTkZ1g/WiZCwQfVvPi2WOQV+o5v2ZFB/grd1ldt8y\n3yCW6D6+cW0wj+VqtYzOzUUnQV2oxhG1IoJ+a+eU3dvXYGp6Gi+/9dHs5vbVqo+g9m2zm5MFfdGJ\nmzqmdU4QbkUmM27LD/VqmU0r9MqlgtgXRqz2R3Gygkw2B3R0IN3b5asRWTPXxuWgq723tJan953Q\nDO4Dm/qVqp7x6u9KOp3UnQvkiNpGdtSkcsc9Z8Q6w+hf4q9FEK2wmvP14gSknzBQ24C3hORVZgcG\nXpyA9BNGEYu0JlKCvg8B+R/3WXEXR9Qm6Y2ad267kreE5Hs8C9RdDNQm6S3xzhXKvCWkQGBlknsY\nqE0wmkh599fZwC9WoWAI+qITNzFHbYLRRMrIRBGfWJnSfGz9mj6MThQDu58E+VNQF524KTAj6lZK\n5xot8b7vM2vRFY/U3BLGkIh34uh7GRw4fJZVIETUEs8E6mYDrR2lc40mUhKxyJxbwudfO40Xjnw4\n+5wgbVlKRPZTPlC3Gmjt2ufZzERKrDOMhd0x3dMoWAVCRM1QPlC3EmjtXE1ldiLFysIAu0/XICJ/\nUjpQF0rllgKtE6upGq3kMrNlKVcyEpEVSkeF7FjjQGvEjdVUZrYs5UpGIrJC6UCd6mkt0Lq1z/Pu\n7WswsKkffT1xhDqAvp44Bjb1Y/f2NYE/douIrFM69RGPRlpeturGaiqjfPbwaI4rGYnIEjOnkD8B\n4C4AQ1LKdc43aa5WA62bq6m08tmtHLvFyUeiYDIzov4ugP8N4Elnm6LNrkCryj7PzWxuU5mawuPP\nvYWXjp7l5CNRAJk5hfxnQohVbWiLITcCrVMjWKt3CXbVghORNzmSo06lEohE7Als6XT7T+WoVKbw\nxI/exqHj55AZySPd24XN65bigR3XIhy2ZwT7yH03olAqIztWRKonhnhU+60olMq6C2iODQ7joV1d\nut/rd25cG6piX8zlt/5w5Dc8m83Z8nPadfZZ/ci5/my4oWweew+eQi5fsn0EGwEwPpqH3r9yKJtD\nJpvXfOz8SB6D7w8rkdJpN6+ei+cE9sVcXu0Pow+XYA7FLtFaeLJ+dZ9SS8BbmXwkIn8I9EyU1sKT\nF458qBkUAXOLbOzmVi04EamjYaAWQnwfwCszfxRnhBC/73yznGe08CSkc2h7tDOM7kSng63Stnv7\nGty97SrNBTRE5H9mqj7ua0dD2s1oH5Cpae3vKZQqeO7gr9peaREOhfDgzutw500rWEdNFECBTH0U\nJysoTVZ0l6cvSkYR69TuGjeXefNkDaJgCtRkYv3kYSyqHYw/+fFFePn4R5qPqbLMm6sUiYIjUIG6\nfuFIoTQFAIhHwyhNVmYXnuzcdhXePZ1VstKCW6QSBU9gArXR5GEiFsFXv3Aj0r1ds6PTVjeDcgpX\nKRIFT2CGYI1OEo9GQnMCsNFWpW7hFqlEwRSYEbXVhSNu7rqnx4kTa4hIfUqNqIuTFQxlc46MDJtd\nOKJSpYUbJ9YQkfuUGFHrTZB96Z6Ntr6OG4cI2KmZLVKJyPuUCNR6E2SJrih2bl1l2+uomM6wyusf\nNkRknevjukmhAAAFX0lEQVSB2miC7NDxc7jzphUtBVOtemNVDhFohh8+bIjIGtcDtdEE2fmRfNMT\nZCrXG9uxWMXLHzZEZI3rgdqoGmNxb1fTE2Qq1hur/OFBROpyPToYVWNsXrfU1IizvlpE1XpjrW1V\n9/3iDPbsP+lKe4jIG1wfUQP6E2QP7LgWFy5c1P0+vRHq1uuuUK7euNGHR7sPJCAi71AiUNdPkHXF\nIsgXy5isTBl+n1564+DRD6GzU+lsvXG7NzXiYhUiapYSgboqEu7AvjfOzI6Q06kurF/dp5nDNRqh\nFif1A/z1V/fh2RcH254n5pFaRNQs13PUtepzuEPZvG4O12iEqqWvJ4aBTf2YBlzJE/NILSJqljKB\n2uoEoNFy6nodAB753HrsunU1jr533vRr2E3FjZ6ISH2mUh9CiN8G8CiAMIBvSyn/yu6GWM3hGi2n\nrreoJ450KuF6npiLVYioGWYOtw0D+DsAdwK4BsB9Qohr7G5IMxsO1Y9Q41HtoFdNLaiyqZFKGz0R\nkfrMjKhvAnBSSnkKAIQQ/xfAfwDwSzsb0syGQ/Uj1O5EFM8dPKW7DwY3NSIiLzITqJcD+KDm72cA\n3Gz0DalUApGI9aD3pXs2ItEVxaHj53B+JI/FvV3YvG4pHthxLcJh48F//6X/P3LfjSiUysiOFZHq\niSEenftPbOU13JZOJ91uglLYH5exL+byW3+YCdQdGl/TK1MGAGSzueZaA2Dn1lW486YVGJ0oYvWq\nPoyP5g0XveiJABgfzWO8wWtU88TNvEY7pdNJZDJa/5pgYn9cxr6Yy6v9YfThYiZQnwGwoubv/QA+\nbLFNhqo53Hg0ohlo7XwNIiLVmQnUrwO4WghxJYCzAO4FcL+jrSIiolkNk7JSyjKALwF4HsA7AP5Z\nSvm20w0jIqIZpuqopZQ/BvBjh9tCREQa1C5zICIiBmoiItV1TE8bVtoREZHLOKImIlIcAzURkeIY\nqImIFMdATUSkOAZqIiLFMVATESmOgZqISHFKnUJe1Y6jv7xCCLECwJMArgAwBeAxKeWj7rbKXZdO\nHfoFgLNSyrvcbo+bhBC9AL4NYB1mth9+QEr5irutcocQ4r8D+M+Y6Ye3APwnKWXB3VbZQ7kRdbuO\n/vKQMoA/llJ+EsBmAH8Y8P4AgEcws0EYzQxofiKl/ASADQhovwghlgP4bwA2SSnXYWaQd6+7rbKP\ncoEaNUd/SSlLAKpHfwWSlPKclPLwpT+PY+YXcbm7rXKPEKIfwO9gZhQZaEKIHgC3APgOAEgpS1LK\nEXdb5aoIgC4hRARAAg7vm99OKgZqraO/AhuYagkhVgHYCOBVl5vipm8B+BPMpIGC7ioAGQD/KIQ4\nIoT4thBigduNcoOU8iyAvwFwGsA5AKNSyn91t1X2UTFQWz76KwiEEN0AngXwR1LKMbfb4wYhxF0A\nhqSUb7jdFkVEANwA4O+llBsBXATwZXeb5A4hRAozd95XAlgGYIEQ4vPutso+Kgbqth/9pTohRCdm\ngvRTUsofuN0eF20FcLcQ4n3MpMS2CyH+ydUWuesMgDNSyuod1jOYCdxBNADgV1LKjJRyEsAPAHzK\n5TbZRsVAPXv0lxAiipkJgb0ut8k1QogOzOQg35FS/q3b7XGTlPIrUsp+KeUqzFwX+6WUvhk1WSWl\n/AjAB0IIcelLtwP4pYtNctNpAJuFEIlLvzO3w0cTq8oFah79Nc9WAF/AzOjxzUv/fdbtRpEyHgbw\nlBDiGIDrAXzT5fa44tJdxTMADmOmNC8E4DFXG2Uj7kdNRKQ45UbUREQ0FwM1EZHiGKiJiBTHQE1E\npDgGaiIixTFQExEpjoGaiEhx/x819Pvbp+ZklwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(data[:,0],data[:,1])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Perform Clustering\n", "\n", "distanceMatrix = computeDistanceMatrix(data, eucledian)\n", "clusters = hierarchical(data,distanceMatrix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Your Task**: output the cluster assignment. \n", "\n", "We are looking for **five (5)** clusters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Your Task**: visualize the scatterplot of your cluster assignments" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Use the array of colors below. Use black for outliers, other colors for the five clusters you constructed.\n", "\n", "colors=['black', 'blue', 'green', 'red', 'purple', 'brown']\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "\n", "**When you are done**: download the notebook, submit using the following command:\n", "\n", " handin dekhtyar 466-test " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }