{ "cells": [ { "cell_type": "markdown", "id": "ee8dc521-999d-4b18-a5b4-10593de22c59", "metadata": {}, "source": [ "**CSC 365 - Fall 2022 - Lab 1-2**" ] }, { "cell_type": "code", "execution_count": 3, "id": "5a5acb99-fb76-4a1a-a367-fabcffb77eeb", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "f56826e7-6f8b-4a24-8c4f-7e5dfd0d998d", "metadata": {}, "source": [ "**Name:** ______________________________________\n", "\n", "**Cal Poly Email:**_____________________________" ] }, { "cell_type": "markdown", "id": "06dc0cb6-1c1b-441d-9444-6ed19d785c53", "metadata": {}, "source": [ "This Jupyter notebook loads a simple two CSV files. One (`world_population.csv`) contains information about the population estimates for each country in the world. The other one (`corruption_data.csv`) reports a metric called \"corruption index\" (a value in the 0 to 100 range, reflecting the perception of prevalence of corruption in the country - the higher the value, the **less** corrupt the country is) for several years for most (but not all) countries in the world.\n", "\n", "The notebook contains 10 questions of the data. Some questions may require looking at only one of the two data files, but there may be questions below that would have you combine the data from both files in order to answer them.\n", "\n", "As with Lab 1-1 notebook, each table is provided to you in three different formats (`pandas` data frame, `numPy` 2D array, Python nested list structure). In addition, the list of columns for each table is also available as a separate variable for you to use in your computations.\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "83df65ec-b9a2-4a89-819b-900d7a62b77f", "metadata": {}, "outputs": [], "source": [ "## loading the world population data\n", "\n", "filenameWorld = \"world_population.csv\"\n", "\n", "worldDf = pd.read_csv(filenameWorld) ## pandas data frame for world population data\n", "\n", "worldData = np.array(worldDf) ## numPy array with the world population data\n", "\n", "worldDl = [list(x) for x in worldData] ## List of lists with the world population data\n", "\n", "worldColumns = list(worldDf.columns) ## list of column names - in the order of columns in both the Pandas data frame and the NumPy array\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "a505258c-db2d-4704-abe4-2dbda50c8f94", "metadata": {}, "outputs": [], "source": [ "## loading the corruption index data\n", "\n", "filenameCorruption = \"corruption_data.csv\"\n", "\n", "corrDf = pd.read_csv(filenameCorruption) ## pandas data frame for corruption index data\n", "\n", "corrDf['country'] = [x.strip() for x in corrDf[\"region_name\"]]\n", "\n", "\n", "corrData = np.array(corrDf) ## numPy array with the corruption index data\n", "\n", "corrDl = [list(x) for x in corrData] ## List of lists with the corruption index data\n", "\n", "corrColumns = list(corrDf.columns) ## list of column names - in the order of columns in both the Pandas data frame and the NumPy array\n", "\n" ] }, { "cell_type": "markdown", "id": "3b14cf40-c189-4255-a37f-67406d680763", "metadata": {}, "source": [ "The data is provided to you in three forms:\n", "\n", "1. `pandas` data frame. `pandas` is a Python package for dealing with tabular data that includes a lot of functionality that is similar to that of relational database which we will be studying in this course. A data frame is a Python data structure for storing two-dimentional tabular data. If you know how to work with `pandas` data frames - feel free to use `pandas` functionality.\n", "\n", "2. `numPy` array. `numPy` is a Python package primarily used for dealing with multi-dimensional arrays of data. A CSV file can be stored as a two-dimensional `numPy` array in a natural way, and a 2-dimensional `numPy` array is essentially a Python list or Python lists. `numPy` provides a lot of functionality for working with arrays.\n", "\n", "3. Python's list of lists. This is the most basic Python structure that does not rely on functionality from any specific Python package. Each row of the CSV file is one element of the list. The element itself is a list containing values of individual cells in the row.\n" ] }, { "cell_type": "markdown", "id": "5c4dcb18-3666-4d99-ab2a-3d5ca54472b7", "metadata": {}, "source": [ "Here is what the `pandas` data frames looks like:" ] }, { "cell_type": "code", "execution_count": 6, "id": "ce846aa2-334b-4d8f-8a2b-2e087ca80bb9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Rank</th>\n", " <th>CCA3</th>\n", " <th>Country</th>\n", " <th>Capital</th>\n", " <th>Continent</th>\n", " <th>2022 Population</th>\n", " <th>2020 Population</th>\n", " <th>2015 Population</th>\n", " <th>2010 Population</th>\n", " <th>2000 Population</th>\n", " <th>1990 Population</th>\n", " <th>1980 Population</th>\n", " <th>1970 Population</th>\n", " <th>Area (km²)</th>\n", " <th>Density (per km²)</th>\n", " <th>Growth Rate</th>\n", " <th>World Population Percentage</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>36</td>\n", " <td>AFG</td>\n", " <td>Afghanistan</td>\n", " <td>Kabul</td>\n", " <td>Asia</td>\n", " <td>41128771</td>\n", " <td>38972230</td>\n", " <td>33753499</td>\n", " <td>28189672</td>\n", " <td>19542982</td>\n", " <td>10694796</td>\n", " <td>12486631</td>\n", " <td>10752971</td>\n", " <td>652230</td>\n", " <td>63.0587</td>\n", " <td>1.0257</td>\n", " <td>0.52</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>138</td>\n", " <td>ALB</td>\n", " <td>Albania</td>\n", " <td>Tirana</td>\n", " <td>Europe</td>\n", " <td>2842321</td>\n", " <td>2866849</td>\n", " <td>2882481</td>\n", " <td>2913399</td>\n", " <td>3182021</td>\n", " <td>3295066</td>\n", " <td>2941651</td>\n", " <td>2324731</td>\n", " <td>28748</td>\n", " <td>98.8702</td>\n", " <td>0.9957</td>\n", " <td>0.04</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>34</td>\n", " <td>DZA</td>\n", " <td>Algeria</td>\n", " <td>Algiers</td>\n", " <td>Africa</td>\n", " <td>44903225</td>\n", " <td>43451666</td>\n", " <td>39543154</td>\n", " <td>35856344</td>\n", " <td>30774621</td>\n", " <td>25518074</td>\n", " <td>18739378</td>\n", " <td>13795915</td>\n", " <td>2381741</td>\n", " <td>18.8531</td>\n", " <td>1.0164</td>\n", " <td>0.56</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>213</td>\n", " <td>ASM</td>\n", " <td>American Samoa</td>\n", " <td>Pago Pago</td>\n", " <td>Oceania</td>\n", " <td>44273</td>\n", " <td>46189</td>\n", " <td>51368</td>\n", " <td>54849</td>\n", " <td>58230</td>\n", " <td>47818</td>\n", " <td>32886</td>\n", " <td>27075</td>\n", " <td>199</td>\n", " <td>222.4774</td>\n", " <td>0.9831</td>\n", " <td>0.00</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>203</td>\n", " <td>AND</td>\n", " <td>Andorra</td>\n", " <td>Andorra la Vella</td>\n", " <td>Europe</td>\n", " <td>79824</td>\n", " <td>77700</td>\n", " <td>71746</td>\n", " <td>71519</td>\n", " <td>66097</td>\n", " <td>53569</td>\n", " <td>35611</td>\n", " <td>19860</td>\n", " <td>468</td>\n", " <td>170.5641</td>\n", " <td>1.0100</td>\n", " <td>0.00</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>229</th>\n", " <td>226</td>\n", " <td>WLF</td>\n", " <td>Wallis and Futuna</td>\n", " <td>Mata-Utu</td>\n", " <td>Oceania</td>\n", " <td>11572</td>\n", " <td>11655</td>\n", " <td>12182</td>\n", " <td>13142</td>\n", " <td>14723</td>\n", " <td>13454</td>\n", " <td>11315</td>\n", " <td>9377</td>\n", " <td>142</td>\n", " <td>81.4930</td>\n", " <td>0.9953</td>\n", " <td>0.00</td>\n", " </tr>\n", " <tr>\n", " <th>230</th>\n", " <td>172</td>\n", " <td>ESH</td>\n", " <td>Western Sahara</td>\n", " <td>El Aaiún</td>\n", " <td>Africa</td>\n", " <td>575986</td>\n", " <td>556048</td>\n", " <td>491824</td>\n", " <td>413296</td>\n", " <td>270375</td>\n", " <td>178529</td>\n", " <td>116775</td>\n", " <td>76371</td>\n", " <td>266000</td>\n", " <td>2.1654</td>\n", " <td>1.0184</td>\n", " <td>0.01</td>\n", " </tr>\n", " <tr>\n", " <th>231</th>\n", " <td>46</td>\n", " <td>YEM</td>\n", " <td>Yemen</td>\n", " <td>Sanaa</td>\n", " <td>Asia</td>\n", " <td>33696614</td>\n", " <td>32284046</td>\n", " <td>28516545</td>\n", " <td>24743946</td>\n", " <td>18628700</td>\n", " <td>13375121</td>\n", " <td>9204938</td>\n", " <td>6843607</td>\n", " <td>527968</td>\n", " <td>63.8232</td>\n", " <td>1.0217</td>\n", " <td>0.42</td>\n", " </tr>\n", " <tr>\n", " <th>232</th>\n", " <td>63</td>\n", " <td>ZMB</td>\n", " <td>Zambia</td>\n", " <td>Lusaka</td>\n", " <td>Africa</td>\n", " <td>20017675</td>\n", " <td>18927715</td>\n", " <td>16248230</td>\n", " <td>13792086</td>\n", " <td>9891136</td>\n", " <td>7686401</td>\n", " <td>5720438</td>\n", " <td>4281671</td>\n", " <td>752612</td>\n", " <td>26.5976</td>\n", " <td>1.0280</td>\n", " <td>0.25</td>\n", " </tr>\n", " <tr>\n", " <th>233</th>\n", " <td>74</td>\n", " <td>ZWE</td>\n", " <td>Zimbabwe</td>\n", " <td>Harare</td>\n", " <td>Africa</td>\n", " <td>16320537</td>\n", " <td>15669666</td>\n", " <td>14154937</td>\n", " <td>12839771</td>\n", " <td>11834676</td>\n", " <td>10113893</td>\n", " <td>7049926</td>\n", " <td>5202918</td>\n", " <td>390757</td>\n", " <td>41.7665</td>\n", " <td>1.0204</td>\n", " <td>0.20</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>234 rows × 17 columns</p>\n", "</div>" ], "text/plain": [ " Rank CCA3 Country Capital Continent \\\n", "0 36 AFG Afghanistan Kabul Asia \n", "1 138 ALB Albania Tirana Europe \n", "2 34 DZA Algeria Algiers Africa \n", "3 213 ASM American Samoa Pago Pago Oceania \n", "4 203 AND Andorra Andorra la Vella Europe \n", ".. ... ... ... ... ... \n", "229 226 WLF Wallis and Futuna Mata-Utu Oceania \n", "230 172 ESH Western Sahara El Aaiún Africa \n", "231 46 YEM Yemen Sanaa Asia \n", "232 63 ZMB Zambia Lusaka Africa \n", "233 74 ZWE Zimbabwe Harare Africa \n", "\n", " 2022 Population 2020 Population 2015 Population 2010 Population \\\n", "0 41128771 38972230 33753499 28189672 \n", "1 2842321 2866849 2882481 2913399 \n", "2 44903225 43451666 39543154 35856344 \n", "3 44273 46189 51368 54849 \n", "4 79824 77700 71746 71519 \n", ".. ... ... ... ... \n", "229 11572 11655 12182 13142 \n", "230 575986 556048 491824 413296 \n", "231 33696614 32284046 28516545 24743946 \n", "232 20017675 18927715 16248230 13792086 \n", "233 16320537 15669666 14154937 12839771 \n", "\n", " 2000 Population 1990 Population 1980 Population 1970 Population \\\n", "0 19542982 10694796 12486631 10752971 \n", "1 3182021 3295066 2941651 2324731 \n", "2 30774621 25518074 18739378 13795915 \n", "3 58230 47818 32886 27075 \n", "4 66097 53569 35611 19860 \n", ".. ... ... ... ... \n", "229 14723 13454 11315 9377 \n", "230 270375 178529 116775 76371 \n", "231 18628700 13375121 9204938 6843607 \n", "232 9891136 7686401 5720438 4281671 \n", "233 11834676 10113893 7049926 5202918 \n", "\n", " Area (km²) Density (per km²) Growth Rate World Population Percentage \n", "0 652230 63.0587 1.0257 0.52 \n", "1 28748 98.8702 0.9957 0.04 \n", "2 2381741 18.8531 1.0164 0.56 \n", "3 199 222.4774 0.9831 0.00 \n", "4 468 170.5641 1.0100 0.00 \n", ".. ... ... ... ... \n", "229 142 81.4930 0.9953 0.00 \n", "230 266000 2.1654 1.0184 0.01 \n", "231 527968 63.8232 1.0217 0.42 \n", "232 752612 26.5976 1.0280 0.25 \n", "233 390757 41.7665 1.0204 0.20 \n", "\n", "[234 rows x 17 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "worldDf" ] }, { "cell_type": "code", "execution_count": 8, "id": "5b0f6a49-6472-4762-aa33-c949758ac496", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>region_name</th>\n", " <th>2021</th>\n", " <th>2020</th>\n", " <th>2019</th>\n", " <th>2018</th>\n", " <th>2017</th>\n", " <th>2016</th>\n", " <th>2015</th>\n", " <th>2014</th>\n", " <th>2013</th>\n", " <th>2012</th>\n", " <th>country</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Denmark</td>\n", " <td>88</td>\n", " <td>88</td>\n", " <td>87</td>\n", " <td>88</td>\n", " <td>88</td>\n", " <td>90</td>\n", " <td>91</td>\n", " <td>92</td>\n", " <td>91</td>\n", " <td>90</td>\n", " <td>Denmark</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>New Zealand</td>\n", " <td>88</td>\n", " <td>88</td>\n", " <td>87</td>\n", " <td>87</td>\n", " <td>89</td>\n", " <td>90</td>\n", " <td>91</td>\n", " <td>91</td>\n", " <td>91</td>\n", " <td>90</td>\n", " <td>New Zealand</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Finland</td>\n", " <td>88</td>\n", " <td>85</td>\n", " <td>86</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>89</td>\n", " <td>90</td>\n", " <td>89</td>\n", " <td>89</td>\n", " <td>90</td>\n", " <td>Finland</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Singapore</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>84</td>\n", " <td>84</td>\n", " <td>85</td>\n", " <td>84</td>\n", " <td>86</td>\n", " <td>87</td>\n", " <td>Singapore</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Sweden</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>85</td>\n", " <td>84</td>\n", " <td>88</td>\n", " <td>89</td>\n", " <td>87</td>\n", " <td>89</td>\n", " <td>88</td>\n", " <td>Sweden</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>175</th>\n", " <td>Yemen</td>\n", " <td>16</td>\n", " <td>15</td>\n", " <td>15</td>\n", " <td>14</td>\n", " <td>16</td>\n", " <td>14</td>\n", " <td>18</td>\n", " <td>19</td>\n", " <td>18</td>\n", " <td>23</td>\n", " <td>Yemen</td>\n", " </tr>\n", " <tr>\n", " <th>176</th>\n", " <td>Venezuela</td>\n", " <td>14</td>\n", " <td>15</td>\n", " <td>16</td>\n", " <td>18</td>\n", " <td>18</td>\n", " <td>17</td>\n", " <td>17</td>\n", " <td>19</td>\n", " <td>20</td>\n", " <td>19</td>\n", " <td>Venezuela</td>\n", " </tr>\n", " <tr>\n", " <th>177</th>\n", " <td>Somalia</td>\n", " <td>13</td>\n", " <td>12</td>\n", " <td>9</td>\n", " <td>10</td>\n", " <td>9</td>\n", " <td>10</td>\n", " <td>8</td>\n", " <td>8</td>\n", " <td>8</td>\n", " <td>8</td>\n", " <td>Somalia</td>\n", " </tr>\n", " <tr>\n", " <th>178</th>\n", " <td>Syria</td>\n", " <td>13</td>\n", " <td>14</td>\n", " <td>13</td>\n", " <td>13</td>\n", " <td>14</td>\n", " <td>13</td>\n", " <td>18</td>\n", " <td>20</td>\n", " <td>17</td>\n", " <td>26</td>\n", " <td>Syria</td>\n", " </tr>\n", " <tr>\n", " <th>179</th>\n", " <td>South Sudan</td>\n", " <td>11</td>\n", " <td>12</td>\n", " <td>12</td>\n", " <td>13</td>\n", " <td>12</td>\n", " <td>11</td>\n", " <td>15</td>\n", " <td>15</td>\n", " <td>14</td>\n", " <td>13</td>\n", " <td>South Sudan</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>180 rows × 12 columns</p>\n", "</div>" ], "text/plain": [ " region_name 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 \\\n", "0 Denmark 88 88 87 88 88 90 91 92 91 90 \n", "1 New Zealand 88 88 87 87 89 90 91 91 91 90 \n", "2 Finland 88 85 86 85 85 89 90 89 89 90 \n", "3 Singapore 85 85 85 85 84 84 85 84 86 87 \n", "4 Sweden 85 85 85 85 84 88 89 87 89 88 \n", ".. ... ... ... ... ... ... ... ... ... ... ... \n", "175 Yemen 16 15 15 14 16 14 18 19 18 23 \n", "176 Venezuela 14 15 16 18 18 17 17 19 20 19 \n", "177 Somalia 13 12 9 10 9 10 8 8 8 8 \n", "178 Syria 13 14 13 13 14 13 18 20 17 26 \n", "179 South Sudan 11 12 12 13 12 11 15 15 14 13 \n", "\n", " country \n", "0 Denmark \n", "1 New Zealand \n", "2 Finland \n", "3 Singapore \n", "4 Sweden \n", ".. ... \n", "175 Yemen \n", "176 Venezuela \n", "177 Somalia \n", "178 Syria \n", "179 South Sudan \n", "\n", "[180 rows x 12 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrDf" ] }, { "cell_type": "markdown", "id": "0c9ae250-e948-42a4-aaa2-addecd92a92f", "metadata": {}, "source": [ "Here are the `numPy` arrays:" ] }, { "cell_type": "code", "execution_count": 14, "id": "895c1898-aa11-413d-8cb3-834da6f3a85d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[36, 'AFG', 'Afghanistan', ..., 63.0587, 1.0257, 0.52],\n", " [138, 'ALB', 'Albania', ..., 98.8702, 0.9957, 0.04],\n", " [34, 'DZA', 'Algeria', ..., 18.8531, 1.0164, 0.56],\n", " ...,\n", " [46, 'YEM', 'Yemen', ..., 63.8232, 1.0217, 0.42],\n", " [63, 'ZMB', 'Zambia', ..., 26.5976, 1.028, 0.25],\n", " [74, 'ZWE', 'Zimbabwe', ..., 41.7665, 1.0204, 0.2]], dtype=object)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "worldData" ] }, { "cell_type": "code", "execution_count": 15, "id": "6e0a71ef-c5dd-45d1-9171-efe610d1a91a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[' Denmark', 88, 88, ..., 92, 91, 90],\n", " [' New Zealand', 88, 88, ..., 91, 91, 90],\n", " [' Finland', 88, 85, ..., 89, 89, 90],\n", " ...,\n", " [' Somalia', 13, 12, ..., 8, 8, 8],\n", " [' Syria', 13, 14, ..., 20, 17, 26],\n", " [' South Sudan', 11, 12, ..., 15, 14, 13]], dtype=object)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrData" ] }, { "cell_type": "markdown", "id": "66edc843-64bb-4241-b772-2da0ab064b3e", "metadata": {}, "source": [ "And here are the Python lists (we are displaying only the first four rows here):" ] }, { "cell_type": "code", "execution_count": 16, "id": "f91bc7db-519f-49b7-be1b-282630aed54a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[36,\n", " 'AFG',\n", " 'Afghanistan',\n", " 'Kabul',\n", " 'Asia',\n", " 41128771,\n", " 38972230,\n", " 33753499,\n", " 28189672,\n", " 19542982,\n", " 10694796,\n", " 12486631,\n", " 10752971,\n", " 652230,\n", " 63.0587,\n", " 1.0257,\n", " 0.52],\n", " [138,\n", " 'ALB',\n", " 'Albania',\n", " 'Tirana',\n", " 'Europe',\n", " 2842321,\n", " 2866849,\n", " 2882481,\n", " 2913399,\n", " 3182021,\n", " 3295066,\n", " 2941651,\n", " 2324731,\n", " 28748,\n", " 98.8702,\n", " 0.9957,\n", " 0.04],\n", " [34,\n", " 'DZA',\n", " 'Algeria',\n", " 'Algiers',\n", " 'Africa',\n", " 44903225,\n", " 43451666,\n", " 39543154,\n", " 35856344,\n", " 30774621,\n", " 25518074,\n", " 18739378,\n", " 13795915,\n", " 2381741,\n", " 18.8531,\n", " 1.0164,\n", " 0.56],\n", " [213,\n", " 'ASM',\n", " 'American Samoa',\n", " 'Pago Pago',\n", " 'Oceania',\n", " 44273,\n", " 46189,\n", " 51368,\n", " 54849,\n", " 58230,\n", " 47818,\n", " 32886,\n", " 27075,\n", " 199,\n", " 222.4774,\n", " 0.9831,\n", " 0.0]]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "worldDl[0:4]" ] }, { "cell_type": "code", "execution_count": 17, "id": "73609174-af56-4aae-a4ef-b1b2aabb5594", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[' Denmark', 88, 88, 87, 88, 88, 90, 91, 92, 91, 90],\n", " [' New Zealand', 88, 88, 87, 87, 89, 90, 91, 91, 91, 90],\n", " [' Finland', 88, 85, 86, 85, 85, 89, 90, 89, 89, 90],\n", " [' Singapore', 85, 85, 85, 85, 84, 84, 85, 84, 86, 87]]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrDl[0:4]" ] }, { "cell_type": "markdown", "id": "b26267f4-8a45-45ea-87d3-b318e83289ff", "metadata": {}, "source": [ "Additionally, here are the column lists for the world population CSV file and the corruption index CSV file. Please note that the column names are pretty self-explanatory." ] }, { "cell_type": "code", "execution_count": 19, "id": "91a3b21a-c91d-4313-9dc4-ceb1054544e6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Rank',\n", " 'CCA3',\n", " 'Country',\n", " 'Capital',\n", " 'Continent',\n", " '2022 Population',\n", " '2020 Population',\n", " '2015 Population',\n", " '2010 Population',\n", " '2000 Population',\n", " '1990 Population',\n", " '1980 Population',\n", " '1970 Population',\n", " 'Area (km²)',\n", " 'Density (per km²)',\n", " 'Growth Rate',\n", " 'World Population Percentage']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "worldColumns" ] }, { "cell_type": "code", "execution_count": 20, "id": "737e5680-bc57-439f-a4eb-00a99b4ed55d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['region_name',\n", " '2021',\n", " '2020',\n", " '2019',\n", " '2018',\n", " '2017',\n", " '2016',\n", " '2015',\n", " '2014',\n", " '2013',\n", " '2012']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrColumns" ] }, { "cell_type": "markdown", "id": "db7902db-907e-4f86-824d-77002675debd", "metadata": {}, "source": [ "The world population information is taken from the following Kaggle dataset: https://www.kaggle.com/datasets/iamsouravbanerjee/world-population-dataset\n", "\n", "The world corruption index information is taken from another Kaggle dataset: https://www.kaggle.com/datasets/tr1gg3rtrash/global-corruption-index" ] }, { "cell_type": "markdown", "id": "b393677b-241d-4f0c-8f03-74c72081335f", "metadata": {}, "source": [ "To answer the questions below you can use any of the data representations (data frame, array, list) provided to you. Your output should be presented in a readable form that makes it clear what you are reporting (if you are working with lists, please engage in come pretty printing. data frames can be presented as-is. for arrays - look at what the output looks like, and if it is messy, pretty print as well).\n", "\n", "**NOTE:** All questions MUST be answered **in isolation**. That is, you cannot reuse any variables computed when answering one question to answer another question. Instead - **you can reuse the code** (basically, I should be able to copy a full cell answering a question into an empty notebook and have it produce correct result)." ] }, { "cell_type": "markdown", "id": "4f2dfd14-2e91-4aea-9e5b-4e1b2068a869", "metadata": {}, "source": [ "**Question 1:** Find how much the population of a country whose capital is the city of _Lusaka_ changed between 1970 and 2000. Report the name of the country and the absolute value of the population change." ] }, { "cell_type": "code", "execution_count": 32, "id": "5a051c5b-5625-4595-a79c-ff02ed50c15e", "metadata": {}, "outputs": [], "source": [ "### Your code goes here\n", "\n" ] }, { "cell_type": "markdown", "id": "a95486d0-0973-401c-99cb-95bd3e67b5e0", "metadata": {}, "source": [ "**Question 2:** Report the change in the corruption index of Portugal between the years of 2012 and 2021." ] }, { "cell_type": "code", "execution_count": null, "id": "b491d8db-d938-42a2-bd01-d73a1659e036", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "bc296192-ecff-4600-8e06-27553e13b522", "metadata": {}, "source": [ "**Question 3:** Find the average area for a country in each continent and output the list of contintents and the average area of a country in each continent sorted in descending order by the average area. Use `Area (km²)` column." ] }, { "cell_type": "code", "execution_count": 34, "id": "840ee9e8-0bbb-4d12-931e-557aba6532e6", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "bbc11fd4-47c6-4d0f-a861-775df364e5e0", "metadata": {}, "source": [ "**Question 4** Let us break all countries into \"good\" (corruption index of 70 or higher), \"average\" (corruption index between 45 and 69), and \"bad\" (corruption index below 45). Report the number of countries with \"good\", \"average\" and \"bad\" corruption indexes in 2020. Your output shall consist of three rows - one per type of country (\"good\", \"average\", \"bad\")\n" ] }, { "cell_type": "code", "execution_count": 35, "id": "002b9de0-f84b-4b59-93ba-f7e6ac292080", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "ab5ef168-b516-4d67-994e-4f6e09b6edb1", "metadata": {}, "source": [ "**Question 5:** Find the most corrupt country (or countries - there may be ties) of Africa in 2020." ] }, { "cell_type": "code", "execution_count": 36, "id": "36f21335-3563-40ab-b77a-ba3082177099", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "1fd7d706-246f-49a9-be27-807807fcaff6", "metadata": {}, "source": [ "**Question 6:** Find the Asian country (or countries - there may be ties) that showed the highest absolute decline of its corruption index between 2012 and 2020 (if no such country exists, report a country that showed the slowest absolute progress). Report the names of the countries (in alphabetical order if more than one country matches this information request), and the 2012 and 2020 corruption indexes." ] }, { "cell_type": "code", "execution_count": 37, "id": "41fc66c6-7656-489b-a7c0-02929cd5813d", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "ee25278f-aff6-4a8c-a483-bc0d6eabcf5b", "metadata": {}, "source": [ "**Question 7:** List all countries in Europe which had a population increase between 2015 and 2020 sorted in descending order by their 2020 corruption index. For each country report the absolute population increase between 2015 and 2020 and the 2020 corruption index. Report only countries for which corruption index exists." ] }, { "cell_type": "code", "execution_count": null, "id": "1cb1b092-4112-4e5f-a84f-601c646d79ce", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "086c0254-288a-46cc-97f7-f7027aa5e5f5", "metadata": {}, "source": [ "**Question 8:** Find all countries NOT in Africa for which there is no corruption index data. Report their names in alphabetical order." ] }, { "cell_type": "code", "execution_count": 38, "id": "d1da8de3-1eb7-47c8-8052-0aeb4e70874b", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "d7c3a1a6-8c16-4075-95da-291b0b092a03", "metadata": {}, "source": [ "**Question 9:** Report the 2020 population of the least corrupt country for that year on each continent (include all ties). Report the continent, the name of the country, the 2020 corruption index, and the 2020 population. Sort output in alphabetical order by continent, and within continent - in descending order by the country's 2020 population (if there are ties)." ] }, { "cell_type": "code", "execution_count": 39, "id": "4174b010-0553-4578-951e-d1ba7283d50c", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "c74dfd5f-6e7b-4600-838f-cae908d7aa8a", "metadata": {}, "source": [ "**Question 10:** Find the largest (by 2020 population) country with a \"good\" (70 or above) corruption index **for each** of the years reported in the corruption index data file. Report in chronological order. Each row shall contian the year, the country, its 2020 (note!) population, and its corruption index for the given year." ] }, { "cell_type": "code", "execution_count": 40, "id": "ddb41700-1908-4919-b9d8-f509cbe2b591", "metadata": {}, "outputs": [], "source": [ "### Your code goes here" ] }, { "cell_type": "markdown", "id": "35ad9052-ef73-4064-878f-b7ce0aac0131", "metadata": {}, "source": [ "**Congratulations!**\n", "\n", "You are done!\n", "\n", "To submit your work, open the terminal window in your Jupyter Labs environment, navigate to the directory where this notebook resides in, and type the following command:\n", "\n", " $ handin dekhtyar 365-lab01-2 <filename>\n", " \n", "where < filename > is the name under which you stored this notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "e61ff6af-4965-4ace-8222-7f8b5a5ddc22", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }