{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ee8dc521-999d-4b18-a5b4-10593de22c59",
   "metadata": {},
   "source": [
    "**CSC 365 - Fall 2024- Lab 1-1**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a5acb99-fb76-4a1a-a367-fabcffb77eeb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f56826e7-6f8b-4a24-8c4f-7e5dfd0d998d",
   "metadata": {},
   "source": [
    "**Name:** ______________________________________\n",
    "\n",
    "**Cal Poly Email:**_____________________________"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06dc0cb6-1c1b-441d-9444-6ed19d785c53",
   "metadata": {},
   "source": [
    "This Jupyter notebook loads a simple CSV file (performance of students from two schools in Portugal in a Mathematics course - see https://www.kaggle.com/datasets/whenamancodes/alcohol-effects-on-study?resource=download)  and contains 10 questions about the data in the file.   You are provided with one cell per question into which you can place any Python code you want in order to produce output that contains the answer to the question.  \n",
    "\n",
    "A valid answer to the question is formed out of the information available in the loaded CSV file. Sometimes this information will be contained in one or more rows of the CSV files directly. Sometimes you may need to perform certain computations with the data found in the CSV file in order to answer the question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83df65ec-b9a2-4a89-819b-900d7a62b77f",
   "metadata": {},
   "outputs": [],
   "source": [
    "## loading the data\n",
    "\n",
    "filename = \"http://www.csc.calpoly.edu/~dekhtyar/365-Spring2024/labs/lab1/Maths.csv\"\n",
    "\n",
    "df = pd.read_csv(filename)   ## pandas data frame with the data\n",
    "\n",
    "df['Id'] = df.index\n",
    "\n",
    "data = np.array(df)          ## numPy array with the data\n",
    "\n",
    "dl = [list(x) for x in data] ## List of lists with the data\n",
    "\n",
    "columns = list(df.columns)   ## list of column names - in the order of columns in both the Pandas data frame and the NumPy array\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af9809c0-05e6-4a89-9be5-17866ae48665",
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b14cf40-c189-4255-a37f-67406d680763",
   "metadata": {},
   "source": [
    "The data is provided to you in three forms:\n",
    "\n",
    "1. `pandas` data frame.  `pandas` is a Python package for dealing with tabular data that includes a lot of functionality that is similar to that of relational database which we will be studying in this course.  A data frame is a Python data structure for storing two-dimentional tabular data.  If you know how to work with `pandas` data frames - feel free to use `pandas` functionality.\n",
    "\n",
    "2. `numPy` array. `numPy` is a Python package primarily used for dealing with multi-dimensional arrays of data. A CSV file can be stored as a two-dimensional `numPy` array in a natural way, and a 2-dimensional `numPy` array is essentially a Python list or Python lists. `numPy` provides a lot of functionality for working with arrays.\n",
    "\n",
    "3. Python's list of lists. This is the most basic Python structure that does not rely on functionality from any specific Python package. Each row of the CSV file is one element of the list. The element itself is a list containing values of individual cells in the row.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c4dcb18-3666-4d99-ab2a-3d5ca54472b7",
   "metadata": {},
   "source": [
    "Here is what the `pandas` data frame looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce846aa2-334b-4d8f-8a2b-2e087ca80bb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c9ae250-e948-42a4-aaa2-addecd92a92f",
   "metadata": {},
   "source": [
    "Here is the `numPy` array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "895c1898-aa11-413d-8cb3-834da6f3a85d",
   "metadata": {},
   "outputs": [],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66edc843-64bb-4241-b772-2da0ab064b3e",
   "metadata": {},
   "source": [
    "And here is the Python list (we are displaying only the first four rows here):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e0a71ef-c5dd-45d1-9171-efe610d1a91a",
   "metadata": {},
   "outputs": [],
   "source": [
    "dl[0:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b26267f4-8a45-45ea-87d3-b318e83289ff",
   "metadata": {},
   "source": [
    "Additionally, the variable `columns` contains the list of column names for the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91a3b21a-c91d-4313-9dc4-ceb1054544e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db7902db-907e-4f86-824d-77002675debd",
   "metadata": {},
   "source": [
    "**Please read the  description of the columns before proceeding with the rest of the lab assignment**: https://www.kaggle.com/datasets/whenamancodes/alcohol-effects-on-study?resource=download\n",
    "\n",
    "To this list, we added one more column, `\"Id\"` to represent the unique identifier of each student. This is the last column in the data frame (and thus, also the last value in each row of the array and the list)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b393677b-241d-4f0c-8f03-74c72081335f",
   "metadata": {},
   "source": [
    "To answer the questions below you can use any of the data representations (data frame, array, list) provided to you.  Your output should be presented in a readable form that makes it clear what you are reporting (if you are working with lists, please engage in come pretty printing. data frames can be presented as-is. for arrays - look at what the output looks like, and if it is messy, pretty print as well).\n",
    "\n",
    "**NOTE:** All questions MUST be answered **in isolation**. That is, you cannot reuse any variables computed when answering one question to answer another question. Instead - **you can reuse the code** (basically, I should be able to copy a full cell answering a question into an empty notebook and have it produce correct result)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f2dfd14-2e91-4aea-9e5b-4e1b2068a869",
   "metadata": {},
   "source": [
    "**Question 1:**\n",
    "Report the current health status for all 20-year old students from Gabriel Pereira ('GP') high school who are in a romantic relationship (report the student Id and their health status)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a051c5b-5625-4595-a79c-ff02ed50c15e",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a95486d0-0973-401c-99cb-95bd3e67b5e0",
   "metadata": {},
   "source": [
    "**Question 2:** Determine if there are more 18-year old male or female students who consume significant amounts of alcohol (4 or above) on the weekdays. Report both the number of male and female students."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b491d8db-d938-42a2-bd01-d73a1659e036",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc296192-ecff-4600-8e06-27553e13b522",
   "metadata": {},
   "source": [
    "**Question 3:** Find the highest total score a student from _Gabriel Pereira_ school received in their math class (add up _G1_, _G2_, and _G3_ scores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "840ee9e8-0bbb-4d12-931e-557aba6532e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbc11fd4-47c6-4d0f-a861-775df364e5e0",
   "metadata": {},
   "source": [
    "**Question 4** Find all the students from  _Gabriel Pereira_ High School who received the highest score (sum of G1, G2 and G3) in their math class. Report their Ids, the G1, G2, and G3 scores, and their \n",
    "weekend and weekday alcohol consumption.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "002b9de0-f84b-4b59-93ba-f7e6ac292080",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab5ef168-b516-4d67-994e-4f6e09b6edb1",
   "metadata": {},
   "source": [
    "**Question 5:** Compare the average performance of female students who are engaged in extra curricular activities at _Gabriel Pereira_ and _Mousinho da Silveira_ schools in the math class (the G3 score). Report the full name (_Gabriel Pereira_ or _Mousinho da Silveira_) of the school with the higher average performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36f21335-3563-40ab-b77a-ba3082177099",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fd7d706-246f-49a9-be27-807807fcaff6",
   "metadata": {},
   "source": [
    "**Question 6:** Break the students into two categories by whether or not they are in a romantic relationship. For each group report (a) the number of students in the group, (b) the average math score (G3). Report the groups in descending order by the average score. (your output should contain two rows, and three values in each row: \"yes\" or \"no\" for relationship status, the number of students, and the average score).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41fc66c6-7656-489b-a7c0-02929cd5813d",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee25278f-aff6-4a8c-a483-bc0d6eabcf5b",
   "metadata": {},
   "source": [
    "**Question 7:** Find the percent of male and female students in each school that consume considerable quantities of alcohol (value of 3 or above) _both_ on weekends and weekdays.\n",
    "(your output shall consist of four rows of data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1cb1b092-4112-4e5f-a84f-601c646d79ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "086c0254-288a-46cc-97f7-f7027aa5e5f5",
   "metadata": {},
   "source": [
    "**Question 8:**  Find who reports better health on average - students with high math scores (16 or above), or students with low math scores.  Report the two averages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1da8de3-1eb7-47c8-8052-0aeb4e70874b",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7c3a1a6-8c16-4075-95da-291b0b092a03",
   "metadata": {},
   "source": [
    "**Question 9:** Find whether the **percentage** of students with high consumption of alcohol (>=4 either on weekend or weekday) is higher among the students whose parents live together or separately (the \"Pstatus\" column).  Report the status with the higher percentage and the percentage itself."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4174b010-0553-4578-951e-d1ba7283d50c",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c74dfd5f-6e7b-4600-838f-cae908d7aa8a",
   "metadata": {},
   "source": [
    "**Question 10:** List the average alcohol consumption level (average between their weekend and weekday consumption) for each student with the highest G3 score in the class. Your output shall contain two values in each row: the Id of the student and their average alcohol consumption level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ddb41700-1908-4919-b9d8-f509cbe2b591",
   "metadata": {},
   "outputs": [],
   "source": [
    "### Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35ad9052-ef73-4064-878f-b7ce0aac0131",
   "metadata": {},
   "source": [
    "**Congratulations!**\n",
    "\n",
    "You are done!\n",
    "\n",
    "To submit your work, open the terminal window in your Jupyter Labs environment, navigate to the directory where this notebook resides in, and type the following command:\n",
    "\n",
    "                   $ handin dekhtyar 365-lab01-1 <filename>\n",
    "                   \n",
    "where < filename > is the name under which you stored this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e61ff6af-4965-4ace-8222-7f8b5a5ddc22",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}