{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 2: Feature Engineering" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in the Autos Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "predictors = [\"symboling\", \"normalized-losses\", \"make\", \"fuel-type\",\n", " \"aspiration\", \"num-of-doors\", \"body-style\", \"drive-wheels\",\n", " \"engine-location\", \"wheel-base\", \"length\", \"width\",\n", " \"height\", \"curb-weight\", \"engine-type\", \"num-of-cylinders\",\n", " \"engine-size\", \"fuel-system\", \"bore\", \"stroke\",\n", " \"compression-ratio\", \"horsepower\", \"peak-rpm\", \"city-mpg\",\n", " \"highway-mpg\"]\n", "data = pd.read_csv(\"http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data\",\n", " header=None,\n", " names=predictors + [\"price\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for col in data.columns:\n", " if data[col].dtype == object:\n", " data = data[data[col] != \"?\"]\n", " try:\n", " data[col] = pd.to_numeric(data[col])\n", " except (ValueError, TypeError):\n", " pass" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Add cells as needed here. You may want to copy over functions that \n", "# you need from your class work. You can either use the `lm` \n", "# function you wrote or LinearRegression in scikit-learn." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictor Variable 1\n", "\n", "Select a quantitative variable. Construct the scatter plot of the car price against it. 
Using the scatter plot as a guide, fit a non-linear model that you consider to be “good”, and visualize it on a plot. You can use a polynomial expansion (adding $x^2$, $x^3$, and so on to the model for the independent variable $x$) or any other functions of $x$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Add cells as needed here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the Markdown cell below, explain why you selected this model specifically." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictor Variable 2\n", "\n", "Select a different quantitative variable. Construct the scatter plot of the car price against it. Using the scatter plot as a guide, fit a non-linear model that you consider to be “good”, and visualize it on a plot. Please try to use a different model from the one you used for the first predictor. (For example, if you used polynomials for the first predictor, try other basis functions for this exercise.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Add cells as needed here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the Markdown cell below, explain why you selected this model specifically." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }