FJK Home | CPE/CSC 480 | Syllabus | Schedule | Lecture Notes | Assignments | Labs | Project | Other Links |
Name and Section: | |
Status | Final |
Points: | 10 |
Deadline: | Tuesday, Nov. 23, end of lab |
In this lab exercise, you will use another tool from the Computational Intelligence Lab at UBC in Vancouver ( http://www.cs.ubc.ca/labs/lci/CIspace). As with the previous exercise, you can use the online version, or download the code and run it locally on your machine as a Java application.
The topic of the lab is learning with decision trees. The tool allows you to experiment with predefined examples or your own data sets.
Invoke the Decision Tree (dTree) applet, and load the sample "CarExample.txt." The "Step" button creates the decision tree one node at a time; "Auto-Solve" creates the full tree at once. You can select the information displayed when a node is clicked by choosing one of the "View Node Info," "View Mapped Examples," and "View Histogram" buttons. The "Split Node" button lets you choose which attribute to use as the decision criterion for a node that has not yet been expanded (shown in blue). After a tree has been created, the "Test" and "Test New Example" buttons can be used to check whether the decision tree makes the right choices.
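When the applet grows the tree automatically, it must pick a split attribute at each node; a common greedy criterion is information gain, the expected reduction in entropy. A minimal stdlib-Python sketch of that computation (the attribute names and data below are hypothetical, not taken from CarExample.txt):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr, target):
    """Expected reduction in entropy from splitting `examples` on `attr`.

    `examples` is a list of dicts mapping attribute names to values;
    `target` names the class attribute."""
    n = len(examples)
    before = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return before - remainder

# Toy data (hypothetical, for illustration only):
data = [
    {"safety": "low",  "price": "high", "accept": "no"},
    {"safety": "low",  "price": "low",  "accept": "no"},
    {"safety": "high", "price": "high", "accept": "yes"},
    {"safety": "high", "price": "low",  "accept": "yes"},
]
print(information_gain(data, "safety", "accept"))  # 1.0 -- perfectly predictive
print(information_gain(data, "price", "accept"))   # 0.0 -- uninformative
```

The attribute with the highest gain becomes the root; the procedure then recurses on each branch, which is what you watch happening step by step in the applet.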
Answer the following questions based on some experiments with the decision tree tool. You will have to compare different results against each other, so it will be helpful to print out or copy and paste them.
Reset the graph, and use the "Auto-Create" button to generate the tree automatically from the original data set.
Results: When you press the "Test" button, a panel appears showing how many examples were predicted correctly, how many incorrectly, and how many received no prediction. Why does the tool generate a tree that gets a significant number of examples wrong and cannot handle others at all?
The numbers I got were as follows:
correct: 71%
no prediction: 3%
incorrect: 26%
Several problems can contribute:
The training set may be too small.
The training examples may not be representative of the domain. In this case, the distribution of the test cases seems to differ significantly from that of the training examples.
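Both effects can be seen in miniature: a tree only has branches for attribute values that occurred in training, so a test example with an unseen value falls off the tree ("no prediction"), while unrepresentative training data produces wrong predictions. A stdlib-Python sketch of a one-level "tree" that predicts the majority label for each value of a single split attribute (hypothetical data, not from CarExample.txt):

```python
from collections import Counter

# Hypothetical training examples.
train = [
    {"safety": "low",  "accept": "no"},
    {"safety": "low",  "accept": "no"},
    {"safety": "high", "accept": "yes"},
]

def predict(example, attr="safety", target="accept"):
    """Majority label among training examples sharing the test example's
    value for `attr`; None when that value was never seen in training."""
    matches = [e[target] for e in train if e[attr] == example[attr]]
    if not matches:
        return None  # no branch for this value: "no prediction"
    return Counter(matches).most_common(1)[0][0]

print(predict({"safety": "high"}))  # yes  -- value seen in training
print(predict({"safety": "med"}))   # None -- unseen value, no prediction
```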
Reset the graph, and use the "Split Node" button to manually control the generation of the tree. After activating that button, click on the blue rectangle. Then select the "safety" attribute, and generate the rest of the tree via the "Step" or "Auto-Solve" buttons. Make sure that you are using the original data set, in case you changed some values or added new examples.
Results: Press the "Test" button for this tree, and compare the results against those for the auto-generated one. Why do you think the automatically generated tree performs differently?
Reset the graph, and use the "Split Node" button to manually control the generation of the tree again. Try to find a criterion that results in a tree with the highest percentage of correctly predicted examples.
Results: What is the best first-choice criterion you could find? Can you describe a strategy for generating such a high-performance tree?
Reset the graph, and use the "Split Node" button to manually control the generation of the tree again. Try to find a criterion that results in a tree with a small percentage of correctly predicted examples.
Results: What is the least suitable first-choice criterion you could find? Can you describe a strategy for generating such a low-performance tree?
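One systematic strategy for both of the questions above is to rank the candidate split attributes by information gain: the highest-gain attribute tends to give a high-performance first split, and the lowest-gain attribute a low-performance one. A stdlib-Python sketch (attribute names and data are hypothetical, not from CarExample.txt):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target):
    """Information gain of splitting `examples` on `attr`."""
    n = len(examples)
    before = entropy([e[target] for e in examples])
    after = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        after += len(subset) / n * entropy(subset)
    return before - after

# Hypothetical examples; "safety" separates the classes much better
# than "doors" does.
data = [
    {"safety": "low",  "doors": "2", "accept": "no"},
    {"safety": "low",  "doors": "4", "accept": "no"},
    {"safety": "med",  "doors": "2", "accept": "no"},
    {"safety": "med",  "doors": "4", "accept": "yes"},
    {"safety": "high", "doors": "2", "accept": "yes"},
    {"safety": "high", "doors": "4", "accept": "yes"},
]
ranked = sorted(["safety", "doors"],
                key=lambda a: gain(data, a, "accept"), reverse=True)
print("best first split:", ranked[0])    # safety
print("worst first split:", ranked[-1])  # doors
```

In the applet, the same ranking can be done by hand: use "Split Node" on the root, compare the class histograms of the resulting children for each candidate attribute, and keep the attribute whose children are purest (or least pure, for the low-performance tree).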
Franz Kurfess