Lab Exercise 9: Decision Tree Learning
In this lab exercise, you will use a tool from the Computational Intelligence Lab at UBC in Vancouver (http://www.cs.ubc.ca/labs/lci/CIspace). You can use the online version, or download the code and run it locally on your machine as a Java application.
Based on some mixed feedback from students in previous courses about the particular example below (decision trees), you may replace this exercise with another one about learning; see more details below.
[Note: A new version has become available, and it differs somewhat from the one I used to generate the instructions below. Let me know if the discrepancies are significant.]

Tasks
The topic of the lab is learning with decision trees. The tool allows you to experiment with predefined examples or your own data sets. Answer the questions below based on some experiments with the decision tree tool. You can use the tables below to record your results, but you don’t need to submit them. You should submit the answers to the questions for the three tree types through a Google form (see the link under “Submission”).
You will have to compare different results against each other, so it will be helpful to print out or copy and paste them.
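If you prefer to keep your results in a small script instead of on paper, the following Python sketch shows one possible way to record and compare runs. It is not part of the dTree tool. The `Run` record, the attribute names, and all numbers are hypothetical placeholders; the three counts correspond to the correct, undetermined (no prediction), and incorrect figures reported by the tool's test panel.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One test result, copied by hand from the tool (hypothetical record)."""
    first_attribute: str
    correct: int
    undetermined: int
    incorrect: int

    @property
    def total(self) -> int:
        return self.correct + self.undetermined + self.incorrect

    @property
    def percent_correct(self) -> float:
        # Guard against an empty run to avoid dividing by zero.
        return 100.0 * self.correct / self.total if self.total else 0.0


# Placeholder numbers for illustration only; these are not real results.
runs = [
    Run("attribute_a", correct=6, undetermined=1, incorrect=3),
    Run("attribute_b", correct=8, undetermined=0, incorrect=2),
]

# Rank the runs from best to worst by percentage of correct predictions.
ranked = sorted(runs, key=lambda r: r.percent_correct, reverse=True)
best = ranked[0]

for run in ranked:
    print(f"{run.first_attribute:15} {run.percent_correct:5.1f}% correct")
```

Replace the placeholder entries with the counts you observe, one `Run` per row of the tables below.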
Instructions
Invoke the Decision Tree (dTree) applet, and load the sample "CarExample.txt." By using the "Step" button, the program will create the decision tree for you; with "Auto-Solve" the full tree is created at once. You can select the information to be displayed when a node is clicked on by choosing one of the "View Node Info," "View Mapped Examples," and "View Histogram" buttons. The "Split Node" button allows you to determine which property to use as the decision criterion by clicking on a node that has not been expanded yet (shown in blue). After a tree has been created, the "Test" and "Test New Example" buttons can be used to see if the decision tree makes the right choice.
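To make the idea of splitting a node more concrete, here is a minimal Python sketch of what choosing a property as the decision criterion amounts to: the examples at the node are partitioned by their value for that property. This is an illustration under stated assumptions, not the tool's actual implementation. The toy examples, the attribute names, and the helpers `split_on` and `class_counts` are hypothetical and are not taken from "CarExample.txt."

```python
from collections import defaultdict

# Hypothetical toy examples; these are NOT the contents of CarExample.txt.
examples = [
    {"safety": "high", "price": "low", "class": "acceptable"},
    {"safety": "high", "price": "high", "class": "unacceptable"},
    {"safety": "low", "price": "low", "class": "unacceptable"},
    {"safety": "low", "price": "high", "class": "unacceptable"},
]


def split_on(examples, attribute):
    """Partition the examples by their value for the given attribute."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[attribute]].append(example)
    return dict(groups)


def class_counts(examples, target="class"):
    """Count how many examples fall into each class."""
    counts = defaultdict(int)
    for example in examples:
        counts[example[target]] += 1
    return dict(counts)


# Splitting the root node on "safety" yields one child node per value.
by_safety = split_on(examples, "safety")
for value, subset in by_safety.items():
    print(value, class_counts(subset))
```

Each child node can then be split again on another property, which is what the "Step" button does one node at a time.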
Auto-generated Tree
Reset the graph, and use the "Auto-Create" button to generate the tree automatically from the original data set.
Results: When you press the "Test" button, a panel comes up showing how many examples were predicted correctly, how many incorrectly, and how many received no prediction. Enter your results in the table below, using a different first attribute for each row.
First Attribute | Correct | Undetermined | Incorrect
                |         |              |
                |         |              |
                |         |              |

Why does the tool generate a tree that gets a significant number of examples wrong and cannot handle others?



"Safety-First" Manually Generated Tree
Reset the graph, and use the "Split Node" button to manually control the generation of the tree. After activating that button, click on the blue rectangle. Then select the "safety" attribute, and generate the rest of the tree via the "Step" or "Auto-Solve" buttons. Reset the graph and run the program repeatedly. Make sure that you are using the original data set, in case you changed some values or added new examples.

Results: Press the "Test" button for this tree, and compare the results against the auto-generated one.
First Attribute | Correct | Undetermined | Incorrect
Safety          |         |              |
Safety          |         |              |
Safety          |         |              |

Why do you think the automatically generated trees perform differently from the manually generated ones?

High Performance Tree
Reset the graph, and use the "Split Node" button to manually control the generation of the tree again. Try to find a criterion that results in a tree with the highest percentage of correctly predicted examples.
Results: What is the best first-choice attribute that you could find? Enter three results from different runs in the table below.
First Attribute | Correct | Undetermined | Incorrect
                |         |              |
                |         |              |
                |         |              |

Can you describe a strategy to generate such a high-performance tree?

Learning Exercise Alternatives
Students from past courses have indicated that the learning benefit of this exercise is low relative to the effort it requires, and that it might be helpful to examine other exercises.
One possible alternative is the Neural Networks tool from the CIspace Web site (http://www.cs.ubc.ca/labs/lci/CIspace). I have also used this tool in the past, and the feedback was mixed as well, but for different reasons (difficult to understand, did not work reliably). In the meantime, it has been revised, and it seems easier to use now.
If you happen to encounter other exercises, simulations, or demos relevant to the topic of learning, you can also evaluate them for use as a lab exercise. Here are some criteria that I usually apply:
  • easy to install, easy to use, good documentation
  • appropriate time required (1-2 hours typically)
  • related to and consistent with the material discussed in class
  • good learning experience

Administrative Aspects
Assignment Submission
This assignment must be submitted electronically via a Web form.
Collaboration
This exercise is an individual assignment.
Questions about the Assignment
If you have general questions or comments concerning the exercise, post them on the Blackboard Discussion Forum for the assignment. The grader and I will check that forum on a regular basis and try to answer your questions. If you know the answer to a support or clarification question posted by somebody else, feel free to answer it; this will count as extra participation credit.