Question A
- The tvshows data set from AI:FCA-1 (exercise 7.3) is shown below:
Assume that we are reporting errors based on the absolute error (that is, each error counts as 1). Give very brief answers to the following questions:
- What is the optimal tree with one node only? What is the associated error?
- What is the optimal tree with a depth of 2 (i.e. a root node with leaves as children)? What is the associated error? Which instances end up at each leaf?
- What is the smallest tree that correctly classifies all training instances? How will it classify a new instance described as (Comedy = true, Doctors = true, Lawyers = true, Guns = true) and another one described as (Comedy = false, Doctors = false, Lawyers = true, Guns = true)? Which of the two test instances allows us to say that the tree is able to generalize?
- If you were building the tree using the information gain as a splitting criterion, what would be the root?
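For reference, the two quantities used in the questions above can be sketched in a few lines of Python. This is a minimal sketch, not a solution: the feature names `A` and `B`, the helper names, and the four toy rows are all hypothetical and are not the tvshows data. It assumes Boolean features, a Boolean target, and entropy measured in bits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def majority_error(labels):
    """Absolute error of a one-node tree that predicts the majority class."""
    return len(labels) - max(Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one Boolean feature."""
    total = len(labels)
    remainder = 0.0
    for value in (True, False):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        if subset:  # skip empty branches
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data (NOT the tvshows data set).
rows = [
    {"A": True, "B": True},
    {"A": True, "B": False},
    {"A": False, "B": True},
    {"A": False, "B": False},
]
labels = [True, True, False, False]

single_node_error = majority_error(labels)
gains = {f: information_gain(rows, labels, f) for f in ("A", "B")}
root = max(gains, key=gains.get)  # feature with the largest gain
```

On this toy data, feature `A` separates the classes perfectly while `B` tells us nothing, so `A` would be chosen as the root. Applying the same functions to the tvshows table means replacing `rows` and `labels` with the actual instances.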