CSCI 5822

CSCI 5822 Spring 2018

Goal

The goal of this assignment is to give you some practice manipulating data, using Bayes' rule, and constructing a naive Bayes classifier. Naive Bayes is described in Section 10.1 of Barber; working through Examples 10.1 and 10.2 of the text should help you with this assignment.

Data Set

The Titanic data set gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. The attributes are social class (first class, second class, third class, crew member), age (adult or child), gender, and whether or not the person survived. The Titanic data set is available here.

Build a joint probability table, like the ones discussed in the class notes, that represents the joint distribution over all variables, i.e., Pr(Gender, Age, Class, Outcome). This table should have 32 entries (2 × 2 × 4 × 2) because Gender ∈ {male, female}, Age ∈ {child, adult}, Class ∈ {1st, 2nd, 3rd, crew}, and Outcome ∈ {death, survival}. You will use the data in this table for the following tasks. There is nothing to hand in for Task 0.
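One way to build such a table is to count how often each combination occurs and divide by the total number of records. The following is only a minimal sketch: it assumes each record has already been parsed into a (gender, age, class, outcome) tuple, and the names joint_table and toy_records are hypothetical. The toy records are made up for illustration and are not the real Titanic counts.

```python
from collections import Counter
from itertools import product

GENDERS = ("male", "female")
AGES = ("child", "adult")
CLASSES = ("1st", "2nd", "3rd", "crew")
OUTCOMES = ("death", "survival")


def joint_table(records):
    """Return Pr(Gender, Age, Class, Outcome) as a dict with all 32 keys."""
    counts = Counter(records)
    total = len(records)  # assumed to be non-zero
    # Enumerate every combination so that unseen ones get probability 0.0.
    return {
        key: counts[key] / total
        for key in product(GENDERS, AGES, CLASSES, OUTCOMES)
    }


# Made-up toy records, NOT the real Titanic data.
toy_records = [
    ("male", "adult", "crew", "death"),
    ("male", "adult", "crew", "death"),
    ("female", "adult", "1st", "survival"),
    ("male", "child", "3rd", "death"),
]
joint = joint_table(toy_records)
print(len(joint))                                  # 32
print(joint[("male", "adult", "crew", "death")])   # 0.5
```

Enumerating all 32 combinations explicitly, rather than keeping only the combinations that occur in the data, makes empty cells visible as zeros instead of missing keys.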

Build a probability table indicating Pr(death | Gender, Age, Class) for each combination of gender, age, and class. Display this table in the following way:

Class  | Male, Child | Male, Adult | Female, Child | Female, Adult
First  |             |             |               |
Second |             |             |               |
Third  |             |             |               |
Crew   |             |             |               |

The rows of the table represent the different classes, and the columns represent the different combinations of gender and age. In each cell of the table, insert the conditional probability. Warning: Be alert to the possibility of a cell containing no data.
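Each conditional probability can be read off the joint table by dividing the joint probability of death by the marginal probability of the feature combination, Pr(death | g, a, c) = Pr(g, a, c, death) / [Pr(g, a, c, death) + Pr(g, a, c, survival)]. The sketch below assumes the joint table is stored as a dict keyed by (gender, age, class, outcome) tuples. The function name death_given and the toy probabilities are hypothetical and are not Titanic values. Returning None for an empty cell is one possible convention, not the required one.

```python
def death_given(joint, gender, age, cls):
    """Pr(death | gender, age, cls), or None when the cell has no data."""
    p_death = joint.get((gender, age, cls, "death"), 0.0)
    p_survival = joint.get((gender, age, cls, "survival"), 0.0)
    # Marginalise over Outcome to get Pr(gender, age, cls).
    p_features = p_death + p_survival
    if p_features == 0:
        return None  # nobody in the data has this feature combination
    return p_death / p_features


# Made-up joint probabilities for illustration only (not Titanic values).
toy_joint = {
    ("male", "adult", "crew", "death"): 0.375,
    ("male", "adult", "crew", "survival"): 0.125,
    ("female", "adult", "1st", "survival"): 0.5,
}
print(death_given(toy_joint, "male", "adult", "crew"))   # 0.75
print(death_given(toy_joint, "female", "child", "2nd"))  # None
```

Checking for a zero denominator before dividing is what guards against the empty-cell case mentioned in the warning.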

After you’ve built the probability table, come up with a rule that uses the probabilities to predict death or survival. Then make a second table, a classification table, which lists the predicted outcome (death or survival) for each feature combination. Explain the classification rule you chose.
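As an illustration of what such a rule might look like in code, the sketch below thresholds the conditional probability at 0.5. This is only one possible rule, not the expected answer. The function name classify, the threshold, the tie-breaking toward survival, and the "no data" label for empty cells are all assumptions made for this example.

```python
def classify(p_death, threshold=0.5, no_data_label="no data"):
    """Map Pr(death | features) to a predicted outcome.

    p_death is None when the feature combination never occurs in the data.
    """
    if p_death is None:
        return no_data_label
    # A probability exactly at the threshold is classified as survival here;
    # that tie-breaking choice is arbitrary.
    return "death" if p_death > threshold else "survival"


# Made-up conditional probabilities for illustration only.
toy_conditionals = {
    ("male", "adult", "crew"): 0.75,
    ("female", "adult", "1st"): 0.25,
    ("female", "child", "crew"): None,
}
classification = {key: classify(p) for key, p in toy_conditionals.items()}
print(classification[("male", "adult", "crew")])    # death
print(classification[("female", "adult", "1st")])   # survival
print(classification[("female", "child", "crew")])  # no data
```

Whatever rule you choose, it should say explicitly what happens for a cell with no data and for a probability that falls exactly on the decision boundary.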