In this assignment you will experiment
with discrete probability distributions, and use these distributions to
make predictions about the environment.
Part 1
When the Titanic struck an iceberg and sank, there were 2201 people on
board. Some survived, some died. How does survival relate
to other attributes of the individuals? We explore this question
with a probabilistic approach. We consider a sample space of
individuals who are characterized by four random variables:
- Class: What status did the individual have on the ship? (1st, 2nd, or 3rd class passenger; crew member)
- Age: What age was the individual? (child, adult)
- Gender: What gender was the individual? (male, female)
- Survival: Did the individual survive the shipwreck? (yes, no)
For example, the sample point (Class=1st, Age=adult, Gender=male,
Survival=yes) characterizes a subset of individuals. The data set
of 2201 individuals is available from
http://www.cs.colorado.edu/~mozer/courses/3202/titanic.dat.
(a) Using the data set, compute the full joint distribution, i.e.,
P(Class, Age, Gender, Survival). This distribution has 4x2x2x2 = 32
probabilities. Display the distribution as follows:
P(Class, Age, Gender, Survival)
           | Survival=yes                                  | Survival=no                                   |
           | Gender=male           | Gender=female         | Gender=male           | Gender=female         |
           | Age=child | Age=adult | Age=child | Age=adult | Age=child | Age=adult | Age=child | Age=adult |
Class=1st  |           |           |           |           |           |           |           |           |
Class=2nd  |           |           |           |           |           |           |           |           |
Class=3rd  |           |           |           |           |           |           |           |           |
Class=crew |           |           |           |           |           |           |           |           |
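The counting step behind part (a) can be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes each record has already been parsed into a (class, age, gender, survival) tuple of strings, which may not match the actual layout of titanic.dat, and the four records shown are made-up toy data. The function name joint_distribution is hypothetical.

```python
from collections import Counter
from itertools import product

# Value sets of the four random variables, as listed in the assignment.
CLASSES = ("1st", "2nd", "3rd", "crew")
AGES = ("child", "adult")
GENDERS = ("male", "female")
SURVIVAL = ("yes", "no")

# Toy records for illustration only (NOT the Titanic data).
records = [
    ("1st", "adult", "male", "yes"),
    ("1st", "adult", "female", "yes"),
    ("3rd", "child", "male", "no"),
    ("crew", "adult", "male", "no"),
]

def joint_distribution(records):
    """Estimate P(Class, Age, Gender, Survival) by relative frequency."""
    counts = Counter(records)
    total = len(records)
    # Enumerate all 32 cells so that unobserved combinations get probability 0.
    return {cell: counts[cell] / total
            for cell in product(CLASSES, AGES, GENDERS, SURVIVAL)}

joint = joint_distribution(records)
for cell, p in joint.items():
    if p > 0:
        print(cell, p)
```

Enumerating every combination, rather than only the observed ones, keeps the table at its full 32 entries, and the entries sum to 1.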
(b) Using the data set and the joint distribution, compute
P(Survival=yes | Class, Age, Gender). Warning: Be alert to the
possibility of a cell whose value is undefined. Display the
distribution as follows:
P(Survival=yes | Class, Age, Gender)
           | Gender=male           | Gender=female         |
           | Age=child | Age=adult | Age=child | Age=adult |
Class=1st  |           |           |           |           |
Class=2nd  |           |           |           |           |
Class=3rd  |           |           |           |           |
Class=crew |           |           |           |           |
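One way to handle the undefined-cell warning in part (b) is sketched below. It assumes records are (class, age, gender, survival) tuples and uses made-up toy data, not the Titanic data. The function name p_survival_given is hypothetical, and returning None for an undefined cell is one possible convention among several.

```python
from collections import Counter

# Toy records for illustration only (NOT the Titanic data).
records = [
    ("1st", "adult", "male", "yes"),
    ("1st", "adult", "male", "no"),
    ("2nd", "child", "female", "yes"),
    ("crew", "adult", "male", "no"),
]

def p_survival_given(records, cls, age, gender):
    """Return P(Survival=yes | cls, age, gender), or None if undefined.

    The conditional is undefined when no individual matches the
    conditioning values, because the denominator would be zero.
    """
    counts = Counter(records)
    yes = counts[(cls, age, gender, "yes")]
    no = counts[(cls, age, gender, "no")]
    if yes + no == 0:
        return None
    return yes / (yes + no)

print(p_survival_given(records, "1st", "adult", "male"))
print(p_survival_given(records, "crew", "child", "female"))
```

Working from counts or from joint probabilities gives the same ratio, since the total number of individuals cancels.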
(c) Construct the unconditional distribution P(Survival).
(d) Construct the conditional distributions P(Gender|Survival),
P(Age|Survival), and P(Class|Survival).
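The distributions in parts (c) and (d) can be obtained by counting, as in the sketch below. It assumes records are (class, age, gender, survival) tuples and uses made-up toy data. The function names and the index argument (0 for Class, 1 for Age, 2 for Gender) are hypothetical conventions chosen for this illustration.

```python
from collections import Counter

SURVIVAL_INDEX = 3  # position of Survival in each record tuple (an assumption)

# Toy records for illustration only (NOT the Titanic data).
records = [
    ("1st", "adult", "male", "yes"),
    ("1st", "adult", "male", "no"),
    ("2nd", "child", "female", "yes"),
    ("crew", "adult", "male", "no"),
]

def marginal_survival(records):
    """Estimate P(Survival) by relative frequency."""
    counts = Counter(r[SURVIVAL_INDEX] for r in records)
    return {s: n / len(records) for s, n in counts.items()}

def conditional_on_survival(records, index):
    """Estimate P(X | Survival), where X is the variable at position `index`.

    Returns a dict keyed by (value_of_X, survival_value). Pairs that never
    occur in the data are absent and should be read as probability 0.
    """
    per_survival = Counter(r[SURVIVAL_INDEX] for r in records)
    pairs = Counter((r[index], r[SURVIVAL_INDEX]) for r in records)
    return {(x, s): n / per_survival[s] for (x, s), n in pairs.items()}

p_survival = marginal_survival(records)
p_gender = conditional_on_survival(records, 2)
print(p_survival)
print(p_gender)
```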
(e) Using the distributions you computed in parts (c) and (d), estimate
P(Survival=yes | Class,Age,Gender) under the Naive Bayes assumption.
See the text and class notes for a description of Naive Bayes. It
boils down to this equation:
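P(Survival | Class,Age,Gender) =
    alpha P(Class|Survival) P(Age|Survival) P(Gender|Survival) P(Survival)
where alpha is the normalization constant that makes the probabilities
over the values of Survival sum to 1.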
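The role of alpha in the Naive Bayes equation can be seen in a small sketch. All probabilities below are hypothetical numbers invented for the example, not values derived from the Titanic data, and the function name naive_bayes_posterior is likewise an assumption. Each conditional table is keyed by (value, survival).

```python
# Hypothetical probability tables (made-up numbers, NOT from the data set).
prior = {"yes": 0.5, "no": 0.5}                           # P(Survival)
p_class = {("1st", "yes"): 0.4, ("1st", "no"): 0.1}       # P(Class|Survival)
p_age = {("adult", "yes"): 0.5, ("adult", "no"): 0.5}     # P(Age|Survival)
p_gender = {("female", "yes"): 0.6, ("female", "no"): 0.2}  # P(Gender|Survival)

def naive_bayes_posterior(prior, conditionals, observed):
    """Return P(Survival | observed values) under the Naive Bayes assumption.

    `conditionals` is a list of tables, one per observed variable, and
    `observed` lists the observed values in the same order. Missing table
    entries are treated as probability 0. Returns None when every
    unnormalized score is 0, since alpha is then undefined.
    """
    unnormalized = {}
    for s, p_s in prior.items():
        score = p_s
        for table, value in zip(conditionals, observed):
            score *= table.get((value, s), 0.0)
        unnormalized[s] = score
    total = sum(unnormalized.values())
    if total == 0:
        return None
    # alpha = 1 / total rescales the scores so they sum to 1.
    return {s: score / total for s, score in unnormalized.items()}

posterior = naive_bayes_posterior(
    prior, [p_class, p_age, p_gender], ["1st", "adult", "female"])
print(posterior)
```

With these made-up tables the unnormalized scores are 0.06 for Survival=yes and 0.005 for Survival=no, so alpha = 1/0.065 and the posterior for Survival=yes is 12/13.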
(f) How well does the Naive Bayes assumption do in matching the
probabilities you obtained in (b)? Are there any advantages of
estimating the conditional probability using the Naive Bayes assumption?
Part 2
In this portion of the assignment, you are to do an analysis of pit
probabilities in the Wumpus World, analogous to the analysis that was
done in the text and in class. The particular situation you
should consider is as follows:
|           |           |           |           |
|           | OK breeze |           | OK breeze |
|           | OK        |           | OK        |
| OK        | OK        | OK        | OK        |
The rooms labeled "OK" have been visited and contain no wumpus or
pit. In the two rooms labeled "breeze", the agent sensed a
breeze. Estimate the probability of a pit in each of the
remaining rooms. Show the logic of your work.
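The enumeration argument used in the text can be sketched in code. To avoid presupposing the answer here, the example applies it to the textbook situation (rooms (1,1), (1,2), and (2,1) visited, with a breeze in (1,2) and (2,1)) rather than to the grid above. The sketch assumes a 4x4 world, rooms addressed as (column, row) starting at 1, and an independent prior pit probability of 0.2 per room, as in the textbook example. The assignment does not state these values, so treat them as assumptions. The function name pit_probabilities is hypothetical.

```python
from itertools import product

def pit_probabilities(visited, breezy, size=4, prior=0.2):
    """Posterior pit probability for each frontier room.

    visited: set of (x, y) rooms known to be free of pits.
    breezy:  subset of visited in which a breeze was sensed.
    A frontier room is an unvisited room adjacent to a visited room.
    """
    def neighbors(room):
        x, y = room
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [(a, b) for a, b in candidates
                if 1 <= a <= size and 1 <= b <= size]

    frontier = sorted({n for v in visited for n in neighbors(v)} - set(visited))
    weight_with_pit = {room: 0.0 for room in frontier}
    total = 0.0
    # Enumerate every pit/no-pit assignment to the frontier rooms.
    for assignment in product([False, True], repeat=len(frontier)):
        pits = {room for room, has_pit in zip(frontier, assignment) if has_pit}
        # Keep the assignment only if each visited room is breezy exactly
        # when at least one of its neighbors contains a pit.
        consistent = all(
            (v in breezy) == any(n in pits for n in neighbors(v))
            for v in visited)
        if not consistent:
            continue
        weight = 1.0
        for has_pit in assignment:
            weight *= prior if has_pit else 1.0 - prior
        total += weight
        for room in pits:
            weight_with_pit[room] += weight
    if total == 0:
        raise ValueError("observations are inconsistent with every assignment")
    return {room: w / total for room, w in weight_with_pit.items()}

# Textbook situation, not the grid in this assignment.
probs = pit_probabilities(
    visited={(1, 1), (1, 2), (2, 1)}, breezy={(1, 2), (2, 1)})
for room, p in sorted(probs.items()):
    print(room, round(p, 2))
```

For the textbook situation this gives roughly 0.31 for rooms (1,3) and (3,1) and 0.86 for room (2,2). Unvisited rooms that are not adjacent to any visited room are unaffected by the observations and keep the prior probability.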