This is one of several possible projects for CSCI 1300. The following link tells how the projects are used:
Please note that these projects specify precisely what your program should accomplish, but not how the program should work. Designing the techniques that the program uses is part of your assignment.
Start by creating a working directory for this project and downloading this 11MB file to that directory:
www.cs.colorado.edu/~main/penn.txt
This file was created by a linguistics undergraduate student from the Penn Treebank corpus of English sentences, taken from the Wall Street Journal. Each line of the file has the form:
xxx/YYY
where xxx is an English word or punctuation mark and YYY is a syntactic category for the word. For example:
chairman/NN
means that the word chairman appears in the corpus and has been "tagged" as being in the syntactic class NN (which is the category for singular nouns).
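As an illustration of the line format described above, here is a minimal Python sketch that splits one line into its word and its category. The function name parse_tagged_line is hypothetical, and the sketch assumes that the category is whatever follows the last slash on the line, so that a word which itself contains a slash is still handled. It is one possible approach, not a required design.

```python
def parse_tagged_line(line):
    """Split one 'word/CATEGORY' line into a (word, category) pair.

    Returns None for blank lines or lines that do not fit the format.
    Assumption: the category is the text after the LAST slash.
    """
    text = line.strip()
    word, slash, category = text.rpartition("/")
    if not slash or not word or not category:
        return None
    return word, category


print(parse_tagged_line("chairman/NN"))
```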
Your job: Write a program that repeatedly asks the user to type an English word. The program reads the word and then looks through the penn.txt file to see which syntactic categories that word appears in. The output of the program should be a table with three columns:
NN
).
Bonus of 120 points if your program does not re-read the 11MB penn.txt file for each word that the user inputs.
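One way to avoid re-reading the file, sketched here in Python under stated assumptions, is to read it once into an in-memory index and answer every later query from that index. The names build_index and categories_for are hypothetical, the per-category counts are an extra that the project may or may not ask for, and a tiny in-memory sample stands in for penn.txt so that the sketch runs on its own.

```python
from collections import Counter, defaultdict


def build_index(lines):
    """Map each word to a Counter of the categories it was tagged with."""
    index = defaultdict(Counter)
    for line in lines:
        # Assumption: the category is the text after the last slash.
        word, slash, category = line.strip().rpartition("/")
        if slash and word and category:
            index[word][category] += 1
    return index


def categories_for(index, word):
    """Return the sorted categories recorded for word (empty if unseen)."""
    return sorted(index.get(word, {}))


# A small sample in place of the real 11MB file.
sample = ["The/DT", "chairman/NN", "will/MD", "chair/VB", "the/DT", "chair/NN"]
index = build_index(sample)
print(categories_for(index, "chair"))
```

With the real data, the same function could be given the open file object once, before the loop that prompts the user, since iterating over a file yields its lines. The lookup here is case-sensitive, so "The" and "the" are treated as different words; whether that is the desired behavior is a design decision left to you.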