This is one of several possible projects for CSCI 1300. The following link tells how the projects are used:
Please note that these projects indicate precisely what your program should accomplish, without a precise indication of how the program works. Part of your assignment is designing the techniques of how the program works.
Start by creating a working directory for this project and downloading this 11MB file to that directory:
www.cs.colorado.edu/~main/penn.txt
This file was created by a linguistics undergraduate student from the Penn Treebank corpus of English sentences, taken from the Wall Street Journal. Each line of the file has the form:
xxx/YYY
where xxx is an English word or punctuation mark and YYY is a syntactic category for the word. For example:
chairman/NN
means that the word chairman appears in the corpus and has been "tagged" as being in the syntactic class NN (which is the category for singular nouns).
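As an illustration of the line format described above, here is a minimal Python sketch that splits one line into its word and its category. The function name parse_line is hypothetical, and splitting at the last slash is an assumption made so that a word which itself contains a slash still parses; the assignment does not prescribe either choice.

```python
def parse_line(line):
    """Split one 'word/TAG' line into (word, tag).

    Returns None for blank or malformed lines.
    Assumption: the category follows the LAST slash on the line.
    """
    text = line.strip()
    word, sep, tag = text.rpartition("/")
    if not sep or not word or not tag:
        return None
    return word, tag
```

For example, parse_line("chairman/NN") yields the pair ("chairman", "NN").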
Your job: Write a program that repeatedly asks the user to type an English word. The program reads the word and then looks through the penn.txt file to see which syntactic categories that word appears in. The output of the program should be a table with three columns:
Bonus of 120 points if your program does not re-read the 11MB penn.txt file for each word that the user inputs.
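One possible way to avoid re-reading the file for each word is to read it once and keep a count of categories per word in memory. The sketch below is only one such design, not the required one. The names build_index, lookup, and main are hypothetical, matching is assumed to be exact and case-sensitive, and the printed table shows only the category and its count because the full column layout is left to the assignment text.

```python
from collections import Counter, defaultdict


def build_index(lines):
    """Read 'word/TAG' lines once and count how often each tag occurs per word."""
    index = defaultdict(Counter)
    for line in lines:
        # Assumption: the category follows the last slash on the line.
        word, sep, tag = line.strip().rpartition("/")
        if sep and word and tag:
            index[word][tag] += 1
    return index


def lookup(index, word):
    """Return a list of (tag, count) pairs for word, sorted by tag name."""
    # .get avoids creating an empty entry for words that are not in the index.
    return sorted(index.get(word, {}).items())


def main(path="penn.txt"):
    """Load the file once, then answer queries until the user enters a blank line."""
    with open(path, encoding="utf-8", errors="replace") as f:
        index = build_index(f)
    while True:
        word = input("Word (blank to quit): ").strip()
        if not word:
            break
        rows = lookup(index, word)
        if not rows:
            print("Not found in the corpus.")
        for tag, count in rows:
            print(f"{tag:<8}{count:>8}")
```

Calling main() from the project directory loads penn.txt a single time; every later query is a dictionary lookup, at the cost of holding the index in memory.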