Department of Computer Science University of Colorado Boulder

Mervyn Young Lecture Series - Pereira


Learning to Analyze Sequences

Mervyn Young Memorial Lecture Series
on Computing Technology and Society

February 22, 2007
3:30pm - 4:30pm
ECCR 265

Fernando C. N. Pereira
Andrew and Debra Rachleff Professor and Chair
Computer and Information Science
University of Pennsylvania

sponsored by
Department of Computer Science

Sequential data -- speech, text, genomic sequences -- flood our storage servers. Much useful information in these data is carried by implicit structure: phonemes and prosody in speech, syntactic structure in text, genes and regulatory elements in genomic sequences. Over the last six years, several of us have been investigating structured linear models, a unified discriminative learning approach to sequence analysis problems. I will review the approach and illustrate it with applications to parsing, information extraction, and gene finding. I will conclude with a summary of other applications and current research questions.


Fernando Pereira was born and raised around Lisbon, Portugal. He started college studying electrical engineering but majored in mathematics. While in college, he worked part-time on an architectural CAD project at LNEC, a government engineering laboratory. After graduating, Pereira stayed at LNEC for two years as a systems programmer and administrator, and was also involved in urban traffic modeling, artificial intelligence and logic programming. In 1977 he took a scholarship from the British Council to study artificial intelligence at the University of Edinburgh. There he worked on natural-language understanding and logic programming, and for a while again on architectural CAD. He was involved in creating the first Prolog compiler (for the PDP-10), and also wrote the first widely used Prolog interpreter for 32-bit UNIX machines.

Pereira graduated in 1982 and joined the Artificial Intelligence Center of SRI International in Menlo Park, CA, where he worked on logic programming, natural-language understanding and later on speech-understanding systems. During 1987-88, he headed SRI's Cambridge, England, research center. Joining AT&T in the summer of 1989, he worked on speech recognition, speech retrieval, probabilistic language models, and several other topics. From 1994 to 2000, he headed the Machine Learning and Information Retrieval department of AT&T Labs -- Research. Pereira spent the 2000-2001 academic year as a research scientist at WhizBang! Labs, where he developed finite-state models and algorithms for information extraction from the Web. He has been at Penn since 2001.

This talk is based on joint work with Axel Bernal, Koby Crammer, John Lafferty, Andrew McCallum, Ryan McDonald and Fei Sha.

Learning to Analyze Sequences

  1. Learning to Analyze Sequences
  2. Sequences Everywhere
  3. Analyzing Sequences
  4. Analysis Challenges
  5. General Setting
  6. Previous Approaches
  7. Previous Approaches
  8. Structured Linear Models
  9. Learning
  10. Margin
  11. Losses
  12. Online Training
  13. Online maximum margin
  14. Analysis by Tagging
  15. Metrics
  16. Features
  17. Gene/protein results
  18. fable search: autism and genetics
  19. Gene Structure
  20. Gene Prediction
  21. Training Methods
  22. Possible Analyses
  23. Gene Features
  24. (table)
  25. Higher Accuracy Gene Prediction with CRAIG
  26. Learning to Parse
  27. Why Dependencies?
  28. Parse Scoring
  29. Scoring a Parse
  30. Finding the Best Parse
  31. Inference Algorithms
  32. Features
  33. Parsing Multiple Languages
  34. What's Next?

The Mervyn Young Memorial Lecture Series on Computing Technology and Society addresses the relationship between innovation in computing technology and changes in society. Established by a 1952 alumnus of the College's Engineering Physics program, the series is co-sponsored by the Department of Computer Science and the College of Engineering and Applied Science. The speakers are leaders from industry and distinguished academics.

See also:
Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844
©2012 Regents of the University of Colorado
September 14, 2008