Department of Computer Science University of Colorado Boulder

Thesis Defense - Salvetti

CINC 102

Detecting Deception in Text: A Corpus-Driven Approach
Computer Science PhD Candidate

Deception is a pervasive psycholinguistic phenomenon, ranging from lies told during legal trials to fabricated online reviews. Its identification has been studied for centuries, from the ancient Chinese method of spitting dry rice to the modern polygraph. The recent proliferation of deceptive online reviews has increased the need for automatic deception filtering systems. Although human performance at detecting deception is generally at chance, previous research suggests that the linguistic signals resulting from conscious deception are sufficient for building automatic systems capable of distinguishing deceptive documents from truthful ones.

Our interest is in identifying the invariant traits of deception in text. We argue that the encouraging results reported for automatic deception detection are mainly side effects of corpus-specific features. This poses no harm to practical applications, but it does not foster a deeper investigation of deception. To demonstrate this, and to allow researchers and practitioners to share results, we have developed the largest publicly available, shared, multidimensional deception corpus for online reviews. To compensate for the inherent lack of ground truth, we have also developed a set of semi-automatic techniques for ensuring corpus validity.

This thesis shows that detecting deception with supervised machine learning methods is brittle. Experiments conducted on this corpus show that accuracy varies across different kinds of deception (e.g., lying vs. fabrication) and across text content dimensions (e.g., sentiment), demonstrating the limitations of previous studies. Preliminary results confirm a statistical separation, albeit a smaller one, between fabricated and truthful reviews, but they do not confirm the existence of a separation between truths and lies.

Committee: James Martin, Professor (Chair)
Clayton Lewis, Professor
Wayne Ward, Research Professor
Daniel Jurafsky, Stanford University
Peter Norvig, Google

Department of Computer Science
College of Engineering and Applied Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA

Engineering Center Office Tower
ECOT 717
FAX +1-303-492-2844
©2012 Regents of the University of Colorado
May 5, 2012 (13:40)