Automatic Detection of Verbal Deception in Romanian with Artificial Intelligence Methods

M. Crudu

doi:10.24193/subbi.2024.1.05

Department of Computer Science, Babes-Bolyai University, 1, M. Kogalniceanu Street, 400084, Cluj-Napoca, Romania


Abstract

Automatic deception detection is an important task with applications in both face-to-face and computer-mediated human communication. This paper studies the nature of deceptive language, with the primary goal of investigating deception in Romanian written communication. We built several artificial intelligence models (based on Support Vector Machines, Random Forests, and Artificial Neural Networks) to detect dishonesty in a topic-specific corpus. To assess how effective the Linguistic Inquiry and Word Count (LIWC) categories are for Romanian, we compared multiple text representations based on LIWC, TF-IDF, and LSA. The results show that for datasets sharing a common subject, such as our corpus about friendship, text classification is more successful with general-purpose text representations such as TF-IDF or LSA. The proposed approach achieves a classification accuracy of 91.3%, outperforming similar approaches reported in the literature. These findings have implications for fields such as linguistics and opinion mining, where research on deception in languages other than English is needed.
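The difference between a LIWC-style representation and a general-purpose one such as TF-IDF can be made concrete with a small sketch. It is not the paper's pipeline. The category lexicon `TOY_CATEGORIES` is a hypothetical stand-in for the licensed Romanian LIWC dictionary, and the tokenization is deliberately naive. The TF-IDF weighting uses the common smoothed formula idf = ln((1 + N) / (1 + df)) + 1 with L2 normalization, which matches the scikit-learn `TfidfVectorizer` defaults. LSA, which would apply a truncated SVD to the TF-IDF matrix, and the SVM, Random Forest, and ANN classifiers are omitted. The accuracy helper only shows how a figure such as 91.3% is computed.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon standing in for LIWC categories; the real
# Romanian LIWC dictionary is licensed and far larger.
TOY_CATEGORIES = {
    "self": {"eu", "meu", "mea", "mine"},
    "negation": {"nu", "niciodată", "nimic"},
}


def tokenize(text):
    # Naive lowercase whitespace tokenization (assumption); a real pipeline
    # would also strip punctuation and may lemmatize Romanian word forms.
    return text.lower().split()


def liwc_style_features(text, categories=TOY_CATEGORIES):
    # LIWC-style representation: percentage of tokens that fall in each category.
    tokens = tokenize(text)
    total = len(tokens) or 1
    return {
        name: 100.0 * sum(t in words for t in tokens) / total
        for name, words in categories.items()
    }


def tfidf_vectors(documents):
    # TF-IDF with raw term counts, smoothed idf = ln((1 + N) / (1 + df)) + 1,
    # and L2 normalization of each document vector (sparse dict form).
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({term: v / norm for term, v in vec.items()})
    return vectors


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, `liwc_style_features("eu nu am nimic")` returns `{'self': 25.0, 'negation': 50.0}`. In `tfidf_vectors(["eu am un prieten", "nu am un prieten"])`, the distinctive terms `eu` and `nu` receive higher weights than the shared terms `am`, `un`, and `prieten`. The sketch makes one design difference visible: LIWC-style features only count words in a fixed dictionary, while TF-IDF weights every term that occurs in the corpus.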
