Knight Foundation School of Computing and Information Sciences
Labiba Jahan is a fifth-year Ph.D. candidate at the Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), under the supervision of Professor Mark A. Finlayson. Her research interests are in Natural Language Processing, with a focus on Story Understanding. She received a B.Sc. degree in Computer Science and Engineering from the Shahjalal University of Science and Technology, Bangladesh, in 2014. From 2014 to 2016, before joining FIU, she worked as a Lecturer at Metropolitan University, Sylhet, Bangladesh. In 2019 she served as an intern in the Product Simulation and Modelling Group at Siemens Corporate Technology in Princeton, NJ. She won 3rd place at Florida International University’s 2019 Graduate Student Appreciation Week. She has published several workshop and conference papers.
If we are to understand stories, we must understand characters: characters are central to every narrative and drive the action forward. Critically, many stories (especially cultural ones) employ stereotypical character roles for several purposes, including efficiently communicating bundles of default characteristics and associations and making it easier to understand those characters’ roles in the overall narrative. These roles include ideas such as hero, villain, or victim, as well as culturally specific roles such as the donor (in Russian tales) or the trickster (in Native American tales). My thesis aims to learn these roles automatically, inducing them from data using a novel co-clustering technique.
The first step in learning character roles, however, is to identify which coreference chains correspond to characters, defined by narratologists as animate entities that drive the plot forward. The first part of my work addressed this character identification problem, focusing specifically on animacy detection. Prior work treated animacy as a word-level property, and researchers developed statistical models to classify individual words as either animate or inanimate. I argued that this formulation of the problem is ill-posed and presented a new hybrid approach for classifying the animacy of whole coreference chains, which achieved state-of-the-art performance.
The next step of my work is to identify the characters and then to learn their stereotypical roles with a new unsupervised clustering approach. My character identification system consists of two stages: first, I detect the animate chains among the coreference chains using my existing animacy detector; second, I apply a supervised machine learning model that identifies which of those animate chains qualify as characters. I proposed a narratologically grounded definition of character and built a straightforward supervised machine learning model with a small set of features that achieved state-of-the-art performance.
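The two-stage design described above can be sketched roughly as follows. This is only an illustration of the filter-then-classify structure, not the system itself: the `Chain` type, the `agent_event_count` feature, the word list, and both toy predicates are hypothetical stand-ins, and in particular the second stage here is a hand-written rule in place of the supervised model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chain:
    """A coreference chain: all mentions that refer to one entity."""
    mentions: List[str]
    # Hypothetical feature: number of events in which the entity acts as agent.
    agent_event_count: int = 0


def identify_characters(
    chains: List[Chain],
    is_animate: Callable[[Chain], bool],
    is_character: Callable[[Chain], bool],
) -> List[Chain]:
    """Stage 1 keeps animate chains; stage 2 keeps those judged to be characters."""
    animate_chains = [c for c in chains if is_animate(c)]
    return [c for c in animate_chains if is_character(c)]


# Toy stand-ins (assumptions for illustration only).
ANIMATE_WORDS = {"he", "she", "king", "princess", "wolf", "dragon"}


def toy_is_animate(chain: Chain) -> bool:
    # Chain-level decision: animate if any mention is a known animate word.
    return any(m.lower() in ANIMATE_WORDS for m in chain.mentions)


def toy_is_character(chain: Chain) -> bool:
    # Placeholder rule standing in for a trained classifier.
    return len(chain.mentions) >= 3 and chain.agent_event_count >= 1


chains = [
    Chain(["princess", "she", "she", "princess"], agent_event_count=4),
    Chain(["castle", "it"]),  # inanimate: removed in stage 1
    Chain(["wolf"]),          # animate but incidental: removed in stage 2
]
characters = identify_characters(chains, toy_is_animate, toy_is_character)
print([c.mentions[0] for c in characters])
```

Passing the two stages in as functions keeps the pipeline independent of how each decision is made, so a trained animacy detector or character classifier could replace either toy predicate without changing `identify_characters`.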
In the last step, I implemented a clustering approach that uses plot and thematic information to group the characters into archetypes. This work resulted in a completely new approach to understanding the structure of stories, advancing the state of the art in story understanding.