Computer Science Department Public Lecture: Haystack epistemologies and their limits
Please join us for the following research presentation by a candidate for the faculty position in the Department of Computer Science.
Presenter: Dr. Farhan Samir
Research Talk: Haystack epistemologies and their limits
Abstract: Modern speech- and text-based language models are trained on ever larger volumes of consolidated data: tens of millions of news articles, hundreds of thousands of hours of recorded speech, and millions of volunteer contributions to crowd-sourced encyclopedias. Critical archival scholars have argued that these datasets are typically amassed by prioritizing scale over careful appraisal of their contents, reflecting a haystack-like approach to data collection. As these data haystacks grow larger, their apparent comprehensiveness makes it increasingly difficult to ask what they might lack. The haystack approach also tends to obscure its own circumstances of production, so AI models built atop these haystacks fail in unexpected ways when applied outside those circumstances. In this talk I present three empirical methods, applied to Wikipedia, news corpora, and major English speech repositories. These methods reveal that even massive data haystacks remain stubbornly contingent on geographic, relational, and linguistic circumstances, and that despite their size they contain vast gaps in coverage beyond these contexts. Taken together, my work shows that these gaps cannot be resolved by collecting yet larger haystacks. Instead, they demand targeted interventions that make our knowledge infrastructures more pluralistic. I will conclude with a discussion of some of these interventions, ranging from new benchmarks and archives to novel interfaces for information navigation.
Bio: Farhan Samir is an NSERC Postdoctoral Fellow in the Computer Science Department at the University of Toronto, advised by Professor Syed Ishtiaque Ahmed. He completed his PhD at the University of British Columbia, where his research combined computational methods with questions about knowledge representation and information systems.
For the Zoom link, please contact grad.compsci@lakeheadu.ca
