MSc Thesis Defense - Computer Science: Protiva Arafin

Event Date: 
Thursday, April 30, 2026 - 11:00am to 12:30pm EDT
Event Location: 
Zoom

Please join the Department of Computer Science for the upcoming thesis defense:

Presenter: Protiva Arafin

Thesis title: Evidence-Grounded Clinical Pharmacogenomics Question Answering System Using Large Language Models and Hybrid Retrieval Augmentation

Abstract: Pharmacogenomics (PGx) is central to personalized medicine because it helps clinicians choose the right drugs and doses based on a person’s genetic makeup. However, the growing volume and complexity of PGx data, together with the need to interpret clinical recommendations, make sound decision-making harder. This study proposes a data-driven clinical decision support framework that combines large language models (LLMs) with hybrid retrieval-augmented generation (RAG) to improve the answering of pharmacogenomic questions.

The framework assesses two contemporary LLMs, Meta-LLaMA-3.1-8B-Instruct and Qwen3-8B, across several configurations: base models, Low-Rank Adaptation (LoRA) fine-tuning, and hybrid RAG-based methods. Structured pharmacogenomics data from CPIC and clinical guideline information from ClinPGx are combined into a single large dataset. To make the data easier to retrieve and to use with the models, it is merged, cleaned, normalized, and converted to JSONL format. A hybrid retrieval approach is designed to enhance factual grounding by integrating lexical filtering with semantic similarity computed from sentence embeddings. This research uses both automatic metrics and manual review to rate the models on correctness, relevance, completeness, and clarity. The results reveal that Qwen performs well as a base model, and that LLaMA improves substantially when combined with RAG and LoRA, giving answers that are more context-aware and therapeutically useful. Fine-tuning alone does not always help, which shows the limits of relying solely on parametric knowledge. The results also show that accuracy in clinical settings needs to be backed up by consistency, relevance, and evidence.
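For readers unfamiliar with hybrid retrieval, the idea of lexical filtering followed by semantic ranking can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the record layout, and the placeholder records are hypothetical, and a simple term-frequency vector stands in for the sentence embeddings mentioned in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def embed(text):
    """Stand-in for a sentence-embedding model (assumption): a term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query, records, required_terms, top_k=2):
    """Stage 1: keep records containing every required term (lexical filter).
    Stage 2: rank the survivors by similarity to the query (semantic step)."""
    required = {t.lower() for t in required_terms}
    candidates = [r for r in records if required <= set(tokenize(r["text"]))]
    q_vec = embed(query)
    ranked = sorted(candidates, key=lambda r: cosine(q_vec, embed(r["text"])), reverse=True)
    return ranked[:top_k]


# Hypothetical placeholder records, not real guideline content.
records = [
    {"id": "r1", "text": "gene_a drug_x dosing guidance placeholder"},
    {"id": "r2", "text": "gene_a drug_y dosing guidance placeholder"},
    {"id": "r3", "text": "gene_b drug_x general note placeholder"},
]

hits = hybrid_retrieve("drug_x dosing for gene_a", records, required_terms=["drug_x"])
print([r["id"] for r in hits])
```

In this toy setup the lexical filter drops the record that never mentions the required term, and the similarity step then orders the remaining records by how closely they match the query. A real system would replace `embed` with an actual sentence-embedding model and pass the top-ranked records to the LLM as context.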

This study demonstrates that employing retrieval methods alongside parameter-efficient fine-tuning enhances the reliability and utility of LLM-based systems in clinical environments. The proposed methodology establishes a scalable framework for the development of trustworthy AI-driven solutions in pharmacogenomics and healthcare decision support.


Committee Members:
Dr. Abedalrhman Alkhateeb (supervisor, committee chair), Dr. Md Moniruzzaman (co-supervisor), Dr. Saad B. Ahmed, Dr. Malek Alsmadi (Electrical & Computer Engineering)


Please contact grad.compsci@lakeheadu.ca for the Zoom link. Everyone is welcome.