Thesis Defense - Computer Science: Arvind Chidambaram Boominathan
Please join the Computer Science Department for the upcoming thesis defense:
Presenter: Arvind Chidambaram Boominathan
Thesis title: Integrating Multi-omics Data via Latent Space Construction for Breast and Bladder Cancer Analysis
Abstract: Cancer remains one of the most complex and heterogeneous diseases, driven by intricate interactions across genetic, epigenetic, and transcriptional landscapes. Accurately understanding and predicting tumor characteristics such as Tumor Mutational Burden (TMB) is critical for effective diagnosis, prognosis, and personalized treatment. This research addresses the challenges of integrating high-dimensional, heterogeneous multi-omics datasets, namely DNA methylation, gene expression, and Copy Number Alteration (CNA) data, for bladder and breast cancer analysis by building a shared latent space that captures and preserves meaningful cross-omics representations. These challenges include data imbalance, high dimensionality, modality-specific noise, and complex non-linear biological interactions.
To overcome these obstacles, this thesis proposes constructing a shared latent space with deep-learning approaches, using Deep Multiset Canonical Correlation Analysis (DMCCA) and Graph Attention Networks (GATs). The shared latent space provides a unified representation that captures intricate biological interactions across omics modalities, which improves predictive accuracy for TMB classification. Attention mechanisms further refine this integration by dynamically focusing on the most relevant relational patterns within the multi-omics data, enhancing the model's ability to capture interactions between genes, pathways, and patient profiles. In addition, this study applies oversampling, mainly the Synthetic Minority Oversampling Technique (SMOTE), to offset class imbalance among TMB classes and menopausal status groups. Compared with baseline supervised models, namely Logistic Regression (LR), an Artificial Neural Network (ANN), and a Tabular Transformer, the proposed GAT model trained on the shared latent space performed better, achieving an AUC of 0.76 and an accuracy of 76.1% for breast cancer (BRCA) and an AUC of 0.73 with an accuracy of 65.3% for bladder cancer (BLCA). These results support the usefulness of multi-omics integration through shared latent space learning.
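For readers unfamiliar with SMOTE, the oversampling step mentioned in the abstract can be sketched as follows. This is a minimal from-scratch illustration, not the thesis implementation (which may well use a library such as imbalanced-learn): each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, the toy feature matrix, and the 80/20 class split are hypothetical and chosen only for illustration.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE sketch: interpolate minority samples toward minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot use more neighbours than other minority samples
    # Pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbors = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    base = rng.integers(0, n, size=n_synthetic)  # randomly chosen seed samples
    nbr = neighbors[base, rng.integers(0, k, size=n_synthetic)]  # one neighbour per seed
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Toy, made-up data: 80 majority and 20 minority samples with 8 latent features
X = np.random.default_rng(1).normal(size=(100, 8))
y = np.array([0] * 80 + [1] * 20)
X_new = smote(X[y == 1], n_synthetic=60)
X_bal = np.vstack([X, X_new])
y_bal = np.concatenate([y, np.ones(60, dtype=int)])
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region spanned by the minority class rather than duplicating existing rows, which is the main design motivation for SMOTE over plain random oversampling.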
Committee Members:
Dr. Abedalrhman Alkhateeb (supervisor, committee chair), Dr. Saad Bin Ahmed, Dr. Abdulsalam Yassine (Software Engineering)
Please contact grad.compsci@lakeheadu.ca for the Zoom link. Everyone is welcome.