Thesis Defense: Exploiting semantic similarity models to automate transfer credit assessment in academic mobility

Event Date: 
Monday, April 19, 2021 - 9:00am to 10:30am EDT
Event Location: 
Online
Event Contact Name: 
Rachael Wang

Please join the Computer Science Department for the upcoming thesis defense:

Presenter: Dhivya Chandrasekaran

Thesis title: Exploiting semantic similarity models to automate transfer credit assessment in academic mobility

Abstract: Student mobility, or academic mobility, involves students moving between institutions during their post-secondary education, and one of the challenging tasks in this process is assessing the transfer credits to be offered to the incoming student. In general, this process involves domain experts comparing the learning outcomes (LOs) of the courses and, based on their similarity, deciding whether to offer transfer credits to the incoming students. This manual implementation of the task is not only labor-intensive but also influenced by undue bias and administrative complexity. This research work focuses on identifying an algorithm that exploits advancements in the field of Natural Language Processing (NLP) to effectively automate this process. A survey tracing the evolution of semantic similarity helps in understanding the various methods available to calculate the semantic similarity between text data. The basic units of comparison, namely learning outcomes, are made up of two components: the descriptor, which provides the contents covered, and the action word, which provides the competency achieved. Bloom's taxonomy provides six different levels of competency into which the action words fall. Given the unique structure, domain specificity, and complexity of learning outcomes, a need arises for designing a tailor-made algorithm. The proposed algorithm uses a clustering-inspired methodology based on knowledge-based semantic similarity measures to assess the taxonomic similarity of learning outcomes and a transformer-based semantic similarity model to assess the semantic similarity of the learning outcomes. The cumulative similarity between the learning outcomes is further aggregated to form course-to-course similarity. Due to the lack of quality benchmark datasets, a new benchmark dataset is built by conducting a survey among domain experts with knowledge in both academia and computer science. The dataset contains 7 course-to-course similarity values annotated by 5 domain experts. Understanding the inherent need for flexibility in the decision-making process, the aggregation part of the algorithm offers tunable parameters to accommodate different scenarios. Being one of the early research works in the field of automating articulation, this thesis establishes the imminent challenges that need to be addressed in the field, namely the significant decrease in performance of state-of-the-art semantic similarity models with an increase in the complexity of sentences, the lack of large datasets to train or fine-tune existing models, the lack of quality in available learning outcomes, and the reluctance to share learning outcomes publicly. While providing an efficient algorithm to assess the similarity between courses with existing resources, this research work steers future attempts to apply NLP in the field of articulation in an ideal direction by highlighting the persisting research gaps.

Committee Members:
Dr. Vijay Mago (supervisor, committee chair), Dr. Quazi Rahman, Dr. Thangarajah Akilan (Software Engineering)

Please contact grad.compsci@lakeheadu.ca for the Zoom link.
Everyone is welcome.