About me

Hello 👋 I'm a recent graduate from the University of Kelaniya, where I completed my undergraduate degree in Computer Science with a specialization in Artificial Intelligence.

I am currently working as a Teaching Assistant at the Department of Software Engineering within the Faculty of Computing and Technology at the University of Kelaniya.

With a strong foundation in machine learning, statistical modeling, and MLOps, I approach AI development with an end-to-end understanding of the model lifecycle. My academic and professional experience has grounded me in software engineering best practices, enabling me to build scalable, maintainable AI systems.

I'm interested in a broad range of topics in NLP and AI, especially in how Large Language Models (LLMs) can be leveraged to build robust, interpretable, and context-aware AI agents capable of reasoning, learning, and interacting with complex environments. My research explores the intersection of Retrieval-Augmented Generation (RAG) systems and LLMs, aiming to address challenges such as hallucination mitigation, knowledge grounding, and dynamic adaptation to evolving information landscapes.
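As a lightweight illustration of the knowledge-grounding idea behind RAG, here is a minimal Python sketch: retrieve the passages most relevant to a query, then constrain the model's prompt to that retrieved context. The toy corpus, overlap heuristic, and prompt template are placeholder assumptions for illustration only, not code from my research.

    # Toy sketch of the retrieval + grounding step in a RAG pipeline.
    # The corpus, scoring heuristic, and prompt template are illustrative
    # placeholders, not a production retriever.

    def overlap_score(query: str, passage: str) -> int:
        """Count lowercase tokens shared between the query and a passage."""
        return len(set(query.lower().split()) & set(passage.lower().split()))

    def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
        """Return the k passages with the highest token overlap with the query."""
        return sorted(corpus, key=lambda p: overlap_score(query, p), reverse=True)[:k]

    def build_grounded_prompt(query: str, passages: list[str]) -> str:
        """Build a prompt that restricts the model to the retrieved context."""
        context = "\n".join(f"- {p}" for p in passages)
        return (
            "Answer using ONLY the context below. "
            "If the context is insufficient, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )

    corpus = [
        "Sinhala is a low-resource language spoken primarily in Sri Lanka.",
        "Retrieval-Augmented Generation grounds model outputs in retrieved text.",
        "XLM-R is a multilingual transformer pretrained on about 100 languages.",
    ]
    query = "What does RAG ground model outputs in?"
    print(build_grounded_prompt(query, retrieve(query, corpus)))

The instruction to refuse rather than invent when the retrieved context is insufficient is the simplest form of the hallucination mitigation mentioned above: the model's answer is anchored to retrieved evidence instead of its parametric memory alone.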

Update: I'm excited to share that my first research paper, "Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala," based on my undergraduate thesis, has been published at the NAACL 2025 Student Research Workshop (2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics)!

I'm always open to collaborative opportunities and discussions about innovative applications of AI and data science. Let's connect and explore how we can create impactful solutions together.

Projects

🚧 Under Maintenance!

I'm currently updating the project list with my latest work. Please check back soon!

Publications

  • Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala

    Shanilka Haturusinghe, Tharindu Cyril Weerasooriya, Christopher M. Homan, Marcos Zampieri, and Sidath Ravindra Liyanage. 2025. Subasa - Adapting Language Models for Low-resourced Offensive Language Detection in Sinhala. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 260–270, Albuquerque, USA. Association for Computational Linguistics.
    Abstract: Accurate detection of offensive language is essential for a number of applications related to social media safety. There is a sharp contrast in performance on this task between low- and high-resource languages. In this paper, we adapt fine-tuning strategies that have not been previously explored for Sinhala in the downstream task of offensive language detection. Using this approach, we introduce four models: "Subasa-XLM-R", which incorporates an intermediate Pre-Finetuning step using Masked Rationale Prediction, and two variants of "Subasa-Llama" together with "Subasa-Mistral", which are fine-tuned versions of Llama (3.2) and Mistral (v0.3), respectively, with a task-specific strategy. We evaluate our models on the SOLD benchmark dataset for Sinhala offensive language detection. All our models outperform existing baselines. Subasa-XLM-R achieves the highest Macro F1 score (0.84), surpassing state-of-the-art large language models like GPT-4o when evaluated on the same SOLD benchmark dataset under zero-shot settings. The models and code are publicly available.