Improving FAQ Retrieval for Academic Regulations Using Semantic Embeddings and LLM Question Augmentation

Fajri Profesio Putra; I Gusti Agung Putu Mahendra; Agus Tedyyana; Muhammad Noor

doi:10.55583/jtisi.v4i1.2176

Authors

Fajri Profesio Putra, Politeknik Negeri Bengkalis
I Gusti Agung Putu Mahendra, Politeknik Negeri Bengkalis
Agus Tedyyana, Politeknik Negeri Bengkalis
Muhammad Noor, Universiti Utara Malaysia

Keywords:

semantic retrieval, FAQ retrieval, IndoSBERT, question augmentation, academic regulations

Abstract

Academic regulations in higher education are often documented in lengthy, formal handbooks, which makes it difficult for students to find relevant information using everyday language. This study develops a semantic FAQ retrieval system for academic regulations using IndoSBERT and question augmentation. The FAQ corpus was constructed from official academic and internship documents and contains 92 FAQ entries across 33 topical categories. Seed questions were generated from category–keyword pairs and expanded using simple rule-based augmentation and FLAN-T5-based paraphrasing. The dataset was split 80:10:10 into training, validation, and test sets. IndoSBERT was fine-tuned with Multiple Negatives Ranking Loss under three configurations: baseline, baseline with simple augmentation, and baseline with simple plus LLM-based augmentation. Retrieval performance was measured using Recall@1, Recall@3, Recall@5, and Mean Reciprocal Rank (MRR). The best result was achieved by the simple plus LLM augmentation configuration, with Recall@1 of 0.7848, Recall@5 of 0.8987, and MRR of 0.8396. These findings indicate that LLM-based question augmentation improves semantic retrieval robustness while keeping answers grounded in curated academic regulations.
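
The retrieval metrics named in the abstract can be illustrated with a minimal sketch. It assumes each test question has exactly one correct FAQ entry and that the retriever returns a ranked list of FAQ identifiers per question. The function names, identifiers, and toy rankings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[Sequence[str]],
                gold_ids: Sequence[str], k: int) -> float:
    """Fraction of queries whose gold FAQ appears in the top-k results."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids)
               if gold in ranked[:k])
    return hits / len(gold_ids)


def mean_reciprocal_rank(ranked_ids: Sequence[Sequence[str]],
                         gold_ids: Sequence[str]) -> float:
    """Average of 1/rank of the gold FAQ; a query contributes 0 if it is missing."""
    total = 0.0
    for ranked, gold in zip(ranked_ids, gold_ids):
        if gold in ranked:
            total += 1.0 / (list(ranked).index(gold) + 1)
    return total / len(gold_ids)


# Toy example (hypothetical identifiers): three queries, top-3 results each.
rankings = [
    ["faq_3", "faq_1", "faq_7"],  # gold at rank 1
    ["faq_2", "faq_5", "faq_9"],  # gold at rank 2
    ["faq_4", "faq_8", "faq_6"],  # gold not retrieved
]
gold = ["faq_3", "faq_5", "faq_0"]

r1 = recall_at_k(rankings, gold, 1)
r3 = recall_at_k(rankings, gold, 3)
mrr = mean_reciprocal_rank(rankings, gold)
```

In this toy case one of three queries is answered at rank 1 and two of three within the top 3, so Recall@1 is 1/3, Recall@3 is 2/3, and MRR is (1 + 1/2 + 0) / 3 = 0.5.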
