Improving the Mkulima Repository Content: Utilizing Theses, Dissertations, and LLMs for Agricultural Knowledge Dissemination in Kiswahili
Metadata
The Sokoine National Agricultural Library (SNAL) at the Sokoine University of Agriculture (SUA) faces significant challenges in disseminating agricultural information to Swahili-speaking communities because most research outputs are in English. This language barrier hinders the transmission of vital agricultural knowledge to key stakeholders in the agriculture-food value chain who use Kiswahili in their daily activities. To address this gap, SNAL established the Mkulima Collection and Repository, dedicated to collecting agricultural content in Kiswahili. Despite these efforts, the Kiswahili content in the repository remains limited. This study seeks to enhance the Mkulima Repository by translating abstracts from English-language theses and dissertations using MarianMT, a machine translation (MT) model based on large language models (LLMs). The selected abstracts underwent pre-processing, machine translation, and subsequent quality assessment by multilingual experts. Our findings reveal significant challenges in using LLMs like MarianMT for low-resource languages such as Kiswahili. While the MT system offers a rapid and scalable method for translating academic content, the evaluators rated the accuracy and fluency of the translations as suboptimal. Common translation errors, particularly in agriculture-specific terminology and scientific names, highlight the limitations of current MT models in handling specialized agricultural content. These issues underscore the need for a more refined approach, including the development of a curated dataset of Swahili-English sentence pairs focused on agricultural jargon and the integration of a knowledge base to handle the translation of scientific terms.
2024
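The abstract proposes integrating a knowledge base so that scientific terms are handled correctly during translation. One way this could work is sketched below in Python: scientific names are masked with placeholder tokens before translation and restored afterwards, and any name whose placeholder did not survive is flagged for human review. This is only an illustrative sketch, not the authors' method. The term list, the placeholder format `[[T0]]`, and all function names are hypothetical, and the `translate` argument stands in for whatever MT system is used. It is also an assumption that the MT system passes such placeholders through unchanged, which would need to be checked for MarianMT.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical knowledge-base entries: scientific names that should
# survive translation unchanged.
SCIENTIFIC_NAMES: List[str] = ["Zea mays", "Spodoptera frugiperda"]


def protect_terms(text: str, terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each listed term found in `text` with a placeholder token."""
    placeholders: Dict[str, str] = {}
    # Longest first, so a longer name is masked before any shorter name it contains.
    for index, term in enumerate(sorted(terms, key=len, reverse=True)):
        if term in text:
            token = f"[[T{index}]]"
            text = text.replace(term, token)
            placeholders[token] = term
    return text, placeholders


def restore_terms(text: str, placeholders: Dict[str, str]) -> str:
    """Put the original terms back in place of their placeholder tokens."""
    for token, term in placeholders.items():
        text = text.replace(token, term)
    return text


def translate_with_protection(
    text: str,
    translate: Callable[[str], str],
    terms: List[str] = SCIENTIFIC_NAMES,
) -> Tuple[str, List[str]]:
    """Translate `text` while shielding the listed terms.

    Returns the translated text and the list of terms whose placeholder
    was lost in translation, so those cases can be sent for human review.
    """
    masked, placeholders = protect_terms(text, terms)
    translated = translate(masked)
    lost = [term for token, term in placeholders.items() if token not in translated]
    return restore_terms(translated, placeholders), lost


if __name__ == "__main__":
    # Stand-in "translator" that upper-cases its input, used only to show
    # that the protected name passes through untouched.
    result, lost = translate_with_protection("Zea mays yields fell.", str.upper)
    print(result)  # Zea mays YIELDS FELL.
    print(lost)    # []
```

In practice, `translate` would wrap the actual MT model call, and the list of lost terms could feed the expert quality assessment step that the abstract describes.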