Perhimpunan Mahasiswa SUTD Indonesia (PADI)
Multinomial Naïve Bayesian Network for GitHub Topics
Pages: 10
Time to read: 41 mins
Language: English
This research article investigates how Multinomial Naïve Bayesian (MNB) networks can be used to classify GitHub repositories automatically. It addresses the challenge of assigning appropriate topics to repositories, which is crucial for making them discoverable. The authors present a novel approach that uses a probabilistic model to recommend topics from the content of a repository's README files and its source code. The method analyzes the most frequent terms in these documents using TF-IDF vectorization, which weights each term by how often it occurs in a document and how rare it is across the corpus. These weights then inform the topic recommendation process. No existing baselines are available for comparison, so the paper validates the approach with several evaluation metrics instead. The authors stress that correctly tagged repositories are easier for developers to find and contribute to. The research is part of the CROSSMINER Project, funded by the European Union’s Horizon 2020 Research and Innovation Programme, and aims to improve the automated classification of GitHub repositories.
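The summary does not spell out the model, so the following is only a minimal sketch of the general technique under stated assumptions. README text is tokenized and weighted with TF-IDF. The weights are then used as fractional counts in a multinomial naïve Bayes classifier, which ranks candidate topics. The helper names (`tokenize`, `fit_idf`, `tfidf`, `TopicNB`), the toy READMEs and topics, and the choice to treat each (repository, topic) pair as one training example are all hypothetical and are not taken from the paper. The IDF formula follows the common smoothed convention `ln((1 + n) / (1 + df)) + 1`. The sketch uses only the Python standard library, skips vector normalization, and ignores source-code features.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(token_docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(token_docs)
    df = Counter()
    for tokens in token_docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tokens, idf):
    """Raw term frequency times IDF; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


class TopicNB:
    """Hypothetical multinomial naive Bayes over TF-IDF weights, one class per topic."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, vectors, topic_lists):
        # Assumed multi-label simplification: a repository tagged with k topics
        # contributes one training example to each of those k topics.
        weight = defaultdict(lambda: defaultdict(float))  # topic -> term -> summed weight
        n_examples = Counter()
        for vec, topics in zip(vectors, topic_lists):
            for topic in topics:
                n_examples[topic] += 1
                for term, w in vec.items():
                    weight[topic][term] += w
        self.vocab = {t for vec in vectors for t in vec}
        total = sum(n_examples.values())
        self.log_prior = {c: math.log(k / total) for c, k in n_examples.items()}
        self.log_lik = {}
        for c in n_examples:
            denom = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((weight[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def rank(self, vec):
        """Topics sorted by unnormalized log posterior, best first."""
        scores = {
            c: prior + sum(w * self.log_lik[c][t] for t, w in vec.items() if t in self.vocab)
            for c, prior in self.log_prior.items()
        }
        return sorted(scores, key=scores.get, reverse=True)

    def recommend(self, vec, k=2):
        """Return the k highest-scoring topics."""
        return self.rank(vec)[:k]


# Toy, made-up training data: README text paired with its topic tags.
readmes = [
    "A fast deep learning library for neural network training on GPU",
    "Train neural network models with tensors and automatic differentiation",
    "A lightweight web framework for building REST APIs and HTTP servers",
    "Render web pages with a JavaScript UI component library for the browser",
]
topics = [
    ["machine-learning", "python"],
    ["machine-learning"],
    ["web", "python"],
    ["web", "javascript"],
]

token_docs = [tokenize(r) for r in readmes]
idf = fit_idf(token_docs)
model = TopicNB().fit([tfidf(t, idf) for t in token_docs], topics)

query = tfidf(tokenize("Neural network training toolkit"), idf)
recs = model.recommend(query, k=2)
print(recs)  # ['machine-learning', 'python']
```

Ranking topics by posterior and keeping the top k is one simple way to turn a single-label classifier into a topic recommender. The paper's actual handling of repositories with several topics, and its use of source-code content, may differ.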