Performance Evaluation of Various Machine Learning Algorithms for User Story Clustering

Bhawnesh Kumar; Umesh Kumar Tiwari; Dinesh C. Dobhal

doi:10.5815/ijmecs.2025.03.07

Journal: International Journal of Modern Education and Computer Science (IJMECS)

Issue: Vol. 17, No. 3, 2025.

In agile development, user stories are the primary means of specifying requirements. Managing them effectively is difficult because software projects typically contain a large number of user stories, which should be clustered into groups according to the similarity of their functionality to support systematic requirements analysis, effective mapping to developed features, and efficient maintenance. Unfortunately, most existing user story clustering methods require a great deal of manual work, which is error-prone and time-consuming. In this research, we propose an automated framework that uses a family of machine learning algorithms to group user stories. First, the data are preprocessed to examine the user stories and extract keywords from them. Next, features are extracted, which allows the user stories to be grouped automatically into distinct categories. We use four feature extraction algorithms and six clustering algorithms. According to our experimental results, K-means and BIRCH outperform the other clustering methods, while cosine similarity and cosine distance are the best feature extraction methods for categorizing user stories, forming more balanced clusters (both have a standard deviation of 3.08). In terms of user story cohesion, spectral clustering with cosine similarity and cosine distance feature extraction achieves the best silhouette coefficient, 0.225, compared with the other clustering algorithms. This study demonstrates the usefulness and applicability of the proposed framework. It also offers practical recommendations for improving the effectiveness of user story clustering, for example by tuning parameters to enhance feature extraction and clustering.

Keywords: User Story, Agile Development, Clustering, Standard Deviation, Silhouette Coefficient
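The pipeline summarized in the abstract (keyword preprocessing, feature extraction, clustering, and evaluation by cluster-size standard deviation and silhouette coefficient) can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation, and several of its details are assumptions. The stop-word list and the example stories are hypothetical. Representing each story by its row of cosine similarities to all stories is an assumed feature scheme. Plain k-means stands in for the six clustering algorithms. The silhouette uses Euclidean distance over those features. "Balance" is measured here as the population standard deviation of cluster sizes, a choice the abstract does not specify.

```python
import math
import re
import statistics
from collections import Counter

# Hypothetical stop-word list: drops the "As a ..., I want to ..." template words.
STOPWORDS = {"a", "an", "as", "i", "want", "to", "my", "the", "so", "that", "user", "by"}


def preprocess(story):
    """Lowercase, tokenize, and drop stop words, keeping candidate keywords."""
    return [t for t in re.findall(r"[a-z]+", story.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_features(stories):
    """Assumed feature scheme: each story becomes its row of cosine similarities
    to every story (a cosine-distance variant would use 1 - cosine(a, b))."""
    bags = [Counter(preprocess(s)) for s in stories]
    return [[cosine(a, b) for b in bags] for a in bags]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters.
    Points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster.
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = statistics.mean(same)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            statistics.mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            for c in set(labels) - {labels[i]}
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.mean(scores)


# Hypothetical user stories (not from the study's dataset).
stories = [
    "As a user, I want to reset my password",
    "As a manager, I want to export the sales report",
    "As a user, I want to change my password",
    "As a manager, I want to print the sales report",
    "As a user, I want to recover my password by email",
    "As an analyst, I want to filter the sales report",
]
features = similarity_features(stories)
labels = kmeans(features, k=2)
sizes = [labels.count(c) for c in range(2)]
print("labels:", labels)
print("cluster-size std:", statistics.pstdev(sizes))
print("silhouette:", round(silhouette(features, labels), 3))
```

On these six hypothetical stories, the sketch places the three password stories and the three report stories in separate, equal-sized clusters, so the cluster-size standard deviation is 0. For real experiments, scikit-learn's `KMeans`, `Birch`, and `SpectralClustering` (in `sklearn.cluster`) and `silhouette_score` (in `sklearn.metrics`) would replace the hand-written routines.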