Information Engineering for Fake Job Postings Classification in Electronic Business Based on Machine Learning Technology
Authors: Markiian-Mykhailo Paprotskyi, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Zhengbing Hu, Dmytro Uhryn
Journal: International Journal of Information Engineering and Electronic Business
Published in: Issue 5, Vol. 17, 2025.
Free access
This study investigates the application of machine learning methods for the classification of fraudulent job postings in e-business platforms. Using the publicly available fake_job_postings.csv dataset, textual and categorical features of vacancies were processed and vectorised through TF-IDF, HashingVectorizer, and optimised TF-IDF. Eight machine learning algorithms were compared, including Logistic Regression, Random Forest, Gradient Boosting, Decision Tree, Multinomial Naive Bayes, Linear SVC, K-Nearest Neighbours, and XGBoost. The experiments demonstrate that XGBoost achieved the best performance (Accuracy = 0.990, Precision = 0.982, Recall = 0.998, F1 = 0.990) across all feature representations. Its superior results can be attributed to the ability of boosted ensembles to capture complex non-linear relationships in high-dimensional feature spaces while maintaining robustness against noise and class imbalance. However, it should be noted that the evaluation was performed on a single static dataset. While the high recall shows the model’s ability to reliably detect fraudulent ads in this context, questions remain about its generalisability. Fraud tactics evolve rapidly, and new job scams may significantly differ from patterns in the training data. This creates a potential risk of overfitting to dataset-specific features, which limits direct transfer to real-world scenarios without continuous retraining and monitoring. The practical contribution of the study is a reproducible framework that integrates text and categorical processing, vectorisation, hyperparameter optimisation, and comparative model benchmarking. Such a framework could be embedded into online job platforms to support automated filtering of suspicious ads. Still, its deployment requires additional measures: periodic retraining with updated data, integration with platform APIs, and the inclusion of explainability modules to ensure transparency and user trust. 
Overall, the research demonstrates that ensemble-based models, particularly XGBoost, offer strong potential for fraud detection in the e-business labour market. At the same time, further work is necessary to validate model robustness on unseen and evolving fraudulent job posting strategies, ensuring scalability and reliability in production environments.
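As a quick consistency check on the metrics quoted in the abstract, the F1 value can be recomputed from the reported precision and recall. The sketch below is illustrative only: the helper name `f1_score` and the variable names are assumptions introduced here, and the only inputs are the three figures stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for XGBoost.
reported_precision = 0.982
reported_recall = 0.998
reported_f1 = 0.990

# Recompute F1 from precision and recall and compare at three decimals.
computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.3f}, reported F1 = {reported_f1:.3f}")
```

At three decimal places the recomputed value agrees with the reported F1, so the three figures are mutually consistent. This says nothing about generalisation beyond the single dataset, which the abstract itself flags as an open question.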
Keywords: Machine Learning, Natural Language Processing, E-Business, Fake Jobs, Classification, Logistic Regression, Accuracy, Recall, F1-Measure, Information Engineering
Short URL: https://sciup.org/15019950
IDR: 15019950 | DOI: 10.5815/ijieeb.2025.05.08