A Novel Hybrid Machine Learning Model for Defect Prediction in Industrial Manufacturing Processes

Document Type : Original Article

Authors

1 Prof., Department of Industrial Management, Faculty of Management, University of Tehran, Tehran, Iran.

2 Ph.D. Candidate, Department of Industrial Management, Faculty of Management, University of Tehran, Tehran, Iran.

Abstract

This study develops a novel hybrid machine learning model to improve defect prediction in industrial manufacturing processes. The model integrates four base models, XGBoost, LightGBM, CatBoost, and an artificial neural network (ANN), whose predictions are combined by a Random Forest (RF) meta-model in a stacking ensemble. Industrial data from Kaggle were used, and extensive hyperparameter optimization with Optuna substantially improved the model's predictive performance. Two key challenges, class imbalance and the selection of important features, were addressed with the SMOTE data balancing technique and a random forest-based analysis that selects the most important input features. The hybrid model outperformed traditional single models, reaching an accuracy of 96.06% and precision, recall, and F1 scores of 95.10%, 97.32%, and 96.20%, respectively. Robust data preprocessing, including feature standardization and correlation analysis, supports the reliability and interpretability of the results. By predicting defects accurately and in a timely manner, the model has many potential real-world applications in industrial environments. The results are relevant to defect management in manufacturing, where the model offers a scalable way to enhance product quality, reduce operational cost, and improve process efficiency. This research illustrates the promise of hybrid machine learning methods for optimizing manufacturing processes and improving industrial performance.

Keywords


Volume 2, Issue 4
September 2025
Pages 43-58
  • Receive Date: 26 April 2025
  • Revise Date: 23 May 2025
  • Accept Date: 15 July 2025
  • First Publish Date: 15 July 2025
  • Publish Date: 01 September 2025