A Machine Learning-Based Production Defect Prediction and Process Optimization Framework using Random Forest

Author

B. Lavany, Dr. S. Usharani

Keywords

Predictive Quality Management; Production Defect Prediction; Random Forest Classifier; Process Optimization; Industry 4.0; Machine Learning.

Abstract

In modern manufacturing, maintaining consistent product quality on high-speed production lines is difficult because of equipment wear, process fluctuations, operator variability, and batch inconsistencies. Traditional quality control relies on manual inspection and post-production sampling, so it is reactive: defects are detected only after the product has been made. Predictive QualityX is a machine learning-based defect prediction and process optimization platform designed to enable proactive quality management. The system uses a Random Forest classifier trained on 25,000 synthetic production records generated from realistic process parameters (temperature, pressure, vibration, rotational speed, machine ID, and operator ID) across multiple machines and operators. The platform implements an end-to-end data pipeline covering synthetic data generation, model training, evaluation, MySQL database integration, and an interactive analytics dashboard. The trained model captures nonlinear relationships between process variables and defect occurrence and uses them to predict product quality outcomes. The system scores both historical and future production data, providing retrospective analysis as well as forward-looking defect risk estimates. Results are exported to Excel and stored in a MySQL database for reporting and enterprise integration. The dashboard visualizes machine-wise defect rates, parameter correlations, and temporal defect trends, helping engineers and managers identify high-risk conditions, optimize processes, and reduce production losses. Together, these components make Predictive QualityX a decision-support system for intelligent manufacturing.
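
The data generation, training, and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the column names, the parameter distributions, the defect rule, and the helper functions `make_synthetic_records` and `train_and_evaluate` are all assumptions chosen for illustration. Only the feature set, the record count of 25,000, and the use of a Random Forest classifier come from the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def make_synthetic_records(n_records=25_000, seed=0):
    """Generate hypothetical production records.

    All distributions and the defect rule below are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "temperature": rng.normal(200.0, 15.0, n_records),
        "pressure": rng.normal(50.0, 5.0, n_records),
        "vibration": rng.gamma(2.0, 0.5, n_records),
        "rotational_speed": rng.normal(1500.0, 120.0, n_records),
        "machine_id": rng.integers(1, 11, n_records),   # machines 1..10
        "operator_id": rng.integers(1, 21, n_records),  # operators 1..20
    })
    # Assumed nonlinear defect rule: temperature drift, a vibration/pressure
    # interaction, and one machine that is more defect-prone than the others.
    risk = (
        0.1 * np.abs(df["temperature"] - 200.0)
        + 1.5 * df["vibration"] * (df["pressure"] > 55.0)
        + 1.0 * (df["machine_id"] == 3)
        - 3.5
    )
    defect_probability = 1.0 / (1.0 + np.exp(-risk))
    df["defect"] = (rng.random(n_records) < defect_probability).astype(int)
    return df


def train_and_evaluate(df, seed=0):
    """Fit a Random Forest on an 80/20 stratified split and report test metrics."""
    X = df.drop(columns="defect")
    y = df["defect"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # IDs are passed as plain integers here; one-hot encoding is an alternative.
    model = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=seed
    )
    model.fit(X_train, y_train)
    defect_scores = model.predict_proba(X_test)[:, 1]
    metrics = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, defect_scores),
    }
    return model, metrics


if __name__ == "__main__":
    records = make_synthetic_records()
    model, metrics = train_and_evaluate(records)
    print(metrics)
    # Machine-wise defect rate, one of the dashboard views named in the abstract.
    print(records.groupby("machine_id")["defect"].mean())
```

Because defects are the minority class in this synthetic data, the sketch reports ROC AUC alongside accuracy and sets `class_weight="balanced"`; both are design choices of the example rather than details confirmed by the abstract.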

Received: 20 March 2026
Accepted: 18 May 2026
Published: 26 May 2026
DOI: 10.30726/ijlca/v13.i2.2026.132015
