Abstract:
A data-driven approach was developed for predicting the hydrogen content of high-strength steel and for explaining those predictions. The prediction performance of four machine learning models, namely artificial neural network (ANN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), was systematically evaluated in a multi-dimensional feature space. Explainability analysis was then carried out with the SHAP (SHapley Additive exPlanations) method to reveal how the contents of 12 elements (Fe, C, Mo, Mn, Ti, V, Si, Cr, Cu, Ni, Al, Nb) and four test conditions (current density, temperature, hydrogen charging time, and strain rate) contribute to the prediction of hydrogen content, and how these features interact. The results show that the ANN, RF, and SVM models had relatively large root mean square errors and prediction fluctuation ranges on the training set and the test set, and some of their predictions fell outside the 2-fold error band. The XGBoost model had a small root mean square error and a small prediction fluctuation range on both the training set and the test set, and its predictions were evenly distributed within the 2-fold error band; it gave the best prediction performance for the hydrogen content of high-strength steel. In the ANN and SVM models, the contributions of six features (C, Mn, Si, and Cu contents, strain rate, and hydrogen charging time) to the prediction of hydrogen content were relatively balanced, indicating that the prediction relied on the combined effect of the majority of features. In the RF and XGBoost models, the contributions of one to three features (C, Mn, or Si content) were markedly higher, while the remaining features served only as auxiliary information and had a relatively small impact on the prediction results.
High and low feature values were mixed among both the positive and the negative contributions, indicating that hydrogen content responds to each feature in a markedly nonlinear way and that the prediction is affected by complex coupling among multiple features.
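The two evaluation criteria named above, root mean square error and the 2-fold error band, can be illustrated with a minimal sketch. It assumes the 2-fold error band means that a predicted value lies between half and twice the measured value, which is the usual reading but is not defined in this abstract. The function names and the sample numbers are hypothetical and are not data from the study.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def within_fold_band(y_true, y_pred, fold=2.0):
    """Boolean mask: True where y_true / fold <= y_pred <= y_true * fold.

    Assumes strictly positive measured values (e.g. hydrogen content).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return (y_pred >= y_true / fold) & (y_pred <= y_true * fold)


# Hypothetical measured and predicted hydrogen contents (illustrative only).
measured = np.array([1.0, 2.0, 4.0, 8.0])
predicted = np.array([1.5, 0.9, 4.0, 17.0])

inside = within_fold_band(measured, predicted)
print("RMSE:", rmse(measured, predicted))
print("Fraction inside 2-fold band:", inside.mean())
```

Applied separately to the training set and the test set of each model, these two quantities would support the kind of comparison described in the abstract.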