April 4, 2026

Support vector regression model for the prediction of buildings’ maximum seismic response based on real monitoring data

Training and test results of SVR-MDR model

In this section, the earthquake response data of SRC buildings in NDE1.0 are used to train the SVR-MDR model. The MDR is selected as the target output of the model, and 41 parameters are selected as input features: 6 structure information parameters, 9 spectrum information parameters, 16 duration information parameters and 10 other earthquake information parameters. The data are split into training and test datasets in a 9:1 ratio, and the features are normalized using the StandardScaler function of scikit-learn, a free and open-source machine learning library for Python. The normalized probability distributions of MDR in the training and test datasets are shown in Fig. 1, which shows that the test dataset has almost the same distribution as the training dataset.
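A minimal scikit-learn sketch of the split and normalization is given below. It assumes `train_test_split` performs the 9:1 split (the text names only `StandardScaler`) and that the scaler is fitted on the training split alone; the function name `split_and_scale` and the random seed are hypothetical, and this is not the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(X, y, test_size=0.1, random_state=0):
    """Split features/targets 9:1 and standardize the features.

    Assumption: the scaler is fitted on the training split only and then
    applied to both splits, so no test-set statistics leak into training.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, scaler
```

With 1000 samples and 41 features, this yields 900 training rows and 100 test rows, each feature having zero mean and unit variance on the training split.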

Fig. 1 Normalized probability distribution of MDR in the training and test datasets.

Four criteria are introduced to evaluate the predictive performance of the SVR model: the standard deviation of the error (\(\sigma\)), the median absolute relative deviation (MARD), the coefficient of determination (\(R^2\)), and the fraction of the dataset whose relative difference is no more than 10% (\(D_{10\%}\)). MARD, \(R^2\) and \(D_{10\%}\) are defined as follows:

$$\mathrm{MARD}=\operatorname{median}\left(\left|\frac{\hat{y}_i-y_i}{y_i}\right|\right) \tag{13}$$

where \(\hat{y}_i\) is the predicted value and \(y_i\) is the corresponding true value;

$$R^2=1-\frac{\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \tag{14}$$

where \(\bar{y}\) is the mean of \(y_i\) and \(n\) is the number of samples. MARD describes the central tendency of the relative errors, i.e., how closely the predicted values match the true values. \(R^2\) measures a model's goodness of fit: its best possible value is 1.0, and it can be negative because a model can perform arbitrarily worse than simply predicting \(\bar{y}\). A higher \(R^2\) indicates a better fit.
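Eqs. (13) and (14) translate directly into NumPy, as in the sketch below. The helper names `mard` and `r2` are hypothetical; `r2` should agree with scikit-learn's `r2_score`.

```python
import numpy as np


def mard(y_true, y_pred):
    """Median absolute relative deviation, Eq. (13)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.median(np.abs((y_pred - y_true) / y_true)))


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (14)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```

For \(y = [1, 2, 4]\) and \(\hat{y} = [1.1, 1.8, 4.0]\), the relative deviations are 0.1, 0.1 and 0, so MARD = 0.1, while \(R^2 = 1 - 0.05/4.667 \approx 0.989\). Predicting \(\bar{y}\) for every sample gives \(R^2 = 0\).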

$$\delta_i=\frac{y_i-\hat{y}_i}{y_i}\times 100\% \tag{15}$$

In Eq. (15), \(\delta_i\) is the normalized error between the predicted and true values of a sample point in the test set, and \(D_{10\%}\) is the percentage of sample points whose \(\delta_i\) lies in the range [−10%, 10%]. To obtain the best SVR model, the hyperparameters (C, γ) of the Gaussian kernel are determined by grid search with 10-fold cross-validation. n-fold cross-validation divides the training dataset into n mutually exclusive subsets that together cover the whole training set; each time, n − 1 subsets are used for training and the remaining subset for validation, and the optimal hyperparameters are those giving the minimal mean squared error (MSE) averaged over all n folds.
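The (C, γ) search with 10-fold cross-validation can be sketched with scikit-learn's `GridSearchCV`. The grid values, the shuffled `KFold` splitter, the seed and the helper name `tune_svr` are assumptions for illustration; the paper does not report its search ranges.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR


def tune_svr(X_train, y_train, C_grid=(1, 10, 100), gamma_grid=(0.01, 0.1, 1.0),
             n_splits=10, seed=0):
    """Grid search over (C, gamma) for an RBF-kernel SVR with n-fold CV.

    The selected pair minimizes the MSE averaged over the n validation folds.
    scikit-learn maximizes scores, so the MSE is passed as its negative.
    """
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid={"C": list(C_grid), "gamma": list(gamma_grid)},
        scoring="neg_mean_squared_error",
        cv=KFold(n_splits=n_splits, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)
    # best_score_ is the mean CV score of the best pair; negate it back to MSE.
    return search.best_estimator_, search.best_params_, -search.best_score_
```

With the default `refit=True`, the returned estimator is refitted on the whole training set using the selected (C, γ).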

The predictive performance for the training and test datasets is shown in Fig. 2a and b, respectively. The solid 1:1 line indicates that the predicted value equals the true value, and the dashed red lines mark one standard deviation of the error on either side of it. The predictions generally follow the 1:1 line. The standard deviations of the errors for the training and test datasets are 0.181 and 0.202, respectively, and the corresponding \(D_{10\%}\) values are 97.1% and 91.2%, indicating that the SVR-MDR model predicts MDR with high accuracy.
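The \(\sigma\) and \(D_{10\%}\) values can be computed from the predictions as sketched below. This assumes \(\sigma\) is the population standard deviation of the residuals \(\hat{y}_i - y_i\); the text does not state whether the error is taken on raw or transformed MDR, or which degrees-of-freedom convention is used. The function names are hypothetical.

```python
import numpy as np


def relative_error_pct(y_true, y_pred):
    """delta_i of Eq. (15), in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return (y_true - y_pred) / y_true * 100.0


def d10(y_true, y_pred):
    """Percentage of samples whose delta_i lies in [-10%, 10%]."""
    delta = relative_error_pct(y_true, y_pred)
    return float(np.mean(np.abs(delta) <= 10.0) * 100.0)


def error_std(y_true, y_pred):
    """Std (ddof=0) of the residuals y_pred - y_true; assumed definition of sigma."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.std(resid))
```

Under this assumption, the dashed lines in the scatter plots would correspond to \(\hat{y} = y \pm \sigma\).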

Fig. 2 Distribution of predicted and true MDR using the SVR-MDR model with all 41 input features: (a) training dataset; (b) test dataset.

The performance of the SVR-MDR model is compared with that of two other machine learning models: kernel ridge regression (KRR) and decision tree (DT). KRR uses a radial basis function kernel with hyperparameters α = 1 and γ = 0.01, and the maximum depth of the decision tree is set to 10. The results of the KRR and DT models for the training and test sets are presented in Figs. 3 and 4, respectively, and the predictive performance metrics of SVR-MDR, KRR and DT are listed in Table 4. The KRR model exhibits greater scatter, particularly for larger MDRs, indicating lower prediction accuracy than the SVR-MDR model. Although the DT model shows much smaller dispersion, the significant difference between the standard deviations of the prediction errors on the training and test sets suggests potential overfitting. In comparison, the SVR-MDR model not only demonstrates superior predictive accuracy but also performs more consistently across the training and test sets, and therefore predicts MDR better than the KRR and DT models.
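The three regressors can be instantiated in scikit-learn with the hyperparameters quoted above. In this sketch the SVR's (C, γ) are placeholders standing in for the grid-search result, and `build_models` and `compare` are hypothetical helpers; `compare` reports the residual standard deviation on each split, where a large train/test gap is the overfitting symptom discussed for DT.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models(svr_C=10.0, svr_gamma=0.1, seed=0):
    """SVR (placeholder C, gamma), KRR and DT with the settings given in the text."""
    return {
        "SVR": SVR(kernel="rbf", C=svr_C, gamma=svr_gamma),
        "KRR": KernelRidge(kernel="rbf", alpha=1.0, gamma=0.01),
        "DT": DecisionTreeRegressor(max_depth=10, random_state=seed),
    }


def compare(models, X_train, y_train, X_test, y_test):
    """Fit each model; return {name: (train residual std, test residual std)}."""
    out = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        sd_train = float(np.std(model.predict(X_train) - y_train))
        sd_test = float(np.std(model.predict(X_test) - y_test))
        out[name] = (sd_train, sd_test)
    return out
```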

Fig. 3 Distribution of predicted and true MDR using the KRR model with all 41 input features: (a) training dataset; (b) test dataset.

Fig. 4 Distribution of predicted and true MDR using the DT model with all 41 input features: (a) training dataset; (b) test dataset.

Table 4 Predictive performance metrics of SVR-MDR, KRR and DT.

Training and test results of RSVR-MDR model

There are 41 input features in the initial SVR-MDR model, and each feature influences the SVR-MDR performance to a different degree. To quantify the importance of each feature, the SHAP method proposed by Lundberg and Lee [20] is employed. SHAP is an explainability tool for machine learning that evaluates the contribution of each feature to a model's prediction. It uses the Shapley value as the evaluation index, which reflects the average change in the prediction when a specific feature is added to all possible combinations of the other features [20]. Because applying SHAP to the SVR-MDR model is computationally expensive, 604 samples (about 10% of the total SRC earthquake response data) are randomly selected. The features are ranked by their absolute SHAP values on these 604 samples, as shown in Fig. 5.
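Packages such as `shap` approximate Shapley values (for example with Kernel SHAP). For intuition only, the sketch below instead computes them exactly by enumerating every coalition, using an interventional value function over a background set; this is a swapped-in illustration, not the authors' procedure. It needs on the order of \(2^d\) model evaluations per feature, which is impractical for 41 features and is one reason SHAP is usually approximated and applied to a subset of samples. Ranking by the mean absolute Shapley value across samples is our assumption about how the per-sample values are aggregated, and the function names are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values of one sample x for the model `predict`.

    v(S) is the mean prediction over the background rows when the features
    in S are set to x's values (interventional value function).
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    d = x.shape[0]

    def value(subset):
        z = background.copy()
        idx = list(subset)
        if idx:
            z[:, idx] = x[idx]  # fix coalition features to the explained sample
        return float(np.mean(predict(z)))

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):  # coalition size |S| = 0 .. d-1
            weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for subset in itertools.combinations(others, k):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def rank_features(predict, X_explain, background, names):
    """Rank features by mean |Shapley value| over the explained samples."""
    phis = np.array([exact_shapley(predict, x, background) for x in X_explain])
    importance = np.abs(phis).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(names[j], float(importance[j])) for j in order]
```

For a linear model \(f(z) = w^\top z + b\), this value function gives \(\phi_i = w_i (x_i - \bar{z}_i)\), where \(\bar{z}_i\) is the background mean, and the values sum to \(f(x)\) minus the mean background prediction.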

Fig. 5 SHAP values of the 41 input features: (a) structural information; (b) other earthquake information; (c) spectrum information; (d) duration information.

Among the 6 structural information parameters, the fundamental frequency f1 and the co-seismic minimum frequency f2 contribute most to the prediction. Magnitude has the highest SHAP value (85.71) and contributes most to the prediction, indicating its dominant influence on MDR prediction. Among the spectrum information parameters, the average pseudo-spectral acceleration between the fundamental frequency f1 and the co-seismic minimum frequency f2 (Avg_Sa) contributes most, and among the 16 duration information parameters, DSa2, the duration corresponding to 5–95% of the total energy associated with the ground motion acceleration, contributes most. Note that input features should be selected based not only on their SHAP rankings but also on their practical availability. For example, although the fundamental frequency f1 has a lower SHAP value than the co-seismic minimum frequency f2, f1 is more accessible: it can be easily obtained from empirical formulas or modal analysis, whereas f2 requires building response records under a specific ground motion [21]. Consequently, the co-seismic minimum frequency and the parameters that depend on it, such as SA2, SV2, SD2, Avg_Sa, Avg_Sv and Avg_Sd, are not easily obtained after an earthquake and are not considered in the reduced SVR model. A total of 10 input features are finally selected based on their significance and ease of acquisition: 3 building information parameters (structural height H, number of stories N and fundamental frequency f1), 4 other earthquake information parameters (PGV, PGD, epicentral distance Dep and magnitude M) and 3 spectrum information parameters (SA1, SV1 and SD1). The RSVR-MDR model is then trained with these 10 input features, and its predictive performance is shown in Fig. 6. The standard deviations of the errors for the training and test datasets are 0.192 and 0.213, respectively, almost the same as those of the SVR-MDR model.
The calculated \(D_{10\%}\) values for the training and test datasets are 97.0% and 95.7%, respectively, comparable to those of the SVR-MDR model. In this regard, the RSVR-MDR model with 10 features (H, N, f1, PGV, PGD, Dep, M, SA1, SV1, SD1) achieves predictive performance comparable to that of the SVR-MDR model with all 41 features. Note that although the duration information parameters DSa2 and DSv2 have higher SHAP values than building information parameters such as structural height H and number of stories N, the model using the 10 features (DSa2, DSv2, f1, PGV, PGD, Dep, M, SA1, SV1, SD1) performs worse than the one using (H, N, f1, PGV, PGD, Dep, M, SA1, SV1, SD1), as shown in Fig. 7. This indicates that, when multiple features are combined, the building information parameters are more important than the duration information parameters.
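The comparison of the two 10-feature subsets can be sketched as follows, on synthetic data only. The column order `COLUMNS`, the names `SET_BUILDING` and `SET_DURATION`, the helper `evaluate_subset`, and the SVR settings are hypothetical, and no results from the paper are reproduced; each subset is scored by its test \(D_{10\%}\) (Eq. (15)) and residual standard deviation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical column order of a feature matrix X.
COLUMNS = ["H", "N", "f1", "PGV", "PGD", "Dep", "M", "SA1", "SV1", "SD1", "DSa2", "DSv2"]
SET_BUILDING = ["H", "N", "f1", "PGV", "PGD", "Dep", "M", "SA1", "SV1", "SD1"]
SET_DURATION = ["DSa2", "DSv2", "f1", "PGV", "PGD", "Dep", "M", "SA1", "SV1", "SD1"]


def evaluate_subset(X, y, subset, columns=COLUMNS, C=10.0, gamma=0.1, seed=0):
    """Train a scaled RBF SVR on `subset`; return (test D10% in %, test residual std)."""
    idx = [columns.index(name) for name in subset]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, idx], y, test_size=0.1, random_state=seed
    )
    # The scaler inside the pipeline is fitted on the training split only.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, gamma=gamma))
    model.fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    delta = (y_te - y_hat) / y_te * 100.0  # Eq. (15)
    d10 = float(np.mean(np.abs(delta) <= 10.0) * 100.0)
    return d10, float(np.std(y_hat - y_te))
```

When the synthetic target depends on H and N, the building-based subset should yield a smaller residual spread than the duration-based one. This only illustrates the mechanics of comparing subsets, not the paper's finding.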

Fig. 6 Distribution of predicted and true MDR using RSVR-MDR with 10 selected input features (H, N, f1, PGV, PGD, Dep, M, SA1, SV1, SD1): (a) training dataset; (b) test dataset.

Fig. 7 Distribution of predicted and true MDR using RSVR-MDR with 10 selected input features (DSa2, DSv2, f1, PGV, PGD, Dep, M, SA1, SV1, SD1): (a) training dataset; (b) test dataset.
