EXPLAINABLE ARTIFICIAL INTELLIGENCE MODEL FOR IDENTIFYING MARKET VALUE IN PROFESSIONAL SOCCER PLAYERS

Author(s): ZHANG, S., LI, M., Institution: TSINGHUA UNIVERSITY, Country: CHINA, Abstract-ID: 1217

INTRODUCTION:
Association football has grown beyond sport into a significant business sector in which player transfers are pivotal economic events. Consequently, determining the market value of football players is a critical managerial function. This research introduces a machine learning approach to predicting soccer players' market values, combining ensemble models with SHapley Additive exPlanations (SHAP) to enhance interpretability.
METHODS:
Using data from Sofifa.com and transfermarkt.us, sources widely used among fans of FIFA football manager games, the study builds on prior research to perform a comprehensive analysis. The Boruta feature selection algorithm, implemented via the BorutaShap Python package, was employed to refine the set of player characteristics used in model training. Six machine learning algorithms were compared to identify the best model for appraising player market values: AdaBoost, LightGBM, Random Forest, Gradient Boosting Decision Tree (GBDT), CatBoost, and XGBoost.
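The shadow-feature idea behind Boruta can be illustrated with a minimal sketch. This is not the BorutaShap package and not the study's pipeline: it assumes absolute Pearson correlation as a stand-in importance score (Boruta normally uses tree-based or SHAP importances), replaces Boruta's statistical test with a simple majority vote over rounds, and runs on synthetic data. The names `shadow_feature_screen` and `noise_attr` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def shadow_feature_screen(columns, target, n_rounds=20, seed=0):
    """Keep features whose importance beats the best shuffled 'shadow' copy.

    columns: dict mapping feature name -> list of values.
    Importance here is |Pearson r| with the target (an assumption made
    for illustration; Boruta itself uses model-based importances).
    """
    rng = random.Random(seed)
    real = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shuffling a column destroys its relation to the target, so the
        # best shadow score estimates what pure chance can achieve.
        best_shadow = 0.0
        for vals in columns.values():
            shadow = vals[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs(pearson(shadow, target)))
        for name in columns:
            if real[name] > best_shadow:
                hits[name] += 1
    # Simplification: majority vote instead of Boruta's binomial test.
    return [name for name in columns if hits[name] > n_rounds / 2]


# Synthetic data: one informative attribute and one pure-noise attribute.
data_rng = random.Random(42)
n_players = 300
finishing = [data_rng.gauss(0, 1) for _ in range(n_players)]
noise_attr = [data_rng.gauss(0, 1) for _ in range(n_players)]
value = [3.0 * f + data_rng.gauss(0, 0.5) for f in finishing]

selected = shadow_feature_screen(
    {"finishing": finishing, "noise_attr": noise_attr}, value
)
print(selected)
```

In this toy setting the informative attribute is retained because its score far exceeds anything a shuffled column achieves; an attribute unrelated to the target has no systematic advantage over the shadows.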
RESULTS:
The Boruta algorithm reduced the initial dataset of 29 features to 22 salient features. Among the candidate models, GBDT performed best. It achieved the highest R-squared value, 0.889, indicating that its predictions accounted for roughly 89% of the variance in actual market values. CatBoost and LightGBM followed with slightly lower but comparable R-squared values. GBDT also had the lowest Root Mean Squared Error (RMSE) of all the algorithms, meaning its predictions were closest on average to the true player values. Its robustness was further confirmed on a separate test set, where it maintained the highest R-squared value, 0.901, and the lowest RMSE, underscoring its reliability in predicting player market values.
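For readers less familiar with the two metrics, the following minimal sketch shows how R-squared and RMSE are computed from actual and predicted values. The numbers are made up for illustration and are not the study's data.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical market values (in millions) and model predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(r_squared(actual, predicted), 3))  # 0.964
print(round(rmse(actual, predicted), 3))       # 2.121
```

A higher R-squared and a lower RMSE both indicate a closer fit, which is why the two metrics are reported together when ranking models.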
CONCLUSION:
The Gradient Boosting Decision Tree (GBDT) model stood out as the strongest predictive tool for estimating soccer players' market values. SHAP-based model interpretation identified nine crucial player attributes, including short passing, finishing, interceptions, dribbling, standing tackle, sprint speed, acceleration, and reactions, that significantly influence market value predictions. These findings are valuable for football team managers and stakeholders, providing a data-driven basis for transfer negotiations and strategic planning. Applying such predictive analytics could reshape the economics of player transfers by offering a more objective and nuanced understanding of player worth in the competitive football market.
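The attribution principle underlying SHAP can be sketched by computing exact Shapley values for a tiny model. This is an illustration only: it enumerates all feature coalitions directly rather than using the SHAP library's faster algorithms, and the toy model, attribute values, and all-zero baseline are assumptions, not the study's fitted GBDT.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for very small examples.
    """
    names = list(instance)
    n = len(names)

    def coalition_value(coalition):
        x = {k: instance[k] if k in coalition else baseline[k] for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (coalition_value(s | {name}) - coalition_value(s))
        phi[name] = total
    return phi


def toy_value_model(x):
    """Hypothetical value model with an interaction between two attributes."""
    return 2 * x["finishing"] + x["sprint_speed"] + x["finishing"] * x["sprint_speed"]


player = {"finishing": 3.0, "sprint_speed": 2.0}
reference = {"finishing": 0.0, "sprint_speed": 0.0}

phi = shapley_values(toy_value_model, player, reference)
print(phi)  # {'finishing': 9.0, 'sprint_speed': 5.0}
```

The contributions sum to the difference between the prediction for the player (14) and the prediction for the baseline (0), which is the additivity property that makes such attributions easy to read.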