...

Scientific Programme

Applied Sports Sciences

CP-AP03 - Statistics and Analyses II - Football

Date: 03.07.2025, Time: 18:30 - 19:30, Session Room: Castello 2

Chair TBA

ECSS Paris 2023: CP-AP03

Speaker A

Tom Van Deuren
University of Antwerp, Department of Mathematics
Belgium
"Toward Precision Training via Prescriptive Modeling of the Ratings of Perceived Exertion in Professional Soccer Players: A Causal Machine Learning Approach"

INTRODUCTION: Managing training loads in professional soccer requires balancing performance enhancement and injury prevention. Traditionally, this has been approached through predictive models that estimate internal load responses (e.g., rating of perceived exertion, RPE) from external load parameters. While these models provide valuable insight into player workload, they fall short of offering actionable, personalized training prescriptions [1]. This study proposes a causal machine learning framework that moves beyond prediction toward prescriptive modeling, optimizing training load recommendations for each player based on observed data.

METHODS: Fourteen male professional soccer players (age: 22.2 ± 4.5 years) from a Belgian Pro League club were monitored across a full competitive season, resulting in 974 recorded training and match observations. External load data, including total distance, high-speed running, accelerations, and metabolic power, were captured via 10 Hz GPS units (Catapult OptimEye S5). Internal load was assessed through session RPE, and additional pre-session wellness metrics (Hooper Index: sleep duration, muscle soreness, mood, fatigue, and stress) were collected daily. A total of 24 variables were analyzed. Predictive modeling was conducted using linear regression, regression tree, XGBoost, and random forest, with variable importance assessed via SHapley Additive exPlanations (SHAP) to enhance interpretability. Prescriptive modeling was performed using a counterfactual recurrent network (CRN) to estimate individualized causal effects of training load on future RPE, allowing for targeted load recommendations.

RESULTS: XGBoost achieved the best predictive performance (RMSE: 1.262), with total session distance emerging as the strongest predictor of RPE. Although the CRN had a slightly higher RMSE (1.379), it provided interpretable counterfactual insights, enabling training staff to simulate different load scenarios and anticipate individual player responses.

CONCLUSION: Our findings highlight the advantages of integrating causal machine learning into training load management in professional soccer. While traditional predictive models offer valuable estimates of RPE, they fall short in guiding individualized training decisions. By leveraging prescriptive modeling, we move toward actionable recommendations that consider the unique physiological and psychological profile of each player. These findings align with prior research on machine learning techniques for optimizing training design and evaluation [2] and extend the work of [1] by demonstrating how prescriptive analytics can enhance decision-making in elite sports settings. Future research should expand sample sizes and explore broader applications across various (team) sports to further validate these approaches.

REFERENCES:
[1] Houtmeyers et al. 2021. https://doi.org/10.1123/ijspp.2020-0958
[2] Jaspers et al. 2018. https://doi.org/10.1123/ijspp.2017-0299
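The abstract compares models by root-mean-square error (RMSE) on the RPE scale. As a minimal sketch of how that metric is computed, the following uses invented RPE values; the function name `rmse` and the sample ratings are illustrative assumptions and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical RPE ratings for four sessions (not data from the study).
observed_rpe = [3, 5, 7, 6]
predicted_rpe = [4, 5, 6, 8]

# Squared errors are 1, 0, 1, 4, so the RMSE is sqrt(6 / 4).
print(round(rmse(observed_rpe, predicted_rpe), 3))
```

Because RMSE is expressed in the units of the target, an RMSE of about 1.3 can be read as a typical prediction error of roughly 1.3 RPE points, with larger errors weighted more heavily than smaller ones.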

Speaker B

Rui Zhou
Capital University of Physical Education and Sports (CUPES)
China
"Developing a Dynamic Time-Series Goal Prediction Model Based on Positional Data: A Case Study of the 2023 FIFA U-17 Men’s World Cup"

INTRODUCTION: In modern football, goals determine outcomes, but predicting them is complex because they depend on strategy, player condition, and in-game dynamics. Traditional models such as expected goals (xG) rely on static data and miss tactical shifts and player interactions. Advances in electronic performance and tracking systems (EPTS), optical tracking systems (OTS), and AI-driven analytics now provide high-resolution spatiotemporal data that can improve accuracy. However, the integration of dynamic and static features remains underexplored. This study develops a combined model to improve goal prediction.

METHODS: This study analyzed data from 52 matches in the 2023 FIFA U-17 Men’s World Cup, focusing on player positions, ball trajectories, and time-series events. Data were collected via an optical tracking system, with preprocessing and model development in MATLAB R2022b. Two feature types were extracted: dynamic features, which capture match evolution through the x, y coordinates of the 10 key events before a shot, and static features, including shot location, defensive pressure, and shooter posture, which provide context. The target variable was goal occurrence (1 for a goal, 0 otherwise). Three deep learning models were implemented. The MLP model processed static features with two hidden layers (128 and 64 neurons, ReLU activation). The LSTM model analyzed sequential data with 30 hidden units and dropout to prevent overfitting. The LSTM+MLP hybrid model fused dynamic and static features at the mid-layer. All models were optimized using binary cross-entropy loss and the Adam optimizer. Data were split 80%/20% for training and testing, with oversampling to balance classes.

RESULTS: The LSTM+MLP hybrid model achieved the best goal prediction performance, with an accuracy of 93.5%, recall of 95.7%, and precision of 89.8% on the test set, making it well suited to real-time tactical analysis. The LSTM model excelled at extracting temporal features, achieving 91.7% accuracy and 90.1% precision. However, without static feature support, its recall was limited to 86.5%, making it more suitable for post-match tactical review. In contrast, the MLP model, which relied solely on static features, performed worst, with 80.4% accuracy and 71.5% recall; it failed to predict goal events effectively and is applicable only to static tactical analysis tasks.

CONCLUSION: This study introduces a deep learning-based goal prediction model that combines the strengths of LSTM and MLP to improve the prediction of goal-scoring events, offering data-driven insights to support coaches' real-time tactical decision-making. The findings show that integrating time-series data with static tactical context substantially improves prediction accuracy. Moreover, the mid-layer feature fusion strategy used in this study outperforms simple feature concatenation and late-stage fusion in integrating dynamic and static features. Future research should expand the dataset to improve model generalization and explore model compression and lightweight design for real-time applications.
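The results above are reported as accuracy, precision, and recall for a binary target (1 for a goal, 0 otherwise). The sketch below shows how these three metrics follow from predicted and true labels; the function name `classification_metrics` and the ten example labels are hypothetical and do not come from the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = goal)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # goals predicted as goals
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-goals predicted as goals
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # goals that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-goals predicted as non-goals
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical shot outcomes and model predictions (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Recall measures the share of actual goals the model identifies, while precision measures the share of predicted goals that were real. Since goals are rare among shots, accuracy alone can look high even when many goals are missed, which is presumably why the abstract reports all three.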

Speaker C

Onur TÜTÜNCÜ
Istanbul Gedik University, Faculty of Sport Sciences
Turkey
"Will Positional Shifts Replace Formations? A Tactical Revolution in Football"

INTRODUCTION: The modern era of football has seen significant increases in several aspects of the game, including the pace of play, player mobility, tactical mobilisation strategies, positional shifts, and the total distance covered at high intensity (1, 2, 3). Moreover, players' positions and roles are more fluid and complex than ever before. This trend has significantly reduced adherence to rigid tactical formations (4-3-3, 4-4-2, etc.). The main aim of this study is therefore to analyze the tactical formations that teams adopt in the build-up to a goal.

METHODS: Three machine learning models were developed to evaluate how well the formation distribution in the build-up to a goal (60 seconds) can be distinguished from the general formation distribution of the match and from other formations (60 minutes). A panel data analysis was conducted to examine how football formations transform over different time intervals (0 s, 1–15 s, 16–30 s, 31–45 s, 46–60 s, and the 60-minute mean). Ordinary least squares (OLS) regression was used to model the Euclidean distances of player positions across these time intervals and to assess the impact of each interval. The dataset consists of 969,408,000 refined positional data points derived from 306 matches played during the 2016–2017 Bundesliga season. Exclusion criteria were applied to ensure the dataset's purity and homogeneity.

RESULTS: Model 1 (4-3-3) achieved 57% accuracy for 4-3-3 and 4-2-3-1 but only 29% for 4-4-2. Model 2 (4-2-3-1) achieved 43% accuracy across all formations. Model 3 (4-4-2) attained 29% accuracy for 4-4-2, 57% for 4-3-3, and 71% for 4-2-3-1. The 4-2-3-1 model (R² = 0.824, F(15,160) = 66.72, p < .001) explained 82.4% of the variance in Euclidean distances, with significant differences at 46–60 s (coef = 1.5435, p < .001), 31–45 s (coef = 1.4499, p < .001), and 16–30 s (coef = 1.2211, p < .001), indicating tactical adjustments before a goal. The 4-4-2 model (R² = 0.786, F(15,160) = 45.45, p < .001) accounted for 78.6% of the variance, with significant changes at 46–60 s (coef = 1.4720, p < .001), 31–45 s (coef = 1.1379, p < .001), and 16–30 s (coef = 0.8432, p = .001), reflecting structural shifts as the goal moment nears. The 4-3-3 model (R² = 0.735, F(15,160) = 28.16, p < .001) explained 73.5% of the variance, with significant differences at 16–30 s (coef = 1.7846, p < .001) and 31–45 s (coef = 1.3852, p < .001). The 46–60 s interval (coef = 0.4642, p = .124) was not statistically significant.

CONCLUSION: The main findings of this study are as follows. First, teams are unable to maintain their initial formation patterns during the sequences leading to a goal. Second, even distinct tactical formations tend to converge into similar patterns at the moment of scoring. Because different formations adopt similar structural patterns in goal-scoring sequences, it cannot be concluded that any specific formation provides an advantage or disadvantage in offensive play.
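The analysis above is built on Euclidean distances between player positions at different time intervals. The abstract does not specify exactly which positions are compared, so the following is only one plausible reading: the mean distance each player has moved between a reference formation and a later snapshot. The function name `mean_displacement` and all coordinates are hypothetical.

```python
import math

def mean_displacement(reference, snapshot):
    """Mean Euclidean distance between matching players in two position sets.

    Each argument is a list of (x, y) tuples in the same player order.
    """
    if len(reference) != len(snapshot) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(reference, snapshot)]
    return sum(distances) / len(distances)

# Hypothetical pitch coordinates in metres for three players (not study data).
match_mean_positions = [(0, 0), (10, 0), (10, 10)]   # e.g. average over the match
pre_goal_positions = [(3, 4), (10, 0), (4, 2)]       # e.g. a pre-goal interval

# Per-player distances are 5, 0, and 10 metres.
print(mean_displacement(match_mean_positions, pre_goal_positions))
```

A larger value would indicate that the team's shape in the chosen interval departs further from its reference formation; under this reading, values of this kind per interval would serve as the dependent variable in the OLS models.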
