In his landmark paper "Statistical Modeling: The Two Cultures", Leo Breiman [1], the father of tree-based machine learning (ML), argued that modeling should focus on predictive algorithmic modeling rather than on classical statistical modeling. Twenty-five years later, sport science is being introduced to new ML approaches for training scheduling, injury prevention, and talent identification. The pendulum seems to have swung in Breiman's favor, especially with explainable ML (XAI) options that promise to combine ML's predictive power with the insights of statistical modeling. However, interpretation methods such as feature importance analysis only offer insights into what models have learned when the variables are themselves interpretable and uncorrelated; when features are correlated, these techniques become unreliable. This highlights a critical issue: although approaches may be labeled as explainable ML, the explanations often do not make sense or lack sufficient detail to understand how the black box operates. This invited symposium aims to guide sports scientists and practitioners in adopting explainable ML for their research questions and in determining when statistical modeling might be more appropriate. The first two talks will examine the topic from a statistician's and a machine learner's perspective, respectively, while the final talk will provide practical advice and examples of using both approaches effectively.
ECSS Lausanne 2026: IS-AP07 [41444]
AI, or more precisely machine learning (ML), approaches have had great success in recent years and have started to affect society as a whole [1]. If ML approaches are used in the sports science literature, the reader might reasonably assume that the advantages of doing so outweigh the disadvantages compared to alternative statistical models. ML and statistical models, however, are not just two different approaches to the same problem, but may have inherently different goals and strengths by design. Whether in computer vision, recommender systems or chatbots, ML has demonstrated its biggest successes in prediction tasks. Sports science requires more than prediction: we are interested in understanding processes and establishing causality in the sense of causal inference [2], which most ML methods are not designed for. While explainable AI (XAI) holds the potential to improve our understanding of what is happening in the so-called, and often criticised, black boxes [3] of ML models, a cautious and critical discussion of such approaches is necessary, and more questions are open than answered. Perhaps the most relevant question is whether we need to use ML and then search for post hoc explanations via XAI, or whether we can directly use well-explainable statistical models. If all we know and use are simple statistical approaches, the advocates of ML can legitimately argue that it offers more flexibility and higher degrees of freedom. Yet, when moving away from standard approaches, statistical modelling also offers high flexibility and the opportunity to create exactly the model that fits our purpose and that is explainable by nature. Doing so involves, as a first step, a critical a priori discussion of the relevant concepts, sources of variation, possible confounding factors and expected causal structures. In addition, we must critically examine whether standard statistical models are sufficient or whether we need to look beyond classical techniques by considering, for example, Monte Carlo simulation, Markov models, bootstrapping, or statistical modelling in the narrow sense (i.e. a deliberate and customized mathematical representation of the relationships). The talk will illustrate these points using a model that analyses the influence of incomplete information on the accuracy of predictions for football matches. A unique advantage of the model is that, by means of statistical modelling, actual predictions can be compared to statistically optimal predictions.
1. European Union. Artificial Intelligence Act. 2024. Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689.
2. Gelman A, Vehtari A. What are the Most Important Statistical Ideas of the Past 50 Years? Journal of the American Statistical Association. 2021;116:2087–97. doi: 10.1080/01621459.2021.1938081.
3. Bullock GS, Hughes T, Arundale AH, Ward P, Collins GS, Kluzek S. Black Box Prediction Methods in Sports Medicine Deserve a Red Card for Reckless Practice: A Change of Tactics is Needed to Advance Athlete Care. Sports Medicine. 2022;52:1729–35.
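As an illustration of the simulation-based techniques named above, and not of the specific model presented in this talk, the following minimal Python sketch uses Monte Carlo simulation with a simple independent-Poisson goal model to compare match-outcome probabilities derived from the true scoring rates with those derived from noisy, incomplete information; all team strengths and the noise level are invented for illustration only.

```python
# Hypothetical sketch: Monte Carlo simulation of football match outcomes under a
# simple independent-Poisson goal model. This is NOT the model presented in the
# talk; the scoring rates and the "incomplete information" noise are made up.
import numpy as np

rng = np.random.default_rng(42)

def simulate_matches(home_rate, away_rate, n_sims=10_000):
    """Simulate n_sims matches and return win/draw/loss probabilities."""
    home_goals = rng.poisson(home_rate, n_sims)
    away_goals = rng.poisson(away_rate, n_sims)
    return {
        "home_win": np.mean(home_goals > away_goals),
        "draw": np.mean(home_goals == away_goals),
        "away_win": np.mean(home_goals < away_goals),
    }

# "Statistically optimal" prediction uses the true scoring rates ...
true_probs = simulate_matches(home_rate=1.6, away_rate=1.1)

# ... while a prediction under incomplete information only sees noisy rates.
noisy_home = max(0.05, 1.6 + rng.normal(0, 0.4))
noisy_away = max(0.05, 1.1 + rng.normal(0, 0.4))
noisy_probs = simulate_matches(noisy_home, noisy_away)

print("true rates: ", true_probs)
print("noisy rates:", noisy_probs)
```

Comparing the two sets of probabilities gives a feel for how much accuracy is lost to incomplete information in this toy setup; the actual model discussed in the talk formalizes such a comparison.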
ECSS Lausanne 2026: IS-AP07 [41445]
Machine learning approaches, particularly predictive modeling techniques, are increasingly being used within sports science. These models are well suited for data that include many variables, nonlinear relationships, and complex interactions that are difficult to specify manually. For example, models such as random forests or gradient boosted decision trees can automatically capture patterns that would require extensive human-guided feature engineering in classical regression models. Unlike traditional statistical approaches, machine learning methods do not assume linearity; instead, they learn relationships directly from the data and flexibly model interactions. This makes them especially powerful in situations where the underlying functional forms are unknown or hard to define in advance. At the same time, it is important to understand what these models are learning. They identify correlations in the data, such as “when feature X increases, the outcome Y tends to increase,” and use these patterns to make accurate predictions. However, they do not necessarily capture why X and Y are related; they simply recognize that the relationship is consistently present. This contrasts with methods that aim to uncover causal mechanisms. As a result, highly accurate machine learning models may be useful, but can be difficult to interpret, and the reasoning behind their predictions may not be transparent. Nevertheless, recent work has shown that predictive performance does not need to come at the expense of explainability. This talk will touch upon two ways of making machine learning models explainable. On the one hand, post-hoc explanation methods, such as SHAP and LIME, allow practitioners to analyze trained models and assess how different inputs influence their outputs. These techniques provide practical ways to explore the behavior of “black-box” models without modifying their internal structure. On the other hand, explainability can be incorporated directly into the modeling process. One approach is to select inherently interpretable models, such as decision trees or generalized additive models. Even when using more complex models, it is possible to distill them into simpler, more transparent surrogates. Another approach is to design features that are meaningful and understandable within the application domain. This requires careful engineering and extensive domain knowledge, but can greatly improve transparency. By combining these strategies, practitioners can develop machine learning models that are both powerful and interpretable, supporting more reliable and responsible use in a sports setting.
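To make the post-hoc route described above concrete, here is a minimal Python sketch using the shap library's TreeExplainer on a gradient boosted model fitted to synthetic data; the feature names, the data-generating rule, and the model choice are illustrative assumptions, not material from the talk.

```python
# Minimal sketch of post-hoc explanation with SHAP on synthetic data;
# feature names and outcome are hypothetical placeholders.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "weekly_load": rng.normal(300, 50, 500),    # hypothetical training load
    "sleep_hours": rng.normal(7.5, 1.0, 500),   # hypothetical recovery proxy
    "sprint_speed": rng.normal(8.5, 0.5, 500),  # hypothetical fitness marker
})
# Synthetic, nonlinear outcome so the model has something to learn.
y = 0.01 * X["weekly_load"] - 0.5 * (X["sleep_hours"] - 7.5) ** 2 + rng.normal(0, 0.5, 500)

model = GradientBoostingRegressor().fit(X, y)

# TreeExplainer yields per-prediction feature attributions for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```

The same synthetic setup could instead be fitted with an inherently interpretable model, such as a shallow decision tree or a generalized additive model, which corresponds to the second route described above.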
ECSS Lausanne 2026: IS-AP07 [33050]
A common misconception is that complex black-box models are necessary for optimal predictive performance. However, this is often untrue, particularly for structured data with naturally meaningful features. The assumed trade-off between accuracy and interpretability has led many researchers to abandon attempts at producing interpretable models. This problem is compounded by current training paradigms: researchers learn deep learning techniques but rarely study interpretable machine learning approaches. Consequently, off-the-shelf explainable AI (XAI) implementations are frequently applied to validate findings from predictive models. However, methods like LIME create separate surrogate models to approximate black-box behavior rather than directly interpreting the original model. These post-hoc explanations are never entirely faithful to the original model's computations. Perfect explanations are impossible by definition: they would simply replicate the original model, which would then already be interpretable. Despite these limitations, interpretability remains critical in high-stakes decision-making contexts. In sports, this includes deciding whether an athlete should sit out a competition to prevent injury, or whether a player should be acquired based on predicted superiority in specific actions (e.g., 1 vs. 1 situations). The choice of modeling approach fundamentally influences which variables appear important for such decisions. Two case studies will illustrate this point. The first demonstrates how statistical approaches and machine learning methods yield different conclusions about which characteristics predict success in soccer 1 vs. 1 actions [1]. The second compares a simplistic off-the-shelf XAI approach with a more sophisticated implementation using Explainable Boosting Machines for injury prediction [2]. To enable broader adoption of these approaches and ensure reproducibility, the sports science community must prioritize appropriate reporting alongside data and code sharing. Open science principles, as outlined by Bullock et al. [3], should become standard practice in these publications. This includes adhering to reporting guidelines such as TRIPOD+AI and following FAIR data principles to make research truly transparent and reusable.
1. Oonk, G. A., Buurke, T. J. W., Lemmink, K. A. P. M., & Kempe, M. (2025). The interaction between attacker and environment predicts successfulness in one-on-one dribbles in male elite football. Journal of Sports Sciences, 1–13. https://doi.org/10.1080/02640414.2025.2555117
2. Hecksteden, A., Kempe, M., & Berger, J. (2025). Perspectives on data analytics for gaining a competitive advantage in football: harnessing data for decision support. Science and Medicine in Football, 1–9. https://doi.org/10.1080/24733938.2025.2517056
3. Bullock, G. S., Ward, P., Impellizzeri, F. M., et al. The Trade Secret Taboo: Open Science Methods are Required to Improve Prediction Models in Sports Medicine and Performance. Sports Medicine.
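As a hedged illustration of the glass-box route used in the second case study, and not a reproduction of the cited analyses, the sketch below fits an Explainable Boosting Machine from the open-source interpret package to synthetic injury-risk data; every feature name and the outcome-generating rule are invented placeholders.

```python
# Minimal sketch of an Explainable Boosting Machine (EBM) for a binary
# injury-risk task, using the open-source "interpret" package. The data and
# feature names are synthetic placeholders, not those of the cited studies.
import numpy as np
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(1)
n = 1000
X = pd.DataFrame({
    "acute_chronic_ratio": rng.normal(1.0, 0.3, n),  # hypothetical load ratio
    "previous_injuries": rng.integers(0, 4, n),      # hypothetical history count
    "match_minutes": rng.normal(70, 20, n),          # hypothetical exposure
})
# Synthetic outcome: higher load ratio and injury history raise risk.
logit = 2.0 * (X["acute_chronic_ratio"] - 1.0) + 0.6 * X["previous_injuries"] - 2.0
y = rng.random(n) < 1 / (1 + np.exp(-logit))

ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

# Each feature gets its own additive shape function, so the fitted model itself
# (not a post-hoc surrogate) shows how predicted risk changes with each input.
global_expl = ebm.explain_global()
print(global_expl.data()["names"], global_expl.data()["scores"])
```

Because the explanation comes directly from the model's additive terms rather than from a separate surrogate, the faithfulness concerns raised above for post-hoc methods such as LIME do not apply in the same way.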