Sports analytics have revolutionized the way games are understood, evolving from simple box score statistics in baseball to detailed pass attempt charts in football. In continuous invasion sports like soccer and hockey, where play is fast and decisions are made in real time, much of the most valuable data remains limited to proprietary systems within professional organizations.
Researchers at the University of Waterloo have developed an AI-based method to make complex tracking data more accessible to scientists, teams, and fans.
Simulating the Game
The project was led by Dr. David Radke, a recent Waterloo PhD graduate and current senior research scientist with the NHL’s Chicago Blackhawks, and PhD student Kyle Tilbury. The team used Google Research Football, a platform that enables AI systems to play simulated soccer matches. With this tool, the team trained virtual players to move, pass, and adjust tactics across thousands of simulated matches, generating a dataset that reflects the flow of real games for analysis and strategy development.
To demonstrate this method, the researchers generated tracking data from 3,000 simulated games, recording information such as passes, goals, and player movements. In this context, tracking data refers to precise records of each player’s position and actions at every moment in the simulation. Although the AI players do not perform at the level of professional athletes, the datasets are detailed enough to support research in sports analytics.
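To make the idea of tracking data concrete, here is a minimal Python sketch of what one such record might look like and how a simple statistic could be computed from it. The `Frame` fields, the action labels, and the helper functions are hypothetical illustrations, not the schema or code used by the Waterloo team.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Frame:
    """One player's state at one simulation step (hypothetical schema)."""
    step: int        # time step within the simulated match
    player_id: int   # which player this record describes
    x: float         # position along the length of the pitch
    y: float         # position across the width of the pitch
    action: str      # what the player did at this step, e.g. "pass"


def distance_covered(frames, player_id):
    """Sum the straight-line distance between a player's consecutive positions."""
    track = sorted(
        (f for f in frames if f.player_id == player_id),
        key=lambda f: f.step,
    )
    return sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(track, track[1:]))


def count_actions(frames, action):
    """Count how many records carry the given action label."""
    return sum(1 for f in frames if f.action == action)


# A tiny made-up sample: two players over a few steps.
frames = [
    Frame(0, 7, 0.0, 0.0, "run"),
    Frame(1, 7, 3.0, 4.0, "pass"),
    Frame(2, 7, 3.0, 4.0, "idle"),
    Frame(0, 9, 10.0, 0.0, "idle"),
    Frame(1, 9, 10.0, 1.0, "run"),
]

print(distance_covered(frames, 7))
print(count_actions(frames, "pass"))
```

Because every player has a record at every step, measures such as distance covered come from simple sums over consecutive positions, which a box score alone cannot provide.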
“While researchers have access to a lot of data about episodic sports like baseball, continuous invasion-game sports like soccer and hockey are much more difficult to analyze,” Radke said. “While the AI-generated players might not exactly play like Lionel Messi, the simulated datasets they generate are still useful for developing sports analysis tools.”
Equal Opportunities
Professional teams invest heavily in advanced analytics systems that track every movement and decision players make on the field. Because of the complex analytical capabilities of these systems, the technology is often prohibitively expensive for smaller organizations, universities, or independent analysts.
By simulating matches with AI, Radke and Tilbury’s method produces open-access sports data, enabling broader participation in analytics research beyond professional organizations.
“Enabling researchers to have this data will open up all kinds of opportunities,” Tilbury said. “It’s a democratization of access to this kind of sports analytics data.”
Beyond the Box Score
Soccer, hockey, and other invasion sports are difficult to analyze because play is continuous and athletes are always in motion. Invasion sports are those where teams compete to gain possession and move into the opponent’s territory to score. Unlike baseball, where each pitch is a separate event, soccer involves a constant flow of decisions such as where to pass, when to shoot, and how to defend. Without advanced tracking systems, it has been hard to capture and study the complexity of these movements.
These datasets are important because they allow researchers to develop and test new tools for player evaluation, strategy modeling, and outcome prediction. By making this kind of data more available to universities and smaller organizations, the project may encourage broader innovation in the analytical study of invasion sports.
Implications for Future AI Research
Beyond sports, the work contributes to the development of AI itself. In Radke’s words:
“At its core, invasion-game sports analytics is about understanding complex multiagent systems. The better we are at modeling the complexity of human behaviour in a sporting situation, the more useful that is for AI research. In turn, more advanced multiagent systems will help us better understand invasion-game sports.”
By using soccer matches as platforms to study cooperation, competition, and decision-making, this research enhances AI’s ability to model dynamic multiagent systems. Insights from tracking player movement can also inspire AI developments in areas such as self-driving cars, robotics, and collaborative problem-solving in complex environments.
Opening the Playbook
The research team emphasizes that widespread access to tracking data is essential for advancing the field of sports analytics. Their simulated datasets are designed to support further research and provide students and scientists with opportunities to develop models without relying on restricted data provided by professional leagues.
The study was presented at the 24th International Conference on Autonomous Agents and Multiagent Systems. For Radke and Tilbury, it’s not just about soccer; it’s about giving more people the tools to innovate.
Open access to sports data will not only accelerate advances in analytics and AI research but also empower a wider community to tackle complex challenges in modeling human behavior and decision-making, shaping the future of both sports and technology.
Austin Burgess is a writer and researcher with a background in sales, marketing, and data analytics. He holds a Master of Business Administration and a Bachelor of Science in Business Administration, along with a certification in Data Analytics. His work combines analytical training with a focus on emerging science, aerospace, and astronomical research.