Non Symmetric XGBoost – Tennis Match Predictions

I'm working on developing an algorithm to predict the outcomes of tennis matches, focusing on player matchups. The core of my model is an XGBoost classifier. I've encountered a puzzling issue in the predictions and I'm hoping to gain some insights from this community.

The Problem: The algorithm is designed to predict the likelihood of a player (player1) defeating their opponent (player2). Our dataset originally labeled all winners as 'player1'. To balance the dataset and accommodate both perspectives (player1 as the winner and player2 as the winner), I've duplicated and reversed the matches in both the training and testing sets.
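For concreteness, the mirroring step might look like the sketch below. The column names (`player1_rank`, `player2_rank`, `rank_diff`, `target`) and the helper `mirror_matches` are hypothetical stand-ins, not the real schema, and the sketch assumes every per-player feature comes as a `player1_*` / `player2_*` pair.

```python
import pandas as pd

def mirror_matches(df, p1_prefix="player1_", p2_prefix="player2_",
                   target="target", diff_cols=()):
    """Return the original rows followed by a player-swapped copy of each row."""
    # Pair every player1_* column with its player2_* counterpart, in both directions.
    swap = {}
    for col in df.columns:
        if col.startswith(p1_prefix):
            partner = p2_prefix + col[len(p1_prefix):]
            if partner in df.columns:
                swap[col] = partner
                swap[partner] = col

    # Swap the column names, then restore the original column order.
    mirrored = df.rename(columns=swap)[list(df.columns)].copy()

    # Signed difference features (player1 minus player2) must change sign too.
    for col in diff_cols:
        mirrored[col] = -mirrored[col]

    # The winner is now listed as player2, so the label flips from 1 to 0.
    mirrored[target] = 1 - mirrored[target]
    return pd.concat([df, mirrored], ignore_index=True)

# Toy data: in the raw file every winner is player1, so target is always 1.
matches = pd.DataFrame({
    "player1_rank": [5, 40],
    "player2_rank": [20, 12],
    "rank_diff": [-15, 28],   # player1_rank - player2_rank
    "target": [1, 1],
})
both = mirror_matches(matches, diff_cols=["rank_diff"])
```

Any feature that is not swapped or negated consistently (a signed difference, a head-to-head count, a "higher seed" flag) would make the two rows of a pair look like different matches to the model, so it is worth checking each column against this kind of rule.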

The expectation was that swapping the players would produce the complementary prediction: if the model gives player1 a 70% chance of winning, the swapped row should give that same player (now listed as player2) a 70% chance as well, i.e. the new player1 a 30% chance, so that the two predictions sum to 100%. However, the predictions do not line up this way.
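Stated as a check, the two player1 probabilities of a mirrored pair should sum to 1. The sketch below measures how far a pair is from that, using made-up probabilities rather than my real outputs. The `symmetrize` helper is an assumption on my part: averaging the two views is a post-hoc workaround that forces the sum to 1, not something XGBoost does by itself.

```python
import numpy as np

def symmetry_gap(p_forward, p_swapped):
    """Deviation from p(A beats B) + p(B beats A) = 1; zero means consistent."""
    p_forward = np.asarray(p_forward, dtype=float)
    p_swapped = np.asarray(p_swapped, dtype=float)
    return p_forward + p_swapped - 1.0

def symmetrize(p_forward, p_swapped):
    """Average the direct estimate with the complement of the swapped one."""
    p_forward = np.asarray(p_forward, dtype=float)
    p_swapped = np.asarray(p_swapped, dtype=float)
    return 0.5 * (p_forward + (1.0 - p_swapped))

# Illustrative numbers only: P(player1 wins) for each row and for its mirror.
p_fwd = np.array([0.70, 0.90])
p_swp = np.array([0.30, 0.40])

gap = symmetry_gap(p_fwd, p_swp)    # first pair is consistent, second is not
p_sym = symmetrize(p_fwd, p_swp)    # symmetric estimate of P(player1 wins)
p_sym_mirror = symmetrize(p_swp, p_fwd)
```

By construction `p_sym + p_sym_mirror` equals 1 for every pair, whatever the underlying model predicts.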

Example Issue: Take, for example, the predictions involving "Lorenzo Giustino". In one scenario (row 100), the model predicts a 99% probability of Giustino winning. However, when the match is reversed (row 101), the probability changes to 46%, which is a significant and unexpected discrepancy.

I'm seeking guidance on:

Why is this discrepancy occurring, especially to such a significant degree?

Are there any specific aspects of the XGBoost model or the data preparation process that I might be overlooking?

Any insights or suggestions would be greatly appreciated. Thank you in advance for your help!

I also tried CatBoost, since I read that it builds symmetric trees. I thought this might solve the problem, but the issue is still present.
