Crow wrote: ↑Fri Jun 14, 2019 9:06 pm
I haven't tried to read or follow the links to method.
I'll just ask are the "chances to win" just averaged or do the weights vary by player? I'd think they'd have to be variably weighted.
So given that each player on a team has a different win chance (let's call this metric Wn, which can be calculated according to that article), let's look at an example.
Team1 vs Team2
What you can do is one of the following:
Either
A) Average all the players' Wn numbers for Team1 and for Team2, and pick the winner as whichever team has the higher average. So Team1_Wn = sum(Wn)/N1 and Team2_Wn = sum(Wn)/N2, where each sum runs over that team's players and N1, N2 are the number of players on the two rosters respectively. If Team1_Wn > Team2_Wn then Team1 has the better chance to win.
How much better a chance? For that you need something like logistic regression on past game data (or simply some sigmoid function) to map the difference into a probability.
You could, as you say, do a weighted average, but figuring out the weights might be tricky. You could do something empirical like weighting by experience (games played), expected minutes played, etc.
or
B) Even better than averaging the players' Wn values into a single team number is to do the regression on the full dimensionality. In other words (and this only works for teams with an equal number of players, so you need to make some choices here), take 10 players from Team1 and 10 players from Team2. Then fit a regression (on past game data) from the 20-dimensional Wn space to the win/lose outcome. This should give you a probabilistic mapping that predicts a team's win chance (for teams of equal size) without having to weight and sum down to single numbers.
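To make A and B a bit more concrete, here is a rough Python sketch, for illustration only. The helper names (team_wn, win_prob_a, fit_logistic, predict), the slope k, and the tiny toy history are all made up for this example, and the hand-rolled gradient descent just stands in for whatever regression package you prefer. The toy rows use 2 players per side instead of 10 to keep it short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def team_wn(wn, weights=None):
    """Option A: plain (or weighted) average of the players' Wn values."""
    if weights is None:
        weights = [1.0] * len(wn)
    return sum(w * x for w, x in zip(weights, wn)) / sum(weights)

def win_prob_a(team1_wn, team2_wn, k=10.0):
    """Map the Wn difference to P(Team1 wins).
    k is a placeholder slope; it should really be fitted on past games."""
    return sigmoid(k * (team1_wn - team2_wn))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Option B: logistic regression via batch gradient descent.
    Each row of X is Team1's Wn values followed by Team2's;
    y is 1 if Team1 won that game, else 0."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = predict(w, b, row) - target
            for j in range(d):
                gw[j] += err * row[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, row):
    """P(Team1 wins) for one game under the fitted weights."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)

# Made-up toy history: [T1_p1, T1_p2, T2_p1, T2_p2] -> did Team1 win?
X = [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6],
     [0.7, 0.5, 0.5, 0.3], [0.3, 0.5, 0.5, 0.7]]
y = [1, 0, 1, 0]
w, b = fit_logistic(X, y)

print(win_prob_a(team_wn([0.6, 0.6]), team_wn([0.4, 0.4])))  # option A
print(predict(w, b, [0.6, 0.6, 0.4, 0.4]))                   # option B
```

Note that in B the order of the players inside each row matters, so you would have to pick some consistent ordering (by expected minutes, for example), which is one of the "choices" mentioned above.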
Crow wrote: ↑Fri Jun 14, 2019 9:06 pm
"I also found that a combination of player-based and team-based features works the best" I'd agree to that. Though I wonder how do you get the team based for coming season when players and roles may have changed and coaching and system may have changed or tweaked?
Well, that's the tricky part, isn't it?
Generally, player stats are not difficult to predict from season to season. Team stats are, though, due to composition changes and all the reasons you mentioned above.
I do not have a solution to this, at least not a good one.
What I am doing at the moment (until I figure out something better) is to do a linear combination of the previous season's team stats with the current season stats, in a moving window approach.
So if last year they played 82 games (Reg. Season) and this season they played 5 games then a Stat is going to be:
Stat_blend = Stat_last*w1 + Stat_current*w2
where:
w1 = max(0, (82 - 5)/82) = 77/82 ≈ 0.939
w2 = min(1, 5/82) ≈ 0.061
Stat_blend = Stat_last*0.939 + Stat_current*0.061
And as the current season progresses the contribution from last season's stats goes down.
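In code, the weighting above could look roughly like this. It is a minimal sketch: the function name blended_stat and its parameters are just names picked for this example, and it assumes an 82-game window with w1 read as (82 - games played)/82.

```python
def blended_stat(stat_last, stat_current, games_played, season_games=82):
    """Linear blend of last season's team stat with the current season's.
    The weight shifts from last season to this one as games are played."""
    w1 = max(0.0, (season_games - games_played) / season_games)  # last season
    w2 = min(1.0, games_played / season_games)                   # this season
    return stat_last * w1 + stat_current * w2

# 5 games into the season: weights are 77/82 and 5/82
print(blended_stat(100.0, 110.0, games_played=5))
```

Once a team has played 82 games, w1 hits 0 and only the current season's stats are used.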
Crow wrote: ↑Fri Jun 14, 2019 9:06 pm
Do you just use last season data or do you try to project with age curves, team "optimization efficiency analysis", etc.?
No, not projecting anything at the moment. I am using data from 2004/05 up until 2018/19, so almost 15 years of historical data.
This sort of takes care of diminishing player performance with age, since each player's stats for every year, together with age and experience, are mapped to game outcomes. However, the same does not hold for teams, because there is no temporal continuity of a team between seasons; it can change a lot.
Anyway, this is really basic and I am currently trying to figure out a better approach. I think this early-season uncertainty (until the stats mature) is responsible for the early-season dip in prediction accuracy.
If anyone has better ideas please feel free to share.
Maybe some time-series prediction of team stats from player stats, using a long short-term memory (LSTM) model.