Womens Play By Play Data?
Posted: Wed Feb 07, 2024 9:36 pm
by RowRowFan
Hello, I want to try to parse play-by-play data for college women's basketball, similar to
https://www.bigdataball.com/datasets/wnba/historical/
As of right now, the only code I found on GitHub had everything except the players on the court, which is fairly important since I want to run RAPM on it. I was thinking of doing it manually, but I was wondering if anyone knew of a resource or place where it is already available, either as code or directly as in BigDataBall. Thanks!
Re: Womens Play By Play Data?
Posted: Thu Feb 08, 2024 12:36 am
by Crow
Does bigdataball have what you want?
Have you checked with herhoopstats.com?
Can you scrape from the official NCAAW site or possibly request a download? Are you doing basic research to share with the public, or is it for a commercial purpose?
Re: Womens Play By Play Data?
Posted: Thu Feb 08, 2024 1:04 am
by RowRowFan
Crow wrote: ↑Thu Feb 08, 2024 12:36 am
Does bigdataball have what you want?
Have you checked with herhoopstats.com?
Can you scrape from the official NCAAW site or possibly request a download? Are you doing basic research to share with the public, or is it for a commercial purpose?
BigDataBall doesn't have NCAA women's data, but its WNBA and NBA data is essentially in the format I'm looking for, since I wanted to get net rating and play around with lineup data and things like that from it, just for fun. herhoopstats didn't have much either.
I was thinking of figuring out how to scrape the data, but first I wanted to see if there was a simple option to purchase it commercially, or a tutorial on scraping it into the same format as BigDataBall. I used the wehoop package originally, but it lacked the players on the floor.
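For anyone following along, here is a rough sketch of the lineup net rating calculation mentioned above, assuming the play-by-play has already been collapsed into stints with the five players on the floor. The field names (lineup, pts_for, pts_against, off_poss, def_poss) are made up for illustration and are not BigDataBall's actual column names.

```python
from collections import defaultdict

def lineup_net_ratings(stints):
    """Return net rating (points per 100 possessions, offense minus defense)
    for each five-player lineup. The stint schema here is hypothetical."""
    totals = defaultdict(lambda: {"pts_for": 0, "pts_against": 0,
                                  "off_poss": 0, "def_poss": 0})
    for stint in stints:
        # Sort the names so the same five players always map to one key.
        key = tuple(sorted(stint["lineup"]))
        for field in ("pts_for", "pts_against", "off_poss", "def_poss"):
            totals[key][field] += stint[field]

    ratings = {}
    for key, t in totals.items():
        # Skip lineups with no possessions on one end to avoid dividing by zero.
        if t["off_poss"] == 0 or t["def_poss"] == 0:
            continue
        ratings[key] = 100.0 * (t["pts_for"] / t["off_poss"]
                                - t["pts_against"] / t["def_poss"])
    return ratings

# Toy data: the same lineup appears in two stints, listed in different orders.
stints = [
    {"lineup": ["A", "B", "C", "D", "E"], "pts_for": 10, "pts_against": 8,
     "off_poss": 10, "def_poss": 10},
    {"lineup": ["E", "D", "C", "B", "A"], "pts_for": 5, "pts_against": 6,
     "off_poss": 5, "def_poss": 5},
]
ratings = lineup_net_ratings(stints)
```

Small lineups samples will be very noisy, so it probably makes sense to filter by a minimum possession count before reading much into the numbers.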
Re: Womens Play By Play Data?
Posted: Thu Feb 08, 2024 2:44 am
by rjb2
This will be sort of a solution. There is an R package that scrapes pbp from stats.ncaa called bigballR. There is a function called get_play_by_play which scrapes the pbp and a function called get_possessions which parses it and returns information that includes the players on the court. The problem is that it's built for men's basketball, which means that it may be difficult to efficiently scrape the women's games. If you are able to scrape the pbp IDs for women's games then you should be good.
https://github.com/jflancer/bigballR
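To illustrate the general idea of recovering the players on the court from play-by-play, here is a minimal Python sketch that walks through substitution events and keeps a running set per team. This is not bigballR's code or its actual algorithm; the event schema (team, type, player) and the function name add_lineups are assumptions for illustration, and it assumes the starters are already known.

```python
def add_lineups(events, starters):
    """Annotate each event with the players on the court for every team.

    events:   list of dicts with hypothetical keys "team", "type", "player".
    starters: dict mapping team name -> the five starting players.
    """
    on_court = {team: set(players) for team, players in starters.items()}
    annotated = []
    for event in events:
        team = event["team"]
        if event["type"] == "sub_out":
            on_court[team].discard(event["player"])
        elif event["type"] == "sub_in":
            on_court[team].add(event["player"])
        # Snapshot the lineups after applying this event.
        row = dict(event)
        row["on_court"] = {t: tuple(sorted(p)) for t, p in on_court.items()}
        annotated.append(row)
    return annotated

starters = {
    "HOME": ["A", "B", "C", "D", "E"],
    "AWAY": ["V", "W", "X", "Y", "Z"],
}
events = [
    {"team": "HOME", "type": "shot_made", "player": "A"},
    {"team": "HOME", "type": "sub_out", "player": "A"},
    {"team": "HOME", "type": "sub_in", "player": "F"},
    {"team": "AWAY", "type": "shot_made", "player": "V"},
]
rows = add_lineups(events, starters)
```

Real play-by-play is messier than this (missing or out-of-order substitutions, changes at period breaks), so a sanity check that each team has exactly five players on every non-substitution row is probably worth adding.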
Re: Womens Play By Play Data?
Posted: Thu Feb 08, 2024 7:22 am
by RowRowFan
rjb2 wrote: ↑Thu Feb 08, 2024 2:44 am
This will be sort of a solution. There is an R package that scrapes pbp from stats.ncaa called bigballR. There is a function called get_play_by_play which scrapes the pbp and a function called get_possessions which parses it and returns information that includes the players on the court. The problem is that it's built for men's basketball, which means that it may be difficult to efficiently scrape the women's games. If you are able to scrape the pbp IDs for women's games then you should be good.
https://github.com/jflancer/bigballR
Thank you! The season ID is what ties it to men's seasons, so I can just parse that and go through every date, and it should work. You're a lifesaver!
Re: Womens Play By Play Data?
Posted: Fri Feb 09, 2024 1:01 am
by RowRowFan
So it's working, but it's incredibly slow (it's going to take around 2-3 days to get the data I need). Is that just how long it's expected to take? It gets the box score ID and then the play-by-play ID, since it can't get the play-by-play ID directly.
Re: Womens Play By Play Data?
Posted: Fri Feb 09, 2024 1:47 am
by rjb2
RowRowFan wrote: ↑Fri Feb 09, 2024 1:01 am
So it's working, but it's incredibly slow (it's going to take around 2-3 days to get the data I need). Is that just how long it's expected to take? It gets the box score ID and then the play-by-play ID, since it can't get the play-by-play ID directly.
Yeah, scraping a lot of games takes a while. First it compiles all the IDs, then it scrapes each game individually.
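Since a multi-day scrape can get interrupted, one thing that may help is caching each game to disk so a restart only fetches what is missing. Below is a minimal Python sketch of that two-stage pattern (IDs first, then one request per game). It is not how bigballR works internally; scrape_games, fetch_game, and the one-JSON-file-per-game layout are hypothetical, and the real fetch function would be whatever scraper you already have.

```python
import json
import tempfile
import time
from pathlib import Path

def scrape_games(game_ids, fetch_game, cache_dir, delay_seconds=1.0):
    """Fetch each game once, storing results as JSON so reruns can resume.

    fetch_game is a caller-supplied function (hypothetical here) that takes
    a game ID and returns JSON-serializable play-by-play data.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    results = {}
    for game_id in game_ids:
        path = cache / f"{game_id}.json"
        if path.exists():
            # Already scraped on an earlier run: read from disk, no request.
            results[game_id] = json.loads(path.read_text())
            continue
        data = fetch_game(game_id)
        path.write_text(json.dumps(data))
        results[game_id] = data
        # Pause between requests to stay polite to the server.
        time.sleep(delay_seconds)
    return results

# Demo with a fake fetcher so nothing touches the network.
calls = []

def fake_fetch(game_id):
    calls.append(game_id)
    return {"game_id": game_id, "events": []}

with tempfile.TemporaryDirectory() as tmp:
    first = scrape_games(["g1", "g2"], fake_fetch, tmp, delay_seconds=0)
    second = scrape_games(["g1", "g2", "g3"], fake_fetch, tmp, delay_seconds=0)
```

In the demo, the second run only calls the fetcher for the new ID, which is the behavior you want when a long scrape dies partway through.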