How to get started?

Home for all your discussion of basketball statistical analysis.
Sartre
Posts: 6
Joined: Mon Jan 06, 2014 11:02 pm

How to get started?

Post by Sartre »

Hi!

I'm new to predictive modeling and was wondering if I could get some questions answered.
First off, I've searched all over for something like a "beginner's guide" on how to set up a model, or even how to get my head around the concept, with no luck.
How do you get started?

What do I use? Excel, Stata, R?
I get that I import raw data from various sites, but then what? How do I get from there to a model that predicts something?

As you can see, I'm an ultra newbie, but I'm interested and intrigued! I'm at square one! Where do I go from here?

Is there a "predictive modeling for dummies" ? ;)

Best regards
/S
nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Re: How to get started?

Post by nileriver »

Coming from the perspective of a SQL developer, I would first get that raw data into a database. I prefer using SQL Server as my database platform, and you can get a developer's copy of it on Amazon for around $40 (http://goo.gl/cDff5Z). You can also use MySQL, which is free but more limited in certain areas. From there you can connect R to your database to do your statistical modelling. I have much more experience on the database side of things than on the predictive modelling side, so I think others on here could give you better guidance in that area. If you do happen to have any database-related questions, let me know.
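
A minimal sketch of that first step, assuming Python and its built-in sqlite3 module as a zero-install stand-in for SQL Server or MySQL. The table name, column names, and sample rows are all made up for illustration; real data would come from whatever site you import from.

```python
import sqlite3

# In-memory database; pass a file name instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE games (
        game_date   TEXT,
        visitor     TEXT,
        visitor_pts INTEGER,
        home        TEXT,
        home_pts    INTEGER
    )
    """
)

# Made-up rows standing in for raw data imported from a stats site.
raw_rows = [
    ("2014-01-06", "Team A", 101, "Team B", 97),
    ("2014-01-07", "Team C", 88, "Team D", 110),
    ("2014-01-08", "Team B", 105, "Team C", 99),
]
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", raw_rows)
conn.commit()

# Average combined score of the games each team hosted.
query = """
    SELECT home, AVG(visitor_pts + home_pts) AS avg_total
    FROM games
    GROUP BY home
    ORDER BY home
"""
avg_totals = conn.execute(query).fetchall()
print(avg_totals)
```

The same SQL should carry over to a server database with little change; mainly the connection step differs.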
Sartre
Posts: 6
Joined: Mon Jan 06, 2014 11:02 pm

Re: How to get started?

Post by Sartre »

Great answer, thanks!!

I use a Mac; does Navicat work?

I'm sure I'll come up with a ton of questions for you as soon as I understand more...

That leaves that little question of predicting and modeling ;)

I really appreciate the help!
nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Re: How to get started?

Post by nileriver »

Navicat is a front-end application for the database. You still need the underlying database to connect to. Navicat is used to write queries and perform actions on an already existing database.

You mentioned you are on a Mac, which means you can't use SQL Server unless you dual boot. With that in mind I would guide you towards MySQL or PostgreSQL. I worked with MySQL in college and found it lacking, but good for a free option. I have not worked with PostgreSQL at all, so I have no comment on it. You can always go to a site like http://www.stackoverflow.com to learn more about the various database platforms.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: How to get started?

Post by J.E. »

I love Python and use it for pretty much anything

The pyodbc module is great when working with a (SQL or other) database
Scikit-learn is great for machine-learning-related problems

It's all free, too

Is there anything in particular you want to predict?
Sartre
Posts: 6
Joined: Mon Jan 06, 2014 11:02 pm

Re: How to get started?

Post by Sartre »

Thank you so much!

OK, so I'll give Python a try, and Scikit-learn is for understanding how it all works, correct?

So can I use SQL Server on my Mac? This page says I can: http://www.razorsql.com/articles/sql_server_mac.html
What other database is otherwise recommended for a Mac user?

I want to predict O/U primarily; I just can't get my head around how this all works. In my mind I put together a core model (when I learn that) and then feed that model the different filters/variables that I think are important. But how is that done? I mean, these are abstract thoughts I have; how do I convert them into numbers? In the end, I want to see a score that is more real than the line on offer 8-)

What have I gotten myself into?

Again, I really appreciate the help!
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: How to get started?

Post by v-zero »

nileriver wrote:Coming from the perspective of a SQL developer, I would first get that raw data into a database. I prefer using SQL Server as my database platform, and you can get a developer's copy of it on Amazon for around $40 (http://goo.gl/cDff5Z). You can also use MySQL, which is free but more limited in certain areas. From there you can connect R to your database to do your statistical modelling. I have much more experience on the database side of things than on the predictive modelling side, so I think others on here could give you better guidance in that area. If you do happen to have any database-related questions, let me know.
I am driven to ask: why wouldn't you suggest the free MS SQL Express 2012, which allows individual databases up to 10GB? I know $40 is nothing in the farcical world of MS SQL licensing, but still...

Anyway, sorry to derail. R is probably a good place to start with writing small scripts for calculating interesting things, though Python is my platform of choice for everything. I'll let others cover the rest.
nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Re: How to get started?

Post by nileriver »

v-zero wrote:
Whilst I try to avoid MSFT solutions altogether, but sadly have to work with them, I am driven to ask: why wouldn't you suggest the free MS SQL Express 2012, which allows individual databases up to 10GB? I know $40 is nothing in the farcical world of MS SQL licensing, but still...

Anyway, sorry to derail. R is probably a good place to start with writing small scripts for calculating interesting things, though Python is my platform of choice for everything. I'll let others cover the rest.
You are correct. I should have also mentioned the Express edition of SQL Server. My mind immediately jumped to the Developer edition because that is what I am using to develop my personal stats website. Also, the Express edition has other limitations besides the 10 GB database size limit. Out of curiosity, what are your issues with SQL Server? I have been a full-time SQL developer for several years and really enjoy working with it and the tools it has (mainly SSIS and SSRS).

Sartre - I think that you are confusing the database engine and the tool you use to develop with. You will first need to have a database engine installed on a computer (which could be your Mac). You then use a development tool to connect to your database.

Database Engines:
SQL Server
MySQL
PostgreSQL

Development Tools:
SQL Server Management Studio
MySQL Workbench
Navicat
RazorSQL
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: How to get started?

Post by v-zero »

I wouldn't think the additional limitations would really restrict the vast majority of hobbyist users, but you're right. You'll notice I edited my response before you replied, but after you quoted, removing the vitriol. I actually think that SQL Server is an excellent product, and I too use it a lot for work. However, for my own purposes (and usually those of others) I despise platform dependence, especially when that dependence can come at a high financial cost. :lol: You can probably guess which OS I run on all my home machines (outside of my media PC, for which Silverlight remains a clunky necessity for the time being). :geek:
nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Re: How to get started?

Post by nileriver »

I definitely agree that most users would not have a different experience using Express over Developer. Also, I appreciate someone who has an opinion. I was just curious what issues you have encountered with it. I can imagine that the Windows limitation would be frustrating for Mac or Linux users. For me, I have jumped into the Microsoft ecosystem for development and have never looked back.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: How to get started?

Post by J.E. »

Sartre wrote:Scikit-learn is for understanding how it all works, correct?
No. Scikit-learn is a library that provides you with tools/algorithms that help you do predictive modeling, so you don't have to implement all those algorithms yourself
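
To give a rough feel for the kind of algorithm such a library packages up, here is a one-variable least-squares fit written by hand in plain Python. This is not scikit-learn code; the pace and points numbers are invented, and a real model would need far more data and more than one input variable.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented training data: a pace figure per game and the combined score.
pace = [90.0, 92.0, 94.0, 96.0, 98.0]
total_points = [186.0, 190.0, 197.0, 199.0, 206.0]

intercept, slope = fit_line(pace, total_points)

# Predict the combined score for a new game with an assumed pace of 95.
predicted = intercept + slope * 95.0
print(round(slope, 2), round(predicted, 2))
```

A library saves you from writing and debugging routines like `fit_line` yourself, and gives you many more model types to try.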
Sartre wrote:I want to predict O/U primarily; I just can't get my head around how this all works. In my mind I put together a core model (when I learn that) and then feed that model the different filters/variables that I think are important. But how is that done? I mean, these are abstract thoughts I have; how do I convert them into numbers? In the end, I want to see a score that is more real than the line on offer 8-)
You could probably make a new thread for predicting O/U and then everybody can chime in and give you some ideas on how to tackle such a problem. If you wanted to start out in Python, you could write a small script that downloads a page that lists all scores so far this season (like here http://www.basketball-reference.com/lea ... games.html, probably using urllib) and then try to extract the useful information from the HTML with a parser (you can write a small parser yourself, or try BeautifulSoup)
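
A minimal sketch of that download-and-parse idea, using only the standard library (urllib plus the built-in html.parser as the "small parser"). The sample markup and its column order are assumptions for illustration; the real page is laid out differently and would need its own column mapping. `fetch_html` is defined but not called here, since it needs network access and a full URL.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch_html(url):
    """Download a page and return it as text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_games(html):
    """Return (visitor, visitor_pts, home, home_pts) tuples from table rows."""
    parser = TableRowParser()
    parser.feed(html)
    games = []
    for row in parser.rows:
        # Assumed column order: date, visitor, visitor pts, home, home pts.
        # Header rows are skipped because their "PTS" cells are not digits.
        if len(row) >= 5 and row[2].isdigit() and row[4].isdigit():
            games.append((row[1], int(row[2]), row[3], int(row[4])))
    return games


# Hypothetical markup standing in for a downloaded scores page.
SAMPLE_HTML = """
<table>
<tr><th>Date</th><th>Visitor</th><th>PTS</th><th>Home</th><th>PTS</th></tr>
<tr><td>2014-01-06</td><td>Team A</td><td>101</td><td>Team B</td><td>97</td></tr>
<tr><td>2014-01-07</td><td>Team C</td><td>88</td><td>Team D</td><td>110</td></tr>
</table>
"""

games = parse_games(SAMPLE_HTML)
totals = [visitor_pts + home_pts for _, visitor_pts, _, home_pts in games]
print(games)
print(totals)
```

With a real page you would call `parse_games(fetch_html(url))` and adjust the column positions to match what the site actually serves.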

Obviously, look for Python tutorials.

Stackoverflow and Cross Validated are also very good resources
Mike G
Posts: 6175
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: How to get started?

Post by Mike G »

What the heck is O/U ?
Sartre
Posts: 6
Joined: Mon Jan 06, 2014 11:02 pm

Re: How to get started?

Post by Sartre »

Mike G wrote:What the heck is O/U ?
Over/under. If the line is at 195, I want to predict whether that line is too high or too low.
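
As a rough illustration of that last step, assuming you already have a predicted combined score from some model: compare it with the line and decide. The function name and the `margin` parameter are hypothetical, added here only to show that you might skip games where the prediction is too close to the line.

```python
def over_under_call(predicted_total, line, margin=0.0):
    """Compare a predicted combined score with the line.

    Returns "over" or "under" when the prediction differs from the line
    by more than `margin` points, otherwise "pass".
    """
    if predicted_total > line + margin:
        return "over"
    if predicted_total < line - margin:
        return "under"
    return "pass"


line = 195.0
calls = [
    over_under_call(198.0, line),              # prediction above the line
    over_under_call(193.5, line),              # prediction below the line
    over_under_call(196.0, line, margin=2.0),  # within the margin
]
print(calls)
```

All of the hard work is in producing a trustworthy `predicted_total`; this comparison is the easy part.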
Sartre
Posts: 6
Joined: Mon Jan 06, 2014 11:02 pm

Re: How to get started?

Post by Sartre »

That is such a perfect answer, thank you so much! I downloaded Python but couldn't even get it started, lol. I'm hopeless with computers.
I'm looking over tutorials at the moment. But I'm tired and I live in Sweden, so I'd better get some sleep or else I'll confuse things even more...

I can't write a script; I can barely write a nice Google search ;)
That's why I keep asking myself why I'm so intrigued by this...

The more I read, the more questions come up... ;)

I'll be back soon with a ton of questions as I learn more. Thanks for all your help!
/S