How to get started?

Posted: Mon Jan 06, 2014 11:11 pm
by Sartre
Hi!

I'm new to predictive modeling and was wondering if I could get some questions answered.
First off, I've searched all over for a "beginner's guide" on how to set up a model, or even how to get my head around the concept, with no luck.
How do you get started?

What do I use? Excel, Stata, or R?
I get that I import raw data from various sites, but then what? How do I get from there to a model that predicts something?

As you can see, I'm an ultra newbie, but I'm interested and intrigued! I'm at square one! Where do I go from here?

Is there a "predictive modeling for dummies"? ;)

Best regards
/S

Re: How to get started?

Posted: Mon Jan 06, 2014 11:32 pm
by nileriver
Coming from the perspective of a SQL developer, I would first get that raw data into a database. I prefer using SQL Server as my database platform, and you can get a developer's copy of it on Amazon for around $40 (http://goo.gl/cDff5Z). You can also use MySQL, which is free but more limited in certain areas. From there you can connect R to your database to do your statistical modelling. I have much more experience on the database side of things than on the predictive modelling side, so I think others on here could give you better guidance in that area. If you happen to have any database-related questions, let me know.
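
As a rough illustration of the first step described above (getting raw data into a database and querying it back out), here is a minimal Python sketch. It uses SQLite from Python's standard library as a stand-in for SQL Server or MySQL, purely so that it runs without installing a server. The table layout, column names, team codes, and sample rows are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw data, standing in for a CSV file downloaded from a stats site.
RAW_CSV = """game_date,home_team,away_team,home_pts,away_pts
2014-01-03,AAA,BBB,101,95
2014-01-04,CCC,AAA,88,92
2014-01-05,BBB,CCC,110,104
"""


def load_games(conn, csv_text):
    """Create a games table (if needed) and insert every row of the CSV text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        "game_date TEXT, home_team TEXT, away_team TEXT, "
        "home_pts INTEGER, away_pts INTEGER)"
    )
    rows = [
        (r["game_date"], r["home_team"], r["away_team"],
         int(r["home_pts"]), int(r["away_pts"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def total_points_by_game(conn):
    """Return (game_date, combined score) for each game, oldest first."""
    cur = conn.execute(
        "SELECT game_date, home_pts + away_pts FROM games ORDER BY game_date"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
n_loaded = load_games(conn, RAW_CSV)
totals = total_points_by_game(conn)
```

With a server database the SQL would look much the same; mainly the connection step changes, and the modelling tool (R or otherwise) would read from the table instead of from loose files.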

Re: How to get started?

Posted: Tue Jan 07, 2014 12:36 am
by Sartre
Great answer, thanks!!

I use a Mac; does Navicat work?

I'm sure I'll come up with a ton of questions for you as soon as I understand more...

That leaves that little question of predicting and modeling ;)

I really appreciate the help!

Re: How to get started?

Posted: Tue Jan 07, 2014 1:02 am
by nileriver
Navicat is a front-end application for a database; you still need an underlying database to connect to. Navicat is used to write queries and perform actions on an already existing database.

Since you are on a Mac, you can't use SQL Server unless you dual boot. With that in mind, I would guide you towards MySQL or PostgreSQL. I worked with MySQL in college and found it lacking, but good for a free option. I have not worked with PostgreSQL at all, so I have no comment on it. You can always go to a site like http://www.stackoverflow.com to learn more about the various database platforms.

Re: How to get started?

Posted: Tue Jan 07, 2014 9:36 pm
by J.E.
I love Python and use it for pretty much anything.

The pyodbc module is great when working with a (SQL or other) database.
Scikit-learn is great for machine-learning-related problems.

It's all free, too.

Is there anything in particular you want to predict?

Re: How to get started?

Posted: Tue Jan 07, 2014 10:24 pm
by Sartre
Thank you so much!

OK, so I'll give Python a try, and Scikit-learn is for understanding how it all works, correct?

So can I use SQL on my Mac? This one says I can: http://www.razorsql.com/articles/sql_server_mac.html
What other database is otherwise recommended for a Mac user?

I want to predict o/u primarily; I just can't get my head around how this all works. In my mind I put together a core model (when I learn that) and then feed that model different filters/variables that I think are important. But how is that done? I mean, these are abstract thoughts I have; how do I convert them into numbers? In the end, I want to see a score that is more real than the line on offer 8-)

What have I gotten myself into?

Again, I really appreciate the help!

Re: How to get started?

Posted: Tue Jan 07, 2014 10:28 pm
by v-zero
nileriver wrote:Coming from the perspective of a SQL developer, I would first get that raw data into a database. I prefer using SQL Server as my database platform, and you can get a developer's copy of it on Amazon for around $40 (http://goo.gl/cDff5Z). You can also use MySQL, which is free but more limited in certain areas. From there you can connect R to your database to do your statistical modelling. I have much more experience on the database side of things than on the predictive modelling side, so I think others on here could give you better guidance in that area. If you happen to have any database-related questions, let me know.
I am driven to ask: why wouldn't you suggest the free MS SQL Express 2012, which allows individual databases up to 10GB? I know $40 is nothing in the farcical world of MS SQL licensing, but still...

Anyway, sorry to derail. R is probably a good place to start with writing small scripts for calculating interesting things, though Python is my platform of choice for everything. I'll let others cover the rest.

Re: How to get started?

Posted: Tue Jan 07, 2014 10:58 pm
by nileriver
v-zero wrote:
Whilst I try to avoid MSFT solutions altogether, but sadly have to work with them, I am driven to ask: why wouldn't you suggest the free MS SQL Express 2012, which allows individual databases up to 10GB? I know $40 is nothing in the farcical world of MS SQL licensing, but still...

Anyway, sorry to derail. R is probably a good place to start with writing small scripts for calculating interesting things, though Python is my platform of choice for everything. I'll let others cover the rest.
You are correct. I should have also mentioned the Express edition of SQL Server. My mind immediately jumped to the Developer edition because that is what I am using to develop for my personal stats website. Also, the Express edition has other limitations besides the 10 GB database limit. Out of curiosity, what are your issues with SQL Server? I have been a full-time SQL developer for several years and really enjoy working with it and the tools it has (mainly SSIS and SSRS).

Sartre - I think that you are confusing the database engine and the tool you use to develop with. You will first need to have a database engine installed on a computer (which could be your Mac). You then use a development tool to connect to your database.

Database Engines:
SQL Server
MySQL
PostgreSQL

Development Tools:
SQL Server Management Studio
MySQL Workbench
Navicat
RazorSQL

Re: How to get started?

Posted: Tue Jan 07, 2014 11:30 pm
by v-zero
I wouldn't think the additional limitations would really restrict the vast majority of hobbyist users, but you're right. You'll notice I edited my response before you replied, but after you quoted, removing the vitriol. I actually think that SQL Server is an excellent product; I too use it a lot for work. However, for my own purposes (and usually those of others) I despise platform dependence, especially when that dependence can come at a high financial cost. :lol: You can probably guess which OS I run on all my home machines (outside of my media PC, for which Silverlight remains a clunky necessity for the time being). :geek:

Re: How to get started?

Posted: Tue Jan 07, 2014 11:38 pm
by nileriver
I definitely agree that most users would not have a different experience using Express over Developer. Also, I appreciate someone who has an opinion. I was just curious what issues you have encountered with it. I can imagine that the Windows limitation would be frustrating for Mac or Linux users. For me, I have jumped into the Microsoft ecosystem for development and have never looked back.

Re: How to get started?

Posted: Wed Jan 08, 2014 8:14 pm
by J.E.
Sartre wrote:Scikit-learn is for understanding how it all works, correct?
No. Scikit-learn is a library that provides you with tools/algorithms that help you do predictive modeling, so you don't have to implement all those algorithms yourself.
I want to predict o/u primarily; I just can't get my head around how this all works. In my mind I put together a core model (when I learn that) and then feed that model different filters/variables that I think are important. But how is that done? I mean, these are abstract thoughts I have; how do I convert them into numbers? In the end, I want to see a score that is more real than the line on offer 8-)
You could probably make a new thread for predicting O/U, and then everybody can chime in and give you some ideas on how to tackle such a problem. If you wanted to start out in Python, you could write a small script that downloads a page that lists all scores so far this season (like here http://www.basketball-reference.com/lea ... games.html, probably using urllib) and then try to extract the useful information from the HTML with a parser (you can write a small parser yourself, or try BeautifulSoup).
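
To make the download-and-parse idea a bit more concrete, here is a small sketch of the "write a small parser yourself" route, using only Python's standard library (`urllib.request` and `html.parser`). The sample HTML, the column order, and the names `download`, `ScoreTableParser`, and `parse_games` are assumptions made for illustration; a real scores page will be laid out differently, and the parser would need to be adapted to it.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical HTML, standing in for a downloaded page of game scores.
SAMPLE_HTML = """
<table>
  <tr><th>Visitor</th><th>PTS</th><th>Home</th><th>PTS</th></tr>
  <tr><td>AAA</td><td>95</td><td>BBB</td><td>101</td></tr>
  <tr><td>CCC</td><td>104</td><td>AAA</td><td>99</td></tr>
</table>
"""


def download(url):
    """Fetch a page and return its HTML as text (not called here; needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class ScoreTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row being read, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:       # header rows contain no <td>, so they are skipped
                self.rows.append(self._row)
            self._row = None


def parse_games(html):
    """Turn the table rows into dicts holding both teams and the combined score."""
    parser = ScoreTableParser()
    parser.feed(html)
    parser.close()
    games = []
    for visitor, visitor_pts, home, home_pts in parser.rows:
        games.append({"visitor": visitor, "home": home,
                      "total": int(visitor_pts) + int(home_pts)})
    return games


games = parse_games(SAMPLE_HTML)
```

In practice you would pass the result of `download(...)` to `parse_games` instead of the sample string. BeautifulSoup would replace the hand-written parser class with a few lookup calls.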

Obviously, look for Python tutorials.

Stack Overflow and Cross Validated are also very good resources.
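
On the earlier question of how ideas get converted into numbers: one common pattern is to express each idea as a numeric column (a "feature") and fit a model that maps features to the outcome. The sketch below fits a straight line by ordinary least squares in plain Python, as a stand-in for the kind of algorithm a library such as Scikit-learn provides ready-made (its `LinearRegression` class does this for many features at once). The single feature chosen here, the data values, and the function names are all hypothetical; this is not anyone's actual model from the thread.

```python
# Hypothetical training data: for each past game, the sum of both teams'
# season scoring averages (the feature) and the actual combined score (the target).
avg_sum = [190.0, 200.0, 210.0, 195.0, 205.0]
actual_total = [188.0, 201.0, 214.0, 193.0, 204.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted combined score for a game whose feature value is x."""
    return intercept + slope * x


slope, intercept = fit_line(avg_sum, actual_total)
prediction = predict(slope, intercept, 198.0)  # an upcoming game, made-up feature value
```

Adding another idea (rest days, pace, injuries) means adding another numeric column, at which point a library routine is far more convenient than hand-written formulas.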

Re: How to get started?

Posted: Wed Jan 08, 2014 10:01 pm
by Mike G
What the heck is O/U ?

Re: How to get started?

Posted: Wed Jan 08, 2014 11:05 pm
by Sartre
Mike G wrote:What the heck is O/U ?
Over/under. If the line is at 195, I want to predict whether the line is too high or too low.
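
The comparison described here can be sketched in a few lines of Python. The function name, the `margin` parameter (a safety buffer before taking a side), and the "no bet" outcome are assumptions added for illustration; only the idea of comparing a predicted total against the posted line comes from the post.

```python
def over_under_call(predicted_total, line, margin=0.0):
    """Compare a predicted combined score to the posted line.

    Returns "over" if the prediction exceeds the line by more than margin
    (the line looks too low), "under" if it falls short by more than margin
    (the line looks too high), and "no bet" otherwise.
    """
    diff = predicted_total - line
    if diff > margin:
        return "over"
    if diff < -margin:
        return "under"
    return "no bet"


call = over_under_call(199.5, 195.0)  # hypothetical prediction against a line of 195
```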

Re: How to get started?

Posted: Wed Jan 08, 2014 11:16 pm
by Sartre
J.E. wrote:
Sartre wrote:Scikit-learn is for understanding how it all works, correct?
No. Scikit-learn is a library that provides you with tools/algorithms that help you do predictive modeling, so you don't have to implement all those algorithms yourself.
I want to predict o/u primarily; I just can't get my head around how this all works. In my mind I put together a core model (when I learn that) and then feed that model different filters/variables that I think are important. But how is that done? I mean, these are abstract thoughts I have; how do I convert them into numbers? In the end, I want to see a score that is more real than the line on offer 8-)
You could probably make a new thread for predicting O/U, and then everybody can chime in and give you some ideas on how to tackle such a problem. If you wanted to start out in Python, you could write a small script that downloads a page that lists all scores so far this season (like here http://www.basketball-reference.com/lea ... games.html, probably using urllib) and then try to extract the useful information from the HTML with a parser (you can write a small parser yourself, or try BeautifulSoup).

Obviously, look for Python tutorials.

Stack Overflow and Cross Validated are also very good resources.

That is such a perfect answer, thank you so much! I downloaded Python but couldn't even get it started, lol. I'm hopeless with computers.
I'm looking over tutorials at this moment. But I'm tired and I live in Sweden, so I'd better get some sleep or else I'll confuse things even more...

I can't write a script; I can barely write a nice Google search ;)
That's why I keep asking myself why I'm so intrigued by this...

The more I read, the more questions come up... ;)

I'll be back soon with a ton of questions as I learn more. Thanks for all your help!
/S