There are more than a few ways to calculate park factors. The simplest, and the one that best shows whether a park played as a hitter's or a pitcher's park based on the empirical data (actual results), is to divide runs per game scored by both teams at home by runs per game scored by both teams on the road. ESPN does a great job of providing this data for previous seasons.
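As a rough sketch of that calculation, the snippet below divides combined runs per game at home by combined runs per game on the road. The function name, its parameters, and the season totals are made up for illustration and are not real data.

```python
def park_factor(home_runs_scored, home_runs_allowed, home_games,
                road_runs_scored, road_runs_allowed, road_games):
    """Runs per game by both teams at home divided by the same figure on the road."""
    home_rpg = (home_runs_scored + home_runs_allowed) / home_games
    road_rpg = (road_runs_scored + road_runs_allowed) / road_games
    return home_rpg / road_rpg


# Made-up season totals for one hypothetical team (not real data).
example_pf = park_factor(400, 410, 81, 350, 370, 81)
print(round(example_pf, 3))
```

A value above 1.0 suggests the park played as a hitter's park that season, and a value below 1.0 suggests a pitcher's park.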
One problem with these year-to-year park factors (for runs scored) is that there is a ton of noise (variance) from one season to the next, which makes it very difficult to pin down the true park factor for each park. Some people take the previous two or three seasons and weight the more recent seasons more heavily to come up with a number. This is a safe way of doing it and the one I usually prefer.
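One way to picture the multi-season approach is a simple weighted average. The 3/2/1 weights below are an assumption chosen only for illustration, as are the function name and the sample park factors.

```python
def weighted_park_factor(factors, weights=(3, 2, 1)):
    """Weighted average of single-season park factors.

    factors: park factors ordered most recent season first.
    weights: hypothetical 3/2/1 scheme; heavier weight on recent seasons.
    """
    used = weights[:len(factors)]
    return sum(f * w for f, w in zip(factors, used)) / sum(used)


# Made-up park factors for the last three seasons, most recent first.
blended_pf = weighted_park_factor([1.10, 0.98, 1.04])
print(round(blended_pf, 3))
```

Different weights will move the blended number toward or away from the most recent season, so the choice is a judgment call rather than a fixed rule.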
When it comes to betting on baseball run totals (over/unders), you need a really good idea of what a stadium's true park factor is. From this base park factor you can adjust up or down for weather or wind conditions if you like, but you need a good park factor for each stadium first. The Vegas sportsbooks obviously have their own numbers, and whether or not they do, you can easily reverse engineer the numbers they effectively used over the course of the season for each park. All you need to do is take all of their run totals and adjust each one for juice to come up with an over/under number for each game. Let's say you calculate that number as 7.25 runs for a given game. You do this for every game, use the calculated number (7.25 here) in place of the actual number of runs scored in that game, and calculate each team's park factor from these calculated numbers instead of the actual run totals. In doing so, you get a glimpse into what Vegas used as park factors for each team, which you can then compare with the actual empirical numbers. I calculate these Vegas park factors as the season progresses as a kind of sanity check against the park factors that I use in my day-to-day baseball game simulations.
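The reverse-engineering steps can be sketched roughly as follows. The post does not spell out how the juice adjustment is done, so the method here is an assumption: convert the over and under prices to implied probabilities, strip the vig by normalizing, and nudge the posted total with a hypothetical linear conversion (`runs_per_prob`). All function names and inputs are illustrative.

```python
def implied_prob(american_odds):
    """Convert American odds to an implied probability (vig still included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def no_vig_over_prob(over_odds, under_odds):
    """Probability of the over after normalizing out the bookmaker's margin."""
    p_over = implied_prob(over_odds)
    p_under = implied_prob(under_odds)
    return p_over / (p_over + p_under)


def adjusted_total(total, over_odds, under_odds, runs_per_prob=5.0):
    """Shift the posted total toward the side the book is shading.

    runs_per_prob is a hypothetical conversion (runs per unit of probability),
    assumed here for illustration only.
    """
    return total + (no_vig_over_prob(over_odds, under_odds) - 0.5) * runs_per_prob


def vegas_park_factor(home_totals, road_totals):
    """Average adjusted total in home games divided by the average in road games."""
    home_avg = sum(home_totals) / len(home_totals)
    road_avg = sum(road_totals) / len(road_totals)
    return home_avg / road_avg


# Made-up lines: (posted total, over price, under price).
home_lines = [(8.5, -110, -110), (9.0, -120, 100)]
road_lines = [(7.5, -110, -110), (7.0, 100, -120)]
home_adj = [adjusted_total(*line) for line in home_lines]
road_adj = [adjusted_total(*line) for line in road_lines]
print(round(vegas_park_factor(home_adj, road_adj), 3))
```

With even prices on both sides the posted total is left unchanged. A shaded price moves the total slightly toward the favored side.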
Below is a look at each team's Vegas park factor, its actual 2013 park factor, and the difference between the two, sorted by the parks where Vegas set the run environment too low. Vegas being off on a park factor does not necessarily mean they chose a dumb number for that team, because, as I said above, there is quite a bit of noise involved here. But it obviously would have made for some good betting opportunities.
|Team||Vegas PF||Actual 2013 PF||2013 Delta|
As a further exercise, and just for the fun of it, I computed the error of the Vegas 2013 park factors against the actual park factors for the 2013, 2012 and 2011 seasons.
The sums of the squared errors across the 30 park factors (the quantity that feeds into the RMSE) were:
2013 = 0.1429
2012 = 0.4396
2011 = 0.2423
(these numbers are pre-square root)
You would expect the 2013 number to be the lowest, since 2013 is the season Vegas was predicting. The 2012 park factors had a lot of noise, with a few crazy outliers driving the error total up. The 2011 park factors matched fairly well, about where you would expect.
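For reference, the error measure can be sketched as below. The function names and the three-park sample are made up for illustration. The snippet shows both the pre-square-root sum of squared errors and the RMSE derived from it.

```python
import math


def squared_error_sum(predicted, actual):
    """Sum of squared differences between two equal-length lists of park factors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def rmse(predicted, actual):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(squared_error_sum(predicted, actual) / len(predicted))


# Made-up park factors for three parks (not real data).
vegas_pf = [1.00, 1.10, 0.90]
actual_pf = [1.10, 1.00, 0.90]
print(round(squared_error_sum(vegas_pf, actual_pf), 4))
print(round(rmse(vegas_pf, actual_pf), 4))
```

Assuming the figures posted above are sums over 30 parks, the 2013 value would correspond to an RMSE of roughly sqrt(0.1429 / 30), or about 0.069.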