Friday, March 18, 2011

2011 MLB Schedule In Text Ascii Format


I went looking online for an MLB schedule in plain ASCII text format. I couldn't find one anywhere, so I decided to make my own. After scraping the schedules from mlb.com and writing a program to parse the data, I came up with my own ASCII text (CSV) version of the 2011 MLB schedule.

The format is the following:

Game Number,Year,Month,Day,Away,Home
2431,2011,9,28,LAN,ARI

The game number is just a counter, running from 1 to 2431, over all the scheduled games. It is easy enough to change or ignore if need be, but I use it. I have uploaded this ASCII CSV file as a shared Google document that anyone with the link can view or download.
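
For anyone who wants to read the file programmatically, here is a minimal Python sketch based only on the six-column format shown above. The names Game and parse_schedule are made up for illustration, and the code assumes the file may or may not begin with the header line, so it skips any row whose first field is not a number.

```python
import csv
import io
from datetime import date
from typing import List, NamedTuple


class Game(NamedTuple):
    """One scheduled game (hypothetical record type for this sketch)."""
    number: int
    day: date
    away: str
    home: str


def parse_schedule(text: str) -> List[Game]:
    """Parse rows of the form: Game Number,Year,Month,Day,Away,Home."""
    games = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, short rows, and the header row if present.
        if len(row) < 6 or not row[0].strip().isdigit():
            continue
        number, year, month, day, away, home = (f.strip() for f in row[:6])
        games.append(
            Game(int(number), date(int(year), int(month), int(day)), away, home)
        )
    return games


# The sample row from the format description above.
sample = "Game Number,Year,Month,Day,Away,Home\n2431,2011,9,28,LAN,ARI\n"
games = parse_schedule(sample)
for g in games:
    print(g.number, g.day.isoformat(), g.away, "at", g.home)
```

Turning the year, month, and day columns into a single date makes it easy to sort or filter games by day; if you only need the raw fields, the csv module on its own is enough.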

Note: A second version of the file (3/19) is now loaded, because the first version listed FLA for all Seattle games.

6 comments:

mattmaison said...

Good stuff. Not sure what I could do with it, but pretty cool to have if needed. Thanks for sharing!

Dude Can't Draw said...

AWESOME!

I'm writing some code to do some stat mining (I have too much time on my hands) and this makes my life SO MUCH EASIER!

Xeifrank said...

I have some updates with the rainouts, makeup games, etc. Let me know if you need them or can make the changes yourself.

Dude Can't Draw said...

I can deal with those. I kinda have to do some manual checking anyway because I have to match it up to URL data on mlb.com (e.g., the Dodger game that was suspended and resumed the next day is still listed under the original date).

Dude Can't Draw said...

Incidentally, my whole exercise was because it seemed to me that the Dodgers were making a habit of immediately giving runs back in the next half inning after they'd scored themselves.

The data seems to support that observation. They have the highest rate in that category in the NL: after 35% of the innings in which they score, they give runs back the very next half inning (ignoring any Dodger innings that were the last at-bat of the game). That compares with a noticeably lower rate (28%) following innings in which they didn't score.

The things I keep myself up at night thinking about...
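
For the curious, a count like that could be sketched roughly as follows in Python. This is not the actual code behind the numbers above, just an illustration under assumptions: each game is given as two lists of runs per inning (away and home), and the function name give_back_counts and the sample line score are invented for the example.

```python
from typing import List, Tuple


def give_back_counts(
    away_runs: List[int], home_runs: List[int], team_is_home: bool
) -> Tuple[int, int, int, int]:
    """Return (scoring innings, give-backs after scoring,
    scoreless innings, give-backs after scoreless) for one game.

    A "give-back" means the opponent scored in the very next half inning.
    The team's half inning is ignored if it was the last at-bat of the game.
    """
    # Interleave the line scores into playing order: top 1, bottom 1, top 2, ...
    halves = []
    for i in range(max(len(away_runs), len(home_runs))):
        if i < len(away_runs):
            halves.append(("away", away_runs[i]))
        if i < len(home_runs):
            halves.append(("home", home_runs[i]))

    side = "home" if team_is_home else "away"
    scored = gave_after_scoring = blank = gave_after_blank = 0
    # Pairing each half inning with the one after it drops the final half.
    for (who, runs), (_, next_runs) in zip(halves, halves[1:]):
        if who != side:
            continue
        if runs > 0:
            scored += 1
            gave_after_scoring += next_runs > 0
        else:
            blank += 1
            gave_after_blank += next_runs > 0
    return scored, gave_after_scoring, blank, gave_after_blank


# Invented line score: the home team wins 4-3 and does not bat in the ninth.
away = [0, 2, 0, 0, 1, 0, 0, 0, 0]
home = [1, 3, 0, 0, 0, 0, 0, 0]
counts = give_back_counts(away, home, team_is_home=False)
print(counts)
```

Summing the four counts over a season and dividing give-backs by innings in each group would give the two rates being compared.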

Dave Tufte said...

Very cool. I'm doing a programming exercise, and was looking for a baseball schedule (from any season) in TXT/CSV format.