Sluicing 30 Years of NBA Data

Jan 14, 2015 | Case Studies

The Source: NBA Game Stats Since 1985-1986

We are always on the lookout for complex (nasty!) datasets to sharpen MongoSluice’s teeth. Valeri Karpov (aka The Code Barbarian) created one that suits us well! It’s data from NBA games over 30 years, scraped from the web. You can read about his quest for NBA analytics utopia, and about the MongoDB aggregation framework, on his blog. Here’s a little case study on how to turn a NoSQL mess into tidy tables using MongoSluice.

* You can download the data set here. The data set is property of Sports Reference, LLC, and may only be used for education and evaluation under clause #1 of their terms of use. If you have not read or do not agree to Sports Reference, LLC’s terms of use, please do not download the data set.

The Data: A Ton of Scraped Non-Relational NBA Game Data

Here is a sample document — notice the embedded arrays? Ugh.

  •  [{"players": [
       {"ast": 9, "blk": 1, "drb": 9, "fg": 3, "fg3": 0, "fg3_pct": "", "fg3a": 0, "fg_pct": ".750", "fga": 4, "ft": 8, "ft_pct": ".727", "fta": 11, "mp": "44:00", "orb": 4, "pf": 4, "player": "JeffRuland", "pts": 14, "stl": 1, "tov": 4, "trb": 13},
       {"ast": 1, "blk": 1, "drb": 5, "fg": 7, "fg3": 0, "fg3_pct": "", "fg3a": 0, "fg_pct": ".700", "fga": 10, "ft": 3, "ft_pct": "1.000", "fta": 3, "mp": "34:00", "orb": 1, "pf": 3, "player": "DanRoundfield", "pts": 17, "stl": 0, "tov": 3, "trb": 6},
       {"ast": 7, "blk": 0, "drb": 6, "fg": 8, "fg3": 0, "fg3_pct": ".000", "fg3a": 2, "fg_pct": ".615", "fga": 13, "ft": 1, "ft_pct": ".167", "fta": 6, "mp": "31:00", "orb": 0, "pf": 2, "player": "GusWilliams", "pts": 17, "stl": 2, "tov": 4, "trb": 6},
       {"ast": 3, "blk": 0, "drb": 4, "fg": 6, "fg3": 0, "fg3_pct": ".000", "fg3a": 1, "fg_pct": ".400", "fga": 15, "ft": 2, "ft_pct": "1.000", "fta": 2, "mp": "24:00", "orb": 0, "pf": 2, "player": "JeffMalone", "pts": 14, "stl": 0, "tov": 1, "trb": 4},
       {"ast": 0, "blk": 2, "drb": 3, "fg": 2, "fg3": 0, "fg3_pct": "", "fg3a": 0, "fg_pct": ".400", "fga": 5, "ft": 0, "ft_pct": "", "fta": 0, "mp": "24:00", "orb": 0, "pf": 1, "player": "CharlesJones", "pts": 4, "stl": 0, "tov": 1, "trb": 3}
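Part of what makes this data nasty is that several stats in the sample are stored as strings: percentages such as ".750", empty strings where a stat is missing, and minutes played as "MM:SS". The Python sketch below shows one way such values might be normalized before analysis. The helper names `to_number` and `minutes_played` are hypothetical and are not part of MongoSluice or of the original data set.

```python
def to_number(value):
    """Convert a scraped stat to a float, treating "" (or None) as missing."""
    if value is None or value == "":
        return None
    return float(value)


def minutes_played(mp):
    """Convert an "MM:SS" string to decimal minutes; stray spaces are tolerated."""
    minutes, seconds = (int(part.strip()) for part in mp.split(":"))
    return minutes + seconds / 60


# One embedded player entry, trimmed to a few fields from the sample document.
player = {"player": "JeffRuland", "fg": 3, "fga": 4,
          "fg_pct": ".750", "fg3_pct": "", "mp": "44:00"}

clean = {
    "player": player["player"],
    "fg_pct": to_number(player["fg_pct"]),    # ".750" becomes a float
    "fg3_pct": to_number(player["fg3_pct"]),  # "" becomes None
    "minutes": minutes_played(player["mp"]),  # "44:00" becomes decimal minutes
}
print(clean)
```

Mapping empty strings to None rather than 0 keeps "no attempts" distinct from "attempted and missed everything", which matters once the values land in SQL columns.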

The Transformation: MongoSluice Converts NoSQL to SQL

  • MongoSluice dove into this batch of non-relational data, interrogated every document, and generated a schema in memory. It then streamed all of the data out of MongoDB into SQL, delivering nice and tidy related tables. The whole run took 5 minutes and 10 seconds.
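To make the idea concrete, here is a minimal Python sketch of the same general pattern: scan every document to infer columns, split the embedded `players` array into a child table, and load both tables into SQL. This is not MongoSluice's implementation, which is not shown in this post. The function names, the `game_id` key, the table names, and the crude INTEGER/TEXT typing rule are all assumptions made for illustration, and the sketch uses an in-memory SQLite database and a plain list instead of streaming from MongoDB.

```python
import sqlite3


def infer_columns(rows):
    """Scan every row and map each key to a SQL type (deliberately crude)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            sql_type = "INTEGER" if isinstance(value, int) else "TEXT"
            # If rows disagree on a key's type, fall back to TEXT.
            if columns.get(key, sql_type) != sql_type:
                sql_type = "TEXT"
            columns[key] = sql_type
    return columns


def flatten(documents):
    """Split each game document into one parent row plus one child row per player."""
    games, players = [], []
    for game_id, doc in enumerate(documents, start=1):
        parent = {k: v for k, v in doc.items() if not isinstance(v, list)}
        parent["game_id"] = game_id  # hypothetical surrogate key
        games.append(parent)
        for entry in doc.get("players", []):
            players.append({"game_id": game_id, **entry})
    return games, players


def load(conn, table, rows):
    """Create a table from the inferred columns and insert every row."""
    columns = infer_columns(rows)
    names = list(columns)
    col_defs = ", ".join(f'"{n}" {columns[n]}' for n in names)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    quoted = ", ".join(f'"{n}"' for n in names)
    marks = ", ".join("?" for _ in names)
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({marks})',
        [[row.get(n) for n in names] for row in rows],
    )


# Two player entries trimmed from the sample document above.
documents = [{"players": [
    {"player": "JeffRuland", "pts": 14, "ast": 9, "fg_pct": ".750"},
    {"player": "DanRoundfield", "pts": 17, "ast": 1, "fg_pct": ".700"},
]}]

conn = sqlite3.connect(":memory:")
games, players = flatten(documents)
load(conn, "games", games)
load(conn, "players", players)

rows = conn.execute(
    "SELECT player, pts FROM players WHERE game_id = 1 ORDER BY pts DESC"
).fetchall()
print(rows)
```

Once the embedded array lives in its own table keyed by `game_id`, per-player questions become ordinary SQL queries and joins rather than aggregation pipelines over nested arrays.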

The Results: NoSQL as SQL – Easy To Understand, Ready for Analysis

  • The schema and tables are below.

[Image: nba_mysql_schema]
