MongoDB Clickstream Data ‘Sluiced to SQL

by | Nov 3, 2014 | Case Studies

The Source: Consumer Clickstream Data

Clickstream data from a large consumer, content-based (ad-supported) website was languishing — taking a few days to get the appropriate massaging to make it work in the corporate BI platform. That’s costly — and there’s lost opportunity when you can’t understand what’s happening in real time.

The MongoDB Data

Sample documents:

  • { “_id” : ObjectId(“53baf7e9e4b013efe07d602e”), “_class” : “com.corp.cp.domain.UserTactic”, “active” : true, “assigned” : { “date” : ISODate(“2014-07-07T19:41:29.772Z”), “rule” : { “_id” : 1279, “stageRule” : “allOf{\r\ncondition {\r\narticle{ \r\na.appname=[‘viewarticle’]\r\na.url=’/viewarticle/805620’\r\na.times=’x<2’\r\n}}}\r\nactions{assign tacticId:11825}\r\n”, “liveRule” : “allOf{\r\ncondition {\r\narticle{ \r\na.appname=[‘viewarticle’]\r\na.url=’/viewarticle/805620’\r\na.times=’x<2’\r\n}}}\r\nactions{assign tacticId:11825}\r\n”, “type” : “onLogin”, “ruleFileName” : “without-campaign.rule”, “name” : “”, “packageId” : 0 }, “activityId” : 10190 }, “conditions” : { “tacticid” : 11825, “tacticnotes” : “ilazarev222”, “tacticruleid” : 1279 }, “date” : ISODate(“2014-07-07T19:41:29.772Z”), “emailProcessed” : 2, “jobId” : 5, “tacticId” : 11825, “uid” : “123456789” }
  • { “_id” : ObjectId(“53bc3d96e4b042a82a0aaf9d”), “_class” : “com.corp.cp.domain.UserTactic”, “active” : true, “assigned” : { “date” : ISODate(“2014-07-08T18:51:02.755Z”), “rule” : { “_id” : 1242, “stageRule” : “allOf\r\n{condition{p.professionId==10}\r\ncondition{p.specialtyId==39}\r\n}\r\nactions{assign tacticId:11703}”, “liveRule” : “allOf\r\n{condition{p.professionId==10}\r\ncondition{p.specialtyId==39}\r\n}\r\nactions{assign tacticId:11703}”, “type” : “onLogin”, “ruleFileName” : “Rulefile2.txt”, “name” : “11703_1”, “packageId” : 514 }, “activityId” : 11702 }, “conditions” : { “tacticid” : 11703, “tacticnotes” : “Login tactic1_Qa02”, “tacticruleid” : 1242 }, “date” : ISODate(“2014-07-08T18:51:02.751Z”), “emailProcessed” : 2, “jobId” : 5, “tacticId” : 11703, “uid” }
  • { “_id” : ObjectId(“53bc3d97e4b042a82a0aaf9f”), “_class” : “com.corp.cp.domain.UserTactic”, “active” : true, “assigned” : { “date” : ISODate(“2014-07-08T18:51:03.381Z”), “activityId” : 11667 }, “conditions” : { “tacticid” : 11763, “tacticnotes” : “test fix2”, “tacticruleid” : 1273 }, “date” : ISODate(“2014-07-08T18:51:03.381Z”), “emailProcessed” : 2, “jobId” : 1, “tacticId” : 11673, “uid” : “123456789” }

MongoSluice

  • MongoSluicing something involves two basic processes:  1) the interrogation of the data by which the schema is generated and 2) the streaming of all the data into SQL (MongoDB, Java Runtime and SQL). This MongoDB was chosen for complexity not volume — but if you’re curious it just took a few seconds to run.

The Results: Tidy Set of SQL Tables Ready for Analysis

  • MongoSluice created 7 interrelated tables — a complex structure that was latent inside MongoDB. With this data now available to the company’s BI team, they can readily crunch and analyze using the tools and skills that they have.

MongoSluice-table-diagram-schema-2