Benchmarking MongoSluice: Streaming Yelp’s Business Data to MySQL

Dec 17, 2018 | Case Studies

The Goal

MongoSluice’s ultimate goal is to represent NoSQL data from MongoDB accurately in SQL form so that it is simple to analyze.  To build a complete representation of the data, every document in a collection has to be checked; only then can we be certain that every field has been captured.  Deciding which of that data is relevant is left to developers and data analysts.
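To illustrate why every document has to be inspected, here is a minimal Python sketch of full-collection field discovery. It is not MongoSluice's actual implementation: the `infer_fields` helper, the dotted-path naming, and the sample documents are all assumptions made up for this example, and plain dictionaries stand in for documents read from MongoDB.

```python
from collections import defaultdict


def infer_fields(documents):
    """Map each dotted field path to the set of value type names seen for it.

    Hypothetical helper for illustration only; not part of MongoSluice.
    """
    fields = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into sub-documents, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            # Leaf value: record the type observed at this path.
            fields[path].add(type(value).__name__)

    for doc in documents:  # every document is visited, none are sampled
        walk(doc, "")
    return dict(fields)


# Made-up sample documents (assumed shape, not taken from the Yelp file).
docs = [
    {"name": "Cafe A", "stars": 4.5},
    {"name": "Cafe B", "attributes": {"WiFi": "free"}},
    {"name": "Cafe C", "stars": 3, "is_open": 1},
]

schema = infer_fields(docs)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Looking only at the first document would miss `attributes.WiFi` and `is_open`, and would not reveal that `stars` appears as both a float and an integer. That is the kind of gap a full scan is meant to close, at the cost of extra processing time.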

Because accuracy comes first, speed is not the top priority.  Even so, given the complexity of the task, MongoSluice performs reasonably well, as the timings below show.

The Data

To test MongoSluice’s speed, we used yelp_dataset_business.json, a 140 MB JSON dataset provided by Yelp, which amounted to 188,593 documents in MongoDB.

The Hardware and Software

Here are the specs of our hardware, with each component running on a separate DigitalOcean Droplet:

  • The MongoDB Droplet: Ubuntu 4.0.2; 16 GB Memory; 6 vCPUs; and 320 GB of disk space
  • The MySQL Droplet: Ubuntu 4.0.2; 4 GB Memory; 2 vCPUs; and 80 GB of disk space
  • The MongoSluice Droplet: Ubuntu 4.0.2; 4 GB Memory; 2 vCPUs; and 80 GB of disk space

Processing Time

Here is the time that MongoSluice took to process the data:

  • Total Time: 39 minutes
    • Generating schema: 17 minutes
    • Streaming data: 22 minutes
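For a rough sense of throughput, the reported timings can be combined with the dataset figures given earlier (188,593 documents, 140 MB). The short Python sketch below does that arithmetic. It assumes the times are wall-clock durations rounded to whole minutes, so the results are approximate; the variable and function names are chosen for this example only.

```python
# Figures as reported in this case study.
DOCUMENTS = 188_593
DATASET_MB = 140
SCHEMA_MINUTES = 17
STREAM_MINUTES = 22


def per_second(amount, minutes):
    """Average rate per second over a duration given in minutes."""
    return amount / (minutes * 60)


total_minutes = SCHEMA_MINUTES + STREAM_MINUTES

schema_docs_per_s = per_second(DOCUMENTS, SCHEMA_MINUTES)
stream_docs_per_s = per_second(DOCUMENTS, STREAM_MINUTES)
overall_docs_per_s = per_second(DOCUMENTS, total_minutes)
overall_mb_per_s = per_second(DATASET_MB, total_minutes)

print(f"Total time:        {total_minutes} min")
print(f"Schema generation: {schema_docs_per_s:.1f} docs/s")
print(f"Streaming:         {stream_docs_per_s:.1f} docs/s")
print(f"End to end:        {overall_docs_per_s:.1f} docs/s, {overall_mb_per_s:.3f} MB/s")
```

Under those assumptions, the streaming phase averages roughly 143 documents per second, and the whole run averages about 81 documents per second.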

End Result

Here is a look at the schema in MySQL Workbench:

[Image: the generated schema in MySQL Workbench]

About MongoSluice

MongoSluice is the most complete solution for leveraging your MongoDB data in BI applications and other relational database systems.
