Speeding Up Streaming from MongoDB to RDBMS Through MongoSluice's "Commit Size"

Dec 21, 2018 | Case Studies

The Goal: Have Options to Improve Speed

MongoSluice is great at accurately sluicing through complex data, but it is important to have a tool that is also built for speed. MongoSluice gets the best of both worlds with its optional Commit Size feature, which lets you control how much data is held in memory before each push to SQL. It can be found in the docs here. Below is a test we ran to showcase the performance increase from using Commit Size.
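MongoSluice's internals aren't published in this post, but a commit size is the standard batched-loading idea: buffer up to N rows, insert them in one round trip, commit, and repeat. Here is a minimal Python sketch of that pattern, using pymongo and mysql-connector-python; the hosts, credentials, table, and column names are placeholders for illustration, not MongoSluice's actual schema:

    from pymongo import MongoClient
    import mysql.connector

    # Placeholder connections; substitute your own hosts and credentials.
    mongo = MongoClient("mongodb://localhost:27017")
    collection = mongo["yelp"]["business"]
    conn = mysql.connector.connect(host="localhost", user="etl",
                                   password="secret", database="yelp")
    cursor = conn.cursor()

    def stream_to_mysql(commit_size=1000):
        """Buffer up to commit_size rows, then insert and commit them as one batch."""
        sql = "INSERT INTO business (business_id, name, stars) VALUES (%s, %s, %s)"
        batch = []
        for doc in collection.find({}, {"business_id": 1, "name": 1, "stars": 1}):
            batch.append((doc.get("business_id"), doc.get("name"), doc.get("stars")))
            if len(batch) >= commit_size:
                cursor.executemany(sql, batch)
                conn.commit()  # one commit per batch, not per document
                batch.clear()
        if batch:  # flush the final partial batch
            cursor.executemany(sql, batch)
            conn.commit()

A larger commit size means fewer commits and fewer network round trips, at the cost of holding more rows in memory before each flush.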

The Hardware and Software

Here are the specs of our hardware, running as separate DigitalOcean Droplets:

  • MongoDB Droplet: Ubuntu 4.0.2; 16 GB Memory; 6 vCPUs; and 320 GB of disk space
  • MySQL Droplet: Ubuntu 4.0.2; 4 GB Memory; 2 vCPUs; and 80 GB of disk space
  • MongoSluice Droplet: Ubuntu 4.0.2; 4 GB Memory; 2 vCPUs; and 80 GB of disk space

The Test

Recently, we ran the Yelp Business dataset through MongoSluice and blogged about it here. We documented that this 140 MB JSON dataset took 22 minutes to stream to MySQL. That is a solid baseline, but we thought we could improve on it by raising the commit size from its default of 1,000 to 10,000.
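The post doesn't show the exact parameter syntax, but in terms of the sketch above the comparison comes down to timing the two settings (a hypothetical harness; the target table is truncated between runs so each starts clean):

    import time

    for size in (1_000, 10_000):
        cursor.execute("TRUNCATE TABLE business")  # start each run from an empty table
        conn.commit()
        start = time.perf_counter()
        stream_to_mysql(commit_size=size)
        print(f"commit_size={size}: {time.perf_counter() - start:.0f}s")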

The Result

We found that, with the added parameter, the overall streaming time for this dataset dropped to 17 minutes, a 23% reduction ((22 - 17) / 22 ≈ 0.23) from the previous test!

About MongoSluice

MongoSluice is the most complete solution for leveraging your MongoDB data in BI applications and other RDBMS systems.

Guarantee

We guarantee satisfaction.
Zero hassles.