Speeding Up Streaming from MongoDB To RDBMS Through MongoSluice’s “Commit Size”
The Goal: Have Options To Improve Speed
MongoSluice is great at accurately sluicing through complex data, but a tool like this also needs to be built for speed. MongoSluice offers the best of both worlds with its optional "Commit Size" feature, which lets you control how much data is held in memory before it is pushed to SQL. It can be found in the docs here. Here is a test we ran to showcase the performance gain from using Commit Size.
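The idea behind a commit size is simple: instead of committing to the RDBMS once per row, the streamer buffers a batch of flattened documents and commits once per batch, trading a little memory for far fewer round trips. The sketch below illustrates that technique in generic Python with sqlite3; it is not MongoSluice's actual code or CLI, and the table and field names are hypothetical.

```python
import sqlite3

def stream_with_commit_size(docs, conn, commit_size=1000):
    """Buffer flattened documents and push them to SQL in batches,
    committing once per batch instead of once per row.
    (Illustrative sketch only; not MongoSluice's implementation.)"""
    cur = conn.cursor()
    batch = []
    for doc in docs:
        # Flatten the document into a row tuple (hypothetical fields).
        batch.append((doc["id"], doc["name"]))
        if len(batch) >= commit_size:
            cur.executemany("INSERT INTO business (id, name) VALUES (?, ?)", batch)
            conn.commit()  # one commit per batch, not per row
            batch.clear()
    if batch:  # flush the final partial batch
        cur.executemany("INSERT INTO business (id, name) VALUES (?, ?)", batch)
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (id INTEGER, name TEXT)")
docs = ({"id": i, "name": f"biz-{i}"} for i in range(25_000))
stream_with_commit_size(docs, conn, commit_size=10_000)
print(conn.execute("SELECT COUNT(*) FROM business").fetchone()[0])
```

Raising the batch size reduces commit overhead, which is the same trade-off the Commit Size parameter exposes: more rows held in memory per flush, fewer transactions against the target database.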
The Hardware and Software
Here are the specs of our hardware, running as separate Digital Ocean Droplets:
- MongoDB Droplet: Ubuntu 4.0.2; 16 GB Memory; 6 vCPUs; and 320 GB of disk space
- MySQL Droplet: Ubuntu 4.0.2; 4 GB Memory; 2 vCPUs; and 80 GB of disk space
- MongoSluice Droplet: Ubuntu 4.0.2; 4 GB Memory; 2 vCPUs; and 80 GB of disk space
Recently, we ran the Yelp Business dataset and blogged about it here. We documented that this 140 MB JSON dataset took 22 minutes to stream to MySQL. That is solid, but we thought we could do better by increasing the default commit size from 1,000 to 10,000.
With the added parameter, the overall streaming time for the same dataset dropped to 17 minutes, a 23% improvement over the previous test!