Benchmarking MongoSluice: Streaming Yelp’s User Data to MySQL
MongoSluice's key feature is that it accurately converts data from MongoDB (BSON) into relational tables, with rows and columns, without any manual work. To generate an accurate representation of the data, every document in a collection has to be checked, and MongoSluice does exactly that. Once the data is moved over, what comes next is up to you!
To benchmark MongoSluice, we used yelp_dataset_users.json, a 2 GB JSON dataset provided by Yelp that contains 1,518,169 documents.
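To show why checking every document matters, here is a rough Python sketch. It records every type seen for each top-level field across all documents and then picks one MySQL column type per field. This is not MongoSluice's implementation. The function names `infer_schema` and `to_columns`, the type mapping, and the fallback to `TEXT` for mixed types are hypothetical choices made for illustration. Nested documents and arrays are not handled.

```python
from collections import defaultdict

# Hypothetical mapping from decoded JSON/BSON value types to MySQL column types.
SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "TEXT"}


def infer_schema(documents):
    """Scan every document and record each type seen for each top-level field."""
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value))
    return dict(seen)


def to_columns(seen):
    """Pick one MySQL type per field; mixed or unknown types fall back to TEXT."""
    columns = {}
    for field, types in seen.items():
        types = types - {type(None)}  # nulls do not constrain the column type
        if len(types) == 1 and next(iter(types)) in SQL_TYPES:
            columns[field] = SQL_TYPES[next(iter(types))]
        elif types == {int, float}:
            columns[field] = "DOUBLE"  # integers widen safely to floating point
        else:
            columns[field] = "TEXT"  # conflicting, all-null, or unmapped types
    return columns


# Made-up sample documents, loosely shaped like Yelp user records.
docs = [
    {"user_id": "a1", "review_count": 12, "average_stars": 4.5},
    {"user_id": "b2", "review_count": 3, "average_stars": 4, "elite": None},
    {"user_id": "c3", "review_count": "7"},
]
print(to_columns(infer_schema(docs)))
```

Only the third made-up document stores `review_count` as a string. A schema built from just the first two documents would therefore choose an integer column and fail on the third row, which is why the full scan is needed.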
The Hardware and Software
Here are the specs of our hardware, with each machine running as a separate DigitalOcean Droplet:
- The MongoDB Droplet: Ubuntu 4.0.2; 16 GB memory; 6 vCPUs; and 320 GB of disk space
- The MySQL Droplet: Ubuntu 4.0.2; 4 GB memory; 2 vCPUs; and 80 GB of disk space
- The MongoSluice Droplet: Ubuntu 4.0.2; 4 GB memory; 2 vCPUs; and 80 GB of disk space
Here is how long MongoSluice took to process the dataset, moving it from MongoDB to MySQL:
- Total time: 158 minutes
- Generating schema: 85 minutes
- Streaming data from MongoDB to MySQL: 73 minutes
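The two phase times add up to the total (85 + 73 = 158 minutes). Dividing the document count by each duration gives an approximate throughput for each phase. The short Python sketch below does that arithmetic. The rates are derived from the figures above rather than measured separately, and they assume each phase covers all 1,518,169 documents. The helper name `docs_per_second` is hypothetical.

```python
DOCUMENTS = 1_518_169  # documents in yelp_dataset_users.json
PHASES_MIN = {"schema": 85, "streaming": 73}  # reported phase durations in minutes


def docs_per_second(docs, minutes):
    """Average throughput over a phase, in documents per second."""
    return docs / (minutes * 60)


total_min = sum(PHASES_MIN.values())  # 158 minutes, matching the reported total
for phase, minutes in {**PHASES_MIN, "total": total_min}.items():
    print(f"{phase:>9}: {docs_per_second(DOCUMENTS, minutes):.1f} docs/s")
# schema: 297.7 docs/s, streaming: 346.6 docs/s, total: 160.1 docs/s
```

By the same arithmetic, schema generation accounts for roughly 54% of the total run (85 of 158 minutes).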
Here is a look at the resulting schema in MySQL Workbench: