NoSQL databases are no longer an unfamiliar paradigm, since it has been a decade or more since they entered the market. Some of the most popular NoSQL databases include MongoDB, CouchDB, Voldemort and Cassandra. Their use in industry, driven mainly by their elastic nature and simplified application development, is spreading rapidly. Everyone tries to use them in their applications, most probably because of their open source nature. But the question arises: “Is every NoSQL database fit for every kind of purpose?” Certainly not, partly because these databases have different features and schema definitions, and partly because there is no single proper way to compare their performance. “So how can someone work out which NoSQL database suits their workload?” This is a question worth pondering.
Some NoSQL databases have their own benchmark tools, such as the ones proposed by MarkLogic and Riak. One such benchmarking tool is the open source YCSB (Yahoo! Cloud Serving Benchmark), which was initially designed for testing the performance of Yahoo’s PNUTS (a parallel and geographically distributed database system). The main reason for developing YCSB, as stated by Brian F. Cooper, the mastermind behind YCSB, is: “The purpose of using the Yahoo! Cloud Serving Benchmark (YCSB) is to develop a framework and common set of workloads for evaluating the performance of different databases”. YCSB is mainly designed for evaluating the performance and scalability of NoSQL and “cloud based” data stores, and these are the two main tiers it uses for evaluating NoSQL databases. The performance tier (tier 1) measures the throughput and latency of a database as the load on the servers increases, while the scalability tier (tier 2) measures how performance changes as the number of servers increases. The configuration of YCSB is quite simple and consists of only two main parts:
- The workload generating client – used for generating load and deciding which operation to perform, which record to insert or delete, etc.
- Workload packages – standard and custom packages that define the read/write mix, the request distribution, the record size, etc.
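To make the split between the two parts concrete, here is a minimal Python sketch of a workload definition and a client loop that draws operations from it. It is not YCSB's own code: the `WORKLOAD` dictionary, its field names and the helper functions are hypothetical, and a uniform request distribution is assumed purely for simplicity.

```python
import random

# Hypothetical workload definition. The field names are illustrative
# and are not claimed to match YCSB's own property names.
WORKLOAD = {
    "record_count": 1000,       # number of records in the data set
    "operation_count": 10000,   # number of operations to issue
    "read_proportion": 0.95,    # share of operations that are reads
    "update_proportion": 0.05,  # share of operations that are updates
}


def choose_operation(rng, workload):
    """Pick 'read' or 'update' according to the configured mix."""
    if rng.random() < workload["read_proportion"]:
        return "read"
    return "update"


def choose_key(rng, workload):
    """Pick a record key uniformly at random (assumed distribution)."""
    return "user%d" % rng.randrange(workload["record_count"])


def generate_operations(workload, seed=42):
    """Yield (operation, key) pairs for the whole run."""
    rng = random.Random(seed)
    for _ in range(workload["operation_count"]):
        yield choose_operation(rng, workload), choose_key(rng, workload)


if __name__ == "__main__":
    # Show the first few generated operations.
    for i, (op, key) in enumerate(generate_operations(WORKLOAD)):
        if i >= 5:
            break
        print(op, key)
```

Changing only the dictionary (for example, a 50/50 read/update mix) changes the generated load without touching the client logic, which is the point of keeping the workload definition separate from the client.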
Once you are done with this configuration and some minor tweaking (in case the benchmark does not support your database), you can easily generate reports that compare the performance and scalability of various NoSQL databases. A few of the NoSQL database comparisons benchmarked by YCSB can be found in  and . Furthermore, the YCSB authors also plan to release two more evaluation tiers, for replication and availability, which should further increase the usefulness of this tool.
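The numbers behind such a report, throughput and latency, come from timing each operation. The Python sketch below illustrates that idea only and is not how YCSB itself is implemented. `InMemoryStore` is a hypothetical stand-in for a real database client, and `run_benchmark` is an illustrative helper.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a database client."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value):
        self._data[key] = value


def run_benchmark(store, operations):
    """Run (operation, key) pairs; return throughput and mean latency."""
    latencies = []
    start = time.perf_counter()
    for op, key in operations:
        t0 = time.perf_counter()
        if op == "read":
            store.read(key)
        else:
            store.update(key, "value")
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    count = len(latencies)
    return {
        "operations": count,
        "throughput_ops_per_sec": count / elapsed if elapsed > 0 else 0.0,
        "mean_latency_sec": sum(latencies) / count if count else 0.0,
    }


if __name__ == "__main__":
    ops = [("update", "user1"), ("read", "user1"), ("read", "user2")] * 1000
    print(run_benchmark(InMemoryStore(), ops))
```

Running the same operation list against clients for different databases, or against the same database with more servers, gives figures that can be compared side by side.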
Here are some links which might interest you:
- Setting Up YCSB for performance evaluation of Cassandra
- Running a Workload in YCSB (The official documentation)
- Various NoSQL benchmarks
This article was written by Miss Anam Zahid.