The Importance of Key Value Stores

Key value stores (KVSs) are perhaps the most common form of NoSQL and NewSQL databases.  They consist of (surprise!) keys and values and are built from the ground up to store and retrieve these values as fast as possible.  For this reason, a KVS is considered an excellent way to store and retrieve information for high-traffic web sites and other high-performance content, but not the greatest for transaction-driven projects.  According to DB-Engines, key value stores are among the more popular non-RDBMS databases in use.

Structurally, KVSs are the most straightforward of the NoSQL databases, and this simplicity largely accounts for why they are so mind-bogglingly fast.  The beauty of a KVS is its simplicity.  Instead of worrying about complex schema and data relationships (as with a traditional RDBMS), a KVS just has to store and retrieve values linked to a key.  The most commonly implemented KVSs include Redis, Riak, and Voldemort.  NEC’s IERS is built on top of a KVS with many added enhancements.

It’s easier to understand a KVS if you first look at a traditional RDBMS.  Think of this as a structured and table-based database.  For example, if you’re working with employee data, you’d have a table with a column for each field you wanted to track and a row for each employee.  It would look something like this:

ID   First Name   Last Name
1    Homer        Simpson
2    Marge        Bouvier
3    Herschel     Krustofsky

The table approach works well if you have a reasonable number of people to track, say a few dozen to a few thousand.  It also works well if you can run your queries offline, where speed isn’t an issue, and can do your batch processing for reporting during off-hours, because those reports will take a considerable amount of time.

However, in the big data world we don’t have the luxury of running queries and reports during off-hours.  Whatever it is, in the big data world we need it now.  Not only that, the traditional table shown above may become a big management mess when it’s too big to fit on a single server.

Taking the example to a KVS, imagine that you’ve got users instead of employees.  Now you’re talking about millions of records instead of thousands, and they need to be available quickly from around the world 24/7.  When users log in, they want instant access to their accounts.  Plus, not every user record has the same information as every other record; some users may provide their phone numbers, some may not.  Each record potentially has a different length and different values.

To store and retrieve this kind of data quickly, you generate a key for each record and then store whatever fields (what would have been columns in the table above) are available.  Each field consists of a data name and the data itself.  If you don’t have a particular piece of data, instead of leaving an empty cell in a table you simply don’t store the data name / data combination.

Let’s take a look:

Key: 1
    ID: HS
    First Name: Homer

Key: 2
    Email: mbouvier@gmail.com
    City: Springfield
    Age: 34

Key: 3
    Twitter ID: @hkrustofsky
    First Name: Herschel
    Occupation: Clown
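
To make the idea concrete, here is a minimal Python sketch of the three records above.  It uses an ordinary in-memory dictionary as a stand-in for a real KVS, and the put and get helpers (and the extra Phone field) are hypothetical names made up for this illustration, not the API of Redis, Riak, Voldemort, or IERS.

```python
# A plain dict stands in for the key value store (an assumption for
# illustration; a real KVS would live on one or more servers).
store = {}

def put(key, fields):
    """Store a record under its key, skipping any field with no value."""
    store[key] = {name: value for name, value in fields.items()
                  if value is not None}

def get(key):
    """Return the record stored under a key, or None if the key is unknown."""
    return store.get(key)

# Each record carries only the fields that are actually available.
put("1", {"ID": "HS", "First Name": "Homer"})
put("2", {"Email": "mbouvier@gmail.com", "City": "Springfield", "Age": 34})
put("3", {"Twitter ID": "@hkrustofsky", "First Name": "Herschel",
          "Occupation": "Clown", "Phone": None})  # no phone, so nothing stored

print(get("2"))
print(sorted(get("3")))  # field names present in record 3
```

Notice that the missing phone number never becomes an empty cell; the name / data pair is simply not stored.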

As you can see, users can log in using ID, email, or Twitter ID, depending on what each record holds.  This would have been awkward in a traditional table-style RDBMS, where every row shares the same columns and missing values leave empty cells.  Also, queries need to be built around keys because there are no fixed field (or column) names shared by every record.  There’s no need to pull data from multiple tables, reformat it and import it into another table just so users with different information stored can log in.
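
Because lookups go through keys, one common way to let users log in with different identifiers is to keep an extra set of key value pairs that map each identifier to the record's key.  The Python sketch below illustrates that pattern under stated assumptions: the records dictionary, LOGIN_FIELDS, build_login_index, and find_user are all hypothetical names for this example, and this is not necessarily how any particular KVS or IERS handles it.

```python
# The three sample records, keyed as in the listing above.
records = {
    "1": {"ID": "HS", "First Name": "Homer"},
    "2": {"Email": "mbouvier@gmail.com", "City": "Springfield", "Age": 34},
    "3": {"Twitter ID": "@hkrustofsky", "First Name": "Herschel",
          "Occupation": "Clown"},
}

# Fields that may be used to log in (an assumption for this sketch).
LOGIN_FIELDS = ("ID", "Email", "Twitter ID")

def build_login_index(records):
    """Map 'field:value' strings to record keys for every login field present."""
    index = {}
    for key, fields in records.items():
        for name in LOGIN_FIELDS:
            if name in fields:
                index[f"{name}:{fields[name]}"] = key
    return index

login_index = build_login_index(records)

def find_user(field_name, value):
    """Resolve a login identifier to its record with two key lookups."""
    key = login_index.get(f"{field_name}:{value}")
    return records.get(key) if key is not None else None

print(find_user("Email", "mbouvier@gmail.com"))
print(find_user("Twitter ID", "@hkrustofsky"))
```

Every query is still a lookup by key; the index is just more keys and values.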

NEC’s IERS takes advantage of the straightforward nature of a KVS.  I blogged about this a few weeks ago when I posted coverage of my interview with Atsushi Kitazawa, the “father” of IERS.  Because a KVS simply stores multiple values associated with a unique key, distributed KVS performance is predictable.  A KVS is usually partitioned to run on multiple nodes.  Because each key is unique, all values associated with a key, regardless of where the values are physically located, are equally accessible.
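
As a rough illustration of partitioning, the Python sketch below hashes each key to choose one of three nodes, so any record can be found with the same amount of work no matter where it lives.  The node names and the put, get, and node_for helpers are hypothetical, and simple modulo hashing is an assumption chosen for brevity; it is not a description of how IERS places data, and production systems often use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical node names; each node is modeled as its own dict.
NODES = ["node-a", "node-b", "node-c"]
cluster = {name: {} for name in NODES}

def node_for(key):
    """Pick a node by hashing the key (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

def put(key, fields):
    """Store the record on whichever node owns the key."""
    cluster[node_for(key)][key] = fields

def get(key):
    """Hash the key again to find the owning node, then read from it."""
    return cluster[node_for(key)].get(key)

put("1", {"ID": "HS", "First Name": "Homer"})
put("2", {"Email": "mbouvier@gmail.com", "City": "Springfield", "Age": 34})
put("3", {"Twitter ID": "@hkrustofsky", "First Name": "Herschel",
          "Occupation": "Clown"})

for key in ("1", "2", "3"):
    print(key, "lives on", node_for(key))
```

The same key always hashes to the same node, which is why a unique key is enough to locate a record.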

So there you have it, an explanation of KVSs and how they work.  While a KVS forms the foundation of NEC’s IERS, there are plenty of enhancements that take IERS above and beyond what the average KVS is capable of.  In particular, IERS provides a high-performance and consistent environment with transparent scaling for transactions.  My next posting will discuss these advantages and how to make the best use of them when developing for IERS.

To learn more about NEC’s IERS solution, visit:  http://goo.gl/TnFkbR

Matt Sarrel

*Matt Sarrel is a leading tech analyst and writer providing guest content for NEC.