IERS Combines the Best of NewSQL and Key Value Stores

In my last blog post, I provided a general introduction to key value stores (KVS). In this post, I’m going to explain how InfoFrame Elastic Relational Store (IERS) takes the basic concepts of the KVS and improves upon them to build a database with strong business-oriented features.

The main improvement is that IERS is built to process high-speed transactions with full ACID capabilities. As a quick refresher, ACID stands for Atomicity, Consistency, Isolation, and Durability, the set of properties that makes the processing of database transactions reliable. Atomicity preserves transaction integrity by committing a transaction only as a whole: either all of its changes are applied or none of them are. Consistency ensures that well-defined rules (constraints) control and validate the data, so a transaction can only move the database from one valid state to another. Isolation ensures that transactions executing concurrently produce the same result as if they had run one after another, so no transaction sees another's partial work; this is where conflict resolution takes place. Durability means that once a transaction has been committed, it will remain intact even in the event of power loss, crashes, or other system errors. IERS offers full ACID support and thus meets the requirements of a business environment that processes transactions, requirements that many KVS fail to meet.
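IERS's own client interface is not described in this post, so the sketch below uses Python's built-in sqlite3 module purely as a stand-in for any ACID-compliant SQL database, to show what atomicity means in practice. The accounts table, the `transfer` helper, and the simulated failure are hypothetical. A transfer that fails halfway through is rolled back, so the debit never persists without the matching credit.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction.

    Using the connection as a context manager commits on success and rolls
    back on any exception, so either both UPDATEs persist or neither does.
    """
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        # Hypothetical failure injected between the two halves of the transaction.
        if amount > 100:
            raise RuntimeError("simulated failure mid-transaction")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 0)])

transfer(conn, "alice", "bob", 50)        # succeeds: debit and credit both committed
try:
    transfer(conn, "alice", "bob", 200)   # fails after the debit: whole transaction rolled back
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(balances)  # {'alice': 450, 'bob': 50}
```

Because the rollback happens inside the database, the application needs no cleanup code of its own. With a store that lacks atomic multi-key writes, the application would have to detect and undo the half-finished debit itself.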

Most KVS cannot enforce the constraints that developers need to place on data in order to preserve consistency. Consistency then has to be handled by the application, which pushes this critical function away from the database engine and makes the application more cumbersome to design, because it must now include features the database should handle. IERS, on the other hand, provides consistency at the database layer while preserving the high performance of a KVS.
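To make the contrast concrete, here is a minimal sketch that again uses Python's sqlite3 as a generic stand-in for a SQL database rather than IERS itself, with a plain dict standing in for a KVS. The table, the age rule, and the helper names are hypothetical. The CHECK constraint lives in the schema, so the database rejects invalid rows from every client, whereas the dict accepts anything and relies on each application to repeat the check.

```python
import sqlite3

# Database-layer consistency: the rule is declared once in the schema,
# and the engine enforces it for every client that writes to the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150))"
)

def db_insert_customer(conn, cid, name, age):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (cid, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

# Application-layer consistency: a plain key value store accepts any value,
# so every application that writes to it must remember to validate first.
kvs = {}

def app_put_customer(store, cid, name, age):
    """Hypothetical helper that one well-behaved application might use."""
    if not 0 <= age <= 150:
        return False
    store[cid] = {"name": name, "age": age}
    return True

accepted = db_insert_customer(conn, 1, "Ada", 36)   # True
rejected = db_insert_customer(conn, 2, "Bob", -5)   # False: CHECK constraint failed
kvs[3] = {"name": "Eve", "age": -5}                 # a careless client skips the helper
```

The invalid record reaches the dict because nothing below the application enforces the rule; in the SQL table it cannot.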

Whereas most KVS require database-specific code to be written, IERS uses an industry-standard SQL interface. Most KVS platforms don’t support SQL, which is why the term NoSQL is used to describe them. Over time, the term NoSQL has expanded to cover next-generation databases with a SQL interface by being reinterpreted as “Not Only SQL.” Some industry analysts refer to next-generation databases with a SQL interface as NewSQL. Whatever the label, SQL is the most commonly used database programming language, and many business environments already have a significant investment in it. For these reasons, SQL support is one of the most important and most used features of IERS. Using another KVS might require developers to learn a custom API, delaying the development process. IERS, with its SQL support, allows developers to get up and running as quickly as possible.

Database security is also a requirement in a business environment. IERS provides the same user authentication and table-level access management as an RDBMS, whereas the typical KVS pushes these functions up to the application layer. IERS also offers full support for user activity logging and can be integrated with a solution such as IBM Guardium to provide complete audit trails.

IERS also fully supports range queries, a common database operation that retrieves all records where some value falls between a lower and an upper boundary: for example, list all customers between the ages of 8 and 18. A typical KVS cannot support such a range query, because it only supports queries on the key.
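The difference can be sketched with the age example above. In this hypothetical setup, a dict plays the role of a KVS whose only efficient operation is a lookup by customer ID, so the age range has to be answered by scanning every value; sqlite3 (a stand-in, not IERS) answers the same question declaratively, with an index on age that lets the engine start at the lower bound. The customer data is made up for illustration.

```python
import sqlite3

customers = [(1, "Ann", 7), (2, "Ben", 12), (3, "Cal", 18), (4, "Dee", 25), (5, "Eli", 8)]

# Key value style: values are reachable only by key (customer id), so a
# question about age means fetching and inspecting every value (a full scan).
kvs = {cid: {"name": name, "age": age} for cid, name, age in customers}

def kvs_age_range(store, low, high):
    """Full scan over all values; cost grows with the total number of customers."""
    return sorted(v["name"] for v in store.values() if low <= v["age"] <= high)

# SQL style: the range is stated declaratively, and an index on age lets
# the engine seek to the lower bound instead of reading every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_customers_age ON customers (age)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

sql_result = [name for (name,) in conn.execute(
    "SELECT name FROM customers WHERE age BETWEEN ? AND ? ORDER BY name", (8, 18))]

print(kvs_age_range(kvs, 8, 18))  # ['Ben', 'Cal', 'Eli']
print(sql_result)                 # ['Ben', 'Cal', 'Eli']
```

Both return the same rows here (BETWEEN is inclusive, so ages 8 and 18 qualify). The difference is that the scan's cost grows with the total number of customers, while the indexed query's cost grows mainly with the number of matching rows.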

As you can see, IERS contains many enhancements that are typically not found in a KVS. When the database layer lacks such functionality, it must be implemented in the application layer, which then has to manage transactions, security, data constraints, and consistency. It’s much easier to simply use a database that contains this functionality, such as IERS. By including the functionality described in this post, IERS demonstrates that it is better suited to solving business problems than a typical key value store. The most important of these enhancements is full support for ACID transactions; without ACID there can be no reliable transactions. Businesses evaluating NoSQL and NewSQL key value stores for a high-speed, transaction-driven environment will find that IERS more than meets their needs.

To learn more about NEC’s IERS solution, visit: http://goo.gl/TnFkbR

Matt Sarrel

*Matt Sarrel is a leading tech analyst and writer providing guest content for NEC.

An Interview with Atsushi Kitazawa of NEC Japan, the “Father” of IERS

Everything you wanted to know about IERS, from its position in the world of next-generation databases to its design goals, architecture, and prominent use cases.

I recently had the chance to talk to Atsushi Kitazawa, chief engineer at NEC Corporation, about the company’s new InfoFrame Elastic Relational Store (IERS) database. I enjoyed the discussion with Kitazawa-san immensely; he has an ability to flow seamlessly from a deep technical point to a higher-level business point, which made our talk especially informative.

Matt Sarrel (MS): Where did the idea for IERS come from?

Atsushi Kitazawa (AK): We decided to build IERS on top of NEC’s micro-sharding technology in 2011. The reason is that all of the cloud players see scalability and consistency as major features, and we wanted to build a product with both. Google published the Google File System implementation in 2003 and then published Bigtable (a KVS) in 2006. Amazon published Amazon Dynamo (a KVS) in 2007. NEC published our CloudDB vision paper in 2009, which helped us establish the architecture of a key value store under the database umbrella. In 2011, Facebook published its work on improving the performance of Apache Hadoop, and Google published Megastore, its method of transaction processing on top of Bigtable. Those players tackled scalability first and then consistency; by 2011 they had both.

A KVS is well-suited for building a scalable system, where performance has to be predictable under increasing and changing workloads. In the beginning, all the cloud players used replication to increase performance, but they hit walls because of the unpredictability of caching: you cannot cache everything. So they moved to a caching and sharding architecture, which partitions data across multiple servers so that more of it can be cached in memory. The problem then is that it is not easy to shard a database in a consistent manner. This is the problem of deep partitioning. Partitioning or sharding the data once, at the beginning, is not so difficult, but dynamic partitioning and sharding is very difficult. The end goal of many projects was to provide a distributed KVS, and the requirement of a KVS is predictable performance under whatever workload we have.

MS: Why is a KVS better?

AK: The most important thing about a KVS is that we can move part of the data from one node to another in order to balance performance. Typically, a KVS implementation relies on small partitions that can be moved between nodes. This is very difficult to do across all of the nodes of a relational database, or any other kind of database for that matter. In a KVS, everything is organized by key, so we can track where each piece of data resides.
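The sketch below illustrates the general idea of moving small partitions rather than individual records. It is a generic fixed-partition scheme, not NEC's micro-sharding implementation, which this interview does not describe; the partition count, node names, and `PartitionMap` class are assumptions. Each key hashes to one of 64 partitions that never changes, and a routing table records which node owns each partition. Adding a node reassigns whole partitions from the busiest nodes, so only keys in the moved partitions change location.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 64  # assumed value: many small partitions, far more than nodes

def partition_of(key: str) -> int:
    """Map a key to a fixed partition with a stable hash (this mapping never changes)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

class PartitionMap:
    """Hypothetical routing table: partition id -> owning node."""

    def __init__(self, nodes):
        self.owner = {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

    def node_for(self, key: str) -> str:
        return self.owner[partition_of(key)]

    def load(self) -> Counter:
        return Counter(self.owner.values())

    def add_node(self, new_node: str) -> list:
        """Hand whole partitions from over-full nodes to new_node; return the moved ids."""
        counts = self.load()
        counts[new_node] = 0
        target = NUM_PARTITIONS // len(counts)   # fair share per node
        moved = []
        for p in range(NUM_PARTITIONS):
            if counts[new_node] >= target:
                break
            old = self.owner[p]
            if counts[old] > target:             # only take from nodes above their share
                self.owner[p] = new_node
                counts[old] -= 1
                counts[new_node] += 1
                moved.append(p)
        return moved

pm = PartitionMap(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: pm.node_for(k) for k in keys}

moved = pm.add_node("node-d")
after = {k: pm.node_for(k) for k in keys}
relocated = [k for k in keys if before[k] != after[k]]

print(len(moved), sorted(pm.load().values()))  # 16 [16, 16, 16, 16]
print(len(relocated) / len(keys))              # fraction relocated; expected to be near 0.25
```

With 64 partitions spread over three nodes, adding a fourth moves 16 partitions, so roughly a quarter of the keys relocate, all to the new node. By contrast, rehashing every key with `hash(key) % number_of_nodes` would relocate about three quarters of them when going from three nodes to four.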

[Figure 1]

Going back to the evolution of database products, Facebook developed Cassandra on its own because it needed it. Later it had to move part of its application from Cassandra to HBase, but it had to improve HBase first. Facebook reported in a paper that the reason it had to use HBase was that it needed consistency in order to implement its messaging application. The messaging application, made available in 2011, enabled users to manage a single inbox for various kinds of messages, including chats, SMS, and email. It handles 15 billion messages from 350 million members every month, plus 120 billion chat messages between 300 million members. Facebook wanted to add consistency on top of performance because of the growing number of messages delivered.

Google, on the other hand, added a transactional layer on top of its Bigtable KVS. It did this for App Engine, which is used by many users concurrently. The transactional layer allowed those users to write application code that relies on transactions. Google also developed Caffeine for near-real-time index processing and the HRD (High Replication Datastore) for OLTP systems such as App Engine.

Those are the trends the cloud players illustrated when NEC was deciding to enter this market. At NEC, we developed our own proprietary database for mainframes more than 30 years ago; incidentally, I was on that team. We didn’t extend our reach to Unix or Windows, so we didn’t have a database product for those platforms. In 2005, we decided to develop our own in-memory database and made it available in Japan. This is TAM, a transactional in-memory database. In 2007, we added the ability to process more queries with a columnar database called DataBooster, so now we have in-memory databases for both transactions and queries. In 2010, we successfully released and deployed the in-memory database for a large Japanese customer. When our North American research team released the CloudDB paper, we merged these technologies to create IERS.

We felt that if we could develop everything on top of a KVS, then it would be scalable. That is a core concept of IERS.

MS: What were the design goals of IERS? Could you describe how those goals are met by the system’s architecture?

AK: Regarding our architecture, the transaction nodes implement intelligent logs with an in-memory database to facilitate transaction processing. The difference between IERS and most databases is that IERS is a log-based system. IERS does not have a conventional cache (read, dirty, write), which means we don’t have to synchronize caches in the usual manner. We simply record all changes on the transaction server in time order and then synchronize them in batches to the other data pods in IERS, which are database servers. The result is that the KVS maintains only committed changes.
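A minimal sketch of the write path described here follows, under loudly stated assumptions: the class names, the per-transaction buffering, and the batch size are hypothetical, and replication to other nodes is omitted. Writes are buffered per transaction, appended to a time-ordered log only when the transaction commits, and then shipped to a storage server in batches, so changes from an aborted transaction never reach the KVS.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    seq: int       # position in the time-ordered log
    txn_id: int
    key: str
    value: str

class TransactionServer:
    """Hypothetical transaction node: an in-memory, append-only log of committed changes."""

    def __init__(self):
        self.log = []        # committed changes in commit (time) order
        self.pending = {}    # txn_id -> uncommitted (key, value) writes
        self.next_seq = 0

    def write(self, txn_id, key, value):
        self.pending.setdefault(txn_id, []).append((key, value))

    def commit(self, txn_id):
        # Only at commit time do a transaction's writes enter the log.
        for key, value in self.pending.pop(txn_id, []):
            self.log.append(LogEntry(self.next_seq, txn_id, key, value))
            self.next_seq += 1

    def abort(self, txn_id):
        self.pending.pop(txn_id, None)   # discarded; never logged

class StorageServer:
    """Hypothetical data pod: a KVS that applies log entries in batches, in log order."""

    def __init__(self):
        self.kvs = {}
        self.applied = 0     # number of log entries applied so far

    def sync(self, log, batch_size):
        batch = log[self.applied:self.applied + batch_size]
        for entry in batch:
            self.kvs[entry.key] = entry.value
        self.applied += len(batch)
        return len(batch)

ts, pod = TransactionServer(), StorageServer()
ts.write(1, "msg:1", "sent")
ts.write(2, "msg:2", "draft")
ts.write(1, "msg:1:status", "delivered")
ts.commit(1)
ts.abort(2)
while pod.sync(ts.log, batch_size=1):   # drain the log one small batch at a time
    pass
print(pod.kvs)  # {'msg:1': 'sent', 'msg:1:status': 'delivered'}
```

Because the storage side only ever replays the committed log, it never holds dirty data and needs no cache-invalidation protocol with the transaction node.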

[Figure 2]

We do have a cache, but it is a read-only cache, not the typical database cache. It holds only data read by the query server, so we do not need to be concerned with cache coherency. The transaction server itself is an in-memory database. We record every change on the transaction server and replicate it across at least three nodes. The major difference between IERS and other databases is the method of data propagation. Our technology allows the query server, accessible via SQL, to see a consistent view even though we have separate read and write caches. If you do not care much about consistency, you can rely on the storage server’s cache. The storage server holds the data previously transferred from the transaction server. If you need consistency across records or tables, you should read from the transaction server, which preserves the full consistency of each transaction.
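The trade-off between the two read paths can be sketched with a toy model. Everything in it is an assumption for illustration: the `Cluster` class, the single-dict servers, and especially the premise that a sync batch may end partway through a transaction, which is one way a storage copy could be briefly inconsistent across records (the interview does not say how IERS batches changes). A transfer between two balances commits atomically on the transaction server; reading from storage right after a partial sync shows a total that never existed, while reading from the transaction server shows the committed state.

```python
class Cluster:
    """Hypothetical two-tier model of the read paths.

    txn_state: the transaction server's in-memory view (every committed change).
    storage:   the storage server's copy, filled in batches from the commit log.
    """

    def __init__(self):
        self.txn_state = {}
        self.commit_log = []   # (key, value) pairs in commit order
        self.storage = {}
        self.synced = 0        # number of log entries propagated so far

    def commit(self, changes: dict):
        """Apply one transaction atomically on the transaction server."""
        self.txn_state.update(changes)
        self.commit_log.extend(changes.items())

    def sync(self, max_entries: int):
        """Propagate up to max_entries log entries; a batch may split a transaction."""
        batch = self.commit_log[self.synced:self.synced + max_entries]
        for key, value in batch:
            self.storage[key] = value
        self.synced += len(batch)

    def read(self, key, consistent: bool):
        """Consistent reads go to the transaction server; others use the storage copy."""
        source = self.txn_state if consistent else self.storage
        return source.get(key)

c = Cluster()
c.commit({"alice": 100, "bob": 0})
c.sync(max_entries=10)
c.commit({"alice": 60, "bob": 40})   # transfer 40 from alice to bob
c.sync(max_entries=1)                # only alice's new balance has propagated so far

fast_total = c.read("alice", consistent=False) + c.read("bob", consistent=False)
safe_total = c.read("alice", consistent=True) + c.read("bob", consistent=True)
print(fast_total, safe_total)  # 60 100
```

The storage read is cheaper and fine for single-record lookups that tolerate lag; any query that relates several records, such as a sum of balances, should use the consistent path.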

The important point in terms of scalability is that both the KVS (storage) server and the transaction server work as KVS storage, so we can maintain scalability as if the entire database were a KVS, even though we have a transactional logging layer.

From a business point of view, there are users of KVS such as Cassandra, which does not support consistency in a transactional manner. We would like to see those users extend their databases by adding another application; if they want a KVS that supports consistent transactions, we can help them. On the other hand, in Japan we see some of our customers trying to move existing applications from an RDBMS to a more scalable environment because of a rapid increase in incoming traffic. In that case, they have their own SQL applications, and rewriting them for a KVS that doesn’t support SQL is very difficult. So we added a SQL layer that allows users to easily migrate existing applications from an RDBMS to a KVS.
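How a SQL layer can sit on top of a KVS is easiest to see for the simplest case, a lookup by primary key. The sketch below is hypothetical and is not IERS's SQL layer: it stores each row under a key built from the table name and primary key, so `SELECT * FROM devices WHERE id = ?` becomes one KVS get. A real SQL layer also needs query parsing, secondary indexes, joins, and transactions.

```python
import json

class SqlOverKvs:
    """Hypothetical minimal layer mapping relational rows onto a key value store.

    Each row is stored under the key "<table>/<primary key>", so a point query
    on the primary key turns into a single KVS get.
    """

    def __init__(self):
        self.kvs = {}        # stands in for the underlying distributed KVS
        self.schemas = {}    # table name -> (column names, primary key column)

    def create_table(self, table, columns, primary_key):
        self.schemas[table] = (list(columns), primary_key)

    def insert(self, table, row: dict):
        columns, pk = self.schemas[table]
        if set(row) != set(columns):
            raise ValueError(f"row must supply exactly {columns}")
        key = f"{table}/{row[pk]}"
        if key in self.kvs:
            raise ValueError(f"duplicate primary key {row[pk]!r}")
        self.kvs[key] = json.dumps(row)   # the KVS sees only an opaque string value

    def select_by_pk(self, table, pk_value):
        """Equivalent of: SELECT * FROM <table> WHERE <pk> = ?"""
        raw = self.kvs.get(f"{table}/{pk_value}")
        return None if raw is None else json.loads(raw)

db = SqlOverKvs()
db.create_table("devices", ["id", "model", "status"], primary_key="id")
db.insert("devices", {"id": 42, "model": "sensor-x", "status": "online"})
print(db.select_by_pk("devices", 42))  # {'id': 42, 'model': 'sensor-x', 'status': 'online'}
```

Keeping the table name in the key prefix means rows from one table stay grouped together, and the application keeps issuing relational-style requests while the storage underneath remains a plain KVS.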

MS: Is there a part of IERS’ functionality or architecture that makes it unique?

AK: From a customer point of view, the difference is that IERS provides complete scalability and consistency. The key is the extent to which we support consistency and SQL, making it easier for customers to run their applications. We added a productivity layer on top of a purely scalable database, and we can continue to improve that productivity layer. Typically, people have to compromise productivity to get scalability. Simply pursuing scalability isn’t so difficult. Application database vendors focus on the productivity layer first and then add scalability. Our direction is different: we looked at scalability first and built a completely scalable database. Then we added the productivity layer (security support and transactional support) without compromising scalability.

MS: What types of projects is IERS well-suited for?

AK: Messaging is one good application. If you want to store each message in a transactional fashion (tracking whether it goes out, whether it is read or responded to, and so on) and you require scalability, then IERS is a good fit.

Another case is M2M (machine-to-machine), because it requires scalability and the number of connected devices usually increases dramatically over time. The customer also needs to maintain each device in a transactional fashion: each device has its own history that must be maintained in a consistent manner.
