A Comprehensive Comparison of NoSQL and Relational Database Management Systems- Part 1
The intent of this series of articles is to perform a technical comparison of relational and NoSQL database offerings. Unlike many articles that you’ll read, it is not intended to sell readers on a particular vendor offering, nor will it favor one architecture over another.
The supposition that relational and NoSQL offerings are competitors, in that they are viable replacements for each other for most implementations, is erroneous. We can also easily assume that, as with most data store technologies, there will be varying degrees of overlap in some areas of application.
The era of the one size fits all database has been over for some time. In actuality, there never was a perfect solution that encompassed all types of data that needed to be stored. IT organizations have realized for some time that not all of the data they are required to store, process and present to their end user communities neatly fit into relational rows and columns. Document, graph, key-value, wide column and other non-structured data storage technologies became viable, competitive offerings because they offer a solution to the business community’s need to store semi and non-structured data. This rapidly expanding group of product offerings allows IT consumers to custom tailor a database architecture that meets each application’s unique storage and processing requirements.
In order to perform a thorough evaluation of relational and NoSQL offerings, we cannot constrain ourselves to analyzing database products alone; we must also examine the entire database “ecosystem.” For the intent of this article series, I define the database ecosystem as the server hardware, storage architecture, operating system, database and edge technologies and products.
Choosing the correct database ecosystem is critical to the success of any database-driven application that is required to store, process and present data. This decision was simple when the number of alternatives available was limited. With the seemingly endless array of ecosystems now available, that choice is no longer as clear cut. Database pros now have more choices available to them than ever before. In order to thoroughly evaluate NoSQL and relational technologies, DBAs must analyze and compare database ecosystems and not the individual database vendor offerings.
Upcoming articles of this series will focus on the following topics and their corresponding comparison criterion:
Relational and NoSQL History
In order to compare relational and NoSQL architectures, we must begin with their historical roots. The focus of this section will be on the business drivers that led to each architecture being readily adopted by the IT community, as opposed to an in-depth historical essay on each model’s technical genesis. Describing why the products became popular provides us with the opportunity to perform a high-level analysis of the relational and NoSQL models and architectures.
The launching point for relational database management systems was Edgar F, Codd’s landmark whitepaper titled A Relational Model of Data for Large Shared Data Banks.[i] Published in the early 1970’s, Codd’s work created the foundation for a relational model backed by mathematical theory. We’ll begin our brief historical overview by analyzing the causal factors that led to the IT community’s widespread adoption of products that adhered to (in many cases, varying degrees) the relational model.
NoSQL’s beginnings are subject to debate. The argument can be made that IBM’s hierarchical offering, IMS, IDMS/CODASYL network databases and several other systems that predate relational products could loosely be defined as NoSQL. For the sake of this discussion, we’ll focus our analysis on the inception of the more nascent offerings that include MongoDB, Redis, Oracle NoSQL and the numerous Amazon projects.
https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
Vendors
The “big three” relational database vendors have been integral parts of the IT product landscape for decades. IBM’s DB2, Oracle and Microsoft’s SQL Server products can arguably be defined as the relational standards by which other offerings are compared. Their market penetration has led to the relational database becoming the heart of the modern IT infrastructure.
Throughout the course of this analysis, I will often refer to these three vendors as examples of the relational architecture. This is based on market popularity and historical lineage; it is not because of any strict adherence to the relational model or personal preference.
Various, and sometimes debatable, approaches have been attempted to establish a taxonomy that defines the NoSQL classification. But, the more challenging task we face with our NoSQL analysis is that the current competitive landscape is both dynamic and fluid. There are numerous products that are all equally viable alternatives that can we can use as subjects for our evaluation. In addition, each of the NoSQL “types” provide fairly dramatic differences in the data structures used to store semi and non-structured data. To further complicate our evaluation, the numerous vendor offerings in each type provide features intended to differentiate their products from competitors and increase market share. For example, while most NoSQL vendor offerings do not offer true support for ACID transactions, there are some products that can be classified as NoSQL that do.
In order to simplify our comparison and to keep our analytical data to a manageable level, like their relational counterparts, we will use market popularity and historical lineage to select the products used. We will use Cassandra for wide column data stores (although it could be described as a key-value/columnar hybrid), MongoDB for document and Redis for key-value. In certain sections, we will also refer to Neo4j when we discuss graph database architectures. These selected products do not imply any personal preference by the author.
Data Structure Definitions (Schemas)
In this section, we take an in-depth look at relational, wide column, document, key-value and graph data structures. NoSQL types vary greatly in the mechanisms they deploy to store data. Schemaless is a popular term often used to characterize the NoSQL structure definitions used to describe stored data. A more accurate comparative terminology to use would be dynamic (NoSQL) versus rigid (relational) schemas.
In reality, for NoSQL models, the schema definition could also be described as implicitly defined (versus explicitly defined in the relational mode), in that they are shared between the database management system and the application code. Any application code that retrieves and manipulates the data requires that it make some assumptions on the how the data is defined.
Data Access
The acronym CRUD stands for create, read, update and delete. They are the operations used to retrieve and manipulate data stored in database management systems. Relational databases use an extremely powerful, industry-standard Structured Query Language (SQL) to perform data interaction operations. Their NoSQL counterparts provide simpler APIs and languages to store and access data. Each of the NoSQL product offerings either have their own or preferred languages that provide CRUD capabilities. We’ll compare SQL CRUD operations with their counterpart languages provided by Cassandra, MongoDB and Redis.
Our discussion will include data manipulation as well as advanced query constructs that include joins, or lack thereof, complex queries and aggregation.
Transaction Management – ACID, BASE and Brewer’s CAP Theorem
Transaction management is a key differentiator between relational and NoSQL Architectures. Their consistency models are dramatically different, with relational offerings adhering to ACID transaction principles which provide rather strict requirements. Their NoSQL counterparts, depending on their type and individual product features, usually adhere to a more loose consistency model called BASE. BASE is a set of transactional properties that trade the benefit of immediate consistency for data availability, scalability and resilience. When we discuss distributing data among multiple computing systems to provide horizontal scalability, we must also include an analysis of Brewer’s CAP theorem. The CAP theorem is a set of basic principles that govern data consistency and availability properties for networked shared data systems.
Lock Management
Databases, by their vary nature, are highly concurrent systems. All DBMS products use their own, unique, methods to prevent destructive interactions between competing units of work from occurring. A delicate balance must be maintained between data consistency, integrity and application throughput. Although an in-depth analysis of locking methods could easily result in its own series of articles, we’ll review the various locking strategies and mechanisms provided by several of our selected vendors. During the course of our examination, we’ll be delving into some fairly deep topics that include pessimistic and optimistic locking, isolation levels, lock granularity and duration as well as lock escalation.
Security
Database administrators, from the very inception of their job descriptions, have been responsible for the protection of their organization’s most sensitive database assets. They are tasked with ensuring that key data stores are safeguarded against any type of unauthorized data access. Security teams spend countless hours building, maintaining and monitoring a multitude of perimeter protection mechanisms. Once those safeguards are breached, the last line of defense for database environments will be the protections the database administrator has put in place.
This is serious business, and we are up against some extremely proficient opponents. Hackers can be classified as intelligent, inquisitive, patient, thorough, driven and, more often than not, successful. When they target your systems, you will need every tool at your disposal to prevent their unwarranted intrusions.
In this installment of the series, we will compare the various security features provided by both relational and NoSQL product vendors. It is beyond the scope of this article to provide a detailed comparison of the individual vendor offerings. Instead, we will perform a high-level overview of the basic and advanced security features offered.
Performance
One of the most debated comparison points between the two architectures is performance. As stated during my introduction, each architecture has its own strengths and weaknesses and, as a result, has a home in the modern IT infrastructure. The most animated discussions occur when both architectures possess the ability to store the same type of data.
In this section, we will learn how the performance of each architecture is highly dependent upon the type (structured, semi-structured, non-structured) of data being stored, the volume of data and the operations required to store, process and present data to the end user community.
High Availability
The goal of any highly available architecture is rather simple – minimize application downtime. The strategies and mechanisms used to achieve desired availability goals vary greatly in both cost and complexity. In reality, decisions regarding highly available architecture deployments are almost always economic. How much availability is your organization willing to pay for? NoSQL vendor marketing material focuses heavily on the use of low-cost, commodity hardware to provide data redundancy and horizontal scalability. In order to perform an effective evaluation, we must consider not only the highly available feature sets provided by our subject vendors, but we must also analyze the costs associated with their implementation.
Backup and Recovery
Recovering a database management system can be a wonderfully complex task. Data files, log files, journal files, user errors, catastrophic outages and application/system bugs all combine to make many administrators lie awake nights thinking about whether their databases can be easily recovered (or not).
It is self-evident to all database vendors that robust backup and recovery mechanisms are critical features that differentiate their products from their competitors. The big three relational database heavyweights, Oracle, Microsoft and IBM, have been offering relational database management systems for decades. During this time, each new release of their flagship products often contained new recovery features as well as improvements to existing backup and recovery components.
How do the more nascent NoSQL database products stack up against their relational counterparts? In this section, we will evaluate the backup and recovery mechanisms utilized by the NoSQL vendors and compare them to their more mature relational competitors.
Vendor Support
Many of the NoSQL offerings were originally conceived, or continue to be, open source offerings. One of the reasons why organizations often shy away from open source offerings is the lack of commercial vendor support. There is no stable, mature organization that they can rely upon for product support, patches and upgrades. They feel uncomfortable implementing critical or mission-critical applications that are “crowd supported.” A new class of NoSQL service providers have stepped in over the last few years to fill this gap and provide 24×7 support and/or totally hosted environments.
We’ll review the different support options provided by the NoSQL vendors and compare them to their more mature relational counterparts.
Edge Products and Technologies
In order to effectively evaluate relational and NoSQL architectures, we must also examine products, languages and technologies that are commonly used in conjunction with their deployment. All database-driven applications are highly dependent upon all components of their technology stack. In this discussion, we will focus on the NoSQL and relational technology stack as a single entity as opposed to individual products. The database, although a key component of the technology stack, works in unison with other members to provide the desired application functionality.
Ecosystem Costs
As information technology specialists, it is our responsibility to provide the greatest functionality at the lowest cost to our customers. Hardware, database, OS and application product licensing and maintenance contracts command a premium price in the marketplace. NoSQL vendors tout their offerings as low cost alternatives to their relational competitors.
A common marketing theme for the NoSQL vendors is their product’s ability to store and process large volumes of data at attractive price points. Their architectures are able to leverage low cost, commodity hardware to provide horizontal scaling that allows users to cost effectively add computing power as the data grows. This strategy is sound as an ever-increasing amount of data needs to be stored, analyzed and presented to the end user community. Data volumes that were unheard of a year ago are now commonplace. Day-to-day operational systems are now storing such large amounts of data that they rival data warehouses in disk storage and administrative complexity.
We’ll evaluate both data stores in terms of ecosystems costs. As we discussed earlier, we cannot limit our analysis to database products alone, we must examine the entire database “ecosystem” which includes the server hardware, storage architecture, operating system, database and edge technologies and products.
Application Fit
We’ll conclude this series of articles with a discussion on application fit. What criterion do we choose for a given application’s data storage, processing and presentation requirements that will ensure we select the best DBMS product? We have relational and NoSQL models, and for NoSQL several types of storage alternatives to choose from.
Thanks for reading! Next time, I will focus on relational and NoSQL architecture history and how each became adopted by the IT community.
