Is Microsoft Azure Cosmos DB the NoSQL Competition Killer?
Most database pros are aware that the era of the one-size-fits-all database has been over for some time. We realized that not all of the data our shops are required to store, process and present to our end user communities fits neatly into relational rows and columns. The need to store unstructured data, combined with the need for almost absurdly high degrees of scalability, data distribution and availability, was the business driver behind the genesis of NoSQL offerings. Many of the NoSQL products were specifically designed to leverage low-cost hardware to store semi-structured and unstructured data and provide extremely high horizontal scalability and data redundancy at an affordable price point. Document, graph, key-value, wide-column and other non-relational data storage technologies became viable, competitive offerings because they offer a solution to the IT community’s need to store semi-structured and unstructured data.
This rapidly expanding group of product offerings allows IT consumers to custom-tailor a database architecture that meets each application’s unique storage and processing requirements. It was also easy to predict that the larger competitors would not sit idly by and have their market share stolen by these upstart offerings. It is important to note that we are comparing vendors, not technologies. Those of us who have been tasked with comparing competing market offerings are aware that the vendor with the best technology doesn’t always win the battle for market share or even survive as a viable competitor, for that matter. This especially rings true when those vendors are up against offerings provided by much larger and more entrenched competitors.
Here was my prediction from my Who Will Win the Database Wars? blog series posted in December 2016: Relational Vendors Will Win by Becoming “Multi-Model.” NoSQL Vendors Will Remain as Niche Offerings or Have Their Technologies Coopted by their Larger Competitors.
NoSQL vendors’ larger relational competitors, including Oracle, IBM and Microsoft, will attempt to co-opt any NoSQL technology that challenges their dominant role in the industry. As they identify offerings as tangible threats, their strategy will be to ensure that the technologies used by those vendors become components of, not replacements for, their database products. The key to their continued dominance will be their ability to identify and appropriate those technologies that are destined to become more widely adopted versus those that will continue to be niche offerings. NoSQL vendors whose data storage and access techniques are easily integrated into a competitor’s product will be the first to be consumed. These smaller vendors will be required to constantly innovate and integrate new features that differentiate their products from their larger competitors. This constant differentiation will be an absolute requirement for their continued survival, but not a guarantee of it.
Those vendors that have niche offerings with limited market appeal or have complex processing architectures (Datastax/Cassandra is an example of a fairly complex architecture) will be less attractive to their larger competitors and have the greatest chance of surviving future market consolidations. As the NoSQL model continues to mature, it will become more robust, more intelligent and more standardized. As a result, NoSQL’s adoption rate will continue to grow, as does any technology that possesses these traits. The result will be that NoSQL, as a class of database storage architectures, will increasingly challenge its relational model counterpart. Although the NoSQL architectures will continue to grow in market acceptance, the vendors that initially offered them may not.
Introducing Microsoft Azure Cosmos DB
Microsoft describes Cosmos DB as a “globally distributed, multi-model database.” The vendor is very accurately describing the two main benefits the product provides:
- Users are able to globally distribute their database systems to multiple datacenters across the globe
- The database is both multi-model and multi-API
  - Supports document, key-value, graph and columnar data models
  - Extensible APIs for Node.js, Java, .NET, .NET Core, Python and MongoDB
  - Provides SQL and Gremlin as query languages
In addition to the above features, Cosmos DB provides a host of additional benefits. This list is not intended to be all-encompassing; it provides an overview of some of the most interesting and beneficial features that differentiate Cosmos DB from many of its competitors.
World-Wide Deployment Architecture and Data Distribution
Cosmos DB leverages Microsoft’s world-wide architecture which offers a little over 30 regions to customers. Administrators are able to dynamically add and remove regions, assign failover priorities and utilize logical (region agnostic) or physical (region specific) endpoints which allows clients to leverage multi-homing and fine-grained, region specific read/write control. The interface to create a global database system is intuitive and easy to use.
System Performance and Availability SLA Guarantees
One of the more interesting benefits offered by Microsoft is their Cosmos DB Service Level Agreements. In addition to the more traditional availability SLAs, here’s a quick laundry list of some of the guarantees offered by Microsoft for Cosmos DB:
- Performance:
  - Database account configuration (signup) – 2 Minutes
  - Add new region to existing application – 60 Minutes
  - Manual Failover – 5 Minutes
  - Resource Operations – 5 Seconds
  - Latency – <10 ms for Reads and <15 ms for Writes at p99
- Throughput:
  - Microsoft calculates a monthly throughput percentage and offers a service credit if it falls below 99.99%.
- Consistency:
  - Microsoft also guarantees that Cosmos DB will adhere to the consistency level you choose (see 5 consistency levels below). The vendor calculates a monthly consistency attainment percentage. Applications experiencing less than 99.99% receive monthly credits.
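To make the 99.99% threshold concrete, here is a minimal sketch of how a monthly attainment percentage could be compared against it. This is not Microsoft's actual SLA formula, which is defined in the SLA document itself; the `MonthlyCounts` type, its fields and both functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

SLA_TARGET_PERCENT = 99.99  # threshold quoted in the list above


@dataclass
class MonthlyCounts:
    """Hypothetical request tallies for one billing month."""
    total_requests: int
    failed_requests: int  # requests that missed the guarantee being measured


def attainment_percent(counts: MonthlyCounts) -> float:
    """Return the share of requests that met the guarantee, as a percentage."""
    if counts.total_requests == 0:
        return 100.0  # no traffic, so nothing missed the target
    ok = counts.total_requests - counts.failed_requests
    return 100.0 * ok / counts.total_requests


def credit_due(counts: MonthlyCounts, target: float = SLA_TARGET_PERCENT) -> bool:
    """True when the month's attainment falls below the target percentage."""
    return attainment_percent(counts) < target
```

Under this simplified model, 50 missed requests out of one million stays above the threshold, while 150 out of one million falls below it and would qualify for a credit.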
Five Consistency Models to Choose From
Historically, most DB systems provided two extremes of consistency models: strong consistency and eventual consistency. Developers were forced to evaluate data flexibility, availability, latency, throughput, horizontal scalability and data consistency when selecting the most appropriate architecture for their database-driven application. Microsoft is attempting to make that decision a bit easier by offering developers five different consistency levels in Cosmos DB:
- Strong Consistency – Cosmos DB will guarantee that reads will return the most recent version of a data item. Data written to the database can only be read after it is committed to the majority quorum of replicas. Readers will never see uncommitted or partial data. The tradeoff is that applications leveraging strong consistency are unable to leverage multiple Azure regions to provide geographic data distribution. In addition, the cost in latency is higher than the other consistency models, but the data is guaranteed to be consistent.
- Bounded Staleness – Reads can lag behind writes by a user specified number of versions or time interval. Cosmos DB will guarantee the data consistency except within the staleness window. The data item returned could be out of date, but the level of staleness is controlled so it isn’t too out of date. Bounded staleness attempts to resolve the tradeoff between operational latency and data currency. The challenge becomes identifying application needs. How current does the data item need to be, and what will the application do when the data it returns isn’t the most current? Users choosing bounded staleness will incur a higher latency than session and eventual consistency (below) but are able to leverage multiple Azure regions to provide geographic data distribution.
- Session – The strong and bounded staleness settings control consistency at the global level. As its name implies, session consistency is scoped to the user session. It guarantees monotonic reads, monotonic writes and read-your-own-writes. Monotonic reads mean that when your session initially reads a data item, subsequent reads in that session will return the same or a more current version of that data item, never an older one. Monotonic writes mean that your session's writes to a data item are applied in the order in which the updates are executed. Your first write will complete before any successive write operations to the same data item are performed. Read-your-own-writes is fairly self-explanatory: the data your session writes to the system will be retrieved by subsequent read operations in that session. Session consistency provides low latency reads and writes, and users are able to leverage multiple Azure regions to provide geographic data distribution.
- Consistent Prefix – Readers will never see writes that are out of order. If writes to a data item were performed in the order 1, 2, 3, readers would see 1, then 1,2, then 1,2,3. They would never see an out-of-order sequence such as 1,3 or 2,1,3. In addition, the data items in the replica will eventually converge in the absence of any further writes. This consistency level provides very low read latency at the cost of weaker consistency, and users are able to leverage multiple Azure regions to provide geographic data distribution.
- Eventual – Like its consistent prefix counterpart, eventual consistency guarantees that the data items in multiple replicas will eventually converge if no further writes are performed. Eventual consistency provides the lowest read and write latency, but the tradeoff is that it also offers the weakest read consistency of any Cosmos DB consistency model. Think traditional NoSQL consistency.
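The guarantees described in the list above can be expressed as simple checks over sequences of writes and reads. The following is a minimal sketch of the definitions only, not of how Cosmos DB implements them; all function names are hypothetical, and writes are represented as plain integer version numbers.

```python
from typing import Sequence


def is_consistent_prefix(observed: Sequence[int], write_order: Sequence[int]) -> bool:
    """True if `observed` equals the first len(observed) writes, in order."""
    return list(observed) == list(write_order[:len(observed)])


def is_monotonic_read_sequence(versions_seen: Sequence[int]) -> bool:
    """True if each read in a session returns the same or a newer version."""
    return all(a <= b for a, b in zip(versions_seen, versions_seen[1:]))


def within_staleness_bound(latest_version: int, read_version: int, max_lag: int) -> bool:
    """True if a read lags the latest write by at most `max_lag` versions."""
    return 0 <= latest_version - read_version <= max_lag
```

With writes applied in the order 1, 2, 3, `is_consistent_prefix([1, 2], [1, 2, 3])` holds while `[1, 3]` and `[2, 1, 3]` do not, which mirrors the consistent prefix example in the list.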
Administrators select the default consistency level when setting up the database account on Cosmos DB. Application developers can choose more “relaxed” consistency models by selecting a consistency level for a specific read request in the API.
Schemaless Architecture
Microsoft describes Cosmos DB as “schema agnostic” stating that competing products force users to deal with schema and index management, which includes versioning and migration. In a relational database environment, a rigid schema must be pre-created before data can be inserted. In NoSQL databases, including Cosmos DB, the schema is automatically created when the data is inserted. Because the data must match the schema definition, altering rigid schemas often requires that administrators coordinate the changes with the application development teams. Schemaless architectures allow rapid changes to be made during the development process. For NoSQL models, the schema definition could also be described as implicitly defined (versus explicitly defined in the relational model), in that the definitions are shared between the database management system and the application code. NoSQL developers are willing to have the application itself assume more ownership of data quality and business rule enforcement to fully leverage NoSQL’s schema flexibility. The developers writing application code that retrieves and manipulates the data are required to make some assumptions on the current structure.
Automatic Indexing
Cosmos DB indexes all data by default. Administrators are able to override this by establishing custom index policies. Index policies allow the administrator to:
- Disable automatic indexing. Indexing can still be performed, but not all data is automatically indexed
- Include/exclude specific objects when indexing
- Create index path definitions. Cosmos DB uses trees as a storage mechanism. Administrators can choose which document paths in the tree must be included or excluded when indexing
- Define index kinds which include hash, range and spatial as well as index data types and precision
- Choose from Consistent, Lazy and None indexing modes. Choosing the appropriate indexing mode is critical, as it has a direct impact on the consistency level for query requests.
Feature Wrap-up
As stated previously, listing all of the features provided by Cosmos DB is far beyond the scope of this article. The intent was to demonstrate to readers that Cosmos DB is going to be a formidable competitor and promises to be one of the more disruptive DB offerings Microsoft has unveiled for some time.
What Impact Will a Multi-Model DBMS Product Have on the NoSQL Market?
Can a multi-model “one database data store for all your NoSQL needs” product compete against vendors that focus on a specific NoSQL storage model? When you do a deep dive into how Cosmos DB actually stores the data, it becomes pretty obvious that Microsoft has architected the product to easily support additional data storage models as well. Do a quick web search and you’ll find a lot of industry pundits with pedigrees far stronger than mine expressing the opinion that the days of the general purpose database are long gone. The IT community is looking for multiple special purpose products that focus on a particular storage model, database vendors that do one thing and do it very well.
Smaller NoSQL Competitors are Also Embracing Multi-Model and Relational
Microsoft’s multi-model strategy is not unique. Two of the more popular NoSQL competitors are also leveraging the multi-model strategy to gain market share. Datastax/Cassandra, which is a wide-column data store, has been supporting the graph data storage model for some time. MongoDB has also enhanced its product to support the graph model. Will the other NoSQL vendors follow their lead or continue to focus on a single data storage model? In order to increase customer acceptance, Cosmos DB’s NoSQL competitors are also adding relational-like functionality that allows their offerings to be more widely adopted. They support ACID transactions, referential integrity, joins and similar features. The end result will be that the lines of distinction between relational and NoSQL systems will also become increasingly blurred.
What Impact Will Cosmos DB Have on the NoSQL Market Arena?
As stated previously, it was an easy prediction to state that the larger database vendors wouldn’t sit idly by and let a group of small NoSQL competitors nibble away at their market share. All DBMS industry heavyweights, although somewhat distracted by their required cloud initiatives, will continue to incorporate additional data storage models into their existing platforms or unveil offerings like Microsoft Cosmos DB. NoSQL vendors that have storage technologies that easily integrate into competing products will become ripe targets.
Microsoft has the R&D budget, talent and desire to further enhance the Cosmos DB product. It’s another safe prediction that Cosmos DB, which is already built upon a strong foundation, will continue to evolve and grow in both product features and database functionality. The challenge for the smaller NoSQL vendors is that Cosmos DB not only supports their storage models but also provides a wealth of features that makes it a formidable competitor.
Search the web and you’ll find C-level responses to Cosmos DB from several NoSQL competitors. They provide various reasons why their offering will be able to compete successfully against Cosmos DB. Although they do provide varying degrees of supporting evidence to back up their claims, offerings like Cosmos DB have certainly made the NoSQL market arena a much more competitive one. There will be NoSQL vendors that survive the onslaught from larger vendors like Microsoft. They will remain as viable alternatives not because of the superiority of their architecture and feature sets but because their offering consists of a unique storage model that is also open source, a combination which allows them to be viewed as attractive, cost-effective alternatives to commercial products.
The remaining NoSQL vendors will either remain as niche providers, add RDBMS-like functionality or merge/acquire other product vendors and storage models to boost their chances of survival and become viable alternatives to their larger counterparts.