NoSQL is not the name of any particular database. Instead, it refers to a broad class of non-relational databases that differ from classical relational database management systems (RDBMS) in some significant respects, most notably in that they do not use SQL as their primary query language and instead provide access by means of application programming interfaces (APIs).

NoSQL databases can be considered "Internet age" databases: they are used by Amazon, Facebook, Google and the like to address performance and scalability requirements that cannot be met by traditional relational databases.

NoSQL databases and data-processing frameworks are used primarily for their speed, scalability and flexibility. Adoption of NoSQL at the enterprise level, however, is still emerging. Some consider it the absolute apogee of achievement, while others place it at the peak of the Inflated Expectations phase of Gartner’s Hype Cycle, a model used to characterize the over-enthusiasm or “hype” and subsequent disappointment that typically accompany the introduction of new technologies. Still others relegate it to an inferior and inconspicuous position in favor of columnar relational databases such as Sybase IQ or Oracle 11g.

Features of NoSQL databases

One major difference between traditional relational databases and NoSQL is that the latter do not generally provide guarantees for atomicity, consistency, isolation and durability (commonly known as the ACID properties), although some support is beginning to emerge. Instead of ACID, NoSQL databases more or less follow a model called "BASE". We will discuss this in more detail later in the article.

ACID is a set of properties that guarantee that database transactions are processed reliably. To learn more about ACID, read What is a database?

The other major difference is that NoSQL databases are generally schema-less; that is, records in these databases are not required to conform to a pre-defined storage schema.

In a relational database, the schema is the structure of the database described in a formal language supported by the DBMS. It defines how the database is constructed and divided into database objects such as tables, fields, relationships, views, indexes, packages, procedures, functions, queues, triggers and other elements.

In NoSQL databases, schema-free collections are used instead, so that documents of different types and structures, such as {“color”: “blue”} and {“price”: “23.5”}, can be stored within a single collection.
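As a rough illustration of a schema-free collection, the sketch below keeps documents with different fields side by side in one plain Python list. The `collection` list, the `find_with_field` helper and the third sample document are assumptions made for this example; they are not the API of any particular NoSQL product.

```python
# A "collection" here is just a Python list; a real document store would add
# persistence and indexing on top of the same idea.
collection = []

# Documents with completely different fields share one collection.
collection.append({"color": "blue"})
collection.append({"price": "23.5"})
collection.append({"color": "red", "price": "9.99"})  # hypothetical extra record


def find_with_field(docs, field):
    """Return every document that happens to contain the given field."""
    return [doc for doc in docs if field in doc]


priced = find_with_field(collection, "price")
print(len(priced))  # 2
```

Because no schema is enforced, it is up to the application to decide what a missing field means, which is why the query helper has to check for the field's presence.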

The table below lists the major characteristic features of NoSQL databases.1

| Feature | Description |
| --- | --- |
| Schema-less | "Tables" don't have a pre-defined schema. Records have a variable number of fields that can vary from record to record. Record contents and semantics are enforced by applications. |
| Shared nothing architecture | Instead of using a common storage pool (e.g., SAN), each server uses only its own local storage. This allows storage to be accessed at local disk speeds instead of network speeds, and it allows capacity to be increased by adding more nodes. Cost is also reduced since commodity hardware can be used. |
| Elasticity | Both storage and server capacity can be added on-the-fly by merely adding more servers. No downtime is required. When a new node is added, the database begins assigning it data to store and requests to fulfill. |
| Sharding | Instead of viewing the storage as a monolithic space, records are partitioned into shards. Usually, a shard is small enough to be managed by a single server, though shards are usually replicated. Sharding can be automatic (e.g., an existing shard splits when it gets too big), or applications can assist in data sharding by assigning each record a partition ID. |
| Asynchronous replication | In contrast to RAID storage (mirroring and/or striping) or synchronous replication, NoSQL databases employ asynchronous replication. This allows writes to complete more quickly since they don't depend on extra network traffic. One side effect of this strategy is that data is not immediately replicated and could be lost in certain windows. Also, locking is usually not available to protect all copies of a specific unit of data. |
| BASE instead of ACID | NoSQL databases emphasize performance and availability. This requires prioritizing among the components of the CAP theorem (described elsewhere), which tends to make true ACID transactions implausible. |

1 Source: http://dbpedias.com/wiki/NoSQL:Survey_of_Distributed_Databases
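To make the sharding row of the table more concrete, here is a minimal sketch of application-assisted sharding, in which each record's key is hashed to pick a partition. The shard count, the function names and the sample keys are assumptions made for illustration; real systems typically use more elaborate schemes, such as splitting shards automatically when they grow too large.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

# Each shard is a separate dict, standing in for one server's local storage.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put(key: str, record: dict) -> None:
    """Store the record on the shard that owns its key."""
    shards[shard_for(key)][key] = record


def get(key: str):
    """Look the record up on the one shard that can hold it."""
    return shards[shard_for(key)].get(key)


for user_id in ("alice", "bob", "carol", "dave"):
    put(user_id, {"id": user_id})

print(get("carol"))  # {'id': 'carol'}
```

A stable hash is used instead of Python's built-in `hash()` because the built-in string hash changes between interpreter runs, which would send the same key to different shards after a restart. Note that plain modulo placement forces most records to move when the number of shards changes, so this sketch does not by itself provide the elasticity described in the table.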

Types of NoSQL databases

NoSQL database systems were developed by some of the major internet players, such as Google, Facebook, LinkedIn and others, which faced significantly different challenges in dealing with data than those addressed by traditional RDBMS solutions. They needed to extract information from large volumes of data that, to a greater or lesser degree, adhered to similar horizontal structures. These companies realized that performance and real-time behavior were more important than consistency, to which much of the processing time in a traditional RDBMS had been devoted.

As such, NoSQL databases are often highly optimized for retrieve and append operations and often offer little functionality beyond record storage. The reduced run-time flexibility compared to full SQL systems is counterbalanced by significant gains in scalability and performance for certain data models. NoSQL databases demonstrate their strengths above all in three areas: the flexible handling of variable data in document-oriented databases, the representation of relationships in graph databases, and the reduction of a database to a container of key-value pairs in key-value databases.

Consequently, NoSQL databases are often categorized according to the way they store data and fall under the following major categories:

  • Key-value stores
  • Columnar (or column-oriented) databases
  • Graph databases
  • Document databases

Key-value stores

Key-value stores allow the application to store its data as schema-less (key, value) pairs. The data can be stored in a structure similar to the hash-table data types of a programming language, so that each value can be accessed by its key. Such storage offers only a single way to access the values, which can make other kinds of lookup inefficient, but it eliminates the need for a fixed data model.
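The idea can be sketched with a toy in-memory store backed by a Python dict, which is itself a hash table. The class name, method names and sample keys are assumptions made for this example and do not correspond to any specific product's API.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store any value under the key, replacing an earlier value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Fetch the value for the key, or the default if it is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key if present; do nothing otherwise."""
        self._data.pop(key, None)

    def values(self):
        """Expose all values; needed for any lookup that is not by key."""
        return list(self._data.values())


store = KeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
store.put("session:abc", "expires=3600")

print(store.get("user:42")["name"])  # Ada

# Finding records by anything other than the key means scanning every value.
pro_users = [v for v in store.values()
             if isinstance(v, dict) and v.get("plan") == "pro"]
```

The last statement shows the trade-off mentioned above: values of any shape can be stored without a fixed data model, but the key is the only efficient access path.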

Columnar databases

A column-oriented DBMS stores its content by column rather than by row. It contains predefined families of columns and is well suited to scaling and updating at relatively high speeds, which offers advantages for data warehouses and library catalogs, where aggregates are computed over large numbers of similar data items.
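The difference between the two layouts can be sketched in a few lines. The small library catalog below is made-up sample data, and the two structures only stand in for how a row store and a column store might arrange the same records.

```python
# Row-oriented layout: one complete record per entry.
rows = [
    {"title": "Book A", "year": 1990, "pages": 400},
    {"title": "Book B", "year": 2005, "pages": 300},
    {"title": "Book C", "year": 2012, "pages": 250},
]

# Column-oriented layout: one list per field, holding that field for all records.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# Aggregate over rows: every full record is visited to read a single field.
total_pages_rows = sum(row["pages"] for row in rows)

# Aggregate over a column: only the one relevant list is read.
total_pages_columns = sum(columns["pages"])

print(total_pages_rows, total_pages_columns)  # 950 950
```

Both computations give the same answer; the point is that the column layout touches only the values being aggregated, which is what makes it attractive when aggregates run over large numbers of similar items.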

Graph databases

Graph databases optimize the storage of networks, or “graphs”, of related nodal data as a single logical unit. A graph database uses graph structures with nodes, edges and properties to represent and store data and provides index-free adjacency, meaning that every element contains a direct pointer to its adjacent element and no index lookups are necessary. This can be useful for finding degrees of separation, where SQL would require extremely complex queries. A popular movie service, for example, shows the logged-in user a “Best Guess for You” rating for each film based on how similar people rated it, while other services such as LinkedIn, Facebook or Netflix show people in a network at various degrees of separation. Although such queries become simple in graph databases, the relevance of this technology to many other types of industries is difficult to determine.
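A degrees-of-separation query can be sketched as a breadth-first search over adjacency lists, where each node directly holds references to its neighbours. The people and friendships below are invented sample data, and the function is an illustration of the idea rather than how any particular graph database implements it.

```python
from collections import deque

# Each person points directly at their neighbours (adjacency lists).
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
    "zed": [],  # not connected to anyone
}


def degrees_of_separation(graph, start, target):
    """Return the number of hops on the shortest path, or None if unreachable."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, []):
            if friend == target:
                return distance + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, distance + 1))
    return None


print(degrees_of_separation(graph, "ann", "eve"))  # 3
```

Each step follows the neighbour references stored with the node itself, with no separate lookup table in between, which mirrors the index-free adjacency described above.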

Document databases

Document stores are used for large, unstructured or semistructured records. Data is organized in documents that can contain any number of fields of any length. All document-oriented database implementations assume that documents encapsulate and encode data in some standard format, known as an encoding, and are ideal for MS Office or PDF documents. Document databases should not be confused with Document Management Systems, however. The documents referred to are not necessarily actual documents as such, although they can be.

Documents inside a document-oriented database are similar in some ways to records or rows in relational databases, but they are less rigid because they are not required to adhere to a standard schema. Unlike a relational database, where each record has the same set of fields and unused fields might be kept empty, there are no empty fields in document records. This system allows new information to be added to or removed from any record without wasting space by creating empty fields on all other records.

In contrast to key-value and columnar databases, which view each record as a list of attributes that are updated one at a time, document stores allow insertion, updates and queries of entire records in JavaScript Object Notation (JSON) format.

The concept of a join is less relevant in document databases than in a traditional RDBMS. As a result, records that might be joined in a traditional RDBMS are generally denormalized into wide records. Denormalization refers to a process by which the read performance of a database is optimized by the addition of redundant or grouped data. Some NoSQL vendors, most notably MongoDB, do in fact offer add-on join capabilities as well.

Many of these database categories are beginning to blur, however. Because all of them support the association of values with keys, they are all fundamentally key-value stores; document databases, moreover, can provide all of the capabilities of columnar databases from a semantic point of view. As a result, the distinguishing factors must be evaluated in terms of performance and ease of use for a particular solution.
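The denormalization described above can be sketched by comparing a normalized layout, which needs a join at read time, with a single wide document that embeds the related records. The customer and order data are invented for this example, and the join is written by hand in Python purely for illustration.

```python
import json

# Normalized layout, as in an RDBMS: two "tables" linked by customer_id.
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 30.0},
    {"order_id": 101, "customer_id": 1, "total": 12.5},
]


def orders_with_customer(customer_id):
    """Hand-written join: combine each order with its customer's name."""
    customer = customers[customer_id]
    return [
        {"order_id": o["order_id"], "total": o["total"], "name": customer["name"]}
        for o in orders
        if o["customer_id"] == customer_id
    ]


# Denormalized layout, as in a document store: one wide record holds everything.
customer_doc = {
    "customer_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 100, "total": 30.0},
        {"order_id": 101, "total": 12.5},
    ],
}

joined = orders_with_customer(1)
embedded_total = sum(o["total"] for o in customer_doc["orders"])

print(len(joined), embedded_total)  # 2 42.5
print(json.dumps(customer_doc))     # the whole record as one JSON document
```

Reading the wide document needs no join, at the cost of grouping the order data inside the customer record, so any copy of that data kept elsewhere has to be updated separately.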

Popular incarnations of NoSQL databases

Most implemented solutions cannot be strictly assigned to a specific type and contain features from two or more categories. We should also recognize that each NoSQL implementation has its own special nuances. Popular offerings include the following:

Apache Cassandra

Apache Cassandra is an open-source, distributed database-management system designed to handle very large amounts of data spread out across many commodity servers while providing a high degree of service availability with no single point of failure. It is particularly fast at write operations as opposed to reads and might therefore lend itself best to applications that require analysis of large sets of data with write-backs.

HBase

HBase is also an open-source, distributed database modeled after Google’s BigTable. It runs on top of the Hadoop Distributed File System (HDFS) and is commonly used together with Hadoop's data-processing framework to accomplish highly scalable analyses. HBase scales linearly with the number of nodes and can quickly return queries on tables consisting of billions of rows and millions of columns.

BigTable

BigTable can be defined as a sparse, distributed, multi-dimensional sorted map. BigTable is designed to scale into the petabyte range (a petabyte is equivalent to 1 million gigabytes) across hundreds or thousands of machines, and to make it easy to add more machines to the system and start taking advantage of those resources automatically without any reconfiguration.
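The phrase "sparse, multi-dimensional sorted map" can be illustrated with a single-machine toy in which each cell is addressed by a (row, column, timestamp) key. The key layout, the function names and the sample rows are assumptions made for this sketch; it leaves out the distribution across machines entirely and is not BigTable's actual interface.

```python
# The whole table is one map from (row, column, timestamp) to a value.
# It is sparse because only cells that were written take up space.
table = {}


def put(row, column, timestamp, value):
    """Write one version of one cell."""
    table[(row, column, timestamp)] = value


def latest(row, column):
    """Return the most recent version of a cell, or None if it was never set."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions, key=lambda pair: pair[0])[1] if versions else None


def scan(row_prefix):
    """Return cells in sorted key order for rows starting with the prefix."""
    return [(key, table[key]) for key in sorted(table)
            if key[0].startswith(row_prefix)]


put("com.example.www", "contents:html", 1, "<html>v1</html>")
put("com.example.www", "contents:html", 2, "<html>v2</html>")
put("com.example.blog", "anchor:home", 1, "Blog")

print(latest("com.example.www", "contents:html"))  # <html>v2</html>
print([key[0] for key, _ in scan("com.example")])
```

Sorting on every scan is only acceptable in a toy like this; the point is that keeping keys in sorted order lets rows with a common prefix be read together as a contiguous range.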

Coherence and Ehcache

Coherence and Ehcache are equipped with in-memory caches. Coherence is in heavy use in the financial industry, where network latency (defined as the time it takes to cross a network connection from sender to receiver) is a factor.

Possible applications of NoSQL databases

NoSQL databases should generally be considered as potential options when any high-intensity computation or analysis of large data sets is required, especially when performing real-time analysis. This makes them applicable in many industry sectors, for example in the electronic-trading applications of financial institutions. Relational databases, especially the columnar variety, do not generally perform well on updates. As a result, a NoSQL database might present itself as a viable alternative in cases where massive updates are required. In situations involving variable-record templates or sparse data, NoSQL document databases can offer a welcome alternative.

