Introduction to NoSQL

NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.

It’s not about the SQL language—The definition of NoSQL isn’t an application that uses a language other than SQL. SQL as well as other query languages are used with NoSQL databases.

Relational databases have struggled to solve some complex modern problems, such as:

  1. The continuously changing nature of data: structured, semi-structured, unstructured and polymorphic data.
  2. Applications now serve millions of users in different geographic locations and time zones, and must stay up and running all the time while maintaining data integrity.
  3. Applications are becoming more distributed, with many moving to cloud computing.

NoSQL plays a vital role in enterprise applications that need to access and analyze massive data sets spread across multiple virtual servers hosted remotely in cloud infrastructure, especially when the data is not structured.

NoSQL databases are therefore designed to overcome the performance, scalability, data modeling and distribution limitations of relational databases. You can also watch the following keynote to learn more about NoSQL.

– Martin’s keynote “Introduction to NoSQL”

Characteristics of NoSQL

  • It’s more than rows in tables—NoSQL systems store and retrieve data from many formats: key-value stores, graph databases, column-family (Bigtable) stores, document stores, and even rows in tables.
  • It’s free of joins—NoSQL systems allow you to extract your data using simple interfaces without joins.
  • It’s schema-free—NoSQL systems allow you to drag-and-drop your data into a folder and then query it without creating an entity-relational model.
  • It works on many processors—NoSQL systems allow you to store your database on multiple processors and maintain high-speed performance.
  • It uses shared-nothing commodity computers—Most (but not all) NoSQL systems leverage low-cost commodity processors that have separate RAM and disk.
  • It supports linear scalability—When you add more processors, you get a consistent increase in performance.
  • It’s innovative—NoSQL offers options to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means “Not only SQL”.
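As a rough illustration of how data can be spread across shared-nothing nodes, the sketch below assigns each key to a node by hashing it. The node names and the `node_for_key`, `put` and `get` helpers are hypothetical and not taken from any particular product; real systems usually rely on more elaborate schemes, such as consistent hashing, so that adding a node does not reshuffle most keys.

```python
import hashlib

# Hypothetical node names; a real cluster would discover its members dynamically.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes=NODES) -> str:
    """Pick the node that owns `key` using simple hash-modulo sharding."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Each node keeps only its own slice of the data (shared-nothing).
shards = {name: {} for name in NODES}


def put(key: str, value) -> None:
    """Store the value on whichever node owns the key."""
    shards[node_for_key(key)][key] = value


def get(key: str):
    """Look the key up on the single node that owns it."""
    return shards[node_for_key(key)].get(key)


for i in range(1000):
    put(f"user:{i}", {"id": i})

print({name: len(data) for name, data in shards.items()})
```

Because every key maps to exactly one node, a lookup touches a single machine, which is what lets this style of system add capacity by adding nodes.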

ACID and BASE

When we talk about NoSQL databases, data consistency models can sometimes be strikingly different from those used by relational databases. The two consistency models are known by the acronyms ACID and BASE.

We will discuss the key differences between the ACID and BASE data consistency models.

The ACID Consistency Model

The key ACID guarantee is that it provides a safe environment in which to operate on your data. The ACID acronym stands for:

  • Atomic
    • All operations in a transaction succeed or every operation is rolled back.
  • Consistent
    • On the completion of a transaction, the database is structurally sound.
  • Isolated
    • Transactions do not contend with one another. Contentious access to data is moderated by the database so that transactions appear to run sequentially.
  • Durable
    • The results of applying a transaction are permanent, even in the presence of failures.

Some NoSQL databases also use an ACID consistency model to ensure data is stored safely and consistently.
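The atomicity guarantee can be seen in a small sketch. It uses Python's built-in `sqlite3` module (a relational engine, chosen only because it ships with Python) and a hypothetical `accounts` table; the `transfer` and `balances` helpers are illustrative names, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This update violates the CHECK constraint when src is overdrawn.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the second call the credit to `bob` has already been applied when the debit fails, yet the rollback undoes it, so the balances are left exactly as they were before the transaction started.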

The BASE Consistency Model

In the NoSQL world, ACID transactions are less fashionable as some databases have loosened the requirements for immediate consistency, data freshness and accuracy in order to gain other benefits, like scale and resilience.

Here’s how the BASE acronym breaks down:

  • Basic Availability
    • The database appears to work most of the time.
  • Soft-state
    • Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
  • Eventual consistency
    • Stores exhibit consistency at some later point (e.g., lazily at read time).

BASE properties are much looser than ACID guarantees, but there isn’t a direct one-for-one mapping between the two consistency models. 

A BASE datastore values availability (since that’s important for scale), but it doesn’t offer guaranteed consistency of replicated data at write time. Overall, the BASE consistency model provides a less strict assurance than ACID: data will become consistent at some point in the future, for example at read time, but it is not guaranteed to be consistent at every moment.
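Eventual consistency can be sketched with a toy model. The `Replica` and `EventuallyConsistentStore` classes below are hypothetical and run in a single process; they assume a last-writer-wins rule based on a simple counter, which is only one of several reconciliation strategies real stores use.

```python
class Replica:
    """One copy of the data; stores a (version, value) pair per key."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        # Last-writer-wins: keep whichever entry has the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.clock = 0  # toy version counter, an assumption of this sketch

    def write(self, key, value):
        # Acknowledge after updating one replica: available, not yet consistent.
        self.clock += 1
        self.replicas[0].apply(key, self.clock, value)

    def read_from(self, index, key):
        """Read from one specific replica, which may be stale."""
        entry = self.replicas[index].data.get(key)
        return None if entry is None else entry[1]

    def read_repair(self, key):
        """Reconcile lazily at read time: push the newest version everywhere."""
        entries = [r.data[key] for r in self.replicas if key in r.data]
        if not entries:
            return None
        version, value = max(entries, key=lambda e: e[0])
        for replica in self.replicas:
            replica.apply(key, version, value)
        return value


store = EventuallyConsistentStore()
store.write("cart:7", ["book"])
print(store.read_from(2, "cart:7"))  # stale replica has not seen the write yet
print(store.read_repair("cart:7"))   # reconciles the replicas
print(store.read_from(2, "cart:7"))  # now agrees with the others
```

The write returns before every replica has the value (soft state), and the replicas only converge once the repair step runs, which is the "lazily at read time" case mentioned above.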

List of NoSQL databases

Here is a list of some free and widely used NoSQL databases:

1.MongoDB

MongoDB is a highly scalable, agile and high-performance NoSQL database. It is open source, written in C++, and uses document-oriented storage.

2.Redis

Redis is an advanced open source key-value store. Because a key can hold strings, hashes, lists, sets and sorted sets, Redis is also called a data structure server.

3.CouchDB

CouchDB is an Apache project and a powerful database for JSON-based web applications. It provides an API to store JSON objects as documents in the database, and you can use JavaScript to run MapReduce queries on CouchDB. It also provides a very convenient web-based administration console.

4.RavenDB

RavenDB is a second-generation open source database. It is document oriented and schema free, so you can simply store your objects in it. It offers full support for ACID transactions to keep your data safe, along with easy extensibility via bundles and high performance.

5.MemcacheDB

MemcacheDB is a distributed key-value storage system. It should not be confused with a cache solution; rather, it is a persistent storage engine meant for fast and reliable data storage and retrieval.

6.Riak

It provides for easy and predictable scaling and equips users with the ability for quick testing, prototyping and application deployment so as to simplify development.

7.HBase

HBase is a scalable, distributed big data store. It can be used when you need real-time, random access to your data. It offers modular and linear scalability along with strictly consistent reads and writes.

8.Perst

This is an object oriented DBMS that is open source and has a dual license. With this, you will be able to store, sort and retrieve data in your applications with low overhead storage and memory and very high speed.

9.HyperGraphDB

This is an open source data storage system that is extensible, distributed, general purpose, portable and embeddable. It is a graph database mostly meant for AI, Semantic Web projects and knowledge representation. However, it can also handle Java projects of different sizes.

Types of NoSQL databases

Several different varieties of NoSQL databases have been created to support specific needs and use cases. These databases can broadly be categorised into four types.

[Figure 1: Column based family]

1.Key-value store NoSQL database

From an API perspective, key-value stores are the simplest NoSQL data stores to use. The client can get the value for the key, assign a value for a key or delete a key from the data store. The value is a blob that the data store just stores, without caring or knowing what’s inside; it’s the responsibility of the application to understand what was stored. Since key-value stores always use primary-key access, they generally have great performance and can be easily scaled.

The key-value database uses a hash table to store unique keys and pointers (in some databases it’s also called the inverted index) with respect to each data value it stores. There are no column type relations in the database; hence, its implementation is easy. Key-value databases give great performance and can be very easily scaled as per business needs.
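The get, assign and delete operations described above can be sketched with a toy in-memory store. `KeyValueStore` is a hypothetical class built on a Python dict (itself a hash table), not the API of any real product; the JSON encoding is the application's own choice, since the store treats the value as an opaque blob.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes to the store."""

    def __init__(self):
        self._table = {}  # a Python dict is a hash table

    def put(self, key: str, value: bytes) -> None:
        """Assign a value to a key, replacing any previous value."""
        self._table[key] = value

    def get(self, key: str, default=None):
        """Return the value for the key, or `default` if it is absent."""
        return self._table.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove the key; return True if it existed."""
        return self._table.pop(key, None) is not None


store = KeyValueStore()

# The application, not the store, decides how the blob is encoded.
session = {"user_id": 7, "cart": ["sku-1", "sku-2"]}
store.put("session:7", json.dumps(session).encode("utf-8"))

restored = json.loads(store.get("session:7").decode("utf-8"))
print(restored["cart"])
```

Note that changing a single field of the session means reading, decoding, modifying and rewriting the whole blob, because the store itself has no view into the value.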

Uses: Here are some popular uses of the key-value databases:

  • For storing user session data
  • Maintaining schema-less user profiles
  • Storing user preferences
  • Storing shopping cart data

However, key-value databases are not the ideal choice when:

  • We have to query the database by specific data value.
  • We need relationships between data values.
  • We need to operate on multiple unique keys.
  • We need to update only part of a value frequently.

Examples of this database are Redis, MemcacheDB and Riak.

2.Document store NoSQL database

Document store NoSQL databases are similar to key-value databases in that there’s a key and a value. Data is stored as a value. Its associated key is the unique identifier for that value. The difference is that, in a document database, the value contains structured or semi-structured data. This structured/semi-structured value is referred to as a document and can be in XML, JSON or BSON format.
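The difference from a plain key-value store can be shown with a small sketch: because the value is a structured document, the store can look inside it. The `DocumentStore` class and its `insert`, `get` and `find` methods are hypothetical names for illustration and only match on top-level fields.

```python
import json
import uuid


class DocumentStore:
    """Toy document store: unique key -> JSON-like document."""

    def __init__(self):
        self._docs = {}

    def insert(self, document: dict) -> str:
        """Store a copy of the document and return its generated key."""
        doc_id = str(uuid.uuid4())
        # Round-tripping through JSON checks serializability and copies the data.
        self._docs[doc_id] = json.loads(json.dumps(document))
        return doc_id

    def get(self, doc_id: str):
        """Fetch a document by its unique key."""
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return documents whose top-level fields equal all given criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


db = DocumentStore()
# Documents in the same store do not need to share a schema.
first = db.insert({"type": "post", "title": "Hello", "tags": ["nosql"]})
db.insert({"type": "post", "title": "Second", "author": {"name": "Ann"}})
db.insert({"type": "comment", "text": "Nice"})

print(db.get(first)["title"])
print(len(db.find(type="post")))
```

Lookup by key works exactly as in a key-value store, while `find` is only possible because the store understands the structure of the value.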

Uses: Document store databases are preferable for:

  • E-commerce platforms
  • Content management systems
  • Analytics platforms
  • Blogging platforms

Document store NoSQL databases are not the right choice if you have to run complex search queries or if your application requires complex multi-operation transactions.

Examples of document store NoSQL databases are MongoDB, Apache CouchDB and Elasticsearch.

3.Column oriented NoSQL database

In column-oriented NoSQL databases, data is stored in cells grouped in columns of data rather than as rows of data. Columns are logically grouped into column families, which can contain a virtually unlimited number of columns that can be created at runtime or while defining the schema. Reads and writes are done using columns rather than rows. Column families are groups of similar data that are usually accessed together. As an example, we often access customers’ names and profile information at the same time, but not the information on their orders.

The main advantages of storing data in columns rather than in a relational DBMS are fast search/access and data aggregation. Relational databases store a single row as a contiguous disk entry, and different rows are stored in different places on the disk. Columnar databases instead store all the cells corresponding to a column as a contiguous disk entry, which makes search/access faster.

Each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns. The difference is that various rows do not have to have the same columns, and columns can be added to any row at any time without having to add them to other rows.
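A minimal sketch of this row-key-to-columns layout follows. The `ColumnFamily` class is a hypothetical in-memory model, not the storage engine of any real system; it only shows that rows in the same family may carry different columns.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Add or overwrite one column on one row; other rows are untouched."""
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        """Return a copy of all columns stored for the row."""
        return dict(self.rows.get(row_key, {}))

    def get_column(self, column):
        """Return {row key: value} for every row that has this column."""
        return {
            key: cols[column] for key, cols in self.rows.items() if column in cols
        }


profile = ColumnFamily("profile")
profile.put("cust1", "name", "Ann")
profile.put("cust1", "email", "ann@example.com")
profile.put("cust2", "name", "Bob")
profile.put("cust2", "twitter", "@bob")  # a column that cust1 does not have

print(profile.get_row("cust2"))
print(profile.get_column("name"))
```

Adding the `twitter` column to one row required no schema change and left the other row as it was.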

Uses: Developers mainly use column databases in:

  • Content management systems
  • Blogging platforms
  • Systems that maintain counters
  • Services that have expiring usage
  • Systems that require heavy write requests (like log aggregators)

Column store databases should be avoided if you have to use complex querying or if your querying patterns frequently change. Also avoid them if you don’t have an established database requirement, a trend which we are beginning to see in new systems.

Examples of column store NoSQL databases are Cassandra and Apache HBase.

4.Graph-based NoSQL database

Graph databases are basically built upon the Entity – Attribute – Value model. Entities are also known as nodes, which have properties. It is a very flexible way to describe how data relates to other data. Nodes store data about each entity in the database, relationships describe the connections between nodes, and properties are key-value attributes attached to a node (and, in many graph databases, to a relationship as well). Whereas a traditional database stores a description of each possible relationship in foreign key fields or junction tables, graph databases allow virtually any relationship to be defined on the fly.
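The node and relationship model can be sketched in a few lines. The `Graph` class, the `FRIEND` and `WORKS_AT` relationship types and the sample data are all hypothetical; real graph databases provide query languages and indexes rather than the linear scans used here.

```python
class Graph:
    """Toy property graph held in memory."""

    def __init__(self):
        self.nodes = {}  # node id -> properties dict
        self.edges = []  # (source id, relationship type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, source, rel_type, target, **properties):
        """Add a directed relationship; any relationship type is accepted."""
        self.edges.append((source, rel_type, target, properties))

    def neighbours(self, node_id, rel_type):
        """Return the targets of outgoing relationships of the given type."""
        return [t for s, r, t, _ in self.edges if s == node_id and r == rel_type]

    def friends_of_friends(self, node_id):
        """Two-hop traversal: friends of friends who are not already friends."""
        direct = set(self.neighbours(node_id, "FRIEND"))
        result = set()
        for friend in direct:
            result.update(self.neighbours(friend, "FRIEND"))
        return result - direct - {node_id}


g = Graph()
for name in ("alice", "bob", "carol"):
    g.add_node(name, kind="person")
g.relate("alice", "FRIEND", "bob")
g.relate("bob", "FRIEND", "alice")
g.relate("bob", "FRIEND", "carol")

# A new kind of relationship needs no schema change.
g.add_node("acme", kind="company")
g.relate("alice", "WORKS_AT", "acme", since=2020)

print(g.friends_of_friends("alice"))
```

The `WORKS_AT` relationship is introduced without declaring it anywhere first, which is the on-the-fly flexibility that foreign keys and junction tables do not offer.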

Uses: Graph-based NoSQL databases are usually used in:

  • Fraud detection
  • Graph based search
  • Network and IT operations
  • Social networks, etc

Examples of graph-based NoSQL databases are Neo4j, ArangoDB and OrientDB.

Advantages and Drawbacks

NoSQL databases have been widely adopted in many enterprises for the following reasons:

  • Elastic scalability
    • RDBMSs are not as easy to scale out on commodity clusters, whereas NoSQL databases are made for transparent expansion, taking advantage of new nodes. These databases are designed for use with low-cost commodity hardware. In a world where upward scalability is being replaced by outward scalability, NoSQL databases are a better fit.
  • Big data applications
    • Given that transaction rates are growing beyond recognition, there is a need to store massive volumes of data. While RDBMSs have grown to match these needs, it is difficult to realistically use one RDBMS to manage such data volumes. These volumes are, however, easily handled by NoSQL databases.
  • Database administration
    • The best RDBMSs require the services of expensive administrators to design, install and maintain the systems. On the other hand, NoSQL databases require much less hands-on management, with data distribution and auto repair capabilities, simplified data models and fewer tuning and administration requirements. However, in practice, someone will always be needed to take care of performance and availability of databases.
  • Economy
    • RDBMSs require installation of expensive storage systems and proprietary servers, while NoSQL databases can be easily installed in cheap commodity hardware clusters as transaction and data volumes increase. This means that you can process and store more data at much less cost. 

NoSQL has many uses and advantages, but there are still many obstacles that must be overcome before it is fully accepted among the more established enterprises.

Below are a few of these obstacles:

1.Less mature

RDBMSs have been around a lot longer than NoSQL databases. The first RDBMSs were released into the market decades ago. While proponents of NoSQL may present this as a disadvantage, citing age as an indicator of obsolescence, RDBMSs have matured over the years to become richly functional and stable systems.

In contrast, many of the NoSQL alternatives have only recently made it out of the pre-production stages, and there are important features that have not yet been implemented. It’s an exciting prospect for a developer to be on the cutting edge of technology, but caution must be exercised to avoid disastrous consequences.

2.Less support

All enterprises need the reassurance that, should a key function within their data management system fail, they will have access to competent support in a timely manner. All the RDBMS vendors have made great efforts to ensure that such services are available, and enterprises can also enlist 24-hour support from remote database administration services, which have the expertise to handle most RDBMSs.

Each NoSQL database in contrast tends to be open-source, with just one or two firms handling the support angle. Many of them have been developed by smaller startups which lack the resources to fund support on a global scale, and also the credibility that the established RDBMS vendors like Oracle, IBM and Microsoft enjoy.

3.Business intelligence and analytics

NoSQL databases were created with the demands of modern Web 2.0 applications in mind. As such, most features are directed at meeting these demands. Where the demands of a data app extend beyond the characteristic ‘insert-read-update-delete’ cycle of a typical web app, these databases offer few features for analysis and ad-hoc querying.

Even simple queries require some programming knowledge, and the most common business intelligence tools that many enterprises rely on do not offer connectivity to NoSQL databases. However, this may be solved in time, as tools like Pig and Hive have been created to offer ad-hoc query functionality for NoSQL databases.

4.Administration

The end goal for NoSQL database design was to offer a solution that would require no administration, but the reality on the ground is much different. NoSQL databases still demand a lot of technical skill with both installation and maintenance.

5.No advanced expertise

Because NoSQL databases are still new, virtually every NoSQL developer out there is still learning the ropes, unlike RDBMSs, which have millions of proficient developers throughout the market and in every field of trade. Over time, this situation will resolve itself, but presently, it remains easier to find an RDBMS expert than a NoSQL expert.

Any organization that wants to implement NoSQL solutions needs to proceed with caution, bearing in mind the above limitations in addition to understanding the benefits that NoSQL databases offer over their relational counterparts.