Python Simplified


Introduction to Graph Databases


Introduction

Databases were first introduced in the 1960s and have been evolving ever since. Until around 2000, relational database systems dominated the field. From the 2000s onward, we started seeing a new type of database system: non-relational, or NoSQL, databases.

In relational database systems, data is organized into tables of rows and columns, and the SQL query language is used to retrieve data from the database.

Non-relational, or NoSQL, databases come in different flavors, such as document databases, key-value stores, column-oriented databases, and graph databases.

The goal of this article is to give you an introduction to graph databases and to explain when you should and shouldn’t consider using one.

What is a Graph Database?

A graph database is a NoSQL database based on graph theory. It uses nodes, edges (relationships), and properties to represent, store, and query data and the relationships between data. Graph databases treat the relationships between nodes as being just as important as the data itself.

Relationships are first-class citizens in a graph database.

Nodes

Each node represents an entity, such as a person or a place. You can think of a node as roughly equivalent to a record (row) in a relational database. A node can have incoming edges, outgoing edges, both, or none at all.

Edges

Edges are also called relationships. They are the lines that connect nodes, and each one represents the relationship between the two nodes it connects. One of the biggest differences between relational and graph databases is how relationships are stored: a relational database does not store relationships as records of their own; it expresses them through foreign keys and reconstructs them at query time with joins. A graph database, by contrast, stores each relationship directly.
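To make the difference concrete, here is a minimal Python sketch on a tiny, made-up dataset (the `people`, `knows_rows`, `friends_relational`, and `friends_graph` names are hypothetical, for illustration only). In the relational style, the relationship lives in a separate table of foreign-key pairs that has to be matched up at query time, much like a join; in the graph style, each node holds direct references to its neighbors, so following a relationship is a simple lookup. This only models the idea in plain Python: a real relational database would use indexes to speed up the join, and no specific database engine works exactly like this.

```python
# Hypothetical data for illustration only.

# Relational style: a "people" table plus a separate "knows" table
# whose rows are foreign-key pairs (person_id, friend_id).
people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
knows_rows = [(1, 2), (2, 1), (1, 3)]


def friends_relational(name: str) -> list[str]:
    """Find friends by matching foreign keys at query time (join-like)."""
    person_id = next(p["id"] for p in people if p["name"] == name)
    friend_ids = {friend for person, friend in knows_rows if person == person_id}
    return [p["name"] for p in people if p["id"] in friend_ids]


# Graph style: each node stores direct references to the nodes it knows.
class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.knows: list["Person"] = []


alice, bob, carol = Person("Alice"), Person("Bob"), Person("Carol")
alice.knows += [bob, carol]
bob.knows.append(alice)


def friends_graph(person: Person) -> list[str]:
    """Follow the stored references directly; no matching step is needed."""
    return [friend.name for friend in person.knows]


print(friends_relational("Alice"))  # ['Bob', 'Carol']
print(friends_graph(alice))         # ['Bob', 'Carol']
```

Both functions return the same answer; the difference is that the relational version has to reconstruct the relationship from the link table, while the graph version simply follows it.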

Properties

Properties are key-value pairs that store information about nodes and edges.
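The three building blocks can be modeled in a few lines of Python. The sketch below is a simplified, in-memory property graph, assuming that nodes and edges each carry a dictionary of key-value properties and that every edge has a type and a direction. The class names `Node`, `Edge`, and `PropertyGraph` are hypothetical and do not come from any graph database library, and the data in the usage part is made up.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: int
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: int  # id of the node where the edge starts
    target: int  # id of the node the edge points to
    type: str    # relationship type, e.g. "lives_in"
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict[int, Node] = field(default_factory=dict)
    # Outgoing edges are stored per node, so following one is a dictionary lookup.
    out_edges: dict[int, list[Edge]] = field(default_factory=dict)

    def add_node(self, node_id: int, **properties: Any) -> Node:
        node = Node(node_id, properties)
        self.nodes[node_id] = node
        self.out_edges.setdefault(node_id, [])
        return node

    def add_edge(self, source: int, target: int, rel_type: str, **properties: Any) -> Edge:
        # An edge may only connect nodes that already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown endpoint in edge {source} -> {target}")
        edge = Edge(source, target, rel_type, properties)
        self.out_edges[source].append(edge)
        return edge

    def outgoing(self, node_id: int) -> list[Edge]:
        return self.out_edges.get(node_id, [])


# Usage with made-up data: a person node, a place node, and one edge between them.
g = PropertyGraph()
g.add_node(1, name="Alice", age=30)
g.add_node(2, name="London")
g.add_edge(1, 2, "lives_in", since=2015)

for edge in g.outgoing(1):
    print(g.nodes[edge.source].properties["name"], edge.type,
          g.nodes[edge.target].properties["name"], edge.properties)
# Alice lives_in London {'since': 2015}
```

Note that the edge carries its own property (`since`), just like the nodes do; this is what distinguishes a property graph from a plain graph of connected IDs.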

Graph database example
Source: Wikipedia

In the above diagram, there are three nodes, and each node has its own properties. The nodes with Id 1 and Id 2 each have three properties: Id, Name, and Age. The third node also has three properties: Id, Type, and Name.

The lines connecting these nodes are called edges or relationships, and they are easy to read. Alice has known Bob since 2001/10/03, and Bob has known Alice since 2001/10/04. Both Alice and Bob belong to the group named Chess; this is shown with the is_member and Members relationships.
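The example from the diagram can be written down as plain Python data and queried by following edges. This is a minimal sketch: the node and edge lists transcribe the relationships described above, while the group's Id of 3 and the direction of the Members edges (from the group to each person) are assumptions, and the Age values are left as `None` because the text does not state them. The `related` helper is hypothetical and scans a plain list for simplicity.

```python
# Node properties keyed by Id. The text does not give the Age values, so they are None.
nodes = {
    1: {"Name": "Alice", "Age": None},
    2: {"Name": "Bob", "Age": None},
    3: {"Type": "Group", "Name": "Chess"},  # Id 3 is an assumed value
}

# Edges as (source Id, relationship type, target Id, edge properties).
edges = [
    (1, "knows", 2, {"Since": "2001/10/03"}),
    (2, "knows", 1, {"Since": "2001/10/04"}),
    (1, "is_member", 3, {}),
    (2, "is_member", 3, {}),
    (3, "Members", 1, {}),  # assumed direction: group -> member
    (3, "Members", 2, {}),
]


def related(node_id: int, rel_type: str) -> list[tuple[str, dict]]:
    """Names (and edge properties) of nodes reached from node_id via rel_type edges."""
    return [
        (nodes[target]["Name"], props)
        for source, rel, target, props in edges
        if source == node_id and rel == rel_type
    ]


print(related(1, "knows"))      # [('Bob', {'Since': '2001/10/03'})]
print(related(2, "is_member"))  # [('Chess', {})]
print(related(3, "Members"))    # [('Alice', {}), ('Bob', {})]
```

Because "knows" is directional, Alice's and Bob's relationships are two separate edges, each with its own Since date.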

When should/shouldn't you use a graph database?

Jennifer Reif answers this very question in an article in the Neo4j publication, and she explains it well. I will highlight the key points here; for a detailed explanation, refer to her article.

Cases where graph databases are NOT a good choice

  • If the data is highly disconnected, and relationships between the data don’t matter (for example, customer transaction data), you don’t need a graph database.
  • If you only need to store data and run simple queries, you don’t need a graph database.
  • If your data structure is fixed and consistent, there is no need for a graph database; graph databases are best suited to varied data and changing business needs.
  • Graph databases are not well suited to bulk data scans, because they are optimized for following relationships rather than for reading large amounts of data in one pass.
  • If you need to store and retrieve entity properties that contain extremely large values (such as BLOBs or CLOBs), a graph database is not an ideal solution.

Cases where graph databases are a good choice

  • If you are dealing with highly connected data, such as friend connections on Facebook, you should consider a graph database, because graph databases are purpose-built to handle highly connected data.
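Queries over highly connected data are typically traversals: start at one node and follow relationships outward. As a rough illustration, the sketch below finds friend-of-friend suggestions with a breadth-first search over a small, made-up friendship graph (the names and the `within_hops` and `suggest_friends` functions are hypothetical). In a graph database you would express the same idea in its query language rather than in application code.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave", "Erin"},
    "Dave": {"Bob", "Carol"},
    "Erin": {"Carol"},
}


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: each reachable person mapped to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph[person]:
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


def suggest_friends(graph: dict[str, set[str]], person: str) -> list[str]:
    """People exactly two hops away: friends of friends who are not already friends."""
    return sorted(p for p, d in within_hops(graph, person, 2).items() if d == 2)


print(suggest_friends(friends, "Alice"))  # ['Dave', 'Erin']
```

Because breadth-first search records the shortest distance first, Alice's direct friends (distance 1) are never reported as distance-2 suggestions.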

Graph Database Use cases

Graph databases are used across a wide range of industries by thousands of companies around the world. Here are some of the use cases where graph databases are applied:

  • Fraud or Anomaly Detection
  • Real-Time Recommendations
  • Graph-Based Search
  • Social Networks
  • Machine Learning
  • Identity and Access Management, etc.

Graph Databases

According to the DB-Engines ranking (db-engines.com), at the time of writing Neo4j is the most popular graph database, followed by Microsoft Azure Cosmos DB and ArangoDB.

Graph database rankings
Source: db-engines.com
Graph database rankings (continued)
Source: db-engines.com

If you are new to graph databases, my advice is to start with Neo4j: it is the most commonly used graph database, and there are plenty of resources for getting started with it. The official Neo4j documentation alone is a great starting point for learning about graph databases.

Pros and Cons of Graph Databases

Pros

  • Graph databases can answer queries over highly connected data in real time; as discussed above, they are designed primarily for this kind of data.
  • For relationship-heavy queries, performance holds up well as the dataset grows, because a traversal only touches the nodes and relationships it actually reaches rather than the entire dataset. This makes graph databases a strong fit for real-time queries over connected data that grows rapidly.
  • Graph databases make it easier to keep up with changing business needs, because new kinds of nodes, relationships, and properties can be added without the schema migrations a relational database typically requires.
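Why traversal queries can stay fast as data grows can be illustrated with a toy experiment: if each node keeps direct references to its neighbors, a two-hop query only visits the nodes it actually reaches, no matter how many other nodes exist. The sketch below (with hypothetical helpers `build_graph` and `two_hop_reach`) embeds the same small neighborhood in graphs of very different total sizes. It models the idea of stored adjacency only; real performance depends on the database engine, its indexes, and the hardware.

```python
def build_graph(extra_nodes: int) -> dict[int, list[int]]:
    """A fixed neighborhood around node 0, plus many unrelated nodes."""
    graph = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: []}
    # Unrelated nodes arranged in a chain that never connects to node 0.
    first = 5
    last = first + extra_nodes
    for n in range(first, last):
        graph[n] = [n + 1] if n + 1 < last else []
    return graph


def two_hop_reach(graph: dict[int, list[int]], start: int) -> set[int]:
    """Nodes reachable from start within two hops, following stored neighbor lists."""
    visited = {start}
    frontier = [start]
    for _ in range(2):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


for size in (10, 100_000):
    graph = build_graph(size)
    reached = two_hop_reach(graph, 0)
    print(f"{len(graph)} nodes in graph, {len(reached)} nodes visited")
# 15 nodes in graph, 5 nodes visited
# 100005 nodes in graph, 5 nodes visited
```

The query does the same amount of work in both cases because it never looks at nodes outside the neighborhood it starts from.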

Cons

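  • Graph databases don’t share a uniform query language. For example, Neo4j uses Cypher, Azure Cosmos DB’s graph API uses Gremlin, and ArangoDB uses AQL. So if you switch from one graph database to another, you will usually have to learn a new query language.
  • Graph databases are generally weaker than relational or analytical databases at aggregating large volumes of data, so they are usually not the best choice for business intelligence workloads.
  • Graph databases are harder to scale out horizontally: when a highly connected graph is split across machines, many traversals have to follow relationships from one machine to another.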
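One way to see why scaling out is hard: when a graph’s nodes are spread across several machines, every edge whose endpoints land on different machines turns a local lookup into a network hop. The sketch below assigns nodes to machines with a naive modulo rule (a hypothetical partitioning scheme chosen for simplicity; real systems use smarter strategies) and measures what fraction of edges in a randomly generated graph cross machine boundaries. The `random_edges` and `cross_partition_fraction` helpers are illustrative only.

```python
import random


def random_edges(num_nodes: int, num_edges: int, seed: int = 42) -> list[tuple[int, int]]:
    """Generate a reproducible random graph as a list of (u, v) edges without self-loops."""
    rng = random.Random(seed)
    edges = []
    while len(edges) < num_edges:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v:
            edges.append((u, v))
    return edges


def cross_partition_fraction(edges: list[tuple[int, int]], num_machines: int) -> float:
    """Fraction of edges whose endpoints are assigned to different machines."""
    # Naive modulo partitioning: node n lives on machine n % num_machines.
    crossing = sum(1 for u, v in edges if u % num_machines != v % num_machines)
    return crossing / len(edges)


edges = random_edges(num_nodes=10_000, num_edges=50_000)
for machines in (1, 2, 4, 8):
    share = cross_partition_fraction(edges, machines)
    print(f"{machines} machine(s): {share:.0%} of edges cross machines")
```

For randomly connected data and k machines, the crossing share comes out close to 1 - 1/k, so each machine added pushes a larger share of relationship hops onto the network. Real-world graphs often have structure that smarter partitioning can exploit, but highly connected data makes that hard.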

Conclusion

I hope you now understand what graph databases are, which graph databases are available, and when you should and shouldn’t use one. The next time you have to deal with interconnected data, you can choose the graph database that best suits your needs.

Chetan Ambi

A Software Engineer & Team Lead with over 10 years of IT experience, and a technical blogger with a passion for cutting-edge technology. Currently working in the fields of Python, machine learning, and data science. Chetan Ambi holds a Bachelor of Engineering degree in Computer Science.