Graph Databases – Why relationships need real focus
Graph databases are not a new technology, but they tend to find themselves at the cutting edge of the field. They have become especially relevant in recent years as our world grows ever more connected. Let’s take a look at why that is and what exactly a graph database is.
What is a graph database?
A graph database places as much importance on the relationships between data points as on the data points themselves. The term ‘graph’ here refers to the structure in which the data is stored: a set of nodes, representing the data items, connected by edges, representing the relationships between them.
It is an inherently flexible model, because the data isn’t forced into a predefined structure. Instead, it is stored much as you would first draw it out, showing the connections and relationships between entities.
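To make that flexibility concrete, here is a minimal sketch in plain Python, assuming nothing more than dictionaries and tuples. Each node is a free-form record, so two different kinds of entity can sit in the same graph with different fields and no schema declared up front. The names `nodes`, `edges`, `add_node` and `add_edge` are hypothetical, chosen for illustration rather than taken from any real graph database.

```python
# Minimal sketch: nodes as free-form records, edges as (from, relationship, to).
# add_node/add_edge are hypothetical helpers, not a real database API.
from typing import Any

nodes: dict[str, dict[str, Any]] = {}
edges: list[tuple[str, str, str]] = []


def add_node(node_id: str, **fields: Any) -> None:
    # No schema check: each node keeps whatever fields it was given.
    nodes[node_id] = dict(fields)


def add_edge(from_id: str, relationship: str, to_id: str) -> None:
    # Only connect nodes that already exist in the graph.
    if from_id not in nodes or to_id not in nodes:
        raise KeyError("both nodes must exist before they can be connected")
    edges.append((from_id, relationship, to_id))


# Two different kinds of entity live in the same graph with different fields.
add_node("alice", kind="person", name="Alice")
add_node("acme", kind="business", name="Acme Ltd", founded=1999)
add_edge("alice", "WORKS_AT", "acme")

# A new field can be added to one node later without touching any other node.
nodes["alice"]["shoe_size"] = 9
```

Real graph databases vary in how much structure they let you enforce; this sketch simply shows the schema-free end of that range.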
In practice, graph databases often hold less complete information about each item than a relational database would, because their main function is to map the relationships between items.
Graph databases are modelled in large part on graph theory, the branch of mathematics that studies sets of points and the connections between them. In moving into computer science, the field has carried over much of graph theory’s terminology and concepts, whilst developing many of its own.
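For readers who want the mathematical starting point, graph theory usually defines a graph as a pair of sets: vertices V (what graph databases call nodes) and edges E. The notation below is the standard textbook form, included as a brief reference rather than anything specific to a particular graph database:

```latex
G = (V, E)
% Undirected graph: each edge is an unordered pair, so \{u, v\} = \{v, u\}
E \subseteq \bigl\{\, \{u, v\} : u, v \in V \,\bigr\}
% Directed graph: each edge is an ordered pair, so (u, v) \neq (v, u) whenever u \neq v
E \subseteq V \times V
```

The difference between the unordered pair and the ordered pair is exactly the difference between undirected and directed edges.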
What elements make up a graph database?
Given the relative complexity of the topic, it’s worth breaking down the main elements involved: what each one is called and what it does. This won’t be an exhaustive list, but it should give you an understanding of the main points.
Nodes
In mathematical graph theory a node is called a vertex, but in graph databases it is referred to as a node. A node is the basic unit of the data structure: it can contain data and can be linked to other nodes.
They are used to represent entities, such as people, businesses, accounts or any other item that needs to be tracked.
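As a rough illustration, a node can be modelled as an identifier, some data, and a list of links to other nodes. The `Node` class and its field names below are hypothetical; a real graph database stores nodes in its own internal format.

```python
# Minimal sketch of a node: an identifier, some data, and links to other nodes.
# The Node class and its fields are hypothetical, chosen for illustration.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # compare nodes by identity, not by their contents
class Node:
    node_id: str
    data: dict[str, Any] = field(default_factory=dict)
    # repr=False keeps printing safe even if nodes end up linking in a cycle
    links: list[Node] = field(default_factory=list, repr=False)

    def link_to(self, other: Node) -> None:
        """Create a one-way link from this node to another node."""
        self.links.append(other)


# Entities of different kinds: a person, a business and an account.
alice = Node("alice", {"type": "person", "name": "Alice"})
acme = Node("acme", {"type": "business", "name": "Acme Ltd"})
savings = Node("acc-001", {"type": "account", "balance": 250})

alice.link_to(acme)
alice.link_to(savings)

# Following the links reaches the related entities directly.
print([n.node_id for n in alice.links])  # ['acme', 'acc-001']
```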
Edges
A connection between two nodes is called an edge. Each node may have a single edge or many. How an edge should be read depends on whether it is undirected or directed.
In an undirected graph, each edge has only one meaning, which applies equally from both ends. The relationship it represents is symmetric, so it can be traversed in either direction.
In a directed graph, each edge’s meaning depends on the direction in which it runs. This is more complex, because whether an edge points from a node or to it changes what the edge means: ‘Alice follows Bob’ does not imply ‘Bob follows Alice’.
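One common way to see the difference is with an adjacency list: an undirected edge is recorded from both ends, a directed edge only from its source. The sketch below uses a hypothetical `Graph` class, with a symmetric ‘married to’ relationship stored as undirected and a one-way ‘follows’ relationship stored as directed.

```python
# Sketch: one adjacency-list structure used in two modes.
# The Graph class is hypothetical, written only to illustrate edge direction.
from collections import defaultdict


class Graph:
    def __init__(self, directed: bool) -> None:
        self.directed = directed
        self.adjacency: defaultdict[str, set[str]] = defaultdict(set)

    def add_edge(self, a: str, b: str) -> None:
        self.adjacency[a].add(b)
        if not self.directed:
            # Undirected: the relationship holds equally from either end.
            self.adjacency[b].add(a)

    def neighbours(self, node: str) -> set[str]:
        # .get avoids creating empty entries for nodes with no outgoing edges
        return set(self.adjacency.get(node, set()))


married = Graph(directed=False)
married.add_edge("alice", "bob")   # "Alice is married to Bob"

follows = Graph(directed=True)
follows.add_edge("alice", "bob")   # "Alice follows Bob"

print(married.neighbours("bob"))   # {'alice'}
print(follows.neighbours("bob"))   # set()
```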
Properties
Properties are pieces of information attached to nodes and edges, usually stored as key-value pairs. For example, if one of the nodes was ‘Atlas’, it might have properties describing it as a company, agile, a software developer, a brand name, purple, or an innovator.
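The sketch below shows properties as key-value pairs on both nodes and edges, using an ‘Atlas’ node loosely based on the example above. The key names (`brand_colour`, `methodology` and so on), the ‘Sam’ node and the helper functions are all assumptions made for illustration.

```python
# Sketch of a property graph: nodes and edges both carry key-value properties.
# Key names, the "sam" node and the helpers are hypothetical.
nodes = {
    "atlas": {"name": "Atlas", "type": "company", "brand_colour": "purple",
              "methodology": "agile", "sector": "software development"},
    "sam": {"name": "Sam", "type": "person"},
}

# Each edge: (from, relationship type, to, properties of the relationship itself)
edges = [
    ("sam", "WORKS_AT", "atlas", {"since": 2019, "role": "developer"}),
]


def find_nodes(**criteria):
    """Return ids of nodes whose properties match every given key-value pair."""
    return [node_id for node_id, props in nodes.items()
            if all(props.get(k) == v for k, v in criteria.items())]


def edges_where(**criteria):
    """Return edges whose own properties match every given key-value pair."""
    return [(a, rel, b) for a, rel, b, props in edges
            if all(props.get(k) == v for k, v in criteria.items())]


print(find_nodes(brand_colour="purple"))   # ['atlas']
print(edges_where(role="developer"))       # [('sam', 'WORKS_AT', 'atlas')]
```

Putting properties on the edge (such as `since`) lets a query describe the relationship itself, not just the items at either end.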
What are graph databases used for?
Graph databases excel at semantic queries, that is, queries that retrieve both explicit and implicit information. Traditional relational databases represent relationships implicitly, as matching key values spread across tables.
This means the relationship between two sets of data only becomes apparent when a developer writes a query that joins them, which requires intimate knowledge of the database schema and can be time-consuming.
Because a graph database stores the relationships between nodes directly, semantic searches are much simpler to express and less resource-intensive to run.
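A rough illustration using plain Python structures rather than real databases: in the relational-style version, finding who someone knows means scanning a link table for matching keys; in the graph-style version, each node already holds its relationships. Real relational databases use indexes to speed up joins, so this exaggerates the cost gap; it is only meant to show where the relationship lives in each model. All data and function names are hypothetical.

```python
# Illustration only: plain Python structures standing in for the two models.

# Relational style: the relationship exists implicitly as matching keys.
people = {1: "Alice", 2: "Bob", 3: "Carol"}
knows_table = [(1, 2), (1, 3), (2, 3)]  # (person_id, knows_person_id)


def relational_knows(person_id):
    # Join-like step: scan the link table for matching keys, then look up names.
    return sorted(people[b] for a, b in knows_table if a == person_id)


# Graph style: each node stores its relationships directly.
graph = {
    "Alice": {"knows": ["Bob", "Carol"]},
    "Bob": {"knows": ["Carol"]},
    "Carol": {"knows": []},
}


def graph_knows(name):
    # No join: follow the relationships already attached to the node.
    return sorted(graph[name]["knows"])


print(relational_knows(1))   # ['Bob', 'Carol']
print(graph_knows("Alice"))  # ['Bob', 'Carol']
```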
The difference becomes even more apparent when a search reaches several levels deep. A relational database needs to query each level separately, for example name, location, and shoe size, and then perform a further query to find the data points in each set that line up.
A graph database could instead begin the same search at shoe size and follow the links back through the relationships to reach the matching data points.
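Here is a small sketch of that idea, assuming people are linked to separate location and shoe-size nodes, and that reverse links (backlinks) are kept so a search can start from any node. The node names, relationship types and helper functions are hypothetical.

```python
# Sketch: people linked to location and shoe-size nodes, with backlinks kept
# so a traversal can start from any node. All data here is made up.
from collections import defaultdict

out_edges = defaultdict(list)  # node -> [(relationship, target node)]
in_edges = defaultdict(list)   # node -> [(relationship, source node)] (backlinks)


def relate(a, rel, b):
    # Record the relationship in both directions for fast lookup either way.
    out_edges[a].append((rel, b))
    in_edges[b].append((rel, a))


relate("person:alice", "LIVES_IN", "location:leeds")
relate("person:alice", "HAS_SHOE_SIZE", "size:9")
relate("person:bob", "LIVES_IN", "location:york")
relate("person:bob", "HAS_SHOE_SIZE", "size:9")
relate("person:carol", "LIVES_IN", "location:leeds")
relate("person:carol", "HAS_SHOE_SIZE", "size:6")


def people_with(size_node, location_node):
    # Hop 1: follow backlinks from the shoe-size node to the people who have it.
    candidates = [p for rel, p in in_edges[size_node] if rel == "HAS_SHOE_SIZE"]
    # Hop 2: keep those whose LIVES_IN edge points at the requested location.
    return [p for p in candidates
            if ("LIVES_IN", location_node) in out_edges[p]]


print(people_with("size:9", "location:leeds"))  # ['person:alice']
```

Each hop only touches the edges attached to the current node, rather than matching up whole result sets.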
Graph databases are also adept at identifying hot, cold, short, and long paths within a graph. A path can be visualised as a start node and an end node joined by a chain of edges, with any number of intermediate nodes in between.
Identifying the shortest path could, for example, let navigation software find the shortest route to a destination. Alternatively, identifying the hot path, the route travelled most often, would let the same software find the most travelled route.
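Both ideas can be sketched in a few lines. The shortest path below uses Dijkstra’s algorithm, a standard choice for graphs with non-negative distances; the hot path is taken, as a simplifying assumption, to be the route that appears most often in a log of recorded trips. Real graph databases may define and compute these differently. Place names, distances and trips are invented.

```python
# Sketch: a small road network with distances, plus a log of recorded trips.
# Place names, distances and trips are invented for illustration.
import heapq
from collections import Counter

roads = {  # directed edges: node -> {neighbour: distance_km}
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}


def shortest_path(graph, start, end):
    """Dijkstra's algorithm: return (total distance, path) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == end:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + weight, neighbour, path + [neighbour]))
    return None


def hot_path(trips):
    """Return the most frequently recorded route and how often it occurs."""
    return Counter(tuple(trip) for trip in trips).most_common(1)[0]


trips = [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "D"]]

print(shortest_path(roads, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
print(hot_path(trips))                 # (('A', 'B', 'D'), 2)
```

Note that the shortest route (A, C, B, D) and the most travelled route (A, B, D) need not be the same, which is why the two kinds of path answer different questions.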
We’ve covered the fundamentals of graph databases here, and we hope this overview has provided some insight. We’ll be back soon to discuss more exciting topics around graph databases, including how we can use them to visualise data in interesting ways.