Graph processing is a new method of modeling many-to-many relationships. It was first added in SQL Server 2017 and is a different method of storing and querying many-to-many relationships. In a traditional database, many-to-many relationships typically make use of a third table that stores a mapping between the two main tables. Graph tables use similar concepts, nodes and edges. Both tables, nodes contain the data and edges contain the relationship between nodes. In this post, I will explain each concept.
Nodes and Edges
Node tables are just like regular tables, except they contain an automatically generated $node_id column storing JSON data to identify each record. It is recommended to create a unique constraint or index on this column, but it is not required.
Edge tables are slightly different. These tables express the relationship and consist of at least three required columns that store JSON strings. The $edge_id column uniquely identifies the edge in the database. The $from_id and $to_id columns store the $node_id records from Node tables as their mapping to other node records. In this manner, it’s very similar to traditional many-to-many mappings. Edges can optionally contain user-generated columns, but they are not required.
The special JSON columns in nodes and edges are named $node_id_/$edge_id_ followed by a hex string. This is for internal processing and they can be omitted if you query those columns.
Creating nodes and edges is also just like creating regular tables, but simply add ‘AS NODE’ or ‘AS EDGE’.
Populating nodes is identical to inserting into normal tables. Edges require slightly different inserts as the $node_id from the source and target nodes will have to be specified during the insert to create the relationships. Below are some sample scripts. The examples I’m using include incomplete Marvel’s Cinematic Universe films and actors (Avengers, Iron Man, Thor and Captain America). For brevity, I’m only including examples of each type. If you’d like full scripts, they can be downloaded from GitHub: https://github.com/jshurak/SQL_Graphs
create table Movies ( MovieID int identity(1,1), Title Varchar(60) ) as node Create table Actor ( ActorID int identity(1,1), FirstName varchar(60), LastName varchar(60) ) as Node Create table StarredIn as edge insert into actor values ('robert','downy') insert into actor values ('scarlett','johansson') ….. insert into movies values ('The Avengers') insert into movies values ('Iron Man') … … insert into StarredIn values ((select $node_id from actor where lastname = 'downy'),(select $node_id from Movies where title = 'The Avengers')) insert into StarredIn values ((select $node_id from actor where lastname = 'johansson'),(select $node_id from Movies where title = 'The Avengers'))
Querying graph databases
There are syntactical differences in querying graph databases. They use the implicit join style structure when specifying tables, with a special MATCH statement in the where clause. Within the MATCH statement is the graph search pattern. This is a uniquely structured pattern that utilizes a hyphen and arrow structure to specify the relationship.
The basic structure is as follows:
SELECT col FROM node, edge WHERE MATCH(node-edge->node)
The arrow can be in either direction and multiple graph search patterns can be included with an AND statement. Currently OR and NOT statements are not allowed within the MATCH statement. In some instances, this can lead to simplified queries; in others it can be more complex. Below are some samples of queries with their traditional equivalents. Granted, there is more than one way to write a query.
--Get all movies,actors using graph processing select Title,FirstName,LastName from Movies,Actor, StarredIn where match(actor-(starredIn)->Movies) --Get all movies and actors using traditional sql select title,FirstName,LastName from Actor a inner join StarredIn s on a.actorId = s.ActorID inner join Movies m on m.MovieId = s.MovieID
It’s easy enough to filter on actors. This gets all the films Chris Hemsworth has starred in (
--Get Chris Hemsworth films select Title,FirstName,LastName from Movies,Actor, StarredIn where match(actor-(starredIn)->Movies) and FirstName = 'Chris' and lastName = 'Hemsworth'
Now we can do some interesting things. We can query all actors in our database that have co-starred with Mr. Hemsworth. This is where traditional SQL can be a bit verbose:
--Get Chris Hemsworth co stars using graph processing select Title,a2.FirstName,a2.LastName from Movies,Actor a,StarredIn, actor a2, starredIn s2 where match( a-(starredIn)->Movies and Movies-(s2)->a2 ) and a.ActorID = (select actorID from actor where FirstName = 'Chris' and lastName = 'Hemsworth') and a2.actorId <> (select actorID from actor where FirstName = 'Chris' and lastName = 'Hemsworth') --find actors that have starred with Chris Hemsworth using traditional sql with hemsworth_movies as (select m.MovieId,m.title, a.actorId from actor a inner join StarredIn s on a.actorId = s.ActorID inner join Movies m on m.MovieId = s.MovieID where FirstName = 'Chris' and LastName = 'Hemsworth' ) select title,firstName,LastName from actor a inner join starredIn s on a.ActorId = s.ActorId inner join hemsworth_movies hm on hm.movieId = s.MovieID and a.ActorId <> hm.actorID order by Title
Finally, since we’re restricting to AND statements in the graph search pattern, we can use multiple queries to find actors that have not starred alongside Chris Hemsworth:
--get actors that have NOT co starred with Chris Hemsworth using graph processing with hemsworth_movies as ( select movieID,actorId from Movies,Actor, StarredIn where match(actor-(starredIn)->Movies) and title in ('The Avengers','Thor') ) select Title,FirstName,LastName from Movies m,Actor a,StarredIn s where match(m-(s)->a) and movieID not in (select movieId from hemsworth_movies) and ActorID not in (select actorId from hemsworth_movies) order by Title --Actors that have NOT co-starred with Chris Hemsworth using traditional SQL with hemsworth_costars as ( select a.actorID from Actor a inner join starredin s on a.actorId = s.actorId inner join Movies m on m.movieId = s.MovieId where title in ('The Avengers', 'Thor') ) select Title,a.FirstName,a.LastName from Actor a left join hemsworth_costars h on a.actorId = h.actorId inner join starredIn s on s.ActorId = a.actorId inner join Movies m on m.MovieId = s.MovieID and h.ActorId is null order by Title
When to use graph processing
This is the rub: there’s nothing you can do with graph processing that can’t be done with traditional relational structures. However, if your database consists of mostly many-to-many relationships, it may be worth exploring. They can add a natural language type of syntax to queries so It can be used to simplify your queries. But if you’ve been writing SQL for years, this may not be of value.
Finally, if your organization needs help modeling many-to-many relationships—or assistance with any other SQL Server element, please don’t hesitate to reach out to us. We’d love to help you get started.
SQL Server Architect