Neo4j 3.0 and the Graph of the Universe


The Cosmic Web Paper by the Barabasi Lab


After I came across this tweet the other night,


I checked out the original website of Cosmic Web, which is beautifully done.

The Cosmic Web Full Graph Visualization by Kim Albrecht

Their paper describes three models for linking the galaxies in our cosmos into a network:

    • Fixed-Length Model: All galaxies within a set distance of l are connected by an undirected link.
    • Varying-Length Model: The length of each link is proportional to the “size” of the galaxy, l = a * R(i) ^ (1/2)
    • Nearest Neighbors Model: Each galaxy is connected to its closest neighbors with directed links. In this model, the length of each link depends on the distance to the nearest galaxy.
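
To make the three models a bit more concrete, here is a minimal JavaScript sketch that builds each kind of link list for a handful of made-up galaxies. The toy coordinates, the size property R, and the parameters l, a and k are all assumptions for illustration, and so is the choice to treat Varying-Length links as directed from the galaxy whose size sets the reach. It is not the code used in the paper.

```javascript
// Hypothetical toy data: positions and a "size" R per galaxy (not from the dataset).
const galaxies = [
  { id: 0, x: 0, y: 0, z: 0, R: 4 },
  { id: 1, x: 1, y: 0, z: 0, R: 1 },
  { id: 2, x: 3, y: 0, z: 0, R: 1 },
  { id: 3, x: 10, y: 0, z: 0, R: 1 },
];

// Euclidean distance between two galaxies.
const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);

// Fixed-Length Model: undirected link between every pair within distance l.
function fixedLength(nodes, l) {
  const links = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      if (dist(nodes[i], nodes[j]) <= l) links.push([nodes[i].id, nodes[j].id]);
    }
  }
  return links;
}

// Varying-Length Model: galaxy i reaches out to l_i = a * sqrt(R_i).
// Assumption: the link points from i to every galaxy within that reach.
function varyingLength(nodes, a) {
  const links = [];
  for (const g of nodes) {
    const reach = a * Math.sqrt(g.R);
    for (const other of nodes) {
      if (other !== g && dist(g, other) <= reach) links.push([g.id, other.id]);
    }
  }
  return links;
}

// Nearest Neighbors Model: directed links from each galaxy to its k closest.
function nearestNeighbors(nodes, k) {
  const links = [];
  for (const g of nodes) {
    nodes
      .filter((o) => o !== g)
      .sort((p, q) => dist(g, p) - dist(g, q))
      .slice(0, k)
      .forEach((o) => links.push([g.id, o.id]));
  }
  return links;
}

console.log(fixedLength(galaxies, 2.5));
console.log(varyingLength(galaxies, 1.6));
console.log(nearestNeighbors(galaxies, 1));
```

Note how the isolated galaxy 3 only gets a link in the Nearest Neighbors variant, because that model always connects a galaxy to something, however far away it is.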

The last model provided the most accurate representation of the real-world constellations.

Graph Visualization


A visual artist, Kim Albrecht, visualized the resulting graphs beautifully using Three.js.

Learn How to Create a Graph Visualization of the Universe Using Neo4j 3.0


Working with Raw CSV Data


Fortunately for me, the raw sources for this dataset were CSV files: the galaxies form the nodes, and each of the linking models described in the research becomes its own relationship type.

I had four CSV files to work with: one with the galaxy nodes and one with the relationships for each of the three models.

Importing Data into Neo4j 3.0


With Neo4j 3.0, I could quickly import them using the LOAD CSV mechanism. Here is the full script.

galaxies.cypher
create constraint on (g:Galaxy) assert g.id is unique;

// create galaxies
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15//ccnr-universe-nodes-nn.csv" as nodes
load csv with headers from nodes as row
with collect(row) as rows
unwind range(0,size(rows)-1) as id
create (g:Galaxy {id:id}) set g+=rows[id];

// Fixed Length Model
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15/ccnr-universe-fll-t-1-15.csv" as relationships
load csv with headers from relationships as row
match (g1:Galaxy {id:toInt(row.source)}),(g2:Galaxy {id:toInt(row.target)})
create (g1)-[:FLL]->(g2);

// Varying Length Model
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15/ccnr-universe-vll-t-1-10.csv" as relationships
load csv with headers from relationships as row
match (g1:Galaxy {id:toInt(row.source)}),(g2:Galaxy {id:toInt(row.target)})
create (g1)-[:VLL]->(g2);

// Nearest Neighbors Model
with "https://cosmicweb.kimalbrecht.com/viz/data/12-05-15/ccnr-universe-nn-t-1-10.csv" as relationships
load csv with headers from relationships as row
match (g1:Galaxy {id:toInt(row.source)}),(g2:Galaxy {id:toInt(row.target)})
create (g1)-[:NN]->(g2);

The only trick I had to pull off was to collect the galaxies into a list first, to get an index for their row in the CSV. That’s why loading the node CSV takes longer than loading the relationships.
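
The same indexing idea, shown outside of Cypher as a minimal JavaScript sketch: the position of a row in the collected list becomes the node id, which is what the source and target columns of the relationship files are matched against in the script. The column names name and x below are made up for illustration; I am not claiming these are the real headers of the node file.

```javascript
// Hypothetical parsed CSV rows; LOAD CSV WITH HEADERS yields string values.
const rows = [
  { name: "galaxy-a", x: "1.5" },
  { name: "galaxy-b", x: "2.5" },
  { name: "galaxy-c", x: "3.5" },
];

// Same idea as collect(row) followed by unwind range(0, size(rows)-1):
// the position in the collected list becomes the node id, then the row's
// columns are merged in as properties (like "set g += rows[id]").
function withRowIndex(list) {
  return list.map((row, id) => Object.assign({ id }, row));
}

const nodes = withRowIndex(rows);
console.log(nodes);
```

Collecting everything first means the whole file sits in memory before the first node is created, which fits the observation that the node import is the slow part.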

Query & Visualize in the Neo4j Browser


Running the import gives me some nice visual results in the Neo4j Browser.

MATCH (g:Galaxy) WHERE size( (g)--() ) = 10
WITH g LIMIT 1
MATCH (g)-[rels:NN*..7]-()
UNWIND rels as r
RETURN distinct r;

A Graph Visualization of Galaxies in the Neo4j Browser
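
For readers less familiar with variable-length patterns, here is a rough JavaScript sketch of what the second half of that query collects: every distinct relationship that lies on an undirected path of at most N hops from a start node. It uses a plain breadth-first search over an in-memory edge list, which is an approximation of the idea and not how Neo4j evaluates the pattern. The function name relsWithinHops and the tiny edge list are hypothetical.

```javascript
// Collect every distinct edge that lies on an undirected path of at most
// maxHops edges starting at startId (roughly: MATCH (g)-[rels*..maxHops]-()).
function relsWithinHops(edges, startId, maxHops) {
  // Adjacency list that ignores direction, like the undirected pattern.
  const adj = new Map();
  edges.forEach(([s, t], idx) => {
    if (!adj.has(s)) adj.set(s, []);
    if (!adj.has(t)) adj.set(t, []);
    adj.get(s).push({ node: t, idx });
    adj.get(t).push({ node: s, idx });
  });

  const depth = new Map([[startId, 0]]); // hop distance from the start node
  const found = new Set();               // indexes of edges seen so far
  const queue = [startId];
  while (queue.length > 0) {
    const current = queue.shift();
    const d = depth.get(current);
    if (d >= maxHops) continue; // edges beyond this node would exceed maxHops
    for (const { node, idx } of adj.get(current) || []) {
      found.add(idx);
      if (!depth.has(node)) {
        depth.set(node, d + 1);
        queue.push(node);
      }
    }
  }
  // Return the edges in their original order, each only once.
  return [...found].sort((a, b) => a - b).map((idx) => edges[idx]);
}

// Hypothetical chain of five galaxies linked by NN relationships.
const nnEdges = [[0, 1], [1, 2], [2, 3], [3, 4]];
console.log(relsWithinHops(nnEdges, 0, 2));
```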


Neo4j 3.0 Bolt Binary Protocol Test


With Neo4j 3.0, I wanted to test the performance of the new binary protocol (a.k.a. Bolt), so I grabbed the JavaScript neo4j-driver from npm and retrieved all 211k nearest-neighbor (NN) relationships in one go. Pulling the data and measuring the outcome is easy, as you can see below.

test-neo-driver.js
var neo4j = require('neo4j-driver').v1;
var driver = neo4j.driver("bolt://localhost", neo4j.auth.basic("neo4j", "test"));
var session = driver.session();

// Observer that counts the streamed records and reports the elapsed time.
var counter = function() {
  var start = Date.now(); // taken when the observer is created, right before run()
  var count = 0;
  return {
    onNext: function(record) { count++; },
    onCompleted: function() {
      console.log("rows", count, "took", (Date.now() - start), "ms");
      session.close();
      driver.close();
    },
    onError: function(error) { console.log(error); }
  };
};

session.run("CYPHER runtime=compiled MATCH (n:Galaxy)-[:NN]->(m:Galaxy) RETURN id(n),id(m)").subscribe(counter());

NOTE:
Interestingly, it took only about 330 ms to pull all that data out of the database and across the wire into my client.

test run
$ npm install neo4j-driver
$ node test-neo-driver.js

> rows 211959 took 327 ms

Force Layout Graph Visualization with ngraph


Although I have no artistic talents whatsoever, I could at least try to load the data from Neo4j into Anvaka’s ngraph and let its force layout algorithm do the work.

Please note that the artistic Three.js visualization mentioned above uses pre-laid-out data; the x, y, z coordinates are still available as properties in the data.

But I wanted to see how well ngraph can load and lay out 200k relationships without any preparation, just in JavaScript.

The loading was quite quick, like before. The force layout did take some time, but it resulted in a really nice two-dimensional graph rendering of our cosmos.

A Neo4j Graph Visualization of the Cosmic Web
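
If you have never looked inside a force layout, the basic idea fits in a few lines. The following is a toy two-dimensional sketch in plain JavaScript and is explicitly not ngraph's implementation: all nodes push each other apart, linked nodes are pulled together by springs, and the positions are nudged a little on every iteration. The function name toyForceLayout, the constants, and the four-node example are assumptions for illustration.

```javascript
// Toy 2D force layout. NOT ngraph's algorithm, only the basic idea:
// every pair of nodes repels, every link acts as a spring.
function toyForceLayout(nodeIds, links, iterations) {
  const pos = new Map();
  // Deterministic start: spread the nodes evenly on a circle of radius 10.
  nodeIds.forEach((id, i) => {
    const angle = (2 * Math.PI * i) / nodeIds.length;
    pos.set(id, { x: 10 * Math.cos(angle), y: 10 * Math.sin(angle) });
  });
  const step = 0.05; // how far a node moves per unit of force
  for (let it = 0; it < iterations; it++) {
    const force = new Map(nodeIds.map((id) => [id, { x: 0, y: 0 }]));
    // Repulsion between every pair: O(n^2), so this would not scale to the full dataset.
    for (let i = 0; i < nodeIds.length; i++) {
      for (let j = i + 1; j < nodeIds.length; j++) {
        const a = pos.get(nodeIds[i]);
        const b = pos.get(nodeIds[j]);
        const dx = a.x - b.x;
        const dy = a.y - b.y;
        const d = Math.max(Math.hypot(dx, dy), 0.1); // avoid division by zero
        const f = 1 / (d * d);
        force.get(nodeIds[i]).x += (f * dx) / d;
        force.get(nodeIds[i]).y += (f * dy) / d;
        force.get(nodeIds[j]).x -= (f * dx) / d;
        force.get(nodeIds[j]).y -= (f * dy) / d;
      }
    }
    // Spring attraction along each link, with a rest length of 1.
    for (const [s, t] of links) {
      const a = pos.get(s);
      const b = pos.get(t);
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const d = Math.max(Math.hypot(dx, dy), 0.1);
      const f = d - 1;
      force.get(s).x += (f * dx) / d;
      force.get(s).y += (f * dy) / d;
      force.get(t).x -= (f * dx) / d;
      force.get(t).y -= (f * dy) / d;
    }
    // Move every node a small step along its net force.
    for (const id of nodeIds) {
      pos.get(id).x += step * force.get(id).x;
      pos.get(id).y += step * force.get(id).y;
    }
  }
  return pos;
}

// Four hypothetical galaxies forming two linked pairs.
const layout = toyForceLayout([0, 1, 2, 3], [[0, 1], [2, 3]], 500);
const gap = (p, q) =>
  Math.hypot(layout.get(p).x - layout.get(q).x, layout.get(p).y - layout.get(q).y);
console.log("linked:", gap(0, 1), "unlinked:", gap(0, 2));
```

Linked galaxies end up close together while unlinked ones drift apart, which is the effect that makes the clusters visible in the rendering. The all-pairs repulsion loop is also a hint at why laying out a graph of this size takes noticeably longer than loading it.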


Anyone can quickly import the data by running my import script after starting a Neo4j 3.0 server. (You might need to configure in conf/neo4j.conf that the “remote-shell” is enabled.)

import
$NEO4J_HOME/bin/neo4j-shell -file galaxies.cypher

Conclusion


This was only me having fun with galaxies and Neo4j 3.0 around 3 a.m. If you want to read and hear from a real graph-astronomer, check out Caleb W. Jones’ work on “Using Neo4j to Take Us to the Stars”.

Ad Astra,

Michael

P.S. Graphs are everywhere – even our cosmos forms one.

