I've been working with Neo4j for something like two years now. I started with Ruby and Neo4j.rb 2.3, which used JRuby 1.7.x with Neo4j Embedded 1.9, and learned Ruby, Neo4j, Rails, and the Neo4j.rb gem concurrently. Funny as it might be, I made it pretty far without ever writing much more than an extremely basic Cypher query for over a year. Phillymetal.com is built entirely on this stack but there's very little "graphy" stuff going on. Neo4j Embedded is so fast that working with the database has almost no overhead for simple find/return/traverse operations, so I was mostly exploiting the easy data modeling and schema-free goodness.
Things changed last year when I started contributing heavily to the Neo4j.rb gem, then got even more intense when I ramped up work on my own project. As performance became more of a concern, I started writing more Cypher to limit the number of queries in and out of the database. As it turns out, Ruby seems to be Neo4j.rb's biggest problem, but the fact is still that the REST API with Neo4j 2.1.7 is just not as performant as Java and Neo4j Embedded.
Enter unmanaged extensions and the Java API. Neo4j has a very cool feature: since it is open source and exposes a REST API, they provide an easy way to write your own plugins, "unmanaged extensions," that expose new REST endpoints. Once your data hits that endpoint, you can use the Java API to do all your work, then you return whatever you want back to your application. This gives you the best of both worlds: Cypher when you need it, Java API for the heavy lifting. Without exception, everyone I know who needs serious performance from Neo4j has told me that they fall back to this method.
At work, we're building an ambitious social media analytics platform that relies heavily on Neo4j. It's collecting a lot of data and revealing a lot of interesting connections. The front-end app is a client of the JSON API my team is building and the goal is for it to have the feel of a desktop app. This means it wants a ton of data very fast. Unfortunately, the combination of our query, the amount of data, the REST API, and Neo4j.rb just do not make for the performance we're looking for. Without getting into how much data we're returning, I can say that we've been seeing 1.5s responses when we really need something closer to 400ms.
I took some steps to improve performance but each time, I got smacked down. I patched Neo4j-core's
Query class with an
unwrapped method that returns simpler objects, I implemented the amazing GraphAware TimeTree plugin, I optimized the hell out of my query, but we were still having these issues; in fact, the 1.5s response mentioned above is AFTER these optimizations!
After hearing so much about the power of unmanaged extensions, I thought this might be a good opportunity to dive in. Problem was, I had never written a line of Java before. Hell, I had never successfully read a line of Java! Still, I knew what I wanted to do: take the complex Cypher query we're running, rewrite it in Java, return JSON that is usable in the app. To do this, I started looking for resources about Java that were aimed at Rubyists. I figured that with the popularity of both languages, there had to be something out there for Ruby devs looking to learn Java... right? Wrong. Everything I could find was about going from Java to Ruby. Coooool...
I figured I might as well dive into TimeTree to see if I could figure some of it out. I started with the API, since it was all I was really interacting with directly. https://github.com/graphaware/neo4j-timetree/blob/master/src/main/java/com/graphaware/module/timetree/api/TimeTreeApi.java has all the endpoints and figuring out what methods were being called was easy enough. It was also pretty trivial to expose a new endpoint, take a few params, and... make the whole thing crash. A lot.
A few searches and half a book on a plane ride later, I made it to a point that I was able to hack together my own unmanaged extension. In the span of seven days, I was able to write it, extract it from TimeTree, test it, and get it into staging. Our report now runs at around 400ms and I'm going to move more of our app's code into Java ASAP. Victory!
I'm writing this post and what I hope will be a few companion pieces for two reasons.
First, from a Neo4j dev's perspective, I want to illustrate the power of the Java API compared to Cypher. This is not a knock on Cypher in any way, because it is a fantastically expressive, readable, powerful language that I'm always happy to work with, but the superior control you have over your data within Java just can't be understated.
Second, separate from Java, I want to provide a resource for other people who may be in my shoes: Ruby devs who want or need to learn Java quickly but don't want to read through a book that spends a significant amount of time on familiar principles of OOP. As I work with Java, it's going to be harder to remember how I felt about it when I got started, so now is the time to do this.
I don't expect there to be more than one or two other posts about this and I doubt that either of them will be very heavy on Neo4j, if they're mentioned at all. Though I may mention it, it's really not much more than a reference for how I figured some things out. If you are an experienced Java developer, please don't blast me if/when I misstate details, since I'm still learning here.
I'm going to start working on the first piece of this immediately, so stay tuned!