Model Caching in Rails, or when a Student is not a Student

For a few months, we’ve had a few reports from Neo4j.rb users of an odd bug. The story goes, “I try to create a relationship between two nodes but the type checking tells me that one of the nodes is not of the appropriate type, even though I know that it is.” In code, it could look like this:

student = Student.first
lesson = Lesson.first
# Creates a relationship between the lesson and student
lesson.students << student

More specifically, it was always reported by Rails users dealing with create actions. They’d be loading a node, creating a new node of another class, and associating the new node with the old.

student = Student.create(student_params)
lesson = Lesson.find(lesson_id)
lesson.students << student

An error would be raised saying words to the effect of, “Node type invalid. Expected , found ."

In my arrogance, I assumed that since the errors were not reported that often and I had never personally seen it, these guys must have had some conflict with another gem or messed up code somewhere. After all, if it was a real bug, we’d have seen it ourselves or our tests would have caught it, right? No, wrong — completely wrong. Finally, someone copied/pasted their code to me and I could see everything was done correctly. They also noticed that it only happened after updating a file and restarting their server would fix it. It began to sound like an issue with the Rails automatic reloader, so I set out to track it down.

Classes are Still Instances

The root, or maybe roots, of the problem turned out to be variables and constants hanging onto copies of classes, persisting across Rails reload!. There were two spots where this was happening, but to understand why it’s a problem, we need to talk about what a class is.

In Ruby, a class is an instance of Class. Because of that, it is possible for a running Ruby process to hold two independent instances of the same class, one a different version of the other. It’s something like this:

container = Container.new
container.enclosed_model = Student
# the Student model is changed, Rails `reload!` is called
# From this point out, our `Student` model can be thought of as Student v2.

# This is an instance of Student v2
student = Student.new

# This is false because our container's `enclosed_model` is Student v1
student.is_a?(container.enclosed_model)
# => false

# And this hints at why
Student.object_id == container.enclosed_model.object_id
# => false

Comparing two objects’ object_id results is always the final answer on whether they are the same. By spitting out the object ids before the error was raised, I saw that, yes, Rails was loading a new instance of the class but somewhere in the bowels of Neo4j.rb was a reference to the old version.
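To make that concrete without Rails, here is a minimal sketch that fakes a reload by removing the Student constant and defining the class again. The remove_const call is only a stand-in for the reloader, and cached_model is a hypothetical name for whatever happens to be holding the old class.

```ruby
# "Student v1"
class Student; end

# Something holds on to the class object itself.
cached_model = Student

# Stand-in for a reload: drop the constant, then define the class again.
Object.send(:remove_const, :Student)
class Student; end # "Student v2": same name, brand-new Class object

student = Student.new

same_name   = cached_model.name == Student.name # both are "Student"
same_object = cached_model.equal?(Student)      # identity comparison
type_check  = student.is_a?(cached_model)       # the check that bites us

puts "same name: #{same_name}, same object: #{same_object}, is_a?: #{type_check}"
```

The names match, but the identity comparison and is_a? both come back false, which is exactly the "a Student is not a Student" situation.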

After that, there were two questions remaining: where were references to old models hanging around and how could I keep things current?

The Culprits

The answer to the first question was pretty easy to track down but might be a bit too specific to Neo4j.rb to warrant inclusion in this blog post so I’ll just go over it briefly.

Exhibit A: Node Wrapping

When nodes are returned from the database, we have to figure out which model is responsible for each one, and doing that requires us to find the model that has the same combination of labels as the node. It’s not a terribly expensive process but it can be wasteful, since you only really need to perform it once, when a combination of labels is first encountered, and should be able to reuse the result every time after. The results are stored in a hash, MODELS_FOR_LABELS_CACHE, which maps labels => model_constant. In a normal Ruby app, it’s not a big deal that this never gets GCed since you will only have so many possible entries; in Rails development mode, with models constantly being reloaded, that’s not always true, so it was possible for nodes to be instantiated using the wrong versions of models!
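Here is a rough sketch of how a cache like that goes stale. LABEL_MODEL_CACHE and model_for are hypothetical stand-ins, not the gem's actual code, and the reload is faked with remove_const.

```ruby
class Student; end

# Hypothetical stand-in for a labels => model cache.
LABEL_MODEL_CACHE = {}

# Hypothetical lookup: resolve a label combination once, then reuse the result.
def model_for(labels)
  LABEL_MODEL_CACHE[labels.sort] ||= Object.const_get(labels.first)
end

first_lookup = model_for([:Student])

# Stand-in for a reload.
Object.send(:remove_const, :Student)
class Student; end

# The cache still hands back the old class object.
stale = model_for([:Student]).equal?(first_lookup) && !first_lookup.equal?(Student)

# Clearing the cache forces the next lookup to resolve the current constant.
LABEL_MODEL_CACHE.clear
fresh = model_for([:Student]).equal?(Student)

puts "stale before clear: #{stale}, fresh after clear: #{fresh}"
```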

Exhibit B: Association models

The associations system and QueryProxy class are arguably the coolest features of Neo4j.rb. You’re able to define associations in models on the fly. Once defined, you can easily create and traverse relationships. It’s possible to build relatively complex queries that jump between models, executed lazily, as you do in ActiveRecord. What makes Neo4j.rb’s association chaining cool is that it will only perform one query, no matter how many models you jump across. So this:

student.lessons.teachers.children.lessons

…will find the lessons of children whose parents are teachers who teach classes taken by one student. In Cypher, it could look something like:

MATCH (s:Student)-[:ENROLLED_IN]->(lesson)-[:TAUGHT_BY]->(teacher)-[:PARENT_OF]->(child)-[:ENROLLED_IN]->(result) WHERE ID(s) = {student_id} RETURN result

In a model, the association definitions look like this:

has_many :out, :lessons, type: 'ENROLLED_IN'

That method returns an instance of Neo4j::ActiveNode::HasN::Association, which becomes bound to the model in its @associations hash. This association instance needs to know the model on the other side, so whether you let it infer that from the association name or use model_class to set it, it ends up with a @model_class instance variable that stores (you guessed it) the constant of the other class. When you try to create a relationship, it does a type check in Ruby. If the node given is not an instance of the model saved in @model_class, it raises an error. And there we have it: lesson.students << student will raise an error if student was not born of the same version of Student held in its @model_class.
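A stripped-down sketch of that type check follows. SketchAssociation, check!, and the error wording are all hypothetical and only illustrate the idea; the real Association class does far more.

```ruby
class Student; end

# Hypothetical, minimal association: it only remembers the target model class.
class SketchAssociation
  def initialize(model_class)
    @model_class = model_class
  end

  # Raises unless the node is an instance of the remembered class.
  def check!(node)
    return true if node.is_a?(@model_class)

    raise ArgumentError, "expected #{@model_class}, got #{node.class}"
  end
end

students = SketchAssociation.new(Student)
accepted_before = students.check!(Student.new)

# Stand-in for a reload.
Object.send(:remove_const, :Student)
class Student; end

# The association still points at the old class, so a new Student is rejected.
message = begin
  students.check!(Student.new)
rescue ArgumentError => e
  e.message
end

puts message
```

Note how unhelpful the message is: both class names print identically, because the two class objects share a name.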

Whew! So… now what?

Learning to Let Go… of Models

The answer to the second question, how do we clean things up, was found just last night.

When I first diagnosed the problem, I was eager to get a workaround in, so I patched the gem to not use is_a? to determine association compatibility, opting instead to compare the node’s labels to the model’s labels. This solved the immediate problem but it wasn’t a real fix. Last night, someone commented on an issue on the devise-neo4j library that I had forgotten about, describing the same problem, and I realized that there’s a very good chance this caching was the root. He had done some research into Devise and posted a snippet that included a reference to ActiveSupport::Dependencies::ClassCache, so I looked that up and found a note about the before_remove_const method.
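The label-based workaround can be sketched roughly like this. The labels methods are hard-coded, hypothetical stand-ins; in the gem, labels come from the model mapping.

```ruby
class Student
  # Hypothetical: hard-coded labels for the sake of the sketch.
  def self.labels
    [:Student]
  end

  def labels
    self.class.labels
  end
end

old_student_model = Student

# Stand-in for a reload: the class body is evaluated again from scratch.
Object.send(:remove_const, :Student)
class Student
  def self.labels
    [:Student]
  end

  def labels
    self.class.labels
  end
end

node = Student.new

identity_check = node.is_a?(old_student_model)                      # depends on class identity
label_check    = node.labels.sort == old_student_model.labels.sort  # depends only on data

puts "is_a?: #{identity_check}, labels match: #{label_check}"
```

Comparing labels sidesteps class identity entirely, which is why it hides the symptom without removing the stale reference.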

before_remove_const seems to be the solution. When implemented as a class method, it is called by the Rails reloader at the start of a reload cycle. I was able to use it to wipe out the constants that were hanging onto models and trigger a refresh of @model_class in each association. You can see the PR here. I say it “seems” to be the solution because I’m still waiting on confirmation of the devise-neo4j issue’s resolution, but I’m reasonably confident. Even if it isn’t, I think we’ve confirmed that there’s an old reference to a model hanging out somewhere, so we just have to figure out what we missed and queue it for update later on.
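A minimal sketch of the hook idea, outside Rails: simulate_unload is a hypothetical stand-in for the reloader (it calls the hook if the class defines it, then removes the constant), and HELD_MODELS is a hypothetical cache. What the gem's hook actually clears is in the PR, not here.

```ruby
# Hypothetical cache that holds on to model classes.
HELD_MODELS = {}

class Lesson
  # Hook name from the post; this body is an assumption about one possible cleanup.
  def self.before_remove_const
    HELD_MODELS.clear
  end
end

# Hypothetical stand-in for the reloader: call the hook, then drop the constant.
def simulate_unload(const_name)
  klass = Object.const_get(const_name)
  klass.before_remove_const if klass.respond_to?(:before_remove_const)
  Object.send(:remove_const, const_name)
end

HELD_MODELS[:Lesson] = Lesson
held_before = HELD_MODELS.size

simulate_unload(:Lesson)
class Lesson; end # the "reloaded" class

held_after = HELD_MODELS.size

puts "held before unload: #{held_before}, after: #{held_after}"
```

Because the hook runs before the constant goes away, nothing is left pointing at the old class by the time the new one is defined.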

So there you have it! An interesting bug squashed and in the process, we saw more proof of Ruby’s “everything-is-an-object” ethos. We learned a bit more about ActiveSupport, some best practices when caching class names, and a crucial reminder to take bug reports seriously, even if they seem impossible to you.

Sublime to RubyMine to RubyMine + Vim to...?

As glamorous as the life of a co-contributor to a sort-of-kind-of-well-respected open source library is, I think one of the best perks is being able to cash in on the free, open source license of RubyMine. It took a bit of coaxing, mostly in the form of a lot of positive feedback from /r/ruby, and I was hooked after a few weeks.

A few weeks ago, though, I became mildly envious of my friends at work who flew around the screen with Vim. I love keyboard shortcuts, always have, and the potential to make the editing process significantly faster was really attractive. Still, I love RubyMine's features, so... what was I to do?

Enter IdeaVim, the Vim emulator for RubyMine. Hallelujah, I have been saved! Real fast, I'll run through what I like best about this union.

RubyMine features I can't live without

  • CMD + Click on any method to go to its definition, even if it's in a dependency.
  • Right-click and there's an option to find all usages of a method.
  • Highlight and CMD + Option + M to extract code into a new method. It finds variables and makes them arguments. Fuuuuck! So cool.
  • YARD integration. If you annotate your methods, it will use the info to improve its code completion.
  • The best find/replace/rename options I've ever encountered.
  • Method calls get pointed out if I send the wrong number of arguments.

I know that's not much (it's sure as hell nowhere close to taking advantage of everything RubyMine offers), but they're all huge, huge game-changers. The thought of not having the ability to locate method source is brutal; I think it's a crucial option for anyone working with large codebases.

There are a few other features that are cool but just not for me. I use the test support on occasion but tend to want to run the same one or two specs repeatedly, not an entire file, so it's usually overkill. As for git, I find the CLI more comfortable, though its integration does seem fantastic. Finally, the Rails features are extremely cool but I rarely work with it, so they're lost on me.

Vim makes things faster

A lot of Vim users go crazy with plugins. It seems only natural, since every itch seems to be scratched by something that's very focused and sounds very helpful, but all that power comes with a steep learning curve. I think I'm mitigating that by focusing on the basics: fast movement around the screen and word/line deletion/replacement. You don't realize how much time you spend moving around the screen with some combination of mouse + arrow keys + ctrl/home/end until you start using EasyMotion (emacsIDEAs, in my case) and get the hang of basic editing.

My Vim list:

  • emacsIDEAs is insane. I have it mapped to (CMD + J)(CMD + F)(char I want to find). It highlights that character all over the screen, replaces each occurrence with a letter (a through z), and then I hit that letter to jump there on the page. Stolen from Google: Check it out. It takes a little getting used to but it's very precise, very helpful.
  • Everything here. Everything.

That's it so far. It might not seem revolutionary but it's one of those "you-have-to-see-it-to-understand" things.

To those getting started with Vim + RubyMine...

I had some problems adjusting and still find some things a little annoying. My quick bullet-points for those who are new:

  • Remap the hotkey that switches from insert to normal mode. I went with (CMD + j)(CMD + j). For some reason, I always have to hit another key — any key — after switching modes for it to register the change. Having to reach up to escape constantly was a real buzzkill.
  • Remap emacsIDEAs basic word search. I went with (CMD + j)(CMD + f) and remember it like "jump" + "find". It's helpful because those two characters are always accessible.
  • Don't be afraid to turn off Vim mode if you're frustrated. The nice thing about the plugin is that you still have RubyMine backing you, so it's not like you're going back to Notepad in its absence.

In Conclusion

I imagine that if this post is read by anyone, it will be people considering the switch to either or both. If that describes you, my advice is to go for it but go slow; don't feel pressured to use every new feature all at once. While the Vim Master Race might push you to use ALL THE PLUGINS ALL THE TIME, I find that it's such a huge paradigm shift from a traditional text editor that a slow ramping up is helpful. Find the tools that are easiest to work into your workflow, gradually increase resistance. Right away, you'll find an improvement in your work, and in time, things will only get better!

For my next trick, I intend to get better with Vim away from RubyMine. I've been doing a bit with Rust lately and find the switch back to SublimeText like... barely a step up from Notepad, frankly, but I'm still struggling with folder navigation and window management in MacVim. It's a process, I'll get there.

Happy coding!

A Simple API Request/Response Decision Tree

I threw this together for work. I think it’s easy to get lost when designing APIs, web/software/whatever, but keeping this in mind can help you stay focused.

http://i.imgur.com/hdqMaxa.jpg

A Java Crash Course for Ruby Developers, Part 1

(Part 1 of a multi-part series. Introduction.)

Welcome! Let's get right to it. This is going to be largely unstructured and reflects the current state of my Java experience, which is very little. It will hopefully help other devs who are getting started. I expect that:

  • You know some basics about Java
  • You know Ruby basics

I'm not going to mention many things that should be obvious, like... lines are terminated with semicolons, Java is not sensitive to whitespace, etc.

Let's do this.

Use an IDE for Development.

While you can use a text editor to write Java, every resource I encountered referenced some IDE or another. The two big ones seem to be Eclipse and NetBeans. I ended up with NetBeans, though I honestly can't remember why. Having just switched from SublimeText to RubyMine, this was a pretty easy transition for me, as NetBeans feels a hell of a lot like RM, but it might take some getting used to for you. (Incidentally, JetBrains are the developers of IntelliJ, a Java IDE that I imagine is pretty amazing. It's also not free, so...)

If this is your first time using an IDE, the key features that I can no longer live without are:

  • Command + Click on a method to find its definition.
  • Right-click a method and click "Find Usages" to find everywhere that it's used
  • Highlight, right-click, "Refactor" menu to easily move pieces of code into methods, new classes, sub/parent classes, etc.
  • Built in test integration and compilation
  • Easy dependency management
  • Flagging unused variables and dependencies
  • Hints RE: object types required during method calls or class initialization

Focus on that and you'll adjust in no time.

Don't start from nothing. Find an open source project to modify.

Starting a new language can be daunting, especially something like Java that introduces some concepts that are very different from Ruby. I set my sights on GraphAware's TimeTree plugin for Neo4j, found where the public API endpoints were exposed, copied the methods, and started modifying from there. Anyone who learned HTML in the days before Codecademy and sites like it will probably have done this at some point.

Plan on writing tests.

Because Java is compiled, it's very annoying to test your code by poking at it in your app. Write small methods, test carefully, and you'll save yourself a lot of time. There's JUnit, which uses methods and simple assert methods to test behavior (sound familiar?) and there's also Cucumber-JVM, if you're more comfortable there.

Class Definitions

One of the hardest parts of figuring out Java for me was just reading the damn code. You should already know that Java is statically typed, but... final? Static? Void? WTF? This ends up being very easy.

Basic class Signature Examples

The typical Java class signature looks like this:

public class MyClass {}

That makes sense, right? public, protected, or private define visibility, just like Ruby. The rest is self-explanatory. Java has single inheritance and uses extends, so:

public class MyClass extends AnotherClass {}

Interfaces

Instead of modules, Java has interfaces. An interface lists methods that must be implemented in the class. It deals with what is there but not how they will behave. (NOTE: In Java 8, you can define default behavior for interface methods. This helps you out in the event you want to add a method to an interface that is widely implemented.) A class that includes an interface looks like this:

public class MyClass implements MyInterface {}

At that point, any methods defined in MyInterface are expected to also be defined in MyClass. If they aren't and default behavior isn't provided in the interface, your code will not compile.

Interfaces are extremely helpful because when it comes to method signatures, you can say that a method expects or returns a MyInterface instead of MyClass. It would be like doing this in Ruby:

module MyModule; end
class MyClass
  include MyModule
end

# Will return true
MyClass.new.is_a?(MyModule)

That works in Ruby, but it isn't how we usually do it; instead, we'd use respond_to?. Java's way gets us to the same place by a different route: we are checking that a given object has specific behavior implemented. We care less about the object and more about what it can or cannot do.
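For the Ruby side of that comparison, here is a small sketch of the difference between include, extend, and plain duck typing. All of the class and module names are made up for illustration.

```ruby
module Greeter
  def greet
    "hello from #{self.class}"
  end
end

class WithInclude
  include Greeter # instances gain the module's methods
end

class WithExtend
  extend Greeter # the class object itself gains the module's methods
end

class Duck
  # No module at all, just a method with the right name.
  def greet
    "quack"
  end
end

included_instance = WithInclude.new.is_a?(Greeter) # instances are Greeters
extended_instance = WithExtend.new.is_a?(Greeter)  # instances are not
extended_class    = WithExtend.is_a?(Greeter)      # but the class object is
duck_responds     = Duck.new.respond_to?(:greet)   # duck typing: can it greet?
duck_is_greeter   = Duck.new.is_a?(Greeter)        # no declared relationship

puts [included_instance, extended_instance, extended_class, duck_responds, duck_is_greeter].inspect
```

The respond_to? check is the closest everyday Ruby habit to asking "does this implement the interface?", except that nothing verifies it until the code runs.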

There's a lot more to interfaces than I'm going to get into now, but this is a good start.

There is no initialize method, but...

Instead of initialize, you define a constructor: a method with the same name as the class and no return type.

public class MyClass {
  public AnotherClass myVar;

  public MyClass(AnotherClass myVar) {
    this.myVar = myVar;
  }
}

That means that MyClass is initialized with one argument, an object of type AnotherClass, that will be referred to within its constructor as myVar. Once instantiated, myObj.myVar will be set to this argument. For example:

// Define the variable
MyClass myObj;
// Instantiate the object
myObj = new MyClass(myOtherObj);
// Read `myVar`, will return `myOtherObj`
return myObj.myVar;

There can be multiple constructors for a class

Java makes it possible to instantiate a new object using different sets of arguments; just define multiple constructors.

public class MyClass {
  public MyClass(AnotherClass myVar) {
    // do something
  }

  public MyClass(DifferentClass myVar, AnotherClass myOtherVar) {
    // do something
  }
}

Of course, the constructor methods would do something with those vars. Point is, you have flexibility in how you instantiate objects. This same rule holds true for methods.

Method definitions

Basic Method Signature Examples

Method signatures can omit visibility, but then they default to package-private (visible only within the same package), not public, so the best practice seems to be to always include it. They look like this:

public ReturnType methodName(FirstArgClass firstVar, SecondArgClass secondVar) {
  // Some code
  return result; // something of type ReturnType here
}

Note that every method identifies not only what types of objects it expects but what it returns. To define a method that does not return anything, use the void keyword.

public void methodName() {
  // do something, but do not return a value!
}

To define a class method, use the static keyword.

public static ReturnType methodName() { return null; }

Method arguments are typed

Method arguments follow the simple pattern of ObjectType variableName, separated by commas.

public String myMethod(String firstVar, int secondVar) {
  return firstVar;
}

There's not much more to it than that.

Variables

Variables are declared with types

This shouldn't have to be said at this point but:

String myVar;

That declares a variable named myVar that can hold a reference to a String. No String object is created until you assign one.

Declarations can be assigned immediately... but my IDE flags it

You can do this:

// assume `thisOtherMethod` returns something of type MyObj...
MyObj var1 = thisOtherMethod(anotherVar);

My IDE corrects me when I do this, though, which gave me the sense that it's a best practice to declare first, then assign. That said, declaring and assigning in one statement is valid and very common in Java code, so treat this as an IDE hint rather than a rule.

Instance variables

Define instance variables in the body of your class.

public class MyClass {
  public String myString;
  public List myList;
}

Those can now be assigned within methods. This is the equivalent of @var in Ruby.

Class variables

Define class variables using the static keyword in the body of the class.

public class MyClass {
  public static String myString;
}

Constants

Define constants using both static and final.

public class MyClass {
  private static final String myString = "oh hai";
}

The final keyword indicates a value that cannot be changed.
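If it helps to map these back to Ruby, here is a rough sketch of the nearest equivalents. The Counter class is made up, and the correspondences are loose: Ruby constants can still be reassigned with a warning, and freeze only protects the object, not the name.

```ruby
class Counter
  GREETING = "oh hai".freeze # roughly `static final`: a constant holding an immutable value

  @@instances = 0            # roughly a `static` field: shared by the class and its instances

  attr_reader :label         # roughly an instance field with a getter

  def initialize(label)
    @label = label           # instance variable, one per object
    @@instances += 1
  end

  # Roughly a `static` method: called on the class, not on an instance.
  def self.instances
    @@instances
  end
end

first  = Counter.new("one")
second = Counter.new("two")

puts "#{Counter::GREETING}: #{Counter.instances} instances (#{first.label}, #{second.label})"
```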


That's it for now. Keep an eye out for part two. Hope this helps someone out.

A Java Crash Course for Ruby Developers, Part 0

I've been working with Neo4j for something like two years now. I started with Ruby and Neo4j.rb 2.3, which used JRuby 1.7.x with Neo4j Embedded 1.9, and learned Ruby, Neo4j, Rails, and the Neo4j.rb gem concurrently. Funny as it might be, I made it pretty far without ever writing much more than an extremely basic Cypher query for over a year. Phillymetal.com is built entirely on this stack but there's very little "graphy" stuff going on. Neo4j Embedded is so fast that working with the database has almost no overhead for simple find/return/traverse operations, so I was mostly exploiting the easy data modeling and schema-free goodness.

Things changed last year when I started contributing heavily to the Neo4j.rb gem, then got even more intense when I ramped up work on my own project. As performance became more of a concern, I started writing more Cypher to limit the number of queries in and out of the database. As it turns out, Ruby seems to be Neo4j.rb's biggest problem, but the fact is still that the REST API with Neo4j 2.1.7 is just not as performant as Java and Neo4j Embedded.

Enter unmanaged extensions and the Java API. Neo4j has a very cool feature: since it is open source and exposes a REST API, they provide an easy way to write your own plugins, "unmanaged extensions," that expose new REST endpoints. Once your data hits that endpoint, you can use the Java API to do all your work, then you return whatever you want back to your application. This gives you the best of both worlds: Cypher when you need it, Java API for the heavy lifting. Without exception, everyone I know who needs serious performance from Neo4j has told me that they fall back to this method.

At work, we're building an ambitious social media analytics platform that relies heavily on Neo4j. It's collecting a lot of data and revealing a lot of interesting connections. The front-end app is a client of the JSON API my team is building and the goal is for it to have the feel of a desktop app. This means it wants a ton of data very fast. Unfortunately, the combination of our query, the amount of data, the REST API, and Neo4j.rb just do not make for the performance we're looking for. Without getting into how much data we're returning, I can say that we've been seeing 1.5s responses when we really need something closer to 400ms.

I took some steps to improve performance but each time, I got smacked down. I patched Neo4j-core's Query class with an unwrapped method that returns simpler objects, I implemented the amazing GraphAware TimeTree plugin, I optimized the hell out of my query, but we were still having these issues; in fact, the 1.5s response mentioned above is AFTER these optimizations!

After hearing so much about the power of unmanaged extensions, I thought this might be a good opportunity to dive in. Problem was, I had never written a line of Java before. Hell, I had never successfully read a line of Java! Still, I knew what I wanted to do: take the complex Cypher query we're running, rewrite it in Java, return JSON that is usable in the app. To do this, I started looking for resources about Java that were aimed at Rubyists. I figured that with the popularity of both languages, there had to be something out there for Ruby devs looking to learn Java... right? Wrong. Everything I could find was about going from Java to Ruby. Coooool...

I figured I might as well dive into TimeTree to see if I could figure some of it out. I started with the API, since it was all I was really interacting with directly. https://github.com/graphaware/neo4j-timetree/blob/master/src/main/java/com/graphaware/module/timetree/api/TimeTreeApi.java has all the endpoints and figuring out what methods were being called was easy enough. It was also pretty trivial to expose a new endpoint, take a few params, and... make the whole thing crash. A lot.

A few searches and half a book on a plane ride later, I made it to a point that I was able to hack together my own unmanaged extension. In the span of seven days, I was able to write it, extract it from TimeTree, test it, and get it into staging. Our report now runs at around 400ms and I'm going to move more of our app's code into Java ASAP. Victory!

I'm writing this post and what I hope will be a few companion pieces for two reasons.

First, from a Neo4j dev's perspective, I want to illustrate the power of the Java API compared to Cypher. This is not a knock on Cypher in any way, because it is a fantastically expressive, readable, powerful language that I'm always happy to work with, but the superior control you have over your data within Java just can't be overstated.

Second, separate from Java, I want to provide a resource for other people who may be in my shoes: Ruby devs who want or need to learn Java quickly but don't want to read through a book that spends a significant amount of time on familiar principles of OOP. As I work with Java, it's going to be harder to remember how I felt about it when I got started, so now is the time to do this.

I don't expect there to be more than one or two other posts about this and I doubt that either of them will be very heavy on Neo4j, if it's mentioned at all. Though I may mention it, it's really not much more than a reference for how I figured some things out. If you are an experienced Java developer, please don't blast me if/when I misstate details, since I'm still learning here.

I'm going to start working on the first piece of this immediately, so stay tuned!
