The Ruby way
Hi there,
I started to do some prototyping of a deep file traversal in Neo4j.rb (the awesome JRuby bindings done by Andreas Ronge of Jayway - the best Java consultants in Southern Sweden). Using Cucumber, things progressed very fast, so after 2h I had my tests running with the following nodespace layout (you can find the full code example here:
Now a dynamic calculation of a top folder's total size, done by traversing all files and summing up their "size" properties, looks like this in step_definitions.rb:
def calcTotalSize(folder)
  totSize = 0
  folder.relationships.outgoing(:child).nodes.each do |node|
    if node[:size] != nil
      totSize += node[:size]
    else # this is a folder
      totSize += calcTotalSize(node)
    end
  end
  return totSize
end
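To play with the recursion without a database, here is a minimal plain-Ruby sketch. It is not Neo4j.rb code: the hashes, `build_folder` and `total_size` are hypothetical stand-ins for the nodes and the method above. It also assumes two folder levels below the top folder, chosen only because that reproduces the 4 kb and 20400 kb totals expected in the feature file.

```ruby
# Hypothetical in-memory stand-in for the graph (not Neo4j.rb):
# a file is a Hash with a :size entry, a folder a Hash with :children.
def build_folder(files, subfolders, levels)
  children = Array.new(files) { { size: 1 } } # every file is 1 kb
  if levels > 1
    children += Array.new(subfolders) do
      build_folder(files, subfolders, levels - 1)
    end
  end
  { children: children }
end

# Same recursion as calcTotalSize: add file sizes, descend into folders.
def total_size(folder)
  folder[:children].inject(0) do |sum, node|
    sum + (node[:size].nil? ? total_size(node) : node[:size])
  end
end

puts total_size(build_folder(2, 1, 2))    # 2 + 1 * 2
puts total_size(build_folder(400, 50, 2)) # 400 + 50 * 400
```

The second call builds 20,450 file and folder hashes below the top folder, which is in the same range as the bigger scenario.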
When doing this with the first test in the treesizes.feature
Scenario: Simple tests
  When I create a filetree with 2 files a 1kb and 1 subfolders in each folder, 3 times nested
  Then the total number of nodes in the db should be greater than 7
  Then the total size of one top folder files should be 4 kb and response time less than 0.015 s
Things are pretty good, we are traversing 2 files, and the total time on my MBP with SSD (yes that ROCKS) is 5ms.
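The post does not show how the step measures the response time. One way to do it, sketched here with `Benchmark.realtime` from Ruby's standard library, is to wrap the call and compare the elapsed seconds against the limit. The `timed` helper and the stand-in workload are hypothetical; in the real step the block would call calcTotalSize.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block and returns [result, elapsed_seconds].
def timed
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  [result, elapsed]
end

# Stand-in workload: four files of 1 kb each.
total, seconds = timed { [1, 1, 1, 1].inject(0) { |sum, kb| sum + kb } }

max_seconds = 0.015 # limit taken from the first scenario
puts "total: #{total} kb, time: #{(seconds * 1000).round(3)} ms"
puts(seconds < max_seconds ? 'within limit' : 'too slow')
```

Measuring wall-clock time this way includes JRuby warm-up, so a first run may come out slower than later ones.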
However, cranking the test up to over 20,000 files and folders:
Scenario: Bigger data sample
  When I create a filetree with 400 files a 1kb and 50 subfolders in each folder, 3 times nested
  Then the total number of nodes in the db should be greater than 20000
  Then the total size of one top folder files should be 20400 kb and response time less than 0.5 s
Results in a traversal time of over 2.3 seconds for that method. Why is this so slow? Well, in Neo4j.rb we are trading performance for development ease. Every node created through Neo4j.rb with Neo4j::Node.new gets wrapped in a nice little Ruby class that holds the properties and basically hides the Neo4j graph behind a very object-database-like interface: persistence just "happens" under the hood.
testFile = Neo4j::Node.new
puts 'classname: ' + testFile[:classname]
gives us
classname: Neo4j::Node
Thus, every time we step in the above traversal, we are making a roundtrip from the graph into JRuby objects and back to the next hop in the graph.
Java Traversers to the rescue
Luckily, speeding up the traversal a bit is quite easy. The Neo4j Java Traverser API takes a different approach. The instructions on how to traverse the graph are given upfront, so the full traversal is done in the graph, lazily hopping around in the data structure and returning nodes as the client fetches the result set. Thus, traversal is much faster than with "out of graph" traversal.
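The "describe the walk upfront, fetch lazily" idea can be sketched in plain Ruby with an Enumerator. This is only an analogy and not the Neo4j Traverser API: the `TREE` hash and `depth_first` are hypothetical, and nothing here touches a graph database.

```ruby
# Plain-Ruby analogy only, not the Neo4j Traverser API.
# Hypothetical tree: folders carry :children, files carry :size.
TREE = {
  name: 'top', children: [
    { name: 'a.txt', size: 1 },
    { name: 'sub', children: [
      { name: 'b.txt', size: 1 },
      { name: 'c.txt', size: 1 }
    ] }
  ]
}

# The whole depth-first walk is described here, upfront. No node is
# visited until the caller starts pulling from the returned Enumerator.
def depth_first(start)
  Enumerator.new do |yielder|
    stack = [start]
    until stack.empty?
      node = stack.pop
      yielder << node
      # Push children reversed so they are visited in listed order.
      stack.concat((node[:children] || []).reverse)
    end
  end
end

walker = depth_first(TREE)
puts walker.next[:name] # pulls only the first node of the walk

# Consuming the enumerator drives the rest of the traversal.
total = depth_first(TREE).inject(0) { |sum, node| sum + (node[:size] || 0) }
puts total
```

The caller never steers the walk hop by hop. It only consumes results, which is the property that lets a real in-graph traverser avoid a round trip through wrapper objects on every step.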
The Neo4j Java Traverser API is easily accessible from JRuby, so we can extract the underlying Neo4j node reference from the JRuby Neo4j::Node wrapper and traverse the graph using the Neo4j Java API without leaving JRuby:
# this is about 8x faster - untweaked
def calcSizeJava(node)
  neoNode = node.internal_node
  size = 0
  child = org.neo4j.api.core.DynamicRelationshipType.withName 'child'
  traverser = neoNode.traverse(org.neo4j.api.core.Traverser::Order::DEPTH_FIRST,
                               org.neo4j.api.core.StopEvaluator::END_OF_GRAPH,
                               org.neo4j.api.core.ReturnableEvaluator::ALL, child, org.neo4j.api.core.Direction::OUTGOING)
  while traverser.hasNext()
    node = traverser.next
    if node.hasProperty('size')
      size += node.getProperty('size')
    end
  end
  size
end
Now, over the 20K nodes this is about 8 times faster than the traversal above: 304ms with the Java API, well under the target of 0.5 seconds in the feature. This is still interpreted code, so there is significantly more to gain. At least the traversal is now done "in-graph" without leaving JRuby, and that is before even considering JRuby in compiled mode or tweaking in Java to get down to the full speed for this type of traversal, which should be well under 50ms. More on that in another post :)
I found it a very "cheap" way to crank up the JRuby speed for a bigger prototype. Maybe it works for you, too?
p.s. feel free to run, fork and improve the code, this is just a few hours spike ...
Hi,
I have a problem with relationships. I have a node with more than five children. Now I want to delete the relationship from node1 (the home node) to node2. But when I call node1.rel(:children).end_node I get the following exception: NativeException at /config
org.neo4j.graphdb.NotFoundException: More than one relationship[DynamicRelationshipType[feature_config], OUTGOING] found for NodeImpl#158.
So what can I do to get the relationship between node1 and node2?
Thanks.
Posted by: buan | 06/22/2010 at 08:59 AM
Hi there,
you should probably address this question to the Neo4j Ruby forum, http://groups.google.com/group/neo4jrb so the experts can help you. Sorry for the inconvenience!
/peter
Posted by: Peter Neubauer | 07/14/2010 at 07:25 AM