apoc.index.* doesn't add or find newly-added nodes in index #329

Closed
igorclark opened this issue Mar 19, 2017 · 39 comments

@igorclark

Hi all,

I'm trying to use fulltext indexes from APOC, but I'm not having much luck.

I'm running neo4j 3.1.2 and apoc-3.1.2.5 on debian 8.7.

Creating the index on an existing db works as expected:

neo4j-sh (?)$ call apoc.index.addAllNodes( 'search_index', { City: ["name"] } );
+-------------------------------+
| label  | property | nodeCount |
+-------------------------------+
| "City" | "name"   | 1122      |
+-------------------------------+
1 row
866 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"La Paz"' ) yield node return node;
+--------------------------------------------------------------------------------------------------------------+
| node                                                                                                         |
+--------------------------------------------------------------------------------------------------------------+
| Node[1593267]{name:"La Paz",url:"/places/bolivia/la-paz",lat:-16.489689,lon:-68.1192936} |
+--------------------------------------------------------------------------------------------------------------+

But although I have "autoUpdate" set in neo4j.conf:

$ fgrep autoUpdate /etc/neo4j/neo4j.conf
apoc.autoUpdate.enabled=true

Adding new items doesn't automatically update the index (I've tried restarting the server after creating the index, but it doesn't make any difference):

neo4j-sh (?)$ create (c:City{name:"Made Up City",url:"/places/nowhere/made-up-city"}) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[1746889]{url:"/places/nowhere/made-up-city",name:"Made Up City"} |
+-----------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
77 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Made Up"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
134 ms
neo4j-sh (?)$ start n=node:search_index('name:"Made Up"') return n;
+---+
| n |
+---+
+---+
0 row
46 ms

So I try to add the node to the index manually, but whether I have autoUpdate set to true or not, adding the node directly with addNode doesn't work:

neo4j-sh (?)$ match (c:City{url:"/places/nowhere/made-up-city"}) with c call apoc.index.addNode( c, [ 'name' ] ) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[1746893]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
86 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Made Up"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
11 ms
neo4j-sh (?)$ start n=node:search_index('name:"Made Up"') return n;
+---+
| n |
+---+
+---+
0 row
8 ms

Adding with addNodeByLabel doesn't work either:

neo4j-sh (?)$ match (c:City{url:"/places/nowhere/made-up-city"}) with c call apoc.index.addNodeByLabel( 'City', c, [ 'name' ] ) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[1746893]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
77 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Made Up"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
11 ms
neo4j-sh (?)$ start n=node:search_index('name:"Made Up"') return n;
+---+
| n |
+---+
+---+
0 row
11 ms

Adding the node with addNodeByName does add the node to the index:

neo4j-sh (?)$ match (c:City{url:"/places/nowhere/made-up-city"}) with c call apoc.index.addNodeByName( 'search_index',  c, [ 'name' ] ) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[1746893]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
55 ms
neo4j-sh (?)$ start n=node:search_index('name:"Made Up"') return n;
+-----------------------------------------------------------------------+
| n                                                                     |
+-----------------------------------------------------------------------+
| Node[1746893]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
19 ms

But even then, apoc.index.search doesn't find the node:

neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Made Up"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
6 ms

Would be great to know if I'm doing something wrong, or if there's something else going on?

Thanks!
Igor

@igorclark igorclark changed the title from "apoc.index.search doesn't seem to find newly-added nodes from index" to "apoc.index.* doesn't add or find newly-added nodes in index" Mar 19, 2017
@jexp
Member

jexp commented Mar 19, 2017

@sarmbruster could you look into this?

@sarmbruster sarmbruster self-assigned this Mar 20, 2017
@sarmbruster
Contributor

apoc.autoUpdate.enabled=true in neo4j.conf is not sufficient to make index tracking work. You additionally have to configure the index itself to be tracked. This can be achieved by using:

call apoc.index.addAllNodesExtended('search_index',{City:['name']},{autoUpdate:true})
instead of
call apoc.index.addAllNodes( 'search_index', { City: ["name"] } );

Please note that you cannot change an index configuration, so be sure that search_index does not exist when calling addAllNodesExtended. You can use call apoc.index.remove('search_index') to get rid of it.

Can you please try the sketched approach? If it succeeds, I guess we should improve the docs on this.
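
For reference, the sequence described above might look like the following in neo4j-shell. This is a minimal sketch that only uses the procedures and the index name already mentioned in this thread; it assumes nothing else about the existing index needs to be preserved.

```cypher
// Drop the existing index first: its configuration cannot be changed in place.
CALL apoc.index.remove('search_index');

// Recreate it with tracking enabled for City.name.
CALL apoc.index.addAllNodesExtended('search_index', {City: ['name']}, {autoUpdate: true});
```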

@igorclark
Author

Hi there @jexp and @sarmbruster, thanks for taking a look!

I just tried that, but I'm sorry to say it made no difference. I used apoc.index.remove to remove search_index and apoc.index.addAllNodesExtended to add it again, as sketched, but otherwise I replicated the above steps exactly, and got exactly the same results.

Just to be clear, this isn't only a problem with auto-updating. As per the report above, using apoc.index.addNode and apoc.index.addNodeByLabel to add the node manually both fail to add it to the index, whereas apoc.index.addNodeByName does add it to the named index (because node:search_index does find it), but even then apoc.index.search doesn't find the node in the index.
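
One possible way to check where the manually added entries ended up is sketched below. It assumes, based on the APOC procedure descriptions rather than anything confirmed in this thread, that apoc.index.addNode and apoc.index.addNodeByLabel write to an index named after the label (here City) instead of search_index, and that the field name is the bare property key.

```cypher
// Assumption: addNode / addNodeByLabel target an index named after the label.
CALL apoc.index.nodes('City', 'name:"Made Up"') YIELD node RETURN node;

// Same lookup through the legacy index syntax used elsewhere in this report.
START n=node:City('name:"Made Up"') RETURN n;
```

If either query returns the node, the entry exists but lives outside search_index.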

Thanks - I really hope to get this working, as it'll be an extremely useful feature for us :-)

@sarmbruster
Contributor

Thanks for testing again. I'll run some tests to get a better understanding during the next couple of days.

@igorclark
Author

Great. Thanks. Please do let me know if I can help test anything, happy to do what I can!

@sarmbruster
Contributor

Looks like the documentation is misleading. Can you please try to use
apoc.autoIndex.enabled=true
instead of
apoc.autoUpdate.enabled=true
in conf/neo4j.conf ?
If that setup works for you as expected I'll fix the documentation.

@igorclark
Author

Hi @sarmbruster, thanks, I'm away from a computer for a couple of days but I'll test it out when I get back.

Do you expect this will affect the problem where manually added nodes still don't show in the index?

@sarmbruster
Contributor

No, apoc.index.addAllNodesExtended and apoc.index.addAllNodes apply a naming convention for the index fields: <Label>.<propertyKey> whereas apoc.index.node does not apply this convention and just uses the property key as index field name.
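
To illustrate the asymmetry with the queries already used in this thread: the field name in the Lucene query has to match how the entry was written. A sketch, assuming the index and data from the original report:

```cypher
// Entries written by addAllNodes / addAllNodesExtended use <Label>.<propertyKey>.
CALL apoc.index.search('search_index', 'City.name:"La Paz"') YIELD node RETURN node;

// Entries written by addNodeByName use the bare property key,
// so only a query on the plain field name matches them.
START n=node:search_index('name:"Made Up"') RETURN n;
```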

@sarmbruster
Contributor

So let's first figure out if index tracking works correctly and afterwards sort out this asymmetry between addAllNodes and apoc.index.node (and others).

@jexp
Member

jexp commented Mar 24, 2017

Then we could def. store the label + property information in the index definition in the same way as in index.addAllNodes

@igorclark
Author

Ok great thanks! I'll test it on Sunday eve and report back 👍

@igorclark
Author

Hi there @sarmbruster, I tried with autoIndex like so:

$ grep autoIndex /etc/neo4j/neo4j.conf
apoc.autoIndex.enabled=true

and ran the commands below, with the apparent result that:

  1. New nodes are automatically added to the index, as start n=node:search_index finds them
  2. apoc.index.search still doesn't find the new nodes in the index
  3. Now that the new nodes are being added to the index, it seems impossible to delete them
  4. Now that the new nodes are being added to the index, it seems impossible to delete pre-existing nodes too

I ran the session below immediately after a completely fresh VM build including fresh debian 8 + neo4j 3.1.2 installs and identical import of existing database.

neo4j-sh (?)$ call apoc.index.addAllNodesExtended('search_index',{City:['name']},{autoUpdate:true});
+-------------------------------+
| label  | property | nodeCount |
+-------------------------------+
| "City" | "name"   | 1131      |
+-------------------------------+
1 row
1661 ms
neo4j-sh (?)$ create (c:City{name:"Made Up City",url:"/places/nowhere/made-up-city"}) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[1878635]{url:"/places/nowhere/made-up-city",name:"Made Up City"} |
+-----------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
119 ms
neo4j-sh (?)$ start n=node:search_index('name:"Made Up"') return n;
+-----------------------------------------------------------------------+
| n                                                                     |
+-----------------------------------------------------------------------+
| Node[1878635]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
68 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Made Up"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
65 ms
neo4j-sh (?)$ match (c:City{name:"Made Up City"}) delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
183 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
neo4j-sh (?)$ match (c:City{name:"Made Up City"}) detach delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
45 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
neo4j-sh (?)$ match (c:City{name:"London"}) detach delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
Relationships deleted: 42024
974 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.

@sarmbruster
Contributor

Thanks @igorclark for further testing. To sum up, we are currently talking about three different issues here. I've opened separate tickets for all of them, see above.

Let's continue the specific discussions in the individual tickets.

I'll close that "master" ticket off when all sub issues are solved.

@sarmbruster
Contributor

@igorclark meanwhile I've fixed the code in the 3.1 branch (no release of this yet). Could you please test again?

@igorclark
Author

Hi @sarmbruster, great, thanks, & please excuse the delay in getting back to you, it took me a bit of time to test all this through.

I did a full VM rebuild & reload using neo4j 3.1.2 and apoc-3.1.2.6-SNAPSHOT-all.jar and tested the same scenario as above.

The first thing I noticed was that the import took much, much longer than with neo4j 3.0.x, or 3.1.2.5 with apoc-3.1.2.5.jar. I guess this is because it's now obeying apoc.autoIndex.enabled=true and there's extra work happening on every insert, with the new TransactionEventHandler? It probably took 20 times as long or more for the full import - it's a GraphML import with batch size 5000 (import-graphml -i ${IMPORT_FILE} -t -b 5000 -c) and it took roughly 60s for each commit, instead of 2-3s for each commit without apoc.autoIndex enabled.

Regardless, here's what happened:

neo4j-sh (?)$ call apoc.index.addAllNodesExtended('search_index',{City:['name']},{autoUpdate:true});
+-------------------------------+
| label  | property | nodeCount |
+-------------------------------+
| "City" | "name"   | 1136      |
+-------------------------------+
1 row
901 ms
neo4j-sh (?)$ create (c:City{name:"Made Up City",url:"/places/nowhere/made-up-city"}) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[1940793]{url:"/places/nowhere/made-up-city",name:"Made Up City"} |
+-----------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
63 ms
neo4j-sh (?)$ start n=node:search_index('name:"Made Up"') return n;
+---+
| n |
+---+
+---+
0 row
96 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Made Up"' ) yield node return node;
+-----------------------------------------------------------------------+
| node                                                                  |
+-----------------------------------------------------------------------+
| Node[1940793]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
76 ms
neo4j-sh (?)$ match (c:City{name:"Made Up City"}) delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
38 ms

So it's clearly adding the node to an index called search_index, because apoc.index.search finds it there. Which is great, and the delete works without error too :-)

But, using a regular start n=node:search_index() doesn't find it any more. Is that expected behaviour?


The second thing I've noticed might be a separate issue, in which case I'm happy to raise another one, but it seems related, so I'm mentioning it here in case you think it's connected.

The issue is that there are other failures trying to remove properties when I have apoc.autoIndex.enabled=true. For example, from the end of our import script:

neo4j-sh (?)$ MATCH (p:Person) REMOVE p.token RETURN COUNT(p) AS number_tokens_removed;
+-----------------------+
| number_tokens_removed |
+-----------------------+
| 58186                 |
+-----------------------+
1 row
Properties set: 2593
123 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.

This previously worked without any errors (on 3.0.1, and still works on 3.1.2 + apoc without apoc.autoIndex.enabled); if token wasn't present, it just wouldn't remove it, and that would be that.

Now (on 3.1.2 with both apoc-3.1.2.5 and apoc-3.1.2.6-SNAPSHOT-all, and apoc.autoIndex.enabled=true), the remove seems to trigger an error.

It doesn't happen on all nodes; when there's a token property on the node(s) I try to remove the token from, it reliably fails - but when I match where not exists(n.token), it reliably works as expected. I tried it with a different property, with exactly the same outcome - if the property's present, failure; if not, success.

neo4j-sh (?)$ MATCH (p:Person) WHERE EXISTS (p.token) WITH p ORDER BY RAND() LIMIT 10 REMOVE p.token RETURN COUNT(p) AS number_tokens_removed;
+-----------------------+
| number_tokens_removed |
+-----------------------+
| 10                    |
+-----------------------+
1 row
Properties set: 10
147 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
neo4j-sh (?)$ MATCH (p:Person) WHERE NOT EXISTS (p.token) WITH p ORDER BY RAND() LIMIT 10 REMOVE p.token RETURN COUNT(p) AS number_tokens_removed;
+-----------------------+
| number_tokens_removed |
+-----------------------+
| 10                    |
+-----------------------+
1 row
144 ms

This is in our import script, and it seems to fail when apoc.autoIndex.enabled=true has been switched on before a clean import, whether using apoc-3.1.2.5.jar or apoc-3.1.2.6-SNAPSHOT-all.jar.

The thing is, the label Person isn't in the search_index - and in fact search_index doesn't even exist in our main import, it's just something I've been testing with. Person is covered by our dbms.auto_index.nodes.keys=name,index_name,url,description setting - but like I say, this error doesn't happen when apoc.autoIndex isn't enabled.

So I tried doing this with a completely separate, previously-unused label, and found the same happens:

neo4j-sh (?)$ create (s:Submarine{color:"yellow",periscope:true});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 2
Labels added: 1
52 ms
neo4j-sh (?)$ create (s:Submarine{color:"green"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
21 ms
neo4j-sh (?)$ match (s:Submarine) WHERE EXISTS(s.periscope) REMOVE s.periscope RETURN COUNT(s) AS number_periscopes_removed;
+---------------------------+
| number_periscopes_removed |
+---------------------------+
| 1                         |
+---------------------------+
1 row
Properties set: 1
22 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
neo4j-sh (?)$ match (s:Submarine) WHERE NOT EXISTS(s.periscope) REMOVE s.periscope RETURN COUNT(s) AS number_periscopes_removed;
+---------------------------+
| number_periscopes_removed |
+---------------------------+
| 1                         |
+---------------------------+
1 row
40 ms

I tried just deleting the nodes, with a completely empty database with apoc.autoIndex.enabled=true using apoc-3.1.2.6-SNAPSHOT-all.jar, and it works fine:

neo4j-sh (?)$ create (s:Submarine{color:"yellow",periscope:true});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 2
Labels added: 1
171 ms
neo4j-sh (?)$ create (s:Submarine{color:"green"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
21 ms
neo4j-sh (?)$ match (s:Submarine) WHERE EXISTS(s.periscope) DELETE s RETURN COUNT(s) AS number_subs_deleted;
+---------------------+
| number_subs_deleted |
+---------------------+
| 1                   |
+---------------------+
1 row
Nodes deleted: 1
175 ms
neo4j-sh (?)$ match (s:Submarine) WHERE NOT EXISTS(s.periscope) DELETE s RETURN COUNT(s) AS number_subs_deleted;
+---------------------+
| number_subs_deleted |
+---------------------+
| 1                   |
+---------------------+
1 row
Nodes deleted: 1
57 ms

But with apoc-3.1.2.5.jar the same still happens:

neo4j-sh (?)$ create (s:Submarine{color:"yellow",periscope:true});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 2
Labels added: 1
211 ms
neo4j-sh (?)$ create (s:Submarine{color:"green"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
20 ms
neo4j-sh (?)$ match (s:Submarine) WHERE EXISTS(s.periscope) DELETE s RETURN COUNT(s) AS number_subs_deleted;
+---------------------+
| number_subs_deleted |
+---------------------+
| 1                   |
+---------------------+
1 row
Nodes deleted: 1
171 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
neo4j-sh (?)$ match (s:Submarine) WHERE NOT EXISTS(s.periscope) DELETE s RETURN COUNT(s) AS number_subs_deleted;
+---------------------+
| number_subs_deleted |
+---------------------+
| 1                   |
+---------------------+
1 row
Nodes deleted: 1
64 ms
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.

So I guess your commits have fixed the delete bug, but this problem with remove-ing properties seems to be still there in 3.1.2.6.

Sorry for the giant update but it's quite complicated to explain. Hope it makes sense!

@sarmbruster
Contributor

@igorclark thanks again for your help. Some comments on your observations:

But, using a regular start n=node:search_index() doesn't find it any more. Is that expected behaviour?

Yes, that's expected. When using start you need to use a label+propertyKey combo:

start n=node:search_index('City.name:"Made Up"') return n

On the property removal issue I'll create a subticket being referenced here.

@igorclark
Author

Hey, great, thanks @sarmbruster! So with this fix we're able to create the new index, add and auto-add nodes to it, and query it using both start and apoc.index.search. Excellent. Thank you!

I'll follow the other one in #367, thanks for creating that - happy to help testing that out too.

Cheers!

@igorclark
Author

Hi @sarmbruster, just got round to re-testing all this on neo4j 3.1.4 and apoc-3.1.3.7-all.jar and found that while all the index calls from above seem to work well, there does seem to be one issue when deleting a node:

neo4j-sh (?)$ call apoc.index.addAllNodesExtended('search_index',{City:['name']},{autoUpdate:true});
+-------------------------------+
| label  | property | nodeCount |
+-------------------------------+
| "City" | "name"   | 1142      |
+-------------------------------+
1 row
1191 ms
neo4j-sh (?)$ create (c:City{name:"Made Up City",url:"/places/nowhere/made-up-city"}) return c;
+-----------------------------------------------------------------------+
| c                                                                     |
+-----------------------------------------------------------------------+
| Node[2306629]{url:"/places/nowhere/made-up-city",name:"Made Up City"} |
+-----------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
19 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Made Up"') return n;
+-----------------------------------------------------------------------+
| n                                                                     |
+-----------------------------------------------------------------------+
| Node[2306629]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
10 ms
neo4j-sh (?)$ call apoc.index.search('search_index', 'City.name:"Made Up"') yield node return node;
+-----------------------------------------------------------------------+
| node                                                                  |
+-----------------------------------------------------------------------+
| Node[2306629]{name:"Made Up City",url:"/places/nowhere/made-up-city"} |
+-----------------------------------------------------------------------+
1 row
12 ms
neo4j-sh (?)$ match (c:City{name:"Made Up City"}) delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
14 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Made Up"') return n;
+---+
| n |
+---+
+---+
0 row
11 ms
neo4j-sh (?)$ call apoc.index.search('search_index', 'City.name:"Made Up"') yield node return node;
QueryExecutionException: Failed to invoke procedure `apoc.index.search`: Caused by: org.neo4j.graphdb.NotFoundException: Node 2306629 not found

If I explicitly remove the node from the apoc index, there's no exception:

neo4j-sh (?)$ create (c:City{name:"Another Made Up City",url:"/places/nowhere/another-made-up-city"}) return c;
+---------------------------------------------------------------------------------------+
| c                                                                                     |
+---------------------------------------------------------------------------------------+
| Node[2306630]{url:"/places/nowhere/another-made-up-city",name:"Another Made Up City"} |
+---------------------------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
10 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Another Made Up"') return n;
+---------------------------------------------------------------------------------------+
| n                                                                                     |
+---------------------------------------------------------------------------------------+
| Node[2306630]{name:"Another Made Up City",url:"/places/nowhere/another-made-up-city"} |
+---------------------------------------------------------------------------------------+
1 row
13 ms
neo4j-sh (?)$ call apoc.index.search('search_index', 'City.name:"Another Made Up"') yield node return node;
+---------------------------------------------------------------------------------------+
| node                                                                                  |
+---------------------------------------------------------------------------------------+
| Node[2306630]{name:"Another Made Up City",url:"/places/nowhere/another-made-up-city"} |
+---------------------------------------------------------------------------------------+
1 row
14 ms
neo4j-sh (?)$ match (c:City{name:"Another Made Up City"}) call apoc.index.removeNodeByName('search_index', c) delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
16 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Another Made Up"') return n;
+---+
| n |
+---+
+---+
0 row
7 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Another Made Up"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
9 ms

Is this expected behaviour? I can live with having to remove nodes from the index, but I had thought the point of apoc.autoIndex.enabled=true would be to keep the index in sync with the nodes automatically? Seems like it's the sort of thing that's going to get easily forgotten at some point.

Also, as above, with apoc.autoIndex.enabled=true but no apoc indexes created, a full database import from a GraphML dump is taking over 2 hours, as opposed to the maybe 10 minutes it was taking without. Is this to be expected?

@igorclark igorclark reopened this May 15, 2017
@janwo

janwo commented May 31, 2017

@sarmbruster @igorclark I am experiencing the same error. When deleting a node, I get an org.neo4j.graphdb.NotFoundException: Node XXXXX not found error.

@igorclark
Author

Hi @sarmbruster, wonder if you might have a moment to check this out? It's still kind of in the way of adopting apoc indexes for us - (a) it seems like deleted nodes shouldn't remain in indexes, and (b) the thought that inserts might slow things down quite that drastically as a matter of course is obviously a concern. Any thoughts?

Thanks very much!

@sarmbruster
Contributor

@igorclark sorry for the silence, lots of other stuff to do. However, I made good progress on the observed performance penalty upon inserts, see the code in my branch: https://github.com/sarmbruster/neo4j-apoc-procedures/tree/jmh.
Currently I need to investigate the performance of inserts with async indexing configured.
I will have a look at the delete issue as well.

@igorclark
Author

Thanks @sarmbruster, that sounds great. Look forward to updates when you can. Appreciate it!

@jexp
Member

jexp commented Jul 25, 2017

@sarmbruster ping ? is this fixed?

@sarmbruster
Contributor

waiting for feedback from @igorclark here

@igorclark
Author

Hello @sarmbruster! Sorry if I misunderstood, I read your last comment as meaning you would be investigating further into the performance and looking into the delete issue. Did I get that wrong? What can I do to help if so? I'm not familiar with the codebase of the plugin itself, just enthusiastic about using it :-) Also the branch you linked seems to be 404-ing right now.

@sarmbruster
Contributor

The PR above was merged and is part of the latest release. I'd be thankful if you could do some testing with the currently released APOC version.

@igorclark
Author

Oh! Wonderful. Sure thing, thanks for letting me know - I'll try it out in the next day or two and let you know how it goes. Thank you 👍

@igorclark
Author

Hi again @sarmbruster, I just got to testing this. I'm using neo4j 3.1.5 and apoc-3.1.3.8-all.jar, do let me know if that's not right.

Firstly, the GraphML import performance seems to be drastically improved. I'll do this a few more times to make sure I have everything right, but even with apoc.autoIndex.enabled=true, it's looking like I'm seeing faster imports using call apoc.import.graphml even than I previously saw with import-graphml from the tools package. Around 9 minutes for the full data import instead of 10. This is great, thanks!
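
For anyone reproducing the import, a call along these lines is one way to invoke the procedure. It is only a sketch: the file path is a placeholder, batchSize is chosen to mirror the -b 5000 flag of the shell command quoted earlier, and readLabels is an assumption about the dump, not the exact invocation used here.

```cypher
// Hypothetical path; replace with the location of the GraphML dump.
CALL apoc.import.graphml('file:///path/to/dump.graphml', {batchSize: 5000, readLabels: true});
```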

Secondly, however, the delete problem still seems to happen. Here's what I get:

neo4j-sh (?)$ call apoc.index.addAllNodesExtended('search_index',{City:['name']},{autoUpdate:true});
+-------------------------------+
| label  | property | nodeCount |
+-------------------------------+
| "City" | "name"   | 1180      |
+-------------------------------+
1 row
1447 ms
neo4j-sh (?)$ create (c:City{name:"Imaginary City",url:"/places/nowhere/imaginary-city"}) return c;                                                            
+---------------------------------------------------------------------------+
| c                                                                         |
+---------------------------------------------------------------------------+
| Node[3097025]{url:"/places/nowhere/imaginary-city",name:"Imaginary City"} |
+---------------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
21 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Imaginary"') return n;
+---------------------------------------------------------------------------+
| n                                                                         |
+---------------------------------------------------------------------------+
| Node[3097025]{name:"Imaginary City",url:"/places/nowhere/imaginary-city"} |
+---------------------------------------------------------------------------+
1 row
21 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Imaginary"' ) yield node return node;
+---------------------------------------------------------------------------+
| node                                                                      |
+---------------------------------------------------------------------------+
| Node[3097025]{name:"Imaginary City",url:"/places/nowhere/imaginary-city"} |
+---------------------------------------------------------------------------+
1 row
16 ms
neo4j-sh (?)$ match (c:City{name:"Imaginary City"}) delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
18 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Imaginary"') return n;
+---+
| n |
+---+
+---+
0 row
6 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Imaginary"' ) yield node return node;
QueryExecutionException: Failed to invoke procedure `apoc.index.search`: Caused by: org.neo4j.graphdb.NotFoundException: Node 3097025 not found

As before, if I manually remove the index entry, the error doesn't happen:

neo4j-sh (?)$ create (c:City{name:"Forbidden City",url:"/places/nowhere/forbidden-city"}) return c;
+---------------------------------------------------------------------------+
| c                                                                         |
+---------------------------------------------------------------------------+
| Node[3097026]{url:"/places/nowhere/forbidden-city",name:"Forbidden City"} |
+---------------------------------------------------------------------------+
1 row
Nodes created: 1
Properties set: 2
Labels added: 1
11 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Forbidden"') return n;
+---------------------------------------------------------------------------+
| n                                                                         |
+---------------------------------------------------------------------------+
| Node[3097026]{name:"Forbidden City",url:"/places/nowhere/forbidden-city"} |
+---------------------------------------------------------------------------+
1 row
15 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Forbidden"' ) yield node return node;
+---------------------------------------------------------------------------+
| node                                                                      |
+---------------------------------------------------------------------------+
| Node[3097026]{name:"Forbidden City",url:"/places/nowhere/forbidden-city"} |
+---------------------------------------------------------------------------+
1 row
18 ms
neo4j-sh (?)$ match (c:City{name:"Forbidden City"}) call apoc.index.removeNodeByName('search_index', c) delete c;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 1
34 ms
neo4j-sh (?)$ start n=node:search_index('City.name:"Forbidden"') return n;
+---+
| n |
+---+
+---+
0 row
5 ms
neo4j-sh (?)$ call apoc.index.search( 'search_index', 'City.name:"Forbidden"' ) yield node return node;
+------+
| node |
+------+
+------+
0 row
6 ms
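The difference between the two transcripts above can be reproduced with a toy model. This is only a sketch of the suspected failure mode, not APOC's or Lucene's actual code: `ToyGraph`, `NodeNotFound`, and `run_demo` are hypothetical names, and the "index" is a plain dictionary from tokens to node ids. Deleting a node without cleaning the index leaves a stale id behind, and resolving that id on search fails; removing the entry first avoids the error.

```python
class NodeNotFound(Exception):
    """Raised when an index entry points at a node that no longer exists."""


class ToyGraph:
    """Toy stand-in for a node store plus a manually maintained index."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.index = {}   # lower-cased token -> set of node ids
        self._next_id = 0

    def create(self, name):
        # Store the node and index every token of its name.
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = {"name": name}
        for token in name.lower().split():
            self.index.setdefault(token, set()).add(node_id)
        return node_id

    def remove_from_index(self, node_id):
        # Manual cleanup, analogous in spirit to removing the index entry by hand.
        for ids in self.index.values():
            ids.discard(node_id)

    def delete(self, node_id):
        # Deletes only the node; the index is deliberately left untouched.
        del self.nodes[node_id]

    def search(self, token):
        # Resolve each indexed id to a node; a stale id raises NodeNotFound.
        results = []
        for node_id in sorted(self.index.get(token.lower(), ())):
            if node_id not in self.nodes:
                raise NodeNotFound(f"Node {node_id} not found")
            results.append(self.nodes[node_id])
        return results


def run_demo():
    graph = ToyGraph()
    imaginary = graph.create("Imaginary City")
    forbidden = graph.create("Forbidden City")

    # Case 1: delete without touching the index, then search.
    graph.delete(imaginary)
    try:
        graph.search("imaginary")
        stale_raises = False
    except NodeNotFound:
        stale_raises = True

    # Case 2: remove the index entry first, then delete, then search.
    graph.remove_from_index(forbidden)
    graph.delete(forbidden)
    clean_result = graph.search("forbidden")
    return stale_raises, clean_result
```

In this model the first search raises and the second returns an empty list, which matches the shape of what the two shell sessions show.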

Can I do anything else here to help narrow this down?

@igorclark
Author

Hello, just checking in, anything I can do to help on this one?

😊

@neurofoo

neurofoo commented Feb 14, 2018

Is the autoIndex issue solved? I'm having the same problem: autoIndex is not automatically indexing new nodes. I have the stopwatch set to true, but I'm seeing nothing in my debug.log.

Here is my config:

# APOC configuration
setting "dbms.security.procedures.unrestricted" "apoc.*"
setting "apoc.autoIndex.enabled" "true"
setting "apoc.autoIndex.async" "true"
setting "apoc.autoIndex.queue_capacity" "100000"
setting "apoc.autoIndex.async_rollover_opscount" "50000"
setting "apoc.autoIndex.async_rollover_millis" "5000"
setting "apoc.autoIndex.tx_handler_stopwatch" "true"
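For comparison with a plain neo4j.conf, the lines above can be rewritten as `key=value` pairs. The helper below is a small sketch that assumes each `setting "key" "value"` line is meant to end up as one `key=value` line in neo4j.conf; `settings_to_conf` is a hypothetical name and not part of Neo4j or APOC.

```python
import shlex


def settings_to_conf(text):
    """Turn `setting "key" "value"` lines into `key=value` lines."""
    lines = []
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # shlex handles the double-quoted key and value.
        parts = shlex.split(stripped)
        if len(parts) == 3 and parts[0] == "setting":
            lines.append(f"{parts[1]}={parts[2]}")
    return lines


example = '''# APOC configuration
setting "apoc.autoIndex.enabled" "true"
setting "apoc.autoIndex.async" "true"
'''
conf_lines = settings_to_conf(example)
```

Comparing the resulting lines with what the server reports at startup is one way to check that the settings are actually being applied.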

@igorclark
Author

I haven't checked for a while as I've been working on other projects. I'd really like to be able to use this, though, as it would enable some significant feature improvements for my Neo4j project. @sarmbruster I'm sure you've had a lot on your plate too - do you know the current situation with this?

Thanks 👍

@sarmbruster
Contributor

I've pushed a change today forcing the background thread to explicitly wait until the database is available.
Additionally, a fix for index deletions has been merged, see neo4j/neo4j#11133
After the next Neo4j release (3.2.x or 3.3.x) we should test this again.

@igorclark
Author

Hey @sarmbruster, thanks for the update. I'll keep an eye out for the next release and test it out again then. Cheers.

@sarmbruster
Contributor

Hey @igorclark, I've changed the tx event handler to explicitly remove deleted nodes from the full text indexes. Would be cool if you could give it a try.
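As a rough illustration of that idea (not the actual APOC handler), here is a toy commit hook that indexes created nodes and purges deleted ones. `TxData`, `ToyFulltextIndex`, and `after_commit` are hypothetical names, and the index is again just a dictionary from tokens to node ids.

```python
from dataclasses import dataclass, field


@dataclass
class TxData:
    """Hypothetical summary of one committed transaction."""
    created: dict = field(default_factory=dict)  # node id -> name
    deleted: set = field(default_factory=set)    # ids of deleted nodes


class ToyFulltextIndex:
    def __init__(self):
        self.entries = {}  # lower-cased token -> set of node ids

    def add(self, node_id, name):
        for token in name.lower().split():
            self.entries.setdefault(token, set()).add(node_id)

    def remove(self, node_id):
        for ids in self.entries.values():
            ids.discard(node_id)

    def lookup(self, token):
        return sorted(self.entries.get(token.lower(), ()))


def after_commit(index, tx):
    """Handler run once per commit: index new nodes, purge deleted ones."""
    for node_id, name in tx.created.items():
        index.add(node_id, name)
    for node_id in tx.deleted:
        index.remove(node_id)


index = ToyFulltextIndex()
after_commit(index, TxData(created={1: "Imaginary City"}))
after_create = index.lookup("imaginary")   # node 1 is now indexed
after_commit(index, TxData(deleted={1}))
after_delete = index.lookup("imaginary")   # the entry was purged on delete
```

With the purge step in the handler, a later search never sees an id for a node that no longer exists, so no manual index cleanup is needed before a delete.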

@igorclark
Author

Hi @sarmbruster, thanks for the heads-up! I'll test this later in the week, probably Thursday or Friday, unless I can get to it sooner. Will let you know. Thanks!

@igorclark
Author

Hi again @sarmbruster, excuse the huge delay. I've finally had a chance to test this on a newer version with the merged code. I've set everything up on 3.4.6 and APOC 3.4.0.2 and it all seems to be working fine. Great news. Thanks! 👍

@sarmbruster
Contributor

@igorclark thanks for testing again. Happy to hear this is resolved.

@jaredhancock31

Using Neo4j 3.4.7 with APOC 3.4.0.3 and not seeing the IndexUpdateTransactionEventHandler come up on startup. Config matches up with documentation and the actual call to apoc.index.addAllNodes works properly.

2018-10-11 19:14:38.142+0000 INFO [o.n.k.i.DiagnosticsManager] dbms.auto_index.nodes.enabled=true
2018-10-11 19:14:38.146+0000 INFO [o.n.k.i.DiagnosticsManager] indexes.auto=assert
2018-10-11 19:14:38.149+0000 INFO [o.n.k.i.DiagnosticsManager] apoc.autoIndex.async=true
2018-10-11 19:14:38.149+0000 INFO [o.n.k.i.DiagnosticsManager] apoc.autoIndex.enabled=true
2018-10-11 19:14:38.150+0000 INFO [o.n.k.i.DiagnosticsManager] dbms.security.procedures.unrestricted=apoc.*

@sarmbruster any insight into how I can diagnose this issue?

@sarmbruster
Contributor

@jaredhancock31 your issue is not related. Based on a private conversation, I've identified yours as a duplicate of #778: not in its symptom, but in its root cause, the use of URL.setURLStreamHandlerFactory.
