Apoc.import.graphml doesn't work for edges #2659

Closed
rohankharwar opened this issue Mar 25, 2022 · 1 comment · Fixed by #2853

Comments

@rohankharwar
Contributor

Problem Statement

I am trying to export some data using apoc.export.graphml.data and then import it using apoc.import.graphml.
When I run apoc.export.graphml.data to extract both nodes and relationships into a single file and then import that file with apoc.import.graphml, it works fine.

However, to improve performance and parallelize the process, I am trying to export nodes and relationships separately.
This produces two files: one for nodes and one for relationships.
I then import the nodes file first, followed by the relationships file.
apoc.import.graphml loads the nodes correctly but fails on the relationships with a Java NullPointerException.
The relationships file does contain the source and target node IDs, so it should be able to load the data.

I can easily reproduce this using the following queries.

Reproducible steps

//Extract node files
MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WHERE person.name starts with "K"
with collect(person)+collect(movie) as node
CALL apoc.export.graphml.data(node, [], "/Users/rohankharwar/Downloads/movies_nodes.graphml", {stream: false})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;

//Extract Relationship files
MATCH (person:Person)-[actedIn:ACTED_IN]->(movie:Movie)
WHERE person.name starts with "K"
with collect(person)+collect(movie) as node, collect(actedIn) as rels
CALL apoc.export.graphml.data([], rels, "/Users/rohankharwar/Downloads/movies_rels.graphml", {stream: false})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data;

//graphml file import nodes
CALL apoc.import.graphml("file:///Users/rohankharwar/Downloads/movies_nodes.graphml", {readLabels:true});

//graphml file import rels
CALL apoc.import.graphml("file:///Users/rohankharwar/Downloads/movies_rels.graphml", {readLabels:true});

The relationship import fails with the following error (screenshot attached):

Simple Dataset (where it's possible)

I used the Movies example dataset.

Versions

  • OS: Linux
  • Neo4j: 4.3.10, 4.4.4
  • Neo4j-Apoc: 4.3.0.5, 4.4.0.3

[Screenshot: NullPointerException]

@robobenklein
Contributor

I can confirm that this problem occurs only when readLabels is true.

When it is false, the ratio of edges to nodes is about 1.0:
[screenshot]

When it is set to true, almost half of the edges are missing:
[screenshot]

My dataset is a DAG in which every node has exactly one parent edge, so it is easy to tell that the graph is broken when it is imported with readLabels.
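The edge-to-node ratio described above can be checked with plain Cypher after an import. This is a minimal sketch, assuming it runs against the database the GraphML files were imported into. It counts every node and relationship regardless of label or type, so on a tree-shaped DAG like the one described, relsPerNode should be close to 1.0. A value well below that would suggest relationships were dropped.

```cypher
// Sketch: compare relationship and node counts in the target database
// after running apoc.import.graphml. Counts all nodes and relationships,
// whatever their labels or types.
MATCH (n)
WITH count(n) AS nodeCount
// OPTIONAL MATCH keeps a row even when the graph has no relationships.
OPTIONAL MATCH ()-[r]->()
WITH nodeCount, count(r) AS relCount
RETURN nodeCount,
       relCount,
       // Guard against dividing by zero on an empty database.
       CASE WHEN nodeCount = 0 THEN null
            ELSE toFloat(relCount) / nodeCount END AS relsPerNode;
```

The directed pattern `()-[r]->()` counts each relationship once. An undirected pattern would count each one twice.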

vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this issue May 27, 2022
ncordon pushed a commit that referenced this issue Jun 15, 2022
ncordon added a commit that referenced this issue Jun 16, 2022
neo4j-oss-build added a commit that referenced this issue Jun 16, 2022
gem-neo4j pushed a commit to gem-neo4j/neo4j-apoc-procedures that referenced this issue Jul 12, 2022