-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: use parameters to allow special characters in neo4j cypher statement #382
fix: use parameters to allow special characters in neo4j cypher statement #382
Conversation
@@ -232,7 +212,7 @@ def _create_indices(self, node_file: str) -> None: | |||
LOGGER.info('Creating indices. (Existing indices will be ignored)') | |||
|
|||
with open(node_file, 'r', encoding='utf8') as node_csv: | |||
for node_record in csv.DictReader(node_csv): | |||
for node_record in pandas.read_csv(node_csv).to_dict(orient='records'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Record field types are string when using csv.DictReader
which is inadequate when switching to use parameter. It will create issue that property in numeric type in neo4j is suppled with string value.
pandas
has ability to infer data type when reading csv. Parameters generated from pandas record for neo4j will have correct type that can prevent the issue.
template = Template(""" | ||
MERGE (node:{{ LABEL }} {key: $KEY}) | ||
ON CREATE SET {{ PROP_BODY }} | ||
{% if update %} ON MATCH SET {{ PROP_BODY }} {% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace string.Template
with jinja.Template
in which if-condition is supported. Putting condition clause in template is to help readability
Also, {{ }}
syntax in jinja is different from $VAR
syntax in cypher (which is also the syntax for string.Template
, source of confusion). The distinction is helpful for readability too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the info
|
||
template_params[k] = v | ||
props.append('{id}.{key} = {val}'.format(id=identifier, key=k, val=f'${k}')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The property value is taken out wherein a param token (i.e. $param
) is put in.
@@ -3,3 +3,4 @@ hive,gold,test_schema,test_table1,"1st test table","tag1,tag2,pii,high_quality", | |||
dynamo,gold,test_schema,test_table2,"2nd test table","high_quality,recommended",false, | |||
hive,gold,test_schema,test_view1,"1st test view","tag1",true, | |||
hive,gold,test_schema,test_table3,"3rd test","needs_documentation",false, | |||
hive,gold,test_schema,"test's_table4","4th test","needs_documentation",false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a test data from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @instazackwu
3cb86fe
to
d7bb9df
Compare
@allisonsuarez will do some test in Lyft staging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @allisonsuarez for quick test, lgtm as well with a few small nits
# Setting field_size_limit to solve the error below | ||
# _csv.Error: field larger than field limit (131072) | ||
# https://stackoverflow.com/a/54517228/5972935 | ||
csv.field_size_limit(int(ctypes.c_ulong(-1).value // 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, we no longer set this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. csv
module is no longer used in this module therefore taking it out.
I'm wondering if csv.field_size_limit()
is a global setting. If that's the case, it could impact other modules that uses csv
though. I will put it back to make sure it's backward compatible. It could be better if it's moved into somewhere that is "global" later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, let's keep it here for now and we could remove it later. cc @jinhyukchang
MERGE (n1)-[r1:$TYPE]->(n2)-[r2:$REVERSE_TYPE]->(n1) | ||
$PROP_STMT RETURN n1.key, n2.key""") | ||
|
||
CREATE_UNIQUE_INDEX_TEMPLATE = Template('CREATE CONSTRAINT ON (node:${LABEL}) ASSERT node.key IS UNIQUE') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @rogertangcn , any reason to delete all the templates constant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, nvm, read your latter comment
template = Template(""" | ||
MERGE (node:{{ LABEL }} {key: $KEY}) | ||
ON CREATE SET {{ PROP_BODY }} | ||
{% if update %} ON MATCH SET {{ PROP_BODY }} {% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the info
requirements.txt
Outdated
@@ -57,6 +57,8 @@ unicodecsv==0.14.1,<1.0 | |||
|
|||
httplib2>=0.18.0 | |||
unidecode | |||
Jinja2==2.11.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the databuilder is a lib, we should make it a range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about larger than 2.10.0 and less than 2.12 for now
requirements.txt
Outdated
@@ -57,6 +57,8 @@ unicodecsv==0.14.1,<1.0 | |||
|
|||
httplib2>=0.18.0 | |||
unidecode | |||
Jinja2==2.11.2 | |||
pandas==1.1.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about larger than 0.21.0 and less than 1.2.0
@@ -3,3 +3,4 @@ hive,gold,test_schema,test_table1,"1st test table","tag1,tag2,pii,high_quality", | |||
dynamo,gold,test_schema,test_table2,"2nd test table","high_quality,recommended",false, | |||
hive,gold,test_schema,test_view1,"1st test view","tag1",true, | |||
hive,gold,test_schema,test_table3,"3rd test","needs_documentation",false, | |||
hive,gold,test_schema,"test's_table4","4th test","needs_documentation",false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @instazackwu
Signed-off-by: Roger Tang <[email protected]>
d7bb9df
to
3653b25
Compare
thanks @rogertangcn for the update! @instazackwu the pr is going to merge, let us know if it resolves your issue! |
…ement (#382) Signed-off-by: Roger Tang <[email protected]> Co-authored-by: Roger Tang <[email protected]> Signed-off-by: dikshathakur3119 <[email protected]>
Summary of Changes
This change fixes the issue of Use query parameters rather than string building in Neo4jCsvPublisher
csv
package withpandas
so that it can infer data type out of csv filesstring.Template
withjinja2.Template
so that it can support conditional tokens.Tests
N/A
Documentation
N/A
CheckList
Make sure you have checked all steps below to ensure a timely review.
make test