-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for importing parent property. #97
Added support for importing parent property. #97
Conversation
46ce7b5
to
006a502
Compare
Hi @missinglink, Do you have the opportunity to review this pull-request? |
Hi @orangejulius, Do you have the opportunity to review this pull-request? |
lib/streams/documentStream.js
Outdated
@@ -135,6 +135,15 @@ function processRecord(record, next_uid, stats) { | |||
|
|||
const pelias_document = new peliasModel.Document( source, layer, model_id ); | |||
|
|||
const parent = getCaseInsensitiveAsJSON( `parent_json`, record ); | |||
if (parent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make the code a bit more robust and readable with lodash
, such as this:
if (_.isPlainObject(parent)) {
_.each(parent, (value, field) => {
pelias_document.addParent(field, value.name, value.id, value.abbr, value.source)
})
}
It has a couple advantages:
- explicitly check the type of
parent
is an object which prevents a class of errors when theparent_json
object is incorrectly specified. - use the
_.each
function to simplify iteration over the properties and make it a bit more succinct and readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where parent_json
is incorrectly formatted there will be no error returned to the user, this may cause frustration if the import is long-running as it will take time to detect the error and require re-running the import.
Should we detect a situation where the parent_json
column is provided but contains either malformed or invalid values? In this situation should we panic or emit an error/warning message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the validation:
Right now we are only showing the warnings in the logs and ignore the records which are malformed, this is the similar behaviour as with the other fields.
Following warning is shown for the malformed records:
Record import failed on line 26 due to invalid document type, expecting: string got: undefined.
In addition to that we have implemented the additional validation in the model to check that the parent fields are valid, the PR is released in [email protected]
.
I have upgraded the version of pelias-model in csv-importer as part of this PR, so that if we use the non-existing parent name the record will be ignored with following warning.
Record import failed on line 36 due to invalid property: land, should be one of: continent,country,dependency,macroregion,region,macrocounty,county,borough,locality,localadmin,neighbourhood,postalcode,ocean,marinearea,empire.
The way the csv-importer works is, it try to import all the records, give the warnings for the records it fails to import and give the following report at the end of import.
2022-08-04T12:38:39.408Z - info: [csv-importer] Finished import from /Users/me/1659614476347.csv, goodRecordCount=60371 badRecordCount=326
The report show that the warn and not fail
was the intentional implementation. Don't know why it was done like that. 🤔
This is not specific to parent_json
field, i think changing this behaviour may affect the other users of csv-importer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense 👍 this sounds sufficient
I've not run the code, so just wondering what the expected behaviour is here. Is this intended to be run with |
The In the case where multiple parent values are provided for the same field name, we store all copies in the elastic index, making them all searchable, but only the first entry is used for displaying the label. Your proposed JSON data model only permits a singular entry, Can we please wrap the values in an array (such as |
In our case we don't do any adminLookup, hance we don't use the PIP service. We have own admin units which we set in the parent_json field.
If we at later stage need both the parent field and PIP service then we will do it in separate PR.
|
Updating the
Updated the code to accept the new model ✅ |
…it tests, Upgraded the palias-model.
Actually, I think it's fine that it can be used with or without adminLookup. I assume in this case that the values in the CSV will be added first, and therefore be the 'main' value used in the response, while the values from admin lookup will be secondary, and only used for search. |
Great, looks good thanks, could you please add some documentation info in the README and it's good to merge 👍 |
@missinglink Added the documentation. |
Thanks! this was released on npm as |
👋
Here's the reason for this change 🚀
We have data from multiple sources with the custom data formats like NeTEx, SOSI, GeoJson, CSV etc.
In order to import that data to Pelias elasticsearch, we have decided to convert the data to single format and store it in our cloud storage which we will be importing to the Pelias elasticsearch later on.
Instead of writing aur own importer / importers, we have decided to convert the data from all the sources to CSV and then use the
pelias-csv-importer
to import it in pelias elasticsearch.The
pelias-csv-importer
is somewhat limited in supporting the fields that can be imported through it, although thepelias-model
support alot of fields.We want to contribute to the
pelias-csv-importer
to support more fields.Here's what actually got changed 👏
In this PR we are adding the support to import the
parent
field. We will be adding support for more fields later on.Added unit test for testing the import of new
parent
field.We have named the new field
parent_json
, the structure of json object will beIn the CSV file it will look like the other json fields like addendum_json.
"{""country"":{""id"":""NOR"",""name"":""NO""},""locality"":{""id"":""0301"",""name"":""Oslo""},""county"":{""id"":"":03"",""name"":""Oslo""}}"