[FEATURE] Add ignore missing field to text chunking processor #906
Comments
Left a few review comments in #907.
Can we change the field name to "skip_if_absent" or something of this sort? The problem with "ignore" is its ambiguity: it does not specify what will happen when the text is empty.
+1 to @martin-gaievski
@martin-gaievski @vibrantvarun I do not think the field name "skip_if_absent" makes sense. There are many OpenSearch ingest processors that already have an ignore_missing field. I prefer ignore_missing to keep consistency with the other ingest processors.
If other processors have a field with similar functionality, then I agree this name makes sense, although semantically it's not the best. Thanks for checking the config of other processors.
Closing this issue as the PR has been merged. Thanks for your contribution @IanMenendez!
What solution would you like?
Currently, if a document is ingested by a text chunking processor and the input field is null, the text chunking processor outputs an empty list. There is no way to make the text chunking processor skip a field that does not exist.
The proposed solution is to add an ignore_missing field to the text chunking processor.
If ignore_missing == true, fields that should be chunked but do not exist will be skipped instead of being set to an empty list.
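The proposed semantics can be sketched in a few lines. This is a minimal illustration, not the processor's actual Java implementation: the names `apply_text_chunking` and `naive_chunker`, the word-window chunking, and the source-to-target `field_map` dictionary shape are all assumptions made for the example. A null or absent source field produces an empty list by default, and is skipped entirely when `ignore_missing` is true.

```python
from typing import Any, Callable, Dict, List


def naive_chunker(text: str, size: int = 5) -> List[str]:
    # Hypothetical stand-in for a real chunking algorithm: fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def apply_text_chunking(
    doc: Dict[str, Any],
    field_map: Dict[str, str],
    ignore_missing: bool = False,
    chunker: Callable[[str], List[str]] = naive_chunker,
) -> Dict[str, Any]:
    """Chunk each source field into its target field (hypothetical helper)."""
    out = dict(doc)  # copy so the input document is not mutated
    for source, target in field_map.items():
        value = doc.get(source)  # None for both absent and null fields
        if value is None:
            if ignore_missing:
                # Proposed behavior: skip the field, write no target at all.
                continue
            # Current behavior: a missing or null field yields an empty list.
            out[target] = []
            continue
        out[target] = chunker(value)
    return out
```

For example, `apply_text_chunking({"title": "hello"}, {"body": "body_chunks"}, ignore_missing=True)` leaves the document unchanged, while the same call without `ignore_missing` adds `"body_chunks": []`.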
Example:
Processor:
Input:
Output:
If ignore_missing == false, the processor will continue to work as it currently does: fields that do not exist will get an empty list as output.
Processor:
Input:
Output:
The default value would be ignore_missing = false, so existing pipelines keep their current behavior.
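Reading the option with a false default could look like the sketch below. The helper name `read_ignore_missing` and the dictionary-shaped processor config are hypothetical, and rejecting non-boolean values (rather than coercing strings such as "true") is a design choice assumed here for illustration.

```python
from typing import Any, Dict


def read_ignore_missing(config: Dict[str, Any]) -> bool:
    """Read the hypothetical ignore_missing option, defaulting to False."""
    value = config.get("ignore_missing", False)  # absent option -> current behavior
    if not isinstance(value, bool):
        # Assumed strict validation: only real booleans are accepted.
        raise ValueError(f"ignore_missing must be a boolean, got {type(value).__name__}")
    return value
```

With this shape, a config that omits the option behaves exactly as the processor does today, and only an explicit `true` opts into skipping missing fields.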
What alternatives have you considered?
To my knowledge, there is no alternative to this.