feat: implement text chunking processor with fixed token length and delimiter algorithm #607
Commits on Mar 14, 2024
- cbc5423 implement chunking processor and fixed token length
  Signed-off-by: yuye-aws <[email protected]>
- 3e2d365 initialize node client for document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 89584a9 initialize document chunking processor with analysis registry
  Signed-off-by: yuye-aws <[email protected]>
- 596fbf7 chunker factory create with analysis registry
  Signed-off-by: yuye-aws <[email protected]>
- 636f907 implement tokenizer in fixed token length algorithm with analysis registry
  Signed-off-by: yuye-aws <[email protected]>
- 2ffd6b0 add max token count parsing logic
  Signed-off-by: yuye-aws <[email protected]>
- 2195353 bug fix for non-existing index
  Signed-off-by: yuye-aws <[email protected]>
- bdd418e
- 458420b
- 02420d7 unit tests for chunker factory
  Signed-off-by: yuye-aws <[email protected]>
- f8f60a1 unit tests for chunker factory
  Signed-off-by: yuye-aws <[email protected]>
- ff0587c add error message for chunker factory tests
  Signed-off-by: yuye-aws <[email protected]>
- afc3189
- 159e426 Revert "implement evenly chunk"
  This reverts commit 93dd2f4.
  Signed-off-by: yuye-aws <[email protected]>
- 2405952
  Signed-off-by: yuye-aws <[email protected]>
- b930222 implement unit test for fixed token length chunker
  Signed-off-by: yuye-aws <[email protected]>
- ecb8297 add test cases in unit test for fixed token length chunker
  Signed-off-by: yuye-aws <[email protected]>
- d6d31fa
  Signed-off-by: yuye-aws <[email protected]>
- fafae93
  Signed-off-by: yuye-aws <[email protected]>
- d23c1fb
- 39c6162
- 5714d1e bug fix for map type in document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 2f23c30
  Signed-off-by: yuye-aws <[email protected]>
- 41cff0c
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- b0fda97
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- b16e7c4 add delimiter chunker processor
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 11e6a4b
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 81000f3
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- f3b468f basic unit tests for document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- eea6fc8 fix tests for getProcessors in neural search
  Signed-off-by: yuye-aws <[email protected]>
- ec6bf49 add unit tests with string, map and nested map type for document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 3ae94e4 add unit tests for parameter validation in document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- c8dc66c
  Signed-off-by: yuye-aws <[email protected]>
- 1e1ce1b
- b425122 integration tests for document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 31bf921 add back Run_Neural_Search.xml
  Signed-off-by: yuye-aws <[email protected]>
- 11d8f53
  Signed-off-by: yuye-aws <[email protected]>
- 0662278
- 5e75e04 update integration test for cascade processor
  Signed-off-by: yuye-aws <[email protected]>
- 962ed32
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 9487de5 remove useless and apply spotless
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 04043ca
- 08bf2d1
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- c7cc59f remove useless and apply spotless
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 0721f7a change logic of max chunk number
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- d2bc576 add max chunk limit into fixed token length algorithm
  Signed-off-by: yuye-aws <[email protected]>
- 120fae8 Support list<list<string>> type in embedding and extract validation logic to common class
  Signed-off-by: zane-neo <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 0af3024 fix unit tests for inference processor
  Signed-off-by: yuye-aws <[email protected]>
- e69bbe1 implement unit tests for unit tests with max_chunk_limit in fixed token length
  Signed-off-by: yuye-aws <[email protected]>
- f21f40f constructor for inference processor
  Signed-off-by: yuye-aws <[email protected]>
- 4babd4d
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 24f4980 draft code for extending inference processor with document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 0b4036a api refactor for document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 9ff6645 remove nested list key for chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 0e464fe
- d6b68ed
  Signed-off-by: yuye-aws <[email protected]>
- a7a9260
  Signed-off-by: yuye-aws <[email protected]>
- 39e8df5 Revert InferenceProcessor.java
  Signed-off-by: Yuye Zhu <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 2ee1923 revert changes in text embedding and sparse encoding processor
  Signed-off-by: yuye-aws <[email protected]>
- ca534ab implement chunk with map in document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- eedd58d
  Signed-off-by: Lu <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- b9bf3ef implement max chunk logic in document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 2ac2f60 add initial value for max chunk limit in document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 6067044 bug fix in chunking processor: allow 0 max_chunk_limit
  Signed-off-by: yuye-aws <[email protected]>
- 98d1ab3 implement overlap rate with big decimal
  Signed-off-by: yuye-aws <[email protected]>
- 79a637c update max chunk limit in delimiter
  Signed-off-by: yuye-aws <[email protected]>
- 6da6395 update parameter setting for fixed token length algorithm
  Signed-off-by: yuye-aws <[email protected]>
- 105d4a0 update max chunk limit implementation in chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- cd4eda7 fix unit tests for fixed token length algorithm
  Signed-off-by: yuye-aws <[email protected]>
- ceaa7d2 spotless apply for document chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 715c145 initialize current chunk count
  Signed-off-by: yuye-aws <[email protected]>
- 75663e1 parameter validation for max chunk limit
  Signed-off-by: yuye-aws <[email protected]>
- 2e5dc00
- d711390
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 98124ee
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 353e88e
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- de554e6
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 2453a79
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- 5f00107
  Signed-off-by: xinyual <[email protected]>
  Signed-off-by: yuye-aws <[email protected]>
- fc94955 update unit tests for chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 388fd43 add more unit tests for chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- bb35c79
  Signed-off-by: yuye-aws <[email protected]>
- b4d5fda
- 453dd35
- 8c8fbaf
- b588983
- 23dd769
- 3ad78da
- abb9bde fix update ut for fixed token length chunker
  Signed-off-by: yuye-aws <[email protected]>
- 82aa219
  Signed-off-by: yuye-aws <[email protected]>
- 3158e28
  Signed-off-by: yuye-aws <[email protected]>
- 38d6e60
  Signed-off-by: yuye-aws <[email protected]>
- 584bc59
  Signed-off-by: yuye-aws <[email protected]>
- cbea5df implement chunk count wrapper for max chunk limit
  Signed-off-by: yuye-aws <[email protected]>
- c3c8ff2 rename variable end to nextDelimiterPosition
  Signed-off-by: yuye-aws <[email protected]>
- da055e7
- d32840c update java doc for fixed token length algorithm
  Signed-off-by: yuye-aws <[email protected]>
- 830f665 rename interface name and fixed token length algorithm name
  Signed-off-by: yuye-aws <[email protected]>
- 1275bd6 update fixed token length algorithm configuration for integration tests
  Signed-off-by: yuye-aws <[email protected]>
- 4e2f5d4 make delimiter member variables static
  Signed-off-by: yuye-aws <[email protected]>
- 5c20b9b remove redundant set field value in execute method
  Signed-off-by: yuye-aws <[email protected]>
- addd37e
  Signed-off-by: yuye-aws <[email protected]>
- 7469153 add integration tests with more tokenizers
  Signed-off-by: yuye-aws <[email protected]>
- ad00b88 bug fix: unit test failure due to invalid tokenizer
  Signed-off-by: yuye-aws <[email protected]>
- d4673d4 bug fix: token concatenation in fixed token length algorithm
  Signed-off-by: yuye-aws <[email protected]>
- 7a589c6
  Signed-off-by: yuye-aws <[email protected]>
- e1f6c79 track chunkCount within function
  Signed-off-by: yuye-aws <[email protected]>
- bb372e6 bug fix: allow white space as the delimiter
  Signed-off-by: yuye-aws <[email protected]>
- 2538ab3
  Signed-off-by: xinyual <[email protected]>
- 9c9172d
  Signed-off-by: xinyual <[email protected]>
- d05b246
  Signed-off-by: xinyual <[email protected]>
- 04fc7d3
  Signed-off-by: xinyual <[email protected]>
- 7fe93c0
  Signed-off-by: xinyual <[email protected]>
- cefb0a6 move analysis_registry to non-runtime parameters
  Signed-off-by: xinyual <[email protected]>
- 16038af
  Signed-off-by: xinyual <[email protected]>
- d1d88dc
  Signed-off-by: xinyual <[email protected]>
- eb439bd
  Signed-off-by: xinyual <[email protected]>
- bc7f70c
  Signed-off-by: xinyual <[email protected]>
- bb941cd
  Signed-off-by: xinyual <[email protected]>
- 77d4101
  Signed-off-by: xinyual <[email protected]>
- 92f587f fixed token length: re-implement with start and end offset
  Signed-off-by: yuye-aws <[email protected]>
- 94b1967
  Signed-off-by: yuye-aws <[email protected]>
- 98944d1 fix document chunking processor IT
  Signed-off-by: yuye-aws <[email protected]>
- 8799fd0 bug fix: adjust start, end content position in fixed token length algorithm
  Signed-off-by: yuye-aws <[email protected]>
- 5cda870 update changelog for 2.x release
  Signed-off-by: yuye-aws <[email protected]>
- c942b17
- 6461b32 update default delimiter to be \n\n
  Signed-off-by: yuye-aws <[email protected]>
- 2a0a879 remove change log in 3.0 unreleased
  Signed-off-by: yuye-aws <[email protected]>
- fbb4edb fix IT failure due to chunking processor rename
  Signed-off-by: yuye-aws <[email protected]>
- 050f163 update javadoc for text chunking processor factory
  Signed-off-by: yuye-aws <[email protected]>
- e61f295 adjust functions in chunker interface
  Signed-off-by: yuye-aws <[email protected]>
- 4f87008 move algorithm name definition to concrete chunker class
  Signed-off-by: yuye-aws <[email protected]>
- c651b3e update string formatted message for text chunking processor
  Signed-off-by: yuye-aws <[email protected]>
- 0f45782 update string formatted message for chunker factory
  Signed-off-by: yuye-aws <[email protected]>
- 3d1b792 update string formatted message for chunker parameter validator
  Signed-off-by: yuye-aws <[email protected]>
- 5600b36 update java doc for delimiter algorithm
  Signed-off-by: yuye-aws <[email protected]>
support range double in chunker parameter validator
Signed-off-by: yuye-aws <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 3d962ca - Browse repository at this point
Copy the full SHA 3d962caView commit details -
update string formatted message for fixed token length algorithm
Signed-off-by: yuye-aws <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 42de900 - Browse repository at this point
Copy the full SHA 42de900View commit details -
update sneaky throw with text chunking processor it
Signed-off-by: yuye-aws <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 6d4fe8c - Browse repository at this point
Copy the full SHA 6d4fe8cView commit details -
add word tokenizer restriction for fixed token length algorithm
Signed-off-by: yuye-aws <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e666f17 - Browse repository at this point
Copy the full SHA e666f17View commit details -
update error message for multiple algorithms in text chunking processor
Signed-off-by: yuye-aws <[email protected]>
Commit 958cc3b
-
add comment in text chunking processor
Signed-off-by: yuye-aws <[email protected]>
Commit 183e928
-
validate max chunk limit with util parameter class
Signed-off-by: yuye-aws <[email protected]>
Commit 09fccc1
-
Commit 8ad1e51
-
Commit 489fe7b
-
Commit d67880e
-
Commit 666e7b9
-
Commit 9161c93
-
implement a map from chunker name to constructor function in chunker factory
Signed-off-by: yuye-aws <[email protected]>
Commit 0f9c140
-
Signed-off-by: yuye-aws <[email protected]>
Commit a574980
-
remove get all chunkers in chunker factory
Signed-off-by: yuye-aws <[email protected]>
Commit 87679ad
-
remove type check for parameter check for max token count
Signed-off-by: yuye-aws <[email protected]>
Commit 08dcd19
-
remove type check for parameter check for analysis registry
Signed-off-by: yuye-aws <[email protected]>
Commit f16882d
-
implement parser and validator
Signed-off-by: yuye-aws <[email protected]>
Commit a969a60
-
Commit 34348b3
-
provide fixed token length as the default algorithm
Signed-off-by: yuye-aws <[email protected]>
Commit 4153988
-
Signed-off-by: yuye-aws <[email protected]>
Commit 06ca1c7
-
Signed-off-by: yuye-aws <[email protected]>
Commit 3cf671d
Commits on Mar 15, 2024
-
use object nonnull and require nonnull
Signed-off-by: yuye-aws <[email protected]>
Commit 5fe5eef
-
apply final to ingest document and chunk count
Signed-off-by: yuye-aws <[email protected]>
Commit f3decb4
-
merge parameter validator into the parser
Signed-off-by: yuye-aws <[email protected]>
Commit 3b8a3af
-
assign positive default value for max chunk limit
Signed-off-by: yuye-aws <[email protected]>
Commit 89c465c
-
validate supported chunker algorithm in text chunking processor
Signed-off-by: yuye-aws <[email protected]>
Commit e7dffe0
-
update parameter setting of max chunk limit
Signed-off-by: yuye-aws <[email protected]>
Commit 463de71
-
add unit test with non list of string
Signed-off-by: yuye-aws <[email protected]>
Commit 0a04012
-
Signed-off-by: yuye-aws <[email protected]>
Commit a524954
-
add unit test for tokenization exception in fixed token length algorithm
Signed-off-by: yuye-aws <[email protected]>
Commit 10f6568
-
tune method name in text chunking processor unit test
Signed-off-by: yuye-aws <[email protected]>
Commit 3f41f37
-
tune method name in delimiter algorithm unit test
Signed-off-by: yuye-aws <[email protected]>
Commit e4bdabc
-
add unit test for overlap rate too small in fixed token length algorithm
Signed-off-by: yuye-aws <[email protected]>
Commit 9e37171
-
tune method modifier for all classes
Signed-off-by: yuye-aws <[email protected]>
Commit 18ba1b1
-
Commit 2ce9840
-
Commit 2aea7a5
-
tune exception type in parameter parser
Signed-off-by: yuye-aws <[email protected]>
Commit 63bbae9
-
Commit aaee028
-
Commit ab2a151
-
include max chunk limit in both algorithms
Signed-off-by: yuye-aws <[email protected]>
Commit 1eb12aa
-
Commit 40991a3
-
Signed-off-by: yuye-aws <[email protected]>
Commit ea4bbb8
-
update runtime max chunk limit in text chunking processor
Signed-off-by: yuye-aws <[email protected]>
Commit f0dfb57
-
Commit cb4b39b
-
implement test for multiple field max chunk limit exceed
Signed-off-by: yuye-aws <[email protected]>
Commit 98dd886
-
tune method names in text chunking processor unit tests
Signed-off-by: yuye-aws <[email protected]>
Commit d245a04
-
add unit tests for both algorithms with max chunk limit
Signed-off-by: yuye-aws <[email protected]>
Commit ad7ba25
-
Commit 9702168
Commits on Mar 17, 2024
-
extract max chunk limit check to util class
Signed-off-by: yuye-aws <[email protected]>
Commit 3d8c030
Commits on Mar 18, 2024
-
Signed-off-by: yuye-aws <[email protected]>
Commit 9931fae
-
Commit fb6a961
-
bug fix: only update runtime max chunk limit when enabled
Signed-off-by: yuye-aws <[email protected]>
Commit 68fef4f