feat: implement text chunking processor with fixed token length and delimiter algorithm #607

Merged

Commits on Mar 14, 2024

  1. implement chunking processor and fixed token length

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    cbc5423
  2. initialize node client for document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    3e2d365
  3. 89584a9
  4. chunker factory create with analysis registry

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    596fbf7
  5. 636f907
  6. add max token count parsing logic

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    2ffd6b0
  7. bug fix for non-existing index

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    2195353
  8. change error log

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    bdd418e
  9. implement evenly chunk

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    458420b
  10. unit tests for chunker factory

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    02420d7
  11. unit tests for chunker factory

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    f8f60a1
  12. add error message for chunker factory tests

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    ff0587c
  13. resolve comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    afc3189
  14. Revert "implement evenly chunk"

    This reverts commit 93dd2f4.
    
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    159e426
  15. add default value logic back

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    2405952
  16. implement unit test for fixed token length chunker

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    b930222
  17. ecb8297
  18. support map type as an input

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    d6d31fa
  19. support map type as an input

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    fafae93
  20. bug fix for map type

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    d23c1fb
  21. bug fix for map type

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    39c6162
  22. bug fix for map type in document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    5714d1e
  23. remove system out println

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    2f23c30
  24. add delimiter chunker

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    41cff0c
  25. add UT for delimiter chunker

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    b0fda97
  26. add delimiter chunker processor

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    b16e7c4
  27. add more UTs

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    11e6a4b
  28. add more UTs

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    81000f3
  29. basic unit tests for document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    f3b468f
  30. fix tests for getProcessors in neural search

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    eea6fc8
  31. add unit tests with string, map and nested map type for document chunking processor
    
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    ec6bf49
  32. 3ae94e4
  33. add back deleted xml file

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    c8dc66c
  34. restore xml file

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    1e1ce1b
  35. integration tests for document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    b425122
  36. add back Run_Neural_Search.xml

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    31bf921
  37. restore Run_Neural_Search.xml

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    11d8f53
  38. add changelog

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    0662278
  39. update integration test for cascade processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    5e75e04
  40. add max chunk limit

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    962ed32
  41. remove useless and apply spotless

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    9487de5
  42. update error message

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    04043ca
  43. change field UT

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    08bf2d1
  44. remove useless and apply spotless

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    c7cc59f
  45. change logic of max chunk number

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    0721f7a
  46. add max chunk limit into fixed token length algorithm

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    d2bc576
  47. Support list<list<string>> type in embedding and extract validation logic to common class
    
    Signed-off-by: zane-neo <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    zane-neo authored and yuye-aws committed Mar 14, 2024
    120fae8
  48. fix unit tests for inference processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    0af3024
  49. implement unit tests for unit tests with max_chunk_limit in fixed token length
    
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    e69bbe1
  50. constructor for inference processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    f21f40f
  51. use inference processor

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    4babd4d
  52. draft code for extending inference processor with document chunking processor
    
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    24f4980
  53. api refactor for document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    0b4036a
  54. remove nested list key for chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    9ff6645
  55. remove unused function

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    0e464fe
  56. remove processor validator

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    d6b68ed
  57. remove processor validator

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    a7a9260
  58. Revert InferenceProcessor.java

    Signed-off-by: Yuye Zhu <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    39e8df5
  59. 2ee1923
  60. implement chunk with map in document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    ca534ab
  61. add default delimiter value

    Signed-off-by: Lu <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    Lu authored and yuye-aws committed Mar 14, 2024
    eedd58d
  62. implement max chunk logic in document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    b9bf3ef
  63. 2ac2f60
  64. bug fix in chunking processor: allow 0 max_chunk_limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    6067044
  65. implement overlap rate with big decimal

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    98d1ab3
  66. update max chunk limit in delimiter

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    79a637c
  67. update parameter setting for fixed token length algorithm

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    6da6395
  68. 105d4a0
  69. fix unit tests for fixed token length algorithm

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    cd4eda7
  70. spotless apply for document chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    ceaa7d2
  71. initialize current chunk count

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    715c145
  72. parameter validation for max chunk limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    75663e1
  73. fix integration tests

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    2e5dc00
  74. fix current UT

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    d711390
  75. change delimiter UT

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    98124ee
  76. remove delimiter useless code

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    353e88e
  77. add more UT

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    de554e6
  78. add UT for list inside map

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    2453a79
  79. add UT for list inside map

    Signed-off-by: xinyual <[email protected]>
    Signed-off-by: yuye-aws <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    5f00107
  80. update unit tests for chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    fc94955
  81. add more unit tests for chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    388fd43
  82. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    bb35c79
  83. add java doc

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    b4d5fda
  84. update java doc

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    453dd35
  85. update java doc

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    8c8fbaf
  86. fix import order

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    b588983
  87. update java doc

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    23dd769
  88. fix java doc error

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    3ad78da
  89. fix update ut for fixed token length chunker

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    abb9bde
  90. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    82aa219
  91. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    3158e28
  92. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    38d6e60
  93. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    584bc59
  94. implement chunk count wrapper for max chunk limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    cbea5df
  95. rename variable end to nextDelimiterPosition

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    c3c8ff2
  96. adjust method place

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    da055e7
  97. update java doc for fixed token length algorithm

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    d32840c
  98. 830f665
  99. 1275bd6
  100. make delimiter member variables static

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    4e2f5d4
  101. remove redundant set field value in execute method

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    5c20b9b
  102. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    addd37e
  103. add integration tests with more tokenizers

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    7469153
  104. bug fix: unit test failure due to invalid tokenizer

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    ad00b88
  105. d4673d4
  106. update chunker interface

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    7a589c6
  107. track chunkCount within function

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    e1f6c79
  108. bug fix: allow white space as the delimiter

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    bb372e6
  109. fix fixed length chunker

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    2538ab3
  110. fix delimiter chunker

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    9c9172d
  111. fix chunker factory

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    d05b246
  112. fix UTs

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    04fc7d3
  113. fix UT and chunker factory

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    7fe93c0
  114. move analysis_registry to non-runtime parameters

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    cefb0a6
  115. fix Uts

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    16038af
  116. avoid java doc change

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    d1d88dc
  117. move validate to commonUtlis

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    eb439bd
  118. remove useless function

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    bc7f70c
  119. change java doc

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    bb941cd
  120. fix Document process ut

    Signed-off-by: xinyual <[email protected]>
    xinyual authored and yuye-aws committed Mar 14, 2024
    77d4101
  121. 92f587f
  122. update exception message

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    94b1967
  123. fix document chunking processor IT

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    98944d1
  124. bug fix: adjust start, end content position in fixed token length algorithm
    
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    8799fd0
  125. update changelog for 2.x release

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    5cda870
  126. rename processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    c942b17
  127. update default delimiter to be \n\n

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    6461b32
  128. remove change log in 3.0 unreleased

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    2a0a879
  129. fix IT failure due to chunking processor rename

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    fbb4edb
  130. update javadoc for text chunking processor factory

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    050f163
  131. adjust functions in chunker interface

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    e61f295
  132. move algorithm name definition to concrete chunker class

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    4f87008
  133. c651b3e
  134. update string formatted message for chunker factory

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    0f45782
  135. 3d1b792
  136. update java doc for delimiter algorithm

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    5600b36
  137. support range double in chunker parameter validator

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    3d962ca
  138. 42de900
  139. update sneaky throw with text chunking processor it

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    6d4fe8c
  140. e666f17
  141. 958cc3b
  142. add comment in text chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    183e928
  143. validate max chunk limit with util parameter class

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    09fccc1
  144. update comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    8ad1e51
  145. update comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    489fe7b
  146. update java doc

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    d67880e
  147. update java doc

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    666e7b9
  148. make parameter final

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    9161c93
  149. implement a map from chunker name to constuctor function in chunker factory
    
    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    0f9c140
  150. bug fix in chunker factory

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    a574980
  151. remove get all chunkers in chunker factory

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    87679ad
  152. remove type check for parameter check for max token count

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    08dcd19
  153. f16882d
  154. implement parser and validator

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    a969a60
  155. update comment

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    34348b3
  156. provide fixed token length as the default algorithm

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    4153988
  157. adjust exception message

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    06ca1c7
  158. adjust exception message

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 14, 2024
    3cf671d

Commits on Mar 15, 2024

  1. use object nonnull and require nonnull

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    5fe5eef
  2. apply final to ingest document and chunk count

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    f3decb4
  3. merge parameter validator into the parser

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    3b8a3af
  4. assign positive default value for max chunk limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    89c465c
  5. e7dffe0
  6. update parameter setting of max chunk limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    463de71
  7. add unit test with non list of string

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    0a04012
  8. add unit test with null input

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    a524954
  9. 10f6568
  10. tune method name in text chunking processor unit test

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    3f41f37
  11. tune method name in delimiter algorithm unit test

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    e4bdabc
  12. 9e37171
  13. tune method modifier for all classes

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    18ba1b1
  14. tune code

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    2ce9840
  15. tune code

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    2aea7a5
  16. tune exception type in parameter parser

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    63bbae9
  17. tune comment

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    aaee028
  18. tune comment

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    ab2a151
  19. include max chunk limit in both algorithms

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    1eb12aa
  20. tune comment

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    40991a3
  21. allow 0 for max chunk limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    ea4bbb8
  22. update runtime max chunk limit in text chunking processor

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    f0dfb57
  23. tune code for chunker

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    cb4b39b
  24. implement test for multiple field max chunk limit exceed

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    98dd886
  25. tune methods name in text chunking proceesor unit tests

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    d245a04
  26. add unit tests for both algorithms with max chunk limit

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    ad7ba25
  27. optimize code

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 15, 2024
    9702168

Commits on Mar 17, 2024

  1. extract max chunk limit check to util class

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 17, 2024
    3d8c030

Commits on Mar 18, 2024

  1. resolve code review comments

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 18, 2024
    9931fae
  2. fix unit tests

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 18, 2024
    fb6a961
  3. bug fix: only update runtime max chunk limit when enabled

    Signed-off-by: yuye-aws <[email protected]>
    yuye-aws committed Mar 18, 2024
    68fef4f