Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix irreproducible inference due to the imprecise floating point values in xgboost's get_dump #500

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

pltb
Copy link

@pltb pltb commented Jul 23, 2024

Tries to address the issues of using the output of the get_dump method in xgboost as the import format by parsing the output of save_model instead

More context here #497

TODO:

  • parse using ObjectParser
  • add more tests
  • review from the plugin's authors

@pltb pltb changed the title [DRAFT] fix xgboost import [DRAFT] Allow loading xgboost models from the json output of the save_model method Jul 23, 2024
@pltb pltb changed the title [DRAFT] Allow loading xgboost models from the json output of the save_model method Fix irreproducible inference due to the imprecise floating point values in xgboost's get_dump Jul 30, 2024
@pltb pltb marked this pull request as ready for review July 30, 2024 18:46
@wrigleyDan
Copy link
Contributor

wrigleyDan commented Oct 1, 2024

Summary of our chat at Haystack:

  • Rename the parser to model/xgboost+json+raw
  • Add the switch-case for the additional model parser
  • Preserve existing functionality
  • Add tests
  • Documentation (Daniel)
  • Add notebook to hello-ltr Github repo (Daniel)

@pltb
Copy link
Author

pltb commented Oct 1, 2024

@wrigleyDan a small correction: shall we rename it to model/xgboost+json+raw instead of v2?

@wrigleyDan
Copy link
Contributor

@wrigleyDan a small correction: shall we rename it to model/xgboost+json+raw instead of v2?

Thanks, copied the old name and forgot to change it. Well spotted, corrected it in the comment above.

@pltb
Copy link
Author

pltb commented Oct 5, 2024

Hi @wrigleyDan, I've pushed a few changes that should cover the code-related action points that we've outlined:

  • added 8 tests for some basic parsing cases (I'd love to see more extensive testing here, but I'm not sure if this should be a blocker for merging this version of the parser)
  • added some extra validation and refactored the code a little bit
  • renamed the parser like we've agreed
  • registered the parser in the plugin

Please let me know if I can provide any assistance with the docs and/or the notebook, I'd be glad to help :D

@wrigleyDan
Copy link
Contributor

wrigleyDan commented Oct 8, 2024

Thanks for making the changes so quickly @pltb!
There are a couple of tasks failing when executing the tests around checkStyle:

> Task :checkstyleTest
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:56: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:115: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:173: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:227: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:282: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:336: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:397: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:458: Line is longer than 140 characters (found 152). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/test/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParserTests.java:511: Line is longer than 140 characters (found 152). [LineLength]

> Task :checkstyleTest FAILED

> Task :checkstyleMain
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:8:34: Using the '.*' form of import should be avoided - org.elasticsearch.xcontent.*. [AvoidStarImport]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:12:17: Using the '.*' form of import should be avoided - java.util.*. [AvoidStarImport]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:42: Line is longer than 140 characters (found 156). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:66: Line is longer than 140 characters (found 149). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:120: Line is longer than 140 characters (found 159). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:121: Line is longer than 140 characters (found 178). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:170: Line is longer than 140 characters (found 155). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:177:9: Redundant 'public' modifier. [RedundantModifier]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:197: Line is longer than 140 characters (found 149). [LineLength]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:217:9: Redundant 'public' modifier. [RedundantModifier]
[ant:checkstyle] [ERROR] /path/to/elasticsearch-learning-to-rank/src/main/java/com/o19s/es/ltr/ranker/parser/XGBoostRawJsonParser.java:248:9: Redundant 'public' modifier. [RedundantModifier]

> Task :checkstyleMain FAILED

Could you have a look and resolve these so we don't see any checkStyle violations when running the tests?

Thanks!

@pltb
Copy link
Author

pltb commented Oct 9, 2024

@wrigleyDan ah I totally missed it 😅
Fixed, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants