Proper treatment of PUNCTs for KNP #48

Merged
merged 11 commits into PKSHATechnology-Research:master
Apr 22, 2020

KoichiYasuoka
Contributor

括弧始 (opening-bracket) PUNCTs, such as "(", "「", "『", and so on, are not suitable as head tokens in Universal Dependencies.
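
To make the problem concrete, here is a minimal sketch of one way a PUNCT head could be replaced by one of its dependents. It is not the code in this PR: the function name `reattach_punct_heads`, the list-based input, and the "promote the first non-PUNCT dependent" rule are all assumptions made for illustration. Heads follow the spaCy convention that a root token is its own head.

```python
from typing import List


def reattach_punct_heads(heads: List[int], upos: List[str]) -> List[int]:
    """Return a copy of `heads` in which no PUNCT token governs a non-PUNCT token.

    heads[i] is the index of token i's head; a root points at itself.
    (Hypothetical helper, not part of camphr.)
    """
    new_heads = list(heads)
    for p, tag in enumerate(upos):
        if tag != "PUNCT":
            continue
        # Dependents of p, excluding p itself (a root is its own head).
        deps = [i for i, h in enumerate(new_heads) if h == p and i != p]
        # Promote the first non-PUNCT dependent, if there is one.
        promoted = next((i for i in deps if upos[i] != "PUNCT"), None)
        if promoted is None:
            continue
        # The promoted token takes over p's position in the tree.
        new_heads[promoted] = promoted if new_heads[p] == p else new_heads[p]
        # p and its remaining dependents now hang off the promoted token.
        for i in deps:
            if i != promoted:
                new_heads[i] = promoted
        new_heads[p] = promoted
    return new_heads


# "「" "猫" "」" where the opening bracket was wrongly made the root.
fixed = reattach_punct_heads([0, 0, 0], ["PUNCT", "NOUN", "PUNCT"])
print(fixed)  # → [1, 1, 1]
```

After the call, the noun is the root and both brackets attach to it, which is the shape Universal Dependencies expects for punctuation.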

@KoichiYasuoka
Contributor Author

Oops, the "build (juman)" check has failed. What should I do?

@KoichiYasuoka
Contributor Author

I'm not sure why JUMAN treats "(※" as a single midashi (surface form) with two tokens...

@KoichiYasuoka
Contributor Author

OK, now I understand that "(※" is treated as two tokens by JUMAN, and as a single 顔文字 (emoticon) by KNP.

$ echo "(※" | juman
( ( ( 特殊 1 括弧始 3 * 0 * 0 NIL
※ ※ ※ 特殊 1 記号 5 * 0 * 0 NIL
EOS
$ echo "(※" | juman | knp -tab
# S-ID:1 KNP:4.19-CF1.1 DATE:2020/04/22 SCORE:-0.30231
* -1D <文頭><文末><体言><用言:判><体言止><レベル:C><区切:5-5><ID:(文末)><裸名詞><提題受:30><主節><状態述語>
+ -1D <文頭><文末><体言><用言:判><体言止><レベル:C><区切:5-5><ID:(文末)><裸名詞><提題受:30><主節><状態述語><判定詞><名詞項候補><先行詞候補><用言代表表記:(※/(※><時制-無時制><格解析結果:(※/(※:判0:ガ/U/-/-/-/->
(※ (※ (※ 特殊 1 記号 5 * 0 * 0 NIL <形態素連結-顔文字><顔文字:自動認識><文頭><文末><記英数カ><英記号><記号><自立><内容語><タグ単位始><文節始><文節主辞>
EOS
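
The JUMAN output above can be read programmatically with a small parser. This is only a sketch: it assumes the space-separated column layout seen in the transcript (surface form first, POS fourth, POS subcategory sixth), and `Morpheme` and `parse_juman` are hypothetical names, not part of JUMAN, KNP, or camphr. Lines starting with "@ " are skipped on the assumption that they mark alternative analyses.

```python
from typing import List, NamedTuple


class Morpheme(NamedTuple):
    surface: str   # column 1: surface form (midashi)
    pos: str       # column 4: part of speech, e.g. 特殊
    sub_pos: str   # column 6: POS subcategory, e.g. 括弧始


def parse_juman(output: str) -> List[Morpheme]:
    """Parse JUMAN's default output into (surface, pos, sub_pos) triples."""
    morphemes = []
    for line in output.splitlines():
        # Skip blank lines, the end-of-sentence marker, and "@ " lines
        # (assumed here to be alternative analyses).
        if not line or line == "EOS" or line.startswith("@ "):
            continue
        fields = line.split(" ")
        morphemes.append(Morpheme(fields[0], fields[3], fields[5]))
    return morphemes


# The two morpheme lines copied from the `juman` transcript.
JUMAN_OUTPUT = """\
( ( ( 特殊 1 括弧始 3 * 0 * 0 NIL
※ ※ ※ 特殊 1 記号 5 * 0 * 0 NIL
EOS
"""

morphemes = parse_juman(JUMAN_OUTPUT)
opening = [m.surface for m in morphemes if m.sub_pos == "括弧始"]
print(len(morphemes), opening)
```

JUMAN yields two morphemes here, one of them tagged 括弧始, whereas the KNP output merges them into a single "(※" morpheme tagged 記号. That mismatch in token boundaries is what the comment above describes.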

@tamuhey
Contributor

tamuhey commented Apr 22, 2020

Thank you for your contribution!
Do you have any examples that this PR enables you to handle correctly?
If so, please add them as test cases.

@tamuhey tamuhey self-requested a review April 22, 2020 03:33
@KoichiYasuoka
Contributor Author

KoichiYasuoka commented Apr 22, 2020

I think test_knp.py already includes a very good example for this PR: "(※". Passing the "(※" test with 全角 (full-width) characters was very hard for me, but I managed it.

@tamuhey
Contributor

tamuhey commented Apr 22, 2020

I think _modify_head_punct is not tested enough, so please add test cases to test_dependency_parser.py.

@tamuhey
Contributor

tamuhey commented Apr 22, 2020

Thank you very much, @KoichiYasuoka!

@tamuhey tamuhey merged commit 819fe11 into PKSHATechnology-Research:master Apr 22, 2020
Contributor

@tamuhey tamuhey left a comment

OK

@tamuhey tamuhey added the bug Something isn't working label Apr 22, 2020
@tamuhey
Contributor

tamuhey commented Apr 22, 2020

@KoichiYasuoka I've published a new version of camphr, so please check it out.
