Suggestion: improve pinyin resolution for the polyphonic character “莎” #1670

Closed
1 task done
RekaDowney opened this issue Aug 11, 2021 · 2 comments

RekaDowney commented Aug 11, 2021

Describe the feature and the current behavior/state.
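This polyphonic character mostly appears in personal names, for example 莎士比亚 (Shakespeare) and 丽莎 (Lisa). So far I have found it read as suo only in the word 莎草 (sedge) and in some classical texts.
The current pinyin dictionary lists suo before sha, so the pinyin conversion of every personal and place name containing this character comes out wrong. Would it be more appropriate to make sha the first reading in the dictionary?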

Will this change the current api? How?
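The current API needs no change; only the dictionary has to be adjusted. Users can of course edit the pinyin dictionary themselves, but sha1 really is the more frequent and more widely used reading of this character, which is why I am asking for the change to be made in the project.
Edit the dictionary file data/dictionary/pinyin/pinyin.txt: change 莎=suo1,sha1 to 莎=sha1,suo1, and add the entry 踏莎行=ta4,suo1,xing2.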
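To illustrate why the order of readings matters, here is a minimal Java sketch of a toy longest-match lookup over entries in the key=reading,reading format quoted above. It is not HanLP's implementation: the class PinyinOrderSketch and its methods are hypothetical, and it assumes that a single-character entry lists alternative readings with the first one used as the default, while a multi-character entry lists one reading per character. The sample entries for 士, 比, 亚 and 莎草 are supplied for the demo and are not copied from pinyin.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PinyinOrderSketch {

    /** Parses lines of the form key=py1,py2,... into a lookup table. */
    static Map<String, String[]> parse(List<String> lines) {
        Map<String, String[]> dict = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            dict.put(line.substring(0, eq), line.substring(eq + 1).split(","));
        }
        return dict;
    }

    /** Forward longest-match conversion; unknown characters pass through unchanged. */
    static String convert(String text, Map<String, String[]> dict, String sep) {
        int maxLen = 1;
        for (String key : dict.keySet()) {
            maxLen = Math.max(maxLen, key.length());
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            for (int len = Math.min(maxLen, text.length() - i); len >= 1; len--) {
                String[] readings = dict.get(text.substring(i, i + len));
                if (readings == null) {
                    continue;
                }
                if (len == 1) {
                    // Assumption: the first listed reading is the default.
                    out.add(readings[0]);
                } else {
                    // Assumption: word entries carry one reading per character.
                    out.addAll(Arrays.asList(readings));
                }
                i += len;
                matched = true;
                break;
            }
            if (!matched) {
                out.add(String.valueOf(text.charAt(i)));
                i++;
            }
        }
        return String.join(sep, out);
    }

    public static void main(String[] args) {
        // Illustrative entries only; not taken from the real dictionary file.
        List<String> shared = Arrays.asList("士=shi4", "比=bi3", "亚=ya4", "莎草=suo1,cao3");

        List<String> before = new ArrayList<>(shared);
        before.add("莎=suo1,sha1");

        List<String> after = new ArrayList<>(shared);
        after.add("莎=sha1,suo1");
        after.add("踏莎行=ta4,suo1,xing2");

        System.out.println(convert("莎士比亚", parse(before), "_")); // suo1_shi4_bi3_ya4
        System.out.println(convert("莎士比亚", parse(after), "_"));  // sha1_shi4_bi3_ya4
        System.out.println(convert("踏莎行", parse(after), "_"));    // ta4_suo1_xing2
        System.out.println(convert("莎草", parse(after), "_"));      // suo1_cao3
    }
}
```

Under these assumptions, swapping the order fixes names such as 莎士比亚, while the word-level entries keep the suo reading for 莎草 and 踏莎行.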

Who will benefit with this feature?
Practically everyone benefits from this change: 莎草 is already defined in the dictionary, and almost all remaining cases read as sha.

Are you willing to contribute it (Yes/No):
I can submit a PR if needed, though it would be quicker for the author to make the change directly.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Java version: Java 8
  • HanLP version:
<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.8.2</version>
</dependency>

Any other info
Code snippet:

System.out.println(HanLP.convertToPinyinString("《罗密欧与朱丽叶》是莎士比亚创作的。", "_", false));

Output:

《_luo_mi_ou_yu_zhu_li_ye_》_shi_suo_shi_bi_ya_chuang_zuo_de_。
  • I've carefully completed this form.
@RekaDowney RekaDowney added the feature request Suggest an idea for this project label Aug 11, 2021
hankcs added a commit that referenced this issue Aug 11, 2021
hankcs (Owner) commented Aug 11, 2021

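Thanks for the feedback. This has been fixed; please see the commit referenced above.
If the problem persists, feel free to reopen the issue.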

@hankcs hankcs closed this as completed Aug 11, 2021
@hanlpbot (Collaborator)

This issue has been mentioned on Butterfly Effect. There might be relevant details there:

https://bbs.hankcs.com/t/python-3-9/4780/28
