Semantic Role Labeling

Note: Before use, download the SRL model and place the three model files extracted from the archive

CoNLL2009-ST-Chinese-ALL.anna-3.3.parser.model

CoNLL2009-ST-Chinese-ALL.anna-3.3.postagger.model

CoNLL2009-ST-Chinese-ALL.anna-3.3.srl-4.1.srl.model

in the data/model/srl directory.
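Before calling the SRL functions, it can help to confirm that the three model files are actually in place. The sketch below is a hypothetical helper, not part of AHANLP: `SrlModelCheck` and `missingModels` are names introduced here for illustration, and the `main` method assumes the model directory is resolved relative to the current working directory as data/model/srl.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not an AHANLP class): reports which of the three
// SRL model files are missing from a given directory.
class SrlModelCheck {
    static final List<String> MODEL_FILES = Arrays.asList(
        "CoNLL2009-ST-Chinese-ALL.anna-3.3.parser.model",
        "CoNLL2009-ST-Chinese-ALL.anna-3.3.postagger.model",
        "CoNLL2009-ST-Chinese-ALL.anna-3.3.srl-4.1.srl.model");

    static List<String> missingModels(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : MODEL_FILES) {
            // isRegularFile is false both for absent paths and for directories
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the model directory is relative to the working directory
        List<String> missing = missingModels(Paths.get("data", "model", "srl"));
        if (missing.isEmpty()) {
            System.out.println("All SRL model files found.");
        } else {
            System.out.println("Missing SRL model files: " + missing);
        }
    }
}
```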

Single-Sentence Semantic Role Labeling

List<SRLPredicate> SRL(String sentence)

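Performs semantic role labeling on a single sentence and returns a list of predicates. The labeler identifies every predicate in the sentence together with its arguments; each SRLPredicate object holds the arguments that belong to that predicate.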

Parameters

sentence: the sentence to analyze

Example

String sentence = "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。";
List<SRLPredicate> predicateList = AHANLP.SRL(sentence);
// Printing an SRLPredicate shows the sentence, the predicate, and its arguments
for (SRLPredicate p : predicateList) {
    System.out.println(p);
}

Sentence: 全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。
Predicate: 证实    [36, 37]
    TMP : 周三（7月21日）    [27, 35]
    A0 : 全球最大石油生产商沙特阿美（Saudi Aramco）    [0, 26]
    A1 : 公司的一些文件遭泄露    [39, 48]

Sentence: 全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。
Predicate: 遭    [46, 46]
    A0 : 公司的一些文件    [39, 45]
    A1 : 泄露    [47, 48]

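For every predicate and argument, getLocalOffset() returns its offset within the original sentence and getLocalIdxs() returns its [start, end] index range within that sentence. Example code: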

String sentence = "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。";
List<SRLPredicate> predicateList = AHANLP.SRL(sentence);
System.out.println("Parsing sentence: " + sentence);
for (SRLPredicate p : predicateList) {
    // Predicate text, plus its offset and [start, end] indexes within the sentence
    System.out.print("Predicate: " + p.getPredicate());
    System.out.print("\t\tLocal offset: " + p.getLocalOffset());
    System.out.print("\tLocal indexes: [" + p.getLocalIdxs()[0] + ", " + p.getLocalIdxs()[1] + "]\n");
    for (Arg arg : p.getArguments()) {
        // Argument label (A0, A1, TMP, ...), its text span, and its position within the sentence
        System.out.print("\t" + arg.getLabel() + ": " + arg.getSpan());
        System.out.print("\t\tLocal offset: " + arg.getLocalOffset());
        System.out.print("\tLocal indexes: [" + arg.getLocalIdxs()[0] + ", " + arg.getLocalIdxs()[1] + "]\n");
    }
    System.out.println();
}

Parsing sentence: 全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。
Predicate: 证实    Local offset: 36    Local indexes: [36, 37]
    TMP: 周三（7月21日）                            Local offset: 27    Local indexes: [27, 35]
    A0: 全球最大石油生产商沙特阿美（Saudi Aramco）    Local offset: 0    Local indexes: [0, 26]
    A1: 公司的一些文件遭泄露                         Local offset: 39    Local indexes: [39, 48]

Predicate: 遭    Local offset: 46    Local indexes: [46, 46]
    A0: 公司的一些文件    Local offset: 39    Local indexes: [39, 45]
    A1: 泄露             Local offset: 47    Local indexes: [47, 48]
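The index pairs printed above appear to be inclusive at both ends: [36, 37] covers the two characters 证实, and [46, 46] covers the single character 遭. A minimal sketch of recovering a span from such a pair, assuming the indexes count Java `char` values (which holds for the BMP characters in this example); `SrlSpans.span` is a hypothetical helper, not an AHANLP method.

```java
// Hypothetical helper (not an AHANLP class): turns an inclusive
// [start, end] index pair into the corresponding substring.
class SrlSpans {
    static String span(String sentence, int start, int end) {
        if (start < 0 || end < start || end >= sentence.length()) {
            throw new IndexOutOfBoundsException("[" + start + ", " + end + "]");
        }
        // substring's end bound is exclusive, so add 1 to include index `end`
        return sentence.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String sentence = "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。";
        System.out.println(span(sentence, 36, 37)); // 证实
        System.out.println(span(sentence, 46, 46)); // 遭
        System.out.println(span(sentence, 27, 35)); // 周三（7月21日）
    }
}
```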

Long-Text Semantic Role Labeling

List<SRLPredicate> SRLParseContent(String content)

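Performs semantic role labeling on a long text and returns a list of predicates. If the text is long and contains several sentences, use this function: it first splits the text into sentences and then labels them one at a time.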
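The split-then-label process can be pictured with a small sketch that records where each sentence starts in the full text, which is the information a global offset needs. This is a hypothetical illustration, not AHANLP's own splitter: it assumes sentences end at 。, ！ or ？, and `SentenceSplitSketch`, `Piece`, and `split` are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not AHANLP's splitter): splits text after 。！？
// and remembers the start offset of each sentence in the full text.
class SentenceSplitSketch {
    static final class Piece {
        final String text;
        final int start; // index of text.charAt(0) within the full content
        Piece(String text, int start) { this.text = text; this.start = start; }
    }

    static List<Piece> split(String content) {
        List<Piece> pieces = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '。' || c == '！' || c == '？') {
                // Keep the terminator as part of the sentence
                pieces.add(new Piece(content.substring(start, i + 1), start));
                start = i + 1;
            }
        }
        if (start < content.length()) { // trailing text without a terminator
            pieces.add(new Piece(content.substring(start), start));
        }
        return pieces;
    }

    public static void main(String[] args) {
        String content = "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。"
                       + "此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。";
        for (Piece p : split(content)) {
            System.out.println(p.start + "\t" + p.text); // starts: 0, then 50
        }
    }
}
```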

Parameters

content: the long text to parse

Example

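As with single-sentence parsing, every predicate and argument also records its position within the long text: getGlobalOffset() returns its offset within the full text and getGlobalIdxs() returns its [start, end] index range within the full text. Example code: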

String content = "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。" + 
                 "此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。";
List<SRLPredicate> predicateList = AHANLP.SRLParseContent(content);
System.out.println("\nParsing text: " + content);
for (SRLPredicate p : predicateList) {
    // Predicate text, plus its offset and [start, end] indexes within the whole text
    System.out.print("Predicate: " + p.getPredicate());
    System.out.print("\t\tGlobal offset: " + p.getGlobalOffset());
    System.out.print("\tGlobal indexes: [" + p.getGlobalIdxs()[0] + ", " + p.getGlobalIdxs()[1] + "]\n");
    for (Arg arg : p.getArguments()) {
        // Argument label, its text span, and its position within the whole text
        System.out.print("\t" + arg.getLabel() + ": " + arg.getSpan());
        System.out.print("\t\tGlobal offset: " + arg.getGlobalOffset());
        System.out.print("\tGlobal indexes: [" + arg.getGlobalIdxs()[0] + ", " + arg.getGlobalIdxs()[1] + "]\n");
    }
    System.out.println();
}

Parsing text: 全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。
Predicate: 证实    Global offset: 36    Global indexes: [36, 37]
    A0: 全球最大石油生产商沙特阿美（Saudi Aramco）    Global offset: 0    Global indexes: [0, 26]
    A1: 公司的一些文件遭泄露                         Global offset: 39    Global indexes: [39, 48]
    TMP: 周三（7月21日）                            Global offset: 27    Global indexes: [27, 35]

Predicate: 遭    Global offset: 46    Global indexes: [46, 46]
    A1: 泄露             Global offset: 47    Global indexes: [47, 48]
    A0: 公司的一些文件    Global offset: 39    Global indexes: [39, 45]

Predicate: 声称    Global offset: 60    Global indexes: [60, 61]
    TMP: 此前                 Global offset: 50    Global indexes: [50, 51]
    A0: 一名网络勒索者         Global offset: 53    Global indexes: [53, 59]
    A1: 获取了该公司大量数据    Global offset: 62    Global indexes: [62, 71]

Predicate: 获取    Global offset: 62    Global indexes: [62, 63]
    A1: 该公司大量数据    Global offset: 65    Global indexes: [65, 71]

Predicate: 要求    Global offset: 74    Global indexes: [74, 75]
    TMP: 此前            Global offset: 50    Global indexes: [50, 51]
    A1: 其支付           Global offset: 76    Global indexes: [76, 78]
    A1: 5000万美元赎金    Global offset: 79    Global indexes: [79, 87]
    DIS: 并              Global offset: 73    Global indexes: [73, 73]

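Every predicate (SRLPredicate) and argument (Arg) object records both its sentence-level (local) and text-level (global) offset and indexes, which are available through the following four methods:

getLocalOffset(): the local offset, i.e. the start index of the predicate (argument) within its sentence
getLocalIdxs(): the local indexes [start, end] of the predicate (argument) within its sentence; both ends are inclusive, so sentence[start:end+1] = span (in Java, sentence.substring(start, end + 1))
getGlobalOffset(): the global offset, i.e. the start index of the predicate (argument) within the long text
getGlobalIdxs(): the global indexes [start, end] of the predicate (argument) within the long text; both ends are inclusive, so content[start:end+1] = span (in Java, content.substring(start, end + 1))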
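Judging from the outputs above, a global index equals the local index plus the position at which the sentence starts in the long text: 声称 sits at local [10, 11] in the second sentence, which starts at index 50, giving global [60, 61]. A minimal sketch of that conversion; `OffsetSketch.toGlobal` is a hypothetical helper, not an AHANLP method, and locating the sentence with `indexOf` assumes the sentence text occurs only once in the content.

```java
// Hypothetical helper (not an AHANLP class): converts sentence-level
// [start, end] indexes into text-level indexes.
class OffsetSketch {
    static int[] toGlobal(int[] localIdxs, int sentenceStart) {
        // Shift both inclusive bounds by the sentence's start position
        return new int[] { localIdxs[0] + sentenceStart, localIdxs[1] + sentenceStart };
    }

    public static void main(String[] args) {
        String first = "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。";
        String second = "此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。";
        String content = first + second;
        // Assumes `second` appears exactly once in `content`
        int sentenceStart = content.indexOf(second);          // 50
        int[] global = toGlobal(new int[] { 10, 11 }, sentenceStart);
        System.out.println("[" + global[0] + ", " + global[1] + "] "
                + content.substring(global[0], global[1] + 1)); // [60, 61] 声称
    }
}
```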

Batch Sentence Parsing

If you need to label a batch of sentences, or want to split a long text into sentences yourself before parsing it, call the batch parsing function.

List<List<SRLPredicate>> parseSentences(List<String> sentenceList)

Labels each sentence in the list in turn and returns one predicate list per sentence, in the same order as the input list.

Note: This function is not exposed through the unified AHANLP interface. It is an internal function and must be called through the SRLParser class.
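Because the result is positional, the list at index i belongs to the sentence at index i. A hypothetical helper that pairs the two lists and fails fast on a length mismatch is sketched below; plain strings stand in for SRLPredicate objects so the example runs without AHANLP, and `BatchPairing.pair` is not an AHANLP method.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an AHANLP class): pairs each input sentence
// with the result list at the same position.
class BatchPairing {
    static <T> List<Map.Entry<String, List<T>>> pair(List<String> sentences, List<List<T>> results) {
        if (sentences.size() != results.size()) {
            throw new IllegalArgumentException(
                sentences.size() + " sentences but " + results.size() + " result lists");
        }
        List<Map.Entry<String, List<T>>> pairs = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            pairs.add(new AbstractMap.SimpleEntry<>(sentences.get(i), results.get(i)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。",
            "此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。");
        // Stand-in data: predicate strings instead of SRLPredicate objects
        List<List<String>> results = Arrays.asList(
            Arrays.asList("证实", "遭"),
            Arrays.asList("声称", "获取", "要求"));
        for (Map.Entry<String, List<String>> e : pair(sentences, results)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```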

Parameters

sentenceList: the list of sentences to parse

Example

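// Uses java.util.Arrays and java.util.List
List<String> senList = Arrays.asList(
    "全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。",
    "此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。"
);
// parseSentences is called on SRLParser directly, not through AHANLP
List<List<SRLPredicate>> results = SRLParser.parseSentences(senList);
// results.get(i) holds the predicates found in senList.get(i)
for (int i = 0; i < senList.size(); i++) {
    System.out.println("\nParsing sentence: " + senList.get(i));
    for (SRLPredicate p : results.get(i)) {
        System.out.println("Predicate: " + p.getPredicate());
    }
}

Parsing sentence: 全球最大石油生产商沙特阿美（Saudi Aramco）周三（7月21日）证实，公司的一些文件遭泄露。
Predicate: 证实
Predicate: 遭

Parsing sentence: 此前，一名网络勒索者声称获取了该公司大量数据，并要求其支付5000万美元赎金。
Predicate: 声称
Predicate: 获取
Predicate: 要求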
