Releases: explosion/spacy-models
en_core_web_trf-3.7.3
Checksum .tar.gz: dae355f7f419bee53f2804a8e62a6473425e8680ac8ff8e8a7b30b7e2b8b0c4f
Checksum .whl: f72abb34bdf174876bd4267b29b2501677e605e0a251fdc56c163003182ed68b
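The checksums listed for each release can be compared against a downloaded archive before installing it. The sketch below assumes the published values are SHA-256 digests (their 64-hex-character length is consistent with that, but the page does not name the algorithm). The helper names `sha256_of_file` and `matches_checksum` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large archives are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum(local_wheel_path, "f72abb34bdf174876bd4267b29b2501677e605e0a251fdc56c163003182ed68b")` should return `True` for an intact `en_core_web_trf` 3.7.3 wheel if the SHA-256 assumption holds; `local_wheel_path` stands for wherever the file was saved.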
Details: https://spacy.io/models/en#en_core_web_trf
English transformer pipeline (Transformer(name='roberta-base', piece_encoder='byte-bpe', stride=104, type='roberta', width=768, window=144, vocab_size=50265)). Components: transformer, tagger, parser, ner, attribute_ruler, lemmatizer.
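The `window=144` and `stride=104` settings suggest that the transformer processes text in overlapping windows. The following is a simplified sketch of that idea, not the library's implementation: it assumes each window covers up to 144 units and that a new window starts every 104 units, so neighbouring windows share 40 units. Whether the units are tokens or word pieces, and how boundaries are handled exactly, is not stated here; `strided_windows` is a hypothetical helper.

```python
def strided_windows(length, window, stride):
    """Return (start, end) index pairs covering `length` units with overlapping windows."""
    if length <= 0:
        return []
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans = []
    start = 0
    while True:
        end = min(start + window, length)
        spans.append((start, end))
        if end >= length:
            # The last window reached the end of the sequence.
            break
        start += stride
    return spans


# A 300-unit sequence with the settings quoted above.
print(strided_windows(300, window=144, stride=104))
# [(0, 144), (104, 248), (208, 300)]
```

With these settings every unit except those near the edges falls into two windows, which gives the model context on both sides of a window boundary.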
Feature | Description |
---|---|
Name | en_core_web_trf |
Version | 3.7.3 |
spaCy | >=3.7.2,<3.8.0 |
Default Pipeline | transformer , tagger , parser , attribute_ruler , lemmatizer , ner |
Components | transformer , tagger , parser , attribute_ruler , lemmatizer , ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) ClearNLP Constituent-to-Dependency Conversion (Emory University) WordNet 3.0 (Princeton University) roberta-base (Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov) |
License | MIT |
Author | Explosion |
Model size | 436 MB |
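The `spaCy` row gives the range of library versions the package is built for. A minimal sketch of checking an installed version against such a range is shown below. It handles only plain dotted integer versions such as `3.7.2` and the operators shown, so it is a simplification of real version matching; `parse_version` and `satisfies` are hypothetical helpers.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text, width=3):
    """Turn '3.7.2' into (3, 7, 2), padding short versions with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=3.7.2,<3.8.0'."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not compare(current, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("3.7.2", ">=3.7.2,<3.8.0"))  # True
print(satisfies("3.8.0", ">=3.7.2,<3.8.0"))  # False
```

Pre-release or development versions are not covered by this sketch; a dedicated version-parsing library is the safer choice for those.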
Label Scheme
View label scheme (112 labels for 3 components)
Component | Labels |
---|---|
tagger | $ , '' , , , -LRB- , -RRB- , . , : , ADD , AFX , CC , CD , DT , EX , FW , HYPH , IN , JJ , JJR , JJS , LS , MD , NFP , NN , NNP , NNPS , NNS , PDT , POS , PRP , PRP$ , RB , RBR , RBS , RP , SYM , TO , UH , VB , VBD , VBG , VBN , VBP , VBZ , WDT , WP , WP$ , WRB , XX , ```` |
parser | ROOT , acl , acomp , advcl , advmod , agent , amod , appos , attr , aux , auxpass , case , cc , ccomp , compound , conj , csubj , csubjpass , dative , dep , det , dobj , expl , intj , mark , meta , neg , nmod , npadvmod , nsubj , nsubjpass , nummod , oprd , parataxis , pcomp , pobj , poss , preconj , predet , prep , prt , punct , quantmod , relcl , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 99.86 |
TOKEN_P | 99.57 |
TOKEN_R | 99.58 |
TOKEN_F | 99.57 |
TAG_ACC | 98.13 |
SENTS_P | 94.89 |
SENTS_R | 85.79 |
SENTS_F | 90.11 |
DEP_UAS | 95.26 |
DEP_LAS | 93.91 |
ENTS_P | 90.08 |
ENTS_R | 90.30 |
ENTS_F | 90.19 |
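The `_P`, `_R` and `_F` rows report precision, recall and F-score. Assuming the F-score is the usual F1, the harmonic mean of precision and recall, the reported values can be reproduced from the other two columns. The sketch below checks this for the sentence and entity scores of `en_core_web_trf`; small differences are possible because the published figures are rounded.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) taken from the accuracy table above.
reported = {
    "SENTS": (94.89, 85.79, 90.11),
    "ENTS": (90.08, 90.30, 90.19),
}

for name, (precision, recall, published) in reported.items():
    computed = round(f_score(precision, recall), 2)
    print(f"{name}_F computed={computed} published={published}")
```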
Installation
pip install spacy
python -m spacy download en_core_web_trf
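Once the package is installed, the pipeline can be loaded by name with `spacy.load`. The sketch below wraps the call in a hypothetical `load_pipeline` helper that returns `None` when spaCy or the package is not available, so the snippet also runs on a machine where the download has not been done yet. The example sentence is only an illustration.

```python
def load_pipeline(name):
    """Return a loaded spaCy pipeline, or None if spaCy or the package is missing."""
    try:
        import spacy
    except ImportError:
        return None
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises OSError when the named package cannot be found.
        return None


nlp = load_pipeline("en_core_web_trf")
if nlp is not None:
    doc = nlp("Apple is opening a new office in London next year.")
    # Component names in processing order.
    print(nlp.pipe_names)
    # Named entities with their labels from the ner component.
    print([(ent.text, ent.label_) for ent in doc.ents])
```

The same pattern applies to the other packages on this page by changing the name passed to `load_pipeline`.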
en_core_web_sm-3.7.1
Checksum .tar.gz: 1075c2aa2bc2fee105ab6e90a01a5d1a428c9f5b20a1fa003dc2cb6a438d295e
Checksum .whl: 86cc141f63942d4b2c5fcee06630fd6f904788d2f0ab005cce45aadb8fb73889
Details: https://spacy.io/models/en#en_core_web_sm
English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.
Feature | Description |
---|---|
Name | en_core_web_sm |
Version | 3.7.1 |
spaCy | >=3.7.2,<3.8.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , lemmatizer , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , lemmatizer , ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) ClearNLP Constituent-to-Dependency Conversion (Emory University) WordNet 3.0 (Princeton University) |
License | MIT |
Author | Explosion |
Model size | 12 MB |
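The `Components` row lists `senter`, while the `Default Pipeline` row does not, which suggests that the component ships with the package but is not active by default. A small sketch of finding such components from the two rows follows; `inactive_components` is a hypothetical helper, and the two lists are copied from the table above.

```python
components = [
    "tok2vec", "tagger", "parser", "senter",
    "attribute_ruler", "lemmatizer", "ner",
]
default_pipeline = [
    "tok2vec", "tagger", "parser",
    "attribute_ruler", "lemmatizer", "ner",
]


def inactive_components(components, default_pipeline):
    """Return the components that are shipped but absent from the default pipeline."""
    active = set(default_pipeline)
    return [name for name in components if name not in active]


print(inactive_components(components, default_pipeline))  # ['senter']
```

On a loaded pipeline, `nlp.enable_pipe("senter")` would be the usual way to switch such a component on, assuming it is indeed shipped in a disabled state.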
Label Scheme
View label scheme (113 labels for 3 components)
Component | Labels |
---|---|
tagger | $ , '' , , , -LRB- , -RRB- , . , : , ADD , AFX , CC , CD , DT , EX , FW , HYPH , IN , JJ , JJR , JJS , LS , MD , NFP , NN , NNP , NNPS , NNS , PDT , POS , PRP , PRP$ , RB , RBR , RBS , RP , SYM , TO , UH , VB , VBD , VBG , VBN , VBP , VBZ , WDT , WP , WP$ , WRB , XX , _SP , ```` |
parser | ROOT , acl , acomp , advcl , advmod , agent , amod , appos , attr , aux , auxpass , case , cc , ccomp , compound , conj , csubj , csubjpass , dative , dep , det , dobj , expl , intj , mark , meta , neg , nmod , npadvmod , nsubj , nsubjpass , nummod , oprd , parataxis , pcomp , pobj , poss , preconj , predet , prep , prt , punct , quantmod , relcl , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 99.86 |
TOKEN_P | 99.57 |
TOKEN_R | 99.58 |
TOKEN_F | 99.57 |
TAG_ACC | 97.25 |
SENTS_P | 92.02 |
SENTS_R | 89.21 |
SENTS_F | 90.59 |
DEP_UAS | 91.75 |
DEP_LAS | 89.87 |
ENTS_P | 84.55 |
ENTS_R | 84.57 |
ENTS_F | 84.56 |
Installation
pip install spacy
python -m spacy download en_core_web_sm
en_core_web_md-3.7.1
Checksum .tar.gz: 3273a1335fcb688be09949c5cdb73e85eb584ec3dfc50d4338c17daf6ccd4628
Checksum .whl: 6a0f857a2b4d219c6fa17d455f82430b365bf53171a2d919b9376e5dc9be032e
Details: https://spacy.io/models/en#en_core_web_md
English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.
Feature | Description |
---|---|
Name | en_core_web_md |
Version | 3.7.1 |
spaCy | >=3.7.2,<3.8.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , lemmatizer , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , lemmatizer , ner |
Vectors | 514157 keys, 20000 unique vectors (300 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) ClearNLP Constituent-to-Dependency Conversion (Emory University) WordNet 3.0 (Princeton University) Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl) (Explosion) |
License | MIT |
Author | Explosion |
Model size | 40 MB |
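The `Vectors` row reports far more keys than unique vectors, which means that many keys point at the same stored vector. The toy sketch below illustrates that kind of many-to-one lookup with made-up words and two-dimensional vectors. It is not the package's actual data or storage format; `key_to_row`, `lookup` and `keys_per_vector` are hypothetical names.

```python
# Toy vector table: two stored rows shared by three keys.
rows = [
    [0.1, 0.3],
    [0.7, 0.2],
]
key_to_row = {"cat": 0, "kitten": 0, "car": 1}


def lookup(key):
    """Return the stored vector for a key, or None if the key is unknown."""
    row = key_to_row.get(key)
    return None if row is None else rows[row]


def keys_per_vector(n_keys, n_vectors):
    """Average number of keys that share one stored vector."""
    return n_keys / n_vectors


# Both keys resolve to the very same stored row.
print(lookup("cat") is lookup("kitten"))  # True
# Figures from the Vectors row above.
print(round(keys_per_vector(514157, 20000), 2))  # 25.71
```

Sharing rows this way keeps the package small at the cost of some precision for the keys that are mapped onto another key's vector.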
Label Scheme
View label scheme (113 labels for 3 components)
Component | Labels |
---|---|
tagger | $ , '' , , , -LRB- , -RRB- , . , : , ADD , AFX , CC , CD , DT , EX , FW , HYPH , IN , JJ , JJR , JJS , LS , MD , NFP , NN , NNP , NNPS , NNS , PDT , POS , PRP , PRP$ , RB , RBR , RBS , RP , SYM , TO , UH , VB , VBD , VBG , VBN , VBP , VBZ , WDT , WP , WP$ , WRB , XX , _SP , ```` |
parser | ROOT , acl , acomp , advcl , advmod , agent , amod , appos , attr , aux , auxpass , case , cc , ccomp , compound , conj , csubj , csubjpass , dative , dep , det , dobj , expl , intj , mark , meta , neg , nmod , npadvmod , nsubj , nsubjpass , nummod , oprd , parataxis , pcomp , pobj , poss , preconj , predet , prep , prt , punct , quantmod , relcl , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 99.86 |
TOKEN_P | 99.57 |
TOKEN_R | 99.58 |
TOKEN_F | 99.57 |
TAG_ACC | 97.33 |
SENTS_P | 92.21 |
SENTS_R | 89.37 |
SENTS_F | 90.77 |
DEP_UAS | 92.05 |
DEP_LAS | 90.23 |
ENTS_P | 84.94 |
ENTS_R | 85.49 |
ENTS_F | 85.22 |
Installation
pip install spacy
python -m spacy download en_core_web_md
en_core_web_lg-3.7.1
Checksum .tar.gz: 4c8b2fd2572a5fb232c7b38345d301e7e092d1242b7184e14a86eff8ef6eb6d7
Checksum .whl: ab70aeb6172cde82508f7739f35ebc9918a3d07debeed637403c8f794ba3d3dc
Details: https://spacy.io/models/en#en_core_web_lg
English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.
Feature | Description |
---|---|
Name | en_core_web_lg |
Version | 3.7.1 |
spaCy | >=3.7.2,<3.8.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , lemmatizer , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , lemmatizer , ner |
Vectors | 514157 keys, 514157 unique vectors (300 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) ClearNLP Constituent-to-Dependency Conversion (Emory University) WordNet 3.0 (Princeton University) Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl) (Explosion) |
License | MIT |
Author | Explosion |
Model size | 560 MB |
Label Scheme
View label scheme (113 labels for 3 components)
Component | Labels |
---|---|
tagger | $ , '' , , , -LRB- , -RRB- , . , : , ADD , AFX , CC , CD , DT , EX , FW , HYPH , IN , JJ , JJR , JJS , LS , MD , NFP , NN , NNP , NNPS , NNS , PDT , POS , PRP , PRP$ , RB , RBR , RBS , RP , SYM , TO , UH , VB , VBD , VBG , VBN , VBP , VBZ , WDT , WP , WP$ , WRB , XX , _SP , ```` |
parser | ROOT , acl , acomp , advcl , advmod , agent , amod , appos , attr , aux , auxpass , case , cc , ccomp , compound , conj , csubj , csubjpass , dative , dep , det , dobj , expl , intj , mark , meta , neg , nmod , npadvmod , nsubj , nsubjpass , nummod , oprd , parataxis , pcomp , pobj , poss , preconj , predet , prep , prt , punct , quantmod , relcl , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 99.86 |
TOKEN_P | 99.57 |
TOKEN_R | 99.58 |
TOKEN_F | 99.57 |
TAG_ACC | 97.35 |
SENTS_P | 92.19 |
SENTS_R | 89.27 |
SENTS_F | 90.71 |
DEP_UAS | 92.08 |
DEP_LAS | 90.27 |
ENTS_P | 85.16 |
ENTS_R | 85.70 |
ENTS_F | 85.43 |
Installation
pip install spacy
python -m spacy download en_core_web_lg
zh_core_web_trf-3.7.2
Checksum .tar.gz: 38857a79f6754b9427619362843c84c18e6410e7ba1f05a1d7aa1c91f7b08904
Checksum .whl: 16b8d4bf23d20a04cfcbe676ae1be2be4437b40cf8101c9f3e7f6db4674ec91d
Details: https://spacy.io/models/zh#zh_core_web_trf
Chinese transformer pipeline (Transformer(name='bert-base-chinese', piece_encoder='bert-wordpiece', stride=152, type='bert', width=768, window=208, vocab_size=21128)). Components: transformer, tagger, parser, ner, attribute_ruler.
Feature | Description |
---|---|
Name | zh_core_web_trf |
Version | 3.7.2 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | transformer , tagger , parser , attribute_ruler , ner |
Components | transformer , tagger , parser , attribute_ruler , ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) CoreNLP Universal Dependencies Converter (Stanford NLP Group) bert-base-chinese (Hugging Face) |
License | MIT |
Author | Explosion |
Model size | 396 MB |
Label Scheme
View label scheme (99 labels for 3 components)
Component | Labels |
---|---|
tagger | AD , AS , BA , CC , CD , CS , DEC , DEG , DER , DEV , DT , ETC , FW , IJ , INF , JJ , LB , LC , M , MSP , NN , NR , NT , OD , ON , P , PN , PU , SB , SP , URL , VA , VC , VE , VV , X |
parser | ROOT , acl , advcl:loc , advmod , advmod:dvp , advmod:loc , advmod:rcomp , amod , amod:ordmod , appos , aux:asp , aux:ba , aux:modal , aux:prtmod , auxpass , case , cc , ccomp , compound:nn , compound:vc , conj , cop , dep , det , discourse , dobj , etc , mark , mark:clf , name , neg , nmod , nmod:assmod , nmod:poss , nmod:prep , nmod:range , nmod:tmod , nmod:topic , nsubj , nsubj:xsubj , nsubjpass , nummod , parataxis:prnmod , punct , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 95.85 |
TOKEN_P | 94.58 |
TOKEN_R | 91.36 |
TOKEN_F | 92.94 |
TAG_ACC | 91.75 |
SENTS_P | 70.92 |
SENTS_R | 67.57 |
SENTS_F | 69.21 |
DEP_UAS | 75.72 |
DEP_LAS | 71.45 |
ENTS_P | 76.09 |
ENTS_R | 72.18 |
ENTS_F | 74.08 |
Installation
pip install spacy
python -m spacy download zh_core_web_trf
zh_core_web_sm-3.7.0
Checksum .tar.gz: c22fe1cb9a0479a297d24d33641592436d1b68385c9bbd750ea20e84c4273ef5
Checksum .whl: f51075665749e07406d629d1055ce5a68635fae6ab3c34257ee798c62b4fc431
Details: https://spacy.io/models/zh#zh_core_web_sm
Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.
Feature | Description |
---|---|
Name | zh_core_web_sm |
Version | 3.7.0 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) CoreNLP Universal Dependencies Converter (Stanford NLP Group) |
License | MIT |
Author | Explosion |
Model size | 46 MB |
Label Scheme
View label scheme (100 labels for 3 components)
Component | Labels |
---|---|
tagger | AD , AS , BA , CC , CD , CS , DEC , DEG , DER , DEV , DT , ETC , FW , IJ , INF , JJ , LB , LC , M , MSP , NN , NR , NT , OD , ON , P , PN , PU , SB , SP , URL , VA , VC , VE , VV , X , _SP |
parser | ROOT , acl , advcl:loc , advmod , advmod:dvp , advmod:loc , advmod:rcomp , amod , amod:ordmod , appos , aux:asp , aux:ba , aux:modal , aux:prtmod , auxpass , case , cc , ccomp , compound:nn , compound:vc , conj , cop , dep , det , discourse , dobj , etc , mark , mark:clf , name , neg , nmod , nmod:assmod , nmod:poss , nmod:prep , nmod:range , nmod:tmod , nmod:topic , nsubj , nsubj:xsubj , nsubjpass , nummod , parataxis:prnmod , punct , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 95.85 |
TOKEN_P | 94.58 |
TOKEN_R | 91.36 |
TOKEN_F | 92.94 |
TAG_ACC | 89.33 |
SENTS_P | 77.85 |
SENTS_R | 72.62 |
SENTS_F | 75.14 |
DEP_UAS | 69.60 |
DEP_LAS | 64.08 |
ENTS_P | 72.03 |
ENTS_R | 64.93 |
ENTS_F | 68.30 |
Installation
pip install spacy
python -m spacy download zh_core_web_sm
zh_core_web_md-3.7.0
Checksum .tar.gz: 920cf2f7e8db666f22d52b763ff76cf9eeac2c7e6dbc00f5e99ed543ba7da50e
Checksum .whl: a528dbbcf7f323718be4b523559840dc850303046e25a62f9a1049b7ab9f9e68
Details: https://spacy.io/models/zh#zh_core_web_md
Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.
Feature | Description |
---|---|
Name | zh_core_web_md |
Version | 3.7.0 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , ner |
Vectors | 500000 keys, 20000 unique vectors (300 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) CoreNLP Universal Dependencies Converter (Stanford NLP Group) Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia) (Explosion) |
License | MIT |
Author | Explosion |
Model size | 74 MB |
Label Scheme
View label scheme (100 labels for 3 components)
Component | Labels |
---|---|
tagger | AD , AS , BA , CC , CD , CS , DEC , DEG , DER , DEV , DT , ETC , FW , IJ , INF , JJ , LB , LC , M , MSP , NN , NR , NT , OD , ON , P , PN , PU , SB , SP , URL , VA , VC , VE , VV , X , _SP |
parser | ROOT , acl , advcl:loc , advmod , advmod:dvp , advmod:loc , advmod:rcomp , amod , amod:ordmod , appos , aux:asp , aux:ba , aux:modal , aux:prtmod , auxpass , case , cc , ccomp , compound:nn , compound:vc , conj , cop , dep , det , discourse , dobj , etc , mark , mark:clf , name , neg , nmod , nmod:assmod , nmod:poss , nmod:prep , nmod:range , nmod:tmod , nmod:topic , nsubj , nsubj:xsubj , nsubjpass , nummod , parataxis:prnmod , punct , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 95.85 |
TOKEN_P | 94.58 |
TOKEN_R | 91.36 |
TOKEN_F | 92.94 |
TAG_ACC | 90.04 |
SENTS_P | 78.89 |
SENTS_R | 72.80 |
SENTS_F | 75.72 |
DEP_UAS | 70.50 |
DEP_LAS | 65.22 |
ENTS_P | 71.88 |
ENTS_R | 67.90 |
ENTS_F | 69.83 |
Installation
pip install spacy
python -m spacy download zh_core_web_md
zh_core_web_lg-3.7.0
Checksum .tar.gz: 0a07048baf3e73f22b16a7edac47f97632772c7a05ebf1bcc51ab458f0670dcf
Checksum .whl: 6bfd1796788dc27c0f5e0cc43374eb96abe0b4f0ec1b29f19f5782051216c556
Details: https://spacy.io/models/zh#zh_core_web_lg
Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.
Feature | Description |
---|---|
Name | zh_core_web_lg |
Version | 3.7.0 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , ner |
Vectors | 500000 keys, 500000 unique vectors (300 dimensions) |
Sources | OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston) CoreNLP Universal Dependencies Converter (Stanford NLP Group) Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia) (Explosion) |
License | MIT |
Author | Explosion |
Model size | 575 MB |
Label Scheme
View label scheme (100 labels for 3 components)
Component | Labels |
---|---|
tagger | AD , AS , BA , CC , CD , CS , DEC , DEG , DER , DEV , DT , ETC , FW , IJ , INF , JJ , LB , LC , M , MSP , NN , NR , NT , OD , ON , P , PN , PU , SB , SP , URL , VA , VC , VE , VV , X , _SP |
parser | ROOT , acl , advcl:loc , advmod , advmod:dvp , advmod:loc , advmod:rcomp , amod , amod:ordmod , appos , aux:asp , aux:ba , aux:modal , aux:prtmod , auxpass , case , cc , ccomp , compound:nn , compound:vc , conj , cop , dep , det , discourse , dobj , etc , mark , mark:clf , name , neg , nmod , nmod:assmod , nmod:poss , nmod:prep , nmod:range , nmod:tmod , nmod:topic , nsubj , nsubj:xsubj , nsubjpass , nummod , parataxis:prnmod , punct , xcomp |
ner | CARDINAL , DATE , EVENT , FAC , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 95.85 |
TOKEN_P | 94.58 |
TOKEN_R | 91.36 |
TOKEN_F | 92.94 |
TAG_ACC | 90.33 |
SENTS_P | 78.05 |
SENTS_R | 72.63 |
SENTS_F | 75.24 |
DEP_UAS | 70.86 |
DEP_LAS | 65.71 |
ENTS_P | 73.55 |
ENTS_R | 69.25 |
ENTS_F | 71.34 |
Installation
pip install spacy
python -m spacy download zh_core_web_lg
xx_sent_ud_sm-3.7.0
Checksum .tar.gz: fc769f274ad087e1ee3042d671a5487714a885d2a0fba5baea56cd5a6b23cc8d
Checksum .whl: aafb609d5a895a62ed9672fbef2aa8061106a4b164a700999a376f8529acc3ad
Details: https://spacy.io/models/xx#xx_sent_ud_sm
Multi-language pipeline optimized for CPU. Components: senter.
Label Scheme
Accuracy
Type | Score |
---|---|
TOKEN_ACC | 98.59 |
TOKEN_P | 95.31 |
TOKEN_R | 95.72 |
TOKEN_F | 95.52 |
SENTS_P | 90.66 |
SENTS_R | 81.58 |
SENTS_F | 85.88 |
Installation
pip install spacy
python -m spacy download xx_sent_ud_sm
xx_ent_wiki_sm-3.7.0
Checksum .tar.gz: 96e9c622429d34c08127aca1689fb5c5c557bbd3027c4a5a655874dd915206cc
Checksum .whl: 66c227a793f8a79814d6ca1da7c0ae633172e2fb0a94737bc8bd2e517479e73c
Details: https://spacy.io/models/xx#xx_ent_wiki_sm
Multi-language pipeline optimized for CPU. Components: ner.
Feature | Description |
---|---|
Name | xx_ent_wiki_sm |
Version | 3.7.0 |
spaCy | >=3.7.0,<3.8.0 |
Default Pipeline | ner |
Components | ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | WikiNER (Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, James R Curran) |
License | MIT |
Author | Explosion |
Model size | 10 MB |
Label Scheme
View label scheme (4 labels for 1 component)
Component | Labels |
---|---|
ner | LOC , MISC , ORG , PER |
Accuracy
Type | Score |
---|---|
ENTS_P | 83.53 |
ENTS_R | 82.65 |
ENTS_F | 83.08 |
Installation
pip install spacy
python -m spacy download xx_ent_wiki_sm