JSON generated by NER Annotator doesn't seem to work with Spacy converter #44
Hello there! A simple function to convert the JSON generated by NER Annotator directly to a DocBin would be this one:

```python
import json

import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

nlp = spacy.blank("en")


def load_data(file):
    """Load an NER Annotator export and return its "annotations" list."""
    with open(file, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data["annotations"]


train_data = load_data("./data/annotation_1.json")
valid_data = load_data("./data/annotation_3.json")


def create_training(TRAIN_DATA):
    """Convert (text, {"entities": [[start, end, label], ...]}) pairs into a DocBin."""
    db = DocBin()
    for text, annot in tqdm(TRAIN_DATA):
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annot["entities"]:
            # "contract" snaps the span to token boundaries within the character range
            span = doc.char_span(start, end, label=label,
                                 alignment_mode="contract")
            if span is None:
                print("Skipping entity")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db


train_data = create_training(train_data)
train_data.to_disk("./data/train2.spacy")
valid_data = create_training(valid_data)
valid_data.to_disk("./data/valid2.spacy")
```

PS Good job on the app, I love it.
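A quick way to sanity-check the saved files is to load the DocBin back and print the recovered entities; this is just a minimal sketch assuming the `./data/train2.spacy` path from the snippet above:

```python
import spacy
from spacy.tokens import DocBin

# Assumes the ./data/train2.spacy file written by the snippet above.
nlp = spacy.blank("en")
db = DocBin().from_disk("./data/train2.spacy")

for doc in db.get_docs(nlp.vocab):
    print(doc.text[:60])
    print([(ent.text, ent.label_) for ent in doc.ents])
```

The two `.spacy` files can then be passed to spaCy v3 training, e.g. `python -m spacy train config.cfg --paths.train ./data/train2.spacy --paths.dev ./data/valid2.spacy`.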
@MikhailKlemin Thank you for coming up with the solution.

I am facing issues saving it to disk as a .spacy file. What should I do?

Resolved!!

Hey @ankitladva11! Glad to know your problem was resolved. When you have the time, could you please leave a comment describing your issue and how you managed to resolve it? It would be useful to future users who might face the same issue. TIA!

@dreji18 also has a nicely documented approach to getting Spacy to work with the NER Annotator export.
See comment at #43 (reply in thread)