Base types and their sensors #482
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Saul NLP Base Types | ||
A built-in hierarchy of common linguistic units for text processing and NLP tasks, | |
together with a relation class that connects those units. | |
The NLP base types help you design data models in Saul and provide many built-in feature | |
extractors and helpers. The NLP sensors operate on these types. |
|
||
## Hierarchy | ||
Every class in the hierarchy is assumed to have a mandatory **id** that is unique across the entire corpus, | |
an optional **text**, and an optional character-based span given by a **start** index and an exclusive | |
**end** index. |
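The span convention can be illustrated with a minimal sketch. `SpanSketch` and its methods are hypothetical stand-ins, not classes from this pull request; the only assumption taken from the text is that **start** is inclusive and **end** is exclusive.

```java
// Hypothetical stand-in for a span-based unit; not a class from this pull request.
final class SpanSketch {
    final String id;
    final int start; // inclusive character offset
    final int end;   // exclusive character offset

    SpanSketch(String id, int start, int end) {
        this.id = id;
        this.start = start;
        this.end = end;
    }

    // With an exclusive end, the span length is simply end - start.
    int length() {
        return end - start;
    }

    // Recovers the covered text from the enclosing document text.
    String coveredText(String documentText) {
        return documentText.substring(start, end);
    }

    public static void main(String[] args) {
        String doc = "Saul models text.";
        SpanSketch token = new SpanSketch("doc1.s1.t1", 0, 4);
        System.out.println(token.coveredText(doc)); // prints "Saul"
        System.out.println(token.length());         // prints 4
    }
}
```

An exclusive end lines up with `String.substring(start, end)`, so no off-by-one adjustment is needed when slicing the document text.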
|
||
At the top level of the hierarchy is the [`Document`](Document.java) class. | |
Each document contains many [`Sentences`](Sentence.java), which in turn can | |
have many [`Phrases`](Phrase.java), and each phrase can contain many [`Tokens`](Token.java). | |
Note that you can omit one or more of these linguistic units for specific use cases. |
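As a rough sketch of how the levels relate, each unit can keep a reference to its parent so that a lower-level unit can reach its document. The classes `Doc`, `Sent`, and `Phr` below are simplified, hypothetical stand-ins (the real classes also carry spans, text, and properties), and the null handling for an omitted level is an assumption, not something this pull request specifies.

```java
// Hypothetical, simplified stand-ins for the hierarchy; not the classes from this pull request.
final class HierarchySketch {
    static final class Doc {
        final String id;
        Doc(String id) { this.id = id; }
    }

    static final class Sent {
        final String id;
        final Doc document;
        Sent(Doc document, String id) {
            this.document = document;
            this.id = id;
        }
    }

    static final class Phr {
        final String id;
        final Sent sentence; // null when the sentence level is omitted (assumption)
        Phr(Sent sentence, String id) {
            this.sentence = sentence;
            this.id = id;
        }

        // Walks up through the sentence to reach the document.
        Doc getDocument() {
            return sentence == null ? null : sentence.document;
        }
    }

    public static void main(String[] args) {
        Doc doc = new Doc("d1");
        Sent sentence = new Sent(doc, "d1.s1");
        Phr phrase = new Phr(sentence, "d1.s1.p1");
        System.out.println(phrase.getDocument().id); // prints "d1"
    }
}
```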
|
||
### Properties | ||
Additional properties can be attached to any class in the hierarchy using | |
`addPropertyValue`, which appends a value to the list of values for that property. | |
The full list of values can be retrieved with `getPropertyValues`, | |
and `getPropertyFirstValue` returns the first value in that list, | |
or `null` if the property has no values. |
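The multi-valued property store can be sketched in isolation as follows. `PropertySketch` is a hypothetical stand-in that mirrors the method names of `NlpBaseElement` (`addPropertyValue`, `getPropertyValues`, `getPropertyFirstValue`); it is a simplified illustration, not the code from this pull request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the property methods of NlpBaseElement.
final class PropertySketch {
    private final Map<String, List<String>> properties = new HashMap<>();

    // Appends a value to the list stored under the given property name.
    void addPropertyValue(String name, String value) {
        properties.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns all values for the property, or an empty list if there are none.
    List<String> getPropertyValues(String name) {
        return properties.getOrDefault(name, Collections.<String>emptyList());
    }

    // Returns the first value for the property, or null if there are none.
    String getPropertyFirstValue(String name) {
        List<String> values = getPropertyValues(name);
        return values.isEmpty() ? null : values.get(0);
    }

    public static void main(String[] args) {
        PropertySketch token = new PropertySketch();
        token.addPropertyValue("pos", "NN");
        token.addPropertyValue("pos", "NNP");
        System.out.println(token.getPropertyValues("pos"));       // prints "[NN, NNP]"
        System.out.println(token.getPropertyFirstValue("pos"));   // prints "NN"
        System.out.println(token.getPropertyFirstValue("lemma")); // prints "null"
    }
}
```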
|
||
## Relation | ||
Data modeling in Saul usually requires edges between the model's nodes. | |
A [`Relation`](Relation.java) is a container that holds the information | |
needed to construct those edges. | |
Data modeling in Saul usually requires combining nodes and forming hypernodes. |
||
|
||
Each relation should have a unique **Id** and two or more **argumentId** values, which give | |
the Ids of the linguistic units that take part in the relation. Additional properties can be added | |
using the `setProperty` function. |
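Since `Relation.java` is not part of the diff shown here, the following is only a guess at what such a container might look like. `RelationSketch`, its constructor, and its accessors are hypothetical; only the unique id, the "two or more argument ids" rule, and the `setProperty` name come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a relation container; not the Relation class from this pull request.
final class RelationSketch {
    private final String id;
    private final List<String> argumentIds;
    private final Map<String, String> properties = new HashMap<>();

    RelationSketch(String id, String... argumentIds) {
        // A relation connects at least two linguistic units.
        if (argumentIds.length < 2) {
            throw new IllegalArgumentException("a relation needs at least two argument ids");
        }
        this.id = id;
        this.argumentIds = new ArrayList<>(Arrays.asList(argumentIds));
    }

    String getId() { return id; }

    int getArgumentsCount() { return argumentIds.size(); }

    // Returns the id of the linguistic unit at the given argument position.
    String getArgumentId(int index) { return argumentIds.get(index); }

    void setProperty(String name, String value) { properties.put(name, value); }

    String getProperty(String name) { return properties.get(name); }

    public static void main(String[] args) {
        RelationSketch relation = new RelationSketch("r1", "d1.s1.p1", "d1.s1.p2");
        relation.setProperty("type", "support");
        System.out.println(relation.getArgumentId(1));    // prints "d1.s1.p2"
        System.out.println(relation.getProperty("type")); // prints "support"
    }
}
```

Storing ids rather than object references keeps the relation decoupled from the units themselves; the edge is resolved later by matching ids against the nodes in the data model.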
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-18. | ||
*/ | ||
public class Document extends NlpBaseElement { | ||
public Document() { | ||
} | ||
|
||
public Document(String id) { | ||
super(id, -1, -1, ""); | ||
} | ||
|
||
public Document(String id, Integer start, Integer end) { | ||
super(id, start, end, ""); | ||
} | ||
|
||
public Document(String id, Integer start, Integer end, String text) { | ||
super(id, start, end, text); | ||
} | ||
|
||
@Override | ||
public NlpBaseElementTypes getType() { | ||
return NlpBaseElementTypes.Document; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-28. | ||
*/ | ||
public class ExactMatching implements ISpanElementMatching { | ||
|
||
@Override | ||
public boolean matches(ISpanElement e1, ISpanElement e2) { | ||
return e1.matches(e2); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-28. | ||
*/ | ||
public interface ISpanElement { | ||
int getStart(); | ||
void setStart(int start); | ||
void setEnd(int end); | ||
int getEnd(); | ||
String getText(); | ||
boolean matches(ISpanElement e); | ||
boolean contains(ISpanElement e); | ||
boolean isPartOf(ISpanElement e); | ||
boolean overlaps(ISpanElement e); | ||
} | ||
why the name
Do you have a better suggestion? We talk about span-based elements.
I meant why it starts with
It is an interface.
then maybe SpanElementInterface? it is up to you but the name was confusing to me. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
import edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes.ISpanElement; | ||
|
||
/** | ||
* Created by Taher on 2016-12-28. | ||
*/ | ||
public interface ISpanElementMatching { | ||
boolean matches(ISpanElement e1, ISpanElement e2); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-28. | ||
*/ | ||
public class InclusionMatching implements ISpanElementMatching { | ||
|
||
@Override | ||
public boolean matches(ISpanElement e1, ISpanElement e2) { | ||
return e1.contains(e2); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
import java.util.ArrayList; | ||
import java.util.HashMap; | ||
import java.util.List; | ||
import java.util.Map; | ||
|
||
/** | ||
* Created by Taher on 2016-12-18. | ||
*/ | ||
public abstract class NlpBaseElement extends SpanBasedElement { | ||
private String id; | ||
private String text; | ||
private Map<String, List<String>> properties = new HashMap<>(); | ||
|
||
public NlpBaseElement() { | ||
setStart(-1); | ||
setEnd(-1); | ||
} | ||
|
||
public NlpBaseElement(String id, Integer start, Integer end, String text) { | ||
this.setId(id); | ||
this.setStart(start); | ||
this.setEnd(end); | ||
this.setText(text); | ||
} | ||
|
||
public abstract NlpBaseElementTypes getType(); | ||
|
||
public boolean containsProperty(String name) { | ||
return properties.containsKey(name) && !properties.get(name).isEmpty(); | ||
} | ||
|
||
public String getPropertyFirstValue(String name) { | ||
if (containsProperty(name)) | ||
return properties.get(name).get(0); | ||
return null; | ||
} | ||
|
||
public List<String> getPropertyValues(String name) { | ||
if (containsProperty(name)) | ||
return properties.get(name); | ||
return new ArrayList<>(); | ||
} | ||
|
||
public void addPropertyValue(String name, String value) { | ||
if (!containsProperty(name)) | ||
properties.put(name, new ArrayList<>()); | ||
properties.get(name).add(value); | ||
} | ||
|
||
public void removeProperty(String name) { | ||
if (containsProperty(name)) | ||
properties.remove(name); | ||
} | ||
|
||
public String getId() { | ||
return id; | ||
} | ||
|
||
public void setId(String id) { | ||
this.id = id; | ||
} | ||
|
||
public static NlpBaseElement create(NlpBaseElementTypes type) { | ||
|
||
switch (type) { | ||
case Document: | ||
return new Document(); | ||
case Sentence: | ||
return new Sentence(); | ||
case Phrase: | ||
return new Phrase(); | ||
case Token: | ||
return new Token(); | ||
} | ||
return null; | ||
} | ||
|
||
@Override | ||
public String toString() { | ||
return getText(); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-24. | ||
*/ | ||
public enum NlpBaseElementTypes { | ||
Document, Sentence, Phrase, Token | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-28. | ||
*/ | ||
public class OverlapMatching implements ISpanElementMatching { | |
@Override | ||
public boolean matches(ISpanElement e1, ISpanElement e2) { | ||
return e1.overlaps(e2); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-28. | ||
*/ | ||
public class PartOfMatching implements ISpanElementMatching { | ||
|
||
@Override | ||
public boolean matches(ISpanElement e1, ISpanElement e2) { | ||
return e1.isPartOf(e2); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
/** This software is released under the University of Illinois/Research and Academic Use License. See | ||
* the LICENSE file in the root folder for details. Copyright (c) 2016 | ||
* | ||
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign | ||
* http://cogcomp.cs.illinois.edu/ | ||
*/ | ||
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes; | ||
|
||
/** | ||
* Created by Taher on 2016-12-24. | ||
*/ | ||
public class Phrase extends NlpBaseElement { | ||
|
||
private Sentence sentence; | ||
|
||
public Phrase(){ | ||
|
||
} | ||
|
||
public Phrase(Sentence sentence, String id, Integer start, Integer end, String text) { | ||
super(id, start, end, text); | ||
this.sentence = sentence; | ||
} | ||
|
||
@Override | ||
public NlpBaseElementTypes getType() { | ||
return NlpBaseElementTypes.Phrase; | ||
} | ||
|
||
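public Document getDocument() { | |
// A phrase built with the default constructor has no sentence yet; avoid a NullPointerException. | |
return sentence == null ? null : sentence.getDocument(); | |
} | |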
|
||
public Sentence getSentence() { | ||
return sentence; | ||
} | ||
|
||
public void setSentence(Sentence sentence) { | ||
this.sentence = sentence; | ||
} | ||
} |
built-in features