Base types and their sensors #482

Open
wants to merge 5 commits into
base: master
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# Saul NLP Base Types
A built-in hierarchy of common linguistic units for text processing and NLP tasks,
and a relation class that connects these linguistic units.
NLP base types help in designing data models in Saul and provide many built-in feature
extractors and helpers on which the NLP sensors operate.

Member

built-in features

## Hierarchy
All classes in the hierarchy are assumed to have a mandatory **id** that is unique across the entire corpus,
an optional **text**, and an optional character-based span given by the **start** index and the exclusive
**end** index of each linguistic unit.
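
As a small illustration of the span convention (inclusive **start**, exclusive **end**), the sketch below relies only on `String.substring`, which uses the same half-open convention. The class name `SpanDemo`, the helper `covered`, and the sample text are made up for this example and are not part of the pull request.

```java
class SpanDemo {
    // Returns the text covered by the half-open character span [start, end).
    static String covered(String documentText, int start, int end) {
        return documentText.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Saul models text.";
        // "Saul" occupies characters 0..3, so its span is start=0, end=4.
        System.out.println(covered(text, 0, 4));   // Saul
        // "models" occupies characters 5..10, so its span is start=5, end=11.
        System.out.println(covered(text, 5, 11));  // models
    }
}
```

With an exclusive end index, the length of a unit is simply `end - start`, and adjacent units can share a boundary without overlapping.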

At the top level of the hierarchy is the [`Document`](Document.java) class.
Each document contains many [`Sentences`](Sentence.java), which in turn can
have many [`Phrases`](Phrase.java), and finally each phrase can contain many [`Tokens`](Token.java).
Note that you can omit one or more of these linguistic units for specific usages.

### Properties
We can specify additional properties for all hierarchy classes using
`addPropertyValue`. This function appends a value to the list of values for that property.
The value list can be retrieved with the `getPropertyValues` function,
and `getPropertyFirstValue` returns the first
value of the list for that property (or `null` if the property has no values).
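
The following is a minimal, self-contained sketch of this multi-valued property behaviour. It mirrors the `addPropertyValue`, `getPropertyValues`, and `getPropertyFirstValue` methods of `NlpBaseElement` in this pull request, but `PropertyBag` itself is a hypothetical stand-in written for illustration, and the property names `pos` and `lemma` are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the property storage of the base types.
class PropertyBag {
    private final Map<String, List<String>> properties = new HashMap<>();

    // Appends a value to the list kept for this property name.
    void addPropertyValue(String name, String value) {
        properties.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns all values for the property, or an empty list if there are none.
    List<String> getPropertyValues(String name) {
        return properties.getOrDefault(name, new ArrayList<>());
    }

    // Returns the first value for the property, or null if there are none.
    String getPropertyFirstValue(String name) {
        List<String> values = getPropertyValues(name);
        return values.isEmpty() ? null : values.get(0);
    }

    public static void main(String[] args) {
        PropertyBag token = new PropertyBag();
        token.addPropertyValue("pos", "NN");
        token.addPropertyValue("pos", "NNP");
        System.out.println(token.getPropertyValues("pos"));        // [NN, NNP]
        System.out.println(token.getPropertyFirstValue("pos"));    // NN
        System.out.println(token.getPropertyFirstValue("lemma"));  // null
    }
}
```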

## Relation
Data modeling in Saul usually requires edges between the model's nodes.
A [Relation](Relation.java) is a container that holds the information
needed to construct those edges.
Member

Data modeling in Saul usually requires combining nodes and forming hypernodes.
Relations help to have a container that holds the information
needed to construct a flexible base type for this purpose. The programmer can explicitly establish edges between the new hypernode and the nodes that form its parts, if needed.
(Parisa: maybe something like this is a clearer explanation?)


Each relation should have a unique **Id** and two or more **argumentId** values, which give
the Ids of the linguistic units that take part in the relation. Additional properties can be added
using the `setProperty` function.
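
A rough sketch of such a container is shown below. `Relation.java` itself is not shown here, so apart from `setProperty` every name in this example (`RelationSketch`, `getArgumentId`, `getArgumentsCount`, `getProperty`) is an assumption made for illustration, as is the choice to store a single string value per property.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container: a relation id plus the ids of its arguments.
class RelationSketch {
    private final String id;
    private final List<String> argumentIds;
    private final Map<String, String> properties = new HashMap<>();

    RelationSketch(String id, List<String> argumentIds) {
        // A relation connects two or more linguistic units.
        if (argumentIds.size() < 2)
            throw new IllegalArgumentException("a relation needs at least two arguments");
        this.id = id;
        this.argumentIds = new ArrayList<>(argumentIds);
    }

    String getId() {
        return id;
    }

    int getArgumentsCount() {
        return argumentIds.size();
    }

    // Returns the id of the linguistic unit at the given argument position.
    String getArgumentId(int index) {
        return argumentIds.get(index);
    }

    void setProperty(String name, String value) {
        properties.put(name, value);
    }

    // Returns the stored value, or null if the property was never set.
    String getProperty(String name) {
        return properties.get(name);
    }
}
```

Because the relation stores only ids, the linguistic units themselves can be looked up separately when the edges are built.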
@@ -0,0 +1,32 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-18.
*/
public class Document extends NlpBaseElement {
public Document() {
}

public Document(String id) {
super(id, -1, -1, "");
}

public Document(String id, Integer start, Integer end) {
super(id, start, end, "");
}

public Document(String id, Integer start, Integer end, String text) {
super(id, start, end, text);
}

@Override
public NlpBaseElementTypes getType() {
return NlpBaseElementTypes.Document;
}
}
@@ -0,0 +1,18 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-28.
*/
public class ExactMatching implements ISpanElementMatching {

@Override
public boolean matches(ISpanElement e1, ISpanElement e2) {
return e1.matches(e2);
}
}
@@ -0,0 +1,22 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-28.
*/
public interface ISpanElement {
int getStart();
void setStart(int start);
void setEnd(int end);
int getEnd();
String getText();
boolean matches(ISpanElement e);
boolean contains(ISpanElement e);
boolean isPartOf(ISpanElement e);
boolean overlaps(ISpanElement e);
}

Member

why the name ISpan..

Collaborator Author

Do you have a better suggestion? We talk about span-based elements.

Member

I meant why it starts with I?

Collaborator Author

@Rahgooy Rahgooy Jul 18, 2017

It is an interface.

Member

then maybe SpanElementInterface? it is up to you but the name was confusing to me.

@@ -0,0 +1,16 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-28.
*/
public interface ISpanElementMatching {
boolean matches(ISpanElement e1, ISpanElement e2);
}
@@ -0,0 +1,18 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-28.
*/
public class InclusionMatching implements ISpanElementMatching {

@Override
public boolean matches(ISpanElement e1, ISpanElement e2) {
return e1.contains(e2);
}
}
@@ -0,0 +1,90 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
* Created by Taher on 2016-12-18.
*/
public abstract class NlpBaseElement extends SpanBasedElement {
private String id;
private String text;
private Map<String, List<String>> properties = new HashMap<>();

public NlpBaseElement() {
setStart(-1);
setEnd(-1);
}

public NlpBaseElement(String id, Integer start, Integer end, String text) {
this.setId(id);
this.setStart(start);
this.setEnd(end);
this.setText(text);
}

public abstract NlpBaseElementTypes getType();

public boolean containsProperty(String name) {
return properties.containsKey(name) && !properties.get(name).isEmpty();
}

public String getPropertyFirstValue(String name) {
if (containsProperty(name))
return properties.get(name).get(0);
return null;
}

public List<String> getPropertyValues(String name) {
if (containsProperty(name))
return properties.get(name);
return new ArrayList<>();
}

public void addPropertyValue(String name, String value) {
if (!containsProperty(name))
properties.put(name, new ArrayList<>());
properties.get(name).add(value);
}

public void removeProperty(String name) {
// Map.remove is a no-op for absent keys; this also drops keys left with an empty list.
properties.remove(name);
}

public String getId() {
return id;
}

public void setId(String id) {
this.id = id;
}

public static NlpBaseElement create(NlpBaseElementTypes type) {

switch (type) {
case Document:
return new Document();
case Sentence:
return new Sentence();
case Phrase:
return new Phrase();
case Token:
return new Token();
}
return null;
}

@Override
public String toString() {
return getText();
}
}
@@ -0,0 +1,14 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-24.
*/
public enum NlpBaseElementTypes {
Document, Sentence, Phrase, Token
}
@@ -0,0 +1,17 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-28.
*/
public class OverlapMatching implements ISpanElementMatching {
@Override
public boolean matches(ISpanElement e1, ISpanElement e2) {
return e1.overlaps(e2);
}
}
@@ -0,0 +1,18 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-28.
*/
public class PartOfMatching implements ISpanElementMatching {

@Override
public boolean matches(ISpanElement e1, ISpanElement e2) {
return e1.isPartOf(e2);
}
}
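
The four matching classes (`ExactMatching`, `InclusionMatching`, `OverlapMatching`, `PartOfMatching`) each delegate to one method of `ISpanElement`, which makes the comparison rule a pluggable strategy. The sketch below shows that idea in a self-contained form. `SpanBasedElement` is not shown here, so the span semantics in `SimpleSpan` (half-open spans, exact match meaning equal start and end, and so on) are assumptions, and the names `SimpleSpan`, `SpanMatching`, and `MatchingDemo` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical half-open span [start, end) with assumed comparison semantics.
class SimpleSpan {
    final int start;
    final int end;

    SimpleSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    boolean matches(SimpleSpan o) { return start == o.start && end == o.end; }
    boolean contains(SimpleSpan o) { return start <= o.start && o.end <= end; }
    boolean isPartOf(SimpleSpan o) { return o.contains(this); }
    boolean overlaps(SimpleSpan o) { return start < o.end && o.start < end; }
}

// Strategy interface with the same shape as ISpanElementMatching.
interface SpanMatching {
    boolean matches(SimpleSpan a, SimpleSpan b);
}

class MatchingDemo {
    // Counts the candidates that match the target under the given strategy.
    static int countMatches(List<SimpleSpan> candidates, SimpleSpan target, SpanMatching strategy) {
        int count = 0;
        for (SimpleSpan candidate : candidates)
            if (strategy.matches(candidate, target))
                count++;
        return count;
    }

    public static void main(String[] args) {
        SimpleSpan phrase = new SimpleSpan(0, 10);
        List<SimpleSpan> tokens = Arrays.asList(
                new SimpleSpan(0, 4), new SimpleSpan(5, 10), new SimpleSpan(8, 12));
        System.out.println(countMatches(tokens, phrase, SimpleSpan::matches));   // 0
        System.out.println(countMatches(tokens, phrase, SimpleSpan::isPartOf));  // 2
        System.out.println(countMatches(tokens, phrase, SimpleSpan::overlaps));  // 3
    }
}
```

Passing the strategy as a parameter lets the same lookup code serve exact, inclusion, part-of, and overlap queries without branching on a mode flag.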
@@ -0,0 +1,41 @@
/** This software is released under the University of Illinois/Research and Academic Use License. See
* the LICENSE file in the root folder for details. Copyright (c) 2016
*
* Developed by: The Cognitive Computations Group, University of Illinois at Urbana-Champaign
* http://cogcomp.cs.illinois.edu/
*/
package edu.illinois.cs.cogcomp.saulexamples.nlp.BaseTypes;

/**
* Created by Taher on 2016-12-24.
*/
public class Phrase extends NlpBaseElement {

private Sentence sentence;

public Phrase() {
}

public Phrase(Sentence sentence, String id, Integer start, Integer end, String text) {
super(id, start, end, text);
this.sentence = sentence;
}

@Override
public NlpBaseElementTypes getType() {
return NlpBaseElementTypes.Phrase;
}

public Document getDocument() {
// The no-arg constructor leaves sentence unset, so guard against null.
return sentence == null ? null : sentence.getDocument();
}

public Sentence getSentence() {
return sentence;
}

public void setSentence(Sentence sentence) {
this.sentence = sentence;
}
}