BirdsofaFeather2011
The Penn Treebank was a game-changing resource because of the kinds of feature extraction, including parser-based extraction, that it enabled. In opening up new areas of NLP research, it has also revealed new horizons: applications that are now imaginable but still beyond the reach of current technology. We propose to extend that reach by creating an annotated data source that is much larger, more diverse in genre, and more deeply and consistently annotated than any currently available. In particular, we plan to annotate (mostly automatically) the entire Open American National Corpus (16 million words) with syntactic and semantic representations produced by the English Resource Grammar (Flickinger 2000). This will also include the 90,000-word Manually Annotated Sub-Corpus (MASC), where all of our annotations will be manually verified, so that they will be interoperable, through GrAF, with the other annotations being produced over that sub-corpus.
We plan to apply for NSF Computing Research Infrastructure (CRI) funding for this effort. As part of preparing that proposal, we are organizing a "Birds of a Feather" meeting in Portland, OR, on June 19, 2011. The goal of this meeting is to gather input from the community on how the proposed resource could be used to further NLP research. This input will also help us shape the resource so that it is maximally useful.
The meeting will be informal and will consist primarily of discussion. We are especially interested in hearing from participants about the following:
- When have syntactic and semantic features not panned out in experiments you’ve done? What were the applications you were targeting? What genres of text? What kind of features?
- What applications or tasks have you considered but held back from pursuing because linguistic feature extraction was not sufficiently precise or deep?
- What other kinds of applications or tasks would you like to work on, if only there were richer data sources over which to train feature extractors?
[Tentative schedule]
| Time | Session |
|---|---|
| 10:30-10:35 | Go grab some coffee |
| 10:35-10:45 | Welcome & Intro (Emily and Dan) |
| 10:45-12:30 | Participant presentations (5-10 min each, addressing the questions above) |
| | Tracy King |
| | James Curran |
| | Eva Hajičová |
| | Chris Callison-Burch |
| | Sameer Pradhan |
| | Julia Hockenmeier |
| | Francis Bond |
| | Fei Xia |
| | Mark Steedman |
| 12:30-1:30 | Lunch (on your own) |
| 1:30-2:00 | Presentation of the proposed annotation project |
| 2:00-3:30 | Small-group brainstorming |
| 3:30-4:00 | Coffee break |
| 4:00-5:00 | Small-group brainstorming |
| 5:00-5:30 | Report-back and wrap-up |
We are providing sample annotations for the following eight sentences as a starting point for discussion.

- Simple tough-adjective sentences:
- She is easy to get along with. (The gap is inside a PP complement.)
- Tough-adjective sentences, non-gapped variant.
- Longer sentence with coordination and a tough-adjective:
- Examples highlighting the scope of negation with respect to auxiliaries:
Please visit the BirdsofaFeather2011Summary page.