
BerlinFrontiers

ValiaKordoni edited this page Aug 23, 2007 · 39 revisions

Discussion -- Pushing the Frontiers: Expanding our research to new languages and applications and setting new research goals - Strategies for Dissemination (Moderator: HansUszoreit)

Shared tasks and resources:

- common benchmark for base coverage: parallel corpora treebanks for the participating languages

- shared tasks for HPSG processing: a. abstract processing exercises, b. processing with respect to concrete applications

- shared tasks for applications

- cross-framework evaluations

Applications:

- information management (relation extraction, incl. event and opinion detection)

- machine translation (in combination with other methods)

- grammar checking (in combination with other checking methods)

- dialogue systems (e.g., for web agents and computer games)

- Others?

Steps towards applications:

HU: generation, exploitation of application semantics for getting to the meaning of applications;

AC: we should work on resource semantics for different applications

Steps towards shared tasks and resources:

Shared corpora:

  • Candidate sources: Europarl; parallel corpora based on tourist brochures, guides, etc., which are already translated into many languages but will also have to be translated into many more
    • AC: we should start by setting up the necessary machinery, even with smaller treebanks, even of different kinds of texts
    • SO: a single coherent corpus
    • DF: we collect the corpus by picking parts/sentences from different kinds of texts for the various participating languages/grammars

HU's proposed strategy, to be adopted for the near future:

  • a. collect the languages we want to participate
  • b. get the people/sites who will be responsible for annotating/parsing the corpus
  • c. choose a corpus that is not too highly marked stylistically and has been translated into many other languages: city/region descriptions, cathedral essay (on Francis' suggestion, translated into all the languages we are working on, approx. 800 sentences), novels, Linux/technical documentation, everything
  • d. draw up the language matrix and see whether there would still be gaps

Tasks:

  • languages: en (Stanford/Oslo), no (Trondheim), pt (Lisbon), es/ca (Barcelona), ja (Kyoto), de (Saarbrücken), el (Saarbrücken/Athens), sw (Linköping), fr (Toulouse), zh (Saarbrücken), ko (Seoul?)
  • Saarbrücken builds the Wiki page by the 1st of September
  • the groups mentioned above submit the texts and translations to the Wiki page by mid-October, accompanying them, in a prose field, with a short description of how the texts in the various translations correlate with each other
  • Oslo creates a guidelines Wiki subpage
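Step d of the strategy (the language matrix and the check for gaps) could be kept as a small script next to the Wiki page. The following is only a sketch under assumptions: the function names (`coverage_matrix`, `find_gaps`, `render`) and the text labels are hypothetical, and the availability data in `SAMPLE` is placeholder data for illustration, not a record of which translations actually exist. The language codes are copied as written in the notes.

```python
from typing import Dict, List, Set

# Language codes exactly as they are written in the notes above.
LANGUAGES: List[str] = [
    "en", "no", "pt", "es", "ca", "ja", "de", "el", "sw", "fr", "zh", "ko",
]


def coverage_matrix(
    available: Dict[str, Set[str]], languages: List[str]
) -> Dict[str, Dict[str, bool]]:
    """For each text, record whether a version exists in each language."""
    return {
        text: {lang: lang in langs for lang in languages}
        for text, langs in available.items()
    }


def find_gaps(matrix: Dict[str, Dict[str, bool]]) -> Dict[str, List[str]]:
    """For each text, list the languages that still lack a translation."""
    return {
        text: [lang for lang, present in row.items() if not present]
        for text, row in matrix.items()
    }


def render(matrix: Dict[str, Dict[str, bool]], languages: List[str]) -> str:
    """Plain-text table: 'x' marks an available translation, '.' a gap."""
    width = max(len(text) for text in matrix)
    header = " " * width + "  " + " ".join(f"{lang:>2}" for lang in languages)
    rows = [
        f"{text:<{width}}  "
        + " ".join(" x" if row[lang] else " ." for lang in languages)
        for text, row in matrix.items()
    ]
    return "\n".join([header, *rows])


# Placeholder availability data (hypothetical, for illustration only).
SAMPLE: Dict[str, Set[str]] = {
    "cathedral-essay": set(LANGUAGES),
    "city-descriptions": {"en", "de", "fr", "ja"},
}

if __name__ == "__main__":
    matrix = coverage_matrix(SAMPLE, LANGUAGES)
    print(render(matrix, LANGUAGES))
    for text, missing in find_gaps(matrix).items():
        print(f"{text}: missing {', '.join(missing) or 'nothing'}")
```

Each site would replace the placeholder entries with the texts it has actually submitted; the rows that still show gaps indicate where further translations would be needed.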
