wsj corpus: unexpected predicate names #46

Open
arademaker opened this issue Dec 26, 2023 · 3 comments

arademaker commented Dec 26, 2023

wsj201

Item

1000000400370@unknown@formal@none@1@S@⌊•⌊#1965, ⌊>H. A. Simon>⌋: "[M]achines will be capable, within twenty years, of doing any work a man can do"#⌋@@@@1@19@@oe@26-8-2013

% ace -g ../erg.dat -E
⌊•⌊#1965, ⌊>H. A. Simon>⌋: "[M]achines will be capable, within twenty years, of doing any work a man can do"#⌋
1965 , H. A. Simon : “ [ M]achines will be capable , within twenty years , of doing any work a man can do ”

The token M]achines generates the predicate _m]achines/NNS_u_unknown. Is this the intended behavior?

@arademaker arademaker changed the title unexpected predicate name wsj corpus: unexpected predicate names Dec 26, 2023

arademaker commented Dec 26, 2023

% ace -g ../erg.dat -E 
⌊∗The clock∗⌋: Bolter credits the invention of the weight-driven ⌊>clock>⌋ as “The key invention [of Europe in the Middle Ages]", in particular the ⌊>verge escapement>⌋< (Bolter 1984:24) that provides us with the tick and tock of a mechanical clock.
The clock : Bolter credits the invention of the weight - driven clock as “ The key invention [ of Europe in the Middle Ages ] ” , in particular the verge escapement< ( Bolter 1984:24 ) that provides us with the tick and tock of a mechanical clock .

See the < in the word escapement<. Maybe this is a bug introduced when the markup was added?

Hi @danflick, see delph-in/pydelphin#371 (comment): a complicated regex is needed to allow < in predicate names. Can we avoid that? I would prefer to treat the ERG's predicate-naming convention as separate from the grammar of the MRS text representation.
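To illustrate the concern above, here is a minimal Python sketch of why a < inside a predicate name complicates parsing of the pred<from:to> form. It is not the pattern used by PyDelphin. The names NAIVE, PERMISSIVE and split_pred, and the token _escapement</NN_u_unknown<5:17>, are hypothetical and were made up for this example.

```python
import re

# Naive pattern: the predicate is everything up to the first "<",
# and the rest must be a character span of the form <from:to>.
NAIVE = re.compile(r"(?P<pred>[^\s<]+)<(?P<cfrom>\d+):(?P<cto>\d+)>")

# More permissive pattern: the predicate may itself contain "<";
# only the trailing <from:to> is read as the span.
PERMISSIVE = re.compile(r"(?P<pred>\S+?)<(?P<cfrom>\d+):(?P<cto>\d+)>")


def split_pred(token, pattern):
    """Split 'pred<from:to>' into (pred, from, to), or return None."""
    m = pattern.fullmatch(token)
    if m is None:
        return None
    return m.group("pred"), int(m.group("cfrom")), int(m.group("cto"))


# A predicate taken from the output in this thread.
ordinary = "_word_n_of<4:11>"
# Hypothetical token: what an unknown-word predicate with a stray "<" might look like.
stray = "_escapement</NN_u_unknown<5:17>"

print(split_pred(ordinary, NAIVE))
print(split_pred(stray, NAIVE))
print(split_pred(stray, PERMISSIVE))
```

The naive pattern rejects the token with the stray <, while the permissive one accepts it only because it relies on the span being the last thing in the token. That is the kind of extra complexity that would be avoided if such characters never reached the predicate name.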

I could not confirm the original content. Neither https://catalog.ldc.upenn.edu/LDC2013T19 nor https://catalog.ldc.upenn.edu/LDC99T42 contains the 201 set.


fcbond commented Dec 26, 2023 via email


arademaker commented Dec 27, 2023

% ace -g ../erg.dat -E
The word(s) 
The word(s)

% ace -g ../erg.dat -Tf
The word(s)
SENT: The word(s)
[ LTOP: h0
INDEX: e2 [ e SF: prop ]
RELS: < [ unknown<0:11> LBL: h1 ARG0: e2 ARG: x4 [ x PERS: 3 NUM: pl IND: + ] ]
 [ _the_q<0:3> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
 [ _word_n_of<4:11> LBL: h8 ARG0: x4 ARG1: i9 ] >
HCONS: < h0 qeq h1 h6 qeq h8 >
ICONS: < > ]
NOTE: 1 readings, added 428 / 50 edges to chart (17 fully instantiated, 22 actives used, 11 passives used)	RAM: 1337k

Nothing in the output marks the plural as optional: word(s) is always analyzed as one single token and always as plural (NUM: pl).
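The single-token reading can be checked against the character spans in the MRS above. The following Python sketch assumes the <from:to> values are character offsets into the input string. The names SPAN and covered are hypothetical, and the RELS text is copied from the output in this comment.

```python
import re

sentence = "The word(s)"

# RELS lines copied from the ace output above.
rels = """RELS: < [ unknown<0:11> LBL: h1 ARG0: e2 ARG: x4 [ x PERS: 3 NUM: pl IND: + ] ]
 [ _the_q<0:3> LBL: h5 ARG0: x4 RSTR: h6 BODY: h7 ]
 [ _word_n_of<4:11> LBL: h8 ARG0: x4 ARG1: i9 ] >"""

# An elementary predication opens with "[ ", then the predicate and its span.
SPAN = re.compile(r"\[ (?P<pred>\S+?)<(?P<cfrom>\d+):(?P<cto>\d+)> ")

# Map each predicate to the substring of the input that it covers.
covered = {
    m.group("pred"): sentence[int(m.group("cfrom")):int(m.group("cto"))]
    for m in SPAN.finditer(rels)
}

for pred, text in covered.items():
    print(f"{pred!r} covers {text!r}")
```

Under that assumption, _word_n_of covers the whole string word(s), parentheses included, which agrees with the observation that the optional plural is not split off or marked separately.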
