Allow SimpleMRS to read from noisy input #92

Closed
goodmami opened this issue Jan 13, 2017 · 2 comments

goodmami commented Jan 13, 2017

The -q option is currently used in pipelines involving ACE and pyDelphin to suppress the input sentence from the output stream, which (along with -T, which suppresses derivations) allows pyDelphin to read the stream containing only MRS data. This option may not be supported in the future, so perhaps the SimpleMRS reader should allow noisy data (see how the Penman package handles noisy data).

Non-MRS data can probably be discarded. Alternatively, if we expect SENT: ... to contain the input string, we could use that to fill the XMRS object's surface field.
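As a rough sketch of the idea (not pyDelphin's actual reader), something like the following could skim an ACE output stream, remember the most recent SENT: line, and pass along only the MRS lines. The function name `read_noisy_ace` is hypothetical, and the sketch assumes each MRS sits on a single line beginning with `[` (e.g. with -T so no derivation trails it); anything else is treated as noise and dropped.

```python
def read_noisy_ace(lines):
    """Yield (surface, mrs_text) pairs from an iterable of ACE output lines.

    Hypothetical helper; assumes one MRS per line, each starting with "[".
    """
    prefix = "SENT: "
    surface = None
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            # Remember the input string for the MRSs that follow it.
            surface = line[len(prefix):]
        elif line.startswith("["):
            # Looks like a SimpleMRS line; pair it with the last surface seen.
            yield surface, line
        # Everything else (NOTE: lines, blank lines, etc.) is discarded.
```

Each yielded MRS string could then be handed to the existing SimpleMRS reader, with the paired surface string used to fill the surface field.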

(edited to fix Penman URL)

fcbond commented Jan 13, 2017 via email

goodmami modified the milestone: v0.7.0 Jan 20, 2017
goodmami removed this from the v0.7.0 milestone May 7, 2018
goodmami added this to the v0.8.0 milestone Aug 6, 2018

goodmami commented

Since this issue is really about converting from ACE output and ignoring the non-MRS data, I added ace as a --from codec for the convert subcommand. Redoing the SimpleMRS codec would have been much more work, and I'd want to do it properly instead of hacking in some iterative decoding. Besides, I didn't want to lose the parsing error messages on bad input, which can be useful.

$ echo -e "One dog slept.\nTwo dogs slept." | ace -g ~/grammars/erg-1214-x86-64-0.9.27.dat | ./delphin.sh convert --from ace --pretty-print 
NOTE: 1 readings, added 595 / 96 edges to chart (43 fully instantiated, 52 actives used, 27 passives used)	RAM: 1524k
NOTE: 2 readings, added 615 / 118 edges to chart (49 fully instantiated, 68 actives used, 32 passives used)	RAM: 1649k
NOTE: parsed 2 / 2 sentences, avg 1586k, time 0.01775s
[ "One dog slept."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ]
  RELS: < [ udef_q<0:3> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
          [ card<0:3> LBL: h7 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x3 CARG: "1" ]
          [ _dog_n_1<4:7> LBL: h7 ARG0: x3 ]
          [ _sleep_v_1<8:14> LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 > ]
[ "Two dogs slept."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ]
  RELS: < [ udef_q<0:3> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: pl IND: + ] RSTR: h5 BODY: h6 ]
          [ card<0:3> LBL: h7 ARG0: e9 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x3 CARG: "2" ]
          [ _dog_n_1<4:8> LBL: h7 ARG0: x3 ]
          [ _sleep_v_1<9:15> LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 > ]
[ "Two dogs slept."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: past MOOD: indicative PROG: - PERF: - ]
  RELS: < [ focus_d<0:15> LBL: h1 ARG0: e4 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e2 ARG2: e5 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ]
          [ loc_nonsp<0:3> LBL: h1 ARG0: e5 ARG1: e2 ARG2: x6 [ x PERS: 3 NUM: sg ] ]
          [ number_q<0:3> LBL: h7 ARG0: x6 RSTR: h8 BODY: h9 ]
          [ card<0:3> LBL: h10 ARG0: x6 ARG1: i12 CARG: "2" ]
          [ udef_q<4:8> LBL: h13 ARG0: x3 [ x PERS: 3 NUM: pl IND: + ] RSTR: h14 BODY: h15 ]
          [ _dog_n_1<4:8> LBL: h16 ARG0: x3 ]
          [ _sleep_v_1<9:15> LBL: h1 ARG0: e2 ARG1: x3 ] >
  HCONS: < h0 qeq h1 h8 qeq h10 h14 qeq h16 > ]

This change reads both regular ACE parsing output and output produced with the --tsdb-stdout option. For the former, it detects SENT: lines and populates the MRSs' surface attributes accordingly. It's not easy (or perhaps even possible) to get the original string from --tsdb-stdout results, which only provide the REPP-tokenized form, so I don't attempt to set surface there.

This does not detect the MRS data for generation (when using --show-realization-mrses).

Also note that this does not use the delphin.interfaces.ace module, so there may be different bugs than when using the ACE interface.
