simplemrs.encode() doesn't escape quotes properly #367

Closed
EricZinda opened this issue May 26, 2023 · 5 comments · Fixed by #369

Comments

@EricZinda

simplemrs.encode(mrs) of the MRS for:

"Blue" is in this folder

creates a serialized MRS that simplemrs.loads() can't load. It fails because of the unescaped " characters in the surface string:

File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/codecs/simplemrs.py", line 61, in loads
    ms = list(_decode(s.splitlines()))
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/codecs/simplemrs.py", line 174, in _decode
    yield _decode_mrs(lexer)
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/codecs/simplemrs.py", line 213, in _decode_mrs
    lexer.expect_type(RBRACK)
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/util.py", line 539, in expect_type
    return self.expect(*((arg, None) for arg in args), skip=skip)
  File "/Users/ericzinda/Enlistments/Perplexity/venv/lib/python3.8/site-packages/delphin/util.py", line 508, in expect
    raise self._errcls('expected: ' + err,
delphin.mrs._exceptions.MRSSyntaxError: 
  line 1, character 4
    [ ""blue" is in this folder" TOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] RELS: < [ proper_q<0:6> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ] [ fw_seq<-1:-1> LBL: h7 ARG0: x3 ARG1: i8 ] [ quoted<1:5> LBL: h7 ARG0: i8 CARG: "blue" ] [ _in_p_loc<10:12> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x10 [ x PERS: 3 NUM: sg IND: + ] ] [ _this_q_dem<13:17> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ _folder_n_of<18:24> LBL: h14 ARG0: x10 ARG1: i15 ] > HCONS: < h0 qeq h1 h5 qeq h7 h12 qeq h14 > ]
        ^
MRSSyntaxError: expected: ]

Converting the string to:

'Blue' is in this folder

(single quotes) does round trip properly

@goodmami
Member

Ok, thanks @EricZinda, it looks like it is not escaping the quotes in the surface string on serialization. Should be an easy fix. Want to give it a shot?
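
To illustrate the diagnosis above, here is a minimal, standard-library-only sketch of escaping double quotes when writing a quoted surface string and unescaping them when reading it back. The helpers `quote_surface` and `read_quoted` are hypothetical names made up for this example; this is not PyDelphin's actual code, and the backslash-escaping convention is an assumption.

```python
import re

# Hypothetical helpers for illustration only; not PyDelphin's implementation.

def quote_surface(text: str) -> str:
    """Wrap text in double quotes, backslash-escaping \\ and " inside it."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

# A double-quoted string whose body is made of non-quote, non-backslash
# characters or backslash-escaped characters.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

def read_quoted(s: str):
    """Read one quoted string from the start of s.

    Returns (unescaped_text, remainder_of_s).
    """
    m = _QUOTED.match(s)
    if m is None:
        raise ValueError("expected a quoted string")
    # Drop the backslash from each escaped character.
    text = re.sub(r"\\(.)", r"\1", m.group(1))
    return text, s[m.end():]

surface = '"Blue" is in this folder'

# Escaped on output, the surface string survives the round trip.
good = quote_surface(surface) + " TOP: h0"
print(read_quoted(good))

# Without escaping, the reader sees an empty string followed by stray text.
bad = '"' + surface + '"' + " TOP: h0"
print(read_quoted(bad))
```

In the unescaped case the reader stops at the second quote character and returns an empty string, leaving `Blue" is in this folder" TOP: h0` unread. That appears consistent with the error above being reported at character 4, right after `[ ""`.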

@goodmami
Member

Want to give it a shot?

@EricZinda never mind, I went ahead and fixed it. Try out v1.8.1 and let me know if it works for you.

@EricZinda
Author

@goodmami thanks so much. v1.8.1 works great!

@arademaker
Member

arademaker commented May 30, 2023

This is weird; I could not reproduce the error reported by @EricZinda in the previous version of PyDelphin.

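from delphin import ace
from delphin.codecs import simplemrs
# Parse the sentence with ACE through PyDelphin's Python interface
response = ace.parse('erg.dat', '"Blue" is in this folder')
m = response.result(1).mrs()  # results are 0-indexed, so this is the second reading
# Write the serialized MRS to a file, then read it back
with open('lixo.txt', 'w') as f:
    print(simplemrs.encode(m, indent=True), file=f)
with open('lixo.txt') as f:
    a = f.read()
# Decode the serialized MRS and re-encode it to check the round trip
m1 = simplemrs.loads(a)[0]
print(simplemrs.encode(m1, indent=True))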

No error!

@goodmami
Member

@arademaker Whether the surface field of the MRS is populated depends on how ACE is invoked. If you run the standard ACE interface at the command line and use PyDelphin to convert its output, you should see it:

$ ace -g ../erg-2018.dat -1 <<< "\"Blue\" is in this folder." | delphin convert -f ace --color=never
NOTE: 1 readings, added 1694 / 597 edges to chart (305 fully instantiated, 101 actives used, 180 passives used)	RAM: 5835k
NOTE: parsed 1 / 1 sentences, avg 5835k, time 0.02512s
[ ""Blue" is in this folder."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
  RELS: < [ udef_q<0:6> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ]
          [ _blue_a_1<0:6> LBL: h7 ARG0: x3 ARG1: i8 ]
          [ _in_p_loc<10:12> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x9 [ x PERS: 3 NUM: sg IND: + ] ]
          [ _this_q_dem<13:17> LBL: h10 ARG0: x9 RSTR: h11 BODY: h12 ]
          [ _folder_n_of<18:25> LBL: h13 ARG0: x9 ARG1: i14 ] >
  HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]

In this case, PyDelphin looks for the SENT: line generated by ACE and uses it to fill the surface field. When you use the ACE module in Python, it defaults to ACE's --tsdb-stdout mode, which might not report the :surface field, in which case PyDelphin would not be able to populate the MRS structure with the information.
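
As a rough illustration of the SENT: lookup described above, the sketch below scans ACE-style output lines for one beginning with `SENT: ` and returns the rest of that line as the surface string, or `None` when no such line is present. The function name `find_surface` and the exact line layout are assumptions made for this example; it is not PyDelphin's actual code.

```python
from typing import Iterable, Optional

SENT_PREFIX = "SENT: "  # assumed layout: "SENT: <original sentence>"

def find_surface(lines: Iterable[str]) -> Optional[str]:
    """Return the sentence from the first SENT: line, or None if there is none."""
    for line in lines:
        if line.startswith(SENT_PREFIX):
            return line[len(SENT_PREFIX):].rstrip("\n")
    return None

# With a SENT: line, the surface string (quotes included) is recovered.
print(find_surface(['SENT: "Blue" is in this folder.', "[ TOP: h0 ]"]))

# Without one, there is nothing to fill the surface field with.
print(find_surface(["[ TOP: h0 ]"]))
```

When the lookup returns `None`, the surface field stays empty, so no quoted surface string is written on serialization. That would explain why the script above did not reproduce the error.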
