
AMR: Getting back our 75 f-score #45

Open
namednil opened this issue Jul 17, 2019 · 12 comments
Labels
mrp2019 Issues related to the MRP 2019 shared task

Comments

@namednil
Contributor

I did indeed mess something up in the relabeling with the properties. The fix (now pushed) gives us an improvement of 1 point (see the spreadsheet for the results with the bug):

{"n": 1681,
 "exact": 59,
 "tops": {"g": 1681, "s": 1680, "c": 1378, "p": 0.8202380952380952, "r": 0.8197501487209994, "f": 0.8199940493900625},
 "labels": {"g": 19270, "s": 18295, "c": 14913, "p": 0.8151407488384804, "r": 0.773897249610794, "f": 0.7939837614801012},
 "properties": {"g": 3284, "s": 2864, "c": 2013, "p": 0.7028631284916201, "r": 0.6129719853836785, "f": 0.654847104749512},
 "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
 "edges": {"g": 19644, "s": 17496, "c": 10924, "p": 0.6243712848651121, "r": 0.5560985542659336, "f": 0.5882606354334949},
 "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
 "all": {"g": 43879, "s": 40335, "c": 29228, "p": 0.7246312135862154, "r": 0.6661045146881196, "f": 0.6941363668748665},
 "time": 83.13233494758606,
 "cpu": 0.750736442}
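For reference, each per-category entry in this evaluation JSON follows the usual precision/recall/F1 definitions over the gold (g), system (s), and correct (c) counts. A quick sketch to sanity-check the numbers (the function name is mine, not from the scorer):

```python
def prf(g, s, c):
    """Precision, recall, F1 from gold/system/correct counts."""
    p = c / s if s else 0.0
    r = c / g if g else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# "tops" counts from the evaluation output above
p, r, f = prf(g=1681, s=1680, c=1378)
print(p, r, f)
```

This reproduces the p/r/f values reported for "tops" above; note that f simplifies to 2c/(g+s).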

I parsed the new dev set with an old model (https://www.comet.ml/namednil/mrp/93b5d12f46b5462bbe562ab8e16b9865) and got the following results:

{"n": 1681,
 "exact": 59,
 "tops": {"g": 1681, "s": 1680, "c": 1414, "p": 0.8416666666666667, "r": 0.8411659726353361, "f": 0.8414162451651295},
 "labels": {"g": 19270, "s": 18217, "c": 15279, "p": 0.838722072789153, "r": 0.7928905033731188, "f": 0.8151625896977619},
 "properties": {"g": 3284, "s": 2466, "c": 1453, "p": 0.589213300892133, "r": 0.44244823386114496, "f": 0.5053913043478262},
 "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
 "edges": {"g": 19644, "s": 17505, "c": 11936, "p": 0.6818623250499857, "r": 0.6076155569130524, "f": 0.6426014159196749},
 "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
 "all": {"g": 43879, "s": 39868, "c": 30082, "p": 0.7545399819404033, "r": 0.685567127783222, "f": 0.7184018532007116},
 "time": 97.26900267601013,
 "cpu": 0.7502532259999999}

The property score is so low here because I used the new version of am-tools, where the property handling expects a different input format and therefore doesn't recognize named entities -- so nothing to worry about.
Also, keep the following in mind about these scores:

  • the dev set I evaluated on probably mostly belonged to the training set of the model
  • I had to force the model to accept a different named entity tagset (LOC instead of LOCATION), which I imagine works by creating new (untrained) embeddings for the new vocabulary items.
@namednil namednil added the mrp2019 Issues related to the MRP 2019 shared task label Jul 17, 2019
@alexanderkoller
Contributor

I can easily change the NE tagset if you tell me what we need. (Will it make a difference at all? My impression is that the identity of the NE tags doesn't matter anywhere in am-tools.)

Can we talk about training time briefly once again? It seems strange to me that the NE tags are only available at dev+test time, but the parser is never trained to actually take them into account at training time. Where are NE tags predicted in training?

@namednil
Contributor Author

No, it won't make a difference, except if we want to run an old model on new data.

ToAMConll creates the train.amconll file and uses the NE tagger.

@alexanderkoller
Contributor

The dev_fixed.amconll file you sent me has some strange asymmetries that I don't understand:

#id:isi_0002.85
1	The	the	the	DT	O	_	_	_	0	_	false
2	game	_	game	NN	O	_	_	_	0	_	false
3	continued	_	continue	VBD	O	_	_	_	0	_	false
4	despite	_	despite	IN	O	_	_	_	0	_	false
5	of	_	of	IN	O	_	_	_	0	_	false
6	the	_	the	DT	O	_	_	_	0	_	false
7	rain	_	rain	NN	O	_	_	_	0	_	false
8	.	_	.	.	O	_	_	_	0	_	false

Obviously column 4 is for the lemmas, but what is column 3 for? And is it okay that a lot of sentences only have a non-null entry in column 3 for their first tokens, but then not for the subsequent ones?

Are our classes ConllEntry etc. for representing sentences in AM-CoNLL format? (Could we more generally document the columns of AM-CoNLL on the wiki? And perhaps rename the Conll classes to AmConll? The latter maybe after the deadline.)

@alexanderkoller
Contributor

No, it won't make a difference, except if we want to run an old model on new data.

Ah, okay.

ToAMConll creates the train.amconll file and uses the NE tagger.

Got it.

@alexanderkoller
Contributor

alexanderkoller commented Jul 17, 2019

Collecting weirdnesses about dev_fixed.amconll: Some "sentences" actually consist of multiple sentences. This seems like a tokenizer bug from when the graphbank was created, but is it at least consistently wrong in the companion data?

I guess if the POS tags and lemmas were taken from the companion data, that means it is?

#id:bolt12_632_5735.1
1	This	this	this	DT	O	_	_	_	0	_	false
2	is	_	be	VBZ	O	_	_	_	0	_	false
3	actually	_	actually	RB	O	_	_	_	0	_	false
4	not	_	not	RB	O	_	_	_	0	_	false
5	about	_	about	IN	O	_	_	_	0	_	false
6	being	_	be	VBG	O	_	_	_	0	_	false
7	a	_	a	DT	O	_	_	_	0	_	false
8	Samaritan	samaritan	samaritan	NN	O	_	_	_	0	_	false
9	or	_	or	CC	O	_	_	_	0	_	false
10	not.	_	not.	RB	O	_	_	_	0	_	false
11	Many	many	many	JJ	O	_	_	_	0	_	false
12	drunkards	_	drunkard	NNS	O	_	_	_	0	_	false
13	have	_	have	VBP	O	_	_	_	0	_	false
14	lost	_	lose	VBN	O	_	_	_	0	_	false
15	their	_	they	PRP$	O	_	_	_	0	_	false
16	consciousness	_	consciousness	NN	O	_	_	_	0	_	false
17	and	_	and	CC	O	_	_	_	0	_	false
18	will	_	will	MD	O	_	_	_	0	_	false
19	get	_	get	VB	O	_	_	_	0	_	false
20	in	_	in	IN	O	_	_	_	0	_	false
21	a	_	a	DT	O	_	_	_	0	_	false
22	drunken	_	drunken	JJ	O	_	_	_	0	_	false
23	fit	_	fit	NN	O	_	_	_	0	_	false
24	.	_	.	.	O	_	_	_	0	_	false

@namednil
Contributor Author

The dev_fixed.amconll file you sent me has some strange asymmetries that I don't understand:

#id:isi_0002.85
1	The	the	the	DT	O	_	_	_	0	_	false
2	game	_	game	NN	O	_	_	_	0	_	false
3	continued	_	continue	VBD	O	_	_	_	0	_	false
4	despite	_	despite	IN	O	_	_	_	0	_	false
5	of	_	of	IN	O	_	_	_	0	_	false
6	the	_	the	DT	O	_	_	_	0	_	false
7	rain	_	rain	NN	O	_	_	_	0	_	false
8	.	_	.	.	O	_	_	_	0	_	false

Obviously column 4 is for the lemmas, but what is column 3 for? And is it okay that a lot of sentences only have a non-null entry in column 3 for their first tokens, but then not for the subsequent ones?

Column 3 is the "replacement" column and is mostly intended for "name" in AMR. I populate it with tokens from sentences.txt that differ from their respective token in literals.txt. Since sentences.txt is lower-cased, such strange things happen, but they don't affect performance, because the NN doesn't read this column and the replacement column is only a backup option in the relabeling. It's the same as in our ACL experiments.
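A hypothetical sketch of how that replacement column could be populated (file names are from the description above; the function name is my own, not the actual am-tools code):

```python
def replacement_column(sentence_tokens, literal_tokens):
    """For each position, emit the token from sentences.txt if it differs
    from the corresponding token in literals.txt, else the placeholder '_'."""
    return [s if s != l else "_"
            for s, l in zip(sentence_tokens, literal_tokens)]

# sentences.txt is lower-cased, so only the capitalized first token differs,
# matching the pattern visible in the dev_fixed.amconll excerpt above
print(replacement_column(["the", "game", "continued", "despite"],
                         ["The", "game", "continued", "despite"]))
```

This reproduces the observed asymmetry: sentence-initial tokens get an entry ("the"), the rest stay "_".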

Are our classes ConllEntry etc. for representing sentences in AM-CoNLL format? (Could we more generally document the columns of AM-CoNLL on the wiki? And perhaps rename the Conll classes to AmConll? The latter maybe after the deadline.)

Yes. Let's do the renaming after the deadline. There's a brief documentation here: https://docs.google.com/spreadsheets/d/1oNNFy6vuDr8dcKNjsHrogqjhsCTCOQU5_WQN6-XFDJU/edit#gid=0
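Based on the column discussion above, a minimal sketch of reading one AM-CoNLL line (column names past 6 are my reading of the thread; only the index/token/replacement/lemma/POS/NE columns are explicitly confirmed here):

```python
from typing import NamedTuple


class AmConllEntry(NamedTuple):
    idx: int          # col 1: token index
    form: str         # col 2: token as in the input
    replacement: str  # col 3: "replacement" column (e.g. for AMR names)
    lemma: str        # col 4: lemma
    pos: str          # col 5: POS tag
    ne: str           # col 6: NE tag
    rest: tuple       # cols 7+: graph fragment, lexical label, head, edge label, etc.


def parse_line(line: str) -> AmConllEntry:
    """Split one tab-separated AM-CoNLL token line into its columns."""
    cols = line.rstrip("\n").split("\t")
    return AmConllEntry(int(cols[0]), cols[1], cols[2], cols[3],
                        cols[4], cols[5], tuple(cols[6:]))


entry = parse_line("1\tThe\tthe\tthe\tDT\tO\t_\t_\t_\t0\t_\tfalse")
print(entry.form, entry.replacement, entry.lemma)
```

A sketch only; the authoritative column layout is in the wiki/spreadsheet linked above.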

@alexanderkoller
Contributor

The lemmas, POS tags and NE tags in the dev data all seem fine to me. In particular, there is no mismatch that would be explained by a token sequence and a tag sequence differing in length.

The NE tag LOC is because of a typo I made in NamedEntityRecognizer. I'll leave it as is for now; we can fix it easily after the deadline.

@alexanderkoller
Contributor

I moved the documentation of AM-CoNLL to the wiki: https://github.com/coli-saar/am-parser/wiki/AM-CoNLL-file-format

Let's put further updates there instead of the GD file.

@alexanderkoller
Contributor

Sentences with mis-tokenization similar to my dev example above also occur in the training data:

#flavor:2
#framework:amr
#id:bolt12_632_5731.35
1	Our	our	we	PRP$	O	(h<root> / --LEX--)	$LEMMA$	()	7	APP_s	false
2	family	_	family	NN	O	(h<root> / --LEX--)	$LEMMA$	()	6	APP_s	false
3	now	_	now	RB	O	(b<root> / --LEX--  :time-of (p<mod>))	$LEMMA$	(mod())	6	MOD_mod	false
4	can	_	can	MD	O	(p<root> / --LEX--  :ARG1 (r<s>))	possible-01	(s())	6	MOD_s	false
5	only	_	only	RB	O	(explicitanon0<root> / --LEX--  :mod-of (t<mod>))	$LEMMA$	(mod())	6	MOD_mod	false
6	rely	_	rely	VB	O	(i<root> / --LEX--  :ARG0 (t<s>)  :ARG2 (f<o2>)  :ARG1 (a<o>))	$LEMMA$-01	(o(), o2(s_UNIFY_s()), s())	22	APP_snt1	false
7	on	_	on	IN	O	(t<root> / --LEX--  :ARG0 (i<s>)  :ARG1 (s<o>)  :ARG2 (t1<o2>))	have-org-role-91	(o(), o2(), s())	2	MOD_o	false
8	my	_	my	PRP$	O	(h<root> / --LEX--)	i	()	9	APP_s	false
9	father	_	father	NN	O	(p<root> / person  :ARG0-of (h / have-rel-role-91  :ARG2 (m / --LEX--)  :ARG1 (h1<s>)))	$LEMMA$	(s())	11	APP_o	false
10	's	_unk_char_s	’s	POS	O	(h<root> / --LEX--)	member	()	7	APP_o2	false
11	pay	_	pay	NN	O	(t1<root> / thing  :ARG1-of (g / --LEX--  :ARG2 (a<o>)))	$LEMMA$-01	(o())	6	APP_o	false
12	to	_	to	TO	O	(h<root> / --LEX--)	monetary-quantity	()	17	APP_o	false
13	survive.	_	survive.	VB	O	(d<root> / --LEX--  :ARG0 (h<s>))	survive-01	(s())	6	APP_o2	false
14	We	we	we	PRP	O	(h<root> / --LEX--)	$LEMMA$	()	17	APP_s	false
15	absolutely	_	absolutely	RB	O	(explicitanon0<root> / --LEX--  :mod-of (t<mod>))	absolute	(mod())	16	MOD_mod	false
16	cannot	_	cannot	MD	O	(p<root> / --LEX--  :ARG1 (b<s>)  :polarity (explicitanon0 / -))	possible-01	(s())	22	APP_snt2	false
17	afford	_	afford	VB	O	(a<root> / --LEX--  :ARG0 (g<s>)  :ARG1 (m<o>))	$LEMMA$-01	(o(), s())	16	APP_s	false
18	such	_	such	JJ	O	(s<root> / --LEX--  :degree-of (t<mod>))	$LEMMA$	(mod())	19	MOD_mod	false
19	high	_	high	JJ	O	(p<root> / --LEX--  :ARG1 (r<s>))	$LEMMA$-02	(s())	12	MOD_s	false
20	hospital	_	hospital	NN	O	(h<root> / --LEX--)	$LEMMA$	()	21	APP_s	false
21	costs	_	cost	NNS	O	(b<root> / --LEX--  :ARG2 (d<o>)  :ARG1 (k<s>))	$LEMMA$-01	(o(), s())	12	MOD_o	false
22	.	_	.	.	O	(m<root> / --LEX--  :snt2 (i<snt2>)  :snt1 (n<snt1>))	multi-sentence	(snt1(), snt2())	0	ROOT	false

@alexanderkoller
Contributor

The mistokenizations are from the MRP companion data, e.g. /proj/irtg/sempardata/mrp/LDC2019E45/2019/companion/amr/bolt.conllu

@namednil
Contributor Author

This is what the sentence looked like in the ConceptNet + CoreNLP version:

#flavor:2
#input:Our family now can only rely on my father's pay to survive. We absolutely cannot afford such high hospital costs.
#framework:amr
#id:bolt12_632_5731.35
#time:2019-04-10 (20:10)
#version:0.9
1	Our	our	we	PRP$	O	(a<root> / --LEX--)	$LEMMA$	()	7	APP_s	false
2	family	_	family	NN	O	(a<root> / --LEX--)	$LEMMA$	()	6	APP_s	false
3	now	_	now	RB	DATE	(t<root> / --LEX--  :time-of (e<mod>))	$LEMMA$	(mod())	6	MOD_mod	false
4	can	_	can	MD	O	(a<root> / --LEX--  :ARG1 (b<s>))	possible-01	(s())	6	MOD_s	false
5	only	_	only	RB	O	(b<root> / --LEX--  :mod-of (e<mod>))	$LEMMA$	(mod())	6	MOD_mod	false
6	rely	_	rely	VB	O	(i<root> / --LEX--  :ARG0 (t<s>)  :ARG2 (f<o2>)  :ARG1 (a<o>))	$LEMMA$-01	(o(), o2(s_UNIFY_s()), s())	14	APP_snt1	false
7	on	_	on	IN	O	(t<root> / --LEX--  :ARG0 (i<s>)  :ARG1 (s<o>)  :ARG2 (t1<o2>))	have-org-role-91	(o(), o2(), s())	2	MOD_o	false
8	my	_	my	PRP$	O	(a<root> / --LEX--)	i	()	9	APP_s	false
9	father	_	father	NN	O	(p<root> / person  :ARG0-of (h / have-rel-role-91  :ARG2 (m / --LEX--)  :ARG1 (h1<s>)))	$LEMMA$	(s())	11	APP_o	false
10	's	_	's	POS	O	(a<root> / --LEX--)	member	()	7	APP_o2	false
11	pay	_	pay	NN	O	(t1<root> / thing  :ARG1-of (g / --LEX--  :ARG2 (a<o>)))	$LEMMA$-01	(o())	6	APP_o	false
12	to	_	to	TO	O	_	_	_	0	IGNORE	false
13	survive	_	survive	VB	O	(s<root> / --LEX--  :ARG0 (s1<s>))	$LEMMA$-01	(s())	6	APP_o2	false
14	.	_	.	.	O	(m<root> / --LEX--  :snt2 (i<snt2>)  :snt1 (n<snt1>))	multi-sentence	(snt1(), snt2())	0	ROOT	false
15	We	we	we	PRP	O	(a<root> / --LEX--)	$LEMMA$	()	19	APP_s	false
16	absolutely	_	absolutely	RB	O	(b<root> / --LEX--  :mod-of (e<mod>))	absolute	(mod())	17	MOD_mod	false
17	can	_	can	MD	O	(a<root> / --LEX--  :ARG1 (b<s>))	possible-01	(s())	14	APP_snt2	false
18	not	_	not	RB	O	(a<root> / --LEX--  :polarity-of (s<mod>))	-	(mod())	17	MOD_mod	false
19	afford	_	afford	VB	O	(f<root> / --LEX--  :ARG1 (r<o>)  :ARG0 (h<s>))	$LEMMA$-01	(o(), s())	17	APP_s	false
20	such	_	such	JJ	O	(s<root> / --LEX--  :degree-of (b<mod>))	$LEMMA$	(mod())	21	MOD_mod	false
21	high	_	high	JJ	O	(a<root> / --LEX--  :ARG1 (b<s>))	$LEMMA$-02	(s())	24	MOD_s	false
22	hospital	_	hospital	NN	O	(a<root> / --LEX--)	$LEMMA$	()	23	APP_s	false
23	costs	_	cost	NNS	O	(b<root> / --LEX--  :ARG2 (d<o>)  :ARG1 (k<s>))	$LEMMA$-01	(o(), s())	24	MOD_o	false
24	.	_	.	.	O	(a<root> / --LEX--)	monetary-quantity	()	19	APP_o	false

@namednil
Contributor Author

namednil commented Jul 21, 2019

One difference between the decomposable dev set and the full dev set is how well we recognize named entities, which affects whether tokens are joined; this of course has an impact on the AM dependency tree.

Prediction on decomposable dev set (= with gold named entities, thus "joining" is gold):

#flavor:2
#framework:amr
#id:bolt12_6453_3054.4
1	negative	_	negative	JJ	O	(p<root> / --LEX--  :ARG1 (r<s>))	$LEMMA$-02	(s())	2	MOD_s	True
2	news	_	news	NN	O	(h<root> / --LEX--)	$LEMMA$	()	8	APP_op1	True
3	on	_	on	IN	O	_	mean-01	_	0	IGNORE	True
4	the	_	the	DT	O	_	multi-sentence	_	0	IGNORE	True
5	Internet.	internet.	internet.	NN	ORGANIZATION	(d<root> / --LEX--  :location-of (k<mod>))	$LEMMA$	(mod())	2	MOD_mod	True
6	Online	online	online	JJ	ORGANIZATION	(explicitanon0<root> / --LEX--  :mod-of (t<mod>))	$LEMMA$	(mod())	7	MOD_mod	True
7	Marketing	marketing	marketing	NN	ORGANIZATION	(h<root> / --LEX--)	market-01	()	8	APP_op2	True
8	or	_	or	CC	O	(a<root> / --LEX--  :op3 (z<op3>)  :op1 (n<op1>)  :op2 (n1<op2>))	$LEMMA$	(op1(), op2(), op3())	13	APP_s	True
9	Online	online	online	JJ	O	(explicitanon0<root> / --LEX--  :mod-of (t<mod>))	$LEMMA$	(mod())	10	MOD_mod	True
10	Publicity.	publicity.	publicity.	NN	O	(h<root> / --LEX--)	search-01	()	8	APP_op3	True
11	For	for	for	IN	O	_	multi-sentence	_	0	IGNORE	True
12	details	_	detail	NNS	O	(l<root> / --LEX--  :purpose-of (m<mod>))	$LEMMA$-01	(mod())	15	MOD_mod	True
13	,	_	,	,	O	(p<root> / --LEX--  :ARG1 (r<s>))	multi-sentence	(s())	15	APP_o	True
14	please	_	please	UH	O	(explicitanon0<root> / --LEX--  :polite-of (c<mod>))	+	(mod())	15	MOD_mod	True
15	contact	_	contact	VB	O	(h<root> / --LEX--  :mode (explicitanon0 / imperative)  :ARG0 (a<s>)  :ARG1 (p<o>))	$LEMMA$-01	(o(), s())	0	ROOT	True
16	at	_	at	IN	O	(h<root> / --LEX--)	multi-sentence	()	15	APP_s	True

Prediction on dev set (different model though):

#id:bolt12_6453_3054.4
1	negative	_	negative	JJ	O	(p<root> / --LEX--  :ARG1 (r<s>))	$LEMMA$-02	(s())	2	MOD_s	True
2	news	_	news	NN	O	(h<root> / --LEX--)	$LEMMA$	()	13	APP_s	True
3	on	_	on	IN	O	_	pers$LEMMA$	_	0	IGNORE	True
4	the	_	the	DT	O	_	be-located-at-91	_	0	IGNORE	True
5	Internet. Online Marketing	_name_	marketing	NN	ORGANIZATION	(h<root> / --LEX--)	market-01	()	6	APP_op1	True
6	or	_	or	CC	O	(b<root> / --LEX--  :op2 (r<op2>)  :op1 (h<op1>)  :location-of (d<mod>))	$LEMMA$	(mod(), op1(), op2())	2	MOD_mod	True
7	Online	online	online	JJ	O	(explicitanon0<root> / --LEX--  :mod-of (t<mod>))	$LEMMA$	(mod())	8	MOD_mod	True
8	Publicity.	publicity.	publicity.	NN	O	(h<root> / --LEX--)	$LEMMA$	()	6	APP_op2	True
9	For	for	for	IN	O	_	multi-sentence	_	0	IGNORE	True
10	details	_	detail	NNS	O	(l<root> / --LEX--  :purpose-of (m<mod>))	$LEMMA$	(mod())	13	MOD_mod	True
11	,	_	,	,	O	(h<root> / --LEX--)	multi-sentence	()	13	APP_o	True
12	please	_	please	UH	O	(explicitanon0<root> / --LEX--  :polite-of (c<mod>))	+	(mod())	13	MOD_mod	True
13	contact	_	contact	VB	O	(h<root> / --LEX--  :mode (explicitanon0 / imperative)  :ARG0 (a<s>)  :ARG1 (p<o>))	$LEMMA$-01	(o(), s())	0	ROOT	True
14	at	_	at	IN	O	_	multi-sentence	_	0	IGNORE	True
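The joining visible in the second excerpt ("Internet. Online Marketing" collapsed into one token with replacement `_name_`) could be sketched roughly like this (my own simplification, not the actual am-tools implementation; single NE tokens are left unjoined here):

```python
def join_named_entities(tokens, ne_tags, placeholder="_name_"):
    """Merge maximal runs of tokens sharing the same non-'O' NE tag into
    a single space-joined token, returning (token, replacement) pairs."""
    out = []
    i = 0
    while i < len(tokens):
        j = i
        if ne_tags[i] != "O":
            # extend the run while the NE tag stays the same
            while j + 1 < len(ne_tags) and ne_tags[j + 1] == ne_tags[i]:
                j += 1
        if j > i:
            out.append((" ".join(tokens[i:j + 1]), placeholder))
        else:
            out.append((tokens[i], "_"))
        i = j + 1
    return out


print(join_named_entities(
    ["on", "the", "Internet.", "Online", "Marketing", "or"],
    ["O", "O", "ORGANIZATION", "ORGANIZATION", "ORGANIZATION", "O"]))
```

With predicted (rather than gold) NE tags, the runs come out differently, which is exactly why the two AM dependency trees above diverge.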
