LaTeX processing is not being done on ingestion #644

davidweichiang · 2019-11-12T17:42:54Z

This came up in #628, and I think it appeared for the first time for EMNLP 2019 because we pushed some simplifying changes to START that turned off LaTeX on their end.

Is it because normalize_anth.py is not being run with the -t option?

The problem is that currently, normalize_anth.py cannot be rerun with the -t option; all kinds of errors come up. It ought to be fixable but might not be an easy fix.

mjpost · 2019-11-12T18:52:41Z

> Is it because normalize_anth.py is not being run with the -t option?

Ah, yes, this must be it!

I don't run normalize_anth.py directly; it is only invoked via bin/ingest.py, where I call 'process()' but pass it "xml" instead of "latex".
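A minimal sketch of why the format argument matters. The function `toy_process` and the table `TOY_LATEX_MAP` are hypothetical stand-ins written for illustration; the real signature and behavior of 'process()' in normalize_anth.py are not shown in this thread and may differ.

```python
# Hypothetical illustration only: NOT the normalize_anth.py implementation.
# A tiny table of LaTeX sequences and the characters they stand for.
TOY_LATEX_MAP = {r"\&": "&", r"\%": "%", r"\'e": "é"}


def toy_process(text: str, informat: str) -> str:
    """Convert text according to the declared input format (assumed values)."""
    if informat == "latex":
        # Treat the input as LaTeX source and convert known sequences.
        for macro, char in TOY_LATEX_MAP.items():
            text = text.replace(macro, char)
        return text
    if informat == "xml":
        # Treat the input as already-converted text: nothing is interpreted.
        return text
    raise ValueError(f"unknown input format: {informat!r}")


raw = r"Simple \& Effective Caf\'e Search"
as_xml = toy_process(raw, "xml")      # LaTeX escapes survive untouched
as_latex = toy_process(raw, "latex")  # LaTeX escapes are converted
print(as_xml)
print(as_latex)
```

Under these assumptions, declaring the input as "xml" leaves the escapes in the output, which matches the symptom described in this issue.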

Can I use normalize_anth.py while reading in an XML file?

davidweichiang · 2019-11-12T18:53:18Z

Sorry, I didn't understand the last question...

mjpost · 2019-11-12T18:53:48Z

It looks like I should change this line, passing "latex" instead of "xml". Is that correct?

[Edit: added the link]

davidweichiang · 2019-11-12T18:56:23Z

Right, with the caveat that LaTeX processing should not be done more than once.
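To illustrate the caveat, here is a small sketch of a LaTeX-to-text conversion that is not idempotent. The converter `toy_latex_to_text` is a hypothetical toy, not the code in normalize_anth.py; it only assumes the usual LaTeX convention that `\{` is a literal brace while a bare `{` is grouping.

```python
def toy_latex_to_text(s: str) -> str:
    """Toy stand-in for LaTeX conversion (hypothetical, for illustration)."""
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s) and s[i + 1] in "&%{}$#_":
            # Escaped special character: emit it as a literal.
            out.append(s[i + 1])
            i += 2
        elif c in "{}":
            # Bare brace: grouping only, so it is dropped.
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


title = r"Learning over \{0, 1\} Labels for Q\&A"
once = toy_latex_to_text(title)   # literal braces and ampersand appear
twice = toy_latex_to_text(once)   # second pass misreads the literal braces
print(once)
print(twice)
```

The first pass produces the intended text; the second pass treats the now-literal braces as grouping and silently removes them, which is why the conversion belongs in a step that runs exactly once.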

mjpost · 2019-11-12T18:57:30Z

Ingest is called just once, so this is the perfect place for it.

davidweichiang · 2019-11-12T19:08:36Z

OK, and do you want to do anything about EMNLP 2019?

mjpost · 2019-11-12T19:09:45Z

Yes, will add to #645 (don't merge yet).

mjpost · 2019-11-12T19:14:43Z

Did the abstracts. Titles and other fields are trickier at this point and will require a custom script. Is it worth it for me to do that?

davidweichiang · 2019-11-12T19:16:32Z

Maybe it's easier to eyeball all the titles.

mjpost · 2019-11-12T19:17:16Z

Just titles, though? Or anything else? (author names?)

davidweichiang · 2019-11-12T19:19:38Z

I think START sends us author names in UTF-8.

davidweichiang · 2019-11-12T19:33:08Z

I took a quick look at the D19 index page and only found

A Label Informative Wide & Deep Classifier for Patents and Papers
LTRC-MT Simple & Effective Hindi-English Neural Machine Translation Systems at WAT 2019
Supervised neural machine translation based on data augmentation and improved training & inference process
Finding Generalizable Evidence by Learning to Convince Q&A Models

uniblock -> uniblock

Answering Naturally : Factoid to Full length Answer Generation (not sure if this needs to be corrected)
Samvaadhana : A Telugu Dialogue System in Hospital Domain (same)
Efficiency through Auto-Sizing:Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task (missing space after colon -- yes, I'd like to correct this!)
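The kinds of problems spotted by eye above could also be flagged with a few heuristics. This is only a sketch: `CHECKS` and `flag_title` are hypothetical names, the patterns are assumptions about what counts as suspicious, and they will produce false positives (for example on times such as 10:30).

```python
import re

# Heuristic patterns (assumptions, not an existing Anthology tool).
CHECKS = [
    ("missing space after colon", re.compile(r":(?=\S)")),
    ("space before colon", re.compile(r"\s:")),
    ("possible leftover LaTeX", re.compile(r"\\[A-Za-z&%$#_{}]|[{}$]")),
]


def flag_title(title: str) -> list:
    """Return the labels of every heuristic that matches the title."""
    return [label for label, pattern in CHECKS if pattern.search(title)]


titles = [
    "Efficiency through Auto-Sizing:Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task",
    "Samvaadhana : A Telugu Dialogue System in Hospital Domain",
    "A Label Informative Wide & Deep Classifier for Patents and Papers",
]
for t in titles:
    print(flag_title(t), t)
```

Titles that come back with an empty list still deserve a human look; the sketch only narrows down where to start.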

I saw tons of author capitalization problems (#643).

call normalize correctly (closes #644)

davidweichiang closed this as completed in 6996d00 Nov 12, 2019

davidweichiang added a commit that referenced this issue Nov 12, 2019

Merge pull request #645 from acl-org/issue-644

2987282

call normalize correctly (closes #644)

najtin pushed a commit to ir-anthology/ir-anthology that referenced this issue Jun 9, 2021

call normalize correctly (closes acl-org#644)

d2f9078

najtin pushed a commit to ir-anthology/ir-anthology that referenced this issue Jun 9, 2021

Merge pull request acl-org#645 from acl-org/issue-644

99bfb15

call normalize correctly (closes acl-org#644)
