Fix some overcapitalization #590

Closed
wants to merge 1 commit into from

Conversation

aryamccarthy
Member

H93 has a lot of text in all-caps. I fixed a bit of this by hand. It's probably a WIP overall, but this particular commit is a good start. I think it's better to do this bit-by-bit, opportunistically.

@mjpost
Member

mjpost commented Oct 21, 2019

Note there isn't consensus on whether this is an error: #333

@davidweichiang
Collaborator

I think the linked issue was primarily about author names. For titles, I think the argument is stronger for preserving what's written on the paper, because BibTeX (at least with our styles) automatically lowercases the titles.

Of course, the <fixed-case> tags should be removed.

@aryamccarthy
Member Author

So that sounds like a yes on this, no?

@davidweichiang
Collaborator

I was trying to say that if the original paper's title was written in all caps, I kind of think it makes sense to keep it in all caps in the XML.

What would you think about lowercasing them during HTML generation?

@akoehn
Member

akoehn commented Oct 29, 2019

If we lowercased those, what would we do about the ones that are currently mixed case? In my opinion, we should not re-case(?) those and, by extension, should not re-case the uppercase ones either: re-casing opens up many possibilities for inconsistency, and I don't see much merit in re-casing uppercase titles.

E.g. "A Way to Capitalize Words" -> "A way to capitalize words" may be done by a BibTeX style, but IMO we should not do this. In addition, we would have to decide what kind of re-casing to apply to uppercase titles. Title case? Lowercase?
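
To make the re-casing options concrete, here is a minimal Python sketch. The helper names `sentence_case` and `naive_title_case` are hypothetical and not part of any existing script; both are deliberately simplistic and know nothing about acronyms or proper nouns.

```python
def sentence_case(title: str) -> str:
    """Lowercase everything except the very first character."""
    return title[:1].upper() + title[1:].lower()


def naive_title_case(title: str) -> str:
    """Capitalize the first letter of every whitespace-separated word.

    Note that this also capitalizes short function words such as "to".
    """
    return " ".join(w[:1].upper() + w[1:].lower() for w in title.split())


all_caps = "A WAY TO CAPITALIZE WORDS"
mixed = "A Way to Capitalize Words"

print(sentence_case(all_caps))     # A way to capitalize words
print(naive_title_case(all_caps))  # A Way To Capitalize Words
print(sentence_case(mixed))        # A way to capitalize words
```

Neither result matches the hand-written "A Way to Capitalize Words", which illustrates the inconsistency concern: re-casing an all-caps title and leaving a mixed-case title alone can yield different conventions within the same volume.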

@davidweichiang
Collaborator

davidweichiang commented Dec 14, 2019

I think the two ideal solutions would be either (a) preserve case in titles and authors as closely as possible and algorithmically normalize case when generating HTML, or (b) normalize case in the XML. It doesn't seem to make much sense to use different policies for titles and authors just because BibTeX does.

Given that I just normalized a boatload of author names (#695), I guess I am okay with normalizing titles also. Was I the only one who didn't want to before?

I suggest that the best systematic way to do this is to insert <fixed-case> tags in the right places first; after that, the normalization itself will be easy.
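
A rough sketch of why the tags make normalization easy. It assumes the title is available as a plain string with inline <fixed-case> markup and that the target is sentence case; both are assumptions, and `normalize_title` and the sample title are hypothetical rather than taken from the repository.

```python
import re

# Capturing group so that re.split keeps the protected spans in the result.
FIXED_CASE = re.compile(r"(<fixed-case>.*?</fixed-case>)")


def normalize_title(xml_title: str) -> str:
    """Lowercase text outside <fixed-case> spans, then capitalize the first letter."""
    pieces = []
    for part in FIXED_CASE.split(xml_title):
        if part.startswith("<fixed-case>"):
            pieces.append(part)          # protected span: keep exactly as written
        else:
            pieces.append(part.lower())  # unprotected text: safe to lowercase
    result = "".join(pieces)
    # Only capitalize when the title starts with plain text, not with a tag.
    if result and not result.startswith("<"):
        result = result[0].upper() + result[1:]
    return result


# Hypothetical all-caps title with the acronym and proper-noun initial tagged.
tagged = "A STUDY OF <fixed-case>HMM</fixed-case> TAGGING IN <fixed-case>E</fixed-case>NGLISH"
print(normalize_title(tagged))
# A study of <fixed-case>HMM</fixed-case> tagging in <fixed-case>E</fixed-case>nglish
```

Once the tags are correct, the lowercasing step needs no linguistic knowledge; all of the difficulty moves into deciding where the tags go.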

We could try adapting the existing auto fixed-case script (currently it skips titles that are in all caps), but many words are going to be tricky. The current script handles acronyms simply by marking any word written in all caps as <fixed-case>. That won't work here, because in an all-caps title every word is written in all caps.
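
The failure mode can be illustrated with a simplified stand-in for that heuristic. This is not the actual script; `mark_all_caps_words` and the sample titles are hypothetical and only mimic the rule "any all-caps word is an acronym".

```python
def mark_all_caps_words(title: str) -> str:
    """Wrap every purely alphabetic all-caps word of two or more letters in <fixed-case>."""
    out = []
    for word in title.split():
        if len(word) > 1 and word.isalpha() and word.isupper():
            out.append(f"<fixed-case>{word}</fixed-case>")
        else:
            out.append(word)
    return " ".join(out)


# Mixed-case title: only the acronym is protected, which is what we want.
print(mark_all_caps_words("Parsing with HMM Models"))
# Parsing with <fixed-case>HMM</fixed-case> Models

# All-caps title: every word is protected, so nothing can be normalized.
print(mark_all_caps_words("HUMAN LANGUAGE TECHNOLOGY"))
# <fixed-case>HUMAN</fixed-case> <fixed-case>LANGUAGE</fixed-case> <fixed-case>TECHNOLOGY</fixed-case>
```

For all-caps titles, some other signal (a word list, or the casing of the same word in other titles) would be needed to tell real acronyms from ordinary words.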

@@ -2,7 +2,7 @@
 <collection id="H93">
 <volume id="1">
 <meta>
-<booktitle><fixed-case>HUMAN</fixed-case> <fixed-case>LANGUAGE</fixed-case> <fixed-case>TECHNOLOGY</fixed-case>: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993</booktitle>
+<booktitle>Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993</booktitle>
Collaborator

davidweichiang commented Dec 14, 2019

The H, L, and T need <fixed-case> tags.

@davidweichiang
Collaborator

I'll close this since #700 is the same but bigger (although I didn't do booktitles yet).

@davidweichiang deleted the aryamccarthy-patch-1 branch on December 18, 2019 16:54