Fix some overcapitalization #590
Note there isn't consensus on whether this is an error: #333
I think the linked issue was primarily about author names. For titles, I think the argument is stronger for preserving what's written on the paper, because BibTeX (at least with our styles) automatically lowercases the titles. Of course, the
So that sounds like a yes on this, no?
I was trying to say that if the original paper's title was written in all caps, I think it makes sense to keep it in all caps in the XML. What would you think about lowercasing them during HTML generation?
If we lowercased those, what would we do about the ones that are currently mixed case? In my opinion, we should not re-case(?) those, and by extension we should not re-case the uppercase ones either: re-casing opens up many possibilities for inconsistencies, and I don't see big merits in re-casing uppercase titles. E.g. "A Way to Capitalize Words" -> "A way to capitalize words" may be done by a BibTeX style, but IMO we should not do this. In addition, we would have to decide what kind of re-casing should be done for uppercase titles. Title case? Lowercase?
I think the two ideal solutions would be either (a) preserve case in titles and authors as closely as possible and algorithmically normalize case when generating HTML, or (b) normalize case in the XML. It doesn't seem to make much sense to use different policies for titles and authors just because BibTeX does. Given that I just normalized a boatload of author names (#695), I guess I am okay with normalizing titles also. Was I the only one who didn't want to before? I suggest that the best systematic way to do this is to try to insert We could try adapting the existing auto fixed-case script (currently it skips titles that are in all-caps), but many words are going to be tricky. The current script handles acronyms simply by marking any word written in all caps as
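To make the behavior described above concrete, here is a minimal sketch of that kind of heuristic. It is not the project's actual script: the function names are hypothetical, and it assumes that all-caps words are wrapped in the `<fixed-case>` tag seen in the diff in this thread. It skips titles written entirely in capitals and otherwise marks every all-caps word of two or more letters.

```python
import re

# Hypothetical helpers for illustration only; not the repository's real script.

def is_all_caps(text: str) -> bool:
    """True if the text has letters and every letter is uppercase."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def mark_fixed_case(title: str) -> str:
    """Wrap all-caps words in <fixed-case> tags (assumed markup).

    Titles written entirely in capitals are returned unchanged, mirroring
    the "skips titles that are in all-caps" behavior mentioned above.
    """
    if is_all_caps(title):
        return title

    def wrap(match: re.Match) -> str:
        return f"<fixed-case>{match.group(0)}</fixed-case>"

    # Two or more consecutive capitals forming a whole word count as an acronym.
    return re.sub(r"\b[A-Z]{2,}\b", wrap, title)


if __name__ == "__main__":
    print(mark_fixed_case("A Survey of NLP Methods"))
    print(mark_fixed_case("HUMAN LANGUAGE TECHNOLOGY: Proceedings of a Workshop"))
    print(mark_fixed_case("HUMAN LANGUAGE TECHNOLOGY"))
```

The second call shows why partially capitalized titles are tricky: ordinary words such as HUMAN are treated as acronyms and protected, even though they should simply be re-cased.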
@@ -2,7 +2,7 @@
 <collection id="H93">
 <volume id="1">
 <meta>
-<booktitle><fixed-case>HUMAN</fixed-case> <fixed-case>LANGUAGE</fixed-case> <fixed-case>TECHNOLOGY</fixed-case>: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993</booktitle>
+<booktitle>Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993</booktitle>
The H, L, and T need <fixed-case>.
I'll close this since #700 is the same but bigger (although I didn't do booktitles yet).
H93 has a lot of text in all-caps. I fixed a bit of this by hand. It's probably a WIP overall, but this particular commit is a good start. I think it's better to do this bit-by-bit, opportunistically.