Reference links should be bigger #1043
Comments
What if having just a linked number is the intended output? If I explicitly want to have "Section 3.1" as the text link, I prefer to resort to I understand that the way I do it requires taking action at the LaTeX source level; maybe putting this extra feature behind an option, and letting users choose whether they want heavier post-processing of this kind, would be the best alternative (from my point of view).
Almost everybody on arXiv seems to do Adding it as an option sounds like a great idea. This probably applies to some of the other stuff we need to do in Engrafo too. It means the intention remains when you use LaTeXML to convert your own documents, but for the wild west of arXiv, we can enable options to massage documents into better output. This might even work as a plugin in the meantime... I shall investigate...
Sounds great! 😉
Sounds sorta doable, at least to the extent that the section-word isn't obfuscated with styling, and to the extent that you have a plausible set of type words (including plurals, abbreviations, etc).
If the objective is simply to make
If the objective is to recognize the preceding unit name, that sounds much trickier. I suppose it's probably most feasible at the XML level, perhaps a clever XPath? You'd also need a reasonable dictionary of such words. I'd probably solicit a clever hacker to scan arXiv to generate a list of all words preceding a
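To make the XML-level idea concrete, here is a minimal sketch of finding the word that sits directly before each reference element. It is an illustration under assumptions, not LaTeXML code: the element name `ref`, the attribute `labelref`, and the function name `words_before_refs` are placeholders chosen for the example, and real output would likely use namespaced tags that need the qualified form.

```python
import xml.etree.ElementTree as ET


def words_before_refs(xml_text, ref_tag="ref"):
    """Return (preceding_word, label) pairs for every reference element.

    The tag name and the "labelref" attribute are assumptions made for
    this sketch; adjust them to the schema actually in use.
    """
    root = ET.fromstring(xml_text)
    found = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != ref_tag:
                continue
            # Text before a child is either the parent's leading text
            # (first child) or the tail of the previous sibling.
            before = parent.text if i == 0 else children[i - 1].tail
            tokens = (before or "").split()
            found.append((tokens[-1] if tokens else None, child.get("labelref")))
    return found


sample = '<p>See Section <ref labelref="sec:foo"/> and Fig. <ref labelref="fig:bar"/>.</p>'
print(words_before_refs(sample))
```

The returned word would still have to be checked against a dictionary of unit names before the link is widened, since the token before a reference can be anything the author wrote.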
FWIW it’s also fine if you would rather consider this out of scope for LaTeXML. Seems like a good candidate for a plugin or an Engrafo post-processor.
Also, to be clear: yes, the original intention was the latter. Just rewriting \ref to \autoref will produce double unit names in most cases, won’t it?
@brucemiller fair enough, I quickly put together a llamapun stats harvester, and will report with some curious data when it completes over the 08.2018 dataset. Usual caveat that it takes ~3 days to walk the corpus with the current components in place. FWIW, it also feels out of scope for latexml to me, since it goes beyond the original TeX markup in a direction that may or may not align with the author's original intention.
Here is the report from arXMLiv 08.2018, with an excerpt of the top 10 words:
https://gist.github.com/dginev/c83d239524e1380f7b0e5e92a24a5eb2
Comments from a first look:
Ok, will leave things here for a first round of discussion. The data seems to convey that we could grab a shortlist of standardized words from the most frequent entries and auto-upcase them, but that we can't do that reliably for arbitrary words. And this sounds like a stylistic choice for the final presentation, so may be best implemented as a step following latexml, and left to the discretion of the final resource editor. P.S. I lowercased all words when counting the frequencies, to make the final report smaller.
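For readers who want to reproduce a small version of this kind of count, here is a rough sketch. It is not the llamapun harvester: it works on raw LaTeX strings rather than the converted corpus, and the function name `count_ref_words` and the regular expression are assumptions made for illustration. It lowercases each word before counting, as described above.

```python
import re
from collections import Counter

# A run of non-space, non-tie characters, then whitespace or a tie (~),
# then the start of a \ref{...} command.
_BEFORE_REF = re.compile(r"([^\s~]+)[\s~]+\\ref\{")


def count_ref_words(documents):
    """Count the (lowercased) word that directly precedes each \\ref."""
    counts = Counter()
    for text in documents:
        for word in _BEFORE_REF.findall(text):
            counts[word.lower()] += 1
    return counts


docs = [
    r"See Section~\ref{a} and section \ref{b}.",
    r"In Fig. \ref{c} we show the result.",
]
print(count_ref_words(docs).most_common())
```

A shortlist for link widening could then be taken from the most frequent entries, leaving rare or ambiguous words alone.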
I got no feedback on my report, I assume everyone here is busy. That said, is everyone OK with me closing here and continuing this line of work as a post-latexml step, where needed?
Sorry for delayed feedback; trying to get something else finished.
This is a curious issue; it certainly could be seen as being in scope of latexml, at least to the extent that it only broadens the width of the hyperlink without changing the actual text, and does so consistently and correctly enough that it isn't a new irritant.
Your list of words is a good start; much more than the top 10 are worth including. One concern is plurals, though. How do you deal with something like "equations \ref{eq:a}--\ref{eq:d}"? You might want a synthetic block that contains a--d. Alternatively, links to b and c as well. You probably don't want the equivalent of "\ref{equation a}--\ref{d}", and certainly not using the plural "equations".
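One cautious way to handle the plural and range concern is to detect ranges first and keep them out of any automatic widening. The sketch below only does the detection; it assumes ranges are written as two \ref commands joined by one to three hyphens, and the function names are hypothetical.

```python
import re

# \ref{x}, then one to three hyphens (optionally spaced), then \ref{y}.
_RANGE = re.compile(r"\\ref\{([^}]+)\}\s*-{1,3}\s*\\ref\{([^}]+)\}")


def find_ref_ranges(tex):
    """Return (first_label, last_label) for each \\ref--\\ref range."""
    return [(m.group(1), m.group(2)) for m in _RANGE.finditer(tex)]


def labels_in_ranges(tex):
    """Collect the labels that take part in a range, so a later widening
    pass can leave them untouched."""
    labels = set()
    for first, last in find_ref_ranges(tex):
        labels.update((first, last))
    return labels


text = r"See equations \ref{eq:a}--\ref{eq:d} and Section \ref{sec:x}."
print(find_ref_ranges(text))
```

Whether a detected range should then become a single synthetic link covering a--d, or separate links including b and c, is a design decision this sketch does not make.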
As I said, I'm not against this being part of LaTeXML (optional or default, depending on how consistently it can behave), nor am I against it being a post-processing operation. But I'm not likely to find time to do much on it till after a release. So, maybe I should punt to you guys whether we should close or defer to the next milestone?
I'm back to my original thinking that it's out of scope in the sense of going a bit too far in changing the author's intent; but at the same time, I'd like LaTeXML to be the kind of tool that can enable that sort of reworking. But it's really more plugin or post-processing territory. Since the thread has gone inactive, I'll go ahead and close, but if more discussion on strategies is wanted, feel free to reopen.
Section \ref{section-foo}
turns into output that looks like this:
The link is a tiny number, which is fiddly to click. It would be much better if the link were the entire text "Section 3.1". This makes more sense semantically, too.
Unfortunately this is not easy to do automatically, because the preceding text is defined by the author. In the pre-LaTeXML version of Engrafo, we made an attempt at turning these into larger links by looking for preceding strings like "section", "figure", "fig.", etc.
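As an illustration of that kind of string heuristic (not Engrafo's actual code), here is a minimal sketch that works on LaTeX source. It assumes the hyperref package, whose `\hyperref[label]{text}` and `\ref*{label}` commands let the whole phrase become the link; the word list and the function name `widen_refs` are hypothetical and deliberately short.

```python
import re

# Hypothetical shortlist of unit words; a real list would be longer.
UNIT_WORDS = ["section", "sec.", "figure", "fig.", "table", "equation", "eq."]

_UNIT = "|".join(re.escape(w) for w in UNIT_WORDS)
# A unit word, then whitespace or a tie (~), then \ref{label}.
_PATTERN = re.compile(r"\b(" + _UNIT + r")([~\s]+)\\ref\{([^}]+)\}", re.IGNORECASE)


def widen_refs(tex):
    """Wrap 'Unit \\ref{label}' so the unit word is part of the link."""
    def repl(match):
        unit, sep, label = match.group(1), match.group(2), match.group(3)
        # \ref* prints the number without creating a nested link.
        return "\\hyperref[" + label + "]{" + unit + sep + "\\ref*{" + label + "}}"
    return _PATTERN.sub(repl, tex)


print(widen_refs(r"see Section \ref{section-foo}."))
```

Plurals such as "Sections" are not in the list, so they are left unchanged; anything the author phrased differently is also left as a plain numeric link.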