It is the attributes of HTML and XML tags that must be strongly encoded, for security reasons. The code that does this is in com/cohort/util/XML.java in the method called encodeAsHTMLAttribute. The JavaDoc for that method explains:
* For security reasons, for text that will be used as an HTML or XML attribute,
* this replaces non-alphanumeric characters with HTML Entity &#xHHHH; format.
* See HTML Attribute Encoding at
* https://owasp.org/www-pdf-archive/OWASP_Cheatsheets_Book.pdf
* pg 188, section 25.4
* "Encoding Type: HTML Attribute Encoding
* Encoding Mechanism:
* Except for alphanumeric characters, escape all characters with the HTML Entity &#xHH;
* format, including spaces. (HH = Hex Value)".
* On the need to escape HTML attributes: http://wonko.com/post/html-escaping
Both of the links there are interesting reading.
One might argue that in some circumstances this strict encoding is not necessary. Perhaps. Perhaps not. The problem is that trying to make that determination case by case is very time-consuming (even if we assume the programmer has 100% understanding of the situation) and error-prone. It is vastly simpler and (more importantly) vastly safer to just routinely encode all attributes in the safe and recommended way.
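To make the quoted rule concrete, here is a minimal Python sketch of "escape everything except alphanumerics as a hex character reference". It is not the code in com/cohort/util/XML.java: the function name `encode_as_html_attribute`, the lowercase hex digits, and the ASCII-only definition of "alphanumeric" are all assumptions made for illustration.

```python
import html


def encode_as_html_attribute(text: str) -> str:
    """Hypothetical sketch of the OWASP attribute-encoding rule.

    ASCII letters and digits pass through unchanged; every other
    character becomes a hexadecimal character reference (&#xHH;).
    This is NOT ERDDAP's implementation, only an illustration.
    """
    pieces = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            pieces.append(ch)
        else:
            pieces.append(f"&#x{ord(ch):x};")
    return "".join(pieces)


encoded = encode_as_html_attribute("Mean_v0_0_iso19115.xml")
print(encoded)                 # underscores and the period are now entities
print(html.unescape(encoded))  # decoding restores the original string
```

Because the encoder never has to decide which characters are dangerous in which context, there is no case-by-case analysis to get wrong, which is the trade-off described above.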
ERDDAP does some bizarre name munging to HTML entities in XML listings.
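For example, in https://gcoos4.tamu.edu/erddap/metadata/iso19115/xml/ there are numerous href values in which characters such as underscores and periods are written as HTML entities. One such value, shown here in its decoded form:
2004JuvenileSportfishNOAA_DATA_Mean_v0_0_iso19115.xml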
Most browsers will transform this, but I have had issues with following links in some Python libraries if these HTML entities aren't explicitly unescaped beforehand. It's also a pretty odd way to represent simple characters like periods and underscores where the usual characters would suffice. Is there any reason why these characters shouldn't be used directly instead of being encoded as HTML entities?
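For anyone hitting the same problem on the client side, one possible workaround is to extract links with a real HTML parser, not with string matching. The sketch below uses only the Python standard library; `html.parser.HTMLParser` replaces character references in attribute values before passing them to `handle_starttag`. The sample markup is an assumption about what the listing's encoded href looks like, and the class name `HrefCollector` is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with entities already decoded
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Assumed shape of one entry in the directory listing (illustrative only).
sample = (
    '<a href="2004JuvenileSportfishNOAA&#x5f;DATA&#x5f;Mean'
    '&#x5f;v0&#x5f;0&#x5f;iso19115&#x2e;xml">file</a>'
)

collector = HrefCollector()
collector.feed(sample)
collector.close()

base = "https://gcoos4.tamu.edu/erddap/metadata/iso19115/xml/"
links = [urljoin(base, href) for href in collector.hrefs]
print(links[0])
```

If a library hands back the raw attribute text instead, calling `html.unescape` on the value before building the URL should have the same effect.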