-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translation tooltip images have extra encoding in paths #81
Comments
I admit that some of the text (e.g., urls) in HTML attributes are excessively encoded. However, I used to have a method (I think it was called minimallyEncodeAsHTMLAttribute()) which encoded a few more characters than what you suggested to minimally encode text that was stored as HTML attributes. However, that is insufficient for security reasons, notably the danger or cross-site scripting (XSS). See You can say, "yeah, but surely you don't need to encode that aggressively". But I think the problem was that ERDDAP was being flagged by security scans because of the possibility of XSS vulnerabilities. The mere presence of certain characters (characters you wouldn't think to be trouble) was flagged. So the situation is not a simple as one might expect and the safe thing was to aggressively encode the attributes (which has not caused any problems for ~8 years, until now). I suspect the happy medium is somewhere between what minimallyEncodeAsHTMLAttribute did and the current system, but it is exceedingly hard to know what that set of characters that need to be encoded (to avoid trouble with security scans) is. So, I will try to less aggressively encode those particular urls so the problem goes away for you. You say "it's hard to make good rewrite rules", but encodeAsHTMLAttributes just has one rule: if the character isn't a letter or a digit, then encode it as &#xhh; (where hh is the hexadecimal version of the character number). That is a legitimate encoding of the HTML attribute. So it seems like a solution for you is simply to decode those HTML attributes via the inverse of that rule. This is a subset of the standard way of encoding HTML attributes. You'll always have to deal with the possibility of some characters being encoded this way. I don't know what tools you have at your disposal in NGINX, but hopefully you can write the inverse of that rule if a built in function doesn't already exist. I admit I don't quite understand the problem you face. After all, when the browser reads the HTML, one of the first things it does is decode the attributes. It's not like a browser is ever going to redirect anyone to http:&x2f;&x2f;erddap1&x2e;8080&x2f;erddap&x2f;index&x2e;html. User's get redirected to the decoded URL. So the problem is that your system (NGINX?) isn't decoding the attribute, which is something it needs to do in any case. The current encoding may be excessive, but it isn't illegal or incorrect. |
You said "it's hard to make good rewrite rules", but you didn't say what tools you have at your disposal. Since you say it's hard, |
I found the updated link for the OWASP Cheatsheets Book: So my current method is following the OWASP advice exactly. It would be great if we could solve the problem you face via a change in your NGINX configuration. |
Hi!
I'm setting up ERDDAP behind an nginx proxy and using the sub module to rewrite the HTML content to have the proper URLs for our host (mostly because I'm planning a federated install at some point in the future and will only have one host name, so I'm doing something like http://publichost/1/erddap --> http://erddap1:8080/erddap, http://publichost/2/erddap --> http://erddap2:8080/erddap).
One issue I've noticed with this is that some links are getting very encoded to the point where it's hard to make good rewrite rules (notably where there's a tooltip and an image) - this seems to be the doing of XML.encodeAsHTMLAttribute() which is converting everything that isn't a digit or letter into an entity. The result is a URL like http://erddap1.org:8080/erddap/index.html is actually http:&x2f;&x2f;erddap1&x2e;8080&x2f;erddap&x2f;index&x2e;png which makes using sub more challenging.
It would be nice for these URLs to be consistent with the rest of the URLs on the page (which are not encoded in this fashion). I think it is safe to include :/. in the characters that encodeAsHTMLAttribute() would not encode at least, and generally only &, quotation marks, and those characters not supported in your character encoding need encoding (so codepoints > 255)
The text was updated successfully, but these errors were encountered: