Base tag is added incorrectly if HTML is minified #1626

Closed
totemcaf opened this issue Oct 30, 2017 · 4 comments

totemcaf commented Oct 30, 2017

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):

"base", "add-base-url"


This is a BUG REPORT.

NGINX Ingress controller version:

gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.15

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.8", GitCommit:"bc6162cc70b4a39a7f39391564e0dd0be60b39e9", GitTreeState:"clean", BuildDate:"2017-10-05T06:35:40Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:

    AWS

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="16.04.3 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.3 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

  • Kernel (e.g. uname -a):
    Linux router-nginx-ingress-controller-1231038799-84q4v 4.4.78-k8s #1 SMP Fri Jul 28 01:28:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

Helm chart: nginx-ingress-0.8.8

  • Others:

What happened:

When the served page does not have the "head" tag on a line by itself:

<!doctype html>
<html>
  <head><meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

the "base" tag is inserted after the last tag on that line:

<!doctype html>
<html>
  <head><meta charset="utf-8"><base href="https://sandbox.fravega.com/devopsless/">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

If the page is minified, all content (tags) is on a single line, so the "base" tag is added at the very end of the content, outside the "head" element and after every URL that should use it. As a result, resources are requested from the wrong URLs and the page fails to load.
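Why the position matters: a relative URL in the page is resolved against the document URL unless a base URL is already in effect. A small Go sketch with net/url, assuming (hypothetically) that the page was requested as https://sandbox.fravega.com/devopsless without a trailing slash and references a hypothetical main.js:

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve resolves a relative reference against a base URL following
// RFC 3986, which is how src/href attributes are resolved.
func resolve(base, ref string) string {
	b, err := url.Parse(base)
	if err != nil {
		panic(err)
	}
	r, err := url.Parse(ref)
	if err != nil {
		panic(err)
	}
	return b.ResolveReference(r).String()
}

func main() {
	// Hypothetical document URL, used when no base tag applies yet.
	fmt.Println(resolve("https://sandbox.fravega.com/devopsless", "main.js"))
	// href of the base tag injected by the controller in this report.
	fmt.Println(resolve("https://sandbox.fravega.com/devopsless/", "main.js"))
}
```

The first call yields https://sandbox.fravega.com/main.js and the second https://sandbox.fravega.com/devopsless/main.js; the first is, roughly, what resources referenced before a misplaced base tag end up requesting.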

What you expected to happen:

The "base" tag is added immediately after the opening "head" tag:

<!doctype html>
<html>
  <head><base href="https://sandbox.fravega.com/devopsless/"><meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">

How to reproduce it (as minimally and precisely as possible):

  • Provide a pod that can serve an HTML page.

  • Configure the Ingress with annotation ingress.kubernetes.io/add-base-url: true

  • Serve a page whose <head> tag shares a line with other tags (minified):

    <!doctype html><html><head></head><body>Some body</body></html>
    

Anything else we need to know:

The problem is the regular expression used in the file https://github.com/kubernetes/ingress-nginx/blob/master/pkg/nginx/template/template.go:

'<head(.*)>'

It uses the greedy operator .*, which matches as much as possible: it runs to the last ">" on the line instead of stopping at the end of the opening "head" tag.
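A Go sketch of the greedy behavior. Go's regexp package uses RE2 rather than the PCRE engine nginx uses, but for this pattern both pick the same match, and in both "." does not match a newline. The replacement "<head$1>" followed by the base tag is an assumption, chosen because it reproduces the output shown above; the helper name is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

// greedyHead is the pattern quoted from template.go in this report.
var greedyHead = regexp.MustCompile(`<head(.*)>`)

// injectGreedy rebuilds "<head$1>" and appends baseTag after it.
// baseTag must not contain '$', which ReplaceAllString would expand.
func injectGreedy(page, baseTag string) string {
	return greedyHead.ReplaceAllString(page, "<head${1}>"+baseTag)
}

func main() {
	base := `<base href="/devopsless/">`
	// Multi-line page: "." stops at the newline, so the match ends at
	// the last ">" of the line holding <head>.
	fmt.Println(injectGreedy("<html>\n  <head><meta charset=\"utf-8\">\n    <meta name=\"viewport\">", base))
	// Minified page: the match runs to the final ">" of </html>.
	fmt.Println(injectGreedy(`<!doctype html><html><head></head><body>Some body</body></html>`, base))
}
```

On the minified page the base tag lands after </html>, which is the failure described in this issue.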

I'll look for a better regular expression to use.

@totemcaf changed the title from "Base tag is added incorretly if HTNML" to "Base tag is added incorrectly if HTML is minified" on Oct 30, 2017

totemcaf (Author) commented:

A proposed regular expression, with the whole opening tag captured as group 1 so that $1 in the replacement refers to it:

(<(?:H|h)(?:E|e)(?:A|a)(?:D|d)(?:[^">]|"[^"]*")*>)

with the replacement:

$1<base href="%v://$http_host%v">

The expression finds a "<" followed by "head" in any case, then skips every character that is neither a quote nor ">". When a quote is found, it skips everything up to the next quote, and it repeats this until the closing ">" is found, where it stops.
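A Go sketch of the proposed pattern with the replacement "$1" followed by the base tag (written ${1} in Go). The base href and the helper name are hypothetical, and this applies the replacement to every match:

```go
package main

import (
	"fmt"
	"regexp"
)

// headOpen is the proposed pattern in one capturing group, so ${1}
// re-emits the complete opening <head ...> tag.
var headOpen = regexp.MustCompile(`(<(?:H|h)(?:E|e)(?:A|a)(?:D|d)(?:[^">]|"[^"]*")*>)`)

// injectBase appends baseTag after every match (global replacement).
// baseTag must not contain '$', which ReplaceAllString would expand.
func injectBase(page, baseTag string) string {
	return headOpen.ReplaceAllString(page, "${1}"+baseTag)
}

func main() {
	base := `<base href="/devopsless/">`
	// Minified page from the reproduction steps.
	fmt.Println(injectBase(`<!doctype html><html><head></head><body>Some body</body></html>`, base))
	// Upper-case tag with a quoted attribute value containing ">".
	fmt.Println(injectBase(`<HEAD data-x="a>b"><title>t</title></HEAD>`, base))
}
```

In both cases the base tag is placed right after the opening head tag. In Go the case alternations could be shortened with the (?i) flag, e.g. (?i)(<head(?:[^">]|"[^"]*")*>).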

totemcaf (Author) commented:

Additional optimization:

  • Use option "o" to replace the first match only (can there be two or more "head" tags?)
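On the question in parentheses: the proposed pattern also matches HTML5 <header> tags (the "er" is consumed by [^">]), so with a global replacement the base tag would be injected there too. A Go sketch comparing a global replacement with a first-match-only one, which is the behavior the "o" flag is described as providing above; the page and helper names are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

var headOpen = regexp.MustCompile(`(<(?:H|h)(?:E|e)(?:A|a)(?:D|d)(?:[^">]|"[^"]*")*>)`)

// injectAll appends baseTag after every match (no "o" flag).
// baseTag must not contain '$', which ReplaceAllString would expand.
func injectAll(page, baseTag string) string {
	return headOpen.ReplaceAllString(page, "${1}"+baseTag)
}

// injectFirst appends baseTag after the first match only; the page is
// returned unchanged when no opening head tag is found.
func injectFirst(page, baseTag string) string {
	loc := headOpen.FindStringIndex(page)
	if loc == nil {
		return page
	}
	return page[:loc[1]] + baseTag + page[loc[1]:]
}

func main() {
	page := `<html><head></head><body><header>Menu</header></body></html>`
	base := `<base href="/devopsless/">`
	fmt.Println(injectAll(page, base))   // base tag also injected inside <header>
	fmt.Println(injectFirst(page, base)) // base tag only after <head>
}
```

Since the opening head tag precedes any header element in a normal document, replacing only the first match keeps the base tag in the right place.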

aledbf (Member) commented Oct 30, 2017

@totemcaf please check #1433

totemcaf (Author) commented:

Moving comments to #1433 and closing this.
