Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The hostname is replaced globally in the HTML, even inappropiately #647

Closed
Flimm opened this issue May 28, 2015 · 15 comments · May be fixed by tobybellwood/govstrap#4
Closed

The hostname is replaced globally in the HTML, even inappropiately #647

Flimm opened this issue May 28, 2015 · 15 comments · May be fixed by tobybellwood/govstrap#4
Labels

Comments

@Flimm
Copy link

Flimm commented May 28, 2015

Take this command:

 browser-sync start --proxy experiment

(I have set up /etc/hosts to make http://experiment work.)

The index.html file being served contains this content:

    <!doctype html>
    <html><body>
    <p>This is an experiment</p>
    <p><img src="/images/experiment/example.jpg"></p>
    <!-- This experiment should prove interesting -->
    <p><a href="http://experiment/example.html">Link</a></p>
    </body></html>

However, the content that ends up being served on http://localhost:3002 looks like this:

    <!doctype html>
    <html><body><script type='text/javascript' id="__bs_script__">//<![CDATA[
        document.write("<script async src='/browser-sync/browser-sync-client.2.7.5.js'><\/script>".replace("HOST", location.hostname));
    //]]></script>
    <p>This is an localhost:3002</p>
    <p><img src="/images/localhost:3002/example.jpg"></p>
    <!-- This localhost:3002 should prove interesting -->
    <p><a href="//localhost:3002/example.html">Link</a></p>
    </body></html>

It seems like Browsersync is simply replacing all instances of experiment it finds with localhost:3002. This is noticeable straight away, as the image fails to load.

There are several errors:

  • Text nodes should be left as is (eg: <p>This is an experiment</p> should remain as is).
  • Attributes that refer to URLs should treat URLs intelligently, transforming /images/experiment/example.jpg to /images/localhost:3002/example.jpg is incorrect, but transforming http://experiment/example.html to //localhost:3002/example.html is correct.
  • Replacing text inside comments should not happen.
@shakyShane
Copy link
Contributor

Because we perform this 'rewriting' on each request, it must be fast. This is why we only use simple regex replacements on the HTML.

Because of the simplicity of it, we don't have any context of a text node versus any other.

It's usually not an issue for links/images if you use a tld in your hostname (which you should do anyway). So if you use experiment.dev none of your example would be an issue.

@shakyShane
Copy link
Contributor

basically, we don't want to get into the game of trying parse html on every request, so we try our best to make it work with string replacements only

so your example should be something like

 <!doctype html>
    <html><body>
    <p>This is an experiment</p>
    <p><img src="/images/experiment/example.jpg"></p>
    <!-- This experiment should prove interesting -->
    <p><a href="http://experiment.dev/example.html">Link</a></p>
    </body></html>

which would result in

<!doctype html>
    <html><body><script type='text/javascript' id="__bs_script__">//<![CDATA[
        document.write("<script async src='/browser-sync/browser-sync-client.2.7.5.js'><\/script>".replace("HOST", location.hostname));
    //]]></script>
    <p>This is an experiment</p>
    <p><img src="/images/experiment/example.jpg"></p>
    <!-- This experiment should prove interesting -->
    <p><a href="//localhost:3002/example.html">Link</a></p>
    </body></html>

@Flimm
Copy link
Author

Flimm commented May 28, 2015

I've seen other developers put a domain without a TLD in hosts so I don't think I'm the only one. It makes sense to me, .dev for example is a real TLD and there's a chance that experiment.dev is a real registered domain, whereas a domain without a TLD is unambiguously special.

In any case, I'm not asking that the HTML be parsed, you could use a regex. It's hacky, but less hacky than what exists now.

Instead of:

html = html.replace(original, target, "g");

Could we do this instead:

/* original = 'experiment'; target = 'http://localhost:3030'; */
var regexp = new RegExp("([\"'])https?://" + escapeRegExp(original) + "\\b", "g")
html = html.replace(regexp, function(m, group) { return group + target; });

In the spirit of doing our best to make it work without parsing the HTML, could this be re-opened please?

If for some reason, the regex cannot be fixed, can we at least warn developers when they use a domain without a TLD, letting them know that they're likely to run into bugs, and that they need to work around it?

@shakyShane
Copy link
Contributor

Feel free to contribute to the foxy module where you'll see that's there's more to this than first appears.

For example, of course we don't just do html = html.replace(original, target, "g");, it's actually https://github.com/shakyShane/foxy/blob/master/lib/utils.js#L61-L120

Also, your example would not work with things such as <a href="/experiment/test">Link</a> - you can look at the foxy test suite to get an idea about why we settled for the current solution.

@shakyShane
Copy link
Contributor

@shakyShane
Copy link
Contributor

@Flimm
Copy link
Author

Flimm commented May 28, 2015

Hmm, I had a look at that code you pointed out and it looks like the bug isn't in foxy, as those tests cover the cases I'm talking about. I'm having trouble finding where the problem is.

@Flimm
Copy link
Author

Flimm commented May 28, 2015

In fact, I just tried your solution of using experiment.dev instead of experiment suggested in this comment, and it doesn't work.

The source code:

<!doctype html>
<html><body>
<p>This is an experiment.dev</p>
<p><img src="/images/experiment.dev/example.jpg"></p>
<!-- This experiment.dev should prove interesting -->
<p><a href="http://experiment.dev/example.html">Link</a></p>
</body></html>

gets transformed into:

<!doctype html>
<html><body><script type='text/javascript' id="__bs_script__">//<![CDATA[
    document.write("<script async src='/browser-sync/browser-sync-client.2.7.5.js'><\/script>".replace("HOST", location.hostname));
//]]></script>
<p>This is an localhost:3002</p>
<p><img src="/images/localhost:3002/example.jpg"></p>
<!-- This localhost:3002 should prove interesting -->
<p><a href="//localhost:3002/example.html">Link</a></p>
</body></html>

which is clearly wrong. It should be:

<!doctype html>
<html><body><script type='text/javascript' id="__bs_script__">//<![CDATA[
    document.write("<script async src='/browser-sync/browser-sync-client.2.7.5.js'><\/script>".replace("HOST", location.hostname));
//]]></script>
<p>This is an experiment.dev</p>
<p><img src="/images/experiment.dev/example.jpg"></p>
<!-- This experiment.dev should prove interesting -->
<p><a href="//localhost:3002/example.html">Link</a></p>
</body></html>

So clearly, even with a TLD, the bug still exists. Could you re-open this issue?

@shakyShane shakyShane reopened this May 28, 2015
@shakyShane
Copy link
Contributor

ok so I have the test passing now with your example over in shakyShane/foxy@33d2a94

If somebody can swoop in and tidy up the regex i'd be very grateful :)

@Flimm
Copy link
Author

Flimm commented May 29, 2015

Thanks, @shakyShane , I'm new to Node and I'm not sure why I came to the conclusion that the bug wasn't in foxy. Your solution looks great to me.

@Flimm
Copy link
Author

Flimm commented May 30, 2015

Just for your interest, I recently discovered that there are registered domains that are comprised solely of a TLD! The example I discovered is http://uz, so I guess even http://experiment could become registered at some point.

@Flimm
Copy link
Author

Flimm commented Jun 12, 2015

Thanks @shakyShane , this works beautifully!

@shakyShane
Copy link
Contributor

@Flimm thanks to you for persisting :)

@shark0der
Copy link

Hello guys. Just bumped into the same issue when using "wp" for hostname on proxy target which is a WordPress installation, which of course has tons of classes that start with wp- and a lot of those got wrongly replaced.

Since it is nearly impossible to properly replace all URLs, is it possible to add an option to disable it completely? In my scenario, I've set the base URL of the WordPress installation to localhost so there no need to replace anything at all, so just some rewriteUrls: false would be very handy.

@giovannipds
Copy link

Hello people. I'm facing a similar problem. Let's see:

Proxy name: companyname

One page has an anchor to an internal tab, with the name of the company, like so (using Bootstrap, tabs through hashes):

<ul role="tablist">
    <li class="active"><a aria-controls="companyname" aria-expanded="true" data-toggle="tab" href="#companyname" role="tab">Company Name</a></li>
    <li><a aria-controls="anothertab" aria-expanded="false" data-toggle="tab" href="#anothertab" role="tab">Another Tab</a></li>
</ul>
<div class="tab-pane active" id="companyname" role="tabpanel">Company Name's content</div>
<div class="tab-pane" id="anothertab" role="tabpanel">Another Tab's content</div>

So, it is getting replaced by:

<ul role="tablist">
    <li class="active"><a aria-controls="localhost:3000" aria-expanded="true" data-toggle="tab" href="#companyname" role="tab">Company Name</a></li>
    <li><a aria-controls="anothertab" aria-expanded="false" data-toggle="tab" href="#anothertab" role="tab">Another Tab</a></li>
</ul>
<div class="tab-pane active" id="localhost:3000" role="tabpanel">Company Name's content</div>
<div class="tab-pane" id="anothertab" role="tabpanel">Another Tab's content</div>

What's curious is that aria-controls is replaced, id is replaced, but href="#companyname" is not.

Is there a way to prevent this without changing the proxy name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants