Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

not correctly joining lines for HTML #1033

Closed
garretwilson opened this issue Sep 24, 2016 · 26 comments
Closed

not correctly joining lines for HTML #1033

garretwilson opened this issue Sep 24, 2016 · 26 comments

Comments

@garretwilson
Copy link

This is really hurting me. Here is an example HTML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">

<head>
    <title>Test Line Join</title>
</head>

<body>
    <p>The key words <q><span class="spec-must">must</span></q>, <q><span

          class="spec-must-not">must not</span></q>, <q><span class="spec-must">required</span></q>, <q><span

          class="spec-must">shall</span></q>, <q><span class="spec-must-not">shall not</span></q>, <q><span

          class="spec-should">should</span></q>, <q><span class="spec-should-not">should not</span></q>, <q><span

          class="spec-should">recommended</span></q>, <q><span class="spec-may">may</span></q>, and <q><span

          class="spec-may">optional</span></q> in this document are to be interpreted as described in <a href="#ref-rfc2119" class="ref">RFC 2119</a>.</p>
</body>

</html>

js-beautify should be getting rid of all that whitespace inside the <span …> tag. I want js-beautify to give me this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">

<head>
    <title>Test Line Join</title>
</head>

<body>
    <p>The key words <q><span class="spec-must">must</span></q>, <q><span class="spec-must-not">must not</span></q>, <q><span class="spec-must">required</span></q>, <q><span class="spec-must">shall</span></q>, <q><span class="spec-must-not">shall not</span></q>, <q><span class="spec-should">should</span></q>, <q><span class="spec-should-not">should not</span></q>, <q><span class="spec-should">recommended</span></q>, <q><span class="spec-may">may</span></q>, and <q><span class="spec-may">optional</span></q> in this document are to be interpreted as described in <a href="#ref-rfc2119" class="ref">RFC 2119</a>.</p>
</body>

</html>

But it doesn't. This is a huge problem and is really dragging down the workflow. (The reason the lines are like that to begin with is a bug in BlueGriffon. So bugs in js-beautify are preventing it from fixing the bugs in BlueGriffon. Such is life.)

See #841.

Perhaps you could point me to a place in the code I might go to try to put in some workaround?

@bitwiseman bitwiseman added this to the v2.0.0 milestone Sep 28, 2016
@HookyQR
Copy link
Contributor

HookyQR commented Feb 10, 2017

q tags are not formatted by default. If you add an "unformated" option to your .jsbeautifyrc file, with the defaults minus the q you'll get each of the q groups on their own line at least, with the spans correctly un-whitespaced.

eg:

"unformatted": ["a", "abbr", "area", "audio", "b", "bdi", "bdo", "br", "button", "canvas", "cite", "code", "data",
		"datalist", "del", "dfn", "em", "embed", "i", "iframe", "img", "input", "ins", "kbd", "keygen", "label", "map",
		"mark", "math", "meter", "noscript", "object", "output", "progress", "ruby", "s", "samp", "select", "small", "span",
		"strong", "sub", "sup", "svg", "template", "textarea", "time", "u", "var", "video", "wbr", "text", "acronym",
		"address", "big", "dt", "ins", "strike", "tt"],

@garretwilson
Copy link
Author

garretwilson commented Feb 10, 2017

Yes, but you see @HookyQR , I want the <q> tags formatted --- I just want them inline, not block. (That is, I do not want them each on a separate line.) js-beautify conflates the two concepts of "formatted" and "inline/block", which are completely different things. See #841, as I mentioned above.

@bitwiseman I want to help and not just complain. As I mentioned above, could you point me to where I might look in the code to track this down? Or do we have to just rewrite it all from scratch?

@bitwiseman
Copy link
Member

@garretwilson - just rewrite it all from scratch. 👍 Sooner or later, the HTML and CSS beautifiers will need to be rewritten to at least have a parse stage like I did with the JS beautifier. It's just a major undertaking.

@garretwilson
Copy link
Author

garretwilson commented Apr 18, 2017

Yes, but I'm extremely, extremely overloaded with work --- and every time I add a tool to help me with my work, it seems to have a huge bug that adds more to my workload. As an example, the reason I need this capability in the first place is that BlueGriffon has a stupid bug that the owner ignores by making line breaks even when we request in the configuration never to break lines, and so I have to use js-beautify to fix that, and it has bugs that doesn't join lines... and now Atom introduced a new kludge where it is automatically soft-wrapping lines (because it crashes on what it considers long lines) --- one after another, more bugs in the tools to fix bugs.

So @bitwiseman , would you or anyone else be interested in fixing this bug or at least creating a workaround in the code, for payment? In general js-beautify for XHTML is hanging together, and if we could fix a couple of these big issues it would really help. How much would you charge to do something about this? Please contact me (you or anyone else) offline. Thanks.

@bitwiseman
Copy link
Member

bitwiseman commented Apr 26, 2017

@garretwilson
Thanks for the offer of compensation. I appreciate the thought. For me, much like @Glavin001, my main limitation is not money, but time.

What I need is people who can write code and tests that I don't have to spend cycles debugging or fixing to join this project and contribute fixes. Until that happens, issues will continue to be addressed as I find time (that is to say, very slowly). Sorry, but that's the reality.

@garretwilson
Copy link
Author

OK, so I'm trying out this Bountysource thing. While I want to eventually address js-beautify and atom-beautify problems, I'm going directly to the ultimate source of some of pain, and then work my way up if there is any success.

https://www.bountysource.com/issues/29803090-line-wrapping-setting-ignored

  1. Basically the whole thing starts because the BlueGriffon editor has the blatant bug line wrapping setting ignored therealglazou/bluegriffon#7 that arbitrarily adds line breaks in the middle of elements even though line wrapping are turned off. (To top it off, it adds them as LF even on the Windows platform, where it uses CRLF on the lines I really want: inappropriate addition of LF line wrapping on Windows machines therealglazou/bluegriffon#5. And you'll remember that js-beautify at one time would choose the wrong line ending: Erroneously changes CRLF to LF on Windows. #829. So there were bugs upon bugs upon bugs. Thank goodness Erroneously changes CRLF to LF on Windows. #829 is fixed. Thank you very much)

  2. Because of "unformatted" paradigm broken, "unformatted" and "inline" are not the same #841 , atom-beautify refuses to re-join the lines that should have never been broken in the first place.

So when one tool's bugs prevent working around another tool's bugs, it leaves me frustrated and a bit grouchy. Cross your fingers that this Bountysource thing works, and if so I can go up the chain start knocking off some of the nasty js-beautify ones as well, such as this one.

@garretwilson
Copy link
Author

garretwilson commented May 10, 2017

@bitwiseman , I've thought about this a lot, and I think I have a way forward with this issue. I would like to put up a bounty on Bountysource for this bug, but before I put up the money I want to get your approval for the solution so I can know it would be integrated.

The following solution is based on the understanding that the current logic must add newlines in two separate cases:

  • Whenever it thinks that a line should be wrapped (if wrapping is turned on). I'll refer to this as a "wrap-newline".
  • Whenever it is ready to start a new element that is not in unformatted. I'll refer to this as a "block-newline".

Here is what I propose:

  1. Create a new setting inline with the exact same value as unformatted. (These are the HTML5 inline elements you and I discussed and which you fixed in list of HTML inline elements incomplete; wraps inappropriately #840.)
  2. Find the line(s) in which the current formatting logic adds a block-newline. (All the code for wrap-newline will go untouched.) For every block-newline occurrence, add the following logic: "if the new element (to appear after the block-newline) is in inline, do not add the block-newline."

I have not looked at the code, but seriously, that seems pretty simple and straightforward to me, regardless of the complexity of the code. There has to be some line(s) at which you perform a block-newline break, and I don't see why one can't just do a check to see if the new element is in the newline list or not. (I could be completely wrong about the feasibility of this solution, but that would have to be some really jumbled code not to allow this approach.)

With the above approach, if no changes are made to a user's configuration, the formatting will function 100% exactly as it did before. However for those of us who realize there is a distinction between "inline vs block" and "formatted vs unformatted" (#841), we'll merely set our unformatted setting to "" (that is, we don't want any elements to be "unformatted"). Suddenly all of our <a>, <span>, <q>, etc. elements will be formatted correctly, yet without them being placed on separately lines as if they were block elements!

@bitwiseman I'd like to get your approval that you would integrate such a solution if I were to write it or pay someone else to do it.

Thank you! I think we all want to get this fixed, and I'm trying the best I can to find a way to help get it done.

@bitwiseman
Copy link
Member

@garretwilson -

Either way, I'd rather see the behavior change and be consistent for all users. I've never argued against the correctness of what you described in #841, only that I didn't have bandwidth to do it.

I'd like to make sure you understand, the bounty poster does not get any say in how a fix is designed or implemented. The bounty is paid when the issue is fixed. Period. I appreciate you proposing a design and asking for a review, whoever submits a fix is not required to pay any attention to that proposal.

Are you planning to bounty this issue or #841?

@garretwilson
Copy link
Author

Either way, I'd rather see the behavior change and be consistent for all users.

Sure, I'd like that even better! I just wasn't sure you wanted to make such a change that would affect some behavior without increasing the major version number.

We could therefore set unformatted to "", and move all the current unformatted elements to inline. That would make all the inline element format correctly, but still be inline; and by default no elements would be "unformatted".

If people considered this issue a bug, then everyone would be happy. The only problem would occur for people who both 1) thought "unformatted" really meant "unformatted", and 2) did not change the value of "unformatted". They would then be surprised when js-beautify started formatting their elements (while still leaving them inline). But I would imagine that if people are using unformatted, they have explicitly specified which elements they want by overriding unformatted, so they would never see any difference.

The only reason I suggested leaving unformatted the way it was is that I was afraid you wouldn't want existing behavior changed, and I was trying to make it more palatable for you.

The bounty is paid when the issue is fixed. Period.

If they want to rewrite the thing, that's fine with me. I was just suggesting an ultra-easy approach to getting it done and making some money. I just wanted to make sure you were fine with it.

Are you planning to bounty this issue or #841?

It makes no difference to me --- it may be that they are really the same thing: #841 explains the design problem, while this issue #1033 illustrates a negative behavior resulting from the design problem. Do you have a preference on which issue I put the bounty on? And more importantly, are you OK with what I proposed? (Because I imagine the bounty person will choose the easiest path, and I don't think there could be an approach much easier than what I laid out above.)

(If this discussion goes on much longer, I may just do it myself...)

@garretwilson
Copy link
Author

Do you have a preference on which issue I put the bounty on? And more importantly, are you OK with what I proposed?

Did you have an opinion here? I'll be putting up actual dough, so it would be nice to know ahead of time if you're OK with this direction...

@garretwilson
Copy link
Author

garretwilson commented May 24, 2017

Never mind; it's easier just to start the thing.

https://www.bountysource.com/issues/41908855-not-correctly-joining-lines-for-html

@garretwilson
Copy link
Author

I just realized that I had posted the incorrect bounty link a year ago. I updated it. I see someone says they are working on it. Please let me know any questions you have or clarifications you need. Thanks.

@madman-bob
Copy link
Contributor

Thanks. I'm replacing unformatted tags with inline tags, the intent of unformatted seeming to be contained in content_unformatted.

It's not a question, but may I thank you for the large number of tests. It makes me confident that I'm not breaking anything with my changes.

@garretwilson
Copy link
Author

garretwilson commented Jun 1, 2018

@madman-bob , you don't know what a huge help this will be to me. Anything you need on my end, just ask.

@garretwilson
Copy link
Author

And just in case you missed it, @madman-bob , #841 has a more in-depth discussion of the semantics of this bug. I recommend reading that ticket as well, if you haven't already. Thanks!!

@madman-bob
Copy link
Contributor

Ready for review: #1407

@garretwilson
Copy link
Author

@madman-bob , I'm anxious to try this. What's the easiest way I can try this out? I'm used to using this through the Atom editor atom-beautify plugin, which uses js-beautify. Is there a quick way you recommend for testing? Do I just clone some repository and run some script on my source code?

@bitwiseman
Copy link
Member

@garretwilson
Copy link
Author

Hi, @madman-bob and @bitwiseman . Just wanted to circle back and say I'm still extremely excited about this, and I'm trying to grab time to test it. From reading the js-beautify documentation, it looks the easiest way for me to test it on real-life documents is to install js-beautify via npm. (I'm not ready for the whole Python environment mess yet.) But after working on several projects all day, my mind is too exhausted to install npm, so I'll have a wait a couple of more days.

I've been using .jsbeautifyrc with atom-beautify, but I'm sure I can just pass in the individual config properties to the CLI.

Thanks again, guys. I'll test this as soon as I can.

@bitwiseman
Copy link
Member

@garretwilson
It is important to me that you have a chance to test this before I merge this.
I don't have an npm package for you, so you'll need to pull the files linked above and copy them into the folder where you have the beautifier installed. Sorry for the hacky process, but it is what I have right now.

@garretwilson
Copy link
Author

I apologize for not getting to this yet. I'm being overloaded with a couple of very stressful things right now. I'll try my best to get to this by this weekend.

So if I pull the new files, do you think I can simply replace the ones in the atom-beautify plugin? If so, that would make this really simple to test.

@garretwilson
Copy link
Author

As promised I just got a chance to test this over the weekend.

This is amazing!! This is some of the best news! It appears to be working. There are a few observations, but first let me review my testing procedure on Windows 10:

  1. I cloned https://github.com/madman-bob/js-beautify.git and switch to branch format-inline-tags, which is on commit def3cbd.
  2. I copied the entire contents of js/lib to ~/.atom/packages/atom-beautify/node_modules/js-beautify/js/lib, replacing the files there.
  3. I loaded an existing XHTML5 file into BlueGriffon 3.0.1 and resaved it so that it would un-beautify the contents.
  4. I reformatted using atom-beautify.

For reference my .jsbeautifyrc file is as follows:

{
  "indent_with_tabs": true,
  "preserve_newlines": true,
  "max_preserve_newlines": 2,
  "end_with_newline": true,
  "wrap_line_length": 0
}

At first I thought that the fix was not working, but then I read the notes in this ticket and in the pull request. I understand that under @madman-bob's fix, js-beautify unformatted still works as before, except that now it defaults to no tags. Unfortunately https://github.com/Glavin001/atom-beautify/ specifies its own defaults, duplicating js-beautify's old unformatted values. So I had to override that in my .jsbeautifyrc file.

{
  "indent_with_tabs": true,
  "preserve_newlines": true,
  "max_preserve_newlines": 2,
  "end_with_newline": true,
  "wrap_line_length": 0,
  "unformatted": ""
}

That's why I had recommended to @Glavin001 in Glavin001/atom-beautify#1008 (comment) not to provide its own defaults, and instead use those of atom-beautify:

In a perfect world, atom-beautify wouldn't need default values---if no values were supplied for an option, it would simply let js-beautify use its own defaults.

Once I forced unformatted to be empty, the formatting worked! I have some pretty complicated XHTML5 files, with many nested inline elements, and some of them (after being de-formatted by BlueGriffon) went back to exactly the correct format! I know it was exact, because I was working in a Git repository, and I looked at the diffs.

There was one difference I saw related to <figcaption>. BlueGriffon (at least the latest version) formats some code like this:

    <figure class="near side"><img src="images/flyway-schema_version-table.png"

        alt="Flyway schema_version table." /> <figcaption>Flyway schema_version table. (<a

          href="https://flywaydb.org/getstarted/how"><cite>How Flyway works</cite></a>)</figcaption> </figure>

Before this had been formatted like the following (with tabs for indention of course). (This is what was in my source file in my Git repository, so I assume this is how js-beautify left it before, although I might be wrong.)

	<figure class="near side"><img src="images/flyway-schema_version-table.png" alt="Flyway schema_version table." />
		<figcaption>Flyway schema_version table. (<a href="https://flywaydb.org/getstarted/how"><cite>How Flyway works</cite></a>)</figcaption>
	</figure>

But with the new changes being tested here, it produces this:

	<figure class="near side"><img src="images/flyway-schema_version-table.png" alt="Flyway schema_version table." /> <figcaption>Flyway schema_version table. (<a href="https://flywaydb.org/getstarted/how"><cite>How Flyway works</cite></a>)</figcaption>
	</figure>

I'm not sure why that is being produced. According to the list at https://www.w3.org/TR/html5/dom.html#phrasing-content, <figcaption> should not be "phrasing" content, so should form another block. I think that what should happen is that the <figcaption> element should cause a line break and an other level of indention. This is probably some formatting edge case in which both an inline and a block element are next to each other inside a block element. This is probably not a show-stopper, but you might glance at it.

Other than that, it looks good! I'll continue testing, but for the moment it's pretty much doing what I expected!! This is incredibly good news!!!

@Glavin001, you'll need to update atom-beautify unformatted to be blank as well, or this change will have no effect on the plugin. Again I think you shouldn't override the js-beautify defaults at all, for reasons exactly such as this. You don't want to be in the business of having to scramble and fix your overrides every time js-beautify fixes a bug in its defaults.

@garretwilson
Copy link
Author

garretwilson commented Jun 25, 2018

Oh, this is something cool I found in one file. BlueGriffon would produce something like this (with preformatted content removed for brevity):

    <aside class="note far half">The Super POM already contains the following definitions, which is why the standard resource directories are copied by default:
      <pre class="line-numbers"><code class="language-xml"></code></pre></aside>

Before now, js-beautify would live this untouched except for tabifying the non-<pre> indentions, as per my settings. (The </code> is correctly not indented, because it's inside a <pre>.) But now with the new changes, we get this:

	<aside class="note far half">The Super POM already contains the following definitions, which is why the standard resource directories are copied by default:
		<pre class="line-numbers"><code class="language-xml"></code></pre>
	</aside>

Note that the trailing </aside> is now indented! I hadn't noticed this before, as the biggest problem earlier was the newlines inside phrasing/inline elements. But as </aside> should be block formatted, it should indeed come on another line and be indented. So it appears there have been some possibly unrelated positive side effects from these changes as well. 👍

@garretwilson
Copy link
Author

I'm super-excited about this issue being fixed! I just came in for the night. Remind me how to mark the bounty as done so that @madman-bob can be awarded as soon as possible. I just read the FAQ and I think maybe I'll receive some email for me to confirm.

@bitwiseman bitwiseman modified the milestones: v2.0.0, v1.8.0 Jun 30, 2018
@bitwiseman
Copy link
Member

This if available in 1.8.0-rc2.

@garretwilson
Copy link
Author

@madman-bob, @bitwiseman, and others, I cannot begin to express my appreciation for finally getting this bug fixed and released. After a little prodding on the atom-beautify side, we finally got this into the Atom plugin as well.

It's hard to describe just how refreshing it is, when I reformat the ugly, buggy HTML that BlueGriffon puts out, to not have to manually search for those weird, wrapped lines that BlueGriffon should not have put in. Now js-beautify just fixes them, automatically (like it always should have). It's so amazing that at times I think I'm dreaming. I cannot count up all the time this is saving me, reducing my workflow, and also reducing my irritation level (or at least not adding to it 😉 ).

Thank you, thank you, thank you.

And after I finally gave in and tried Visual Studio Code (after a long list of stupid Atom problems, the last one being that it reset my entire configuration for no reason one day; see atom/atom#18229), I found out that there is a Beautify plugin for VS Code!

https://github.com/HookyQR/VSCodeBeautify

I was almost afraid to check, but yes… yes… the plugin includes the latest js-beautify 1.8.x!! This means that after installing VSCodeBeautify, I can open up my HTML and format it exactly like I did with atom-beautify (except using a different shortcut key), with the results being exactly the same—and it even uses the exact same .jsbeautifyrc file! This is the smoothest transition I could never have even dreamed possible. For once—for once—everything worked just like it should have.

I am very grateful. Thanks for making this happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants