Update Tokenizer to treat Markdown code as text instead of HTML #1
lib/Tokenizer.js
@@ -144,10 +144,12 @@ function Tokenizer(options, cbs){
this._ended = false;
this._xmlMode = !!(options && options.xmlMode);
this._decodeEntities = !!(options && options.decodeEntities);
this._isMarkdownCode = false;
Tabs vs spaces 😨
Let's be consistent with the rest of the file (tabs).
lib/Tokenizer.js
@@ -635,6 +637,9 @@ Tokenizer.prototype.write = function(chunk){
Tokenizer.prototype._parse = function(){
while(this._index < this._buffer.length && this._running){
var c = this._buffer.charAt(this._index);
// Detect Markdown code so that it is parsed as text instead of HTML
if (c === '`')
No space before opening parentheses 😢
Let's be consistent with L152 of the file (braces even for a single line of code).
Off-topic: Add whitespace around operators :)
We don't have a JS coding standard, but:
Force-pushed from c3efc5a to 28804b6.
Updated with the requested changes. Somehow my WebStorm was set to indent with spaces, and I didn't catch the difference in the editor. Thanks for the tip about the whitespace!
lib/Tokenizer.js
}

Tokenizer.prototype._stateText = function(c){
if(c === "<"){
// parse open tags if it is not Markdown
- Parse (capital P)
- tag (singular)
lib/Tokenizer.js
@@ -635,6 +637,10 @@ Tokenizer.prototype.write = function(chunk){
Tokenizer.prototype._parse = function(){
while(this._index < this._buffer.length && this._running){
var c = this._buffer.charAt(this._index);
// Detect Markdown code so that it is parsed as text instead of HTML
if (c === '`') {
No spaces before/after parentheses.
Force-pushed from 28804b6 to 43fe9b5.
- Allows Markdown code to contain '<' and '<=' without having it affect other HTML elements
Force-pushed from 43fe9b5 to 3880c5c.
We should treat
This is reasonable: create a .html file with the above and open it in your browser (tested in Chrome).
Seems like the first case is handled fine. I've modified
Can you commit and push, so we can attempt to repro?
Try updating
Are you generating the site from an index.md or an index.html? I get the bug when it's an HTML file, but not when it's an .md file.
Ah, I see that I suggested creating a .html file to see how the browser treats those strings. We don't have to solve that in this PR since:
So it's partial support for .html files: given "a <= b", this PR gives "a <=b" instead of just "a".
lib/Tokenizer.js
@@ -160,6 +163,9 @@ Tokenizer.prototype._stateText = function(c){
this._baseState = TEXT;
this._state = BEFORE_ENTITY;
this._sectionStart = this._index;
} else if(this._isInequality){
// Next character should be parsed normally
this._isInequality = !this._isInequality;
This should be `this._isInequality = false;` since it's not a toggle.
lib/Tokenizer.js
}

Tokenizer.prototype._stateText = function(c){
if(c === "<"){
// Parse open tag if it is not Markdown and not part of an inequality
if(c === "<" && !this._isMarkdownCode && !this._isInequality){
Why is `&& !this._isInequality` necessary?
This is so that the Tokenizer doesn't treat the `<` of a `<=` as the start of an opening HTML tag.
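The lookahead that tells an inequality `<=` apart from the start of a tag can be isolated into a small helper. This is a hypothetical sketch (the name `isInequalityStart` does not appear in the PR); it combines the `charAt(this._index + 1)` comparison from the diff with the bounds check requested later in the review. `String.prototype.charAt` returns an empty string for an out-of-range index, so the bounds check mainly makes the intent explicit.

```javascript
// Hypothetical helper: true if the "<" at `index` begins a "<=" inequality
// and should therefore be kept as text rather than opening a tag.
function isInequalityStart(buffer, index) {
  return buffer.charAt(index) === "<" &&
    index + 1 < buffer.length &&          // bounds check suggested in review
    buffer.charAt(index + 1) === "=";
}

console.log(isInequalityStart("a <= b", 2)); // true
console.log(isInequalityStart("a < b", 2));  // false
console.log(isInequalityStart("a <", 2));    // false: nothing follows "<"
```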
lib/Tokenizer.js
} else if(c === '<'){
var nextChar = this._buffer.charAt(this._index + 1);
if(nextChar === '='){
this._isInequality = !this._isInequality;
Should this be `this._isInequality = true;`?
lib/Tokenizer.js
@@ -144,10 +144,13 @@ function Tokenizer(options, cbs){
this._ended = false;
this._xmlMode = !!(options && options.xmlMode);
this._decodeEntities = !!(options && options.decodeEntities);
this._isMarkdownCode = false;
this._isInequality = false;
Reorder just these 2 in alphabetical order.
Force-pushed from 38ac8da to 21ddebe.
lib/Tokenizer.js
@@ -160,6 +163,9 @@ Tokenizer.prototype._stateText = function(c){
this._baseState = TEXT;
this._state = BEFORE_ENTITY;
this._sectionStart = this._index;
} else if(this._isInequality){
// Next character should be parsed normally
this._isInequality = false;
Can this be the first `if` condition?
If it's the first `if` condition, `this._isInequality` would be set to false and `<` would then be treated as a valid open tag.
It won't enter the `else if` block though?
Whoops, yes, that's right. Will resolve it ASAP.
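The ordering question in this thread comes down to an `if`/`else if` chain running at most one branch. A simplified sketch, assuming (as in the PR) that `_parse` has already set `_isInequality` when it saw the `<` of a `<=`; `stateTextStep` and the plain `state` object are hypothetical stand-ins for `Tokenizer.prototype._stateText` and the tokenizer instance, and the returned strings stand in for the real state transitions.

```javascript
// Simplified stand-in for _stateText with the flag check moved first.
// `state` is a hypothetical object holding the two flags from this PR.
function stateTextStep(state, c) {
  if (state.isInequality) {
    // The "<" of "<=" lands here: clear the flag and keep the "<" as text.
    state.isInequality = false;
    return "text";
  } else if (c === "<" && !state.isMarkdownCode) {
    // Only reached when the flag was not set, so no extra flag check is needed.
    return "open-tag";
  }
  return "text";
}

const state = { isMarkdownCode: false, isInequality: true };
console.log(stateTextStep(state, "<")); // "text"
console.log(state.isInequality);        // false
console.log(stateTextStep(state, "<")); // "open-tag"
```

Because the flag is cleared inside the first branch, the `else if` is never evaluated for that same character, which is why the later review asks to drop `&& !this._isInequality` from the tag-opening condition.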
lib/Tokenizer.js
this._isMarkdownCode = !this._isMarkdownCode;
} else if(c === '<'){
var nextChar = this._buffer.charAt(this._index + 1);
if(nextChar === '='){
- This needs a comment for consistency.
- Index should also be checked:
if(c === '<' && this._index + 1 < this._buffer.length){
Good point about the index, will do so.
Should the comment be inside the `else if` block or outside of it?
It can be inside if you add a section name.
lib/Tokenizer.js
if(nextChar === '='){
this._isInequality = true;
}
}
- Add a newline before and after this entire block.
- Maybe add a section name like the ones below.
By section name, do you mean changing `'='` into something like `EQUALS`?
if(nextChar === EQUALS){
this._isInequality = true;
}
Would `special conditions` be an appropriate section name?
Fine for now.
Force-pushed from 21ddebe to ed1f971.
lib/Tokenizer.js
if(this._isInequality){
// Next character will be parsed normally
this._isInequality = false;
} else if(c === "<" && !this._isMarkdownCode && !this._isInequality){
`&& !this._isInequality` should be removed.
lib/Tokenizer.js
* special conditions
*/
if(c === '`'){
// Detect Markdown code to be parsed as text
Detect Toggle
lib/Tokenizer.js
this._isMarkdownCode = !this._isMarkdownCode;
} else if(c === '<' && this._index + 1 < this._buffer.length){
var nextChar = this._buffer.charAt(this._index + 1);
// Detect '<=' inequality to be parsed as text
Detect Set
Also, move this comment into the `if` block.
Force-pushed from ed1f971 to c09d6ee.
Made the necessary changes.
Force-pushed from c09d6ee to f11e76a.
lib/Tokenizer.js
this._isInequality = true;
}
}

Hmm, this still looks out of place.
Let's introduce a new state `MARKDOWN` instead of tracking `this._isMarkdownCode` and `this._isInequality`.
Near top of file:
MARKDOWN = i++,
TEXT = i++, // No change
In this function:
if(this._state === MARKDOWN) {
this._stateMarkdown(c);
} else if(this._state === TEXT) {
this._stateText(c); // No change
Other functions:
Tokenizer.prototype._stateMarkdown = function(c){
if(c === '`'){
this._state = TEXT;
}
}
Tokenizer.prototype._stateText = function(c){
if(c === '`'){
this._state = MARKDOWN;
} else if(c === "<"){
var isInequality = (this._index + 1 < this._buffer.length) && this._buffer.charAt(this._index + 1) === '=';
if(!isInequality){
if(this._index > this._sectionStart){
this._cbs.ontext(this._getSection());
}
this._state = BEFORE_TAG_NAME;
this._sectionStart = this._index;
}
}
}
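The proposal above replaces the two flags with a dedicated state. To see the idea end to end, here is a minimal, self-contained sketch of such a state machine. It is not htmlparser2's Tokenizer: the `TEXT`/`MARKDOWN` names follow the suggestion, but `tokenize`, `IN_TAG`, and the `{ type, value }` token objects are hypothetical simplifications (no entities, attributes, or comments; a tag simply runs from `<` to the next `>`).

```javascript
// Minimal sketch of the proposed MARKDOWN state (not htmlparser2's real Tokenizer).
// Assumptions: only three states; backtick code stays inside the surrounding
// text token; a tag runs from "<" to the next ">".
var TEXT = 0, MARKDOWN = 1, IN_TAG = 2;

function tokenize(input) {
  var tokens = [];
  var state = TEXT;
  var sectionStart = 0;

  // Emit input[sectionStart, end) as a token if it is non-empty.
  function emit(type, end) {
    if (end > sectionStart) {
      tokens.push({ type: type, value: input.slice(sectionStart, end) });
    }
  }

  for (var i = 0; i < input.length; i++) {
    var c = input.charAt(i);
    if (state === MARKDOWN) {
      // Inside `...`: everything is text; a closing backtick returns to TEXT.
      if (c === "`") state = TEXT;
    } else if (state === TEXT) {
      if (c === "`") {
        state = MARKDOWN;
      } else if (c === "<") {
        var isInequality = i + 1 < input.length && input.charAt(i + 1) === "=";
        if (!isInequality) {
          emit("text", i);     // flush text collected before the tag
          state = IN_TAG;
          sectionStart = i;
        }
      }
    } else if (state === IN_TAG && c === ">") {
      emit("tag", i + 1);
      state = TEXT;
      sectionStart = i + 1;
    }
  }
  // Flush what is left; an unterminated tag is emitted as text for simplicity.
  emit("text", input.length);
  return tokens;
}

console.log(tokenize("`x<y` and <b>bold</b>"));
console.log(tokenize("a <= b"));
```

With these rules, each example from the PR description (`x<y`, `<`, `<=`, `x<=y` inside backticks) comes back as a single text token, a bare `a <= b` also stays text because of the `=` lookahead, and `<b>` / `</b>` outside backticks are split out as tags.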
Force-pushed from 6943bc6 to 7aecd9b.
Force-pushed from 7aecd9b to 369b0ba.
lib/Tokenizer.js
@@ -6,7 +6,8 @@ var decodeCodePoint = require("entities/lib/decode_codepoint.js"),
xmlMap = require("entities/maps/xml.json"),

i = 0,

Remove whitespace.
lib/Tokenizer.js
} else if(c === "<"){
var isInequality = (this._index + 1 < this._buffer.length) && (this._buffer.charAt(this._index + 1) === '=');
if(!isInequality){
if (this._index > this._sectionStart) {
Force-pushed from 369b0ba to b348228.
lib/Tokenizer.js
@@ -6,7 +6,7 @@ var decodeCodePoint = require("entities/lib/decode_codepoint.js"),
xmlMap = require("entities/maps/xml.json"),

i = 0,

Restore newline (without whitespace).
Done.
You added whitespace again 😕
Sorry, fixed it now.
Force-pushed from b348228 to 89cde72.
Force-pushed from 89cde72 to 6e614fb.
Have we tested the code block (```) case?
Do you mean whether code blocks render as before? I just tried it out, and they seem to be fine. Is there something else I should test? In the current version of the CS2103 website, however, this fix causes the rest of the page to render incorrectly because of an extra backtick; specifically, on this page under the code snippet where it says If this backtick is removed, the page renders normally.
Removed the extra backtick.
Great work :P
Let's patch Tokenizer to treat Markdown code as text instead of HTML.

From MarkBind/htmlparser2#1:

> This fix allows Markdown code to contain '<' and '<=' without having it
> affect other HTML elements, as it is now treated as a text element.
> Furthermore, no spaces are required when typing these symbols within
> the backticks.
>
> As such, inequalities like the above can be rendered normally as
> shown below.
>
> `x<y`
> `<`
> `<=`
> `x<=y`
Resolves MarkBind/markbind#101