Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Join adjacent inlineText tokens #1926

Merged
merged 9 commits into from
Feb 7, 2021

Conversation

calculuschild
Copy link
Contributor

Marked version: 1.2.9

Description

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

@vercel
Copy link

vercel bot commented Feb 4, 2021

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/fk98enxmj
✅ Preview: https://markedjs-git-fork-calculuschild-joininnertexttokens.markedjs.vercel.app

@calculuschild
Copy link
Contributor Author

CI is not passing but all the spec tests and benchmarks pass just fine on my machine. Not sure what's going on. Any help?

@calculuschild
Copy link
Contributor Author

Ah I see there's a separate Lexer unit test suite. It expects adjacent but separate text tokens in a couple of cases. I assume that isn't a requirement anymore....

@UziTech
Copy link
Member

UziTech commented Feb 5, 2021

I think I fixed the tests in calculuschild#2. I also changed the block text token to do the merge in the lexer instead of the tokenizer and merge text tokens returned by other tokenizers.

Join adjacent innerText tokens
@calculuschild
Copy link
Contributor Author

calculuschild commented Feb 5, 2021

Thanks for looking into it!

I also changed the block text token to do the merge in the lexer instead of the tokenizer and merge text tokens returned by other tokenizers.

@UziTech I was going to ask about this as well so I'm glad you had the same idea. Seemed to make more sense in the Lexer. Do we also want to do the same for the block code tokenizer? It also does this same merge thing.

@UziTech
Copy link
Member

UziTech commented Feb 5, 2021

Ya it is probably a good time to change the tokenizer signatures if we need to since this and #1864 should be released as v2 soon.

@calculuschild calculuschild changed the title Join adjacent innerText tokens Join adjacent inlineText tokens Feb 5, 2021
UziTech
UziTech previously requested changes Feb 5, 2021
Copy link
Member

@UziTech UziTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are from my code

lib/marked.esm.js Outdated Show resolved Hide resolved
lib/marked.esm.js Outdated Show resolved Hide resolved
Co-authored-by: Tony Brix <[email protected]>
Co-authored-by: Tony Brix <[email protected]>
@UziTech
Copy link
Member

UziTech commented Feb 5, 2021

Can you update the tokenizer signatures for text and code in the docs

@calculuschild
Copy link
Contributor Author

Yay! All the Block Tokenizers have the same signature now! 🎉

@UziTech UziTech requested a review from davisjam February 5, 2021 04:14
lastToken = tokens[tokens.length - 1];
lastToken = tokens[tokens.length - 1];
// An indented code block cannot interrupt a paragraph.
if (lastToken && lastToken.type === 'paragraph') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should check for a paragraph before we call the code tokenizer? That might save some work that doesn't need to be done.

Copy link
Contributor Author

@calculuschild calculuschild Feb 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth trying. It's a tradeoff of always checking LastToken and sometimes calling codeTokenizer vs always calling codeTokenizer and sometimes checking LastToken. I'm not sure how often this situation comes up that one would be better than the other.

Edit: If LastToken is a paragraph though what do we do? Just continue? or call the "text" tokenizer out of sequence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If LastToken is a paragraph though what do we do? Just continue? or call the "text" tokenizer out of sequence?

Good point. I suppose we would still need to call code tokenizer to see if we should skip the other tokens. Maybe it is still better to check code tokenizer first.

@UziTech UziTech changed the title Join adjacent inlineText tokens fix: Join adjacent inlineText tokens Feb 7, 2021
@UziTech UziTech merged commit f848e77 into markedjs:master Feb 7, 2021
github-actions bot pushed a commit that referenced this pull request Feb 7, 2021
# [2.0.0](v1.2.9...v2.0.0) (2021-02-07)

### Bug Fixes

* Join adjacent inlineText tokens ([#1926](#1926)) ([f848e77](f848e77))
* Total rework of Emphasis/Strong ([#1864](#1864)) ([7293251](7293251))

### BREAKING CHANGES

* `em` and `strong` tokenizers have been merged into one `emStrong` tokenizer
@github-actions
Copy link

github-actions bot commented Feb 7, 2021

🎉 This PR is included in version 2.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

This was referenced Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Paragraph text is split into separate tokens when using backslash
3 participants