non breaking space (U+00A0) and possibly others not treated as whitespace OR word characters #2744

Closed
gardnerjr opened this issue Mar 1, 2023 · 3 comments


@gardnerjr

Marked version:
4.2.12
(noticed on 4.0.10, tried upgrading to see if it helps)

Describe the bug
Invisible non-breaking space characters, like in this block right here, (inline as code and content)

## h2
### h3
[link](destination)
### anotherh3

## h2
### h3
link
### anotherh3

are not always treated as whitespace, so headers, tables, etc. don't get formatted and are left as raw markdown syntax.
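To make the problem easier to spot, here is a minimal sketch that lists the lines where a run of `#` characters is followed directly by U+00A0 instead of a regular space. `findNbspHeadingLines` is a hypothetical helper written for this illustration, not part of marked or commonmark; it assumes only ATX-style headings matter and ignores tables.

```typescript
// Hypothetical helper for illustration; not an API of marked or commonmark.
const NBSP = "\u00A0";

function findNbspHeadingLines(markdown: string): number[] {
  const hits: number[] = [];
  markdown.split("\n").forEach((line, index) => {
    // Up to three spaces of indentation, one to six '#', then U+00A0.
    if (/^ {0,3}#{1,6}\u00A0/.test(line)) {
      hits.push(index + 1); // report 1-based line numbers
    }
  });
  return hits;
}

// The first heading uses a regular space, the other two use U+00A0.
const sample = `## h2\n###${NBSP}h3\n[link](destination)\n###${NBSP}anotherh3`;
console.log(findNbspHeadingLines(sample)); // → [ 2, 4 ]
```

Running something like this over existing content would at least show how many documents are affected before choosing a fix.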

To Reproduce
Steps to reproduce the behavior:

  1. Marked broken

    marked creates headers for the first few elements but then stops. Adding extra newlines or replacing the non-breaking space characters seems to fix it.

  2. CommonMark broken worse

    commonmark doesn't create them at all; you have to replace the non-breaking spaces.

  3. DaringFireball's site seems to do "best": the headers are properly rendered AND the non-breaking space stays intact.

Expected behavior
Ideally, I'd expect all the headers to be generated, keeping the non-breaking spaces intact and without needing extra newlines/spacing (we're moving from one platform to another and have a ton of existing and customer markdown content that we don't control and can't convert).

@UziTech
Member

UziTech commented Mar 1, 2023

I think you are looking for the pedantic: true option. demo

That will make marked work more like the original markdown spec instead of CommonMark.

@gardnerjr
Author

@UziTech yes, but pedantic: true means that tables no longer work, since they aren't in the original spec.

So right now, if the content has both non-breaking spaces and tables, you can't have both render correctly.

@UziTech
Member

UziTech commented Mar 2, 2023

Yeah, you have to pick one spec or the other. Your other option is to create your own tokenizer that breaks with the spec.
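A different route from a custom tokenizer, sketched here only as an assumption about what might be acceptable, is to normalize the input at render time, before it reaches the parser. The hypothetical `normalizeHeadingNbsp` below swaps a U+00A0 that directly follows an ATX heading marker for a regular space and leaves every other non-breaking space untouched, so the stored content itself is never converted. It is not a marked API, and it does not skip fenced code blocks, so a `#` line inside a code fence would be changed too.

```typescript
// Hypothetical preprocessing step; not part of marked.
// Replaces only the U+00A0 that immediately follows a heading marker.
function normalizeHeadingNbsp(markdown: string): string {
  // 'm' makes ^ match at the start of every line, 'g' handles all lines.
  return markdown.replace(/^( {0,3}#{1,6})\u00A0/gm, "$1 ");
}

const input = "###\u00A0h3\nsome\u00A0text";
const output = normalizeHeadingNbsp(input);
// The heading marker is now followed by a regular space,
// while the non-breaking space inside the paragraph is preserved.
console.log(output === "### h3\nsome\u00A0text"); // → true
```

The result can then be passed to the parser with the default (non-pedantic) options, which keeps table support available.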
