
Make it easier for plugins to add tokenizers to the parser #42

Closed
katylava opened this issue Sep 3, 2015 · 8 comments
@katylava

katylava commented Sep 3, 2015

Looking here and here, it seems like I need intimate knowledge of how the parser works in order to define regular expressions to tokenize. The use case is detecting and linking URLs (auto-linking) and @mentions. Some of the URLs I'd like to turn into special node types – such as "twitter", which another plugin could render as HTML for an embedded tweet.

Ideally I could write a plugin that only has to specify a regular expression, a function which returns the node, and some rules about scope (for example, I wouldn't want to create a link for a URL that is already inside a link).
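For example, something with roughly this shape (purely hypothetical – pattern, createNode, and notInside are names I just made up, not anything mdast provides):

module.exports = function twitterLinks() {
  return {
    // What to look for.
    pattern: /https?:\/\/twitter\.com\/\S+/,
    // How to turn a match into a node.
    createNode: function (match) {
      return {type: 'twitter', url: match[0]};
    },
    // Scope rule: skip matches that are already inside a link.
    notInside: ['link']
  };
};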

@wooorm

wooorm commented Sep 5, 2015

Sorry for the late response. I’ve been busy launching alex and handling the public interest around it. Regarding mentions, you should take a look at how mdast-github implements them.

However, that just adds links. I think that’s good. With mdast-html#1 I’m thinking of adding support for a new property on nodes which sets their content.
When that is implemented, there would be no need to modify the parser (other than for the mentions); instead, a normal transformer could patch that property on link nodes with an href pointing to, let’s say, twitter.
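To illustrate (a rough sketch, not the real API: the content property below stands in for the hypothetical property from mdast-html#1, and the name of a link’s target property differs between versions):

var visit = require('unist-util-visit');

module.exports = function attacher() {
  return function transformer(tree) {
    visit(tree, 'link', function (node) {
      var url = node.href || node.url;

      if (url && /^https?:\/\/twitter\.com\//.test(url)) {
        // Ask the HTML compiler to emit this markup instead of a plain anchor.
        node.content = '<blockquote class="twitter-tweet">' + url + '</blockquote>';
      }
    });
  };
};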

Could you give some examples of what input you would like to transform to what output?

@masylum

masylum commented Oct 6, 2015

I'm also having some challenges using mdast, for the very same reason. I don't use process since I'm only interested in the AST.

I do not understand how to extend the current markdown grammar (to support mentions for instance) or whether mdast is suited for that.

My use case is to do something like this:

return mdast
  .use(mentionGrammar)
  .use(hashtagGrammar)
  .parse(this.text, {position: false});

Let me know if I can help in any way.

@wooorm

wooorm commented Oct 6, 2015

@masylum Thanks for your interest!

First of all, the code you posted will not run the plug-ins. process does three things: parse a string into a syntax tree, run plug-ins on that tree, and stringify the syntax tree back into a string. See mdast.run for more information.
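Roughly, the difference looks like this (a sketch against the snippet you posted; parse only builds the tree, run is what applies the transformer plug-ins):

var processor = mdast
  .use(mentionGrammar)
  .use(hashtagGrammar);

// parse() builds the syntax tree, but transformer plug-ins have not run yet.
var tree = processor.parse(text, {position: false});

// run() is where the attached transformer plug-ins are actually applied.
processor.run(tree);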

Regarding your question, and this issue in general, I recommend you investigate how mdast-github works: it has mention parsing and issue parsing (which looks a bit like the hashtags you mentioned).

How the parser parses is, as you noticed, currently not documented. In the (nearby?) future I plan to rewrite the current system (related: GH-75, GH-82) to remove the need for regular expressions and depend only on tokeniser functions. After that, your use case should be easier to accomplish, but as the system is going to change, I don’t currently plan on providing a very in-depth guide to modifying the parser.

@masylum

masylum commented Oct 7, 2015

I understand. I guess I could just transform the AST, looking for text nodes and generating new nodes from them. Since I want to get a prototype sooner rather than later, I'm playing with https://github.com/markdown-it/markdown-it at the moment. I will keep an eye on the project though.
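Something like this is what I have in mind (just a sketch with unist-util-visit; the /users/ target is made up, and only the first mention per text node is handled):

var visit = require('unist-util-visit');

function mentions() {
  return function transformer(tree) {
    visit(tree, 'text', function (node, index, parent) {
      if (!parent || parent.type === 'link') return;

      var match = /@(\w+)/.exec(node.value);
      if (!match) return;

      var before = node.value.slice(0, match.index);
      var after = node.value.slice(match.index + match[0].length);
      var replacement = [];

      if (before) replacement.push({type: 'text', value: before});
      replacement.push({
        type: 'link',
        url: '/users/' + match[1],
        children: [{type: 'text', value: match[0]}]
      });
      if (after) replacement.push({type: 'text', value: after});

      // Swap the original text node for the text + link nodes.
      parent.children.splice.apply(parent.children, [index, 1].concat(replacement));
    });
  };
}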

@wooorm

wooorm commented Dec 14, 2015

This will be closed by GH-109 and GH-96 in the upcoming 3.0.0 release.

@wooorm wooorm added this to the 3.0.0 milestone Dec 14, 2015
@wooorm wooorm self-assigned this Dec 14, 2015
wooorm added a commit that referenced this issue Dec 18, 2015
Since 6d13eb5, the parser has been completely rewritten utilising a new, performant mechanism. This mechanism makes interfacing with the parser from a plug-in significantly simpler.

Closes GH-42.
Closes GH-109.
@wooorm wooorm closed this as completed in 7a5d16d Dec 24, 2015
@wooorm wooorm added the 🦋 type/enhancement and plugin labels Jan 10, 2016
@amirhouieh

I have been trying to extend the parser in order to wrap hashtags (let's say, Twitter-style), but I'm getting this error:

error  Error: Incorrectly eaten value:

where /^#(\w+)/.exec(value) would match, for example, #posters, but then eat("@posters") throws this error, while /^@(\w+)/.exec(value) works perfectly. Here is what I have:

...
tokenizeTags.locator = (value, fromIndex) => {
  return value.indexOf("#", fromIndex);
};

function tokenizeTags(eat, value, silent) {
  const match = /^#(\w+)/.exec(value);
  if (match) {
    if (silent) {
      return true;
    }
    eat(match[0])({
      type: 'link',
      url: `/${match[1]}`,
      children: [{type: 'text', value: match[0]}]
    });
  }
}

@wooorm

wooorm commented Oct 8, 2017

You should return a node if you find something. Something like mentions in remark-github looks pretty similar!
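For reference, a minimal corrected sketch of the snippet above – the only change is returning what eat() returns:

function tokenizeTags(eat, value, silent) {
  const match = /^#(\w+)/.exec(value);

  if (match) {
    if (silent) {
      return true;
    }

    // Hand the node produced by eat() back to the parser.
    return eat(match[0])({
      type: 'link',
      url: `/${match[1]}`,
      children: [{type: 'text', value: match[0]}]
    });
  }
}

tokenizeTags.locator = (value, fromIndex) => value.indexOf('#', fromIndex);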

@amirhouieh

@wooorm Thanks, I think that was the issue. Although I am now getting a weird duplicated HTML string at the end!
