
Make it easier for plugins to add tokenizers to the parser #42

Closed
katylava opened this issue Sep 3, 2015 · 8 comments
@katylava

katylava commented Sep 3, 2015

Looking here and here, it seems like I need intimate knowledge of how the parser works in order to define regular expressions to tokenize. The use case is detecting and linking URLs (auto-linking) and @mentions. Some of the URLs I'd like to turn into special node types – such as "twitter", which another plugin could render as HTML for an embedded tweet.

Ideally I could write a plugin that only has to specify a regular expression, a function which returns the node, and some rules about scope (for example, I wouldn't want to create a link for a URL that is already inside a link).
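For example, something with roughly this shape (purely hypothetical – pattern, createNode, and notInside are names I just made up, not anything mdast provides):

module.exports = function twitterLinks() {
  return {
    // What to look for.
    pattern: /https?:\/\/twitter\.com\/\S+/,
    // How to turn a match into a node.
    createNode: function (match) {
      return {type: 'twitter', url: match[0]};
    },
    // Scope rule: skip matches that are already inside a link.
    notInside: ['link']
  };
};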

@wooorm

wooorm commented Sep 5, 2015

Sorry for the late response. I’ve been busy launching alex and handling the public interest around it. Regarding mentions, you should take a look at how mdast-github implements them.

However, that just adds links. I think that’s good. With mdast-html#1 I’m thinking of adding support for a new property on nodes which sets their content.
When that is implemented, there would be no need to modify the parser (other than for the mentions); instead, a normal transformer could patch that property on link nodes with an href pointing to, let’s say, twitter.
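To illustrate (a rough sketch, not the real API: the content property below stands in for the hypothetical property from mdast-html#1, and the name of a link’s target property differs between versions):

var visit = require('unist-util-visit');

module.exports = function attacher() {
  return function transformer(tree) {
    visit(tree, 'link', function (node) {
      var url = node.href || node.url;

      if (url && /^https?:\/\/twitter\.com\//.test(url)) {
        // Ask the HTML compiler to emit this markup instead of a plain anchor.
        node.content = '<blockquote class="twitter-tweet">' + url + '</blockquote>';
      }
    });
  };
};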

Could you give some examples of what input you would like to transform to what output?

@masylum

masylum commented Oct 6, 2015

I'm also having some challenges using mdast, for the very same reason. I don't use process since I'm only interested in the AST.

I do not understand how to extend the current markdown grammar (to support mentions for instance) or whether mdast is suited for that.

My use case is to do something like this:

return mdast
  .use(mentionGrammar)
  .use(hashtagGrammar)
  .parse(this.text, {position: false});

Let me know if I can help in any way.

@wooorm

wooorm commented Oct 6, 2015

@masylum Thanks for your interest!

First of all, the code you posted will not run the plug-ins. process does three things: parse a string into a syntax tree, run plug-ins on that tree, and stringify the syntax tree back into a string. See mdast.run for more information.
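Roughly, the difference looks like this (a sketch against the snippet you posted; parse only builds the tree, run is what applies the transformer plug-ins):

var processor = mdast
  .use(mentionGrammar)
  .use(hashtagGrammar);

// parse() builds the syntax tree, but transformer plug-ins have not run yet.
var tree = processor.parse(text, {position: false});

// run() is where the attached transformer plug-ins are actually applied.
processor.run(tree);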

Regarding your question, and this issue in general, I recommend you investigate how mdast-github works: it has mention parsing and issue parsing (which looks a bit like the hashtags you mentioned).

How the parser parses is, as you noticed, currently not documented. In the (nearby?) future I plan to rewrite the current system (related: GH-75, GH-82) to remove the need for regular expressions and depend only on tokeniser functions. After that, your use case should be easier to accomplish, but as the system is going to change, I don’t currently plan on providing a very in-depth guide to modifying the parser.

@masylum

masylum commented Oct 7, 2015

I understand. I guess I could just transform the AST, looking for text nodes and generating new nodes from them. Since I want to get a prototype sooner rather than later, I'm playing with https://github.com/markdown-it/markdown-it at the moment. I will keep an eye on the project though.
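Something like this is what I have in mind (just a sketch with unist-util-visit; the /users/ target is made up, and only the first mention per text node is handled):

var visit = require('unist-util-visit');

function mentions() {
  return function transformer(tree) {
    visit(tree, 'text', function (node, index, parent) {
      if (!parent || parent.type === 'link') return;

      var match = /@(\w+)/.exec(node.value);
      if (!match) return;

      var before = node.value.slice(0, match.index);
      var after = node.value.slice(match.index + match[0].length);
      var replacement = [];

      if (before) replacement.push({type: 'text', value: before});
      replacement.push({
        type: 'link',
        url: '/users/' + match[1],
        children: [{type: 'text', value: match[0]}]
      });
      if (after) replacement.push({type: 'text', value: after});

      // Swap the original text node for the text + link nodes.
      parent.children.splice.apply(parent.children, [index, 1].concat(replacement));
    });
  };
}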

@wooorm

wooorm commented Dec 14, 2015

This will be closed by GH-109 and GH-96 in the upcoming 3.0.0 release.

@wooorm wooorm added this to the 3.0.0 milestone Dec 14, 2015
@wooorm wooorm self-assigned this Dec 14, 2015
wooorm added a commit that referenced this issue Dec 18, 2015
Since 6d13eb5, the parser has been completely rewritten utilising a new, performant mechanism. This mechanism makes interfacing with the parser from a plug-in significantly simpler.

Closes GH-42.
Closes GH-109.
@wooorm wooorm closed this as completed in 7a5d16d Dec 24, 2015
@wooorm wooorm added the 🦋 type/enhancement and plugin labels Jan 10, 2016
@amirhouieh

I have been trying to extend the parser in order to wrap hashtags (let's say, Twitter-style), but I'm getting this error:

error  Error: Incorrectly eaten value:

where /^#(\w+)/.exec(value) would match, for example, #posters, but then eat("@posters") throws this error, while /^@(\w+)/.exec(value) works perfectly. Here is what I have:

...
tokenizeTags.locator = (value, fromIndex) => {
  return value.indexOf("#", fromIndex);
};

function tokenizeTags(eat, value, silent) {
  const match = /^#(\w+)/.exec(value);
  if (match) {
    if (silent) {
      return true;
    }
    eat(match[0])({
      type: 'link',
      url: `/${match[1]}`,
      children: [{type: 'text', value: match[0]}]
    });
  }
}

@wooorm

wooorm commented Oct 8, 2017

You should return a node if you find something. Something like mentions in remark-github looks pretty similar!
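For reference, a minimal corrected sketch of the snippet above – the only change is returning what eat() returns:

function tokenizeTags(eat, value, silent) {
  const match = /^#(\w+)/.exec(value);

  if (match) {
    if (silent) {
      return true;
    }

    // Hand the node produced by eat() back to the parser.
    return eat(match[0])({
      type: 'link',
      url: `/${match[1]}`,
      children: [{type: 'text', value: match[0]}]
    });
  }
}

tokenizeTags.locator = (value, fromIndex) => value.indexOf('#', fromIndex);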

@amirhouieh

@wooorm Thanks, I think that was the issue. Although I am now getting a weird duplicated HTML string at the end!
