[Brain storming] New language dependency system #2880
Comments
As I said, this issue is for brainstorming, so here's an idea: We use Optional dependencies are harder. My idea is to do optional dependencies by lazily instantiating languages. Languages don't just add themselves to Prism, they add a function to create them instead. This allows Prism to 1) create languages only as needed and 2) handle optional dependencies at runtime. This way, we know how to construct a language, so we can reload it easily (reloading is necessary for opt. deps). The API for this approach could look like this:
type Grammar = Record<string, RegExp | { pattern: RegExp, inside?: Grammar }>;
interface GrammarProto {
id: string;
require?: GrammarProto[];
optional?: string[];
alias?: string[];
create(Prism: Prism): Grammar;
}
const Prism = {
languages: {
register(proto: GrammarProto): void { /* ... */ },
get(id: string): Grammar | undefined { /* ... */ },
}
};
Usage
// file: prism-clike.js
export default {
id: 'clike',
create() {
return {
'comment': /.../,
'string': /.../,
...
}
}
}
// file: prism-javascript.js
import clike from "./prism-clike.js";
export default {
id: 'javascript',
require: [clike],
optional: ['regex'],
alias: ['js'],
create(Prism) {
return Prism.languages.extend('clike', {
'keyword': /.../,
'regex': {
pattern: /.../,
inside: Prism.languages.get('regex')
}
});
}
}
// file: some-user-file.js
import Prism from "prismjs/core.js"
import javascript from "prismjs/languages/prism-javascript.js"
Prism.languages.register(javascript)
Prism.highlight(code, Prism.languages.get('javascript'), 'javascript')
Advantages:
Disadvantages:
|
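The lazy-instantiation idea above could be sketched roughly like this. It is only a sketch under assumptions: `createRegistry` and `invalidate` are hypothetical names, and `create` receives the registry itself rather than a full Prism object. Registering a language that others list as optional drops their cached grammars, so the next `get` re-creates them.

```javascript
// Hypothetical sketch, not Prism's actual API.
function createRegistry() {
  const protos = new Map();  // id -> language proto
  const cache = new Map();   // id -> grammar created from the proto
  const aliases = new Map(); // alias -> id

  // Drop cached grammars of every language that depends on `changedId`,
  // directly or transitively, so they are re-created on the next get().
  function invalidate(changedId, seen = new Set()) {
    if (seen.has(changedId)) return;
    seen.add(changedId);
    for (const proto of protos.values()) {
      const dependsOnChanged =
        (proto.optional || []).includes(changedId) ||
        (proto.require || []).some((dep) => dep.id === changedId);
      if (dependsOnChanged) {
        cache.delete(proto.id);
        invalidate(proto.id, seen);
      }
    }
  }

  const registry = {
    register(proto) {
      if (protos.has(proto.id)) return;
      // Required dependencies travel with the proto and are registered first.
      for (const dep of proto.require || []) registry.register(dep);
      protos.set(proto.id, proto);
      for (const alias of proto.alias || []) aliases.set(alias, proto.id);
      invalidate(proto.id);
    },
    get(id) {
      const realId = aliases.get(id) || id;
      const proto = protos.get(realId);
      if (!proto) return undefined;
      // Grammars are only created on first use.
      if (!cache.has(realId)) cache.set(realId, proto.create(registry));
      return cache.get(realId);
    },
  };
  return registry;
}

// Usage with toy grammars (the patterns are placeholders).
const clike = { id: 'clike', create: () => ({ comment: /\/\/.*/ }) };
const regex = { id: 'regex', create: () => ({ 'char-class': /\[[^\]]*\]/ }) };
const javascript = {
  id: 'javascript',
  require: [clike],
  optional: ['regex'],
  alias: ['js'],
  create(languages) {
    const grammar = { ...languages.get('clike'), keyword: /\b(?:const|let|var)\b/ };
    const regexGrammar = languages.get('regex');
    if (regexGrammar) grammar.regex = { pattern: /\/.+\//, inside: regexGrammar };
    return grammar;
  },
};

const languages = createRegistry();
languages.register(javascript);
const before = languages.get('js'); // created without the optional regex token
languages.register(regex);
const after = languages.get('js');  // re-created, now with the regex token
```

Since nothing is created until `get` is called, re-creating a language that was never used costs nothing in this sketch.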
Welcome to the party. ;-) This is exactly how we do it. This isn't a common thing in our ecosystem but I know at least one person doing this exact thing with their 3rd party grammar (to extend our built-in XML grammar)... you load a plugin, it fetches the original pre-compiled grammar definition (by rerunning its definer function), modifies it (adding new sublanguage support), and then reregisters it (when it will get recompiled). I don't think you have a compile step, but otherwise I think the process would be similar. You still have a potential load-order issue if multiple plugins were trying to make incompatible changes to the same grammar, but that's a bit more of an edge case. |
Now that you mention it, they really are the same. Does Highlight.js also deal with optional dependencies?
That's true. Right now, languages and plugins are handled by the same dependency system. The new system should probably also handle both. If both languages and plugins use the same system, the load-order issue should disappear, I think. That being said, any dependency system that works for languages can trivially be extended to plugins, so we only need to talk languages here. |
We'd have to define what that means exactly. Technically our |
Interesting. By using the id of a language in Prism should definitely copy this trick :) That being said, Prism's languages use optional dependencies not only for embedded languages, some also conditionally add more tokens if another language is present. (This is how I intend to get rid of While this trick doesn't eliminate optional dependencies, we could use it to cut down on the number of optional dependencies. It's also pretty easy to implement and will make some (recursive) language definitions simpler. |
Do you have an example or two of that? I feel like for the most part we avoid that with sub languages. The rules are there in the parent language but they are effectively NOOP without the appropriate grammar installed. |
Regex in JS and Markdown in GraphQL are examples of embedded languages. As for "conditionally add more tokens": nothing yet, I believe. Prism has A good example for that is CSS and CSS extras. While the selector insides could be done as an embedded language, the added
// prism-css-extras.js
Prism.languages['css-extras'] = { 'color': /.../, ... };
Prism.languages['css-selector'] = { ... };
// prism-css.js
// CSS now has an optional dependency to css extras
Prism.languages.css = {
...,
'selector': { pattern: /.../, inside: 'css-selector' },
// add the css extra tokens
...(Prism.languages['css-extras'] || {})
}; |
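To make the `inside: 'css-selector'` trick from the snippet above concrete, here is a minimal sketch of resolving a string reference at the time it is needed. `resolveInside` is a hypothetical helper, not an existing Prism function, and the patterns are placeholders.

```javascript
// Hypothetical helper: turns an `inside` value into a grammar (or undefined).
function resolveInside(inside, languages) {
  if (typeof inside === 'string') {
    // Looked up at tokenization time, so load order no longer matters.
    return languages[inside];
  }
  return inside;
}

const languages = {};
languages.css = {
  selector: { pattern: /[^{}\s][^{}]*(?=\s*\{)/, inside: 'css-selector' },
};

// Before the referenced language is loaded, the reference resolves to nothing
// and the token would simply be left without nested highlighting.
const missing = resolveInside(languages.css.selector.inside, languages);

// Once the language shows up, the same reference resolves to its grammar.
languages['css-selector'] = { class: /\.[\w-]+/ };
const found = resolveInside(languages.css.selector.inside, languages);
```

This only covers embedded languages. The spread of `css-extras` tokens into the CSS grammar is still a real optional dependency.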
Languages could be separate modules importing all their (required) dependencies. Prism could offer a promise syntax for when a specific language has been included that resolves to that language's grammar. Optional or modify dependencies could do things once/if that promise resolves. But stepping back for a moment, I think we should start from the opposite side: how do we expect users to include languages and plugins in v2? What's the least-friction syntax for them? |
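A rough sketch of what such a promise syntax could look like, loosely modeled on `customElements.whenDefined`. The names `createLanguageRegistry` and `whenRegistered` are made up for illustration and the grammars are placeholders.

```javascript
// Hypothetical sketch, not an existing Prism API.
function createLanguageRegistry() {
  const grammars = new Map();
  const waiting = new Map(); // id -> resolve callbacks of pending promises

  return {
    register(id, grammar) {
      grammars.set(id, grammar);
      for (const resolve of waiting.get(id) || []) resolve(grammar);
      waiting.delete(id);
    },
    // Resolves to the grammar once (or if) the language gets registered.
    whenRegistered(id) {
      if (grammars.has(id)) return Promise.resolve(grammars.get(id));
      return new Promise((resolve) => {
        if (!waiting.has(id)) waiting.set(id, []);
        waiting.get(id).push(resolve);
      });
    },
  };
}

const registry = createLanguageRegistry();
const javascript = { keyword: /\b(?:const|let|var)\b/ };
registry.register('javascript', javascript);

// Optional dependency: only extend the grammar if regex ever arrives.
const regexReady = registry.whenRegistered('regex').then((regex) => {
  javascript.regex = { pattern: /\/.+\//, inside: regex };
  return javascript;
});

registry.register('regex', { 'char-class': /\[[^\]]*\]/ });
```

One consequence of this shape is that the grammar is mutated asynchronously, so code highlighted before the promise resolves would not see the extra token.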
From what I know, we have 4 camps RIGHT NOW.
1. A single script file:
<script src="path/to/prism.js"></script>
2. A CDN with Autoloader:
<script src="https://some-cdn.io/prismjs/prism-core.js"></script>
<script src="https://some-cdn.io/prismjs/prism-autoloader.js"></script>
3. ESM:
import Prism from "prismjs"
// ???
// well that's the problem. We don't support that yet.
4. CommonJS:
const Prism = require("prismjs")
// if you're on a server
const loadLanguages = require("prismjs/components/")
loadLanguages(["sql"])
// However, bundlers and loadLanguages don't do well together, so users are sometimes forced to do this:
require("prismjs/components/prism-sql")
// Does SQL have dependencies and now doesn't work because I forgot to require them ??? Who knows!
In v2.0, the syntax for camps 1 and 2 should probably stay the same. So, what syntax do we want for ESM and CommonJS? @LeaVerou |
If this was a super common pattern I'd consider having a single unified file that's just a concat. |
We should definitely continue to support 1, 2, and support 3. How important is 4? Node supports ESM now, but not sure if they can be mixed with CJS. I think 3 should be something like:
import Prism from "path/to/prism.js";
import "path/to/language1.js";
import "path/to/plugin1.js";
...
and not this other pattern I saw recently:
import Prism from "path/to/prism.js";
import language1 from "path/to/language1.js";
import plugin1 from "path/to/plugin1.js";
language1(Prism);
plugin1(Prism);
...
which is more friction and boilerplate for the end user. The upside of the latter is that it avoids ordering issues, but ideally plugins should be written in such a way that they work in any order. |
"Now" means since v13.2.0. Right now, we still support NodeJS v10.x, so we definitely have to support CommonJS. That being said, if we figure out a good way to handle ESM, we can trivially make an equivalent CommonJS API. So we probably don't have to worry about CommonJS at all. Please correct me if I'm wrong, but I think we really only need to talk about ESM imports from now on.
That's the goal of this issue... Right now, import order is important and has to be enforced by the user when importing specific languages (no matter the import style! Even The ESM API I used in the above example was only a means to demonstrate my proposed dependency system. Whether we use side-effect-free imports or not doesn't matter in terms of the language dependency system. |
I believe ESM support was backported & unflagged in the v12.17 release, so if we require from that minor version up, we can go ESM-only. In terms of optional dependencies, if |
I wouldn't exactly call that a great user experience... Anyway, I would prefer it if we had this discussion at a later date. If we support ESM, we can trivially support CommonJS as well. Whether or not we actually want to support CommonJS can be decided once we support ESM.
That's the idea I proposed here. I called that function
I think it would be better if this problem was just not possible. I really want to avoid the current situation where you can do
const Prism = require("prismjs");
require("prismjs/lang/prism-sql");
and nothing works because you forgot that SQL has dependencies (does SQL actually have dependencies? idk). Whether we throw an error during import or during highlighting doesn't matter. Our users shouldn't have to worry about language dependencies at all. In my idea here, each language has to bring its own dependencies with it when registering. This is a rather explicit solution but it does mean that a registered language is guaranteed to just work. |
Agreed.
This might work, but we should also support providing the grammar as a literal, for the simple case. In general, the complicated cases should not get in the way of simple usage. |
By "bring its own dependencies," is just... importing them... viable?
import { register } from 'prismjs';
import 'path/to/dep';
register({ ... language definition ... });
I dunno if we're overthinking it by needing to provide the literal or something in the definition. |
@mAAdhaTTah I assumed that's what he meant. If not, yes, definitely importing them should be the way to go! No need to re-implement a dependency graph when the JS runtime implements a perfectly good one already. |
Node v12 (latest versions) should do ESM without the warning, but I can't give you a specific version number. That's my understanding from having this same discussion on Highlight.js. So v10 is the only major version you drop support for if you go ESM only. To me "hard dependencies" are only an issue on the browser side (with raw JS files, option 1/2). If you're using imports (or require) with a build system or Node then a dependency can't really be missing at runtime. You'd be informed at build time and fix your broken build. With Highlight.js we've removed all hard dependencies at runtime. Every CDN build of a language is an isolated module. If Typescript requires some components of Javascript, then those bits are compiled in. In the past we would do a This is all done with ESM in the original source and Rollup is just used to compile every language (which might have numerous imports) into a single monolithic JS file that includes everything necessary. |
Yes, I provided a code example showing that. Hard (require) dependencies between languages are done using ESM and optional dependencies are done at runtime by Prism itself.
I think we might want to yoink this idea. This trivially solves the problem of how we would implement Autoloader. AMD is another option but I think that "monolithic JS file"s are the better approach. |
To answer your question: It will be an overhaul.
import { register } from "../global-register"
import a from "./prism-a"
import b from "./prism-b"
export default register({
id: "my-lang",
require: [a, b],
optional: "c",
grammar({ getLanguage }) {
return { 'comment': /.../, ... };
}
})
So why this:
Of course, for simple grammars without dependencies, we could allow:
import { register } from "../global-register"
export default register({
id: "my-lang",
grammar: { 'comment': /.../, ... }
}) |
However, there is one thing this approach doesn't address yet: effects. Languages with hooks and all plugins have side effects. This is a problem because the registry has to be able to reload components. So I'm thinking about an
import { register } from "../global-register"
export default register({
id: "my-plugin",
// the object for Prism.plugins[id]
plugin: { configProp1: 3 },
effect(Prism) {
// hooks.add returns a function (type: `() => void`) that removes the added hook
return Prism.hooks.add('before-tokenize', env => {});
}
})
(Note: Some plugins don't just have hooks as their effects, so we need a general solution for effects.) |
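Here is a rough sketch of how a registry could use the function returned by `effect` to undo a component's side effects before reloading it. `createHooks` and `createComponents` are stand-ins invented for this example. Only the idea that `hooks.add` returns a remover comes from the snippet above.

```javascript
// Hypothetical minimal hook store whose add() returns a remover function.
function createHooks() {
  const all = new Map(); // hook name -> set of callbacks
  return {
    add(name, callback) {
      if (!all.has(name)) all.set(name, new Set());
      all.get(name).add(callback);
      return () => all.get(name).delete(callback);
    },
    run(name, env) {
      for (const callback of all.get(name) || []) callback(env);
    },
  };
}

// Hypothetical component registry that tracks one cleanup per component id.
function createComponents(prism) {
  const cleanups = new Map(); // id -> function that undoes the effect

  function remove(id) {
    const cleanup = cleanups.get(id);
    if (cleanup) cleanup();
    cleanups.delete(id);
  }

  function register(component) {
    // Reloading: undo the previous effect before applying the new one.
    remove(component.id);
    if (component.effect) cleanups.set(component.id, component.effect(prism));
  }

  return { register, remove };
}

const prism = { hooks: createHooks() };
const components = createComponents(prism);
const calls = [];
const plugin = {
  id: 'my-plugin',
  effect(Prism) {
    return Prism.hooks.add('before-tokenize', (env) => calls.push(env.code));
  },
};

components.register(plugin);
components.register(plugin); // reload: the first hook is removed again
prism.hooks.run('before-tokenize', { code: 'a' });
components.remove('my-plugin');
prism.hooks.run('before-tokenize', { code: 'b' });
// calls is now ['a']: the hook ran once and not at all after removal
```

Because the cleanup is just a function, effects that are not hooks (event listeners, DOM changes) fit the same contract as long as the component returns something that reverts them.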
Some thoughts, in no particular order and not very edited as I'm scrambling to meet a deadline: I'm not sure we'd still want to have a global With ESM projects, there are two main use cases, with often a tension between them: The way many projects handle this is to have separate The additional challenge with Prism is that it doesn't make sense to have a single bundle with all languages and plugins, the sheer number of them makes this insane. So, while catering to advanced users is fairly straightforward, it's not clear how to cater to the simple case, the novices. Remember that the no-fuss Prism installation was one of the things that made it so successful, so we need to be careful to maintain this in v2. Having to import each language separately and registering stuff is incredibly fussy. It's useful for advanced users, but it cannot be the only way. It still needs to be possible to use Prism by simply including a Once each language is a separate module, it may make sense for Prism itself to have a mode where it loads any languages it encounters, asynchronously, via |
I don't understand why the proposed format has a |
We have to be able to recreate (formerly reload) languages because of optional dependencies, so we must know all dependencies of a given component. The rules are as follows:
(This is what we do right now. v2 follows the same rules, but it lazily creates languages, so recreating a language is free if the language wasn't used yet.) The
That's a good idea. We could do that for every component. Right now, I chose side effect (
We have to change AutoLoader anyway, so we might as well change it to work like this.
We need side effects to support the |
As long as two modules import |
True. With Autoloader using the side-effect-free versions, this could work. However, this does make the assumption that every component imports the same Core module. This won't necessarily be the case for bundles (the monolithic component files that include all their dependencies), so this might cause problems. Well, since side effect versions will be generated anyway, we can change how we do side effects later in the case that importing Core directly doesn't work out. |
Yeah but bundles don't need to import Core at all, the core and the languages/plugins are all in the same file. |
No, with bundles, I mean monolithic components here (is there a name for this stuff?). I.e. JS depends on C-like, so the bundle dist of JS would include a copy of C-like, while the ESM dist would import C-like. The goal of these bundle files is to eliminate any and all imports, so you can use them with a regular Or are we just going to say that browsers have to support
That can be pretty easily done like this, so I think this use case is covered.
<script type=module>
import Prism from "bundle.js"
import a from "lang-a.js"
import b from "plugin-b.js"
Prism.register(a)
Prism.register(b)
</script> |
Brain dump: ... I kinda want to lean forward with v2. Is that bad? Could we get away with not using a build process at all? What would that look like? I know we want to make this easy to use for beginners, so if we had a More radical idea: What if we took this a step further and did away with the single global Prism instance and instead provide a factory function to create that instance? Since we have |
I really like the idea of Although, the transition away from it feels a bit too steep right now. On a scale from "Prism does everything for you" to "you have to do it all yourself", it feels like we only provide the extremes right now. But the intermediate steps will hopefully be easy to add once we have the extremes.
I also like that. I intended to pack the main Prism functionality into a class anyway. The global Prism instance would have been exactly that: an instance.
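A bare-bones sketch of the class-plus-instance idea, assuming hypothetical names (`PrismInstance`, `createPrism`) and a deliberately tiny API. A global Prism would then just be one instance among possibly many.

```javascript
// Hypothetical sketch: the class holds all state, so instances are isolated.
class PrismInstance {
  constructor() {
    this.languages = new Map();
    this.plugins = new Map();
  }

  register(id, grammar) {
    this.languages.set(id, grammar);
    return this;
  }
}

// Factory for users who want their own isolated instance.
function createPrism() {
  return new PrismInstance();
}

// The "global" instance is nothing special, only a shared default.
const defaultPrism = createPrism();
defaultPrism.register('javascript', { keyword: /\b(?:const|let|var)\b/ });

// A second instance does not see languages registered on the first one.
const isolated = createPrism();
```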
Not necessarily. People that need strong browser support can use v1. The question is: How far do we want to go/what is the exact feature set we are going to require? We require ES6, yes, but different parts of ES6 have vastly different support. I mean, parts of ES7 have better support than some of ES6.
I'm not quite sure what you mean with that. Like |
I would be fine with only supporting very modern browsers with v2. Remember, cutting edge releases now, are the majority release in a year. It costs much more to refactor, whereas wider browser support is only a matter of time (and v2 will take some time to finish anyway). So I think only supporting the last 1-2 versions of current browsers when we start work on v2 is fine. I wouldn't even consider browsers that don't support My understanding was that people use bundles to reduce HTTP requests and DNS lookups, not primarily for browser support.
I believe he means |
Yeah, I was thinking minification would be the responsibility of the user. Do people use minifiers without bundlers? Is it too much of an assumption that you'd use those together? |
That sounds good. I would even suggest jumping to at least ES2017 to get async functions for any planned promise APIs. We might even want to go ES2018, because that added Unicode property escapes (e.g. Going even further, the To be clear: I'm suggesting this because these are useful features that Prism would benefit from, not because of a "then let's go all the way" mentality. |
Autoloader is supposed to work with CDNs, so we have to include minified files somewhere. I also don't want to lose the ability to do magic with our builds. If we completely got rid of builds, then we would make it impossible to later use a transpiler or upgrade to TS (at least let me hope). We also lose our Prism-specific minifier logic (the optimized source inlining), although that might not be too bad, because I got that included in Terser (albeit a less optimized version). |
I think that's fine. Wow, had never seen of the |
Wrt Safari and lookbehind, I see the bug is Assigned, so all hope is not lost. |
Uhm, what is? I've given 3 options...
They haven't worked on it for 5 years now, but the ticket exists, yes. I'm about on the same level of hope as this person :) |
I don't see 3 options, I see a suggestion that we depend on 3 modern to cutting edge features, and I'm fine with depending on all of them 😁
True. Can we polyfill or transpile it? |
Oh, that's great!
No. Unfortunately, it's fundamentally not possible to mimic the behavior of lookbehinds (positive or negative) with other regex features, so transpiling is off the table. The only way to polyfill it, would be to have another regex engine that supports it and fallback to that engine if the browser doesn't support lookbehinds. One way would be to import Rust's fancy regex via wasm (we would have to make the wasm module ourselves). One could potentially also make a wasm module out of the regex engines of v8 or spidermonkey, but that might be difficult (e.g. v8's regex engine is tightly integrated with v8's memory model for JS objects, IIRC). However, all of those wasm modules would be at least 1MB in size (e.g. the wasm of re2 and rust's regex crate are both around 1MB, and they are simpler regex engines than what we would need). |
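While lookbehinds cannot be emulated, their absence can at least be detected at runtime, which is what any fallback engine would hinge on. A small sketch, with `supportsLookbehind` as a made-up helper name:

```javascript
// Hypothetical helper: reports whether the regex engine supports lookbehinds.
function supportsLookbehind() {
  try {
    // Built from strings so that an engine without lookbehind support throws
    // here at runtime instead of failing to parse the whole file.
    const positive = new RegExp('(?<=a)b');
    const negative = new RegExp('(?<!a)b');
    return positive.test('ab') && !negative.test('ab');
  } catch (error) {
    return false;
  }
}

const hasLookbehind = supportsLookbehind();
```

A regex literal such as `/(?<=a)b/` would not work for this check, because an engine that does not know the syntax rejects the script before any of it runs.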
[In Highlight.js] We use some ES2018 ( We started using With v12 we're planning to go ESM only (for Node) but for the browser we'll still ship boring old CJS/global in addition to pure ESM modules. We still always resolve smaller dependencies at build-time... ie, our JS and TS modules are stand-alone - despite that in the source TS largely depends on JS. This is also true for our main library which in our case might be ~ 15 separate tiny JS files, but it compiles down to a one file monolith. Forcing the browser to import 15 different tiny modules makes no sense to me vs it being a build time concern. Just my 0.02. |
Motivation
Our current language dependency system has a few problems:
It is incompatible with ESM.
The current system relies on load order to manage both require dependencies and optional dependencies. Optional dependencies simply cannot be expressed using ESM's static import syntax.
It has to be enforced by the user.
Nothing stops our users from loading languages in the wrong order. We regularly get issues where people have this problem.
It does not support third-party language definitions.
Right now, all dependencies have to be declared in components.json. This makes it very hard for authors of third-party languages to integrate their languages. They are essentially forced to write languages with zero dependencies.
The first problem is huge because ESM is a goal for v2.0.
Goals
Primary goals:
ESM-compatible.
We have to be able to either write or create ESM source files. Source files have to be able to import each other to fulfill their require dependencies. (E.g. the JS language has to be able to import C-like.)
Optional dependencies.
The dependency system has to support optional dependencies. They are the reason this issue exists and they are necessary for Prism.
Secondary goals:
Non-goals
Backwards-compatibility.
Modify dependencies.
Modify dependencies are not strictly necessary. Right now, Prism has 13 modify dependencies. 12 of those can be easily converted into optional dependencies. Only 1 modify dependency will be difficult to convert into an optional dependency but I'm sure that we can find a solution for that.
@LeaVerou @Golmote @mAAdhaTTah