[Brain storming] New language dependency system #2880

Open
RunDevelopment opened this issue Apr 30, 2021 · 40 comments

Comments

@RunDevelopment
Member

Motivation

Our current language dependency system has a few problems:

  • It is incompatible with ESM.

    The current system relies on load order to manage both require dependencies and optional dependencies. Optional dependencies simply cannot be expressed using ESM's static import syntax.

  • It has to be enforced by the user.

    Nothing stops our users from loading languages in the wrong order. We regularly get issues where people have this problem.

  • It does not support third-party language definitions.

    Right now, all dependencies have to be declared in components.json. This makes it very hard for authors of third-party languages to integrate their languages. They are essentially forced to write languages with zero dependencies.

The first problem is huge because ESM is a goal for v2.0.

Goals

Primary goals:

  • ESM-compatible.

    We have to be able to either write or create ESM source files. Source files have to be able to import each other to fulfill their require dependencies. (E.g. the JS language has to be able to import C-like.)

  • Optional dependencies.

    The dependency system has to support optional dependencies. They are the reason this issue exists and they are necessary for Prism.

Secondary goals:

  • Third-party authors should be able to "hook" into our dependency system.
  • Making the Node hack unnecessary.

Non-goals

  • Backwards-compatibility.

  • Modify dependencies.

    Modify dependencies are not strictly necessary. Right now, Prism has 13 modify dependencies. 12 of those can be easily converted into optional dependencies. Only 1 modify dependency will be difficult to convert into an optional dependency but I'm sure that we can find a solution for that.


@LeaVerou @Golmote @mAAdhaTTah

@RunDevelopment
Member Author

As I said, this issue is for brainstorming, so here's an idea:

We use import statements for our require dependencies. I think everybody will agree with that.

Optional dependencies are harder. My idea is to do optional dependencies by lazily instantiating languages. Languages don't just add themselves to Prism, they add a function to create them instead. This allows Prism to 1) create languages only as needed and 2) handle optional dependencies at runtime. This way, we know how to construct a language, so we can reload it easily (reloading is necessary for opt. deps).

The API for this approach could look like this:

type Grammar = Record<string, RegExp | { pattern: RegExp, inside?: Grammar }>;

interface GrammarProto {
	id: string;
	require?: GrammarProto[];
	optional?: string[];
	alias?: string[];
	create(Prism: Prism): Grammar;
}

const Prism = {
	languages: {
		register(proto: GrammarProto): void { /* ... */ },
		get(id: string): Grammar | undefined { /* ... */ },
	}
};

Usage

// file: prism-clike.js
export default {
	id: 'clike',
	create() {
		return {
			'comment': /.../,
			'string': /.../,
			...
		}
	}
}

// file: prism-javascript.js
import clike from "./prism-clike.js";

export default {
	id: 'javascript',
	require: [clike],
	optional: ['regex'],
	alias: ['js'],
	create(Prism) {
		return Prism.languages.extend('clike', {
			'keyword': /.../,
			'regex': {
				pattern: /.../,
				inside: Prism.languages.get('regex')
			}
		});
	}
}

// file: some-user-file.js
import Prism from "prismjs/core.js"
import javascript from "prismjs/languages/prism-javascript.js"
Prism.languages.register(javascript)

Prism.highlight(code, Prism.languages.get('javascript'), 'javascript')

Advantages:

  • Easy to use for us and third-party authors
  • Enforced by default. Users can't get around it.

Disadvantages:

  • Optional dependencies are now handled by Prism Core. This whole thing can be implemented in a few lines of code but will add to Core's bundle size nonetheless.
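
To make the "few lines of code" claim concrete, here is a minimal sketch of what such a lazy registry could look like. It is not Prism's actual implementation: the class name `LanguageRegistry`, the cache-clearing strategy, and the shape of the object passed to `create` are assumptions for illustration, and circular dependencies are not handled.

```javascript
// Hypothetical sketch of a lazy language registry (not Prism's real code).
class LanguageRegistry {
	constructor() {
		this.protos = new Map();  // id -> proto
		this.aliases = new Map(); // alias -> id
		this.cache = new Map();   // id -> created grammar
	}

	register(proto) {
		if (this.protos.has(proto.id)) return;
		// Required dependencies travel with the proto, so register them too.
		for (const dep of proto.require ?? []) this.register(dep);
		this.protos.set(proto.id, proto);
		for (const alias of proto.alias ?? []) this.aliases.set(alias, proto.id);
		// The new component may be an optional dependency of a grammar that was
		// already created. Dropping the whole cache is the simplest safe reaction.
		this.cache.clear();
	}

	get(id) {
		id = this.aliases.get(id) ?? id;
		const proto = this.protos.get(id);
		if (!proto) return undefined;
		if (!this.cache.has(id)) {
			// Grammars are created lazily, on first use.
			this.cache.set(id, proto.create({ languages: this }));
		}
		return this.cache.get(id);
	}
}

// Usage: the optional dependency is registered AFTER the language that uses it.
const regex = { id: 'regex', create: () => ({ 'char-class': /\[[^\]]*\]/ }) };
const javascript = {
	id: 'javascript',
	optional: ['regex'],
	alias: ['js'],
	create(Prism) {
		return {
			'keyword': /\b(?:const|let|var)\b/,
			'regex': { pattern: /\/[^/\n]+\//, inside: Prism.languages.get('regex') }
		};
	}
};

const registry = new LanguageRegistry();
registry.register(javascript);
const before = registry.get('js'); // created without the regex grammar
registry.register(regex);
const after = registry.get('js');  // recreated, now with the regex grammar
```

Clearing the entire cache is deliberately crude. A real implementation would probably only invalidate the grammars that actually depend on the newly registered component.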

@joshgoebel

Languages don't just add themselves to Prism, they add a function to create them instead.

Welcome to the party. ;-) This is exactly how we do it. This isn't a common thing in our ecosystem but I know at least one person doing this exact thing with their 3rd party grammar (to extend our built-in XML grammar)... you load a plugin, it fetches the original pre-compiled grammar definition (by rerunning its definer function), modifies it (adding new sublanguage support), and then reregisters it (when it will get recompiled).

You don't have a compile step I don't think but otherwise I think the process would be similar.

You still have a potential load-order issue if multiple plugins were trying to make incompatible changes to the same grammar, but that's a bit more of an edge case.

@RunDevelopment
Member Author

Welcome to the party. ;-) This is exactly how we do it.

Now that you mention it, they really are the same. Does Highlight.js also deal with optional dependencies?

You still have a potential load-order issue if multiple plugins were trying to make incompatible changes to the same grammar

That's true. Right now, languages and plugins are handled by the same dependency system. The new system should probably also handle both. If both languages and plugins use the same system, the load-order issue should disappear, I think.

That being said, any dependency system that works for languages can trivially be extended to plugins, so we only need to talk languages here.

@joshgoebel

Does Highlight.js also deal with optional dependencies?

We'd have to define what that means exactly. Technically our sublanguage support is all optional. If you don't have a language registered it will silently not work for sublanguage blocks. Register the language and suddenly it starts working. And all plugins are all optional and registered at runtime.

@RunDevelopment
Member Author

Interesting. By using the id of a language in sublanguage, Highlight.js gets around having to worry about optional dependencies.

Prism should definitely copy this trick :)

That being said, Prism's languages use optional dependencies not only for embedded languages, some also conditionally add more tokens if another language is present. (This is how I intend to get rid of modify dependencies.)

While this trick doesn't eliminate optional dependencies, we could use it to cut down on the number of optional dependencies. It's also pretty easy to implement and will make some (recursive) language definitions simpler.
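
A rough sketch of the trick, assuming `inside` may hold either a grammar object or the id of a language. The helper name `resolveInside` and the use of a plain `Map` as the registry are hypothetical, chosen only to keep the example short:

```javascript
// Hypothetical helper: resolve an `inside` value that is either a grammar
// object or the id of a (possibly not yet registered) language.
function resolveInside(inside, languages) {
	if (typeof inside === 'string') {
		// Unknown ids resolve to undefined, so the token is simply not
		// tokenized any further. Nothing breaks.
		return languages.get(inside);
	}
	return inside;
}

const languages = new Map();
const codeBlock = { pattern: /```[\s\S]*?```/, inside: 'javascript' };

// Looked up at tokenization time, before the language exists:
const unresolved = resolveInside(codeBlock.inside, languages);

// Register the language later; the very same token definition now resolves.
languages.set('javascript', { 'keyword': /\b(?:const|let)\b/ });
const resolved = resolveInside(codeBlock.inside, languages);
```

Because the lookup happens every time the token is used, the embedding language never has to be recreated when the embedded language shows up later.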

@joshgoebel

That being said, Prism's languages use optional dependencies not only for embedded languages, some also conditionally add more tokens if another language is present

Do you have an example or two of that? I feel like for the most part we avoid that with sub languages. The rules are there in the parent language but they are effectively NOOP without the appropriate grammar installed.

@RunDevelopment
Member Author

Do you have an example or two of that?

Regex in JS and Markdown in GraphQL are examples of embedded languages.

As for "conditionally add more tokens": nothing yet, I believe. Prism has modify dependencies right now but I intend to replace them with optional dependencies (to make the deps system simpler).

A good example for that is CSS and CSS extras. While the selector insides could be done as an embedded language, the added color, hexcode, etc tokens cannot. With optional deps, you could implement CSS+CSS extras kinda like this:

// prism-css-extras.js
Prism.languages['css-extras'] = { 'color': /.../, ... };
Prism.languages['css-selector'] = { ... };

// prism-css.js
// CSS now has an optional dependency to css extras
Prism.languages.css = {
  ...,
  'selector': { pattern: /.../, inside: 'css-selector' },
  
  // add the css extra tokens
  ...(Prism.languages['css-extras'] || {})
};

@LeaVerou
Member

LeaVerou commented May 1, 2021

Languages could be separate modules importing all their (required) dependencies. Prism could offer a promise syntax for when a specific language has been included that resolves to that language's grammar. Optional or modify dependencies could do things once/if that promise resolves.
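
As an illustration only, the promise idea could be sketched like this. The names `whenDefined` and `define` are made up for this example (loosely modeled on `customElements.whenDefined`) and are not an existing Prism API:

```javascript
// Hypothetical sketch of a promise-based language lookup.
class Languages {
	constructor() {
		this.grammars = new Map();
		this.pending = new Map(); // id -> { promise, resolve }
	}

	// Resolves with the grammar once the language has been defined.
	whenDefined(id) {
		if (this.grammars.has(id)) return Promise.resolve(this.grammars.get(id));
		if (!this.pending.has(id)) {
			let resolve;
			const promise = new Promise(r => { resolve = r; });
			this.pending.set(id, { promise, resolve });
		}
		return this.pending.get(id).promise;
	}

	define(id, grammar) {
		this.grammars.set(id, grammar);
		const waiting = this.pending.get(id);
		if (waiting) {
			this.pending.delete(id);
			waiting.resolve(grammar);
		}
	}
}

// Usage: CSS picks up the extra tokens once/if CSS Extras is included.
const languages = new Languages();
const css = { 'selector': /[^{}\s][^{}]*(?=\s*\{)/ };
languages.define('css', css);
languages.whenDefined('css-extras').then(extras => Object.assign(css, extras));

// Some time later, possibly from another module:
languages.define('css-extras', { 'color': /#[\da-f]{3,8}\b/i });
```

Note that the promise never settles if the optional language is never included, which is the intended "once/if" behavior, and that the grammar is mutated asynchronously after it was first defined.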

But stepping back for a moment, I think we should start from the opposite side: how do we expect users to include languages and plugins in v2? What's the least-friction syntax for them?

@RunDevelopment
Member Author

RunDevelopment commented May 1, 2021

how do we expect users to include languages and plugins in v2? What's the least-friction syntax for them?

From what I know, we have 4 camps RIGHT NOW.

  1. A Prism bundle from our download page
<script src="path/to/prism.js"></script>
  2. A Prism core + Autoloader combo
<script src="https://some-cdn.io/prismjs/prism-core.js"></script>
<script src="https://some-cdn.io/prismjs/prism-autoloader.js"></script>
  3. ESM imports
import Prism from "prismjs"
// ???
// well that's the problem. We don't support that yet.
  4. CommonJS imports
const Prism = require("prismjs")

// if you're on a server
const loadLanguages = require("prismjs/components/")
loadLanguages(["sql"])

// However, bundlers and loadLanguages don't do well together, so users are sometimes forced to do this:
require("prismjs/components/prism-sql")
// Does SQL have dependencies and now doesn't work because I forgot to require them??? Who knows!

In v2.0, the syntax for camps 1 and 2 should probably stay the same.

So, what syntax do we want for ESM and CommonJS? @LeaVerou

@joshgoebel

<script src="https://some-cdn.io/prismjs/prism-core.js"></script>
<script src="https://some-cdn.io/prismjs/prism-autoloader.js"></script>

If this was a super common pattern I'd consider having a single unified file that's just a concat.

@LeaVerou
Member

LeaVerou commented May 3, 2021

@RunDevelopment

We should definitely continue to support 1, 2, and support 3. How important is 4? Node supports ESM now, but not sure if they can be mixed with CJS.
1 can be generated by using Rollup in the client, which is possible, though painfully underdocumented.

I think 3 should be something like:

import Prism from "path/to/prism.js";
import "path/to/language1.js";
import "path/to/plugin1.js";
...

and not this other pattern I saw recently:

import Prism from "path/to/prism.js";
import language1 from "path/to/language1.js";
import plugin1 from "path/to/plugin1.js";
language1(Prism);
plugin1(Prism);
...

which is more friction and boilerplate for the end user. The upside of the latter is that it avoids ordering issues, but ideally plugins should be written in such a way that they work in any order.

@RunDevelopment
Member Author

Node supports ESM now

"Now" means since v13.2.0. Right now, we still support NodeJS v10.x, so we definitely have to support CommonJS.

That being said, if we figure out a good way to handle ESM, we can trivially make an equivalent CommonJS API. So we probably don't have to worry about CommonJS at all.

Please correct me if I'm wrong, but I think we really only need to talk about ESM imports from now on.

[...] ideally plugins should be written in such a way that they work in any order.

That's the goal of this issue...

Right now, import order is important and has to be enforced by the user when importing specific languages (no matter the import style! Even <script src="../prism-lang.js"> imports are affected).

The ESM API I used in the above example was only a means to demonstrate my proposed dependency system. Whether we use side-effect-free imports or not doesn't matter in terms of the language dependency system.

@mAAdhaTTah
Member

"Now" means since v13.2.0. Right now, we still support NodeJS v10.x, so we definitely have to support CommonJS.

I believe ESM support was backported & unflagged in the v12.17 release, so if we require from that minor version up, we can go ESM-only.


In terms of optional dependencies, if register took in a generate function, rather than the final language, Prism could generate the languages lazily at first highlight so they don't even need to be registered in a particular order. Similarly, it would allow Prism to regenerate languages that have optional dependencies if those optional deps are registered after the language has already been generated. Having the language resolution internally would also allow Prism to be smarter about throwing/returning errors if you attempt to highlight something without its dependencies registered.

@RunDevelopment
Member Author

"Unlike Node.js 14, using ESM will still emit a runtime experimental warning, either when a module is used a the application's entrypoint or the first time dynamic import() is called." - Node v12.17.0 release notes

I wouldn't exactly call that a great user experience...

Anyway, I would prefer it if we had this discussion at a later date. If we support ESM, we can trivially support CommonJS as well. Whether or not we actually want to support CommonJS can be decided once we support ESM.


if register took in a generate function

That's the idea I proposed here. I called that function create tho ;)

Having the language resolution internally would also allow Prism to be smarter about throwing/returning errors if you attempt to highlight something without its dependencies registered.

I think it would be better if this problem was just not possible. I really want to avoid the current situation where you can do

const Prism = require("prismjs");
require("prismjs/components/prism-sql");

and nothing works because you forgot that SQL has dependencies (does SQL actually have dependencies? idk). Whether we throw an error during import or during highlighting doesn't matter. Our users shouldn't have to worry about language dependencies at all.

In my idea here, each language has to bring its own dependencies with it when registering. This is a rather explicit solution but it does mean that a registered language is guaranteed to just work.

@LeaVerou
Member

LeaVerou commented May 3, 2021

Whether we throw an error during import or during highlighting doesn't matter. Our users shouldn't have to worry about language dependencies at all.

Agreed.

In my idea here, each language has to bring its own dependencies with it when registering. This is a rather explicit solution but it does mean that a registered language is guaranteed to just work.

This might work, but we should also support providing the grammar as a literal, for the simple case. In general, the complicated cases should not get in the way of simple usage.

@mAAdhaTTah
Member

mAAdhaTTah commented May 3, 2021

In my idea here, each language has to bring its own dependencies with it when registering.

By "bring its own dependencies," is just... importing them... viable?

import { register } from 'prismjs';
import 'path/to/dep';

register({ ... language definition ... });

I dunno if we're overthinking it by needing to provide the literal or something in the definition.

@LeaVerou
Member

LeaVerou commented May 3, 2021

@mAAdhaTTah I assumed that's what he meant. If not, yes, definitely importing them should be the way to go! No need to re-implement a dependency graph when the JS runtime implements a perfectly good one already.

@joshgoebel

joshgoebel commented May 3, 2021

Node v12 (latest versions) should do ESM without the warning, but I can't give you a specific version number. That's my understanding from having this same discussion on Highlight.js. So v10 is the only major version you drop support for if you go ESM only.

To me "hard dependencies" are only an issue on the browser side (with raw JS files, option 1/2). If you're using imports (or require) with a build system or Node then a dependency can't really be missing at runtime. You'd be informed at build time and fix your broken build.

With Highlight.js we've removed all hard dependencies at runtime. Every CDN build of a language is an isolated module. If Typescript requires some components of Javascript, then those bits are compiled in. In the past we would do a requireLanguage("javascript") call which could break at runtime depending on whether or not the user had installed javascript. This is now no longer possible.

This is all done with ESM in the original source and Rollup is just used to compile every language (which might have numerous imports) into a single monolithic JS file that includes everything necessary.

@RunDevelopment
Member Author

By "bring its own dependencies," is just... importing them...

Yes, I provided a code example showing that.

Hard (require) dependencies between languages are done using ESM and optional dependencies are done at runtime by Prism itself.

With Highlight.js we've removed all hard dependencies at runtime.

I think we might want to yoink this idea. This trivially solves the problem of how we would implement Autoloader.

AMD is another option but I think that "monolithic JS file"s are the better approach.

RunDevelopment referenced this issue Jul 2, 2021
…hook (#2719)

The hook that highlights code blocks in markdown code was unable to handle code blocks that were highlighted already. The hook can now handle any existing markup in markdown code blocks.
@RunDevelopment
Member Author

@LeaVerou @mAAdhaTTah

To answer your question: It will be an overhaul.
Right now, I imagine an API like this:

import { register } from "../global-register"
import a from "./prism-a"
import b from "./prism-b"

export default register({
  id: "my-lang",
  require: [a, b],
  optional: "c",
  grammar({ getLanguage }) {
    return { 'comment': /.../, ... };
  }
})

So why this:

  1. register adds the given language proto to the language registry of the global Prism instance. This means that users can import "path/to/lang" or <script src="path/to/lang.js" /> and it'll work.
    register itself is quite a simple function that is defined as such (just more robust):
    export function register(proto) {
      globalThis.Prism.register(proto);
      return proto;
    }
  2. Language protos need to include their require deps and declare their optional deps. If they don't, then the runtime addition of optional dependencies (e.g. via Autoloader) won't work.
  3. grammar (called create in my earlier example) is a function because the lazy runtime creation of languages means that you can only use dependencies after the registry has instantiated them. It also has to be a pure function, because the registry might have to run it again when dependencies change.
  4. The export because other languages have to be able to get the proto of this language if they depend on it.

Of course, for simple grammars without dependencies, we could allow:

import { register } from "../global-register" 

export default register({
  id: "my-lang", 
  grammar: { 'comment': /.../, ... }
})

@RunDevelopment
Member Author

However, there is one thing this approach doesn't address yet: effects. Languages with hooks and all plugins have side effects. This is a problem because the registry has to be able to reload components.

So I'm thinking about an effect function that adds the hooks and other side effects of a component. Since we need to be able to undo the effects, the effect function returns a function that undoes the effect. Similar to React's useEffect's cleanup function.

import { register } from "../global-register" 

export default register({
  id: "my-plugin", 
  // the object for Prism.plugins[id]
  plugin: { configProp1: 3 },
  effect(Prism) {
    // hooks.add returns a function (type: `() => void`) that removes the added hook
    return Prism.hooks.add('before-tokenize', env => {});
  }
})

(Note: Some plugins don't just have hooks as their effects, so we need a general solution for effects.)
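
For illustration, this is roughly how a registry could apply and undo such effects when a component is reloaded. The `Hooks` and `EffectRunner` classes are hypothetical stand-ins written for this sketch. The only thing taken from the proposal above is that `hooks.add` and `effect` each return an undo function:

```javascript
// Hypothetical hooks container whose `add` returns a function removing the hook.
class Hooks {
	constructor() { this.all = new Map(); }

	add(name, callback) {
		const list = this.all.get(name) ?? [];
		this.all.set(name, list);
		list.push(callback);
		return () => {
			const index = list.indexOf(callback);
			if (index !== -1) list.splice(index, 1);
		};
	}

	run(name, env) {
		for (const callback of this.all.get(name) ?? []) callback(env);
	}
}

// Hypothetical bookkeeping for effects, keyed by component id.
class EffectRunner {
	constructor(prism) {
		this.prism = prism;
		this.cleanups = new Map(); // component id -> undo function
	}

	apply(proto) {
		this.undo(proto.id); // never let one component's effect run twice
		if (proto.effect) this.cleanups.set(proto.id, proto.effect(this.prism));
	}

	undo(id) {
		const cleanup = this.cleanups.get(id);
		if (cleanup) cleanup();
		this.cleanups.delete(id);
	}
}

// Usage: reloading the plugin does not leave a duplicate hook behind.
const prism = { hooks: new Hooks() };
const runner = new EffectRunner(prism);
const plugin = {
	id: 'my-plugin',
	effect(Prism) {
		return Prism.hooks.add('before-tokenize', env => { env.calls += 1; });
	}
};

const env = { calls: 0 };
runner.apply(plugin);
runner.apply(plugin);                    // reload: old hook removed, new one added
prism.hooks.run('before-tokenize', env); // the hook runs exactly once
runner.undo('my-plugin');
prism.hooks.run('before-tokenize', env); // no hook left to run
```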

@LeaVerou
Member

Some thoughts, in no particular order and not very edited as I'm scrambling to meet a deadline:

I'm not sure we'd still want to have a global Prism object once we move to ESM, it kinda defeats the purpose.

With ESM projects, there are two main use cases, with often a tension between them:
- Novice users want convenience. They don't want to handle importing various modules and registering things. They just want something that works, ideally with a single import.
- Advanced users often want tree-shakeability. This means no side effects from modules that are not actually used but just imported. This is where registration methods can be useful.

The way many projects handle this is to have separate index.js and index-fn.js files, the former with side effects and the latter without. That's what we did in color.js too.

The additional challenge with Prism is that it doesn't make sense to have a single bundle with all languages and plugins, the sheer number of them makes this insane. So, while catering to advanced users is fairly straightforward, it's not clear how to cater to the simple case, the novices. Remember that the no-fuss Prism installation was one of the things that made it so successful, so we need to be careful to maintain this in v2. Having to import each language separately and registering stuff is incredibly fussy. It's useful for advanced users, but it cannot be the only way. It still needs to be possible to use Prism by simply including a <script type=module>. Though just having the existing generator, adapted for the ESM architecture would help with that I assume, so maybe my concern is misplaced.

Once each language is a separate module, it may make sense for Prism itself to have a mode where it loads any languages it encounters, asynchronously, via import(language_module_url), which would be relative to import.meta.url. Then people could even just import it from prismjs.com directly, and stuff just works.
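
A small sketch of how the URL resolution for such a mode might work. The `languages/prism-<id>.js` layout and both function names are assumptions. In a real ES module, the caller would pass `import.meta.url` as the base URL:

```javascript
// Hypothetical sketch; the file layout below is an assumption.
function languageModuleUrl(id, baseUrl) {
	// In an ES module, `baseUrl` would be `import.meta.url` of Prism's core,
	// so languages are resolved relative to wherever Prism was loaded from.
	return new URL(`./languages/prism-${id}.js`, baseUrl).href;
}

const loading = new Map(); // id -> promise of the imported module

function loadLanguage(id, baseUrl) {
	if (!loading.has(id)) {
		// Dynamic import() returns a promise for the module namespace object.
		// Caching the promise makes sure each language is requested only once.
		loading.set(id, import(languageModuleUrl(id, baseUrl)));
	}
	return loading.get(id);
}

// Example of the resolution step only (nothing is fetched here):
const sqlUrl = languageModuleUrl('sql', 'https://example.com/src/prism-core.js');
```

Highlighting would then have to wait for the promise returned by `loadLanguage`, which is where the asynchronous part of this mode comes from.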

@LeaVerou
Member

I don't understand why the proposed format has a require property. One of the advantages of ESM is that they manage their own dependencies. We should hook into that, rather than continuing to have our own dependency system.

@RunDevelopment
Member Author

I don't understand why the proposed format has a require property

We have to be able to recreate (formerly reload) languages because of optional dependencies, so we must know all dependencies of a given component. The rules are as follows:

  1. If a component gets added, then all languages that optionally depend on this component must be recreated.
  2. If a component gets recreated, then all components that depend on this component (optionally or required) must also get recreated.

(This is what we do right now. v2 follows the same rules, but it lazily creates languages, so recreating a language is free if the language wasn't used yet.)
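
The two rules can be sketched as a small graph traversal. This is a simplified illustration, not the actual implementation: the function name is made up, and dependencies are given as ids here, whereas in the proposed format `require` holds protos.

```javascript
// Hypothetical sketch: which registered components must be recreated when
// `addedId` gets added? `protos` maps id -> { require?: string[], optional?: string[] }.
function componentsToRecreate(protos, addedId) {
	const dirty = new Set();
	const queue = [];

	// Rule 1: everything that optionally depends on the added component.
	for (const [id, proto] of protos) {
		if ((proto.optional ?? []).includes(addedId)) {
			dirty.add(id);
			queue.push(id);
		}
	}

	// Rule 2: everything that depends (optionally or required) on a recreated
	// component, applied transitively.
	while (queue.length > 0) {
		const current = queue.shift();
		for (const [id, proto] of protos) {
			const deps = [...(proto.require ?? []), ...(proto.optional ?? [])];
			if (!dirty.has(id) && deps.includes(current)) {
				dirty.add(id);
				queue.push(id);
			}
		}
	}
	return dirty;
}

// Usage: adding `regex` affects JavaScript (optional dependency) and, through
// it, TypeScript (required dependency). C-like and CSS are untouched.
const protos = new Map([
	['clike', {}],
	['javascript', { require: ['clike'], optional: ['regex'] }],
	['typescript', { require: ['javascript'] }],
	['css', {}]
]);
const affected = componentsToRecreate(protos, 'regex');
```

With lazy creation, "recreating" a component in this set only means discarding its cached grammar, if it has one.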

The require property is also necessary for correctness when making different Prism instances (e.g. for our testing). If the require property weren't there, then you could make a new Prism instance and manually add a language/plugin proto without its dependencies.

The way many projects handle this is to have separate index.js and index-fn.js files, the former with side effects and the latter without. That's what we did in color.js too.

That's a good idea. We could do that for every component. Right now, I chose side effect (register) + export. But making them side effect free and then generating a version with side effects (so you can <script src=""/> import them) is better.

Once each language is a separate module, it may make sense for Prism itself to have a mode where it loads any languages it encounters, asynchronously, via import(language_module_url), which would be relative to import.meta.url

We have to change AutoLoader anyway, so we might as well change it to work like this.

I'm not sure we'd still want to have a global Prism object once we move to ESM, it kinda defeats the purpose.

We need side effects to support the <script/> import case, so that's how I did side effects. How else would you do it?

@LeaVerou
Member

I'm not sure we'd still want to have a global Prism object once we move to ESM, it kinda defeats the purpose.

We need side effects to support the <script/> import case, so that's how I did side effects. How else would you do it?

As long as two modules import Prism from the same URL, they can both add stuff to it and it's the same object. :)

@RunDevelopment
Member Author

True. With Autoloader using the side-effect-free versions, this could work.

However, this does make the assumption that every component imports the same Core module. This won't necessarily be the case for bundles (the monolithic component files that include all their dependencies), so this might cause problems.

Well, since side effect versions will be generated anyway, we can change how we do side effects later in the case that importing Core directly doesn't work out.

@LeaVerou
Member

Yeah but bundles don't need to import Core at all, the core and the languages/plugins are all in the same file.
Unless you're talking about the use case where somebody imports a bundle, then tries to add more languages and plugins to it?

@RunDevelopment
Member Author

No, with bundles, I mean monolithic components here (is there a name for this stuff?). I.e. JS depends on C-like, so the bundle dist of JS would include a copy of C-like, while the ESM dist would import C-like. The goal of these bundle files is to eliminate any and all imports, so you can use them with a regular <script src=""/>, which has better browser support than type=module.

Or are we just going to say that browsers have to support type=module? Same for import(), which has even worse browser support.

Unless you're talking about the use case where somebody imports a bundle, then tries to add more languages and plugins to it?

That can be pretty easily done like this, so I think this use case is covered.

<script type=module>
import Prism from "bundle.js"
import a from "lang-a.js"
import b from "plugin-b.js"
Prism.register(a)
Prism.register(b)
</script>

@mAAdhaTTah
Member

Or are we just going to say that browsers have to support type=module? Same for import(), which has even worse browser support.

Brain dump:

... I kinda want to lean forward with v2. Is that bad? Could we get away with not using a build process at all? What would that look like? I know we want to make this easy to use for beginners, so if we had a prism/autostart (or any better name), they could import that, which would use the autoloader & load events to highlight, and everything else could be a pure module. Does that lean too far away from ease-of-use? Beginners would then get weaker performance (all languages are loaded from the server lazily, so highlighting is delayed until the language is loaded), but wouldn't even need to build a version from the website. They could download the bundle of files & put them on a server, or import or script them in directly from a CDN. Are we doing beginner users a disservice by denying them such a wide swath of their own audience (any of their readers using a browser that doesn't support import() will have problems)?

More radical idea: What if we took this a step further and did away with the single global Prism instance and instead provide a factory function to create that instance? Since we have /autostart, it won't make it any more difficult for beginners and would give advanced users more flexibility (but is that useful flexibility? Maybe not...).
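
A minimal sketch of the factory idea, just to show the isolation it would buy. `createPrism` and the methods on the returned object are hypothetical names, not a proposed final API:

```javascript
// Hypothetical factory: every call returns an instance with its own registry.
function createPrism() {
	const protos = new Map();
	return {
		register(proto) {
			protos.set(proto.id, proto);
			return proto;
		},
		has(id) {
			return protos.has(id);
		}
	};
}

// Two instances do not share any state.
const a = createPrism();
const b = createPrism();
a.register({ id: 'sql', grammar: { 'keyword': /\bSELECT\b/i } });

const aHasSql = a.has('sql'); // registered on `a`
const bHasSql = b.has('sql'); // `b` is unaffected
```

Something like /autostart could then create one default instance for beginners, while tests and advanced users create as many isolated instances as they need.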

@RunDevelopment
Member Author

I really like the idea of /autostart. I'm also in favor of separating the load events and Core. It feels a little silly that so much browser-specific code gets loaded and promptly ignored by Node.

Although, the transition away from it feels a bit too steep right now. On a scale from "Prism does everything for you" to "you have to do it all yourself", it feels like we only provide the extremes right now. But the intermediate steps will hopefully be easy to add once we have the extremes.

What if we took this a step further and did away with the single global Prism instance and instead provide a factory function to create that instance?

I also like that. I intended to pack the main Prism functionality into a class anyway. The global Prism instance would have been exactly that: an instance.

I kinda want to lean forward with v2. Is that bad?

Not necessarily. People that need strong browser support can use v1.

The question is: How far do we want to go/what is the exact feature set we are going to require? We require ES6, yes, but different parts of ES6 have vastly different support. I mean, parts of ES7 have better support than some of ES6.

Could we get away with not using a build process at all?

I'm not quite sure what you mean with that. Like npm run build, minification included?

@LeaVerou
Member

I would be fine with only supporting very modern browsers with v2. Remember, today's cutting-edge releases are the majority releases in a year. It costs much more to refactor, whereas wider browser support is only a matter of time (and v2 will take some time to finish anyway). So I think only supporting the last 1-2 versions of current browsers when we start work on v2 is fine. I wouldn't even consider browsers that don't support type=module or import(), those are ancient history.

My understanding was that people use bundles to reduce HTTP requests and DNS lookups, not primarily for browser support.

I'm not quite sure what you mean with that. Like npm run build, minification included?

I believe he means import Prism from "https://prismjs.com/src/prism-core.js" (or local files), without having to make any bundles.
But yeah, I suppose there's always minification.

@mAAdhaTTah
Member

Yeah, I was thinking minification would be the responsibility of the user. Do people use minifiers without bundlers? Is it too much of an assumption that you'd use those together?

@RunDevelopment
Member Author

RunDevelopment commented Aug 21, 2022

So I think only supporting the last 1-2 versions of current browsers when we start work on v2 is fine.

That sounds good. I would even suggest jumping to at least ES2017 to get async functions for any planned promise APIs.

We might even want to go ES2018, because that added Unicode property escapes (e.g. \p{L}) which is huge for us, because Unicode classes are commonly used in language specs, but we could never get them right. Lookbehinds would also be huge, but Safari is raining on that parade, unfortunately.

Going even further, the v flag proposal would also be useful to us (we currently use lookarounds to do set operations, which comes at a performance cost). So would we be comfortable going with the current ES version or even ESNext and then transpiling down?

To be clear: I'm suggesting this because these are useful features that Prism would benefit from, not because of a "then let's go all the way" mentality.
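
A small example of what these features buy us. The patterns are illustrative and not taken from any Prism language definition:

```javascript
// Identifier pattern: ASCII-only versus Unicode property escapes (ES2018).
// Property escapes like \p{L} require the `u` flag.
const asciiIdentifier = /^[A-Za-z_]\w*$/;
const unicodeIdentifier = /^[\p{L}_][\p{L}\p{N}_]*$/u;

const asciiResult = asciiIdentifier.test('größe');     // false: ö and ß are not in \w
const unicodeResult = unicodeIdentifier.test('größe'); // true

// Set difference emulated with a lookahead: "any letter that is not uppercase".
// With the `v` flag proposal this could be written as /[\p{L}--\p{Lu}]/v instead.
const nonUppercaseLetter = /(?!\p{Lu})\p{L}/u;

const lower = nonUppercaseLetter.test('a'); // true
const upper = nonUppercaseLetter.test('A'); // false
```

The lookahead version has to test every candidate character twice, which is the performance cost mentioned above.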

@RunDevelopment
Member Author

Yeah, I was thinking minification would be the responsibility of the user.

Autoloader is supposed to work with CDNs, so we have to include minified files somewhere.

I also don't want to lose the ability to do magic with our builds. If we completely got rid of builds, then we would make it impossible to later use a transpiler or upgrade to TS (at least let me hope). We also lose our Prism-specific minifier logic (the optimized source inlining), although that might not be too bad, because I got that included in Terser (albeit a less optimized version).

@LeaVerou
Member

So I think only supporting the last 1-2 versions of current browsers when we start work on v2 is fine.

That sounds good. I would even suggest jumping to at least ES2017 to get async functions for any planned promise APIs.

We might even want to go ES2018, because that added Unicode property escapes (e.g. \p{L}) which is huge for us, because Unicode classes are commonly used in language specs, but we could never get them right. Lookbehinds would also be huge, but Safari is raining on that parade, unfortunately.

Going even further, the v flag proposal would also be useful to us (we currently use lookarounds to do set operations, which comes at a performance cost). So would we be comfortable going with the current ES version or even ESNext and then transpiling down?

To be clear: I'm suggesting this because these are useful features that Prism would benefit from, not because of a "then let's go all the way" mentality.

I think that's fine. Wow, I had never heard of the v flag, that's indeed super useful for Prism!

@LeaVerou
Member

Wrt Safari and lookbehind, I see the bug is Assigned, so all hope is not lost.

@RunDevelopment
Member Author

I think that's fine.

Uhm, what is? I've given 3 options...

I see the bug is Assigned, so all hope is not lost.

They haven't worked on it for 5 years now, but the ticket exists, yes. I'm about on the same level of hope as this person :)

@LeaVerou
Member

I think that's fine.

Uhm, what is? I've given 3 options...

I don't see 3 options, I see a suggestion that we depend on 3 modern to cutting edge features, and I'm fine with depending on all of them 😁

I see the bug is Assigned, so all hope is not lost.

They haven't worked on it for 5 years now, but the ticket exists, yes. I'm about on the same level of hope as this person :)

True. Can we polyfill or transpile it?

@RunDevelopment
Member Author

I'm fine with depending on all of them 😁

Oh, that's great!

Can we polyfill or transpile it?

No. Unfortunately, it's fundamentally not possible to mimic the behavior of lookbehinds (positive or negative) with other regex features, so transpiling is off the table.

The only way to polyfill it would be to have another regex engine that supports it and fall back to that engine if the browser doesn't support lookbehinds. One way would be to import Rust's fancy-regex via wasm (we would have to make the wasm module ourselves). One could potentially also make a wasm module out of the regex engines of V8 or SpiderMonkey, but that might be difficult (e.g. V8's regex engine is tightly integrated with V8's memory model for JS objects, IIRC). However, all of those wasm modules would be at least 1MB in size (e.g. the wasm builds of re2 and Rust's regex crate are both around 1MB, and they are simpler regex engines than what we would need).
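For illustration, the "fall back if the browser doesn't support lookbehinds" step could look roughly like the sketch below. Everything here is hypothetical: `supportsLookbehind`, `loadRegexEngine`, and the `loadFallback` callback (standing in for whatever would load a wasm-backed engine) are made-up names, and no such engine exists in Prism.

```javascript
// Detect lookbehind support at runtime. The pattern is compiled from a string,
// so an unsupported engine throws a catchable SyntaxError instead of
// breaking the whole script at parse time.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch {
    return false;
  }
}

// Hypothetical engine selection: use native RegExp when possible and only
// pay for the (large) fallback engine when the browser lacks lookbehinds.
// `loadFallback` is an assumed async callback, e.g. one that fetches a wasm module.
async function loadRegexEngine(loadFallback) {
  if (supportsLookbehind()) {
    return {
      name: 'native',
      compile: (source, flags) => new RegExp(source, flags),
    };
  }
  return loadFallback();
}
```

With this shape, the roughly 1MB download would only hit users on engines without lookbehind support, but every pattern would still have to behave identically in both engines.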

@joshgoebel

We might even want to go ES2018, because that added Unicode property escapes (e.g. \p{L}) which is huge for us,

[In Highlight.js] We use some ES2018 ([... but not {...), always keeping an eye on which browsers support which features... if something has been released for quite a while I'm generally not afraid of using it - generally we support "recent versions of green-field browsers". The problem with some of the new stuff though is it's a hard syntax error if you run into an old browser - which can sometimes break someone's ENTIRE JS stack (when using a bundler)...

We started using \p a while back (perhaps slightly prematurely) and broke some very old Extended Support Releases of Firefox, but after shipping it I didn't see the value in rewinding it based on how tiny the affected user pool was - and those releases were going EOL soon anyways.

With v12 we're planning to go ESM only (for Node) but for the browser we'll still ship boring old CJS/global in addition to pure ESM modules. We still always resolve smaller dependencies at build-time... i.e., our JS and TS modules are stand-alone - despite the fact that in the source TS largely depends on JS. This is also true for our main library, which in our case might be ~ 15 separate tiny JS files, but it compiles down to a single-file monolith. Forcing the browser to import 15 different tiny modules makes no sense to me vs it being a build-time concern.

Just my 0.02.
