Split out core functionality into separate module #1664
I agree, it's something I have long thought I should do. Any advice or hints on how best to undertake this are welcome. |
Maybe the first step should be to move core functionality into a new The rules for the
For example, the This alone will take multiple iterations. The next step would be to make it possible to load |
I think there might be a better way to do this using the Node.js modules API. In any case, the first step should be to refactor the code and gradually move the |
I've been looking at modules, and I think I will go that route, my understanding is that this would make the code compatible with Node.js. |
It turns out the solution has nothing to do with the modules API at all but rather it's the VM API that we need to use. For the sake of this example, let's say

console.log('loading foo.js');
if (typeof µBlockCore === 'undefined') {
    µBlockCore = {};
}
µBlockCore.foo = function () {
    console.log('foo!');
};

console.log('loading bar.js');
if (typeof µBlockCore === 'undefined') {
    µBlockCore = {};
}
µBlockCore.bar = function () {
    console.log('bar!');
};

In the browser extension, these files would be loaded like so:

<script src="core/foo.js"></script>
<script src="core/bar.js"></script>
<script>
    µBlockCore.foo();
    µBlockCore.bar();
</script>

Now the question is how to also make these files available to a Node.js program as a "package," without having to rewrite the code in these files. Here's a Node.js program that uses the package:

let µBlockCore = require('ublock');
µBlockCore.foo();
µBlockCore.bar();

The package has been installed via GitHub using the command For it to be a valid package, it must have a Now the question is how to load Here's the magic:

let fs = require('fs');
let path = require('path');
let vm = require('vm');
let environment = {
    // Let µBlockCore be the current exports object so everything on it is
    // exported directly out of this module.
    µBlockCore: exports,
    // Pass the global console object to each script so it can use it.
    console
};
function loadScript(name) {
    if (!vm.isContext(environment)) {
        vm.createContext(environment);
    }
    vm.runInContext(fs.readFileSync(path.resolve(__dirname, name)),
                    environment);
}
loadScript('core/foo.js');
loadScript('core/bar.js');

We create a new VM "context" and run both Since the scripts need access to

@gorhill if you've already decided to go down this route then I don't mind submitting a patch to set this up. |
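For readers who want to try the VM approach without setting up a package, here is a minimal, self-contained sketch of the same idea. It assumes inline source strings in place of the `core/foo.js` and `core/bar.js` files, and the functions return their message rather than logging it; both changes are for illustration only and are not part of the proposal above.

```javascript
const vm = require('vm');

// Hypothetical stand-ins for the two browser scripts (inline strings, not files).
const fooSource = `
if (typeof µBlockCore === 'undefined') { µBlockCore = {}; }
µBlockCore.foo = function () { return 'foo!'; };
`;
const barSource = `
if (typeof µBlockCore === 'undefined') { µBlockCore = {}; }
µBlockCore.bar = function () { return 'bar!'; };
`;

// One shared sandbox: both scripts see the same µBlockCore global,
// so everything they attach to it ends up on the `core` object.
const core = {};
const sandbox = vm.createContext({ µBlockCore: core });

for (const source of [fooSource, barSource]) {
    vm.runInContext(source, sandbox);
}

console.log(core.foo(), core.bar());
```

Because the sandbox property and `core` refer to the same object, nothing has to be copied out of the context after the scripts run.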
I am already well advanced with using export/import and I like where it's going -- I am at the point where there is no hard dependencies to |
Alright, sounds good! There are two kinds of modules in Node.js and they don't seem to get along very well, but you can cross that bridge once you get to it. One thing you might want to keep in mind when you write the |
So I've reached a point where I could create an HTML page which is able to load uBO's static network filtering engine (what is meant to be benchmarked) with a module-based approach.

Example of usage

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>uBO Static Network Filtering Engine</title>
</head>
<body>
<script src="./ublock/src/lib/punycode.js"></script>
<script src="./ublock/src/lib/publicsuffixlist/publicsuffixlist.js"></script>
<script type="module">
import {
FilteringContext,
} from './ublock/src/js/core/filtering-context.js';
import {
CompiledListReader,
CompiledListWriter,
} from './ublock/src/js/core/compiledlistio.js';
import {
compile,
} from './ublock/src/js/core/static-filtering-compiler.js';
import {
staticNetFilteringEngine,
} from './ublock/src/js/core/static-net-filtering.js';
staticNetFilteringEngine.reset(); // Remove all filters
const applyList = async url => {
const list = await fetch(url);
const raw = await list.text();
const writer = new CompiledListWriter();
writer.properties.set('name', url);
const compiled = compile(raw, writer);
const reader = new CompiledListReader(compiled);
staticNetFilteringEngine.fromCompiled(reader);
};
(async ( ) => {
// Populate filtering engine with filter lists
await Promise.all([
applyList('https://easylist.to/easylist/easylist.txt'),
applyList('https://easylist.to/easylist/easyprivacy.txt'),
]);
// Commit changes
staticNetFilteringEngine.freeze({ optimize: true });
// Reuse filtering context: it's what uBO does
const fctxt = new FilteringContext();
// Tests
fctxt.setDocOriginFromURL('https://www.bloomberg.com/');
fctxt.setURL('https://www.bloomberg.com/tophat/assets/v2.6.1/that.css');
fctxt.setType('stylesheet');
if ( staticNetFilteringEngine.matchRequest(fctxt) !== 0 ) {
console.log(staticNetFilteringEngine.toLogData());
}
fctxt.setDocOriginFromURL('https://www.bloomberg.com/');
fctxt.setURL('https://securepubads.g.doubleclick.net/tag/js/gpt.js');
fctxt.setType('script');
if ( staticNetFilteringEngine.matchRequest(fctxt) !== 0 ) {
console.log(staticNetFilteringEngine.toLogData());
}
fctxt.setDocOriginFromURL('https://www.bloomberg.com/');
fctxt.setURL('https://sourcepointcmp.bloomberg.com/ccpa.js');
fctxt.setType('script');
if ( staticNetFilteringEngine.matchRequest(fctxt) !== 0 ) {
console.log(staticNetFilteringEngine.toLogData());
}
})();
</script>
</body>
</html>

Now there is still a long way to go regarding all this work to have cleaner dependencies, naming, etc., but for the purpose of unblocking your work with the Cliqz benchmark, and keeping in mind I have zero experience with nodejs or npm, what is left to address on my side for you to be able to run this in your benchmark package? Never mind, found a guide. |
This sounds about right! You might run into the problem that Node.js doesn't recognize |
Related issue:
- uBlockOrigin/uBlock-issues#1664

The changes are enough to fulfill the related issue.

A new platform has been added in order to allow for building a NodeJS package. From the root of the project:

./tools/make-nodejs

This will create a new uBlock0.nodejs directory in the ./dist/build directory, which is a valid NodeJS package.

From the root of the package, you can try:

node test

This will instantiate a static network filtering engine, populated by easylist and easyprivacy, which can be used to match network requests by filling the appropriate filtering context object. The test.js file contains code which is a typical example of usage of the package.

Limitations: the NodeJS package can't execute the WASM versions of the code since the WASM module requires the use of fetch(), which is not available in NodeJS.

This is a first pass at modularizing the codebase, and while at it a number of opportunistic small rewrites have also been made.

This commit requires the minimum supported version for Chromium and Firefox be raised to 61 and 60 respectively.
I don't know if it's enough to unblock you, you can now build a nodejs package by using I didn't go with a |
Thanks, this should work. We'll need to benchmark the serialization functionality, and for this we could create a mock storage object in the benchmarking code:

class MockStorage {
    #map;
    constructor() {
        this.#map = new Map();
    }
    async put(key, value) {
        this.#map.set(key, value);
    }
    async get(key) {
        return this.#map.get(key);
    }
}
let storage = new MockStorage();

And then we could do Please correct me if I'm wrong. |
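As a rough illustration of how such a mock could be exercised, the sketch below round-trips a serialized value through it. The `'selfie'` key and the shape of the stored object are assumptions made up for this example; they do not reflect the actual serialization format or API of the package.

```javascript
class MockStorage {
    #map = new Map();
    async put(key, value) {
        this.#map.set(key, value);
    }
    async get(key) {
        return this.#map.get(key);
    }
}

// Hypothetical round trip: the key name and payload shape are illustrative only.
async function roundTrip() {
    const storage = new MockStorage();
    const state = { filters: ['||example.com^'] };
    await storage.put('selfie', JSON.stringify(state));
    return JSON.parse(await storage.get('selfie'));
}

const result = roundTrip();
result.then(restored => console.log(restored.filters));
```

Keeping the methods `async` means the benchmark measures serialization through the same promise-based interface a real storage backend would expose, without any actual I/O.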
A program using the package shouldn't have to read

node -pe "JSON.stringify(fs.readFileSync('$DES/data/effective_tld_names.dat', 'utf8'))" > $DES/data/effective_tld_names.json

After this a program can load the file like But you could also go a step further and not expose it at all: If there's no argument to the

import { createRequire } from 'module';
const require = createRequire(import.meta.url);
function pslInit(raw = require('./data/effective_tld_names.json')) {
    // ...
} |
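The two steps suggested above (store the raw text as a JSON string at build time, then load it through a default parameter) can be sketched in a runnable form. This is a CommonJS variant that writes a made-up sample to a temporary directory; the sample data, the temporary path, and the body of `pslInit` are assumptions for illustration, not the package's real code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Stand-in for the raw public suffix list (made-up sample, not the real file).
const raw = '// sample comment\ncom\nco.uk\n';

// Build step: store the text as a JSON string literal so require() can load it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'psl-sketch-'));
const jsonPath = path.join(dir, 'effective_tld_names.json');
fs.writeFileSync(jsonPath, JSON.stringify(raw));

// Hypothetical initializer: falls back to the bundled JSON when called
// without an argument, so callers never need to know where the data lives.
function pslInit(text = require(jsonPath)) {
    return text.split('\n').filter(line => line !== '' && !line.startsWith('//'));
}

console.log(pslInit());        // uses the bundled data
console.log(pslInit('org\n')); // caller-supplied data takes precedence
```

The default parameter is only evaluated when the argument is omitted, so the JSON file is read lazily and a caller that supplies its own list never touches the disk.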
Related feedback: - uBlockOrigin/uBlock-issues#1664 (comment)
Thanks for the feedback, I've made the suggested changes in gorhill/uBlock@e1222d1. |
I have a temporary branch here: https://github.com/mjethani/adblocker It's called update-ublockorigin-nodejs. Here's the sequence of commands to run the benchmarks:
@gorhill once this work is done, do you intend to merge it back into master? |
One more issue I ran into is that the Node.js package uses |
I will rework the code to avoid console usage for what is just normal functioning, stumbling into an invalid filter is an expected condition. Yes, I do plan to merge all this into master, and there will also be a uBlock0_[version].nodejs.zip package for each release so that it can simply be downloaded. |
If you make it a |
Related issue: - uBlockOrigin/uBlock-issues#1664 Modularization is a necessary step toward possibly publishing a more complete nodejs package to allow using uBO's filtering capabilities outside of the uBO extension. Additionally, remove undue usage of console output as per feedback: - uBlockOrigin/uBlock-issues#1664 (comment)
Merged.
|
Note that in the latest iteration, |
Related issue: - uBlockOrigin/uBlock-issues#1664 The various filtering engine benchmarking functions are best isolated in their own file since they have specific dependencies that should not be suffered by the filtering engines. Additionally, moved decomposeHostname() into uri-utils.js as it's a hostname-related function required by many filtering engine cores -- this allows to further reduce or outright remove dependency on `µb`.
Thanks. I've been experimenting with git submodules. You could do something like this:
From that point on someone who wants to build uBlock Origin could clone the main repo with |
With submodules: https://github.com/mjethani/uBlock/tree/submodules
|
Let's say I have work to do in uAssets repo, can it be done from within the submodule, i.e. git commit/pull/push etc. from within it? |
Thanks, I pulled your changes, I agree it's much cleaner and this simplifies the make scripts. |
You should be able to do it. In fact, you could just delete the directory and make a symlink to |
Since you mentioned development, I should clarify that you'll have to update the version of uAssets in the main repo from time to time. Workflow:
|
I think I will just keep working in the separate uAssets repo to keep it simple. But now if locally I want to create a build with the current state of the uAssets repo, as per doc this is what needs to be done?
Which I did, but now it seems I have local changes in
Not sure what's next. Never mind, I just git add-ed/commit-ed, and push-ed as with any other update. |
Yes, the idea is that the main repo always refers to a specific version of uAssets. If there's a new version, you need to upgrade to it. I'm not sure if there's a way around this extra step. |
It's all fine, having a locally up-to-date uAssets submodule is rarely needed, except to test a new install of uBO, and my understanding is that the GitHub action-driven build always uses the latest state of uAssets, as was the case before. |
If you mean |
@gorhill This is not strictly related to the issue here but more of an optimization idea. As for why this still matters in 2021, personally I'd love to run uBlock Origin on a low-end Android device and every bit of improvement can make a difference. |
In uBO it's parsed once and then the state is serialized using toSelfie/fromSelfie, and from then on I could make |
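To make the parse-once-then-serialize idea concrete, here is a minimal sketch using a hypothetical `TinyFilterList` class. The method names `toSelfie` and `fromSelfie` are borrowed from the comment above, but their signatures, the JSON format, and everything else in this class are assumptions and do not reflect uBO's actual implementation.

```javascript
// Counts how many times the slow text-parsing path runs.
let parseCount = 0;

// Hypothetical class, for illustration only.
class TinyFilterList {
    constructor() {
        this.hostnames = new Set();
    }
    // Slow path: parse raw list text line by line.
    fromText(raw) {
        parseCount += 1;
        for (const line of raw.split('\n')) {
            const trimmed = line.trim();
            if (trimmed === '' || trimmed.startsWith('!')) { continue; }
            this.hostnames.add(trimmed);
        }
    }
    // Serialize the already-parsed state.
    toSelfie() {
        return JSON.stringify([...this.hostnames]);
    }
    // Fast path: restore state without parsing the raw text again.
    fromSelfie(selfie) {
        this.hostnames = new Set(JSON.parse(selfie));
    }
    matches(hostname) {
        return this.hostnames.has(hostname);
    }
}

const first = new TinyFilterList();
first.fromText('! comment\nads.example\ntracker.example\n');
const selfie = first.toSelfie();

const second = new TinyFilterList();
second.fromSelfie(selfie);
console.log(second.matches('ads.example'), parseCount);
```

The second instance answers queries identically to the first even though the raw text was parsed only once.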
@gorhill I have added a makefile: https://github.com/mjethani/uBlock/tree/makefile With this there's no need for the benchmarking code to know about the internal paths or anything else about the Node.js package. It can simply run This will also help with development in general: the |
Awesome, thank you very much -- can you submit a pull request? I assume you should be able to submit PRs since your last commit should have made you a contributor, i.e. allowed to submit PRs. |
Another small step toward the goal of reducing dependency on `µb`. Related issue: - uBlockOrigin/uBlock-issues#1664 text-iterators module has been renamed text-utils to better reflect its content.
@mjethani I added a I was curious to find out if it made a difference, but at a glance from the console output (executing |
@gorhill does uBlock Origin use Wasm on Chromium-based browsers? If the answer is no, maybe it's best to benchmark only the plain JS version on Node.js. I'd like to add SpiderMonkey too and in that case we might want to enable Wasm. |
It's not used in Chromium -- as this would require |
I know you said you wouldn't use it. |
@gorhill it's possible to convert a WebAssembly module into asm.js using the wasm2js tool in Emscripten. It makes me wonder if it might be possible to load it this way and how it might perform on V8. |
Suggestion: In |
Another way is to use a scope. e.g. |
I see ABP's core on npm, is there a reason they didn't go |
Related discussion: - uBlockOrigin/uBlock-issues#1664 (comment)
I have no idea. But I do know that npm recommends using a scope. By the way, even if you never end up publishing any packages on npm, it would still be a good idea to grab your username(s) if you haven't done so already. |
Related issue: - uBlockOrigin/uBlock-issues#1664 This change allows to add the redirect engine into the nodejs package. The purpose of the redirect engine is to resolve a redirect token into a path to a local resource, to be used by the caller as wished.
In case anyone stumbles by, the end result here is that the core of uBlock Origin is now available as a Node.js package in the npm registry: https://www.npmjs.com/package/@gorhill/ubo-core This has greatly simplified the integration with the Cliqz benchmarks and as a bonus has made the same code used in the uBlock Origin browser extension now available to any Node.js program. Thanks @gorhill for the effort. |
Related issue: - uBlockOrigin/uBlock-issues#1664
Description
I have just updated the Cliqz benchmarks with the latest versions of Adblock Plus (ghostery/adblocker@a2da894) and Brave (ghostery/adblocker@97dc7c1)
Unfortunately it's not trivial to do the same for uBlock Origin, because of the relatively tight coupling in the code. In fact, it keeps getting harder. For example, uritools.js now refers to the vAPI object whereas previously it did not.
It would be ideal if the core filter parsing, matching, and serialization functionality were split out into a separate Node.js module.
A specific URL where the issue occurs
N/A
Steps to Reproduce
Try to update the benchmarks at https://github.com/cliqz-oss/adblocker/tree/master/packages/adblocker-benchmarks with the latest version of uBlock Origin.
Expected behavior
It should be as easy as updating a version number or a commit hash in the package.json file in that repo.
Actual behavior
It's too much work that involves copying and modifying a bunch of files from the uBlock Origin repo.
uBlock Origin version
1.36.2
Browser name and version
N/A
Operating System and version
N/A