Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull data files from ICU GitHub instead of npm #53

Merged
merged 4 commits into from
Oct 4, 2021
Merged

Conversation

srl295
Copy link
Member

@srl295 srl295 commented Jul 17, 2020

feat: Pull from GitHub instead of npm

  • for ICU v67+, pull the icu data bin zip file
  • For earlier ones, use ICU's GitHub release's tarball instead of npm (ICU v50+)
  • set env FULL_ICU_PREFER_NPM to prefer npm instead
  • add .eslint

Fixes: #36
Fixes: #61
Fixes: #38

@CLAassistant
Copy link

CLAassistant commented Jul 17, 2020

CLA assistant check
All committers have signed the CLA.

package.json Outdated Show resolved Hide resolved
@srl295 srl295 marked this pull request as draft September 24, 2021 21:23
@srl295 srl295 self-assigned this Sep 24, 2021
@srl295 srl295 force-pushed the pull-l-from-gh branch 3 times, most recently from 0034ab8 to 3a0a45a Compare September 24, 2021 22:35
@srl295 srl295 marked this pull request as ready for review September 24, 2021 22:35
@srl295
Copy link
Member Author

srl295 commented Sep 24, 2021

okey dokey I'm not sure where all the semicolons went, and it's too noisy, but it seems to work.

Set env var FULL_ICU_PREFER_NPM=1 to stick with the current behavior

@srl295 srl295 changed the title Pull l from gh Pull data files from ICU GitHub instead of npm Sep 24, 2021
@srl295
Copy link
Member Author

srl295 commented Sep 24, 2021

force push to fix the logic for little endian folks

@srl295 srl295 linked an issue Sep 24, 2021 that may be closed by this pull request
@srl295 srl295 linked an issue Sep 24, 2021 that may be closed by this pull request
@srl295
Copy link
Member Author

srl295 commented Sep 27, 2021

I would say I'm trying to work myself out of a job, but that kind of took care of itself last year 8-D

@srl295
Copy link
Member Author

srl295 commented Sep 28, 2021

in yarnpkg/berry#873 (comment) @arcanis said:

… running yarn add while an install is already in progress is completely invalid as it would likely lead to dependency tree corruptions

I'm going to take that as a vote against using npm|yarn add

@srl295
Copy link
Member Author

srl295 commented Sep 28, 2021

any other comments from @nodejs/npm ? like to make sure i'm fixing a real problem, and that the fix is better than the problem, and that i've well documented why a change is being made.

@srl295
Copy link
Member Author

srl295 commented Sep 28, 2021

post install considered harmful by yarn: https://yarnpkg.com/advanced/lifecycle-scripts#a-note-about-postinstall

@MylesBorins
Copy link

@srl295 I guess I'm not entirely understanding the process of what is happening with these changes, and how it is not dynamic.

If this is an incremental improvement and makes things more reliable it probably makes sense to move forward with it and perhaps explore ways to improve it moving forward. Happy to find time for a quick call if you want to talk through it, feel free to dm me to find time.

@arcanis
Copy link

arcanis commented Sep 28, 2021

I don't have much context on this particular package or change, but as a general tip I got feedback in the past that serving assets directly from GitHub was problematic for users behind certain firewalls (particularly China), cf yarnpkg/berry#982.

@wraithgar
Copy link

any other comments from @nodejs/npm ? like to make sure i'm fixing a real problem, and that the fix is better than the problem, and that i've well documented why a change is being made.

I'm not sure what the problem here is tbh. Is it the postinstall approach or is it "where should we source this content?"

If the problem being solve here is "installing platform-specific assets" then this is the same problem that chromedriver also solves using a manual download https://github.com/giggio/node-chromedriver/blob/main/install.js The only fundamental difference is that they hook into the install event. It does not appear to make a difference in either implementation which event would want to be used, postinstall is simply guaranteed to run after the install event and neither project has both types of scripts.

If the problem being solved here is "where do we source the assets from" then we do lose the baked in idempotency we get from the npm registry and also the checksum validation that an npm or yarn install would give. We also, as @arcanis pointed out, lose some availability from behind certain firewalls. Checksums are already baked into the npm registry:

$ npm view icu4c-data@67l dist
{
  integrity: 'sha512-OIRiop+k1IVf4TBLEOj910duoO9NKwtJLwp++qWT6KT5gRziHNt+5gwhcGuTqRy++RTK2gLoAIbk8KYCNxW++g==',
  shasum: '4c5264a9ec6e2d126b8d47b6faa618025ccbaaaf',
  tarball: 'https://registry.npmjs.org/icu4c-data/-/icu4c-data-0.67.2.tgz',

and I would recommend using those since that's already what will happen when you use the old npm method. This way there is a "guarantee" that there is no difference between the results based on if you use npm install or manually download. You'll source it from the same place.

@srl295
Copy link
Member Author

srl295 commented Sep 28, 2021

any other comments from @nodejs/npm ? like to make sure i'm fixing a real problem, and that the fix is better than the problem, and that i've well documented why a change is being made.

I'm not sure what the problem here is tbh. Is it the postinstall approach or is it "where should we source this content?"

Both are problematic. #38 summarizes it: "you can't npm/yarn install (a random asset) from within a postinstall script". Both npm and yarn folks have confirmed this.

npm seemed like one possible place to store the data due to the caching and availability mentioned.

If the problem being solve here is "installing platform-specific assets" then this is the same problem that chromedriver also solves using a manual download https://github.com/giggio/node-chromedriver/blob/main/install.js The only fundamental difference is that they hook into the install event. It does not appear to make a difference in either implementation which event would want to be used, postinstall is simply guaranteed to run after the install event and neither project has both types of scripts.

It's "ICU version and endianness specific". So not quite platform specific. The ICU version changes once or twice a year.

The basic logic is:

  • Do we already have full data installed? if so, STOP !process.config.variables.icu_small
  • What ICU version do we have? 67, 68, 69, 70, … process.versions.icu.split('.')[0] or process.config.variables.icu_ver_major
  • What endianness? l or b? (e for EBCDIC would be supported, but doesn't occur in practice) process.config.variables.icu_endianness

Putting it all together, then it attempts to run npm install icu4c-data@67l

We don't know a priori what ICU version will be used. When node was installed, the user can do configure --with-icu-source=something and it could be any version.

If the problem being solved here is "where do we source the assets from" then we do lose the baked in idempotency we get from the npm registry and also the checksum validation that an npm or yarn install would give. We also, as @arcanis pointed out, lose some availability from behind certain firewalls. Checksums are already baked into the npm registry:

$ npm view icu4c-data@67l dist
{
  integrity: 'sha512-OIRiop+k1IVf4TBLEOj910duoO9NKwtJLwp++qWT6KT5gRziHNt+5gwhcGuTqRy++RTK2gLoAIbk8KYCNxW++g==',
  shasum: '4c5264a9ec6e2d126b8d47b6faa618025ccbaaaf',
  tarball: 'https://registry.npmjs.org/icu4c-data/-/icu4c-data-0.67.2.tgz',

and I would recommend using those since that's already what will happen when you use the old npm method. This way there is a "guarantee" that there is no difference between the results based on if you use npm install or manually download. You'll source it from the same place.

There's no way to know what version is being used as mentioned. As I mentioned above, the files at https://github.com/unicode-org/icu/releases/tag/release-69-1 are GPG signed, so we could have some way to validate those.

@srl295
Copy link
Member Author

srl295 commented Sep 29, 2021

cool… unicode-org/icu@4a8b160 just barely made it into icu4c 70 rc … this means that workaround won't be needed

@srl295 srl295 force-pushed the pull-l-from-gh branch 2 times, most recently from 0c46860 to d499e98 Compare September 30, 2021 16:59
srl295 and others added 2 commits September 30, 2021 12:02
Fixes: #36

- fetch from ICU’s GitHub release instead of npm (ICU v50+)
- set env FULL_ICU_PREFER_NPM=1 to prefer npm instead
- add .eslint (as well as standard)
- for ICU 67 and following, fetch from icu4c-___-data-_.zip
- otherwise fetch from icu4c-src.zip for little endian
- otherwise, use npm as before
- work around ICU issue with data file name
  https://unicode-org.atlassian.net/browse/ICU-21764
- document the FULL_ICU_PREFER_NPM
- test with FULL_ICU_PREFER_NPM=1
This avoid permission issues for users that cannot create files in "/",  like "/tmpXXXX". It will generate "/tmp/XXX" instead.

Ref: #62 d696940
Co-authored-By: Rachid Tarsimi <[email protected]>
@srl295
Copy link
Member Author

srl295 commented Sep 30, 2021

@wraithgar @MylesBorins think this is ready to merge… perhaps open new issues to track further work?

@srl295
Copy link
Member Author

srl295 commented Sep 30, 2021

@srl295
Copy link
Member Author

srl295 commented Oct 1, 2021

Hi all,

I've done an early publish to npm so folks can test this out:

npm i [email protected] ( https://www.npmjs.com/package/full-icu/v/1.4.0-0 )

Copy link
Member

@nschonni nschonni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this is minor, so feel free to ignore

runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 also allows native caching https://github.com/actions/setup-node#caching-packages-dependencies if you set it. caching probably isn't needed right now, since there are no lock files

Suggested change
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2

fail-fast: false
matrix:
container:
- node:12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth expanding out the matrix to cover supported versions or at least a version with NPM 6 vs NPM 7

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nschonni makese sense (although probably good for a separate PR), do you have a recommendation?

@@ -0,0 +1,14 @@
language: node_js
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably drop this, since you've got this running on Actions now

const myFetch = require('./myFetch')
const yauzl = require('yauzl')

// var isglobal = process.env.npm_config_global === 'true';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented out code here, and below

"description": "install 'full-icu' data for your current node",
"scripts": {
"lint": "standard",
"postinstall": "node postinstall.js"
"lint": "standard && eslint *.js test/*.js",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you can skip the call to standard since you're using the ESlint plug-in now

unzipOne.js Show resolved Hide resolved
@@ -0,0 +1,69 @@
// Copyright (C) 2015-2016 IBM Corporation and Others. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is right, or just got copied across from some of the other scripts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's copied from install-spawn

@srl295 srl295 mentioned this pull request Oct 1, 2021
@srl295 srl295 requested a review from nschonni October 1, 2021 19:38
Copy link
Member Author

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nschonni cleaned up dead code/comments, would like to defer test improvements to #69

@@ -0,0 +1,69 @@
// Copyright (C) 2015-2016 IBM Corporation and Others. All Rights Reserved.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's copied from install-spawn

@srl295 srl295 merged commit de367a8 into master Oct 4, 2021
@srl295 srl295 deleted the pull-l-from-gh branch October 4, 2021 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment