[WIN M1] WIN (ZIP/MSI) Build/Assemble Process #2306

Closed
1 task
peterzhuamazon opened this issue Jul 7, 2022 · 78 comments

@peterzhuamazon
Member

peterzhuamazon commented Jul 7, 2022

| Tasks (ZIP) | Tasks (MSI) | Estimate | Status (ZIP) | Status (MSI) | Notes |
| --- | --- | --- | --- | --- | --- |
| Re-use the existing build process to generate the OpenSearch/Dashboards min + all of the plugins artifacts for WIN package to use | Same | 0 | Completed | Completed | |
| The artifacts should be built with LINUX Windows platform specified, as we will cross-compile WIN binary on LINUX then package with WINDOWS JDK. | Same | 1 | Completed | Completed | This is still in debate as we can technically build WIN on a Windows machine, but there are a lot of things to set up just so Jenkins can run Python on a Windows agent. We are able to run shell scripts natively on the Windows agent. |
| We already have "zip" supported for the "--distribution" parameter, but need to check whether it is already combined with --platform windows. | We do not have "exe" support for "--distribution" yet. However, this is different from "RPM" as we do not need the min artifact to be an exe. The min artifact can be zip and the final product in assemble can be exe. | 2 | Completed | Completed | As for "exe", we need to discuss whether a standard exe is enough, or whether we want to invest in the official Windows installer format, "msi". |
| We already have the "--distribution" param available in the assemble workflow, just need to verify existing functions of "ZIP". | We already have the "--distribution" param available in the assemble workflow, but no support for "EXE" redirection. Need to add a child class supporting the new distribution. | 2 | Completed | Completed | |
| The generation code should pull the artifacts from the build workflow to a temporary location | Same | 1 | Completed | Completed | |
| The code will compile the components and also call the existing install function to install plugins on min artifacts | Same | 1 | Completed | Completed | ETA: 2022/09/16 |
| After installation, the code will execute a tool or utility to wrap all the content into the corresponding distribution format | Same | 1 | Completed | Completed | 20220819 Note: Plugin compilation currently has some issues with the build scripts; the compilation itself seems OK, at least on things like common-utils. ETA: 2022/09/16 |
| The code will move the final distribution artifact from the temp location to the dist folder | Same | 1 | Completed | Completed | ETA: 2022/09/07 |

Note: The MSI section in this milestone is obsolete, as MSI is just a wrapper around the content of the ZIP. As long as ZIP is completed here, MSI is considered complete as well.
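
The assemble steps listed in the table (pull artifacts to a temporary location, wrap them into the distribution format, move the result to the dist folder) can be sketched roughly as follows. This is only an illustration and not the actual opensearch-build code: `assemble_zip`, its parameters, and the directory layout are hypothetical names chosen for the example, and plugin installation is left out.

```python
import shutil
import tempfile
from pathlib import Path


def assemble_zip(build_dir: Path, dist_dir: Path, name: str) -> Path:
    """Hypothetical helper: stage artifacts, wrap them as a ZIP, move to dist."""
    dist_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / name
        # Pull the artifacts from the build output into a temporary location.
        shutil.copytree(build_dir, staging)
        # (Installing plugins onto the min artifact would happen here.)
        # Wrap the staged content into the distribution format (ZIP).
        archive = shutil.make_archive(
            str(Path(tmp) / name), "zip", root_dir=tmp, base_dir=name
        )
        # Move the final artifact from the temp location to the dist folder.
        return Path(shutil.move(archive, str(dist_dir / f"{name}.zip")))
```

Calling `assemble_zip(Path("builds"), Path("dist"), "opensearch-min")` would leave `dist/opensearch-min.zip` behind, with all content under a top-level `opensearch-min/` folder.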

  • do not remove
  • PRs:

20220715:

20220721:

20220722:

20220819:

20220824:

20220902:

20220907:

20220914:

20220915:

20220916:

20220927:

20220928:

20221004:

20221006:

20221007:

20221010:

20221011:

20221012:

20221013:

20221018:

20221019:

20221024:

20221025:

20221027:

20221028:

20221101:

20221103:

20221104:

20221107:

20221108:

20221114:

20221116:

@dblock
Member

dblock commented Jul 8, 2022

I think building an MSI is a totally separate ask and a whole new project (and repo).

@peterzhuamazon
Member Author

peterzhuamazon commented Jul 11, 2022

Windows cannot run our Python code under Cygwin/MSYS because psutil = "~=5.8" does not support them, although it seems to support running natively in a Windows shell.
We are going to build on Linux and assemble on Windows,
since it is Java and Node.js code anyway.

@peterzhuamazon
Member Author

OK, I think I found the relevant part:
./src/system/process.py: parent = psutil.Process(self.process.pid)

@dblock
Member

dblock commented Jul 11, 2022

Windows cannot run our Python code under Cygwin/MSYS because psutil = "~=5.8" does not support them, although it seems to support running natively in a Windows shell. We are going to build on Linux and assemble on Windows, since it is Java and Node.js code anyway.

I think this is a mistake, and we're giving up prematurely. We definitely have native bits, even though for a first version we may not try to build them (k-nn, etc.). A bigger problem is that we need tests to run on the target platform, or we can't be assured the code works.

What is the issue with psutil? Is there a link? Make a repro or a failing test and let's fix it.

@peterzhuamazon
Member Author

Windows cannot run our Python code under Cygwin/MSYS because psutil = "~=5.8" does not support them, although it seems to support running natively in a Windows shell. We are going to build on Linux and assemble on Windows, since it is Java and Node.js code anyway.

I think this is a mistake, and we're giving up prematurely. We definitely have native bits, even though for a first version we may not try to build them (k-nn, etc.). A bigger problem is that we need tests to run on the target platform, or we can't be assured the code works.

What is the issue with psutil? Is there a link? Make a repro or a failing test and let's fix it.

I am in the process of adding a Windows runner to Jenkins.
I believe psutil actually supports native Windows, just not Cygwin.
If I am able to bring up the runner, I think this will no longer be an issue.

@peterzhuamazon
Member Author

Connecting to (<>) with WinRM as Administrator

Waiting for WinRM to come up. Sleeping 10s.

Waiting for WinRM to come up. Sleeping 10s.

Waiting for WinRM to come up. Sleeping 10s.

Waiting for WinRM to come up. Sleeping 10s.

Waiting for WinRM to come up. Sleeping 10s.

WinRM service responded. Waiting for WinRM service to stabilize on EC2 (Amazon_ec2_cloud) - Jenkins-Agent-Windows2016-X64-M52xlarge-Single-Host (i-<>)
WinRM should now be ok on EC2 (Amazon_ec2_cloud) - Jenkins-Agent-Windows2016-X64-M52xlarge-Single-Host (i-<>)
Connected with WinRM.
Creating tmp directory if it does not exist

remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java  -jar <> -workDir <>

<===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.7
This is a Windows agent


@dblock
Member

dblock commented Jul 14, 2022

Nice @peterzhuamazon! Try writing up a 3.0/2.x build for Windows?

@peterzhuamazon
Member Author

Nice @peterzhuamazon! Try writing up a 3.0/2.x build for Windows?

Working on it now. I am still trying to get bash running natively in PowerShell, which should be possible in newer versions.

@peterzhuamazon
Member Author

peterzhuamazon commented Jul 15, 2022

New error detected when building OpenSearch on a Windows host:

Build Logs

C:\Users\Administrator\opensearch-build>bash ./build.sh manifests/2.1.0/opensearch-2.1.0.yml --distribution zip
Installing dependencies in . ...
Creating a virtualenv for this project...
Pipfile: C:\Users\Administrator\opensearch-build\Pipfile
Using C:/Users/Administrator/scoop/apps/python37/3.7.9/python.exe (3.7.9) to create virtualenv...
[ ===] Creating virtual environment...created virtual environment CPython3.7.9.final.0-64 in 4832ms
  creator CPython3Windows(dest=C:\Users\Administrator\.virtualenvs\opensearch-build-ppYgApDN, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=C:\Users\Administrator\AppData\Local\pypa\virtualenv)
    added seed packages: pip==22.1.2, setuptools==62.6.0, wheel==0.37.1
  activators BashActivator,BatchActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator

Successfully created virtual environment!
Virtualenv location: C:\Users\Administrator\.virtualenvs\opensearch-build-ppYgApDN
Installing dependencies from Pipfile.lock (d422cd)...
  ================================ 55/55 - 00:00:40
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Running ./src/run_build.py manifests/2.1.0/opensearch-2.1.0.yml --distribution zip ...
2022-07-15 02:39:58 INFO     Building in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j
2022-07-15 02:39:58 INFO     Building OpenSearch (x64) into C:\Users\Administrator\opensearch-build\zip\builds\opensearch
2022-07-15 02:39:58 INFO     Building OpenSearch
2022-07-15 02:39:58 INFO     Executing "git init" in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch
2022-07-15 02:39:58 INFO     Executing "git remote add origin https://github.com/opensearch-project/OpenSearch.git" in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch
2022-07-15 02:39:59 INFO     Executing "git fetch --depth 1 origin tags/2.1.0" in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch
2022-07-15 02:40:05 INFO     Executing "git checkout FETCH_HEAD" in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch
2022-07-15 02:40:11 INFO     Executing "git rev-parse HEAD" in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch
2022-07-15 02:40:11 INFO     Checked out https://github.com/opensearch-project/OpenSearch.git@tags/2.1.0 into C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch at 388c80ad94529b1d9aad0a735c4740dce2932a32
2022-07-15 02:40:11 INFO     Executing "bash C:\Users\Administrator\opensearch-build\scripts\components\OpenSearch\build.sh -v 2.1.0 -p windows -a x64 -d zip -s false -o builds" in C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpmtqisg7j\OpenSearch
+ getopts :h:v:q:s:o:p:a:d: arg
+ case $arg in
+ VERSION=2.1.0
+ getopts :h:v:q:s:o:p:a:d: arg
+ case $arg in
+ PLATFORM=windows
+ getopts :h:v:q:s:o:p:a:d: arg
+ case $arg in
+ ARCHITECTURE=x64
+ getopts :h:v:q:s:o:p:a:d: arg
+ case $arg in
+ DISTRIBUTION=zip
+ getopts :h:v:q:s:o:p:a:d: arg
+ case $arg in
+ SNAPSHOT=false
+ getopts :h:v:q:s:o:p:a:d: arg
+ case $arg in
+ OUTPUT=builds
+ getopts :h:v:q:s:o:p:a:d: arg
+ '[' -z 2.1.0 ']'
+ '[' -z builds ']'
+ mkdir -p builds/maven/org/opensearch
+ ./gradlew publishToMavenLocal -Dbuild.snapshot=false -Dbuild.version_qualifier=
Downloading https://services.gradle.org/distributions/gradle-7.4.2-all.zip
...............10%...............20%...............30%...............40%...............50%................60%...............70%...............80%...............90%...............100%

Welcome to Gradle 7.4.2!

Here are the highlights of this release:
 - Aggregated test and JaCoCo reports
 - Marking additional test source directories as tests in IntelliJ
 - Support for Adoptium JDKs in Java toolchains

For more details see https://docs.gradle.org/7.4.2/release-notes.html

Starting a Gradle Daemon (subsequent builds will be faster)

FAILURE: Build failed with an exception.

* What went wrong:
Unable to start the daemon process.
This problem might be caused by incorrect configuration of the daemon.
For example, an unrecognized jvm option is used.
Please refer to the User Manual chapter on the daemon at https://docs.gradle.org/7.4.2/userguide/gradle_daemon.html
Process command line: C:\Users\Administrator\scoop\apps\temurin8-jdk\8.0.332-9\bin\java.exe -XX:+HeapDumpOnOutOfMemoryError -Xss2m --add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED -Xmx3g -Dfile.encoding=windows-1252 -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\Administrator\.gradle\wrapper\dists\gradle-7.4.2-all\9uukhhbclvbegdvsww0j0cr3p\gradle-7.4.2\lib\gradle-launcher-7.4.2.jar org.gradle.launcher.daemon.bootstrap.GradleDaemon 7.4.2
Please read the following process output to find out more:
-----------------------
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Unrecognized option: --add-exports


* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.

* Get more help at https://help.gradle.org
2022-07-15 02:40:28 ERROR    Error building OpenSearch, retry with: ./build.sh manifests/2.1.0/opensearch-2.1.0.yml --component OpenSearch
Traceback (most recent call last):
  File "./src/run_build.py", line 81, in <module>
    sys.exit(main())
  File "./src/run_build.py", line 68, in main
    builder.build(build_recorder)
  File "C:\Users\Administrator\opensearch-build\src\build_workflow\builder_from_source.py", line 55, in build
    self.git_repo.execute(build_command)
  File "C:\Users\Administrator\opensearch-build\src\git\git_repository.py", line 83, in execute
    subprocess.check_call(command, cwd=cwd, shell=True)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'bash C:\Users\Administrator\opensearch-build\scripts\components\OpenSearch\build.sh -v 2.1.0 -p windows -a x64 -d zip -s false -o builds' returned non-zero exit status 1.

C:\Users\Administrator\opensearch-build>

This happens because JAVA_HOME is set to JDK 8, which does not recognize the --add-exports option (it was introduced in JDK 9).
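
A build wrapper could fail fast on this kind of mismatch by checking the JDK version before invoking Gradle. The sketch below is an illustration only: `parse_java_major` and `current_java_major` are hypothetical helpers, not part of opensearch-build, and the minimum version to enforce is left to the caller.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"Cannot find a Java version in: {version_output!r}")
    major, minor = int(match.group(1)), match.group(2)
    # Java 8 and older report themselves as 1.x (for example "1.8.0_332").
    if major == 1 and minor is not None:
        return int(minor)
    return major


def current_java_major(java: str = "java") -> int:
    # `java -version` prints its banner to stderr, not stdout.
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

For example, a caller could compare `current_java_major()` against 9 before running `./gradlew`, since `--add-exports` is not understood by older JVMs.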

@peterzhuamazon
Member Author

peterzhuamazon commented Jul 15, 2022

Hi @dblock,

For now I manually swapped the JDK to 17 and was able to build "2.1.0 core + common-utils + job-scheduler + ml-commons".

This runs the shell scripts directly on a Windows Server 2019 host without a WSL setup.

Windows is IOPS hungry, so an EBS-optimized instance is required; c5 instances enable that by default.
Even then, the build is slower than on a Linux host.

There are still some issues with several plugins.
This is just a start, and a lot of dependencies need to be installed on the Windows host.

Thanks.

@peterzhuamazon
Member Author

Successfully logged into the Windows agent through the EC2 plugin without a fixed password.

Waiting for password to be available. Sleeping 10s.
Connecting to (<>) with WinRM as <>
Waiting for WinRM to come up. Sleeping 10s.
Connected with WinRM.
Creating tmp directory if it does not exist
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java  -jar <> -workDir <>
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.7
This is a Windows agent

@peterzhuamazon
Member Author

Traceback (most recent call last):
  File "./src/run_build.py", line 81, in <module>
    sys.exit(main())
  File "./src/run_build.py", line 74, in main
    build_recorder.write_manifest()
  File "C:\Users\Administrator\jenkins\workspace\zhujiaxi\bundle-build-zhujiaxi\src\system\temporary_directory.py", line 52, in __exit__
    shutil.rmtree(self.name, ignore_errors=False, onerror=g__handleRemoveReadonly)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 516, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 395, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  [Previous line repeated 11 more times]
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 400, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 398, in _rmtree_unsafe
    os.unlink(fullname)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmpp6rmc7d9\\OpenSearch\\benchmarks\\build\\classes\\java\\main\\org\\opensearch\\benchmark\\search\\aggregations\\bucket\\terms\\jmh_generated\\LongKeyedBucketOrdsBenchmark_singleBucketIntoSingleImmutableBimorphicInvocation_jmhTest.class'

This is due to the Windows API maximum file path length limitation (260 characters by default).
Basically, files cannot be deleted if their path is too long.
https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

@peterzhuamazon
Member Author

The initial solution is to add the extended-length prefix to the Windows path:

            tmpdirname = self.name

            if current_platform() == 'windows':
                # Prefix the path with \\?\ to tell Windows that it may be longer than 260 chars,
                # so that shutil.rmtree can remove the Windows path properly
                # https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
                tmpdirname = '\\\\?\\' + tmpdirname

            logging.info(f"Removing {tmpdirname}")
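
As a self-contained illustration of the prefixing idea, the sketch below also covers UNC shares and paths that already carry the prefix. `to_extended_length_path` is a hypothetical helper name, and it assumes the input is an absolute path that already uses backslashes, since Windows does not normalize extended-length paths.

```python
# The literal four-character prefix: backslash, backslash, question mark, backslash.
EXTENDED_PREFIX = "\\\\?\\"
# The two leading backslashes of a UNC path such as \\server\share.
UNC_PREFIX = "\\\\"


def to_extended_length_path(path: str) -> str:
    """Return an absolute Windows path with the extended-length prefix added."""
    if path.startswith(EXTENDED_PREFIX):
        # Already prefixed, nothing to do.
        return path
    if path.startswith(UNC_PREFIX):
        # A UNC path keeps its server and share after a "UNC" marker.
        return EXTENDED_PREFIX + "UNC\\" + path[len(UNC_PREFIX):]
    return EXTENDED_PREFIX + path
```

The function is pure string handling, so it can be exercised on any platform; only Windows gives the resulting path a special meaning.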

The better solution is to add this registry change to the Windows Packer build.


[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001

@peterzhuamazon
Member Author

After fixing the long-path issue by enabling long paths in the registry, this error shows up:

Traceback (most recent call last):
  File "./src/run_build.py", line 81, in <module>
    sys.exit(main())
  File "./src/run_build.py", line 74, in main
    build_recorder.write_manifest()
  File "C:\Users\Administrator\jenkins\workspace\zhujiaxi\bundle-build-zhujiaxi\src\system\temporary_directory.py", line 52, in __exit__
    shutil.rmtree(self.name, ignore_errors=False, onerror=g__handleRemoveReadonly)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 516, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 404, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "C:\Users\Administrator\scoop\apps\python37\3.7.9\lib\shutil.py", line 402, in _rmtree_unsafe
    os.rmdir(path)
OSError: [WinError 145] The directory is not empty: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmpp5qv1j4z'

@peterzhuamazon
Member Author

Another issue, specific to Windows: after a few runs, git reports a malloc error:

ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from https://github.com/peterzhuamazon/opensearch-build
	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:1003)
	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1244)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1308)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:129)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:97)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:84)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --force --progress -- https://github.com/peterzhuamazon/opensearch-build +refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Out of memory, malloc failed (tried to allocate 1132 bytes)
fatal: packed object 87d61083f41051b91815047abe51e6ff4319189f (stored in .git/objects/pack/pack-c02b411344f3128e4d33d11bce5b431cbbe6cbc8.pack) is corrupt

This is only resolved if we set the memory limits early on:


git config --system core.packedGitLimit 128m
git config --system core.packedGitWindowSize 128m


git config --system pack.deltaCacheSize 128m
git config --system pack.packSizeLimit 128m
git config --system pack.windowMemory 128m
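
If these settings need to be applied from a provisioning script rather than typed by hand, a small wrapper along these lines could be used. It is only a sketch: the function names are hypothetical, the keys and values are copied from the commands above, and `--system` typically requires administrator rights.

```python
import subprocess

# Keys and values copied from the git config commands above.
GIT_MEMORY_SETTINGS = {
    "core.packedGitLimit": "128m",
    "core.packedGitWindowSize": "128m",
    "pack.deltaCacheSize": "128m",
    "pack.packSizeLimit": "128m",
    "pack.windowMemory": "128m",
}


def git_config_commands(scope: str = "--system"):
    """Build one `git config` argument list per setting."""
    return [
        ["git", "config", scope, key, value]
        for key, value in GIT_MEMORY_SETTINGS.items()
    ]


def apply_git_memory_limits(scope: str = "--system") -> None:
    # Stop at the first failure so a half-configured agent is noticed early.
    for command in git_config_commands(scope):
        subprocess.run(command, check=True)
```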

@dblock
Member

dblock commented Aug 5, 2022

That's some thorough debugging @peterzhuamazon!

@peterzhuamazon
Member Author

New issue:

Selected Git installation does not exist. Using Default
The recommended git tool is: NONE
No credentials specified
Cloning the remote Git repository
Cloning repository https://github.com/peterzhuamazon/opensearch-build
 > git init C:\Users\Administrator\jenkins\workspace\zhujiaxi\bundle-build-zhujiaxi # timeout=10
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not init C:\Users\Administrator\jenkins\workspace\zhujiaxi\bundle-build-zhujiaxi
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:1042)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:797)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:158)
	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:151)
	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
	at hudson.remoting.Request$2.run(Request.java:376)
	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to EC2 (Amazon_ec2_cloud) - jenkinsAgentNode-Jenkins-Agent-Windows2019-X64-C54xlarge-Single-Host (i-031d573af80effb62)
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1784)
		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
		at hudson.remoting.Channel.call(Channel.java:1000)
		at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:143)
		at sun.reflect.GeneratedMethodAccessor543.invoke(Unknown Source)
		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:498)
		at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:129)
		at com.sun.proxy.$Proxy85.execute(Unknown Source)
		at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1226)
		at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1308)
		at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:129)
		at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:97)
		at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:84)
		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		... 4 more
Caused by: hudson.plugins.git.GitException: Error performing git command: git init C:\Users\Administrator\jenkins\workspace\zhujiaxi\bundle-build-zhujiaxi
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2679)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2601)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2597)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1968)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:1040)
	... 11 more
Caused by: java.io.IOException: Cannot run program "git" (in directory "C:\Users\Administrator\jenkins\workspace\zhujiaxi\bundle-build-zhujiaxi"): CreateProcess error=2, The system cannot find the file specified
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at hudson.Proc$LocalProc.<init>(Proc.java:254)
	at hudson.Proc$LocalProc.<init>(Proc.java:223)
	at hudson.Launcher$LocalLauncher.launch(Launcher.java:997)
	at hudson.Launcher$ProcStarter.start(Launcher.java:509)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2664)
	... 15 more
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
	at java.lang.ProcessImpl.create(Native Method)
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:453)
	at java.lang.ProcessImpl.start(ProcessImpl.java:139)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 20 more

This happens because Jenkins sometimes tries to run git without loading the Windows environment variables.
It looks like an issue with the Jenkins Git Plugin: https://issues.jenkins.io/browse/JENKINS-68580.
However, in my observation, a newly created Windows agent succeeds on the first run after a lengthy 10 minutes of searching, then succeeds for about 10 more runs, and eventually ends up in this error.

To make sure it succeeds every time, I will define the path of git.exe on the agent, forcing Jenkins to respect the actual git location during checkout.

This also sometimes causes a broken .git folder, because a checkout needs to run several git commands, and sometimes the first command succeeds while the rest fail out of nowhere.

Very inconsistent, especially on Windows agents.

Solution for now:

nodeProperties:
        - toolLocation:
            locations:
            - home: "C:\\Users\\Administrator\\scoop\\shims\\git.exe"
              key: "hudson.plugins.git.GitTool$DescriptorImpl@Default"
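
The same idea, pinning the executable to an explicit location instead of relying on whatever PATH the process inherited, can be sketched in Python for scripts that call git themselves. `resolve_executable` and `run_git` are hypothetical helpers for illustration and are not part of Jenkins or opensearch-build.

```python
import shutil
import subprocess


def resolve_executable(name, search_path=None):
    """Return the location of an executable, or raise if it cannot be found."""
    # search_path=None means "use the PATH environment variable".
    location = shutil.which(name, path=search_path)
    if location is None:
        raise FileNotFoundError(f"{name} was not found on the search path")
    return location


def run_git(args, cwd=None):
    # Resolve git once and invoke it by its resolved location.
    git = resolve_executable("git")
    return subprocess.run(
        [git, *args], cwd=cwd, check=True, capture_output=True, text=True
    )
```

Failing with a clear `FileNotFoundError` up front is easier to diagnose than the `CreateProcess error=2` seen in the trace above.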

@peterzhuamazon
Member Author

Another issue that may have caused the git malloc error on Windows:

Scoop actually installs two versions of git, one in apps/git/<>/bin/git.exe and the other in shims/git.exe.

The latter is the one that gets picked up more often:

\usr\bin\file.exe .\bin\git.exe
.\bin\git.exe: PE32+ executable (console) x86-64, for MS Windows

\usr\bin\file.exe ..\..\..\shims\git.exe
..\..\..\shims\git.exe: PE32 executable (console) Intel 80386, for MS Windows

The shims one is the 32-bit version, while the bin one is the 64-bit version.
It is well known that a 32-bit binary causes errors on a 64-bit system more often than not, especially malloc-related ones.

I will replace the 32-bit one with the 64-bit one; not sure why Scoop behaves this way, though.
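
The `file.exe` check above can be reproduced in Python by reading the machine field of the PE header, which is handy on agents where `file` is not installed. This is a minimal sketch: `pe_machine` is a hypothetical helper, and only the two machine codes relevant here are named.

```python
# COFF machine codes for the two architectures discussed above.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x86-64 (64-bit)"}


def pe_machine(path):
    """Report the target architecture recorded in a PE executable."""
    with open(path, "rb") as handle:
        dos_header = handle.read(64)
        if len(dos_header) < 64 or dos_header[:2] != b"MZ":
            raise ValueError(f"{path} is not a PE executable")
        # Bytes 0x3C..0x3F hold the file offset of the "PE\0\0" signature.
        pe_offset = int.from_bytes(dos_header[0x3C:0x40], "little")
        handle.seek(pe_offset)
        coff_start = handle.read(6)
    if len(coff_start) < 6 or coff_start[:4] != b"PE\0\0":
        raise ValueError(f"{path} has no PE signature")
    # The 2-byte machine field follows the signature directly.
    machine = int.from_bytes(coff_start[4:6], "little")
    return MACHINE_NAMES.get(machine, hex(machine))
```

Running it against both `git.exe` files would show the same 32-bit versus 64-bit split as the `file.exe` output.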

@peterzhuamazon
Member Author

This still seems like a limitation; I am not sure why Scoop creates shims in 32-bit only.
The shim might just be a Scoop shortcut rather than the actual executable.
I am not an expert here, so I will keep using the 32-bit one with the memory limits for now.

@peterzhuamazon
Member Author

Confirmed now, shims folder entries are just shortcuts to the actual x64 binary, so no issues.

@peterzhuamazon
Member Author

peterzhuamazon commented Aug 17, 2022

Apparently os.chmod does not change permissions recursively, at least on Windows.
Windows also honors only two stat flags for permissions (stat.S_IWRITE and stat.S_IREAD), because permissions are assigned differently there.

Initial bug:

excvalue [WinError 5] Access is denied: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmpbsogh9xo\\common-utils\\.git\\objects\\pack\\pack-772f4bbbd4a13c5fd30f2ca85e1ebc7745b17976.idx'
excvalue [WinError 5] Access is denied: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmpbsogh9xo\\common-utils\\.git\\objects\\pack\\pack-772f4bbbd4a13c5fd30f2ca85e1ebc7745b17976.pack'
excvalue [WinError 145] The directory is not empty: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmpbsogh9xo'

After applying this block of code recursively, the errors are reduced:

import os
import stat

def recursive_clean(path):
    # Mark every directory and file under path as writable (Windows read-only flag)
    for dirpath, dirnames, filenames in os.walk(path):
        os.chmod(dirpath, stat.S_IWRITE)
        for filename in filenames:
            os.chmod(os.path.join(dirpath, filename), stat.S_IWRITE)

excvalue [WinError 145] The directory is not empty: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\tmprbf0vtpj'

Upon checking on the server, the folder is actually empty already, and a simple remove runs fine.
Not sure why os.rmdir still complains about it.
It is possible that some process is still holding on to the folder.

@peterzhuamazon
Member Author

peterzhuamazon commented Aug 18, 2022

Surprisingly, the solutions here only work when run directly in Windows Terminal/PowerShell and not on Jenkins, even though Jenkins uses those two with nohup.

After debugging directly in Windows Terminal, the real issue finally showed up: it is related to read access, not write access.
Windows apparently changes the owner of the folder from Administrator to something else; I was not able to figure out which user, as the ACL blocks me from checking.

The solution is straightforward: adding both stat.S_IWRITE and stat.S_IREAD solves the issue with os.chmod. I had misunderstood shutil.rmtree: it calls the onerror function for each specific file or directory that fails, instead of once for the whole folder. Therefore, running os.chmod once per call is sufficient and efficient to switch permissions on that specific file or directory.

os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO | stat.S_IWRITE | stat.S_IREAD)  # 0777 on *nix / read-write on Windows
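
Put together, the per-item handler described above might look like the sketch below. The names `force_rmtree` and `_make_accessible_and_retry` are hypothetical and this is not the actual handler in temporary_directory.py. The version check is there because Python 3.12 added an `onexc` parameter and deprecated `onerror`.

```python
import os
import shutil
import stat
import sys


def _make_accessible_and_retry(func, path, _exc):
    # shutil.rmtree calls this once for each file or directory it failed on.
    os.chmod(path, stat.S_IWRITE | stat.S_IREAD)
    # Retry the operation that failed (for example os.unlink or os.rmdir).
    func(path)


def force_rmtree(path):
    """Remove a tree, clearing the read-only flag on items that refuse to go."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_make_accessible_and_retry)
    else:
        shutil.rmtree(path, onerror=_make_accessible_and_retry)
```

Because the handler only runs for the items that actually fail, there is no need to walk the whole tree and chmod everything up front.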

@peterzhuamazon
Member Author

After many tweaks, KNN can finally be built on Windows.
Now we need to disable integTest during the build on Jenkins so that the actual tests run during the integ-test workflow.

@peterzhuamazon
Member Author

The reportsDashboards issue shows up because it can no longer be built on Windows after their PR:
opensearch-project/reporting#191 (comment)

@peterzhuamazon
Member Author

@peterzhuamazon
Member Author

peterzhuamazon commented Oct 25, 2022

It seems this PR caused all the issues related to running OpenSearch on Windows.
Jenkins currently only uses Administrator as the user, and most people developing on EC2 have the Administrator user by default.
So this could be a potential issue:

@peterzhuamazon
Member Author

@peterzhuamazon
Member Author

The KNN build succeeds on 2.4.0; we need to add zip to the Windows agent.

@peterzhuamazon
Member Author

New change to support KNN rezip on Windows:

@peterzhuamazon
Member Author

New PR to enable KNN in 2.4.0, plus OSD changes:

@peterzhuamazon
Member Author

peterzhuamazon commented Nov 15, 2022

We have completed this milestone now.

@peterzhuamazon
Member Author

We will discuss MSI creation in other issues later.
For now, we will only focus on the ZIP.
Thanks.
