The latest "Cuda 8.0"-plugin not working with "Dero HE (astrobwt/v2) "-algo. #161

Open
avselang opened this issue Apr 14, 2022 · 15 comments

Comments

@avselang

The latest CUDA 8.0 plugin does not work with the Dero HE (astrobwt/v2) algorithm:
nv disabled (no suitable configuration found)
Tested with xmrig-6.17.0-msvc-win64.
My GPU is an NVIDIA GeForce 710M with OpenCL 1.1 and CUDA compute capability 2.1.

@Spudz76
Contributor

Spudz76 commented Apr 15, 2022

That should be a Kepler with capability 3.5 (CUDA_ARCH=35), not a Fermi with capability 2.1.

You can use a newer CUDA, because 8.0 might not work anyway (unproven, but several other newer algos definitely don't work because the CUDA code was written without backward compatibility).

;  Valid CUDA Toolkit Map:
;   8.x for Fermi/Kepler       /Maxwell/Pascal,
;   9.0 for       Kepler       /Maxwell/Pascal/Volta(70),
;   9.1 for       Kepler       /Maxwell/Pascal/Volta(72),
;  10.x for       Kepler       /Maxwell/Pascal/Volta    /Turing,
;  11.x for       Kepler(35/37)/Maxwell/Pascal/Volta    /Turing/Ampere(80)
;  11.1 for       Kepler(35/37)/Maxwell/Pascal/Volta    /Turing/Ampere(86)
;  11.4 for       Kepler(35/37)/Maxwell/Pascal/Volta    /Turing/Ampere(87)

Versions newer than 11.4 offer the same support, so the newest (11.6) should also work if you are building your own (releases are only built up to 11.4, but that will work okay).

But you should match whatever CUDA version your driver offers. Use the nvidia-smi command to show what's bundled (upper right corner) in case it is not recent (laptop drivers are sometimes weird, or require an older manufacturer-modified driver that never gets updated).
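
The toolkit map above can also be read as a lookup table. A minimal Python sketch, assuming only the family names listed in the map; `TOOLKIT_MAP` and `toolkits_for` are hypothetical names for illustration and are not part of xmrig-cuda. The 11.x, 11.1 and 11.4 rows are merged into one entry because they differ only in which Ampere variant they add.

```python
# Toolkit series -> GPU families, transcribed from the map above.
# TOOLKIT_MAP and toolkits_for are illustrative names, not xmrig-cuda API.
TOOLKIT_MAP = {
    "8.x": ["Fermi", "Kepler", "Maxwell", "Pascal"],
    "9.0": ["Kepler", "Maxwell", "Pascal", "Volta"],
    "9.1": ["Kepler", "Maxwell", "Pascal", "Volta"],
    "10.x": ["Kepler", "Maxwell", "Pascal", "Volta", "Turing"],
    # In the map, 11.x covers only the 35/37 variants of Kepler.
    "11.x": ["Kepler", "Maxwell", "Pascal", "Volta", "Turing", "Ampere"],
}


def toolkits_for(family):
    """Return the toolkit series whose row in the map lists the given family."""
    return [series for series, families in TOOLKIT_MAP.items() if family in families]


print(toolkits_for("Fermi"))   # ['8.x']
print(toolkits_for("Turing"))  # ['10.x', '11.x']
```

Read this way, a Fermi card has exactly one candidate toolkit series, which is why the 8_0 plugin is the only prebuilt option worth testing on it.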

@avselang
Author

I updated the laptop driver to version 391.35 (the latest driver offered on the official website) and used the cuda9_1 plugin. An error occurred: thread #0 failed with error <AstroBWT_Dero_HE::hash>:112 "no kernel image is available for execution on the device".

@Spudz76
Contributor

Spudz76 commented Apr 16, 2022

The prebuilt release may be filtering 35 out, as it is fairly uncommon, to save build time and size.

Yep, verified: the default build for 9.1 has CUDA_ARCH=30;50;60;70, and unlike other GPUs the base family 30 will not run on a 35 or 37 (Kepler 2.0), so you will have to build your own with -DCUDA_ARCH=35.

@avselang
Author

How is this done? Do you have a detailed tutorial?

@Spudz76
Contributor

Spudz76 commented Apr 16, 2022

I built it for you; grab the special release from here.

@avselang
Author

Thank you very much, but the error still appeared: thread #0 failed with error <AstroBWT_Dero_HE::hash>:112 "no kernel image is available for execution on the device". I noticed some graphics card parameters in the output and hope they are a useful reference:
CUDA 9.1/9.1/6.17.0

  • NVML 8.376.54/391.35 press e for health report
  • CUDA GPU #0 01:00.0 GeForce 710M 1550/900 MHz smx:2 arch:21 mem:1679/2048 MB
    [2022-04-17 07:56:01.613] nvidia use profile astrobwt/v2 (1 thread) scratchpad 128 KB
    | # | GPU | BUS ID | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
    | 0 | 0 | 01:00.0 | 512 | 32 | 16 | 6 | 25 | 64 | GeForce 710M

Thank you again for your help

@Spudz76
Contributor

Spudz76 commented Apr 17, 2022

That is very strange; everything online says the 710M is Kepler 2.0, but this is clearly a Fermi (arch:21).

Perhaps the online info is confused because there is a GeForce 710M and a GeForce GT 710M, and people assume they are the same, but only the GT is the Kepler.

So when you run with the 8_0 plugin, it never shows that summary? The mainstream release should have arch 20 in it, which works on 21s.
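
The `arch:21` field in the startup summary is what settles the Fermi versus Kepler question. A minimal Python sketch of reading it, assuming the summary line format quoted earlier in this thread; `parse_arch` and `family_of` are hypothetical helper names, and the family table covers only the capabilities named in the toolkit map.

```python
import re

# Sample line copied from the xmrig startup summary quoted in this thread.
SUMMARY = "CUDA GPU #0 01:00.0 GeForce 710M 1550/900 MHz smx:2 arch:21 mem:1679/2048 MB"


def parse_arch(line):
    """Extract the compute capability from an 'arch:NN' field, or None if absent."""
    match = re.search(r"\barch:(\d+)\b", line)
    return int(match.group(1)) if match else None


def family_of(arch):
    """Map a compute capability such as 21 or 35 to its GPU family name."""
    major = arch // 10
    if major == 7:
        # 7.0 and 7.2 are Volta, 7.5 is Turing.
        return "Turing" if arch == 75 else "Volta"
    if major == 8:
        # Only the Ampere capabilities named in the toolkit map are recognised.
        return "Ampere" if arch in (80, 86, 87) else "unknown"
    names = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}
    return names.get(major, "unknown")


arch = parse_arch(SUMMARY)
print(arch, family_of(arch))  # 21 Fermi
```

Checking this field before picking a plugin avoids relying on the marketing name, which is ambiguous for the 710M.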

@avselang
Author

When I run with the 8_0 plugin, it shows this:
nvidia use profile astrobwt/v2 (1 thread) scratchpad 128 KB
| # | GPU | BUS ID | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0 | 01:00.0 | 512 | 32 | 16 | 6 | 25 | 64 | GeForce 710M
[2022-04-17 19:06:18.324] nvidia thread #0 failed with error Unsupported algorithm
[2022-04-17 19:06:18.375] nvidia thread #0 self-test failed
[2022-04-17 19:06:18.376] nvidia disabled (failed to start threads)

@Spudz76
Contributor

Spudz76 commented Apr 17, 2022

So that's with 'xmrig-cuda-6.17.0-cuda8_0-win64.zip'?

@avselang
Author

Yes.

@Spudz76
Contributor

Spudz76 commented Apr 17, 2022

Okay, I will see why the algorithm is apparently not being built, and/or test it on some of my Fermi cards (on Linux, though).

@Spudz76
Contributor

Spudz76 commented Apr 17, 2022

Okay, it is the same as the previous AstroBWT (non-v2) algorithm: it uses the __shfl() call, which is only supported on compute capability 3.0 or higher.

I am investigating some sort of workaround; maybe it can work with a polyfill.

@avselang
Author

If I change the algo to cn-heavy/xhv with the 8_0 plugin, it seems to work:
use profile cn-heavy (1 thread) scratchpad 4096 KB
| # | GPU | BUS ID | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0 | 01:00.0 | 160 | 40 | 4 | 8 | 25 | 640 | GeForce 710M
[2022-04-18 13:13:03.432] nvidia READY threads 1/1 (322 ms)
[2022-04-18 13:13:07.650] nvidia accepted (1/0) diff 1000 (109 ms)
If I run with the 9_1 plugin, it shows this:
use profile cn-heavy (1 thread) scratchpad 4096 KB
| # | GPU | BUS ID | INTENSITY | THREADS | BLOCKS | BF | BS | MEMORY | NAME
| 0 | 0 | 01:00.0 | 160 | 40 | 4 | 8 | 25 | 640 | GeForce 710M
[2022-04-18 13:14:18.846] nvidia READY threads 1/1 (302 ms)
[2022-04-18 13:14:18.848] nvidia thread #0 failed with error <cryptonight_extra_cpu_prepare>:407 "no kernel image is available for execution on the device"
The same happens when using xmrig-cuda-v6.17.0-arch35-cuda9_1-win64.
Haha, let's figure it out together.

@Spudz76
Contributor

Spudz76 commented Apr 18, 2022

Yes, the 9_1 build has nothing but the erroneously assumed arch 35, so it will never work, now that we've confirmed this is the Fermi arch 21 type of 710M (and not the Kepler 2 arch 35 "GT 710M"). You can just toss that one and not test it further.

All algorithms except those that use the __shfl call (RandomX, AstroBWT, and KawPow) should work fine with the mainstream 8_0 plugin. However, with 2 GB of VRAM there isn't room for RandomX or KawPow anyway, so the only one that could work if patched is AstroBWT. I'm still investigating that.
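
The rule above can be sketched as a small check. This assumes, per this thread only, that the RandomX, AstroBWT, and KawPow families are the ones that need `__shfl` (compute capability 3.0 or higher). `NEEDS_SHFL` and `can_run` are hypothetical names, matching on the part of the algorithm name before the slash is an assumption, and the VRAM limit is not modelled.

```python
# Algorithm families that, per this thread, rely on the __shfl intrinsic.
# "rx" is assumed to be the name prefix used for RandomX variants.
NEEDS_SHFL = ("rx", "astrobwt", "kawpow")


def can_run(algo, arch):
    """Return True if the __shfl requirement does not rule out algo on this arch.

    algo is a name such as "astrobwt/v2"; arch is the two-digit compute
    capability reported in the startup summary (e.g. 21).
    """
    family = algo.split("/")[0]
    if family in NEEDS_SHFL:
        # __shfl needs compute capability 3.0 or higher.
        return arch >= 30
    return True


print(can_run("astrobwt/v2", 21))   # False
print(can_run("cn-heavy/xhv", 21))  # True
```

This matches the behaviour seen in the logs: cn-heavy/xhv starts on the arch 21 card, while astrobwt/v2 is rejected.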

@avselang
Author

thank you


2 participants