Bug: Cannot get available memory information for CUDA gpu device 0. #621

Closed
adaasch opened this issue Apr 2, 2019 · 17 comments

@adaasch

adaasch commented Apr 2, 2019

Problem

On executing DepthMap step I get:

[1/5](4/86) DepthMap
 - commandLine: aliceVision_depthMapEstimation  --input "/raid/masif/photogrammetrie/MeshroomCache/StructureFromMotion/6cdc52f1978a87ceb39f3652706e8593706edbbd/sfm.abc" --imagesFolder "/raid/masif/photogrammetrie/MeshroomCache/PrepareDenseScene/90c04a04f8201923eae09602f44b36f88db1cf35" --downscale 2 --minViewAngle 2.0 --maxViewAngle 70.0 --sgmMaxTCams 10 --sgmWSH 4 --sgmGammaC 5.5 --sgmGammaP 8.0 --refineMaxTCams 6 --refineNSamplesHalf 150 --refineNDepthsToRefine 31 --refineNiters 100 --refineWSH 3 --refineSigma 15 --refineGammaC 15.5 --refineGammaP 8.0 --refineUseTcOrRcPixSize False --exportIntermediateResults False --nbGPUs 0 --verboseLevel info --output "/raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add" --rangeStart 9 --rangeSize 3
 - logFile: /raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add/3.log
 - elapsed time: 0:00:01.119207
ERROR:root:Error on node computation: Error on node "DepthMap_1(3)":
Log:
Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/raid/masif/photogrammetrie/MeshroomCache/PrepareDenseScene/90c04a04f8201923eae09602f44b36f88db1cf35"
 * input = "/raid/masif/photogrammetrie/MeshroomCache/StructureFromMotion/6cdc52f1978a87ceb39f3652706e8593706edbbd/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add"
 * rangeSize = 3
 * rangeStart = 9
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[08:26:12.260276][warning] Cannot get available memory information for CUDA gpu device 0.
[08:26:11.824449][warning] CUDA-Enabled GPU.
Device information:
        - id:                      0
        - name:                    Tesla V100-SXM2-16GB
        - compute capability:      7.0
        - total device memory:     16130 MB
        - device memory available: 0 MB
        - per-block shared memory: 49152
        - warp size:               32
        - max threads per block:   1024
        - max threads per SM(X):   2048
        - max block sizes:         {1024,1024,64}
        - max grid sizes:          {2147483647,65535,65535}
        - max 2D array texture:    {131072,65536}
        - max 3D array texture:    {16384,16384,16384}
        - max 2D linear texture:   {131072,65000,2097120}
        - max 2D layered texture:  {32768,32768,2048}
        - number of SM(x)s:        80
        - registers per SM(x):     65536
        - registers per block:     65536
        - concurrent kernels:      yes
        - mapping host memory:     yes
        - unified addressing:      yes
        - texture alignment:       512 byte
        - pitch alignment:         32 byte

[08:26:12.261645][info] Supported CUDA-Enabled GPU detected.
[08:26:12.344528][info] Found 2 image dimension(s):
[08:26:12.344548][info]         - [3670x5496]
[08:26:12.344557][info]         - [5496x3670]
[08:26:12.633642][info] Overall maximum dimension: [5496x5496]
[08:26:12.633730][info] Create depth maps.


CUDAError: unknown error
  file:       /home/ros/meshroom/AliceVision/src/aliceVision/depthMap/cuda/planeSweeping/plane_sweeping_cuda.cu
  function:   ps_listCUDADevices
  line:       205

[08:26:12.635015][info] # GPU devices: 1, # CPU threads: 80
[08:26:12.635038][info] Plane sweeping parameters:
        - scale: 2
        - step: 5
[08:26:12.635099][info] PlaneSweepingCuda:
        - _nImgsInGPUAtTime: 2
        - scales: 2
        - subPixel: Yes
        - varianceWSH: 4
terminate called after throwing an instance of 'std::runtime_error'
  what():  Device alloc 2D array failed: unknown error
number of CUDA devices: 1
   0: Tesla V100-SXM2-16GB
CUDA device no 0 for 0
Device 0 memory - used: 0.000000, free: 0.000000, total: 0.000000
Aborted (core dumped)

WARNING: downgrade status on node "DepthMap_1(4)" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "DepthMap_1(5)" from Status.SUBMITTED to Status.NONE

[snip]

WARNING: downgrade status on node "DepthMapFilter_1(25)" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "Meshing_1" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "MeshFiltering_1" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "Texturing_1" from Status.SUBMITTED to Status.NONE
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/cx_Freeze/initscripts/__startup__.py", line 14, in run
  File "/opt/Meshroom/setupInitScriptUnix.py", line 39, in run
  File "bin/meshroom_compute", line 64, in <module>
  File "/opt/Meshroom/meshroom/core/graph.py", line 1131, in executeGraph
  File "/opt/Meshroom/meshroom/core/node.py", line 271, in process
  File "/opt/Meshroom/meshroom/core/desc.py", line 452, in processChunk
RuntimeError: Error on node "DepthMap_1(3)":
Log:
Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/raid/masif/photogrammetrie/MeshroomCache/PrepareDenseScene/90c04a04f8201923eae09602f44b36f88db1cf35"
 * input = "/raid/masif/photogrammetrie/MeshroomCache/StructureFromMotion/6cdc52f1978a87ceb39f3652706e8593706edbbd/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add"
 * rangeSize = 3
 * rangeStart = 9
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[08:26:12.260276][warning] Cannot get available memory information for CUDA gpu device 0.
[08:26:11.824449][warning] CUDA-Enabled GPU.
Device information:
        - id:                      0
        - name:                    Tesla V100-SXM2-16GB
        - compute capability:      7.0
        - total device memory:     16130 MB
        - device memory available: 0 MB
        - per-block shared memory: 49152
        - warp size:               32
        - max threads per block:   1024
        - max threads per SM(X):   2048
        - max block sizes:         {1024,1024,64}
        - max grid sizes:          {2147483647,65535,65535}
        - max 2D array texture:    {131072,65536}
        - max 3D array texture:    {16384,16384,16384}
        - max 2D linear texture:   {131072,65000,2097120}
        - max 2D layered texture:  {32768,32768,2048}
        - number of SM(x)s:        80
        - registers per SM(x):     65536
        - registers per block:     65536
        - concurrent kernels:      yes
        - mapping host memory:     yes
        - unified addressing:      yes
        - texture alignment:       512 byte
        - pitch alignment:         32 byte

[08:26:12.261645][info] Supported CUDA-Enabled GPU detected.
[08:26:12.344528][info] Found 2 image dimension(s):
[08:26:12.344548][info]         - [3670x5496]
[08:26:12.344557][info]         - [5496x3670]
[08:26:12.633642][info] Overall maximum dimension: [5496x5496]
[08:26:12.633730][info] Create depth maps.


CUDAError: unknown error
  file:       /home/ros/meshroom/AliceVision/src/aliceVision/depthMap/cuda/planeSweeping/plane_sweeping_cuda.cu
  function:   ps_listCUDADevices
  line:       205

[08:26:12.635015][info] # GPU devices: 1, # CPU threads: 80
[08:26:12.635038][info] Plane sweeping parameters:
        - scale: 2
        - step: 5
[08:26:12.635099][info] PlaneSweepingCuda:
        - _nImgsInGPUAtTime: 2
        - scales: 2
        - subPixel: Yes
        - varianceWSH: 4
terminate called after throwing an instance of 'std::runtime_error'
  what():  Device alloc 2D array failed: unknown error
number of CUDA devices: 1
   0: Tesla V100-SXM2-16GB
CUDA device no 0 for 0
Device 0 memory - used: 0.000000, free: 0.000000, total: 0.000000
Aborted (core dumped)

This doesn't happen with the release Meshroom-2019.1.0.
However, the release doesn't work either, because of Meshroom issue #409.

Steps to Reproduce

  1. Compile from source with CUDA Toolkit 7 or 9
  2. Run pipeline

Versions

  • AliceVision branch/version: git develop
  • OS: Ubuntu 16.04.4 LTS / DGX-1
  • C++ compiler: GCC
@adaasch adaasch changed the title from "Bug" to "Bug: Cannot get available memory information for CUDA gpu device 0." Apr 2, 2019
@natowi
Member

natowi commented Apr 3, 2019

@adaasch I think you meant to reference this meshroom issue MR#409

@fabiencastan
Member

It's not the same issue. Here the problem is that the call to cudaMemGetInfo fails. See "Cannot get available memory information" in the log.

The corresponding code is here:

if(cudaMemGetInfo(&avail, &total) != cudaSuccess)

But I have no idea why cudaMemGetInfo fails.
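A minimal sketch of how a caller could find out why the query fails, rather than only that it fails. It keeps the cudaError_t returned by cudaMemGetInfo and converts it to text with cudaGetErrorName and cudaGetErrorString, both part of the CUDA runtime API. The function name describeDeviceMemory and the WITH_CUDA_RUNTIME build switch are assumptions made for this illustration and are not AliceVision code; without the switch the file still compiles but only returns a placeholder.

```cpp
// Sketch only: describeDeviceMemory and WITH_CUDA_RUNTIME are hypothetical names.
#include <cstddef>
#include <string>

#ifdef WITH_CUDA_RUNTIME
#include <cuda_runtime.h>
#endif

// Returns a one-line report for the given device: either its free and total
// memory, or the name and description of the CUDA error that blocked the query.
std::string describeDeviceMemory(int device)
{
    const std::string prefix = "device " + std::to_string(device) + ": ";
#ifdef WITH_CUDA_RUNTIME
    cudaError_t err = cudaSetDevice(device);
    if (err == cudaSuccess)
    {
        std::size_t avail = 0;
        std::size_t total = 0;
        err = cudaMemGetInfo(&avail, &total);
        if (err == cudaSuccess)
        {
            // Shift by 20 bits to convert bytes to MiB.
            return prefix + std::to_string(avail >> 20) + " MB free of " +
                   std::to_string(total >> 20) + " MB";
        }
    }
    // Keep the error code and translate it, so the log names the actual cause.
    return prefix + cudaGetErrorName(err) + " (" + cudaGetErrorString(err) + ")";
#else
    return prefix + "built without CUDA runtime support";
#endif
}
```

Built with -DWITH_CUDA_RUNTIME and linked against the CUDA runtime, printing describeDeviceMemory(0) on an affected machine should show the underlying error name next to the device number, which is more to go on than a bare "0 MB".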

@natowi natowi added the CUDA label Jun 30, 2019
@skinkie

skinkie commented Aug 2, 2019

I can reproduce it in DepthMap. In the feature extraction step the GPU is disabled because the device has compute capability 2.1, where 3.0 is required. My AliceVision is compiled with CUDA 9.1.

Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/home/mio/Meshroom/MeshroomCache/PrepareDenseScene/80d2fd52143eb44bcc9f853253d6039e385aba6c"
 * input = "/home/mio/Meshroom/MeshroomCache/StructureFromMotion/7e37934063bf907691aa9e37da6fc3004cb09eee/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/home/mio/Meshroom/MeshroomCache/DepthMap/eed412414cf0a5b8d0e35274e1d035e26b8340eb"
 * rangeSize = 3
 * rangeStart = 0
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[12:23:18.803335][warning] Cannot get available memory information for CUDA gpu device 0.
[12:23:18.519278][warning] CUDA-Enabled GPU.
Device information:
	- id:                      0
	- name:                    GeForce GT 540M
	- compute capability:      2.1
	- total device memory:     964 MB 
	- device memory available: 0 MB 
	- per-block shared memory: 49152
	- warp size:               32
	- max threads per block:   1024
	- max threads per SM(X):   1536
	- max block sizes:         {1024,1024,64}
	- max grid sizes:          {65535,65535,65535}
	- max 2D array texture:    {65536,65535}
	- max 3D array texture:    {2048,2048,2048}
	- max 2D linear texture:   {65000,65000,1048544}
	- max 2D layered texture:  {16384,16384,2048}
	- number of SM(x)s:        2
	- registers per SM(x):     32768
	- registers per block:     32768
	- concurrent kernels:      yes
	- mapping host memory:     yes
	- unified addressing:      yes
	- texture alignment:       512 byte
	- pitch alignment:         32 byte

[12:23:18.804564][info] Supported CUDA-Enabled GPU detected.
[12:23:18.977578][info] Found 1 image dimension(s): 
[12:23:18.977714][info] 	- [4032x3024]
[12:23:19.223508][info] Overall maximum dimension: [4032x3024]
[12:23:19.223810][info] Create depth maps.


CUDAError: device kernel image is invalid
  file:       /home/mio/Sources/AliceVision/src/aliceVision/depthMap/cuda/planeSweeping/plane_sweeping_cuda.cu
  function:   ps_listCUDADevices
  line:       205

[12:23:19.225171][info] # GPU devices: 1, # CPU threads: 8
[12:23:19.225426][info] Plane sweeping parameters:
	- scale: 2
	- step: 3
[12:23:19.225621][info] PlaneSweepingCuda:
	- _nImgsInGPUAtTime: 2
	- scales: 2
	- subPixel: Yes
	- varianceWSH: 4
terminate called after throwing an instance of 'std::runtime_error'
  what():  Device alloc 2D array failed: device kernel image is invalid
Aborted (core dumped)

This is the output for feature extraction; as you can see, the memory is found there.

[12:03:47.666846][error] CUDA-Enabled GPU detected, but the compute capabilities is not enough.
 - Device 0: 2.1, global memory: 964MB
 - Requirements: 3.0, global memory: 0MB
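The requirement reported in this message amounts to comparing (major, minor) pairs. A minimal sketch of such a comparison follows; the function name meetsComputeCapability is hypothetical and this is not the check AliceVision itself performs. Note that the DepthMap log above still prints "Supported CUDA-Enabled GPU detected" for the same 2.1 device, so the two steps apparently do not apply the same requirement.

```cpp
// Sketch only: meetsComputeCapability is a hypothetical helper.
#include <utility>

// True if the device's compute capability (major, minor) is at least the
// required (minMajor, minMinor). std::pair compares lexicographically,
// so the major version is compared first and the minor version breaks ties.
bool meetsComputeCapability(int major, int minor, int minMajor, int minMinor)
{
    return std::make_pair(major, minor) >= std::make_pair(minMajor, minMinor);
}
```

With the values from this thread, meetsComputeCapability(2, 1, 3, 0) is false for the GeForce GT 540M, while meetsComputeCapability(7, 0, 3, 0) is true for the Tesla V100.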

@fhaust

fhaust commented Aug 18, 2019

Just a me too here. Is there anything that I could do to work around this?

Some context:

This does not happen if I just run a simple --input xyz --output abc from the command line. But it does happen if I create a pipeline with AKAZE features and the high preset and run that from the command line.

@miegl

miegl commented Aug 20, 2019

I am getting the exact same error; Meshroom 2019.1.0 worked perfectly fine.

@skinkie

skinkie commented Aug 20, 2019

@miegl does it fail for you with the binary release of Meshroom 2019.2.0?

@fabiencastan
Member

@fhaust: This seems to be a different problem if it works in some cases. Could you open a separate issue with a precise description of the error, log files, and information about your hardware and OS?

@miegl: Which platform do you use?

@miegl

miegl commented Aug 20, 2019

@skinkie @fabiencastan binary release, linux version.
I'm getting "[warning] Cannot get available memory information for CUDA gpu device 0." CUDA itself works fine. I will try compiling AliceVision myself.

@natowi natowi added the linux label Aug 20, 2019
@natowi
Member

natowi commented Aug 20, 2019

Reference alicevision/Meshroom#594

@skinkie

skinkie commented Aug 20, 2019

> @skinkie @fabiencastan binary release, linux version.
> Getting [warning] Cannot get available memory information for CUDA gpu device 0.. Cuda works fine. Will try compiling alicevision myself.

Compiling it yourself is unlikely to help; at least, it did not for me when I built with CUDA 9.1. My assumption is that either a lower version of the CUDA API is required or the code in AliceVision needs to be fixed. Investigating this is still on my radar.

@fabiencastan
Member

Yes, I managed to reproduce it on one computer. I'm working on it.
This only affects the Linux binaries.

@simogasp simogasp added the bug label Aug 21, 2019
@natowi
Member

natowi commented Aug 24, 2019

You can try the updated binaries alicevision/Meshroom#594 (comment)

@fhaust

fhaust commented Sep 2, 2019

@fabiencastan I just came back from vacation.

I have not encountered this problem again. I figured that it might have something to do with the Bumblebee/Optimus setup I have running on the laptop. But, as I then found out, running CUDA does not require that.

@JPLeoRX

JPLeoRX commented Feb 24, 2020

Encountered the same issue today, when running from "alicevision/meshroom:2019.2.0-centos7-cuda8.0" docker container.

[08:14:03.935462][warning] Cannot get available memory information for CUDA gpu device 0.
[08:14:03.615148][warning] CUDA-Enabled GPU.
Device information:
        - id:                      0
        - name:                    GeForce GTX 1080
        - compute capability:      6.1
        - total device memory:     8119 MB
        - device memory available: 0 MB
        - per-block shared memory: 49152
        - warp size:               32
        - max threads per block:   1024
        - max threads per SM(X):   2048
        - max block sizes:         {1024,1024,64}
        - max grid sizes:          {2147483647,65535,65535}
        - max 2D array texture:    {131072,65536}
        - max 3D array texture:    {16384,16384,16384}
        - max 2D linear texture:   {131072,65000,2097120}
        - max 2D layered texture:  {32768,32768,2048}
        - number of SM(x)s:        20
        - registers per SM(x):     65536
        - registers per block:     65536
        - concurrent kernels:      yes
        - mapping host memory:     yes
        - unified addressing:      yes
        - texture alignment:       512 byte
        - pitch alignment:         32 byte

@tritolol

tritolol commented Mar 2, 2020

@JPLeoRX I have exactly the same issue. Maybe this is related to CUDA v8 as @fabiencastan mentioned here alicevision/Meshroom#594 (comment).

@nkennek

nkennek commented Jun 1, 2020

I encountered the same issue using v2.2.0 in Docker, built from the repository's Dockerfile with some tweaks
(CUDA version changed to 10.0 and some environment variables added).

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
