Bug: Cannot get available memory information for CUDA gpu device 0. #621

adaasch · 2019-04-02T09:01:00Z

Problem

On executing DepthMap step I get:

[1/5](4/86) DepthMap
 - commandLine: aliceVision_depthMapEstimation  --input "/raid/masif/photogrammetrie/MeshroomCache/StructureFromMotion/6cdc52f1978a87ceb39f3652706e8593706edbbd/sfm.abc" --imagesFolder "/raid/masif/photogrammetrie/MeshroomCache/PrepareDenseScene/90c04a04f8201923eae09602f44b36f88db1cf35" --downscale 2 --minViewAngle 2.0 --maxViewAngle 70.0 --sgmMaxTCams 10 --sgmWSH 4 --sgmGammaC 5.5 --sgmGammaP 8.0 --refineMaxTCams 6 --refineNSamplesHalf 150 --refineNDepthsToRefine 31 --refineNiters 100 --refineWSH 3 --refineSigma 15 --refineGammaC 15.5 --refineGammaP 8.0 --refineUseTcOrRcPixSize False --exportIntermediateResults False --nbGPUs 0 --verboseLevel info --output "/raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add" --rangeStart 9 --rangeSize 3
 - logFile: /raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add/3.log
 - elapsed time: 0:00:01.119207
ERROR:root:Error on node computation: Error on node "DepthMap_1(3)":
Log:
Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/raid/masif/photogrammetrie/MeshroomCache/PrepareDenseScene/90c04a04f8201923eae09602f44b36f88db1cf35"
 * input = "/raid/masif/photogrammetrie/MeshroomCache/StructureFromMotion/6cdc52f1978a87ceb39f3652706e8593706edbbd/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add"
 * rangeSize = 3
 * rangeStart = 9
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[08:26:12.260276][warning] Cannot get available memory information for CUDA gpu device 0.
[08:26:11.824449][warning] CUDA-Enabled GPU.
Device information:
        - id:                      0
        - name:                    Tesla V100-SXM2-16GB
        - compute capability:      7.0
        - total device memory:     16130 MB
        - device memory available: 0 MB
        - per-block shared memory: 49152
        - warp size:               32
        - max threads per block:   1024
        - max threads per SM(X):   2048
        - max block sizes:         {1024,1024,64}
        - max grid sizes:          {2147483647,65535,65535}
        - max 2D array texture:    {131072,65536}
        - max 3D array texture:    {16384,16384,16384}
        - max 2D linear texture:   {131072,65000,2097120}
        - max 2D layered texture:  {32768,32768,2048}
        - number of SM(x)s:        80
        - registers per SM(x):     65536
        - registers per block:     65536
        - concurrent kernels:      yes
        - mapping host memory:     yes
        - unified addressing:      yes
        - texture alignment:       512 byte
        - pitch alignment:         32 byte

[08:26:12.261645][info] Supported CUDA-Enabled GPU detected.
[08:26:12.344528][info] Found 2 image dimension(s):
[08:26:12.344548][info]         - [3670x5496]
[08:26:12.344557][info]         - [5496x3670]
[08:26:12.633642][info] Overall maximum dimension: [5496x5496]
[08:26:12.633730][info] Create depth maps.


CUDAError: unknown error
  file:       /home/ros/meshroom/AliceVision/src/aliceVision/depthMap/cuda/planeSweeping/plane_sweeping_cuda.cu
  function:   ps_listCUDADevices
  line:       205

[08:26:12.635015][info] # GPU devices: 1, # CPU threads: 80
[08:26:12.635038][info] Plane sweeping parameters:
        - scale: 2
        - step: 5
[08:26:12.635099][info] PlaneSweepingCuda:
        - _nImgsInGPUAtTime: 2
        - scales: 2
        - subPixel: Yes
        - varianceWSH: 4
terminate called after throwing an instance of 'std::runtime_error'
  what():  Device alloc 2D array failed: unknown error
number of CUDA devices: 1
   0: Tesla V100-SXM2-16GB
CUDA device no 0 for 0
Device 0 memory - used: 0.000000, free: 0.000000, total: 0.000000
Aborted (core dumped)

WARNING: downgrade status on node "DepthMap_1(4)" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "DepthMap_1(5)" from Status.SUBMITTED to Status.NONE

[snip]

WARNING: downgrade status on node "DepthMapFilter_1(25)" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "Meshing_1" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "MeshFiltering_1" from Status.SUBMITTED to Status.NONE
WARNING: downgrade status on node "Texturing_1" from Status.SUBMITTED to Status.NONE
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/cx_Freeze/initscripts/__startup__.py", line 14, in run
  File "/opt/Meshroom/setupInitScriptUnix.py", line 39, in run
  File "bin/meshroom_compute", line 64, in <module>
  File "/opt/Meshroom/meshroom/core/graph.py", line 1131, in executeGraph
  File "/opt/Meshroom/meshroom/core/node.py", line 271, in process
  File "/opt/Meshroom/meshroom/core/desc.py", line 452, in processChunk
RuntimeError: Error on node "DepthMap_1(3)":
Log:
Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/raid/masif/photogrammetrie/MeshroomCache/PrepareDenseScene/90c04a04f8201923eae09602f44b36f88db1cf35"
 * input = "/raid/masif/photogrammetrie/MeshroomCache/StructureFromMotion/6cdc52f1978a87ceb39f3652706e8593706edbbd/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/raid/masif/photogrammetrie/MeshroomCache/DepthMap/2d5a12f60cf50311c3008439bb4ba845fd3e7add"
 * rangeSize = 3
 * rangeStart = 9
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[08:26:12.260276][warning] Cannot get available memory information for CUDA gpu device 0.
[08:26:11.824449][warning] CUDA-Enabled GPU.
Device information:
        - id:                      0
        - name:                    Tesla V100-SXM2-16GB
        - compute capability:      7.0
        - total device memory:     16130 MB
        - device memory available: 0 MB
        - per-block shared memory: 49152
        - warp size:               32
        - max threads per block:   1024
        - max threads per SM(X):   2048
        - max block sizes:         {1024,1024,64}
        - max grid sizes:          {2147483647,65535,65535}
        - max 2D array texture:    {131072,65536}
        - max 3D array texture:    {16384,16384,16384}
        - max 2D linear texture:   {131072,65000,2097120}
        - max 2D layered texture:  {32768,32768,2048}
        - number of SM(x)s:        80
        - registers per SM(x):     65536
        - registers per block:     65536
        - concurrent kernels:      yes
        - mapping host memory:     yes
        - unified addressing:      yes
        - texture alignment:       512 byte
        - pitch alignment:         32 byte

[08:26:12.261645][info] Supported CUDA-Enabled GPU detected.
[08:26:12.344528][info] Found 2 image dimension(s):
[08:26:12.344548][info]         - [3670x5496]
[08:26:12.344557][info]         - [5496x3670]
[08:26:12.633642][info] Overall maximum dimension: [5496x5496]
[08:26:12.633730][info] Create depth maps.


CUDAError: unknown error
  file:       /home/ros/meshroom/AliceVision/src/aliceVision/depthMap/cuda/planeSweeping/plane_sweeping_cuda.cu
  function:   ps_listCUDADevices
  line:       205

[08:26:12.635015][info] # GPU devices: 1, # CPU threads: 80
[08:26:12.635038][info] Plane sweeping parameters:
        - scale: 2
        - step: 5
[08:26:12.635099][info] PlaneSweepingCuda:
        - _nImgsInGPUAtTime: 2
        - scales: 2
        - subPixel: Yes
        - varianceWSH: 4
terminate called after throwing an instance of 'std::runtime_error'
  what():  Device alloc 2D array failed: unknown error
number of CUDA devices: 1
   0: Tesla V100-SXM2-16GB
CUDA device no 0 for 0
Device 0 memory - used: 0.000000, free: 0.000000, total: 0.000000
Aborted (core dumped)

This doesn't happen with the Meshroom-2019.1.0 release.
But that release doesn't work either, for a different reason: Meshroom issue #409.

Steps to Reproduce

Compile from source with CUDA Toolkit 7 or 9
Run pipeline

Versions

AliceVision branch/version: git develop
OS: Ubuntu 16.04.4 LTS / DGX-1
C++ compiler: GCC


natowi · 2019-04-03T12:32:20Z

@adaasch I think you meant to reference this meshroom issue MR#409

fabiencastan · 2019-04-05T12:14:50Z

It's not the same issue. Here the problem is a failure of the call to cudaMemGetInfo. See "Cannot get available memory information" in the log.

The corresponding code is here:

AliceVision/src/aliceVision/gpu/gpu.cpp

Line 99 in 6734014

if(cudaMemGetInfo(&avail, &total) != cudaSuccess)

But I have no idea why cudaMemGetInfo fails.
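For triage, the signature of this failure in a log is the warning together with a non-zero total and a reported 0 MB available. A minimal Python sketch that checks a DepthMap log for that pattern follows; the function name `summarize_gpu_memory` is hypothetical and the regular expressions assume the log layout shown in this thread, not any AliceVision API.

```python
import re

def summarize_gpu_memory(log_text):
    """Return (total_mb, available_mb, query_failed) parsed from a log.

    Assumes the "Device information" layout printed in the logs above.
    """
    total = re.search(r"total device memory:\s*(\d+)\s*MB", log_text)
    avail = re.search(r"device memory available:\s*(\d+)\s*MB", log_text)
    # The warning is emitted when the memory query does not succeed.
    query_failed = "Cannot get available memory information" in log_text
    total_mb = int(total.group(1)) if total else None
    avail_mb = int(avail.group(1)) if avail else None
    return total_mb, avail_mb, query_failed

# Excerpt copied from the first log in this issue.
sample = """\
[08:26:12.260276][warning] Cannot get available memory information for CUDA gpu device 0.
        - total device memory:     16130 MB
        - device memory available: 0 MB
"""

total_mb, avail_mb, failed = summarize_gpu_memory(sample)
if failed and total_mb and avail_mb == 0:
    print(f"memory query failed: {total_mb} MB total, reported 0 MB available")
```

This only recognizes the symptom; it does not explain why the call fails.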

skinkie · 2019-08-02T11:28:58Z

I can reproduce it in DepthMap. In the feature extraction step the GPU is disabled because the device has compute capability 2.1, where 3.0 is required. My AliceVision is compiled with CUDA 9.1.

Program called with the following parameters:
 * downscale = 2
 * exportIntermediateResults = 0
 * imagesFolder = "/home/mio/Meshroom/MeshroomCache/PrepareDenseScene/80d2fd52143eb44bcc9f853253d6039e385aba6c"
 * input = "/home/mio/Meshroom/MeshroomCache/StructureFromMotion/7e37934063bf907691aa9e37da6fc3004cb09eee/sfm.abc"
 * maxViewAngle = 70
 * minViewAngle = 2
 * nbGPUs = 0
 * output = "/home/mio/Meshroom/MeshroomCache/DepthMap/eed412414cf0a5b8d0e35274e1d035e26b8340eb"
 * rangeSize = 3
 * rangeStart = 0
 * refineGammaC = 15.5
 * refineGammaP = 8
 * refineMaxTCams = 6
 * refineNDepthsToRefine = 31
 * refineNSamplesHalf = 150
 * refineNiters = 100
 * refineSigma = 15
 * refineUseTcOrRcPixSize = 0
 * refineWSH = 3
 * sgmGammaC = 5.5
 * sgmGammaP = 8
 * sgmMaxTCams = 10
 * sgmWSH = 4
 * verboseLevel = "info"

[12:23:18.803335][warning] Cannot get available memory information for CUDA gpu device 0.
[12:23:18.519278][warning] CUDA-Enabled GPU.
Device information:
	- id:                      0
	- name:                    GeForce GT 540M
	- compute capability:      2.1
	- total device memory:     964 MB 
	- device memory available: 0 MB 
	- per-block shared memory: 49152
	- warp size:               32
	- max threads per block:   1024
	- max threads per SM(X):   1536
	- max block sizes:         {1024,1024,64}
	- max grid sizes:          {65535,65535,65535}
	- max 2D array texture:    {65536,65535}
	- max 3D array texture:    {2048,2048,2048}
	- max 2D linear texture:   {65000,65000,1048544}
	- max 2D layered texture:  {16384,16384,2048}
	- number of SM(x)s:        2
	- registers per SM(x):     32768
	- registers per block:     32768
	- concurrent kernels:      yes
	- mapping host memory:     yes
	- unified addressing:      yes
	- texture alignment:       512 byte
	- pitch alignment:         32 byte

[12:23:18.804564][info] Supported CUDA-Enabled GPU detected.
[12:23:18.977578][info] Found 1 image dimension(s): 
[12:23:18.977714][info] 	- [4032x3024]
[12:23:19.223508][info] Overall maximum dimension: [4032x3024]
[12:23:19.223810][info] Create depth maps.


CUDAError: device kernel image is invalid
  file:       /home/mio/Sources/AliceVision/src/aliceVision/depthMap/cuda/planeSweeping/plane_sweeping_cuda.cu
  function:   ps_listCUDADevices
  line:       205

[12:23:19.225171][info] # GPU devices: 1, # CPU threads: 8
[12:23:19.225426][info] Plane sweeping parameters:
	- scale: 2
	- step: 3
[12:23:19.225621][info] PlaneSweepingCuda:
	- _nImgsInGPUAtTime: 2
	- scales: 2
	- subPixel: Yes
	- varianceWSH: 4
terminate called after throwing an instance of 'std::runtime_error'
  what():  Device alloc 2D array failed: device kernel image is invalid
Aborted (core dumped)

This is the output for feature extraction. As you can see, the memory is found there.

[12:03:47.666846][error] CUDA-Enabled GPU detected, but the compute capabilities is not enough.
 - Device 0: 2.1, global memory: 964MB
 - Requirements: 3.0, global memory: 0MB
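The feature extraction log above rejects the device because its compute capability (2.1) is below the required 3.0, while the DepthMap log reports the same device as supported. A small Python sketch of that comparison follows, assuming the log layout shown here; `parse_capability`, `meets_minimum` and `MIN_CAPABILITY` are hypothetical names, and the 3.0 threshold is taken only from the "Requirements" line above. Comparing (major, minor) tuples avoids the pitfalls of comparing versions as floats.

```python
import re

# Minimum taken from the "Requirements: 3.0" line in the log above.
MIN_CAPABILITY = (3, 0)

def parse_capability(log_text):
    """Extract the compute capability as a (major, minor) tuple, or None."""
    match = re.search(r"compute capability:\s*(\d+)\.(\d+)", log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """True if the device capability is at least the required minimum."""
    return capability is not None and capability >= minimum

for line in ("- compute capability:      2.1",   # GeForce GT 540M
             "- compute capability:      7.0"):  # Tesla V100-SXM2-16GB
    cap = parse_capability(line)
    print(cap, "ok" if meets_minimum(cap) else "below minimum")
```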

fhaust · 2019-08-18T19:32:53Z

Just a me too here. Is there anything that I could do to work around this?

Some context:

This does not happen if I just run a simple --input xyz --output abc from the commandline. But it happens if I create a pipeline with Akaze features and the high preset and run that from the command line.

miegl · 2019-08-20T18:38:51Z

I am getting the exact same error. Meshroom 2019.1.0 worked perfectly fine.

skinkie · 2019-08-20T18:40:48Z

@miegl does it fail for you with the binary release of Meshroom 2019.2.0?

fabiencastan · 2019-08-20T18:52:15Z

@fhaust: This seems to be a different problem if it works in some cases. Could you open a separate issue with a precise description of the error, log files, and information about your hardware and OS?

@miegl: Which platform do you use?

miegl · 2019-08-20T19:49:02Z

@skinkie @fabiencastan binary release, linux version.
Getting "[warning] Cannot get available memory information for CUDA gpu device 0." CUDA works fine. Will try compiling AliceVision myself.

natowi · 2019-08-20T19:52:38Z

Reference alicevision/Meshroom#594

skinkie · 2019-08-20T19:55:15Z

> @skinkie @fabiencastan binary release, linux version.
> Getting "[warning] Cannot get available memory information for CUDA gpu device 0." CUDA works fine. Will try compiling AliceVision myself.

This is not going to work; at least, it did not for me when I compiled with CUDA 9.1. So my assumption is that either a lower version of the CUDA API is required, or the actual code should be fixed in AliceVision. Investigating this is still on my radar.

fabiencastan · 2019-08-21T08:01:32Z

Yes, I managed to reproduce it on one computer. I'm working on it.
This only affects the linux binaries.

natowi · 2019-08-24T10:25:55Z

You can try the updated binaries alicevision/Meshroom#594 (comment)

fhaust · 2019-09-02T10:39:30Z

@fabiencastan Just came back from vacation.

I have not encountered this problem again. I figured that it might have something to do with the Bumblebee/Optimus setup I have running on the laptop. But, as I then found out, running CUDA does not require that.

JPLeoRX · 2020-02-24T08:20:03Z

Encountered the same issue today when running from the "alicevision/meshroom:2019.2.0-centos7-cuda8.0" Docker container.

[08:14:03.935462][warning] Cannot get available memory information for CUDA gpu device 0.
[08:14:03.615148][warning] CUDA-Enabled GPU.
Device information:
        - id:                      0
        - name:                    GeForce GTX 1080
        - compute capability:      6.1
        - total device memory:     8119 MB
        - device memory available: 0 MB
        - per-block shared memory: 49152
        - warp size:               32
        - max threads per block:   1024
        - max threads per SM(X):   2048
        - max block sizes:         {1024,1024,64}
        - max grid sizes:          {2147483647,65535,65535}
        - max 2D array texture:    {131072,65536}
        - max 3D array texture:    {16384,16384,16384}
        - max 2D linear texture:   {131072,65000,2097120}
        - max 2D layered texture:  {32768,32768,2048}
        - number of SM(x)s:        20
        - registers per SM(x):     65536
        - registers per block:     65536
        - concurrent kernels:      yes
        - mapping host memory:     yes
        - unified addressing:      yes
        - texture alignment:       512 byte
        - pitch alignment:         32 byte

tritolol · 2020-03-02T14:22:06Z

@JPLeoRX I have exactly the same issue. Maybe this is related to CUDA v8, as @fabiencastan mentioned in alicevision/Meshroom#594 (comment).

nkennek · 2020-06-01T09:15:08Z

I encountered the same issue using v2.2.0 in a Docker image built from the repository's Dockerfile with some tweaks
(CUDA version changed to 10.0 and some environment variables added).

github-actions · 2021-05-28T00:51:25Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

adaasch changed the title ~~Bug~~ Bug: Cannot get available memory information for CUDA gpu device 0. Apr 2, 2019

natowi added the CUDA label Jun 30, 2019

natowi added the linux label Aug 20, 2019

simogasp added the bug label Aug 21, 2019

github-actions bot added the stale label May 28, 2021

github-actions bot closed this as completed Jun 4, 2021