
[MacOS] Segmentation Fault w/ Torch #4897

Closed
Innixma opened this issue Dec 20, 2021 · 5 comments
Innixma commented Dec 20, 2021

Description

On macOS, LightGBM training crashes with a segmentation fault if torch is imported before LightGBM:

import torch
import lightgbm
lightgbm.train(...)  # <--- Segmentation Fault

Reproducible example

First create a Python virtual environment (venv), then run:

pip install lightgbm
pip install torch

import sys
python_version = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
print(f'\nPython Version: {python_version}')
import platform
print(f'Operating System: {platform.system()}\n')

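# List installed packages (pip._internal is a private pip API and may change between pip versions)
from pip._internal.operations import freeze
packages = freeze.freeze()
for p in packages:
    print(p)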

import numpy as np
data = np.random.rand(50_000, 10)
label = np.random.randint(2, size=50_000)
test_data = np.random.rand(50_000, 10)

# This import order works:
# import lightgbm as lgb
# import torch

# This import order fails on macOS when calling lgb.train: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
import torch
import lightgbm as lgb

train_data = lgb.Dataset(data, label=label)
print('\nStarting Train')
model = lgb.train({}, train_data)
print('Training Finished')

Example output:

Python Version: 3.8.10
Operating System: Darwin

joblib==1.1.0
lightgbm==3.3.1
numpy==1.21.5
pip==21.1.2
scikit-learn==1.0.1
scipy==1.7.3
setuptools==57.0.0
threadpoolctl==3.0.0
torch==1.10.1
typing-extensions==4.0.1
wheel==0.36.2

Starting Train

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
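Exit code 139 is the usual encoding of 128 + 11, meaning the process was killed by signal 11 (SIGSEGV). Because the crash takes down the whole interpreter, one way to compare both import orders from a single script is to run each one in a child process. A minimal sketch, not part of the original report: `make_snippet` is an illustrative helper that builds a cut-down version of the reproduction above. Note that Python's `subprocess` reports a signal death as a negative return code (`-11` for SIGSEGV) rather than 139.

```python
import signal
import subprocess
import sys
from typing import Sequence


def signal_name(signum: int) -> str:
    """Map a signal number to its name, e.g. 11 -> 'SIGSEGV' on Linux and macOS."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown signal"


def describe_returncode(returncode: int) -> str:
    """Turn a process return code into a human-readable status."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N on POSIX systems.
        return f"killed by signal {-returncode} ({signal_name(-returncode)})"
    if returncode > 128:
        # Shells and some tools report a signal death as 128 + N, e.g. 139 = 128 + 11.
        signum = returncode - 128
        return f"exit code {returncode} (128 + {signum}: {signal_name(signum)})"
    return f"exit code {returncode}"


def run_snippet(code: str) -> int:
    """Run `code` in a fresh interpreter so a segfault can't take down this process."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def make_snippet(import_order: Sequence[str]) -> str:
    """Illustrative helper: cut-down reproduction; import_order must include 'lightgbm'."""
    imports = [f"import {name}" for name in import_order]
    body = [
        "import numpy as np",
        "data = np.random.rand(50_000, 10)",
        "label = np.random.randint(2, size=50_000)",
        "lightgbm.train({}, lightgbm.Dataset(data, label=label))",
    ]
    return "\n".join(imports + body)


if __name__ == "__main__":
    for order in (["lightgbm", "torch"], ["torch", "lightgbm"]):
        status = describe_returncode(run_snippet(make_snippet(order)))
        print(f"{' -> '.join(order)}: {status}")
```

On an affected machine, the second order would be expected to report `killed by signal 11 (SIGSEGV)` while the first completes with `exit code 0`.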

Environment info

LightGBM version: 3.3.1 (I've also tested various other versions of LightGBM and torch; none worked).

OS Details:

macOS Big Sur
Version 11.6

MacBook Pro (16-inch, 2019)
Processor: 2.6 GHz 6-Core Intel Core i7
Memory: 16 GB 2667 MHz DDR4

libomp version: 12.0.1 (13.0.0 also fails)
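Since the failures track the libomp version, it can help to confirm which OpenMP library files a process has actually loaded. A sketch using `threadpoolctl` (already present in the environment listed above): `threadpoolctl.threadpool_info()` returns a list of dicts, one per detected BLAS or OpenMP library, with keys including `user_api` and `filepath`. The helper name `openmp_runtimes` is illustrative. If more than one OpenMP library shows up, more than one runtime is sharing the process, which is useful detail when triaging issues like this one.

```python
from typing import Any, Dict, List


def openmp_runtimes(info: List[Dict[str, Any]]) -> List[str]:
    """Return the file paths of OpenMP libraries from a threadpool_info() listing."""
    return [lib["filepath"] for lib in info if lib.get("user_api") == "openmp"]


if __name__ == "__main__":
    try:
        # Same order as the failing case: torch first, then LightGBM.
        import torch  # noqa: F401
        import lightgbm  # noqa: F401
        from threadpoolctl import threadpool_info
    except ImportError as exc:
        print(f"Missing dependency: {exc}")
    else:
        paths = openmp_runtimes(threadpool_info())
        print(f"{len(paths)} OpenMP runtime(s) loaded")
        for path in paths:
            print("  ", path)
```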

Additional Comments

This bug is problematic in AutoGluon because models may train in any order: if a torch model trains before LightGBM, LightGBM segfaults. I didn't hit this issue on my old Mac, likely because it ran an older OS version. This might be specific to Big Sur, but that's unclear. Linux works fine; I haven't tried Windows.
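Until the runtime itself is fixed, an application that trains models in varying order could enforce the import order the report found to work (LightGBM before torch) by importing both once at startup. A minimal sketch, assuming that ordering is sufficient in a given environment: it only works around the symptom, does not fix the underlying libomp problem, and has no effect if torch was already imported earlier. `preload` is a hypothetical helper, not an AutoGluon API.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable


def preload(module_names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import modules in the given order, skipping any that aren't installed."""
    loaded: Dict[str, ModuleType] = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # Optional dependency missing: nothing to order for it.
            continue
    return loaded


# Run once, as early as possible (e.g. in a package's top-level __init__),
# before any code path imports torch on its own. If torch is already in
# sys.modules at this point, the ordering no longer helps.
LOADED = preload(["lightgbm", "torch"])
```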

@jameslamb
Collaborator

Thanks for the excellent write-up and example code!

I'll try running this example on my Mac tonight to see if I can reproduce the issue.

@jmoralez
Collaborator

@Innixma can you provide your libomp version? This may be #4229.

@StrikerRUS
Collaborator

@Innixma Hey, thanks for writing this issue!

This is the same issue as dmlc/xgboost#7518.
Indeed, the root cause is an upstream bug in OpenMP on macOS (#4229).

Please try any of the following workarounds:

@Innixma
Author

Innixma commented Dec 21, 2021

It is indeed fixed when I downgrade libomp: I was using 12.0.1, and the segfault goes away when I use 11.1.0, installed via:

# brew install wget
wget https://raw.githubusercontent.com/Homebrew/homebrew-core/fb8323f2b170bd4ae97e1bac9bf3e2983af3fdb0/Formula/libomp.rb
brew uninstall libomp
brew install libomp.rb
rm libomp.rb

Thanks for the help! Feel free to close this issue if it is a duplicate.
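To confirm which libomp Homebrew has installed after a downgrade like the one above, `brew list --versions libomp` prints the formula name followed by its installed version(s), e.g. `libomp 11.1.0`. A small sketch that parses that output and flags suspect versions. The versions reported as failing in this thread are 12.0.1 and 13.0.0; treating every release with major version 12 or later as suspect is an assumption made here for illustration, not something established in the thread.

```python
import subprocess
from typing import List, Tuple

# Assumption: flag every libomp with major version >= 12 (12.0.1 and 13.0.0 failed above).
FIRST_SUSPECT_MAJOR = 12


def parse_brew_versions(output: str) -> List[str]:
    """Parse `brew list --versions libomp` output such as 'libomp 11.1.0'."""
    parts = output.split()
    return parts[1:] if parts and parts[0] == "libomp" else []


def version_tuple(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric components, e.g. '11.1.0' -> (11, 1, 0)."""
    nums: List[int] = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stops at Homebrew revision suffixes like '0_1'
        nums.append(int(piece))
    return tuple(nums)


def is_suspect(version: str) -> bool:
    vt = version_tuple(version)
    return bool(vt) and vt[0] >= FIRST_SUSPECT_MAJOR


def installed_libomp() -> List[str]:
    """Return installed libomp versions, or [] if brew or libomp is unavailable."""
    try:
        result = subprocess.run(
            ["brew", "list", "--versions", "libomp"], capture_output=True, text=True
        )
    except OSError:
        return []  # brew not installed or not runnable
    return parse_brew_versions(result.stdout)


if __name__ == "__main__":
    versions = installed_libomp()
    if not versions:
        print("libomp not found via Homebrew")
    for v in versions:
        print(v, "-> suspect" if is_suspect(v) else "-> ok")
```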

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023