from register_dataset import *  # register custom dataset
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
import os

# (cfg = get_cfg() and the model/solver settings are not shown in this excerpt)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
with open(os.path.join(cfg.OUTPUT_DIR, "config.yaml"), "w") as f:
    f.write(cfg.dump())
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
When I run this code with batch size 28, I get a CUDA error.
But I can run the same file on Windows, which has the same configuration as the Linux machine. What is the issue, and how can I overcome it? Could you please provide some code that works well with a larger batch size in the Linux environment?
You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template.
The following information is missing: "Instructions To Reproduce the Issue and Full Logs";
eklahari changed the title twice on Jun 19, 2024, finally to: I can't train the model with batch size : 28 in linux environment but I can get the training results in windows with batch size 28 !
Hi,
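This is usually caused by differences in how GPU memory is managed on each platform. One likely case, if the error is an out-of-memory error: recent NVIDIA drivers on Windows can spill into shared system memory when the GPU runs out of VRAM (the "CUDA - Sysmem Fallback Policy" setting), so a batch that is too large may simply train more slowly there, while on Linux the same batch fails with a CUDA out-of-memory error.
There is no single fix for this, but if batch size 28 does not fit on the Linux machine, you could try to:
Reduce the batch size
Use a smaller model
Monitor GPU memory with torch.cuda.memory_allocated() and torch.cuda.memory_reserved() (memory_cached() is the older, deprecated name for memory_reserved())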
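The memory checks in the last suggestion could look like the minimal sketch below. It assumes PyTorch is installed on the training machine; the helper names to_mib and gpu_memory_report are made up for illustration. It uses memory_reserved() rather than the deprecated memory_cached(), and also reports the peak allocation and the card's total memory so the numbers can be put in context.

```python
def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return num_bytes / 2**20


def gpu_memory_report(device: int = 0) -> str:
    """Summarise PyTorch's view of memory on one GPU (hypothetical helper)."""
    # Imported inside the function so this helper can be defined on machines without PyTorch.
    import torch

    if not torch.cuda.is_available():
        return "CUDA is not available"
    allocated = torch.cuda.memory_allocated(device)  # memory held by live tensors
    reserved = torch.cuda.memory_reserved(device)    # allocated + cached by PyTorch's allocator
    peak = torch.cuda.max_memory_allocated(device)   # high-water mark of allocated memory
    total = torch.cuda.get_device_properties(device).total_memory
    return (
        f"allocated {to_mib(allocated):.0f} MiB, reserved {to_mib(reserved):.0f} MiB, "
        f"peak {to_mib(peak):.0f} MiB of {to_mib(total):.0f} MiB total"
    )
```

For example, printing gpu_memory_report() after a short run at a batch size that trains on both machines gives peak numbers you can compare between Windows and Linux.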
These aren't fixes for the underlying difference, but they are ways to still train your model in a Linux environment.
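To pick a reduced batch size instead of guessing, one option is to bisect between 1 and 28, assuming that if a batch size fits in GPU memory, every smaller one does too. The sketch below takes a hypothetical fits(batch_size) callable; in real use it might build a DefaultTrainer with that cfg.SOLVER.IMS_PER_BATCH, run a few iterations, and return False when a CUDA out-of-memory error is raised (torch.cuda.OutOfMemoryError in recent PyTorch releases), calling torch.cuda.empty_cache() between attempts. The probe below is a stand-in, not a real memory measurement.

```python
from typing import Callable


def largest_fitting_batch(upper: int, fits: Callable[[int], bool], lower: int = 1) -> int:
    """Return the largest batch size in [lower, upper] for which fits() is True.

    Assumes monotonicity: if a batch size fits in GPU memory, every smaller one does too.
    """
    if not fits(lower):
        raise RuntimeError(f"even batch size {lower} does not fit")
    lo, hi = lower, upper  # invariant: fits(lo) is True and the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


# Stand-in probe for illustration: pretend anything above 20 images runs out of memory.
print(largest_fitting_batch(28, lambda batch_size: batch_size <= 20))  # 20
```

Bisection needs only about log2(28) probes instead of trying every size. Because detectron2 batches can contain images of different sizes, peak memory may vary between iterations, so leaving some headroom below the value found is safer.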
Hope that explains the issue.
If you have any more questions, please let me know.
import os

# CUDA_LAUNCH_BLOCKING must be an environment variable set before CUDA is initialised;
# a plain Python assignment (CUDA_LAUNCH_BLOCKING=1.) only creates a float and has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from register_dataset import *  # register custom dataset
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.MASK_ON = False
cfg.DATASETS.TRAIN = ("football_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 28
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 1000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # number of classes in the dataset
cfg.OUTPUT_DIR = "/output1"

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
with open(os.path.join(cfg.OUTPUT_DIR, "config.yaml"), "w") as f:
    f.write(cfg.dump())

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
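The solver settings in this script (IMS_PER_BATCH = 28, BASE_LR = 0.00025, MAX_ITER = 1000) were chosen for 28 images per batch. If the batch has to come down on Linux, one common heuristic, the linear scaling rule, keeps BASE_LR proportional to IMS_PER_BATCH and scales MAX_ITER inversely so training covers roughly the same number of images. The sketch below assumes that heuristic suits this dataset; scale_solver is a hypothetical helper, not a detectron2 function.

```python
from typing import Tuple


def scale_solver(ims_per_batch: int, base_lr: float, max_iter: int,
                 new_ims_per_batch: int) -> Tuple[float, int]:
    """Rescale BASE_LR and MAX_ITER for a new IMS_PER_BATCH (linear scaling rule).

    The learning rate changes in proportion to the batch size, and the iteration
    count in inverse proportion, so the model sees roughly the same number of images.
    """
    factor = new_ims_per_batch / ims_per_batch
    new_base_lr = base_lr * factor
    new_max_iter = int(round(max_iter / factor))
    return new_base_lr, new_max_iter


# Values from the script: 28 images per batch, BASE_LR 0.00025, MAX_ITER 1000, halved to 14.
new_lr, new_max_iter = scale_solver(28, 0.00025, 1000, 14)
print(new_lr, new_max_iter)  # 0.000125 2000
```

In the script this would mean setting cfg.SOLVER.IMS_PER_BATCH to the smaller value and assigning the two returned numbers to cfg.SOLVER.BASE_LR and cfg.SOLVER.MAX_ITER. detectron2's DefaultTrainer.auto_scale_workers applies a similar linear rule (also to SOLVER.STEPS and the warmup length) when the number of GPUs changes; it is a starting point, not a guarantee of identical results.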