About configurations #8
First, thank you for your kind paper and GitHub page.
Your work is super useful for studying text detection with a Mask R-CNN baseline.
I am reproducing the results of PMTD, but my results are a little worse (Mask R-CNN baseline: 60% F-measure on the MLT dataset), so I'm trying to figure out what is wrong with my configuration.
It would be very helpful if the config file (.yaml) were provided; otherwise, please let me know the RPN.ANCHOR_STRIDE setting (currently, I'm using (4, 8, 16, 32, 64)).
Thanks!

Comments
I think you may have met the same problem I met before. You can have a look at my issue; the author gives some useful advice there.
@kapness Thank you for the kind reply!
I followed your issue, but the results were still worse than my expectation.
It would be very helpful if you shared your config file (.yaml) :)
Thank you again.
If you complete the data augmentation correctly in transform.py, the F-score can reach 72% without other changes. I did not change the original yaml file.
My batch size is 16 and the LR starts at 0.01.
@kapness Thanks a lot!
@hellbell And _C.MODEL.RPN.ASPECT_RATIOS in defaults.py should be modified as the paper says. I forgot this tip before.
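For reference, a sketch of what that change to maskrcnn_benchmark/config/defaults.py would look like, using the aspect ratios quoted elsewhere in this thread (the default value there is (0.5, 1.0, 2.0)):

```python
# maskrcnn_benchmark/config/defaults.py (sketch of the modified line)
# default: _C.MODEL.RPN.ASPECT_RATIOS = (0.5, 1.0, 2.0)
_C.MODEL.RPN.ASPECT_RATIOS = (0.17, 0.44, 1.13, 2.90, 7.46)
```

(Equivalently, the value can be overridden in a .yaml config instead of editing defaults.py, as the configuration later in this thread does.)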
@kapness @JingChaoLiu
Thank you for your kind replies.
I trained vanilla Mask R-CNN on ICDAR2017-MLT and got an F-score of only 62%, which is still far below the baseline.
My settings:
based on e2e_mask_rcnn_R_50_FPN_1x.yaml
changed MODEL.RPN.ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
changed MODEL.RPN.FPN_POST_NMS_PER_BATCH = False
4 GPUs with these solver settings: BASE_LR: 0.01, WEIGHT_DECAY: 0.0001, STEPS: (50000, 80000), MAX_ITER: 100000, IMS_PER_BATCH: 16
My questions are:
1. At test time, the confidence score threshold for selecting valid bounding boxes is set to 0.5. Is that okay?
2. I guess my data augmentation in transform.py might be wrong. Would you share your transform.py file or give me some tips? I posted my code snippet:

```python
import random

import torchvision.transforms.functional as F


class RandomSampleCrop(object):
    def __init__(self, crop_size=640, min_size=640, max_size=2560):
        self.crop_size = crop_size
        self.min_size = min_size
        self.max_size = max_size

    def get_size(self):
        # sample a random (h, w) to resize to before cropping
        w_resize = random.randint(self.min_size, self.max_size)
        h_resize = random.randint(self.min_size, self.max_size)
        return (h_resize, w_resize)

    def __call__(self, image, target):
        while True:
            resized_size = self.get_size()
            image_r = F.resize(image, resized_size)
            target_r = target.resize(image_r.size)
            width, height = image_r.size
            crop_left = random.randint(0, width - self.crop_size)
            crop_top = random.randint(0, height - self.crop_size)
            target_r_c = target_r.crop([crop_left, crop_top,
                                        crop_left + self.crop_size,
                                        crop_top + self.crop_size])
            target_r_c = target_r_c.clip_to_image()
            if len(target_r_c) > 0:
                # retry if any cropped box degenerates to < 1 px in width or height
                too_small = False
                for t in target_r_c.bbox:
                    w, h = t[2] - t[0], t[3] - t[1]
                    if w < 1 or h < 1:
                        too_small = True
                if too_small:
                    continue
                break
        image_r_c = image_r.crop([crop_left, crop_top,
                                  crop_left + self.crop_size,
                                  crop_top + self.crop_size])
        return image_r_c, target_r_c
```

Many thanks!

If you use the original crop function implemented by maskrcnn-benchmark, that may be where things go wrong: I don't think it crops the mask ground truth properly. You can see its source code in modeling/structure.
@kapness Thanks again for your reply.

@hellbell
Following the previous answers and the paper, here is one configuration which I just wrote. Sorry, I have had no time to validate it, so there is no guarantee on the F-measure.

```yaml
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-FPN"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ANCHOR_SIZES: (16, 32, 64, 128, 256)
    ASPECT_RATIOS: (0.17, 0.44, 1.13, 2.90, 7.46)
    # Remove RPN anchors that go outside the image by more than STRADDLE_THRESH pixels.
    # I accidentally changed this value from 0 to 10 in the early stage and forgot to
    # change it back, but I think this change makes no difference.
    STRADDLE_THRESH: 10
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_PER_BATCH: False
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    NUM_CLASSES: 2
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("icdar_2017_mlt_train", "icdar_2017_mlt_val")
  TEST: ("icdar_2017_mlt_test",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  WARMUP_METHOD: 'linear'  # PMTD uses 'exponential', which is not implemented in maskrcnn-benchmark
  WARMUP_ITERS: 4500       # warmup_iter = image_num=9000 * warmup_epoch=8 / batch_size=16
  IMS_PER_BATCH: 16
  BASE_LR: 0.02            # PMTD uses batch_size * 0.00125 with syncBN
  WEIGHT_DECAY: 0.0001
  STEPS: (49500, 76500)    # warmup_iter + (iter * 0.5, iter * 0.8)
  MAX_ITER: 94500          # iter = image_num=9000 * epoch_num=160 / batch_size=16 = 90000; max_iter = warmup_iter + iter
```

Have you done a grid search over the parameters (cls_threshold, nms_threshold) of the final NMS? See #4 for more details. This can make a bigger difference than some negligible training details.
See #5 for the problematic crop operation. There are two problems: first, the number of points in the cropped mask may vary from 3 to 8 instead of staying a constant 4; second, the cropped original bounding box differs from the correctly cropped bounding box.
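As a sanity check, a configuration like the one above can be merged over maskrcnn-benchmark's defaults through its yacs-based config object; a minimal sketch, assuming a recent maskrcnn-benchmark checkout and a hypothetical file path:

```python
# Sketch: merge the yaml above over the defaults and inspect the values
# discussed in this thread. "configs/pmtd_baseline.yaml" is a hypothetical path.
from maskrcnn_benchmark.config import cfg

cfg.merge_from_file("configs/pmtd_baseline.yaml")
cfg.freeze()

print(cfg.MODEL.RPN.ASPECT_RATIOS)              # (0.17, 0.44, 1.13, 2.9, 7.46)
print(cfg.MODEL.RPN.FPN_POST_NMS_PER_BATCH)     # False
print(cfg.SOLVER.BASE_LR, cfg.SOLVER.MAX_ITER)  # 0.02 94500
```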
Now I have one question about OHEM. In the paper, you compute 512 proposals for OHEM in roi_heads; is that right? (Or should I modify it in the RPN branch instead?)
But my batch size is smaller than yours: my batch size is 16 and each GPU computes 8 images. Does that make a difference to OHEM?
In maskrcnn-benchmark, the roi_head branch gets 512 proposals/image.
Thanks for your kind reply again. I think this is my last question about the baseline...
OHEM is done in the bbox branch, not in the RPN. Compared with the data flow of inference mentioned in #4, the data flow of training is as follows (some details about the loss are also added):

image -> backbone -> RPN
  > pred_cls, pred_reg = RPN.forward(all proposals)
  > randomly sample sample_num = RPN.BATCH_SIZE_PER_IMAGE=256 * image_num proposals to calculate the loss (sample_num is far less than len(all proposals))
  > postprocess all proposals to output MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN * image_num proposals, given RPN.FPN_POST_NMS_PER_BATCH = False
RPN -> bbox branch
  > pred_cls, pred_reg = bbox.forward(the proposals output by the RPN)
  > randomly sample ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals to calculate the loss
  > (add OHEM here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, then keep the loss of the top 512 proposals and set the loss of the other proposals to 0
RPN -> mask branch
  > pred_mask = mask.forward(the positive proposals output by the RPN)
  > calculate the mask loss for all predicted masks
backward the loss to update the parameters

> my batch size is smaller than yours, for example, batch size is 16 and each GPU computes 8 images. Does it make a difference to OHEM?

batch_size = 16 is enough.
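For concreteness, here is a minimal PyTorch sketch of the "(add OHEM here)" step, assuming per-proposal box-head losses obtained with reduction='none'; the function name and the normalization by keep_num are illustrative choices, not maskrcnn-benchmark code:

```python
import torch

def ohem_box_loss(cls_loss_per_roi, reg_loss_per_roi, keep_num=512):
    """Keep the loss of the `keep_num` hardest proposals, zero out the rest.

    Both inputs are 1-D tensors with one entry per sampled proposal
    (ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num entries in total).
    reg_loss_per_roi is assumed to already be 0 for negative proposals
    (see the follow-up at the end of the thread).
    """
    total_loss = cls_loss_per_roi + reg_loss_per_roi
    keep_num = min(keep_num, total_loss.numel())
    # hardest proposals = the ones with the largest combined loss
    _, hard_idx = total_loss.topk(keep_num)
    keep_mask = torch.zeros_like(total_loss)
    keep_mask[hard_idx] = 1.0
    # the other proposals contribute 0 loss and hence no gradient
    return (total_loss * keep_mask).sum() / keep_num
```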
Thanks very, very much for saving me!
I'm so sorry to disturb you again...

> (add OHEM here) sort all ROI_HEADS.BATCH_SIZE_PER_IMAGE=512 * image_num proposals by cls_loss + reg_loss, then keep the loss of the top 512 proposals and set the loss of the other proposals to 0

Here, the original code only computes the reg_loss of the positive proposals. Should I first set the reg loss of the negative proposals to 0, and then add cls_loss to reg_loss and sort?
Yes, for the negative proposals, just set the reg loss to 0 before sorting.
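In terms of the hypothetical sketch above, that amounts to zeroing the negative entries before forming the sort key (labels_per_roi is an assumed tensor of per-proposal class labels, with 0 for background):

```python
# zero the reg loss of background proposals, then apply the OHEM selection
reg_loss_per_roi[labels_per_roi == 0] = 0
loss = ohem_box_loss(cls_loss_per_roi, reg_loss_per_roi, keep_num=512)
```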