This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

mAP predicted by student_model is sometimes higher than teacher #42

Open
firekeepers opened this issue Oct 20, 2022 · 7 comments
@firekeepers

I trained the model on my own dataset, but the AP50 is unstable: I can get very different results from the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD (domain adaptive object detection)?

[screenshot of evaluation metrics]

@sysuzgg

sysuzgg commented Oct 23, 2022

> I trained the model on my own dataset, but the AP50 is unstable: I can get very different results from the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?

That problem may come from not fixing the random seed. I am also training on my own dataset, and I would like to ask about the training procedure. Is it: step 1, Trainer: baseline, train for 10k iterations; step 2, Trainer: ateacher, load the model weights from step 1 and continue for another 50k iterations? Is that the correct procedure?

@firekeepers
Author

> I trained the model on my own dataset, but the AP50 is unstable: I can get very different results from the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?
>
> That problem may come from not fixing the random seed. I am also training on my own dataset, and I would like to ask about the training procedure. Is it: step 1, Trainer: baseline, train for 10k iterations; step 2, Trainer: ateacher, load the model weights from step 1 and continue for another 50k iterations?

I tried fixing the random seeds for torch, numpy, etc., and also fixed the seed in config.py, and kept the data-loading seed at 0, but the results still differ greatly from run to run.
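For reference, seeds alone are often not enough for run-to-run reproducibility in PyTorch: cuDNN autotuning, nondeterministic CUDA kernels, and DataLoader worker processes each add randomness that must be pinned separately. A hedged sketch (the function names `seed_everything` and `worker_init_fn` are illustrative, not functions from this repository):

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    # Seed every RNG that PyTorch training typically touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Disable cuDNN autotuning and force deterministic kernels.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Required by some deterministic CUDA ops (CUDA >= 10.2).
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


def worker_init_fn(worker_id: int) -> None:
    # Give each DataLoader worker a distinct but reproducible seed;
    # pass this as DataLoader(..., worker_init_fn=worker_init_fn).
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
```

Even with all of this pinned, some CUDA ops used by detection models have no deterministic implementation, so small run-to-run differences can remain on GPU.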

@firekeepers
Copy link
Author

> I trained the model on my own dataset, but the AP50 is unstable: I can get very different results from the same parameters. Moreover, the teacher's AP50 is sometimes lower than the student's AP50. Is this phenomenon normal in DAOD?
>
> That problem may come from not fixing the random seed. I am also training on my own dataset, and I would like to ask about the training procedure. Is it: step 1, Trainer: baseline, train for 10k iterations; step 2, Trainer: ateacher, load the model weights from step 1 and continue for another 50k iterations?

The training procedure is described in fair detail in trainer.py: first, while the iteration count is below burn_in_iter, the model is trained on its own; then, at the burn-up stage, the network weights are copied into the teacher model, and after that the student model continues to be trained.

I adjusted the training schedule several times based on how the network was fitting. Since the domain shift in my dataset is fairly large, a shorter burn-in stage may yield higher accuracy.
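The burn-in and teacher-update scheme described above can be sketched as follows (a minimal illustration, not this repository's actual code; `update_teacher`, `keep_rate`, and the loop skeleton are assumptions):

```python
import copy

import torch


def update_teacher(student: torch.nn.Module,
                   teacher: torch.nn.Module,
                   keep_rate: float = 0.9996) -> None:
    # Exponential moving average:
    #   teacher <- keep_rate * teacher + (1 - keep_rate) * student
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(keep_rate).add_(s_param, alpha=1.0 - keep_rate)


# Training-loop skeleton:
# for it in range(max_iter):
#     if it < burn_in_iter:
#         train_step(student, labeled_batch)        # burn-in: student only
#     elif it == burn_in_iter:
#         teacher = copy.deepcopy(student)          # copy weights into teacher
#     else:
#         train_step(student, labeled_batch,
#                    pseudo_labels_from(teacher))   # mutual learning stage
#         update_teacher(student, teacher)          # slow EMA update
```

Because a high `keep_rate` makes the teacher an average over many past student states, the teacher naturally lags behind the student shortly after burn-in or whenever the student is still improving quickly; this is one plausible reason the teacher's AP50 can sometimes be lower than the student's.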

@sysuzgg

sysuzgg commented Oct 24, 2022

@firekeepers What parameters did you change to train your own dataset? I trained my own dataset with the Cityscapes yaml config, and the results in metrics.json are about 20% worse than the Faster R-CNN source-only results. That is clearly wrong; which parameters can be adjusted?
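For a custom dataset, the usual Detectron2-style changes beyond the Cityscapes yaml are registering the new datasets and matching the class count. A hedged sketch (the dataset names, paths, and values below are hypothetical; the repository's own yaml keys may differ):

```python
# Sketch of typical Detectron2 overrides for a custom dataset; the
# names "my_source_train"/"my_target_train" and all paths are made up.
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances

register_coco_instances("my_source_train", {},
                        "source/annotations.json", "source/images")
register_coco_instances("my_target_train", {},
                        "target/annotations.json", "target/images")

cfg = get_cfg()
# cfg.merge_from_file(...)  # start from the repo's yaml for your backbone
cfg.DATASETS.TRAIN = ("my_source_train",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5   # must equal your own number of classes
cfg.SOLVER.BASE_LR = 0.01             # often lowered for smaller datasets
```

A `NUM_CLASSES` mismatch or unregistered dataset is a common cause of results far below the source-only baseline.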

@sysuzgg

sysuzgg commented Oct 24, 2022

@firekeepers One more question: training with VGG16 works, but when I switch to R101, total_loss, loss_cls, and loss_box_reg all become NaN during training. I have tried learning rates of 0.02, 0.002, 0.0002, and 0.00002, and they still all become NaN.
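When NaNs appear regardless of the learning rate, it usually helps to fail fast at the first non-finite loss and then try the standard mitigations. A hedged sketch, not specific to this repository (`check_losses` is an illustrative helper):

```python
import torch


def check_losses(loss_dict: dict) -> None:
    # Fail fast at the first non-finite loss instead of silently diverging.
    for name, value in loss_dict.items():
        if not torch.isfinite(value).all():
            raise FloatingPointError(f"Loss '{name}' became non-finite: {value}")


# Typical mitigations when a deeper backbone (e.g. R101) diverges:
# 1. Gradient clipping before optimizer.step():
#      torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
# 2. A longer linear LR warmup so early iterations use a tiny learning rate.
# 3. Verify that ImageNet-pretrained R101 weights actually load: a key or
#    shape mismatch leaves layers randomly initialized and often causes NaNs.
# 4. Freeze the early backbone stages (in Detectron2: MODEL.BACKBONE.FREEZE_AT).
```

Point 3 is worth checking first here, since the problem appeared only after swapping backbones.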

@firekeepers
Author

> @firekeepers What parameters did you change to train your own dataset? I trained my own dataset with the Cityscapes yaml config, and the results in metrics.json are about 20% worse than the Faster R-CNN source-only results. That is clearly wrong; which parameters can be adjusted?

Mainly adjust the duration of the burn-in stage, I would say; I am not sure how it should be tuned for your dataset.

@sysuzgg

sysuzgg commented Oct 25, 2022

> @firekeepers What parameters did you change to train your own dataset? I trained my own dataset with the Cityscapes yaml config, and the results in metrics.json are about 20% worse than the Faster R-CNN source-only results. That is clearly wrong; which parameters can be adjusted?
>
> Mainly adjust the duration of the burn-in stage, I would say; I am not sure how it should be tuned for your dataset.

OK, thank you.
