reproducing sAP evaluation #24

Closed · yanconglin opened this issue Feb 22, 2020 · 10 comments
Labels: bug (Something isn't working), question (Further information is requested)
Comments

yanconglin commented Feb 22, 2020

Hi, Yichao, really nice work! Thanks for sharing the code.
I would like to know how you convert the AFM results (and those of the other methods) into the npz format used in the mAPJ/sAP evaluation. Can you share this part? Currently I have trouble reproducing the reported AFM results in your paper. The numbers I got are 4.7 / 6.5 / 7.6 (sAP on ShanghaiTech), which is roughly 20 points lower (at sAP10) than the ones in the paper (Table 2). I have verified by visualization that the AFM lines and GT lines are matched before the lcnn.metric.msTPFP call (see the figure below), so I was wondering why the AFM result I got is so much worse than the one in the paper. A sketch of my conversion is at the end of this comment. Looking forward to your help!
[Figure 1: AFM detections overlaid with ground-truth lines]

Thanks in advance!

yanconglin commented Feb 23, 2020

Update: I think I found the reason after a day of digging. After checking the AFM code, I found that the last column of the output mat (x1, y1, x2, y2, width) is the raw width, not the width/length ratio that the paper uses to rank lines. After replacing the last column with width/length, I got new results: mAPJ 23.3 (the same as yours) and sAP10 23.9 (slightly worse than the 24.4 in the paper). Any idea why there is still a gap? The re-ranking itself is a one-liner; see the sketch below.


zhou13 commented Feb 24, 2020

If the error is relatively small (23.9 vs. 24.4), I think it might be due to some subpixel error, e.g., pixel coordinates starting from 0 vs. from 1 vs. from 0.5. @HaozhiQi Any idea on this?

zhou13 added the question label Feb 24, 2020
yanconglin commented Feb 24, 2020

Another update: the numbers I got on York are currently better than yours (sAP10: 9.1 vs. 3.5; mAPJ: 12.3 vs. 6.9). This is a bit confusing, and I am not sure where the problem is. Would it be possible to release all numbers (sAP5/10/15 on both datasets for all methods)? Thanks a lot!

zhou13 commented Feb 24, 2020

It seems that your number on York might be reasonable. I will check and get back to you later.

zhou13 updated the bug and question labels Mar 9, 2020
zhou13 commented Mar 9, 2020

Sorry for the late reply. The code I am currently using to evaluate York is `evaluate_afm(im_list, gt_list)` together with `line_score(path, threshold=5)`. I checked my code and the plots, and it looks right to me. The ground-truth data I am using is at https://drive.google.com/file/d/1ijOXv0Xw1IaNDtp1uBJt5Xb3mMj99Iw2/view?usp=sharing. Let me know if you find anything, or if you need anything more from me.

yanconglin commented
Cool. I will. Thanks for the reply!

zhou13 commented May 16, 2020

Feel free to reopen this issue if you have any update.

zhou13 closed this as completed May 16, 2020
yanconglin commented
Hi, Yichao,

Sorry to bother you again.
Recently I spent some time figuring out why the Wireframe approach performs so badly on sAP (3%-5%), and found a problem in the sAP evaluation. If you print the output of the Wireframe method, you will find many duplicate lines with exactly the same endpoints, only in a different order. This drags the sAP values down. Once those duplicate lines are removed, I obtain higher sAP values (6.8 | 9.0 | 10.4 on ShanghaiTech), which seem more reasonable, and I made similar observations on the York dataset. Not sure if you see the same thing? The deduplication I used is sketched below. Please let me know your findings. Thank you!

zhou13 commented Jun 11, 2020

Hi @yanconglin,

Thank you so much for your finding! I have been wondering about the same problem for some time. I will investigate and update our results.

Best,
Yichao Zhou.

zhou13 reopened this Jun 11, 2020
zhou13 added the bug label Jun 11, 2020
zhou13 commented May 4, 2021

@yanconglin We have fixed the problem and updated the results in the arXiv paper. Thanks for your investigation!

zhou13 closed this as completed May 4, 2021