Koniq10k dataloader resizes to (224, 224) and then applies a transform with random crop. Why? #34

Open
KarenAssaraf opened this issue May 7, 2023 · 6 comments


@KarenAssaraf

Hey!
I see in this line:

`transform=transforms.Compose([RandCrop(patch_size=config.crop_size),`

that when training on koniq10k, each image is first resized to (224, 224). Then you apply a transform function that contains a random crop to size (224, 224). Unless I'm missing something, does the original image have to be resized?
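For reference, a minimal sketch of the pipeline as I understand it (illustrative only: I'm assuming the repo's `RandCrop` behaves like torchvision's `RandomCrop`, that `config.crop_size` is 224, and the image path is made up):

```python
from PIL import Image
from torchvision import transforms

crop_size = 224
transform = transforms.Compose([
    transforms.RandomCrop(crop_size),   # stand-in for RandCrop(patch_size=config.crop_size)
    transforms.ToTensor(),
])

img = Image.open("example_koniq10k_image.jpg")   # hypothetical path
img = img.resize((224, 224))                     # step 1: resize the full-resolution image
patch = transform(img)                           # step 2: random 224x224 crop of a 224x224 image
print(patch.shape)                               # torch.Size([3, 224, 224])
```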

Thanks!

@Stephen0808
Collaborator

We selected a vision transformer as our feature extractor, which means the input images should be resized to a fixed image size (224×224).

@KarenAssaraf
Author

KarenAssaraf commented May 8, 2023

Hi @Stephen0808!
Thanks for your fast answer. But then what is the effect of the random crop in the dataloader's transform function?

Also, it means all Koniq10k images, which are initially full resolution, are resized to (224, 224). We lose the information of the full-resolution image quality. (Usually IQA transformers try to avoid resizing and leverage the transformer architecture to accept different input sizes.)
Wouldn't performance be better if we random-cropped the Koniq10k images first and then sent the crops to the ViT?
Let's say we make sure at least 20 crops are in the dataset for each image, and each crop carries the label of the full-resolution image during training (a rough sketch is below).
It would mean training on size(koniq10k) * num_of_crops images.
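Something like this rough sketch (hypothetical names throughout; `samples`, `num_crops`, and the label format are made up for illustration):

```python
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MultiCropKoniq(Dataset):
    """Illustrative only: take num_crops random 224x224 crops from each
    full-resolution image; every crop inherits the image-level MOS label."""

    def __init__(self, samples, num_crops=20, crop_size=224):
        # samples: list of (image_path, mos_score) pairs -- hypothetical format
        self.samples = samples
        self.num_crops = num_crops
        self.crop = transforms.Compose([
            transforms.RandomCrop(crop_size),
            transforms.ToTensor(),
        ])

    def __len__(self):
        # effective dataset size becomes len(koniq10k) * num_of_crops
        return len(self.samples) * self.num_crops

    def __getitem__(self, idx):
        path, score = self.samples[idx // self.num_crops]
        img = Image.open(path).convert("RGB")  # no resize: crop the original resolution
        return self.crop(img), score
```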

What do you think?

@Stephen0808
Collaborator

As mentioned in your question, we crop several patches (224×224) for inference and average the scores to get the final score.
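A minimal sketch of that multi-crop averaging (not the repo's exact code; `model` is assumed to map a batch of 224×224 tensors to one score per crop):

```python
import torch
from torchvision import transforms

def predict_score(model, img, num_crops=5, crop_size=224):
    """Average the model's prediction over several random 224x224 crops
    of the (un-resized) PIL image -- illustrative only."""
    crop = transforms.Compose([
        transforms.RandomCrop(crop_size),
        transforms.ToTensor(),
    ])
    patches = torch.stack([crop(img) for _ in range(num_crops)])  # (num_crops, 3, 224, 224)
    with torch.no_grad():
        scores = model(patches)                                   # one quality score per crop
    return scores.mean().item()
```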

@KarenAssaraf
Author

KarenAssaraf commented May 8, 2023

I mean:

  • In inference, the crops are high quality (there is no resize of the original inference image).
  • In training, we do affect the quality, since it is the resized image that is forwarded to the model, and not a crop from the original full-resolution training image.

The question is: why not use the same process for training and inference?

@Stephen0808
Collaborator

In both the inference and training phases, we used cropped images.

@KarenAssaraf
Author

KarenAssaraf commented May 8, 2023

OK, maybe I misunderstood something.
From my understanding, in training there is:

  1. resizing (so it's not a crop, and it affects image quality) from the initial resolution to (224, 224)
  2. and then a crop from (224, 224) to (224, 224), which does nothing since the input is already (224, 224) (quick check below). Correct?

So I was wondering why, instead of step 1, we don't take several crops and send those crops to the ViT.
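A quick check of point 2 (assuming the repo's `RandCrop` behaves like torchvision's `RandomCrop`): the only 224×224 crop of a 224×224 image is the image itself, so the crop is a no-op after the resize.

```python
import torch
from torchvision import transforms

img = torch.rand(3, 224, 224)               # stand-in for an already-resized training image
cropped = transforms.RandomCrop(224)(img)   # the only valid crop window is the whole image
assert torch.equal(cropped, img)            # i.e. the random crop does nothing here
```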
