All architectures with rgb_mean = (0.5, 0.5, 0.5) are incompatible #73
Comments
I documented it now. Please send some example code showing how you think such a tensor should be added for detection, and I can add it to the modified networks. Would something like this work for you?
Just to make it clear: this change to the rgb mean values was needed to improve stability. Several datasets were giving unstable training, and after debugging for several hours I found that the original values were the cause. The reason is that the default values were copied directly from the SwinIR code, which used ImageNet1k rgb mean values. This means that datasets that do not have values close to these (none do, not even DIV2K, the classic dataset used in SISR research) will produce extreme values, making training unstable.
Thanks musl!
You have already made numerous improvements that objectively make these architectures better. I'm not suggesting that you stop doing that; we simply need to find a way to make it possible for spandrel and others to detect these changes.
Some people on EE suggested an interesting solution that would ensure that we'll never have this problem again: put all changes behind new hyperparameters and store those extra hyperparameters in the model. So we make

This would also allow you to add a "compatibility mode" to neosr. While neosr uses the best hyperparameters by default (so e.g.

I suggest this solution because it will make it easier for us both. When you add a new modification, you just need to add a new hyperparameter and add it to a dict. No more tensors for each new parameter (e.g. SPAN

What do you think of doing something like this?

As for code: I would implement a decorator to make this feature easy to use. Like this:

    @ARCH_REGISTRY.register()
    @modifications
    class atd(nn.Module):
        defaults_for_modifications = {"rgb_mean": (0.4488, 0.4371, 0.4040)}

        def __init__(self, embed_dim=210, ..., rgb_mean=(0.5, 0.5, 0.5)):
            # normal init

The Implementing
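To make the decorator idea above more concrete, here is a minimal sketch of how such a `modifications` decorator might work. It is only an illustration under stated assumptions, not the implementation proposed in this thread: the attribute name `modified_hyperparameters` and the stand-in class `ToyArch` are hypothetical, and a plain class is used instead of `nn.Module` so the example runs without PyTorch. The decorator compares the arguments actually used against `defaults_for_modifications` (the official values) and records only the ones that differ.

```python
import functools
import inspect


def modifications(cls):
    """Record hyperparameters that differ from the official defaults.

    Hypothetical sketch: `defaults_for_modifications` maps a hyperparameter
    name to the value used by the official architecture.
    """
    official = cls.defaults_for_modifications
    original_init = cls.__init__
    signature = inspect.signature(original_init)

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        # Resolve positional, keyword, and defaulted arguments by name.
        bound = signature.bind(self, *args, **kwargs)
        bound.apply_defaults()
        # Keep only values that deviate from the official architecture.
        modified = {
            name: bound.arguments[name]
            for name, official_value in official.items()
            if bound.arguments[name] != official_value
        }
        original_init(self, *args, **kwargs)
        # Hypothetical attribute name; set after the normal init has run.
        self.modified_hyperparameters = modified

    cls.__init__ = wrapped_init
    return cls


@modifications
class ToyArch:
    # Stand-in for a real nn.Module such as atd.
    defaults_for_modifications = {"rgb_mean": (0.4488, 0.4371, 0.4040)}

    def __init__(self, embed_dim=210, rgb_mean=(0.5, 0.5, 0.5)):
        self.embed_dim = embed_dim
        self.rgb_mean = rgb_mean


new_style = ToyArch()
official_style = ToyArch(rgb_mean=(0.4488, 0.4371, 0.4040))
print(new_style.modified_hyperparameters)
print(official_style.modified_hyperparameters)
```

In this sketch a model built with the official values ends up with an empty dict, so nothing extra would need to be stored for it. How the recorded dict gets written into the saved model is left open here.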
safetensors.
Why?
OK. There is a __metadata__ field in the file format, but I can understand that it's easier to implement a single solution for both pth and safetensors.
Will the current official implementation still work with a model that has this new tensor?
So long as you turn strict mode off, it would just discard the extra data. The actual changes themselves would break it, but that's gonna happen regardless since, well, the arch was changed.
As Joey said, if the official arch code loads the model with
I also want to point out that only parameters that differ from the values in the official arch will be stored. So e.g. an ATD model with
@RunDevelopment sorry for taking so long to answer.
I made a commit here, please take a look. What would you suggest? I didn't modify the original decorator, feel free to suggest anything. Honestly, as long as the solution is not too convoluted, you can implement this as you see fit. PS: ignore the other changes, check the end of the commit.
Any update on this @RunDevelopment?
Hey musl.
Some people on Discord recently asked whether 417432c was a breaking change. I always thought that it didn't matter, but I was wrong. For some architectures, this changes the results only slightly; for others, the results change drastically.
So please (1) document all of these changes and (2) make them detectable for spandrel.
As for how to make them detectable: I would suggest adding a neosr_version tensor to each model that just contains a single int32. This int is the version number of neosr changes. The idea is that each of those changes you made is essentially a new version of the architecture. This number just tracks the version and allows others to detect it. Of course, this addition is unnecessary for architectures you created or did not modify.
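As a rough sketch of what detection could look like on the loading side: the function name `detect_neosr_version` and the convention that a missing key means version 0 are assumptions made for illustration, and plain dicts stand in for real state dicts so the example runs without PyTorch.

```python
def detect_neosr_version(state_dict):
    """Return the neosr architecture version stored in a checkpoint.

    Assumption: a checkpoint without the key has no neosr changes,
    which this sketch reports as version 0.
    """
    value = state_dict.get("neosr_version")
    if value is None:
        return 0
    # A real checkpoint would hold a one-element int32 tensor; int() also
    # accepts such a tensor, so the same call covers both cases.
    return int(value)


# Plain dicts standing in for loaded state dicts (hypothetical contents).
official_checkpoint = {"conv.weight": [0.1, 0.2]}
modified_checkpoint = {"conv.weight": [0.1, 0.2], "neosr_version": 1}

print(detect_neosr_version(official_checkpoint))
print(detect_neosr_version(modified_checkpoint))
```

A loader could then pick the matching hyperparameters (for example the rgb_mean value) based on the returned number.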