
All architectures with rgb_mean = (0.5, 0.5, 0.5) are incompatible #73

Closed · RunDevelopment opened this issue May 30, 2024 · 9 comments
Labels: documentation (Improvements or additions to documentation), wontfix (This will not be worked on)

RunDevelopment (Contributor) commented May 30, 2024

Hey musl.

Some people on Discord recently asked whether 417432c was a breaking change. I always thought that it didn't matter, but I was wrong. For some architectures, this change affects the results only slightly; for others, the results change drastically.

So please (1) document all of these changes and (2) make them detectable for spandrel.


As for how to make them detectable: I would suggest adding a neosr_version tensor to each model that contains just a single int32. This int is the version number of the neosr changes. The idea is that each of the changes you made is essentially a new version of the architecture; this number tracks that version and allows others to detect it.

Of course, this addition is unnecessary for architectures you created or did not modify.
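
For concreteness, a rough (untested) sketch of what that could look like; NEOSR_ARCH_VERSION and the buffer name are just placeholders:

import torch
from torch import nn

NEOSR_ARCH_VERSION = 1  # bumped on every change that affects compatibility

class atd(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent buffer, so it ends up in the state_dict
        self.register_buffer(
            "neosr_version",
            torch.tensor(NEOSR_ARCH_VERSION, dtype=torch.int32),
        )

# detection side (e.g. spandrel):
state_dict = atd().state_dict()
version = int(state_dict.get("neosr_version", 0))  # 0 = unmodified original arch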

neosr-project (Owner) commented:

I documented it now. Please send example code of how you think such a tensor for detection should be added, and I can add it to the modified networks. Would something like this work for you?

self.register_buffer("neosr", torch.zeros(1))

Just to make it clear: this change to the rgb mean values was needed to improve stability. Several datasets were producing unstable training, and after debugging for several hours I found that the original values were the cause. The reason is that the default values were copied directly from the SwinIR code, which used the ImageNet-1k rgb mean values. This means that any dataset whose values are not close to these (none are, not even DIV2K, the classic dataset used in SISR research) will produce extreme values, making training unstable.
So this was not a 'random change' at all; it was made for a reason. The alternative to this solution would have been to remove normalization altogether, which would completely break all models. Simply changing it to neutral values was the best way I found while keeping compatibility.
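
For context, the normalization in question looks roughly like this (an illustrative sketch of the SwinIR-style pattern, not the actual neosr code):

import torch
from torch import nn

class NormalizedSR(nn.Module):
    # illustrative wrapper around the actual network body
    def __init__(self, body, rgb_mean=(0.4488, 0.4371, 0.4040), img_range=1.0):
        super().__init__()
        self.body = body
        self.img_range = img_range
        self.register_buffer("mean", torch.tensor(rgb_mean).view(1, 3, 1, 1))

    def forward(self, x):
        # inputs whose values are far from rgb_mean get shifted to extreme values here
        x = (x - self.mean) * self.img_range
        x = self.body(x)
        return x / self.img_range + self.mean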

neosr-project added the documentation label May 30, 2024
RunDevelopment (Contributor, Author) commented:

> I documented it now.

Thanks musl!

> Just to make it clear: this change to the rgb mean values was needed to improve stability.

You have already made numerous improvements that objectively make these architectures better. I'm not suggesting that you stop doing that; we simply need to find a way for spandrel and others to detect these changes.

> Please send example code of how you think such a tensor for detection should be added, and I can add it to the modified networks.

Someone on EE suggested an interesting solution that would ensure we never have this problem again: put all changes behind new hyperparameters and store those extra hyperparameters in the model.

So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters relating to modifications you made. We then JSON-serialize that dict and put it inside a uint8 tensor that we store in the model (let's call this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modifications, we don't need to store neosr_params at all.
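
A rough sketch of the encode/decode side (function names are placeholders):

import json
import torch

def encode_neosr_params(params: dict) -> torch.Tensor:
    # JSON-serialize the modified hyperparameters into a uint8 tensor
    raw = json.dumps(params).encode("utf-8")
    return torch.tensor(list(raw), dtype=torch.uint8)

def decode_neosr_params(state_dict: dict) -> dict:
    # returns {} for models without modifications
    tensor = state_dict.get("neosr_params")
    if tensor is None:
        return {}
    return json.loads(bytes(tensor.tolist()).decode("utf-8"))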

This would also allow you to add a "compatibility mode" to neosr. While neosr uses the best hyperparameters by default (so e.g. rgb_mean=(0.5, ...)), compatibility mode would revert to the hyperparameters the original arch uses.

I suggest this solution because it will make things easier for us both. When you add a new modification, you just need to add a new hyperparameter and add it to a dict. No more one-off tensors for each new parameter (e.g. SPAN's no_norm). We can then detect the change and implement the modification. But most importantly, we can detect that a change has been made even if we don't support that change. This will allow spandrel to reject models that it doesn't support.

What do you think of doing something like this?


As for code: I would implement a decorator to make this feature easy to use. Like this:

@ARCH_REGISTRY.register()
@modifications
class atd(nn.Module):
    defaults_for_modifications = {"rgb_mean": (0.4488, 0.4371, 0.4040)}

    def __init__(self, embed_dim=210, ..., rgb_mean=(0.5, 0.5, 0.5)):
        # normal init

The modifications decorator would then change the constructor to store modifications in neosr_params if they differ from the default values defined in defaults_for_modifications. So it wouldn't store anything if you create an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040).

Implementing @modifications also shouldn't be difficult. It's similar to the @store_hyperparameters decorator I made for spandrel, so we can use it as a reference.
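
Something along these lines (a rough, untested sketch):

import inspect
import json

import torch

def modifications(cls):
    """Patch __init__ so hyperparameters that differ from the original
    architecture's defaults are recorded in a neosr_params uint8 buffer."""
    original_init = cls.__init__
    sig = inspect.signature(original_init)

    def wrapped_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()  # resolve omitted args to the neosr defaults
        originals = getattr(cls, "defaults_for_modifications", {})
        changed = {
            name: bound.arguments[name]
            for name, original_default in originals.items()
            if name in bound.arguments
            and bound.arguments[name] != original_default
        }
        if changed:  # nothing is stored when the model matches the original arch
            raw = json.dumps(changed).encode("utf-8")
            self.register_buffer(
                "neosr_params", torch.tensor(list(raw), dtype=torch.uint8)
            )

    cls.__init__ = wrapped_init
    return cls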

adegerard commented May 30, 2024

> So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters relating to modifications you made. We then JSON-serialize that dict and put it inside a uint8 tensor that we store in the model (let's call this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modifications, we don't need to store neosr_params at all.

Why use a tensor to store metadata that you can save directly in the state_dict?
Mixing metadata and tensors is not a good idea.

for example:

import json
import time
from datetime import datetime

# build a JSON-serializable metadata dict and store it as a plain string entry
metadata = {
    'datetime': datetime.strptime(
        time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime()),
        '%Y-%m-%dT%H:%M:%S%z').isoformat(),
    'an_int': 3,
    'a_tuple': (2, 3, 4),
    'a_dict': dict(a=1, b=2, c=3),
}
state_dict['metadata'] = json.dumps(metadata)

RunDevelopment (Contributor, Author) commented:

> Why use a tensor to store metadata that you can save directly in the state_dict?

safetensors.

> Mixing metadata and tensors is not a good idea.

Why?

adegerard commented May 30, 2024

> safetensors

Ok. There is a __metadata__ field in the file format, but I can understand that it's easier to implement a single solution for both pth and safetensors.
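
For reference, that field is exposed through the metadata argument of safetensors (sketch; model is a stand-in):

import json
from safetensors.torch import save_file

# string-only metadata goes into the file's __metadata__ header field
save_file(
    model.state_dict(),
    "model.safetensors",
    metadata={"neosr_params": json.dumps({"rgb_mean": [0.5, 0.5, 0.5]})},
)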

> Why?

Will the current official implementation still work with a model that has this new tensor?

joeyballentine (Contributor) commented May 30, 2024

> Will the current official implementation still work with a model that has this new tensor?

So long as you turn strict mode off, it would just discard the extra data. The actual changes themselves would break it, but that's gonna happen regardless since, well, the arch was changed.
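
Roughly like this (a sketch; official_atd stands in for the unmodified arch code):

model = official_atd()
# strict=False ignores the unexpected neosr_params key instead of raising
result = model.load_state_dict(state_dict, strict=False)
print(result.unexpected_keys)  # ['neosr_params']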

RunDevelopment (Contributor, Author) commented:

> Will the current official implementation still work with a model that has this new tensor?

As Joey said, if the official arch code loads the model with strict=True, then the additional tensor will prevent the model from loading. This is the intended effect, because a model with this tensor is incompatible.

I also want to point out that only parameters that differ from the values in the official arch will be stored. So e.g. an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040) will not have the extra tensor, and the official arch code is therefore guaranteed to load it in strict mode.

neosr-project (Owner) commented Jun 5, 2024

@RunDevelopment sorry for taking so long to answer.

> As for code: I would implement a decorator to make this feature easy to use. Like this:

I made a commit here; please take a look. What would you suggest? I didn't modify the original decorator, so feel free to suggest anything. Honestly, as long as the solution is not too convoluted, you can implement this however you see fit.

PS: ignore the other changes; check the end of the commit.

neosr-project (Owner) commented:

Any update on this @RunDevelopment ?

neosr-project added the wontfix label Jul 13, 2024