
All architectures with rgb_mean = (0.5, 0.5, 0.5) are incompatible #73

Closed · RunDevelopment opened this issue May 30, 2024 · 9 comments
Labels: documentation (Improvements or additions to documentation), wontfix (This will not be worked on)

RunDevelopment (Contributor) commented May 30, 2024

Hey musl.

Some people on Discord recently asked whether 417432c was a breaking change. I always thought that it didn't matter, but I was wrong. For some architectures, this change affects the results only slightly; for others, the results change drastically.

So please (1) document all of these changes and (2) make them detectable for spandrel.


As for how to make them detectable: I would suggest adding a neosr_version tensor to each model that contains just a single int32. This int is the version number of the neosr changes. The idea is that each of the changes you made is essentially a new version of the architecture; this number tracks that version and allows others to detect it.

Of course, this addition is unnecessary for architectures you created or did not modify.
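
For concreteness, a rough (untested) sketch of what that could look like; NEOSR_ARCH_VERSION and the buffer name are just placeholders:

import torch
from torch import nn

NEOSR_ARCH_VERSION = 1  # bumped on every change that affects compatibility

class atd(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent buffer, so it ends up in the state_dict
        self.register_buffer(
            "neosr_version",
            torch.tensor(NEOSR_ARCH_VERSION, dtype=torch.int32),
        )

# detection side (e.g. spandrel):
state_dict = atd().state_dict()
version = int(state_dict.get("neosr_version", 0))  # 0 = unmodified original arch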

neosr-project (Owner) commented:

I documented it now. Please send example code of how you think such a tensor for detection should be added, and I can add it to the modified networks. Would something like this work for you?

self.register_buffer("neosr", torch.zeros(1))

Just to make it clear: this change to the rgb mean values was needed to improve stability. Several datasets were producing unstable training, and after debugging for several hours I found that the original values were the cause. The reason is that the default values were copied directly from the SwinIR code, which used the ImageNet-1k rgb mean values. This means that any dataset whose values are not close to these (none are, not even DIV2K, the classic dataset used in SISR research) will produce extreme values, making training unstable.
So this was not a 'random change' at all; it was made for a reason. The alternative to this solution would have been to remove normalization altogether, which would completely break all models. Simply changing it to neutral values was the best way I found while keeping compatibility.
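
For context, the normalization in question looks roughly like this (an illustrative sketch of the SwinIR-style pattern, not the actual neosr code):

import torch
from torch import nn

class NormalizedSR(nn.Module):
    # illustrative wrapper around the actual network body
    def __init__(self, body, rgb_mean=(0.4488, 0.4371, 0.4040), img_range=1.0):
        super().__init__()
        self.body = body
        self.img_range = img_range
        self.register_buffer("mean", torch.tensor(rgb_mean).view(1, 3, 1, 1))

    def forward(self, x):
        # inputs whose values are far from rgb_mean get shifted to extreme values here
        x = (x - self.mean) * self.img_range
        x = self.body(x)
        return x / self.img_range + self.mean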

neosr-project added the documentation label May 30, 2024
RunDevelopment (Contributor, Author) commented:

> I documented it now.

Thanks musl!

> Just to make it clear: this change to the rgb mean values was needed to improve stability.

You have already made numerous improvements that objectively make these architectures better. I'm not suggesting that you stop doing that; we simply need to find a way for spandrel and others to detect these changes.

> Please send example code of how you think such a tensor for detection should be added, and I can add it to the modified networks.

Someone on EE suggested an interesting solution that would ensure we never have this problem again: put all changes behind new hyperparameters and store those extra hyperparameters in the model.

So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters relating to modifications you made. We then JSON-serialize that dict and put it inside a uint8 tensor that we store in the model (let's call this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modifications, we don't need to store neosr_params at all.
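
A rough sketch of the encode/decode side (function names are placeholders):

import json
import torch

def encode_neosr_params(params: dict) -> torch.Tensor:
    # JSON-serialize the modified hyperparameters into a uint8 tensor
    raw = json.dumps(params).encode("utf-8")
    return torch.tensor(list(raw), dtype=torch.uint8)

def decode_neosr_params(state_dict: dict) -> dict:
    # returns {} for models without modifications
    tensor = state_dict.get("neosr_params")
    if tensor is None:
        return {}
    return json.loads(bytes(tensor.tolist()).decode("utf-8"))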

This would also allow you to add a "compatibility mode" to neosr. While neosr uses the best hyperparameters by default (so e.g. rgb_mean=(0.5, ...)), compatibility mode would revert to the hyperparameters the original arch uses.

I suggest this solution because it will make things easier for us both. When you add a new modification, you just need to add a new hyperparameter and add it to a dict. No more one-off tensors for each new parameter (e.g. SPAN's no_norm). We can then detect the change and implement the modification. But most importantly, we can detect that a change has been made even if we don't support that change. This will allow spandrel to reject models that it doesn't support.

What do you think of doing something like this?


As for code: I would implement a decorator to make this feature easy to use. Like this:

@ARCH_REGISTRY.register()
@modifications
class atd(nn.Module):
    defaults_for_modifications = {"rgb_mean": (0.4488, 0.4371, 0.4040)}

    def __init__(self, embed_dim=210, ..., rgb_mean=(0.5, 0.5, 0.5)):
        # normal init

The modifications decorator would then change the constructor to store modifications in neosr_params if they differ from the default values defined in defaults_for_modifications. So it wouldn't store anything if you create an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040).

Implementing @modifications also shouldn't be difficult. It's similar to the @store_hyperparameters decorator I made for spandrel, so we can use it as a reference.
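
Something along these lines (a rough, untested sketch):

import inspect
import json

import torch

def modifications(cls):
    """Patch __init__ so hyperparameters that differ from the original
    architecture's defaults are recorded in a neosr_params uint8 buffer."""
    original_init = cls.__init__
    sig = inspect.signature(original_init)

    def wrapped_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()  # resolve omitted args to the neosr defaults
        originals = getattr(cls, "defaults_for_modifications", {})
        changed = {
            name: bound.arguments[name]
            for name, original_default in originals.items()
            if name in bound.arguments
            and bound.arguments[name] != original_default
        }
        if changed:  # nothing is stored when the model matches the original arch
            raw = json.dumps(changed).encode("utf-8")
            self.register_buffer(
                "neosr_params", torch.tensor(list(raw), dtype=torch.uint8)
            )

    cls.__init__ = wrapped_init
    return cls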

adegerard commented May 30, 2024

> So we make rgb_mean a hyperparameter and store it inside a dict that contains all hyperparameters relating to modifications you made. We then JSON-serialize that dict and put it inside a uint8 tensor that we store in the model (let's call this tensor neosr_params for now). Spandrel and others can then look for this tensor and apply the same changes as neosr. Of course, if neosr doesn't make any modifications, we don't need to store neosr_params at all.

Why use a tensor to store metadata that you can save directly in the state_dict?
Mixing metadata and tensors is not a good idea.

for example:

import json
import time
from datetime import datetime

# build a JSON-serializable metadata dict and store it as a plain string entry
metadata = {
    'datetime': datetime.strptime(
        time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime()),
        '%Y-%m-%dT%H:%M:%S%z').isoformat(),
    'an_int': 3,
    'a_tuple': (2, 3, 4),
    'a_dict': dict(a=1, b=2, c=3),
}
state_dict['metadata'] = json.dumps(metadata)

RunDevelopment (Contributor, Author) commented:

> Why use a tensor to store metadata that you can save directly in the state_dict?

safetensors.

> Mixing metadata and tensors is not a good idea.

Why?

adegerard commented May 30, 2024

> safetensors

Ok. There is a __metadata__ field in the file format, but I can understand that it's easier to implement a single solution for both pth and safetensors.
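
For reference, that field is exposed through the metadata argument of safetensors (sketch; model is a stand-in):

import json
from safetensors.torch import save_file

# string-only metadata goes into the file's __metadata__ header field
save_file(
    model.state_dict(),
    "model.safetensors",
    metadata={"neosr_params": json.dumps({"rgb_mean": [0.5, 0.5, 0.5]})},
)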

> Why?

Will the current official implementation still work with a model that has this new tensor?

joeyballentine (Contributor) commented May 30, 2024

> Will the current official implementation still work with a model that has this new tensor?

So long as you turn strict mode off, it would just discard the extra data. The actual changes themselves would break it, but that's gonna happen regardless since, well, the arch was changed.
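
Roughly like this (a sketch; official_atd stands in for the unmodified arch code):

model = official_atd()
# strict=False ignores the unexpected neosr_params key instead of raising
result = model.load_state_dict(state_dict, strict=False)
print(result.unexpected_keys)  # ['neosr_params']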

RunDevelopment (Contributor, Author) commented:

> Will the current official implementation still work with a model that has this new tensor?

As Joey said, if the official arch code loads the model with strict=True, then the additional tensor will prevent the model from loading. This is the intended effect, because a model with this tensor is incompatible.

I also want to point out that only parameters that differ from the values in the official arch will be stored. So e.g. an ATD model with rgb_mean=(0.4488, 0.4371, 0.4040) will not have the extra tensor, and the official arch code is therefore guaranteed to load it in strict mode.

neosr-project (Owner) commented Jun 5, 2024

@RunDevelopment sorry for taking so long to answer.

> As for code: I would implement a decorator to make this feature easy to use. Like this:

I made a commit here; please take a look. What would you suggest? I didn't modify the original decorator, so feel free to suggest anything. Honestly, as long as the solution is not too convoluted, you can implement this however you see fit.

PS: ignore the other changes; check the end of the commit.

neosr-project (Owner) commented:

Any update on this @RunDevelopment ?

neosr-project added the wontfix label Jul 13, 2024