
Why does storing module layers in arrays break the learning process? #532

Closed
Metritutus opened this issue Feb 23, 2022 · 10 comments · Fixed by #534

Comments

@Metritutus

This is all very new to me, so I apologise for using any incorrect terminology, and also if I've missed something fundamental that is documented somewhere.

Using the MNIST example as a reference, I was attempting to create my own version of it, but found that it didn't seem to learn and that accuracy was absolutely terrible (i.e. less than 11%, with an average loss greater than 2).

I discovered by experimentation that it seemed to be failing because I had stored my layers in arrays as module properties. For some reason this breaks things, and I am uncertain why.

This is how I had it before:

private class Model1 : Module
{
    private Conv2d[] ConvolutionLayers { get; } =   new[]
                                                    {
                                                        Conv2d(1, 32, 3),
                                                        Conv2d(32, 64, 3)
                                                    };

    private Linear[] FullyConnectedLayers { get; } =    new[]
                                                        {
                                                            Linear(9216, 128),
                                                            Linear(128, 10)
                                                        };

    private MaxPool2d Pool { get; } = MaxPool2d(new long[] { 2, 2 });

    private ReLU[] ReLUs { get; } = new[]
                                    {
                                        ReLU(),
                                        ReLU(),
                                        ReLU()
                                    };

    private Dropout[] Dropouts { get; } =   new[]
                                            {
                                                Dropout(0.25),
                                                Dropout(0.5),
                                            };

    private Flatten Flatten { get; } = Flatten();

    private LogSoftmax LogSoftmax { get; } = LogSoftmax(1);

    private Module Layers { get; }

    public Model1(string name, torch.Device device)
        : base(name)
    {
        RegisterComponents();

        if (device?.type == DeviceType.CUDA)
        {
            to(device);
        }

        Layers = Sequential(ConvolutionLayers[0],
                            ReLUs[0],
                            ConvolutionLayers[1],
                            ReLUs[1],
                            Pool,
                            Dropouts[0],
                            Flatten,
                            FullyConnectedLayers[0],
                            ReLUs[2],
                            Dropouts[1],
                            FullyConnectedLayers[1],
                            LogSoftmax);

        //Layers = Sequential(("ConvolutionLayer0", ConvolutionLayers[0]),
        //                    ("ReLU0", ReLUs[0]),
        //                    ("ConvolutionLayer1", ConvolutionLayers[1]),
        //                    ("ReLU1", ReLUs[1]),
        //                    ("Pool", Pool),
        //                    ("Dropout0", Dropouts[0]),
        //                    ("Flatten", Flatten),
        //                    ("FullyConnectedLayer0", FullyConnectedLayers[0]),
        //                    ("ReLU2", ReLUs[2]),
        //                    ("Dropout1", Dropouts[1]),
        //                    ("FullyConnectedLayer1", FullyConnectedLayers[1]),
        //                    ("LogSoftmax", LogSoftmax));
    }

    public override torch.Tensor forward(torch.Tensor input)
    {
        return Layers.forward(input);
    }
}

And this is what works:

private class Model2 : Module
{
    private Conv2d ConvolutionLayer0 { get; } = Conv2d(1, 32, 3);
    private Conv2d ConvolutionLayer1 { get; } = Conv2d(32, 64, 3);

    private Linear FullyConnectedLayer0 { get; } = Linear(9216, 128);
    private Linear FullyConnectedLayer1 { get; } = Linear(128, 10);

    private MaxPool2d Pool { get; } = MaxPool2d(new long[] { 2, 2 });

    private ReLU ReLU0 { get; } = ReLU();
    private ReLU ReLU1 { get; } = ReLU();
    private ReLU ReLU2 { get; } = ReLU();

    private Dropout Dropout0 { get; } = Dropout(0.25);
    private Dropout Dropout1 { get; } = Dropout(0.5);

    private Flatten Flatten { get; } = Flatten();

    private LogSoftmax LogSoftmax { get; } = LogSoftmax(1);

    private Module Layers { get; }

    public Model2(string name, torch.Device device)
        : base(name)
    {
        RegisterComponents();

        if (device?.type == DeviceType.CUDA)
        {
            to(device);
        }

        Layers = Sequential(ConvolutionLayer0,
                            ReLU0,
                            ConvolutionLayer1,
                            ReLU1,
                            Pool,
                            Dropout0,
                            Flatten,
                            FullyConnectedLayer0,
                            ReLU2,
                            Dropout1,
                            FullyConnectedLayer1,
                            LogSoftmax);
    }

    public override torch.Tensor forward(torch.Tensor input)
    {
        return Layers.forward(input);
    }
}

This second model class results in an accuracy of greater than 98%, with an average loss of roughly 0.04.

I'm assuming there must be something fundamental that I'm missing, but I'm not sure what. When I looked at the documentation on modules, it mentioned: "Note that the field names in the module class correspond to the names that were passed in to the Sequential constructor in the earlier section."

I'm not sure which earlier section it refers to (if it's another Help article, the Memory one only mentions Sequential but does not demonstrate it). That said, even when I specify the names manually (as in the commented-out section of Model1), it still doesn't seem to work correctly.

Metritutus changed the title from "Why does storing model layers in arrays break the learning process?" to "Why does storing module layers in arrays break the learning process?" on Feb 23, 2022
@NiklasGustafsson
Contributor

@Metritutus -- it's because "RegisterComponents" doesn't pick up on arrays, or lists, or any of the built-in collections as somewhere to look for modules. In the first case, the parameters are simply not being trained, so you get pretty much random behavior.

Use ModuleList or ModuleDict, instead of standard .NET collections. That's precisely what they are meant to do.
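
For what it's worth, a minimal sketch of that substitution for the convolution and linear layers might look like this (assuming the ModuleList factory exposed by torch.nn, with the rest of Model1 unchanged):

private ModuleList ConvolutionLayers { get; } = ModuleList(
    Conv2d(1, 32, 3),
    Conv2d(32, 64, 3));

private ModuleList FullyConnectedLayers { get; } = ModuleList(
    Linear(9216, 128),
    Linear(128, 10));

// Unlike Conv2d[] / Linear[], a ModuleList is itself a Module, so
// RegisterComponents() finds it and registers the submodules and their parameters.
// Indexing (ConvolutionLayers[0], etc.) still works when building the Sequential.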

@NiklasGustafsson
Contributor

NiklasGustafsson commented Feb 23, 2022

@Metritutus -- can you try your original code, but call 'RegisterComponents()' and the move to CUDA after the creation of 'Layers'? Sequential doesn't look like it's getting registered, either, so when you call 'parameters()' on the module, it's likely an empty collection.
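
That is, something along these lines (the same Model1 code, with only the order inside the constructor changed):

public Model1(string name, torch.Device device)
    : base(name)
{
    Layers = Sequential(ConvolutionLayers[0],
                        ReLUs[0],
                        ConvolutionLayers[1],
                        ReLUs[1],
                        Pool,
                        Dropouts[0],
                        Flatten,
                        FullyConnectedLayers[0],
                        ReLUs[2],
                        Dropouts[1],
                        FullyConnectedLayers[1],
                        LogSoftmax);

    // Register after 'Layers' exists, so the Sequential (and its parameters)
    // is picked up, then move everything to the GPU.
    RegisterComponents();

    if (device?.type == DeviceType.CUDA)
    {
        to(device);
    }
}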

@NiklasGustafsson
Contributor

Also, and this has nothing to do with the issue you're seeing -- if you're going to stick the layers in a 'Sequential', there's really no reason to declare them as fields unless you want them for debugging purposes, but if that's the reason, Sequential often gets in the way, since you can't single-step the layers and see what is passed between them.
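
As a rough illustration (the class name below is mine, not from this thread), a model where the layers exist only inside the Sequential would look something like this:

private class Model3 : Module
{
    private Module Layers { get; }

    public Model3(string name, torch.Device device)
        : base(name)
    {
        // The layers are constructed inline; the Sequential is the only field,
        // so it is the only thing RegisterComponents() needs to find.
        Layers = Sequential(Conv2d(1, 32, 3),
                            ReLU(),
                            Conv2d(32, 64, 3),
                            ReLU(),
                            MaxPool2d(new long[] { 2, 2 }),
                            Dropout(0.25),
                            Flatten(),
                            Linear(9216, 128),
                            ReLU(),
                            Dropout(0.5),
                            Linear(128, 10),
                            LogSoftmax(1));

        RegisterComponents();

        if (device?.type == DeviceType.CUDA)
        {
            to(device);
        }
    }

    public override torch.Tensor forward(torch.Tensor input)
    {
        return Layers.forward(input);
    }
}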

@NiklasGustafsson
Contributor

@Metritutus -- your repro led me to another error -- double-reporting of parameters when the Sequential and its component modules are both registered.

@Metritutus
Author

> @Metritutus -- it's because "RegisterComponents" doesn't pick up on arrays, or lists, or any of the built-in collections as somewhere to look for modules. In the first case, the parameters are simply not being trained, so you get pretty much random behavior.
>
> Use ModuleList or ModuleDict, instead of standard .NET collections. That's precisely what they are meant to do.

Ah, this explains it!

> Also, and this has nothing to do with the issue you're seeing -- if you're going to stick the layers in a 'Sequential', there's really no reason to declare them as fields unless you want them for debugging purposes, but if that's the reason, Sequential often gets in the way, since you can't single-step the layers and see what is passed between them.

Ah, this was because I was merely trying to organise the layer types. I hadn't even reached the point of thinking about debugging through the individual layers as I don't have a sufficient understanding of how this all works yet. The use of Sequential() meant that I didn't have to faff around with manually passing in the outputs of the different forward() calls.

> @Metritutus -- can you try your original code, but call 'RegisterComponents()' and the move to CUDA after the creation of 'Layers'? Sequential doesn't look like it's getting registered, either, so when you call 'parameters()' on the module, it's likely an empty collection.

This did indeed have a positive impact on the issue (without updating the arrays to be ModuleList or ModuleDict).

> @Metritutus -- your repro led me to another error -- double-reporting of parameters when the Sequential and its component modules are both registered.

Is this implying that if I use Sequential() that I should not call RegisterComponents()?

@NiklasGustafsson
Contributor

NiklasGustafsson commented Feb 23, 2022

> Is this implying that if I use Sequential() that I should not call RegisterComponents()?

No, it's implying that if you choose (and, as I said, there's no real reason to do so, but it's perfectly legal) to include both the layers and the Sequential as fields in the module, then TorchSharp should not be listing parameters twice, as it does now. There's a bug there, which I have a fix for.

Sequential does allow you not to "faff around" with forwarding data. The only reason not to use it, ever, is that you want to debug the model and see what is passed around. Or, you may have a model that is more complex and does not just pass data in a linear fashion from first to last. Anyway, if you do use Sequential, there's no reason to also list the components as fields for the purpose of implementing forward(). There may be other reasons, such as manipulating their weights, or forming optimizer parameter groups from subsets of the modules rather than all of them (coming in a future release).

@NiklasGustafsson
Contributor

> This did indeed have a positive impact on the issue (without updating the arrays to be ModuleList or ModuleDict).

That's because the Sequential module gets registered.

@Metritutus
Author

Metritutus commented Feb 23, 2022

> @Metritutus please change @Metritutus within the quotes to @NiklasGustafsson => not to confuse the next persons reading that. :-)

Surely this would be disingenuous? It is clearly quoting a reply from another person (i.e. you) who directly referenced me! Altering a quoted message so that it no longer matches the original is never wise.

> The only reason not to use it, ever, is that you want to debug the model and see what is passed around.

I confess I couldn't find a good example of it being used with a model class. I was attempting to follow the MNIST example, which did not use it. I first found mention of it in the Memory article, but that held no example. The 'faff' I referred to is how the output of each forward() call is manually passed through to the next layer; use of Sequential() seemed to be a convenient and safe way to avoid this.
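
For comparison, a hand-written forward() against the Model2 fields above would look roughly like this:

public override torch.Tensor forward(torch.Tensor input)
{
    // Each layer's output is handed to the next layer explicitly.
    var x = ConvolutionLayer0.forward(input);
    x = ReLU0.forward(x);
    x = ConvolutionLayer1.forward(x);
    x = ReLU1.forward(x);
    x = Pool.forward(x);
    x = Dropout0.forward(x);
    x = Flatten.forward(x);
    x = FullyConnectedLayer0.forward(x);
    x = ReLU2.forward(x);
    x = Dropout1.forward(x);
    x = FullyConnectedLayer1.forward(x);
    return LogSoftmax.forward(x);
}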

> > This did indeed have a positive impact on the issue (without updating the arrays to be ModuleList or ModuleDict).
>
> That's because the Sequential module gets registered.

Indeed, you implied this earlier!

Anyway, my issue has been resolved as a result of the suggestions above, so I'll close this now. Cheers!

@NiklasGustafsson
Contributor

NiklasGustafsson commented Feb 23, 2022

> Surely this would be disingenuous.

@Metritutus -- I agree, and it wasn't I who suggested it.

@Metritutus
Author

> > Surely this would be disingenuous.
>
> @Metritutus -- I agree, and it wasn't I who suggested it.

Ack, my sincere apologies!
