
Why does storing module layers in arrays break the learning process? #532

Closed
Metritutus opened this issue Feb 23, 2022 · 10 comments · Fixed by #534

Comments

@Metritutus

This is all very new to me, so I apologise for using any incorrect terminology, and also if I've missed something fundamental that is documented somewhere.

Using the MNIST example as a reference, I was attempting to create my own version of it, but found that it didn't seem to learn and that accuracy was absolutely terrible (i.e. less than 11%, with an average loss greater than 2).

I discovered by experimentation that it seemed to be failing because I had stored my layers in arrays as module properties. For some reason this breaks things, and I am uncertain why.

This is how I had it before:

private class Model1 : Module
{
    private Conv2d[] ConvolutionLayers { get; } =   new[]
                                                    {
                                                        Conv2d(1, 32, 3),
                                                        Conv2d(32, 64, 3)
                                                    };

    private Linear[] FullyConnectedLayers { get; } =    new[]
                                                        {
                                                            Linear(9216, 128),
                                                            Linear(128, 10)
                                                        };

    private MaxPool2d Pool { get; } = MaxPool2d(new long[] { 2, 2 });

    private ReLU[] ReLUs { get; } = new[]
                                    {
                                        ReLU(),
                                        ReLU(),
                                        ReLU()
                                    };

    private Dropout[] Dropouts { get; } =   new[]
                                            {
                                                Dropout(0.25),
                                                Dropout(0.5),
                                            };

    private Flatten Flatten { get; } = Flatten();

    private LogSoftmax LogSoftmax { get; } = LogSoftmax(1);

    private Module Layers { get; }

    public Model1(string name, torch.Device device)
        : base(name)
    {
        RegisterComponents();

        if (device?.type == DeviceType.CUDA)
        {
            to(device);
        }

        Layers = Sequential(ConvolutionLayers[0],
                            ReLUs[0],
                            ConvolutionLayers[1],
                            ReLUs[1],
                            Pool,
                            Dropouts[0],
                            Flatten,
                            FullyConnectedLayers[0],
                            ReLUs[2],
                            Dropouts[1],
                            FullyConnectedLayers[1],
                            LogSoftmax);

        //Layers = Sequential(("ConvolutionLayer0", ConvolutionLayers[0]),
        //                    ("ReLU0", ReLUs[0]),
        //                    ("ConvolutionLayer1", ConvolutionLayers[1]),
        //                    ("ReLU1", ReLUs[1]),
        //                    ("Pool", Pool),
        //                    ("Dropout0", Dropouts[0]),
        //                    ("Flatten", Flatten),
        //                    ("FullyConnectedLayer0", FullyConnectedLayers[0]),
        //                    ("ReLU2", ReLUs[2]),
        //                    ("Dropout1", Dropouts[1]),
        //                    ("FullyConnectedLayer1", FullyConnectedLayers[1]),
        //                    ("LogSoftmax", LogSoftmax));
    }

    public override torch.Tensor forward(torch.Tensor input)
    {
        return Layers.forward(input);
    }
}

And this is what works:

private class Model2 : Module
{
    private Conv2d ConvolutionLayer0 { get; } = Conv2d(1, 32, 3);
    private Conv2d ConvolutionLayer1 { get; } = Conv2d(32, 64, 3);

    private Linear FullyConnectedLayer0 { get; } = Linear(9216, 128);
    private Linear FullyConnectedLayer1 { get; } = Linear(128, 10);

    private MaxPool2d Pool { get; } = MaxPool2d(new long[] { 2, 2 });

    private ReLU ReLU0 { get; } = ReLU();
    private ReLU ReLU1 { get; } = ReLU();
    private ReLU ReLU2 { get; } = ReLU();

    private Dropout Dropout0 { get; } = Dropout(0.25);
    private Dropout Dropout1 { get; } = Dropout(0.5);

    private Flatten Flatten { get; } = Flatten();

    private LogSoftmax LogSoftmax { get; } = LogSoftmax(1);

    private Module Layers { get; }

    public Model2(string name, torch.Device device)
        : base(name)
    {
        RegisterComponents();

        if (device?.type == DeviceType.CUDA)
        {
            to(device);
        }

        Layers = Sequential(ConvolutionLayer0,
                            ReLU0,
                            ConvolutionLayer1,
                            ReLU1,
                            Pool,
                            Dropout0,
                            Flatten,
                            FullyConnectedLayer0,
                            ReLU2,
                            Dropout1,
                            FullyConnectedLayer1,
                            LogSoftmax);
    }

    public override torch.Tensor forward(torch.Tensor input)
    {
        return Layers.forward(input);
    }
}

This second model class results in an accuracy of greater than 98%, with an average loss of roughly 0.04.

I'm assuming there must be something fundamental that I'm missing, but I'm not sure what. When I looked at the documentation on modules, it mentioned: "Note that the field names in the module class correspond to the names that were passed in to the Sequential constructor in the earlier section."

I'm not sure which earlier section it refers to (if it's another Help article, the Memory one only mentions Sequential but does not demonstrate it). That said, even when I specify the names manually (as in the commented-out section of Model1), it still doesn't seem to work correctly.

Metritutus changed the title from "Why does storing model layers in arrays break the learning process?" to "Why does storing module layers in arrays break the learning process?" on Feb 23, 2022
@NiklasGustafsson
Contributor

@Metritutus -- it's because "RegisterComponents" doesn't pick up on arrays, or lists, or any of the built-in collections as somewhere to look for modules. In the first case, the parameters are simply not being trained, so you get pretty much random behavior.

Use ModuleList or ModuleDict, instead of standard .NET collections. That's precisely what they are meant to do.
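
For what it's worth, a minimal sketch of that substitution for the convolution and linear layers might look like this (assuming the ModuleList factory exposed by torch.nn, with the rest of Model1 unchanged):

private ModuleList ConvolutionLayers { get; } = ModuleList(
    Conv2d(1, 32, 3),
    Conv2d(32, 64, 3));

private ModuleList FullyConnectedLayers { get; } = ModuleList(
    Linear(9216, 128),
    Linear(128, 10));

// Unlike Conv2d[] / Linear[], a ModuleList is itself a Module, so
// RegisterComponents() finds it and registers the submodules and their parameters.
// Indexing (ConvolutionLayers[0], etc.) still works when building the Sequential.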

@NiklasGustafsson
Contributor

NiklasGustafsson commented Feb 23, 2022

@Metritutus -- can you try your original code, but call 'RegisterComponents()' and the move to CUDA after the creation of 'Layers'? Sequential doesn't look like it's getting registered, either, so when you call 'parameters()' on the module, it's likely an empty collection.
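
That is, something along these lines (the same Model1 code, with only the order inside the constructor changed):

public Model1(string name, torch.Device device)
    : base(name)
{
    Layers = Sequential(ConvolutionLayers[0],
                        ReLUs[0],
                        ConvolutionLayers[1],
                        ReLUs[1],
                        Pool,
                        Dropouts[0],
                        Flatten,
                        FullyConnectedLayers[0],
                        ReLUs[2],
                        Dropouts[1],
                        FullyConnectedLayers[1],
                        LogSoftmax);

    // Register after 'Layers' exists, so the Sequential (and its parameters)
    // is picked up, then move everything to the GPU.
    RegisterComponents();

    if (device?.type == DeviceType.CUDA)
    {
        to(device);
    }
}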

@NiklasGustafsson
Contributor

Also, and this has nothing to do with the issue you're seeing -- if you're going to stick the layers in a 'Sequential', there's really no reason to declare them as fields unless you want them for debugging purposes, but if that's the reason, Sequential often gets in the way, since you can't single-step the layers and see what is passed between them.
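
As a rough illustration (the class name below is mine, not from this thread), a model where the layers exist only inside the Sequential would look something like this:

private class Model3 : Module
{
    private Module Layers { get; }

    public Model3(string name, torch.Device device)
        : base(name)
    {
        // The layers are constructed inline; the Sequential is the only field,
        // so it is the only thing RegisterComponents() needs to find.
        Layers = Sequential(Conv2d(1, 32, 3),
                            ReLU(),
                            Conv2d(32, 64, 3),
                            ReLU(),
                            MaxPool2d(new long[] { 2, 2 }),
                            Dropout(0.25),
                            Flatten(),
                            Linear(9216, 128),
                            ReLU(),
                            Dropout(0.5),
                            Linear(128, 10),
                            LogSoftmax(1));

        RegisterComponents();

        if (device?.type == DeviceType.CUDA)
        {
            to(device);
        }
    }

    public override torch.Tensor forward(torch.Tensor input)
    {
        return Layers.forward(input);
    }
}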

@NiklasGustafsson
Contributor

@Metritutus -- your repro led me to another error -- double-reporting of parameters when the Sequential and its component modules are both registered.

@Metritutus
Author

> @Metritutus -- it's because "RegisterComponents" doesn't pick up on arrays, or lists, or any of the built-in collections as somewhere to look for modules. In the first case, the parameters are simply not being trained, so you get pretty much random behavior.
>
> Use ModuleList or ModuleDict, instead of standard .NET collections. That's precisely what they are meant to do.

Ah, this explains it!

> Also, and this has nothing to do with the issue you're seeing -- if you're going to stick the layers in a 'Sequential', there's really no reason to declare them as fields unless you want them for debugging purposes, but if that's the reason, Sequential often gets in the way, since you can't single-step the layers and see what is passed between them.

Ah, this was because I was merely trying to organise the layer types. I hadn't even reached the point of thinking about debugging through the individual layers as I don't have a sufficient understanding of how this all works yet. The use of Sequential() meant that I didn't have to faff around with manually passing in the outputs of the different forward() calls.

> @Metritutus -- can you try your original code, but call 'RegisterComponents()' and the move to CUDA after the creation of 'Layers'? Sequential doesn't look like it's getting registered, either, so when you call 'parameters()' on the module, it's likely an empty collection.

This did indeed have a positive impact on the issue (without updating the arrays to be ModuleList or ModuleDict).

> @Metritutus -- your repro led me to another error -- double-reporting of parameters when the Sequential and its component modules are both registered.

Is this implying that if I use Sequential() that I should not call RegisterComponents()?

@NiklasGustafsson
Contributor

NiklasGustafsson commented Feb 23, 2022

> Is this implying that if I use Sequential() that I should not call RegisterComponents()?

No, it's implying that if you choose (and, as I said, there's no real reason to do so, but it's perfectly legal) to include both the layers and the Sequential as fields in the module, then TorchSharp should not be listing parameters twice, as it does now. There's a bug there, which I have a fix for.

Sequential does allow you not to "faff around" with forwarding data. The only reason not to use it, ever, is that you want to debug the model and see what is passed around. Or, you may have a model that is more complex and does not just pass data in a linear fashion from first to last. Anyway, if you do use Sequential, there's no reason to also list the components as fields for the purpose of implementing forward(). There may be other reasons, such as manipulating their weights, or forming optimizer parameter groups from subsets of the modules rather than all of them (coming in a future release).

@NiklasGustafsson
Contributor

> This did indeed have a positive impact on the issue (without updating the arrays to be ModuleList or ModuleDict).

That's because the Sequential module gets registered.

@Metritutus
Author

Metritutus commented Feb 23, 2022

> @Metritutus please change @Metritutus within the quotes to @NiklasGustafsson => not to confuse the next persons reading that. :-)

Surely this would be disingenuous? It is clearly quoting a reply from another person (i.e. you) who directly referenced me! Altering a quoted message so that it no longer matches the original is never wise.

> The only reason not to use it, ever, is that you want to debug the model and see what is passed around.

I confess I couldn't find a good example of it being used with a model class. I was attempting to follow the MNIST example, which did not use it. I first found mention of it in the Memory article, but that held no example. The 'faff' I referred to is how the output of each forward() call is manually passed through to the next layer; use of Sequential() seemed to be a convenient and safe way to avoid this.
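
For comparison, a hand-written forward() against the Model2 fields above would look roughly like this:

public override torch.Tensor forward(torch.Tensor input)
{
    // Each layer's output is handed to the next layer explicitly.
    var x = ConvolutionLayer0.forward(input);
    x = ReLU0.forward(x);
    x = ConvolutionLayer1.forward(x);
    x = ReLU1.forward(x);
    x = Pool.forward(x);
    x = Dropout0.forward(x);
    x = Flatten.forward(x);
    x = FullyConnectedLayer0.forward(x);
    x = ReLU2.forward(x);
    x = Dropout1.forward(x);
    x = FullyConnectedLayer1.forward(x);
    return LogSoftmax.forward(x);
}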

> > This did indeed have a positive impact on the issue (without updating the arrays to be ModuleList or ModuleDict).
>
> That's because the Sequential module gets registered.

Indeed, you implied this earlier!

Anyway, my issue has been resolved as a result of the suggestions above, so I'll close this now. Cheers!

@NiklasGustafsson
Contributor

NiklasGustafsson commented Feb 23, 2022

> Surely this would be disingenuous.

@Metritutus -- I agree, and it wasn't I who suggested it.

@Metritutus
Author

> > Surely this would be disingenuous.
>
> @Metritutus -- I agree, and it wasn't I who suggested it.

Ack, my sincere apologies!
