
Training metrics #100

Closed
rcmalli opened this issue Aug 11, 2019 · 7 comments
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments

@rcmalli

rcmalli commented Aug 11, 2019

Should we have training accuracy calculation automated?

Currently I am handling like this

import torch
import pytorch_lightning as ptl

class Model(ptl.LightningModule):

    def __init__(self):
        super(Model, self).__init__()
        self.training_correct_counter = 0

    def training_step(self, batch, batch_nb):
        # ... forward pass producing predictions y_hat and targets y ...
        if batch_nb == 0:
            self.training_correct_counter = (torch.max(y_hat, 1)[1].view(y.size()) == y).sum()
        else:
            self.training_correct_counter += (torch.max(y_hat, 1)[1].view(y.size()) == y).sum()
        return {'loss': self.my_loss(y_hat, y)}

    def validation_end(self, outputs):
        # ...
        train_avg_acc = 100 * self.training_correct_counter / len(self.tng_dataloader.dataset)
        return {'Training/_accuracy': train_avg_acc}
@rcmalli rcmalli added the feature (Is an improvement or enhancement) and help wanted (Open to be worked on) labels on Aug 11, 2019
@williamFalcon
Contributor

Just calculate accuracy in training_step. You can do whatever in there; it's not just for the loss.
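
The per-batch computation suggested here can be sketched as a small helper (the name `batch_accuracy` is hypothetical, not part of Lightning's API):

```python
import torch

def batch_accuracy(y_hat, y):
    # Fraction of predictions in this batch that match the targets.
    preds = torch.argmax(y_hat, dim=1)
    return (preds == y).float().mean().item()
```

Inside `training_step` one could call this on the batch's `y_hat` and `y` and include the result in the returned dict for logging; note this yields a per-batch value only, which is the limitation discussed below.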

@minhptx

minhptx commented Oct 25, 2019

I think the problem here is that if metrics are calculated in training_step, they are only calculated for one batch. I need to tweak the code as @rcmalli did to aggregate over the whole epoch.

Can we have a function called training_end where we can calculate metrics for the whole epoch? (Something similar to validation_end, but for training.)
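
The epoch-level aggregation being asked for can be done by hand with a small accumulator kept on the module; this is a sketch (the class name `EpochAccuracy` is made up for illustration):

```python
import torch

class EpochAccuracy:
    # Accumulate correct/total counts batch by batch, then compute the
    # epoch accuracy once at the end of the epoch.
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_hat, y):
        preds = torch.argmax(y_hat, dim=1)
        self.correct += (preds == y).sum().item()
        self.total += y.numel()

    def compute(self):
        acc = self.correct / self.total
        self.correct = self.total = 0  # reset for the next epoch
        return acc
```

One would call `update` from `training_step` and `compute` wherever the end-of-epoch hook lands; this replaces the manual `batch_nb == 0` reset logic in the snippet above.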

@expectopatronum
Contributor

@minhptx Did you implement this? I also want to collect my training metrics after each epoch but as far as I understood the new method training_end just collects the output for the whole batch and not all batches in an epoch.

@Jonathan-LeRoux

I'm also interested in such a feature. It took me a little while to understand that training_end and validation_end do not have the same behavior, which is misleading. It might be clearer to have training_end be whatever happens at the end of an epoch, and to rename the current training_end to training_step_end.

@captainvera

captainvera commented Mar 3, 2020

@Jonathan-LeRoux I'm in the same boat. It is super misleading that validation_end and training_end have different behaviour. It took me a while to understand what was going on.

Continuing this discussion @williamFalcon, I think this thread's name is misleading. There's absolutely no reason for lightning to automatically calculate accuracy. On the other hand, it would be super useful if lightning could keep the list of outputs of training_step just like it does for validation_step with validation_end.

Correct me if I'm wrong, but the only way to calculate these metrics is for me to save a state of (y_hat, target) throughout the entire epoch and calculate metrics at certain points. My point is, if I am not supposed to keep state to track validation metrics why would we break that philosophy with the training metrics?

edit:
There are metrics we can calculate per batch, such as accuracy, and just keep a running average; for those we could use external loggers. On the other hand, metrics like F1 need to be calculated over the entirety of the dataset, so pushing values to the loggers at each training step seems useless for this purpose (of course, we could keep running averages of precision etc., but you get the point).
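
To make the F1 point concrete, here is a sketch of computing binary F1 once over predictions and targets collected across all batches of an epoch (the function name and the decision to hand-roll F1 instead of using a library are assumptions for illustration):

```python
import torch

def f1_from_epoch(preds_per_batch, targets_per_batch):
    # Binary F1 computed once over the whole epoch's predictions,
    # not averaged over per-batch F1 scores.
    preds = torch.cat(preds_per_batch)
    targets = torch.cat(targets_per_batch)
    tp = ((preds == 1) & (targets == 1)).sum().item()
    fp = ((preds == 1) & (targets == 0)).sum().item()
    fn = ((preds == 0) & (targets == 1)).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Averaging per-batch F1 values generally gives a different (biased) number, which is why the whole-epoch state is needed.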

@Borda
Member

Borda commented Mar 4, 2020

@captainvera have you checked the recent changes in #776 #889 #950?
Anyway, a PR with suggestions is welcome 🤖

@failable

@captainvera May I ask how you compute metrics like F1 in the current version? I tried to do it in validation_epoch_end, but it seemed that to access the data loader via val_dataloader I would need to handle things like moving tensors to the correct devices manually...
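
One way to avoid touching val_dataloader at all is to aggregate the outputs that validation_step already returns, since Lightning collects them into the list passed to validation_epoch_end. A sketch, assuming each step returns a dict with (hypothetical) keys 'preds' and 'target':

```python
import torch

def collect_epoch_outputs(outputs):
    # 'outputs' is a list with one dict per validation_step call.
    # Concatenate the per-batch tensors into epoch-level tensors;
    # the tensors are already on whatever device the steps produced.
    preds = torch.cat([o['preds'] for o in outputs])
    targets = torch.cat([o['target'] for o in outputs])
    return preds, targets
```

With the epoch-level `preds` and `targets` in hand, a whole-dataset metric like F1 can be computed in one call, with no manual device handling.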

8 participants