
05. Model for Stain4 #8

EchteRobert opened this issue Apr 20, 2022 · 4 comments

EchteRobert commented Apr 20, 2022

To test the generalization of the model trained on Stain3 (and tested on Stain2), I will now evaluate it on Stain4. Based on the results, further advancements will be made by training on plates of Stain4 (and then evaluating on Stain2 and Stain3 in turn).

Stain4 consists of 30 plates, divided into 5 batches, each with different staining conditions:

  • Baseline staining conditions used in Stain3 (Stain2)
  • Baseline staining conditions used in Stain3 with 2-fold dilution of all dyes (Stain2_2)
  • Baseline staining conditions used in Stain3 with 2-fold dilution of Phalloidin (Stain2_Phalloidin_2)
  • Baseline staining conditions used in Stain3 with 2-fold dilution of Phalloidin and ConA (Stain2_Phalloidin_ConA_2)
  • Bray et al. staining conditions (Bray)

In addition to these staining conditions, standard vs. high exposure and Binning 1 vs. Binning 2 comparisons were also made.

To analyze the relations between the different plates in Stain4, I calculated the correlation between the PC1 loadings of the mean-aggregated profiles of every plate. I only included the plates that were similar enough to form one large cluster.
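For reference, a minimal sketch of this analysis, assuming per-plate mean-aggregated profiles live in pandas DataFrames (the function and variable names are illustrative, not the actual pipeline code):

```python
import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA

def plate_pc1_correlation(profiles: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """profiles maps plate name -> (wells x features) mean-aggregated profiles."""
    pc1 = {}
    for plate, df in profiles.items():
        pca = PCA(n_components=1).fit(df.values)
        pc1[plate] = pca.components_[0]   # loading of each feature on PC1
    loadings = pd.DataFrame(pc1)          # features x plates
    return loadings.corr()                # plate x plate correlation matrix

# corr = plate_pc1_correlation(profiles)
# sns.clustermap(corr, vmin=-1, vmax=1)   # clustered heatmap as in the figure below
```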

Click here for clusters! [Image: PlateClustermap]

Click here for cells per well per plate! Only the plates I have downloaded so far are included, but they should still give a good indication of this dataset.
[Image: NRCELLS_Stain4]


EchteRobert commented Apr 20, 2022

Benchmark Stain4

| plate | Training mAP BM | Validation mAP BM | PR BM |
| --- | --- | --- | --- |
| BR00116630 | 0.31 | 0.31 | 53.30 |
| BR00116625 | 0.31 | 0.29 | 58.90 |
| BR00116631 | 0.30 | 0.28 | 57.80 |
| BR00116627 | 0.30 | 0.29 | 56.70 |
| BR00116630highexp | 0.29 | 0.30 | 58.90 |
| BR00116629highexp | 0.29 | 0.29 | 52.20 |
| BR00116627highexp | 0.31 | 0.27 | 56.70 |
| BR00116628highexp | 0.32 | 0.31 | 57.80 |
| BR00116625highexp | 0.32 | 0.28 | 61.10 |
| BR00116631highexp | 0.28 | 0.30 | 53.30 |
| BR00116628 | 0.32 | 0.29 | 58.90 |
| BR00116629 | 0.30 | 0.29 | 52.20 |
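As a reference for how these numbers are defined, here is a minimal sketch of a replicate-retrieval mAP; the actual evaluation code is not shown in this thread, so the details (cosine similarity, same-compound profiles as positives) are assumptions:

```python
import numpy as np

def replicate_map(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean Average Precision for replicate retrieval: each well profile
    queries all other profiles; same-compound profiles count as positives."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T                 # cosine similarity matrix

    aps = []
    n = len(labels)
    for i in range(n):
        mask = np.arange(n) != i            # exclude the query itself
        order = np.argsort(-sim[i, mask])   # candidates ranked by similarity
        hits = (labels[mask][order] == labels[i]).astype(float)
        if hits.sum() == 0:                 # query has no replicates
            continue
        precision_at_k = np.cumsum(hits) / np.arange(1, n)
        aps.append(float((precision_at_k * hits).sum() / hits.sum()))
    return float(np.mean(aps))
```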


EchteRobert commented Apr 21, 2022

First model trained on Stain4

Using the same setup as in Stain3 (#6 (comment)), I trained on plates BR00116625highexp, BR00116628highexp_FS, and BR00116629highexp. I only have 2 validation plates for now, but will have more next week.

Main takeaways

  • I am getting similar results to the best model I was able to train on Stain3: the model discerns training compounds just fine, but falls short on validation compounds.

One possible explanation, which I think may also apply to the Stain3 model, is that the training plates are too similar to one another. The Stain2 model was trained on slightly more dissimilar plates and generalized well to everything in Stain2. However, we have also seen that the model does not generalize to plates that are too different. Although adjusting this might help, I don't think managing this trade-off (i.e. trying different compositions of training plates) is something I should be looking into, because ideally the choice of training plates should play a smaller role in generalization.

Next up

Possible solutions include:

  • Using a rank-based loss function, which would penalize the model more directly for scoring poorly on mAP
  • Using hard positive/negative mining during contrastive training (although this should already implicitly happen with SupConLoss?) --> this is indeed already happening, as the cosine similarity normalizes the representations; see the sketch after this list
  • Increasing the batch size even further (?) --> this did not work, see 04. Model for Stain3 #6 (comment)
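To make the mining point concrete, here is a sketch of SupConLoss combined with an explicit hard miner using pytorch-metric-learning (assuming that library is what's in use here; whether the actual training loop needs a separate miner is exactly the open question above). SupConLoss L2-normalizes embeddings before taking dot products, so hard pairs already dominate the gradient:

```python
import torch
from pytorch_metric_learning import losses, miners

loss_func = losses.SupConLoss(temperature=0.1)
miner = miners.BatchHardMiner()  # optional explicit hard positive/negative mining

embeddings = torch.randn(32, 128)     # batch of profile embeddings (illustrative)
labels = torch.randint(0, 8, (32,))   # compound labels; replicates share a label
hard_pairs = miner(embeddings, labels)
loss = loss_func(embeddings, labels, hard_pairs)
```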
TableTime!
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| **Training plates** | | | | | | |
| BR00116625highexp | 0.67 | 0.32 | 0.33 | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | 0.7 | 0.32 | 0.29 | 0.31 | 97.8 | 57.8 |
| BR00116629highexp | 0.65 | 0.29 | 0.35 | 0.29 | 98.9 | 52.2 |
| **Validation plates** | | | | | | |
| BR00116630highexp | 0.46 | 0.29 | 0.29 | 0.3 | 94.4 | 58.9 |
| BR00116631highexp | 0.39 | 0.28 | 0.23 | 0.3 | 86.7 | 53.3 |


EchteRobert commented Apr 26, 2022

Second model trained on Stain4

With slightly updated parameters according to #6 (comment), I now train on the same plates as before, but evaluate on all plates in the cluster.

Main takeaways

  • The model is able to generalize to most plates, although there are still 3 for which it does not beat the baseline validation mAP.
  • Although the 3 outlier plates are closely linked, they do not especially stand out when compared to the training plates in the PC1-loadings correlation plot. One possible improvement would be to find a method that more accurately describes which plates are close to the training plates and which are not.
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| **Training plates** | | | | | | |
| BR00116625highexp | 0.75 | 0.32 | 0.36 | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | 0.76 | 0.32 | 0.34 | 0.31 | 96.7 | 57.8 |
| BR00116629highexp | 0.75 | 0.29 | 0.32 | 0.29 | 98.9 | 52.2 |
| **Validation plates** | | | | | | |
| BR00116625 | 0.55 | 0.31 | 0.31 | 0.29 | 98.9 | 58.9 |
| BR00116630highexp | 0.47 | 0.29 | 0.27 | 0.3 | 91.1 | 58.9 |
| BR00116631highexp | 0.42 | 0.28 | 0.22 | 0.3 | 88.9 | 53.3 |
| BR00116631 | 0.42 | 0.3 | 0.21 | 0.28 | 94.4 | 57.8 |
| BR00116627highexp | 0.5 | 0.31 | 0.36 | 0.27 | 92.2 | 56.7 |
| BR00116627 | 0.48 | 0.3 | 0.32 | 0.29 | 92.2 | 56.7 |
| BR00116629 | 0.55 | 0.3 | 0.3 | 0.29 | 97.8 | 52.2 |
| BR00116628 | 0.56 | 0.32 | 0.29 | 0.29 | 97.8 | 58.9 |


EchteRobert commented Apr 27, 2022

Using a rank based loss function

As described in Deep Metric Learning to Rank, I use the FastAP loss function, which optimizes the rank-based Average Precision measure using an approximation derived from distance quantization.

Hypothesis: by directly optimizing the mean Average Precision, rather than optimizing Percent Replicating via the Supervised Contrastive loss function, the model should generalize better to the ranking task (mAP).
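A minimal sketch of swapping in the FastAP loss, using the pytorch-metric-learning implementation (the batch shapes and num_bins value here are illustrative assumptions, not the actual training configuration):

```python
import torch
from pytorch_metric_learning import losses

# FastAP makes Average Precision differentiable by soft-binning the pairwise
# distance histogram instead of sorting the retrieval list.
loss_func = losses.FastAPLoss(num_bins=10)

embeddings = torch.randn(32, 128, requires_grad=True)  # batch of profile embeddings
labels = torch.randint(0, 8, (32,))   # compound labels; replicates share a label
loss = loss_func(embeddings, labels)
loss.backward()
```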

Main takeaways

  • By optimizing for mean Average Precision we lose some performance on the Percent Replicating score, as expected (a sketch of how Percent Replicating is typically computed follows this list). The distributions of the Percent Replicating histograms look different as well.
  • I observe no significant increase in performance when training the model with the rank-based loss function.
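For context, Percent Replicating is conventionally the fraction of compounds whose median replicate correlation exceeds the 95th percentile of a null distribution built from non-replicate pairs. The sketch below follows that common Cell Painting convention; the exact null construction used in this project may differ:

```python
import numpy as np

def percent_replicating(profiles: np.ndarray, compounds: np.ndarray,
                        percentile: float = 95.0) -> float:
    """% of compounds whose median replicate correlation beats the null."""
    corr = np.corrcoef(profiles)          # pairwise Pearson correlations
    same = compounds[:, None] == compounds[None, :]
    null = corr[~same]                    # correlations between non-replicates
    threshold = np.percentile(null, percentile)

    medians = []
    for c in np.unique(compounds):
        idx = compounds == c
        if idx.sum() < 2:                 # need at least 2 replicates
            continue
        block = corr[np.ix_(idx, idx)]    # within-compound correlations
        medians.append(np.median(block[~np.eye(idx.sum(), dtype=bool)]))
    return 100.0 * float(np.mean(np.array(medians) > threshold))
```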

Results

Table Results
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| **Training plates** | | | | | | |
| BR00116625highexp | 0.66 | 0.32 | 0.36 | 0.28 | 95.6 | 61.1 |
| BR00116628highexp | 0.67 | 0.32 | 0.33 | 0.31 | 92.2 | 57.8 |
| BR00116629highexp | 0.68 | 0.29 | 0.3 | 0.29 | 86.7 | 52.2 |
| **Validation plates** | | | | | | |
| BR00116631highexp | 0.39 | 0.28 | 0.26 | 0.3 | 71.1 | 53.3 |
| BR00116625 | 0.51 | 0.31 | 0.3 | 0.29 | 86.7 | 58.9 |
| BR00116630highexp | 0.44 | 0.29 | 0.29 | 0.3 | 76.7 | 58.9 |
| BR00116631 | 0.41 | 0.3 | 0.25 | 0.28 | 77.8 | 57.8 |
| BR00116627highexp | 0.48 | 0.31 | 0.35 | 0.27 | 81.1 | 56.7 |
| BR00116627 | 0.46 | 0.3 | 0.3 | 0.29 | 73.3 | 56.7 |
| BR00116629 | 0.51 | 0.3 | 0.31 | 0.29 | 87.8 | 52.2 |
| BR00116628 | 0.48 | 0.32 | 0.24 | 0.29 | 83.3 | 58.9 |
Percent Replicating graphs

Percent Replicating with rank-based loss function
[Image: Stain4_BR00116625_PR]

Percent Replicating with supervised contrastive loss function
[Image: Stain4_BR00116625_PR]
