
05. Model for Stain4 #8

EchteRobert opened this issue Apr 20, 2022 · 4 comments

EchteRobert commented Apr 20, 2022

To test the generalization of the model trained on Stain3 (and tested on Stain2), I will now evaluate it on Stain4. Based on the results, further advancements will be made by training on plates of Stain4 (and then evaluating on Stain2 and Stain3 in turn).

Stain4 consists of 30 plates, divided into 5 batches, each with different staining conditions:

  • Baseline staining conditions used in Stain3 (Stain2)
  • Baseline staining conditions used in Stain3 with 2-fold dilution of all dyes (Stain2_2)
  • Baseline staining conditions used in Stain3 with 2-fold dilution of Phalloidin (Stain2_Phalloidin_2)
  • Baseline staining conditions used in Stain3 with 2-fold dilution of Phalloidin and ConA (Stain2_Phalloidin_ConA_2)
  • Bray et al. staining conditions (Bray)

In addition to these staining conditions, standard vs. high exposure and Binning 1 vs. Binning 2 comparisons were also made.

To analyze the relations between the different plates in Stain4, I calculated the correlation between the PC1 loadings of the mean-aggregated profiles of every plate. I only included the plates that were similar enough to form one large cluster.
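For reference, a minimal sketch of this analysis, assuming per-plate mean-aggregated profiles live in pandas DataFrames (the function and variable names are illustrative, not the actual pipeline code):

```python
import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA

def plate_pc1_correlation(profiles: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """profiles maps plate name -> (wells x features) mean-aggregated profiles."""
    pc1 = {}
    for plate, df in profiles.items():
        pca = PCA(n_components=1).fit(df.values)
        pc1[plate] = pca.components_[0]   # loading of each feature on PC1
    loadings = pd.DataFrame(pc1)          # features x plates
    return loadings.corr()                # plate x plate correlation matrix

# corr = plate_pc1_correlation(profiles)
# sns.clustermap(corr, vmin=-1, vmax=1)   # clustered heatmap as in the figure below
```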

Click here for clusters! [Image: PlateClustermap]

Click here for cells per well per plate! Only the plates I have downloaded so far are included, but they should still give a good indication of this dataset.
[Image: NRCELLS_Stain4]


EchteRobert commented Apr 20, 2022

Benchmark Stain4

| plate | Training mAP BM | Validation mAP BM | PR BM |
| --- | --- | --- | --- |
| BR00116630 | 0.31 | 0.31 | 53.30 |
| BR00116625 | 0.31 | 0.29 | 58.90 |
| BR00116631 | 0.30 | 0.28 | 57.80 |
| BR00116627 | 0.30 | 0.29 | 56.70 |
| BR00116630highexp | 0.29 | 0.30 | 58.90 |
| BR00116629highexp | 0.29 | 0.29 | 52.20 |
| BR00116627highexp | 0.31 | 0.27 | 56.70 |
| BR00116628highexp | 0.32 | 0.31 | 57.80 |
| BR00116625highexp | 0.32 | 0.28 | 61.10 |
| BR00116631highexp | 0.28 | 0.30 | 53.30 |
| BR00116628 | 0.32 | 0.29 | 58.90 |
| BR00116629 | 0.30 | 0.29 | 52.20 |
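As a reference for how these numbers are defined, here is a minimal sketch of a replicate-retrieval mAP; the actual evaluation code is not shown in this thread, so the details (cosine similarity, same-compound profiles as positives) are assumptions:

```python
import numpy as np

def replicate_map(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Mean Average Precision for replicate retrieval: each well profile
    queries all other profiles; same-compound profiles count as positives."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T                 # cosine similarity matrix

    aps = []
    n = len(labels)
    for i in range(n):
        mask = np.arange(n) != i            # exclude the query itself
        order = np.argsort(-sim[i, mask])   # candidates ranked by similarity
        hits = (labels[mask][order] == labels[i]).astype(float)
        if hits.sum() == 0:                 # query has no replicates
            continue
        precision_at_k = np.cumsum(hits) / np.arange(1, n)
        aps.append(float((precision_at_k * hits).sum() / hits.sum()))
    return float(np.mean(aps))
```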


EchteRobert commented Apr 21, 2022

First model trained on Stain4

Using the same setup as in Stain3 (#6 (comment)), I trained on plates BR00116625highexp, BR00116628highexp_FS, and BR00116629highexp. I only have 2 validation plates for now, but will have more next week.

Main takeaways

  • I am getting similar results to the best model I was able to train on Stain3: the model discerns training compounds just fine, but falls short on validation compounds.

One possible explanation, which I think may also apply to the Stain3 model, is that the training plates are too similar to one another. The Stain2 model was trained on slightly more dissimilar plates and generalized well to everything in Stain2. However, we have also seen that the model does not generalize to plates that are too different. Although adjusting this might help, I don't think managing this trade-off (i.e. trying different compositions of training plates) is something I should be looking into, because ideally the choice of training plates should play a smaller role in generalization.

Next up

Possible solutions include:

  • Using a rank-based loss function, which would penalize the model more directly for scoring poorly on mAP
  • Using hard positive/negative mining during contrastive training (although this should already implicitly happen with SupConLoss?) --> this is indeed already happening, as the cosine similarity normalizes the representations; see the sketch after this list
  • Increasing the batch size even further (?) --> this did not work, see 04. Model for Stain3 #6 (comment)
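To make the mining point concrete, here is a sketch of SupConLoss combined with an explicit hard miner using pytorch-metric-learning (assuming that library is what's in use here; whether the actual training loop needs a separate miner is exactly the open question above). SupConLoss L2-normalizes embeddings before taking dot products, so hard pairs already dominate the gradient:

```python
import torch
from pytorch_metric_learning import losses, miners

loss_func = losses.SupConLoss(temperature=0.1)
miner = miners.BatchHardMiner()  # optional explicit hard positive/negative mining

embeddings = torch.randn(32, 128)     # batch of profile embeddings (illustrative)
labels = torch.randint(0, 8, (32,))   # compound labels; replicates share a label
hard_pairs = miner(embeddings, labels)
loss = loss_func(embeddings, labels, hard_pairs)
```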
TableTime!
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| **Training plates** | | | | | | |
| BR00116625highexp | 0.67 | 0.32 | 0.33 | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | 0.7 | 0.32 | 0.29 | 0.31 | 97.8 | 57.8 |
| BR00116629highexp | 0.65 | 0.29 | 0.35 | 0.29 | 98.9 | 52.2 |
| **Validation plates** | | | | | | |
| BR00116630highexp | 0.46 | 0.29 | 0.29 | 0.3 | 94.4 | 58.9 |
| BR00116631highexp | 0.39 | 0.28 | 0.23 | 0.3 | 86.7 | 53.3 |


EchteRobert commented Apr 26, 2022

Second model trained on Stain4

With slightly updated parameters according to #6 (comment), I now train on the same plates as before, but evaluate on all plates in the cluster.

Main takeaways

  • The model is able to generalize to most plates, although there are still 3 for which it does not beat the baseline validation mAP.
  • Although the 3 outlier plates are closely linked, they do not especially stand out when compared to the training plates in the PC1-loadings correlation plot. One possible improvement would be to find a method that more accurately describes which plates are close to the training plates and which are not.
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| **Training plates** | | | | | | |
| BR00116625highexp | 0.75 | 0.32 | 0.36 | 0.28 | 98.9 | 61.1 |
| BR00116628highexp | 0.76 | 0.32 | 0.34 | 0.31 | 96.7 | 57.8 |
| BR00116629highexp | 0.75 | 0.29 | 0.32 | 0.29 | 98.9 | 52.2 |
| **Validation plates** | | | | | | |
| BR00116625 | 0.55 | 0.31 | 0.31 | 0.29 | 98.9 | 58.9 |
| BR00116630highexp | 0.47 | 0.29 | 0.27 | 0.3 | 91.1 | 58.9 |
| BR00116631highexp | 0.42 | 0.28 | 0.22 | 0.3 | 88.9 | 53.3 |
| BR00116631 | 0.42 | 0.3 | 0.21 | 0.28 | 94.4 | 57.8 |
| BR00116627highexp | 0.5 | 0.31 | 0.36 | 0.27 | 92.2 | 56.7 |
| BR00116627 | 0.48 | 0.3 | 0.32 | 0.29 | 92.2 | 56.7 |
| BR00116629 | 0.55 | 0.3 | 0.3 | 0.29 | 97.8 | 52.2 |
| BR00116628 | 0.56 | 0.32 | 0.29 | 0.29 | 97.8 | 58.9 |


EchteRobert commented Apr 27, 2022

Using a rank based loss function

As described in Deep Metric Learning to Rank, I use the FastAP loss function, which optimizes the rank-based Average Precision measure using an approximation derived from distance quantization.

Hypothesis: by directly optimizing the mean Average Precision, rather than optimizing Percent Replicating via the Supervised Contrastive loss function, the model should generalize better to the ranking task (mAP).
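A minimal sketch of swapping in the FastAP loss, using the pytorch-metric-learning implementation (the batch shapes and num_bins value here are illustrative assumptions, not the actual training configuration):

```python
import torch
from pytorch_metric_learning import losses

# FastAP makes Average Precision differentiable by soft-binning the pairwise
# distance histogram instead of sorting the retrieval list.
loss_func = losses.FastAPLoss(num_bins=10)

embeddings = torch.randn(32, 128, requires_grad=True)  # batch of profile embeddings
labels = torch.randint(0, 8, (32,))   # compound labels; replicates share a label
loss = loss_func(embeddings, labels)
loss.backward()
```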

Main takeaways

  • By optimizing for mean Average Precision we lose some performance on the Percent Replicating score, as expected (a sketch of how Percent Replicating is typically computed follows this list). The distributions of the Percent Replicating histograms look different as well.
  • I observe no significant increase in performance when training the model with the rank-based loss function.
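For context, Percent Replicating is conventionally the fraction of compounds whose median replicate correlation exceeds the 95th percentile of a null distribution built from non-replicate pairs. The sketch below follows that common Cell Painting convention; the exact null construction used in this project may differ:

```python
import numpy as np

def percent_replicating(profiles: np.ndarray, compounds: np.ndarray,
                        percentile: float = 95.0) -> float:
    """% of compounds whose median replicate correlation beats the null."""
    corr = np.corrcoef(profiles)          # pairwise Pearson correlations
    same = compounds[:, None] == compounds[None, :]
    null = corr[~same]                    # correlations between non-replicates
    threshold = np.percentile(null, percentile)

    medians = []
    for c in np.unique(compounds):
        idx = compounds == c
        if idx.sum() < 2:                 # need at least 2 replicates
            continue
        block = corr[np.ix_(idx, idx)]    # within-compound correlations
        medians.append(np.median(block[~np.eye(idx.sum(), dtype=bool)]))
    return 100.0 * float(np.mean(np.array(medians) > threshold))
```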

Results

Table Results
| plate | Training mAP model | Training mAP BM | Validation mAP model | Validation mAP BM | PR model | PR BM |
| --- | --- | --- | --- | --- | --- | --- |
| **Training plates** | | | | | | |
| BR00116625highexp | 0.66 | 0.32 | 0.36 | 0.28 | 95.6 | 61.1 |
| BR00116628highexp | 0.67 | 0.32 | 0.33 | 0.31 | 92.2 | 57.8 |
| BR00116629highexp | 0.68 | 0.29 | 0.3 | 0.29 | 86.7 | 52.2 |
| **Validation plates** | | | | | | |
| BR00116631highexp | 0.39 | 0.28 | 0.26 | 0.3 | 71.1 | 53.3 |
| BR00116625 | 0.51 | 0.31 | 0.3 | 0.29 | 86.7 | 58.9 |
| BR00116630highexp | 0.44 | 0.29 | 0.29 | 0.3 | 76.7 | 58.9 |
| BR00116631 | 0.41 | 0.3 | 0.25 | 0.28 | 77.8 | 57.8 |
| BR00116627highexp | 0.48 | 0.31 | 0.35 | 0.27 | 81.1 | 56.7 |
| BR00116627 | 0.46 | 0.3 | 0.3 | 0.29 | 73.3 | 56.7 |
| BR00116629 | 0.51 | 0.3 | 0.31 | 0.29 | 87.8 | 52.2 |
| BR00116628 | 0.48 | 0.32 | 0.24 | 0.29 | 83.3 | 58.9 |
Percent Replicating graphs

Percent Replicating with rank-based loss function
[Image: Stain4_BR00116625_PR]

Percent Replicating with supervised contrastive loss function
[Image: Stain4_BR00116625_PR]
