General data analysis #7

EchteRobert · 2022-03-24T21:12:21Z

Creating plate clusters for training and validation splits

To split all the Stain2, Stain3, Stain4, and Stain5 (condition C) plates into clusters of plates that are most similar to each other, I created a hierarchical clustermap based on the PC1 loadings of the mean aggregate profiles of these plates. I first included all plates, outliers included, and then iteratively removed the outliers until a set of clusters remained that were internally similar enough to be used for training and validation.
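
As a rough illustration of the per-plate step described above, the sketch below computes the PC1 loadings of a small wells-by-features table and correlates them between plates. It is written in plain Python so it runs without dependencies. Everything in it is an assumption made for illustration: the function names `pc1_loadings` and `pearson`, the toy plates `plate_a` to `plate_c`, the use of unit-length PC1 weights as "loadings", and the sign convention. The actual analysis may have been done differently.

```python
import math
import random


def pc1_loadings(profiles, n_iter=200, seed=0):
    """First principal axis (unit vector over features) of a wells x features table."""
    n, d = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in profiles]
    # Sample covariance matrix of the features (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration from a seeded random start converges to the top eigenvector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of a principal axis is arbitrary. As an assumption, flip it so
    # the largest-magnitude loading is positive, which makes plates comparable.
    pivot = max(range(d), key=lambda j: abs(v[j]))
    return [-x for x in v] if v[pivot] < 0 else v


def pearson(x, y):
    """Pearson correlation between two equally long vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Hypothetical toy plates: 4 wells x 3 features each (not real data).
plates = {
    "plate_a": [[0.0, 0.1, 0.0], [1.0, 2.0, 0.1], [2.0, 4.1, 0.0], [3.0, 5.9, 0.1]],
    "plate_b": [[0.1, 0.0, 0.1], [1.1, 2.1, 0.0], [1.9, 4.0, 0.1], [3.0, 6.1, 0.0]],
    "plate_c": [[0.0, 0.0, 0.1], [0.1, 1.0, 3.0], [0.0, 2.1, 6.1], [0.1, 2.9, 9.0]],
}
loadings = {name: pc1_loadings(table) for name, table in plates.items()}
r_ab = pearson(loadings["plate_a"], loadings["plate_b"])
r_ac = pearson(loadings["plate_a"], loadings["plate_c"])
print(f"a vs b: {r_ab:.3f}, a vs c: {r_ac:.3f}")
```

Plates whose variation runs along the same features (here `plate_a` and `plate_b`) end up with strongly correlated loadings, which is what a clustermap of the plate-by-plate correlation matrix would group together.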

Main takeaways

The final 7 clusters were chosen as the largest clusters visible in the final clustermap iteration.
Clusters 5, 6, and 7 are the highest-quality clusters, i.e. their plates have the highest correlation between PC1 loadings.
We already know that the model beats the baseline on cluster 1 for most plates, except on BR00113818 and BR00112199. However, note that this is actually one of the most diverse of the 7 clusters I have created here.

Clustermap all plates

plate cluster 1: 3 plates
plate cluster 2: 67 plates
plate cluster 3: 9 plates
plate cluster 4: 1 plate

Clustermap after removal iteration 3

plate cluster 1: 8 plates
plate cluster 2: 8 plates
plate cluster 3: 41 plates
plate cluster 4: 2 plates
plate cluster 5: 9 plates
plate cluster 6: 2 plates
plate cluster 7: 3 plates
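
One pass of the cluster-then-remove procedure could look like the sketch below: plates are grouped by average-linkage clustering on `1 - correlation`, and plates that land in very small clusters are flagged as outliers. This is only an assumption about how outliers might be identified. The linkage method, the `max_dist` cut-off, the `min_size` threshold, the function names, and the toy correlation matrix are all hypothetical and not taken from the analysis above.

```python
def average_linkage(dist, max_dist):
    """Agglomerative clustering: merge the closest pair of clusters (by mean
    pairwise distance) until the closest pair is farther apart than max_dist."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_dist:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def cluster_and_flag(names, corr, max_dist=0.5, min_size=2):
    """Cluster plates on 1 - correlation and flag plates in small clusters."""
    dist = [[1.0 - c for c in row] for row in corr]
    clusters = average_linkage(dist, max_dist)
    kept = [[names[i] for i in c] for c in clusters if len(c) >= min_size]
    outliers = [names[i] for c in clusters if len(c) < min_size for i in c]
    return kept, outliers


# Hypothetical correlation matrix of PC1 loadings for five made-up plates.
names = ["plate_a", "plate_b", "plate_c", "plate_d", "plate_e"]
corr = [
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.85, 0.20, 0.10],
    [0.80, 0.85, 1.00, 0.10, 0.05],
    [0.10, 0.20, 0.10, 1.00, 0.30],
    [0.00, 0.10, 0.05, 0.30, 1.00],
]
kept, outliers = cluster_and_flag(names, corr)
print("kept clusters:", kept)
print("outliers:", outliers)
```

Repeating this pass on the remaining plates, and redrawing the clustermap each time, would give an iteration sequence like the one listed here.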

Clustermap final iteration

Cluster numbering goes from top to bottom (the bottom-right cluster is number 7).

Clusters final iteration

	plate	cluster
0	BR00113818	1
1	BR00112198	1
2	BR00112204	1
3	BR00112199	1
4	BR00112201	1
5	BR00112197repeat	1
6	BR00112202	1
7	BR00112197binned	1
8	BR00112197standard	1

	plate	cluster
25	BR00116621highexp	2
29	BR00116624bin1	2
30	BR00116624highexp	2
31	BR00116621bin1	2
35	BR00116620bin1	2
40	BR00116620highexp	2

	plate	cluster
23	BR00116632highexp	3
24	BR00116622	3
27	015124-V	3
33	BR00116622highexp	3
34	BR00116633highexp	3
36	BR00116634highexp	3
39	015124-Vhighexp	3

	plate	cluster
9	BR00115129	4
10	BR00115128	4
11	BR00115133highexp	4
12	BR00115133	4
13	BR00115127	4
14	BR00115131	4
15	BR00115125	4
16	BR00115134	4
17	BR00115125highexp	4
18	BR00115128highexp	4

	plate	cluster
19	BR00116630	5
20	BR00116625	5
21	BR00116631	5
22	BR00116627	5
26	BR00116630highexp	5
28	BR00116629highexp	5
32	BR00116627highexp	5
37	BR00116628highexp	5
38	BR00116625highexp	5
41	BR00116631highexp	5
42	BR00116628	5
43	BR00116629	5

	plate	cluster
48	BR00120277	6
50	BR00120276	6
51	BR00120274	6
52	BR00120275	6
53	BR00120271	6
54	BR00120270	6
56	BR00120272	6
57	BR00120273	6

	plate	cluster
44	BR00120272confocal	7
45	BR00120277confocal	7
46	BR00120274confocal	7
47	BR00120271confocal	7
49	BR00120276confocal	7
55	BR00120273confocal	7
58	BR00120270confocal	7
59	BR00120275confocal	7
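
Since these clusters are meant to define training and validation splits, one possible way to use the tables is to hold out a few plates within each cluster. The sketch below does this for clusters 6 and 7 using the plate names listed above. The function `split_cluster`, the 25% validation fraction, and the seeded shuffle are assumptions for illustration only. They are not the splits actually used.

```python
import random


def split_cluster(plates, val_fraction=0.25, seed=0):
    """Shuffle a cluster's plates reproducibly and hold out a validation subset."""
    rng = random.Random(seed)
    shuffled = sorted(plates)  # sort first so the result does not depend on input order
    rng.shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


# Plate names copied from the cluster 6 and cluster 7 tables above.
clusters = {
    6: ["BR00120277", "BR00120276", "BR00120274", "BR00120275",
        "BR00120271", "BR00120270", "BR00120272", "BR00120273"],
    7: ["BR00120272confocal", "BR00120277confocal", "BR00120274confocal",
        "BR00120271confocal", "BR00120276confocal", "BR00120273confocal",
        "BR00120270confocal", "BR00120275confocal"],
}
splits = {cluster: split_cluster(plates) for cluster, plates in clusters.items()}
for cluster, (train, val) in splits.items():
    print(f"cluster {cluster}: {len(train)} training plates, {len(val)} validation plates")
```

Keeping the split inside a cluster means the validation plates resemble the training plates, so a drop in validation performance is less likely to be caused by plate-to-plate differences alone.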

EchteRobert · 2022-06-07T14:26:36Z

Updated clustermap with condition A

Note that the two largest clusters divide the plates by stain set: Stain2, Stain3, and Stain4 form one cluster, and Stain5 forms the second.

EchteRobert added Development Learning labels Mar 24, 2022

EchteRobert self-assigned this Mar 24, 2022

EchteRobert removed the Development label Apr 20, 2022

EchteRobert mentioned this issue Jun 9, 2022

07. Model for Stain5 #10

Open
