General data analysis #7

Open
EchteRobert opened this issue Mar 24, 2022 · 1 comment
EchteRobert commented Mar 24, 2022

Creating plate clusters for training and validation splits

To group all the Stain2, Stain3, Stain4, and Stain5 (condition C) plates into clusters of mutually similar plates, I created a hierarchical clustermap based on the PC1 loadings of the mean aggregate profiles of these plates. I first included all plates, outliers among them, and then iteratively removed the outliers until the remaining clusters were similar enough internally to be used for training and validation.
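
A minimal sketch of how such a clustering could be computed, for readers who want to reproduce the idea. It is not the code used for the figures in this issue: the function names, the `plate_profiles` dictionary, the `1 - |r|` distance, and average linkage are all assumptions made for illustration. Each plate is assumed to be a wells x features array of mean-aggregated profiles with the same feature columns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def pc1_loadings(profiles: np.ndarray) -> np.ndarray:
    """Return the PC1 loading vector (one weight per feature) of a wells x features matrix."""
    centered = profiles - profiles.mean(axis=0)
    # Rows of vt are the principal axes; the first row is PC1.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def cluster_plates(plate_profiles: dict, n_clusters: int) -> dict:
    """Hierarchically cluster plates by the correlation of their PC1 loadings.

    plate_profiles maps a plate name to its wells x features array (hypothetical input).
    """
    names = list(plate_profiles)
    loadings = np.vstack([pc1_loadings(plate_profiles[name]) for name in names])
    corr = np.corrcoef(loadings)  # plate x plate correlation of loadings
    # The sign of a principal axis is arbitrary, so use |r| (assumption).
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels.tolist()))
```

Calling `cluster_plates(plate_profiles, n_clusters=7)` would return a plate-to-cluster mapping comparable in shape to the tables further down, though the cluster numbering and membership depend on the linkage and distance choices above.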

Main takeaways

  • The final 7 clusters were chosen based on the largest clusters that can be seen in the final clustermap iteration.
  • Clusters 5, 6, and 7 are the highest quality clusters, i.e. their plates have the highest correlation of PC1 loadings.
  • We already know that the model beats the baseline on cluster 1 for most plates, except on BR00113818 and BR00112199. However, note that this is actually one of the most diverse clusters of the 7 I have created here.
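
One way to put a number on "highest correlation" is the mean pairwise correlation of PC1 loadings within each cluster. The sketch below assumes a plate x plate correlation matrix `corr` and a matching sequence of cluster labels; the function name `cluster_quality` is hypothetical, and this issue does not state whether quality was judged numerically or by eye from the clustermap.

```python
import numpy as np


def cluster_quality(corr: np.ndarray, labels) -> dict:
    """Mean off-diagonal correlation between plates of the same cluster."""
    labels = np.asarray(labels)
    quality = {}
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        if idx.size < 2:
            # A single-plate cluster has no pairs to average over.
            quality[int(cluster)] = float("nan")
            continue
        sub = corr[np.ix_(idx, idx)]
        # Upper triangle without the diagonal: each plate pair counted once.
        pairs = sub[np.triu_indices(idx.size, k=1)]
        quality[int(cluster)] = float(pairs.mean())
    return quality
```

Ranking clusters by this value gives a simple, if coarse, ordering from most to least homogeneous.
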
Clustermap all plates

ClusterMapAllStainPlates

plate cluster 1: 3 plates
plate cluster 2: 67 plates
plate cluster 3: 9 plates
plate cluster 4: 1 plate
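
The issue does not state how outliers were picked between iterations. One possible rule, assumed here purely for illustration, is to drop plates that land in very small clusters and then re-cluster the rest until nothing more is removed. `cluster_fn`, `min_size`, and both function names are hypothetical.

```python
from collections import Counter


def remove_small_clusters(labels: dict, min_size: int) -> list:
    """Return the plates that sit in clusters with at least min_size members."""
    sizes = Counter(labels.values())
    return [plate for plate, cluster in labels.items() if sizes[cluster] >= min_size]


def iterate_removal(plates, cluster_fn, min_size: int, max_rounds: int = 10):
    """Re-cluster repeatedly, dropping plates in undersized clusters each round.

    cluster_fn takes a list of plate names and returns {plate: cluster label}.
    """
    kept = list(plates)
    labels = cluster_fn(kept)
    for _ in range(max_rounds):
        survivors = remove_small_clusters(labels, min_size)
        if len(survivors) == len(kept):
            break  # nothing was removed, so the clustering is stable
        kept = survivors
        labels = cluster_fn(kept)
    return kept, labels
```

With `min_size=2`, for example, a single-plate cluster like cluster 4 above would be dropped in the first round.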

Clustermap remove iteration 3

ClusterMapRemove3

plate cluster 1: 8 plates
plate cluster 2: 8 plates
plate cluster 3: 41 plates
plate cluster 4: 2 plates
plate cluster 5: 9 plates
plate cluster 6: 2 plates
plate cluster 7: 3 plates

Clustermap final iteration

Clusters are numbered from top to bottom (the bottom-right cluster is number 7).
ClusterMapFinal_dataset

Clusters final iteration
index  plate                 cluster
0      BR00113818            1
1      BR00112198            1
2      BR00112204            1
3      BR00112199            1
4      BR00112201            1
5      BR00112197repeat      1
6      BR00112202            1
7      BR00112197binned      1
8      BR00112197standard    1

25     BR00116621highexp     2
29     BR00116624bin1        2
30     BR00116624highexp     2
31     BR00116621bin1        2
35     BR00116620bin1        2
40     BR00116620highexp     2

23     BR00116632highexp     3
24     BR00116622            3
27     015124-V              3
33     BR00116622highexp     3
34     BR00116633highexp     3
36     BR00116634highexp     3
39     015124-Vhighexp       3

9      BR00115129            4
10     BR00115128            4
11     BR00115133highexp     4
12     BR00115133            4
13     BR00115127            4
14     BR00115131            4
15     BR00115125            4
16     BR00115134            4
17     BR00115125highexp     4
18     BR00115128highexp     4

19     BR00116630            5
20     BR00116625            5
21     BR00116631            5
22     BR00116627            5
26     BR00116630highexp     5
28     BR00116629highexp     5
32     BR00116627highexp     5
37     BR00116628highexp     5
38     BR00116625highexp     5
41     BR00116631highexp     5
42     BR00116628            5
43     BR00116629            5

48     BR00120277            6
50     BR00120276            6
51     BR00120274            6
52     BR00120275            6
53     BR00120271            6
54     BR00120270            6
56     BR00120272            6
57     BR00120273            6

44     BR00120272confocal    7
45     BR00120277confocal    7
46     BR00120274confocal    7
47     BR00120271confocal    7
49     BR00120276confocal    7
55     BR00120273confocal    7
58     BR00120270confocal    7
59     BR00120275confocal    7

EchteRobert commented Jun 7, 2022

Updated clustermap with condition A

Note that the two largest clusters are obtained by splitting the plates into Stain2, Stain3, and Stain4 as one cluster and Stain5 as the second.

ClusterMapTrainingPlateSelection_wcondA
