
choosing the optimal number of "Ks" #40

Closed
cathalgking opened this issue Aug 29, 2023 · 6 comments


@cathalgking

What is the best way to choose the optimal number of Ks (i.e., cell types) for a dataset?
Is it just by inspecting the plot produced by the code below? How does one know what to set the upper limit to? For example, in the code below there could be more than 9 cell types present.
ldas <- fitLDA(t(as.matrix(cd)), Ks = seq(2, 9, by = 1))
Also, my R session often crashes when running the above code.

@bmill3r
Collaborator

bmill3r commented Aug 31, 2023

Hi @cathalgking,

Thanks again for using STdeconvolve and for your questions! To provide some context, I'll point you towards a previous GitHub response:

#35 (comment)

In the example, a maximum K of 9 was chosen for speed; in practice, a higher K can be used if you suspect the data contain more than 9 cell types.

Regarding your R session crashing, what errors are you seeing, if any? In terms of compute resources, could you be running out of memory? This has sometimes happened to me with very large datasets when fitting multiple models. I believe there is a way to change R's maximum memory limit.

Let me know if you still have follow up questions and hope this helps,
Brendan

@cathalgking
Author

I solved this, thanks @bmill3r.

@cathalgking
Author

@bmill3r I notice that the opt parameter of the optimalModel() function can take the option "min". Does this mean it picks the K with the lowest perplexity among those where alpha < 1 (i.e., outside the grey region of the fitLDA plot)? Is this the easiest way to choose K?

My 4 samples seem to vary a lot in terms of what K to choose.
For instance, sample A seems to have an optimal K at 5 or 6?
[image: fitLDA plot for sample A]

While sample B seems to have an optimal K at around 16?
[image: fitLDA plot for sample B]

Other than this plot, is there any other way to ascertain the best K per sample?

@bmill3r
Collaborator

bmill3r commented Sep 13, 2023

Hi @cathalgking,

I believe that "min" just selects the model with the lowest perplexity and does not take alpha into account. In theory it would be the simplest option, but because it ignores alpha and the number of rare cell types, it might not be the best one. The options include:

"kneed" = K vs perplexity inflection point.
"min" = K corresponding to minimum perplexity
"proportion" = K vs number of cell types with mean proportion < 5% inflection point

But none of these currently take alpha into account, and whether they truly identify the optimal K can be dataset dependent. So I would really recommend using the plots to guide the selection of K.

Hope this helps,
Brendan
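As a rough illustration of how the three selection rules listed above behave, here is a minimal, self-contained R sketch. It does not call STdeconvolve: the helper names (`select_min_k`, `select_knee_k`, `count_rare`) and the perplexity values are hypothetical, and the knee rule is a simple maximum-distance-to-chord heuristic that may differ from what `optimalModel()` actually does internally. Like the built-in options, it ignores alpha.

```r
# Hypothetical sketch, NOT STdeconvolve's implementation.
# Illustrative K-vs-perplexity curve (made-up values).
Ks <- 2:9
perplexity <- c(1000, 800, 700, 660, 640, 630, 625, 622)

# "min"-style rule: the K with the lowest perplexity
select_min_k <- function(Ks, perplexity) {
  Ks[which.min(perplexity)]
}

# "kneed"-style rule (assumed heuristic): rescale both axes to [0, 1], then
# pick the K whose point lies farthest from the straight line (chord)
# joining the first and last points of the curve.
select_knee_k <- function(Ks, perplexity) {
  x <- (Ks - min(Ks)) / (max(Ks) - min(Ks))
  y <- (perplexity - min(perplexity)) / (max(perplexity) - min(perplexity))
  n <- length(x)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # perpendicular distance from each (x, y) to the chord
  dist <- abs(dy * x - dx * y + x[n] * y[1] - y[n] * x[1]) / sqrt(dx^2 + dy^2)
  Ks[which.max(dist)]
}

# Quantity behind the "proportion"-style rule: for one fitted model, count the
# cell types whose mean proportion across spots falls below `thresh`.
# `theta` is assumed to be a spots x cell-types proportion matrix.
count_rare <- function(theta, thresh = 0.05) {
  sum(colMeans(theta) < thresh)
}

# Toy proportion matrix: 2 spots x 3 cell types (rows sum to 1)
theta <- rbind(c(0.6, 0.38, 0.02),
               c(0.4, 0.56, 0.04))

print(select_min_k(Ks, perplexity))
print(select_knee_k(Ks, perplexity))
print(count_rare(theta))
```

On this made-up curve, the "min" rule picks K = 9 (the largest K tried) while the knee rule picks K = 4, which shows how the criteria can disagree and why looking at the plots remains useful. For the "proportion" rule, `count_rare()` would be computed for each fitted K and a knee finder applied to that curve instead of to perplexity.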

@cathalgking
Copy link
Author

OK, thanks @bmill3r. So would you say (just from looking at the plots) that the best K would be ~6 for the first plot and ~16 for the bottom plot?

@bmill3r
Collaborator

bmill3r commented Sep 14, 2023

Hi @cathalgking,

Yes, looking at those plots I would say those are reasonable choices of K.

Brendan
