choosing the optimal number of "Ks" #40

cathalgking · 2023-08-29T08:32:08Z

What is the best way to choose the optimal number of K's or cell-types for a dataset?
Is it just by observing the plot from the below code? How does one know what to set the upper limit to? i.e. in the below example, there could be more than 9 cell types present.
ldas <- fitLDA(t(as.matrix(cd)), Ks = seq(2, 9, by = 1))
Also, my R session often crashes when running the above code.


bmill3r · 2023-08-31T23:45:42Z

Hi @cathalgking,

Thanks again for using STdeconvolve and for your questions! To provide some context, I'll point you towards a previous GitHub response:

#35 (comment)

In the example, a max K of 9 was chosen for speed. In practice, a higher K can be used if you suspect more than 9 cell types in the data.

As for your R session crashing, what kind of errors are you seeing, if any? Are you possibly running out of memory? This has happened to me sometimes for very large datasets when fitting multiple models. I believe there is a way to change the max memory limit of R.
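To check whether memory is a plausible cause, it can help to estimate how large the dense matrix created by as.matrix() would be. The sketch below is only a rough estimate under stated assumptions: the gene and spot counts are hypothetical, and it covers the input matrix alone, not the additional memory used by fitting one model per K.

```r
# Hypothetical dimensions, chosen only for illustration
n_genes <- 20000
n_spots <- 5000

# A dense numeric matrix stores one 8-byte double per entry
dense_gib <- n_genes * n_spots * 8 / 1024^3
cat(sprintf("Dense matrix: about %.2f GiB\n", dense_gib))

# Sanity check of the 8-bytes-per-entry rule on a small matrix,
# using base R's object.size()
small <- matrix(0, nrow = 1000, ncol = 1000)
small_mib <- as.numeric(object.size(small)) / 1024^2
cat(sprintf("1000 x 1000 dense matrix: about %.2f MiB\n", small_mib))
```

If the estimate approaches the available RAM, fitting fewer values of K per call may reduce the peak memory used.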

Let me know if you still have follow up questions and hope this helps,
Brendan

cathalgking · 2023-09-08T01:30:09Z

I solved this thanks @bmill3r

cathalgking · 2023-09-08T04:29:33Z

@bmill3r I notice that the opt parameter in the optimalModel() function can take the option "min". Does this mean that it takes the lowest perplexity value where alpha <1 (not in a grey region) of the fitLDA plot? Is this the easiest way to choose K?

My 4 samples seem to vary a lot in terms of what K to choose.
For instance, sample A seems to have an optimal K at 5 or 6?

While sample B seems to have an optimal K at around 16?

Other than this plot, is there any other way to ascertain the best K per sample?

bmill3r · 2023-09-13T00:03:04Z

Hi @cathalgking,

I believe that min just selects the model with lowest perplexity, but does not take into account alpha. It in theory would be the simplest, but because it does not account for alpha or the number of rare cell types, it might not be the best option. There are other options, such as

"kneed" = K vs perplexity inflection point.
"min" = K corresponding to minimum perplexity
"proportion" = K vs number of cell types with mean proportion < 5% inflection point

but none of these currently take alpha into account, and whether they truly identify the optimal K can be dataset dependent. So really, I would recommend using the plots to help guide selection of K.
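To make the idea behind these criteria concrete, here is a minimal base R sketch on made-up numbers. It does not call STdeconvolve and is not necessarily how the package computes its own choices: the perplexity values and the theta matrix are hypothetical, and the knee is found with a simple "farthest point from the chord" heuristic that stands in for whatever inflection-point method the package uses.

```r
# Hypothetical perplexity values for K = 2..9 (made-up numbers)
Ks <- 2:9
perplexity <- c(420, 340, 290, 280, 276, 274, 272, 273)

# "min"-style choice: the K with the lowest perplexity
k_min <- Ks[which.min(perplexity)]

# Knee-style choice (assumed heuristic): rescale both axes to [0, 1],
# then pick the point farthest from the straight line joining the
# first and last points of the curve
x <- (Ks - min(Ks)) / (max(Ks) - min(Ks))
y <- (perplexity - min(perplexity)) / (max(perplexity) - min(perplexity))
n <- length(x)
dist_to_chord <- abs((y[n] - y[1]) * x - (x[n] - x[1]) * y +
                       x[n] * y[1] - y[n] * x[1]) /
  sqrt((y[n] - y[1])^2 + (x[n] - x[1])^2)
k_knee <- Ks[which.max(dist_to_chord)]

# Rare cell types: hypothetical spot-by-cell-type proportions
# (each row sums to 1); count columns whose mean proportion is below 5%
theta <- rbind(c(0.60, 0.37, 0.03),
               c(0.50, 0.48, 0.02),
               c(0.70, 0.26, 0.04))
n_rare <- sum(colMeans(theta) < 0.05)

cat("min-perplexity K:", k_min, "\n")
cat("knee K:", k_knee, "\n")
cat("rare cell types:", n_rare, "\n")
```

On these numbers the two criteria disagree (the minimum sits at a much larger K than the knee), which illustrates why the choice can be dataset dependent and why the plots remain a useful guide.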

Hope this helps,
Brendan

cathalgking · 2023-09-13T00:20:00Z

Ok thanks @bmill3r . So would you say (just from looking at the plots) that the best K would be ~6 for the first plot and for the bottom plot ~ 16?

bmill3r · 2023-09-14T01:36:41Z

Hi @cathalgking,

Yes, looking at those plots I would say those are reasonable choices of K.

Brendan

cathalgking closed this as completed Sep 8, 2023
