Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOM should protect against inputs with no column names #41

Closed
LTLA opened this issue Jan 6, 2021 · 6 comments
Closed

SOM should protect against inputs with no column names #41

LTLA opened this issue Jan 6, 2021 · 6 comments

Comments

@LTLA
Copy link

LTLA commented Jan 6, 2021

In the following example:

library(FlowSOM)
set.seed(1000)
x <- matrix(runif(10000), ncol=10)
out <- SOM(x, xdim=5, ydim=5)
out$mapping[,1]
##   [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 # and on it goes

All observations are assigned the same code - odd. Diving into the source reveals the offending line:

FlowSOM/R/2_buildSOM.R

Lines 201 to 209 in 6f79b8b

nnCodes <- .C("C_mapDataToCodes",
as.double(newdata[,colnames(codes)]),
as.double(codes),
as.integer(nrow(codes)),
as.integer(nrow(newdata)),
as.integer(ncol(codes)),
nnCodes = integer(nrow(newdata)),
nnDists = double(nrow(newdata)),
distf = as.integer(distf))

Here, colnames(codes) is NULL, which causes data to be subsetted to an empty numeric vector. The C code subsequently performs out-of-bounds accesses, leading to very difficult memory-related bugs downstream of using SOM().

An expedient solution would be to simply enforce identical colnames in any non-NULL value of codes. If codes=NULL, then it's derived from the data and there is no need for a separate check that the columns are matching.

@SamGG
Copy link
Contributor

SamGG commented Jan 6, 2021

Already reported #38, but not yet implemented IIRC.
I hope Sofie will have time to take care of this point.

@LTLA
Copy link
Author

LTLA commented Jan 8, 2021

Well, you know what they say... when it rains, it pours.

FlowSOM inherits a memory-related problem from its dependencies, namely cytolib via flowCore; see RGLab/cytolib#45. This means that I am not able to call FlowSOM with any other package that uses global C variables, which pretty much makes it impossible to use in my bluster package (for coordinating different clustering methods).

So, I went for the nuclear option of copying all the relevant code from FlowSOM into bluster to avoid these dependency problems (and also fix this issue at the same time). I had to reimplement it in C++, though, which led to #42.

@SamGG
Copy link
Contributor

SamGG commented Jan 8, 2021

I thought that when it rains, we get wet...
I think that there are not a lot of differences between the core code of FlowSOM and the SOM implemented in the kohonen package. Did you have a look?

@LTLA
Copy link
Author

LTLA commented Jan 8, 2021

Yes, and I couldn't see any change-related termination code in kohonen.

@SofieVG
Copy link
Owner

SofieVG commented Jan 8, 2021 via email

@SofieVG
Copy link
Owner

SofieVG commented Jan 22, 2021

The issue regarding the colnames should be solved with the latest commit.

@SofieVG SofieVG closed this as completed Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants