distance with as.dist.obj=TRUE does not work correctly when the input data has 2 rows #29

Closed
Nowosad opened this issue Jul 29, 2021 · 3 comments

Comments

@Nowosad
Contributor

Nowosad commented Jul 29, 2021

For an input object with 2 rows, the base R dist() function returns an output of length 1 (the distance between the first and the second row). However, the distance() function with as.dist.obj=TRUE returns an empty object in this case:

library(philentropy)

m1 = matrix(c(1, 2), ncol = 1)
m2 = matrix(c(1, 2, 3), ncol = 1)

# base R
d1 = dist(m1)
d1
#>   1
#> 2 1
d2 = dist(m2)
d2
#>   1 2
#> 2 1  
#> 3 2 1

# philentropy
dp1 = distance(m1, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 2 vectors.
dp1
#> dist(0)
str(dp1)
#>  'dist' num(0) 
#>  - attr(*, "Labels")= chr "euclidean"
#>  - attr(*, "Size")= int 1
#>  - attr(*, "call")= language as.dist.default(m = dist, diag = diag, upper = upper)
#>  - attr(*, "Diag")= logi FALSE
#>  - attr(*, "Upper")= logi FALSE
#>  - attr(*, "method")= chr "euclidean"
dp2 = distance(m2, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 3 vectors.
dp2
#>    v1 v2
#> v2  1   
#> v3  2  1

Created on 2021-07-29 by the reprex package (v2.0.0)
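
The str() output above (a Size of 1 and a single "euclidean" label) is consistent with as.dist() receiving one named number instead of a 2 x 2 matrix. This is an inference from the printed attributes, not something confirmed against the package source. Under that assumption, the same empty shape can be reproduced in base R alone:

```r
# Assumption: for two input vectors the distance is held as a single
# named number before it is passed to as.dist().
d <- c(euclidean = 1)

# as.dist() coerces the scalar to a 1 x 1 matrix, which has no
# below-diagonal entries, so the resulting dist object is empty.
empty <- stats::as.dist(d)

length(empty)          # 0
attr(empty, "Size")    # 1
attr(empty, "Labels")  # "euclidean"
```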

@Nowosad
Contributor Author

Nowosad commented Jul 29, 2021

A possible solution is to slightly modify the distance() code at the if (as.dist.obj) { block:

if (as.dist.obj) {
  if (ncols == 2) {
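    # With two input vectors, dist is assumed to hold a single distance value;
    # wrap it in a symmetric 2 x 2 matrix so that as.dist() keeps it.
    dist <- stats::as.dist(matrix(c(0, dist, dist, 0), nrow = 2), diag = diag, upper = upper)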
  } else {
    dist <- stats::as.dist(dist, diag = diag, upper = upper)
  }
  attr(dist, "method") <- method
  return(dist)
}
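
The wrapping step can be tried outside the package. The sketch below uses a hypothetical helper, scalar_to_dist(), which is not part of philentropy, and assumes the two-vector distance is available as a single number:

```r
# Hypothetical helper (not part of philentropy): place one pairwise
# distance in a symmetric 2 x 2 matrix before converting it to a dist.
scalar_to_dist <- function(d, diag = FALSE, upper = FALSE) {
  m <- matrix(c(0, d, d, 0), nrow = 2)
  stats::as.dist(m, diag = diag, upper = upper)
}

fixed <- scalar_to_dist(1)
reference <- stats::dist(matrix(c(1, 2), ncol = 1))

length(fixed)                                      # 1
all.equal(as.vector(fixed), as.vector(reference))  # TRUE
```

With the 2 x 2 wrapper the result has Size 2 and one stored value, matching what dist() returns for a two-row input.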

HajkD added a commit that referenced this issue Jul 29, 2021
…tats::dist()` when working with 2 dimensional input matrices (2 vector inputs) (see #29) (Many thanks to Jakub Nowosad (@Nowosad))
@HajkD
Member

HajkD commented Jul 29, 2021

Hi Jakub,

This is a fantastic suggestion! Thank you very much for catching this!

I have now added your suggestion; please feel free to test whether it is sufficient.

library(philentropy)

m1 = matrix(c(1, 2), ncol = 1)
m2 = matrix(c(1, 2, 3), ncol = 1)

dist(m1)
#>   1
#> 2 1
distance(m1, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 2 vectors.
#>   1
#> 2 1
dist(m2)
#>   1 2
#> 2 1  
#> 3 2 1
distance(m2, as.dist.obj = TRUE)
#> Metric: 'euclidean'; comparing: 3 vectors.
#>    v1 v2
#> v2  1   
#> v3  2  1

P.S. Sorry for the delay on your pull request. I am a bit behind, but I am working on it!

@Nowosad
Contributor Author

Nowosad commented Jul 29, 2021

Thanks!

PS No problem - I can wait.

@Nowosad Nowosad closed this as completed Jul 29, 2021