
Switch prepDetections to gsub. #39

Merged: 4 commits into baktoft:v1.2.5.9000 on Dec 18, 2021
Conversation

@mhpob commented Jul 1, 2021

Allows speed increase outlined in #38.

Note: this will give the warning

Warning message: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion

when a match doesn't occur. This happened for me on a few lines where fractional seconds weren't reported for some reason.
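The failure mode can be reproduced in isolation. This is a minimal sketch with an assumed gsub pattern, not necessarily the exact one used in prepDetections:

```r
## Extract fractional seconds from a timestamp string.
ts_chr <- c("2019-09-09 16:04:11.193",  # fractional seconds present
            "2019-09-09 16:04:12")      # fractional seconds absent

## gsub() returns the input unchanged when the pattern doesn't match, so
## as.numeric() coerces the whole timestamp string to NA (with a warning).
frac <- as.numeric(gsub(".*\\.", "", ts_chr)) / 1000
frac
## [1] 0.193    NA
```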

@mhpob mhpob changed the title Switch to prepDetections to gsub. Switch prepDetections to gsub. Jul 1, 2021
@mhpob (Author) commented Jul 7, 2021

The issue in #37 stems from the R default of printing seconds with 0 decimal places. The timestamp is internally converted to a character before the strsplit/gsub call, so any millisecond information is dropped while attempting to create the "frac" column. Temporarily changing the R options inside of the function seems to fix this and so should close that issue.
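The default printing behavior can be seen directly; a minimal sketch of what the `digits.secs` option changes:

```r
## R's default prints POSIXct times with 0 decimal places on the seconds,
## so any character round-trip silently drops the milliseconds.
ts <- as.POSIXct("2019-09-09 16:04:11.193", tz = "UTC")

old <- options(digits.secs = NULL)     # the R default
fmt_default <- format(ts, tz = "UTC")  # "2019-09-09 16:04:11"

options(digits.secs = 3)               # temporarily raised inside the fix
fmt_millis <- format(ts, tz = "UTC")   # fractional seconds now shown
options(old)                           # restore the caller's options
```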

@baktoft (Owner) commented Jul 8, 2021

Hi @mhpob,
Thanks for this - much appreciated 👍

I have looked into it and it seems that the issue in #37 can be solved by changing this line in prepDetections()
detections[, frac:= (as.numeric(sapply(raw_dat$'Date and Time (UTC)', function(x) strsplit(x, "\\.")[[1]][2]))) / 1000]
to
detections[, frac:= as.numeric(ts) - floor(as.numeric(ts))]

This should also cater for the cases where fractional seconds are absent, and should be adequately fast.
Would you mind taking it for a spin on your data to confirm?
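A quick check of the floor-based frac on timestamps with and without fractional seconds (assuming a POSIXct ts column in UTC, as in the function):

```r
## Both cases go through arithmetic rather than string splitting, so no
## NA and no coercion warning when fractional seconds are absent.
ts <- as.POSIXct(c("2019-09-09 16:04:11.193",
                   "2019-09-09 16:04:12"), tz = "UTC")
frac <- as.numeric(ts) - floor(as.numeric(ts))
## frac[1] is ~0.193 (up to double precision); frac[2] is exactly 0.
```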

I like the gsub() from #38 :-)

Thanks,
\hb

@mhpob (Author) commented Jul 8, 2021

Great catch @baktoft -- that halved the time again and solved the warning on my data.

> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE)
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr    min     lq      mean  median      uq     max neval
 prepDetections(vue, "vemco_vue") 26.929 27.397 28.292839 27.5914 28.2504 34.4613   100

Columns ts and epo are now carrying the millisecond information with them. This can be seen by changing the global R options:

> options(digits = 15, digits.secs = 3)
> detections
                           ts   tag            epo              frac serial
   1: 2019-09-09 16:04:11.193 59335 1568045051.193 0.193000078201294 128355
   2: 2019-09-09 16:04:12.573 59336 1568045052.574 0.573999881744385 128371
   3: 2019-09-09 16:04:43.953 59335 1568045083.953 0.953000068664551 128959
   4: 2019-09-09 16:05:14.888 59335 1568045114.888 0.888000011444092 128344
   5: 2019-09-09 16:05:26.450 59335 1568045126.451 0.450999975204468 128370
  ---                                                                      
15369: 2019-09-10 12:59:12.707 59336 1568120352.708 0.707999944686890 128369
15370: 2019-09-10 12:59:12.789 59336 1568120352.790 0.789999961853027 128973
15371: 2019-09-10 12:59:32.420 59336 1568120372.420 0.420000076293945 128371
15372: 2019-09-10 13:02:55.806 59335 1568120575.807 0.806999921798706 135178
15373: 2019-09-10 13:02:56.724 59335 1568120576.725 0.724999904632568 128369

Is that something you would want to keep, or should I change things to strip that information? I had stripped it in 464e3f2 -- it does take a little longer there, since that method has to convert to list time (POSIXlt), then back to calendar time (POSIXct).
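For reference, a sketch of that list-time round-trip (the exact code in 464e3f2 may differ):

```r
## Strip milliseconds by converting to list time (POSIXlt), truncating
## the seconds field, and converting back to calendar time (POSIXct).
ts <- as.POSIXct("2019-09-09 16:04:11.193", tz = "UTC")
lt <- as.POSIXlt(ts)
lt$sec <- floor(lt$sec)        # drop the fractional seconds
ts_trunc <- as.POSIXct(lt)
as.numeric(ts_trunc) %% 1      # 0 -- fractional seconds gone
```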

@baktoft (Owner) commented Jul 9, 2021

Hmmm - it should be ok to truncate the fractional seconds from those columns. Something like this should be ok'ish in terms of cpu-time.
detections[, ts := as.POSIXct(floor(as.numeric(ts)), origin="1970-01-01", tz="UTC")]
detections[, epo := floor(epo)]
Thanks,
\hb

@mhpob (Author) commented Jul 9, 2021

Original result and time taken:

> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE, tz = '')
> prepDetections_original <- function(raw_dat, type){
+   detections <- data.table::data.table()
+   if (type == "vemco_vue"){
+     detections[, ts:=as.POSIXct(raw_dat$'Date and Time (UTC)', tz="UTC")]
+     detections[, tag:=as.numeric(sapply(raw_dat$Transmitter, function(x) strsplit(x, "-")[[1]][3]))]
+     detections[, epo:=as.numeric(ts)]
+     detections[, frac:= (as.numeric(sapply(raw_dat$'Date and Time (UTC)', function(x) strsplit(x, "\\.")[[1]][2]))) / 1000]
+     detections[, serial:=as.numeric(sapply(raw_dat$Receiver, function(x) strsplit(x, "-")[[1]][2]))]
+   }
+   detections[]
+   return(detections)
+ }
> options(digits = 15, digits.secs = 3)
> prepDetections_original(vue, 'vemco_vue')
                            ts   tag            epo  frac serial
    1: 2019-09-09 16:04:11.193 59335 1568045051.193 0.193 128355
    2: 2019-09-09 16:04:12.573 59336 1568045052.574 0.574 128371
    3: 2019-09-09 16:04:43.953 59335 1568045083.953 0.953 128959
    4: 2019-09-09 16:05:14.888 59335 1568045114.888 0.888 128344
    5: 2019-09-09 16:05:26.450 59335 1568045126.451 0.451 128370
   ---                                                          
15369: 2019-09-10 12:59:12.707 59336 1568120352.708 0.708 128369
15370: 2019-09-10 12:59:12.789 59336 1568120352.790 0.790 128973
15371: 2019-09-10 12:59:32.420 59336 1568120372.420 0.420 128371
15372: 2019-09-10 13:02:55.806 59335 1568120575.807 0.807 135178
15373: 2019-09-10 13:02:56.724 59335 1568120576.725 0.725 128369
> microbenchmark::microbenchmark(prepDetections_original(vue, 'vemco_vue'))
Unit: milliseconds
                                      expr      min        lq       mean   median       uq      max neval
 prepDetections_original(vue, "vemco_vue") 287.4089 308.50835 319.939158 318.6149 329.1724 368.4272   100

Current version using tz = '', default in data.table::fread < v1.14.0:

> vue <- data.table::fread(fn, fill=TRUE, tz = '')
> options(digits = 15, digits.secs = 3)
> prepDetections(vue, 'vemco_vue')
                        ts   tag        epo  frac serial
    1: 2019-09-09 16:04:11 59335 1568045051 0.193 128355
    2: 2019-09-09 16:04:12 59336 1568045052 0.574 128371
    3: 2019-09-09 16:04:43 59335 1568045083 0.953 128959
    4: 2019-09-09 16:05:14 59335 1568045114 0.888 128344
    5: 2019-09-09 16:05:26 59335 1568045126 0.451 128370
   ---                                                  
15369: 2019-09-10 12:59:12 59336 1568120352 0.708 128369
15370: 2019-09-10 12:59:12 59336 1568120352 0.790 128973
15371: 2019-09-10 12:59:32 59336 1568120372 0.420 128371
15372: 2019-09-10 13:02:55 59335 1568120575 0.807 135178
15373: 2019-09-10 13:02:56 59335 1568120576 0.725 128369
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr     min      lq      mean   median       uq      max neval
 prepDetections(vue, "vemco_vue") 43.2651 43.8835 46.724863 44.50235 47.03805 165.2624   100

Current version when tz is given as UTC in data.table::fread (default in data.table version >=1.14.0):

> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE, tz = 'UTC')
> options(digits = 15, digits.secs = 3)
> prepDetections(vue, 'vemco_vue')
                        ts   tag        epo  frac serial
    1: 2019-09-09 16:04:11 59335 1568045051 0.193 128355
    2: 2019-09-09 16:04:12 59336 1568045052 0.574 128371
    3: 2019-09-09 16:04:43 59335 1568045083 0.953 128959
    4: 2019-09-09 16:05:14 59335 1568045114 0.888 128344
    5: 2019-09-09 16:05:26 59335 1568045126 0.451 128370
   ---                                                  
15369: 2019-09-10 12:59:12 59336 1568120352 0.708 128369
15370: 2019-09-10 12:59:12 59336 1568120352 0.790 128973
15371: 2019-09-10 12:59:32 59336 1568120372 0.420 128371
15372: 2019-09-10 13:02:55 59335 1568120575 0.807 135178
15373: 2019-09-10 13:02:56 59335 1568120576 0.725 128369
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr    min      lq      mean  median      uq     max neval
 prepDetections(vue, "vemco_vue") 28.567 29.0313 30.529318 29.6492 31.4714 38.9802   100

@mhpob (Author) commented Jul 9, 2021

I'm now noticing that there are some floating point rounding issues (see, e.g., rows 2 and 5 in the original result). This was also the behavior in the original version of prepDetections; the difference is that the code now explicitly covers them up, since the input time stamp is not carried through (ts is rounded).

In your experience, does a thousandth of a second matter... especially since the clocks can drift on a much larger scale than that?
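The artifact can be reproduced in isolation; epoch times are stored as doubles, and the exact trailing digits are platform-dependent:

```r
## Parsing "12.573" does not land exactly on .573 -- the difference is
## what shows up in rows 2 and 5 of the printed tables.
ts <- as.POSIXct("2019-09-09 16:04:12.573", tz = "UTC")
frac <- as.numeric(ts) - floor(as.numeric(ts))
print(frac, digits = 15)   # 0.573999881744385 in the output above
```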

@baktoft (Owner) commented Dec 18, 2021

Sorry - got side-tracked, but I am now doing a bit of end-of-year cleaning. A thousandth of a second can definitely matter when estimating positions (1/1000 s ~ 1.5 meters), but the temporal resolution of the systems yielding data for use with this function is only 1/1000 s, so the rounding will not be an issue here.
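The back-of-the-envelope behind the 1.5 m figure, assuming a nominal speed of sound in water of roughly 1500 m/s:

```r
sound_speed <- 1500           # m/s, assumed nominal value for water
dt <- 1 / 1000                # s, one millisecond of timing error
range_error <- sound_speed * dt
range_error                   # ~1.5 meters of positional uncertainty
```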

@baktoft baktoft merged commit e768797 into baktoft:v1.2.5.9000 Dec 18, 2021
baktoft added a commit that referenced this pull request Dec 18, 2021
* Switch to gsub.

* Change options() to keep millisecond information. Copy raw data and change by reference.

* Change frac calculation and drop resulting excess.

* Explicitly provide date format. Round off ts, epo, and frac.

Co-authored-by: Mike O'Brien <[email protected]>