WISH: Built-in MD5 checksum calculator, e.g. tools::md5(x) #21

Closed
HenrikBengtsson opened this issue Apr 19, 2016 · 5 comments

@HenrikBengtsson
Owner

Background

R has a built-in MD5 checksum calculator, tools::md5sum(), but it only operates on files. It takes a vector of pathnames (not connections) as input and returns a character vector of the same length containing the MD5 checksums, e.g.

> files <- dir(R.home(), pattern = "^C", full.names = TRUE)
> files
[1] "C:/PROGRA~1/R/R-3.2.5/CHANGES" "C:/PROGRA~1/R/R-3.2.5/COPYING"
> tools::md5sum(files)
     C:/PROGRA~1/R/R-3.2.5/CHANGES      C:/PROGRA~1/R/R-3.2.5/COPYING
"d45eec95ce49830fdd3277950397dbde" "0cce1e42ef3fb133940946534fcf8896"

Wish / Suggestion

Calculating MD5 checksums is such a common task that it warrants a core R function for calculating the checksum of an R object x, e.g. tools::md5(x).

There is an internal src/library/tools/src/md5.c file that implements the MD5 checksum. It even has an internal md5_buffer() function that seems to do exactly this.

See also

  • digest package: Provides the well-tested function digest::digest(x, algo = "md5") for calculating the MD5 checksum of an R object x.

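For reference, the digest-based approach listed above might be used as in the sketch below. It assumes the digest package may or may not be installed, so the call is guarded; the example object `x` is made up for illustration.

```r
# Example object; any R object can be used here (illustrative only).
x <- list(a = 1:3, b = "text")

if (requireNamespace("digest", quietly = TRUE)) {
  # digest() serializes the object internally before hashing it.
  checksum <- digest::digest(x, algo = "md5")
  print(checksum)
} else {
  message("The 'digest' package is not installed")
}
```
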
@luciorq

luciorq commented Dec 11, 2022

I would love to see an update on that front. Is there any way to ping the R core members responsible for utils?

@tdeenes

tdeenes commented Dec 11, 2022

yup, I tried it ~2.5 years ago on r-devel without success... See the convo here

@yihui

yihui commented Sep 18, 2024

FYI, I asked an R core member offline in 2019, and he said it was not straightforward. Last week I had a precious chance to meet four R core members, and the good news is that the larger sample size worked: wch/r-source@c91b845. Hopefully we will be able to use tools::md5sum(bytes = raw()) in the next version of R, which means we can serialize() an object to a NULL connection, get the raw bytes, pass them to md5sum(), and obtain the MD5 checksum.
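The serialize-then-hash steps described above could be sketched roughly as follows. `md5_object()` is a hypothetical helper name, not something in base R. The sketch assumes the `bytes` argument is only present in newer versions of R, so it falls back to hashing a temporary file otherwise.

```r
# Hypothetical helper (not part of base R): MD5 checksum of an R object.
md5_object <- function(x) {
  # Serializing to a NULL connection returns the raw bytes directly.
  bytes <- serialize(x, connection = NULL)
  if ("bytes" %in% names(formals(tools::md5sum))) {
    # Assumed to be available only in newer R versions.
    unname(tools::md5sum(bytes = bytes))
  } else {
    # Fallback for older R: write the bytes to a temporary file and hash that.
    tmp <- tempfile()
    on.exit(unlink(tmp), add = TRUE)
    writeBin(bytes, tmp)
    unname(tools::md5sum(tmp))
  }
}

md5_object(1:10)
```

Note that the checksum covers the serialized bytes, including the serialization header, so it may not be comparable across R versions or with checksums produced by other packages.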

@yihui

yihui commented Sep 25, 2024

Again FYI, here comes sha256sum(): wch/r-source@a7c34a7
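Assuming sha256sum() ends up in the tools namespace with the same `bytes` argument as md5sum(), the SHA-256 counterpart might look like the sketch below. `sha256_object()` is a hypothetical helper name, and it returns NA on R versions where the function does not exist.

```r
# Hypothetical helper (not part of base R): SHA-256 checksum of an R object.
sha256_object <- function(x) {
  bytes <- serialize(x, connection = NULL)
  tools_ns <- asNamespace("tools")
  # sha256sum() is assumed to exist only in newer R versions.
  if (!exists("sha256sum", envir = tools_ns, inherits = FALSE)) {
    return(NA_character_)
  }
  sha256sum <- get("sha256sum", envir = tools_ns)
  unname(sha256sum(bytes = bytes))
}

sha256_object(mtcars)
```
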

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Oct 26, 2024
# CHANGES IN xfun VERSION 0.48

- Added utilities for HTML tags: `html_tag()`, `html_value()`,
  `html_escape()`, and `html_view()`. Removed the soft dependency on
  the **htmltools** package accordingly.

- `base_pkgs()` is faster now: it calls
  `tools::standard_package_names()` if the function exists (R >=
  4.4.0), otherwise it just returns a constant vector of base package
  names (thanks, @arnaudgallou, #91).

- Added a function `mime_type()` to obtain the MIME types of files via
  `mime::guess_type()` if **mime** is installed, otherwise it will
  call `tools:::mime_type()`, and fall back to using a system command
  (e.g., `file --mime-type`) to obtain the types.

- Added a function `file_rename()` to deal with `file.rename()`
  failures by calling `file.copy()` (thanks, @Giqles @katrinabrock,
  rstudio/bookdown#804).

- `new_app()` will use `utils::browseURL()` to open the app if
  `options('viewer')` is not configured (thanks, @AlbertLei,
  yihui/litedown#29).

- Added a method `record_print.record_asis()` to return the object as is.

# CHANGES IN xfun VERSION 0.47

- Added functions `lazy_save()` and `lazy_load()` to save objects to
  files and lazy-load them.

- Fixed a bug in `record(dev = svglite::svglite)` that misplaced plots
  when low-level plot functions are used (thanks, @liao961120,
  yihui/litedown#17).

- Specified the lowest R version required (v3.2.0) for this package.

# CHANGES IN xfun VERSION 0.46

- `md_table()` should add a vertical ellipsis to row names when rows
  are truncated by the `limit` argument.

- `session_info()` recognizes Positron now (thanks, @chuxinyuan, #89).

# CHANGES IN xfun VERSION 0.45

- For `record()` with `verbose = 1` or `2`, invisible `NULL` is no
  longer printed.

- `Rscript_call()` will show the actual error message (if an error
  occurred) while calling the function in a new R session.

# CHANGES IN xfun VERSION 0.44

- Added a function `cache_exec()` to cache the execution of an
  expression either in memory or on disk. It is much more general and
  flexible than `cache_rds()`. For example, it supports custom
  reading/writing methods for cache files, and can load locally
  created variables in the expression while loading cache.

- Added an argument `cache` to `record()` to make it possible to enable caching.

- Added arguments `message` and `warning` to `record()` to decide
  whether messages and warnings should be recorded.

- Changed the default value of the argument `error` of `record()` from
  `FALSE` to `NA`. Now `FALSE` means to suppress error messages, and
  `NA` means to throw errors normally. This is for consistency with
  the `message` and `warning` arguments.

- Added an S3 generic function `record_print()`, which is similar to
  `knitr::knit_print()` but for the purpose of printing visible values
  in `record()`.

- The `record()` function gained new arguments `print` and
  `print.args` to support custom printing functions and arguments.

- Added a function `md_table()`, which is a minimal Markdown table
  generator.

- Exported the internal function `md5()` to calculate the MD5
  checksums of R objects. The function is essentially a workaround for
  `tools::md5sum()` (see HenrikBengtsson/Wishlist-for-R#21).

- For `fenced_block()`, a space is added between the backticks and the
  language name, e.g., ```` ```r ```` has become ```` ``` r ````
  now. This will affect snapshot tests based on Markdown ([an
  example](yihui/knitr-examples@931e0a2)).

- Added a shorthand `fenced_div()` for `fenced_block(char = ':')`.

- `write_utf8()` returns the `con` argument (typically a file path)
  now. Previously, it returned `NULL`.

- Added an experimental function `new_app()` to create a local web application.

- The returned value of `yaml_body()` contains a new element `lines`
  in the list indicating the line numbers of YAML metadata, if it exists.

- Removed the `skip` argument from `split_source()`.

- For `split_source(line_number = TRUE)`, the attribute name for line
  numbers in the returned value was changed from `line_start` (a
  single starting line number) to `lines` (both the starting and
  ending numbers).

- Fixed an edge case in `prose_index()`, in which inline code was
  incorrectly recognized as a code block fence.

# CHANGES IN xfun VERSION 0.43

- Added a function `upload_imgur()`, which was adapted from
  `knitr::imgur_upload()`. The latter will call the former in the
  future. `xfun::upload_imgur()` allows users to choose whether to use
  the system command `curl` or the R package **curl** to upload the
  image. It also has a new argument `include_xml` to specify whether
  the XML response needs to be included in the returned value.

- Added a function `fenced_block()` to create a fenced block in
  Markdown (thanks, @cderv, yihui/knitr#2331). The block can be either
  a code block or a fenced Div.

- Fixed a bug in `xfun::record()` when the argument `verbose = 1` or `2`.
@HenrikBengtsson
Owner Author

This is excellent news. Now we just have to wait for R 4.5.0 to be released. I think we can close this wish.
