R port of 'universalchardet', Mozilla's character encoding detector library.

g4challenge/Ruchardet

Ruchardet

An R port of the uchardet library (http://code.google.com/p/uchardet/).

Install

# install.packages("devtools")  # if devtools is not installed yet
library(devtools)

install_github("haven-jeon/Ruchardet")

Example

library(Ruchardet, quietly = TRUE)

# Korean sample text ("Hello! This is Gogamja")
nm <- "안녕하세요! 고감자입니다"
benc <- detectEncoding(nm)
benc
## [1] "UTF-8"
# Re-encode to CP949; since CP949 is a superset of EUC-KR, the detector reports EUC-KR
nme <- iconv(nm, benc, "CP949")
detectEncoding(c(nm, nme))
## [1] "UTF-8"  "EUC-KR"
# Detect the encoding of a file whose encoding is unknown, then pass it to read.table()
unknown <- file.path(system.file("tests", package = "Ruchardet"), "shift_jis.txt")
read.table(unknown, fileEncoding = detectFileEncoding(unknown))
##                                                                                                                   V1
## 1 日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語
# detectFileEncoding() also accepts a URL (results depend on the page content at the time of the request)
detectFileEncoding("http://www.ppomppu.co.kr/")
## [1] "EUC-KR"
detectFileEncoding("http://freesearch.pe.kr")
## [1] "UTF-8"