update doc

DanChaltiel · Nov 30, 2023 · ce5ecdc
1 parent dffd718

Showing 6 changed files with 31 additions and 30 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -6,7 +6,7 @@ EDCimport is a package designed to easily import data from EDC software TrialMas
 
 # EDCimport 0.4.0 <sub><sup>2023/xx/xx</sup></sub>
 
-#### New features
+### New features
 
 - New function `check_subjid()` to check if a vector is not missing some patients (#8). 
 ```r
@@ -33,15 +33,15 @@ tibble(subjid=c(1:10, 1)) %>% assert_no_duplicate() %>% nrow()
 - You can now use the syntax `read_trialmaster(split_mixed=c("col1", "col2"))` to split only the datasets you need to (#10).
 
 
-#### Bug fixes & Improvements
+### Bug fixes & Improvements
 
 - Reading with `read_trialmaster()` from cache will output an error if parameters (`split_mixed`, `clean_names_fun`) are different (#4).
 
 - `split_mixed_datasets()` is now fully case-insensitive.  
 
 - Non-UTF8 characters in labels are now identified and corrected during reading (#5).
 
-#### Minor breaking changes
+### Minor breaking changes
 
 - `read_trialmaster(use_cache="write")` is now the default. Reading from cache is not stable yet, so you should opt-in rather than opt-out.
 
@@ -52,7 +52,7 @@ tibble(subjid=c(1:10, 1)) %>% assert_no_duplicate() %>% nrow()
 
 # EDCimport 0.3.0 <sub><sup>2023/05/19</sup></sub>
 
-#### New features
+### New features
 
 - New function `edc_swimmerplot()` to show a swimmer plot of all dates in the database and easily find outliers.
 
@@ -66,7 +66,7 @@ tibble(subjid=c(1:10, 1)) %>% assert_no_duplicate() %>% nrow()
 
 - New helper `unify()`, which turns a vector of duplicate values into a vector of length 1.
 
-#### Bug fixes
+### Bug fixes
 
 - Reading errors are now handled by `read_trialmaster()` instead of failing. If one XPT file is corrupted, the resulting object will contain the error message instead of the dataset.
 

diff --git a/R/helpers.R b/R/helpers.R
@@ -4,7 +4,7 @@
 # User helpers --------------------------------------------------------------------------------
 
 
-#' Find a keyword
+#' Find a keyword in the whole database
 #' 
 #' Find a keyword in all names and labels of a list of datasets. 
 #'
@@ -73,7 +73,7 @@ find_keyword = function(keyword, data=getOption("edc_lookup"), ignore_case=TRUE)
 
 
 
-#' Check completion of subject ID column
+#' Check the completion of the subject ID column
 #' 
 #' Compare a subject ID vector to the study's reference subject ID (usually something like `enrolres$subjid`).
 #'

diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@
 [![R-CMD-check](https://github.com/DanChaltiel/EDCimport/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/DanChaltiel/EDCimport/actions/workflows/check-standard.yaml)
 <!-- badges: end -->
 
-EDCimport is a package designed to easily import data from EDC software TrialMaster.
+EDCimport is a package designed to easily import data from EDC software [TrialMaster](https://www.anjusoftware.com/trial-master/).
 
 ## Installation
 
@@ -20,15 +20,15 @@ devtools::install_github("DanChaltiel/EDCimport")
 
 You will also need [`7-zip`](https://www.7-zip.org/download.html) installed, and preferably added to the [`PATH`](https://www.java.com/en/download/help/path.html).
 
-### Windows-only
 
-This package was developed to work on Windows and is unlikely to work on any other OS. Feel free to submit a PR if you manage to get it to work on another OS.
+> [!WARNING]
+> This package was developed to work on Windows and is unlikely to work on any other OS. 
+> You are very welcome to submit a PR if you manage to get it to work on Mac or Linux.
 
-## TrialMaster
 
 ### Load the data
 
-First, you need to request an export of type `SAS Xport`, with the checkbox "Include Codelists" ticked. This export should generate a `.zip` archive.
+Inside TrialMaster, you should request an export of type `SAS Xport`, with the checkbox "Include Codelists" ticked. This export should generate a `.zip` archive.
 
 Then, simply use `read_trialmaster()` with the archive password (if any) to retrieve the data from the archive:
 
@@ -37,7 +37,7 @@ library(EDCimport)
 tm = read_trialmaster("path/to/my/archive.zip", pw="foobar")
 ```
 
-The resulting object `tm` is a list containing all the datasets, plus the date of extraction (`datetime_extraction`) and a dataset summary (`.lookup`).
+The resulting object `tm` is a list containing all the datasets, plus metadata.
 
 You can now use `load_list()` to import the list in the global environment and use your tables:
 
@@ -46,23 +46,26 @@ load_list(tm) #this also removes `tm` to save memory
 mean(dataset1$column5)
 ```
 
-There are other options available, e.g. colnames cleaning & table splitting), see `?read_trialmaster` for more details.
+There are many other options available (e.g. colnames cleaning & table splitting), see `?read_trialmaster` for more details.
 
-## Utils
+### Database management tools
 
-`EDCimport` include a set of useful tools that help with using the imported database.
+`EDCimport` includes a set of useful tools that help with using the imported database. See [References](https://danchaltiel.github.io/EDCimport/reference/index.html) for a complete list.
 
-### Search the whole database
+#### Database summary
 
-`.lookup` is a dataframe containing for each dataset all its column names and labels.
+Reading a database using `read_trialmaster()` generates the `.lookup` dataframe, which contains for each dataset the number of rows, columns, patients, and the CRF name.
 
-Its main use is to work with `find_keyword()`. For instance, say you do not remember in which dataset and column is located the "date of ECG". `find_keyword()` will search every column name and label and will give you the answer:
+`.lookup` is used by many other tools inside EDCimport, so be careful not to modify or delete it.
 
-``` r
-find_keyword("date")
-```
+#### Search the whole database
+
+Using `find_keyword()`, you can run a global search of the database. 
+
+For instance, say you do not remember which dataset and column hold the "date of ECG". `find_keyword()` will search every column name and label and will give you the answer:
 
 ``` r
+find_keyword("date")
 #> # A tibble: 10 x 3
 #>    dataset names   labels                      
 #>    <chr>   <chr>   <chr>                       
@@ -78,8 +81,6 @@ find_keyword("date")
 #> 10 vs      VISITDT Visit Date
 ```
 
-Note that `find_keyword()` uses the `edc_lookup` option as its second argument, automatically set by `read_trialmaster()`.
-
 ### Swimmer Plot
 
 The `edc_swimmerplot()` function will create a swimmer plot of all date variables in the whole database.

diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -17,7 +17,7 @@ navbar:
 
 
 reference:
-- title: "Main function"
+- title: "Reading databases"
 - contents:
   - read_trialmaster
   - read_tm_all_xpt
@@ -31,15 +31,15 @@ reference:
   - edc_swimmerplot
 - title: "Helpers"
 - contents:
+  - find_keyword
   - assert_no_duplicate
   - check_subjid
   - unify
-  - extend_lookup
-  - find_keyword
-  - get_lookup
   - get_datasets
   - get_key_cols
   - split_mixed_datasets
+  - extend_lookup
+  - get_lookup
 - title: "List Utils"
 - contents:
   - load_list

diff --git a/man/check_subjid.Rd b/man/check_subjid.Rd
diff --git a/man/find_keyword.Rd b/man/find_keyword.Rd