Creating Modules • papercheck

library(papercheck)
#> 
#> 
#> *******************************************
#> ✅ Welcome to PaperCheck
#> For support and examples visit:
#> https://scienceverse.github.io/papercheck/
#> 
#> ⚠️ This is alpha software; please check any
#> results. False positives and negatives will
#> occur at unknown rates.
#> *******************************************

Modules are user-created patterns for checking a paper or set of papers. Module specifications are written in the same format as functions in R packages, using roxygen2 for documentation.

#' Module Name
#'
#' @description
#' A short description of the module
#'
#' @author Author Name (\email{name@email.com})
#'
#' @references
#' # Optional reference to include in reports
#'
#' @import dplyr
#'
#' @param paper a paper object or paperlist object
#' @param ... further arguments (not used)
#'
#' @returns a list with table, summary, traffic light, and report text
#'
#' @examples
#' module_run(psychsci, "module_name")
module_name <- function(paper, ...) {
  # detailed table of results ----
  pattern <- "significant"
  table <- search_text(paper, pattern)

  # summary output for paperlists ----
  # must have id column as the id of each paper, one row per paper
  # further columns to be added to a master summary table
  summary_table <- dplyr::count(table, id, name = "n_significant")

  # determine the traffic light ----
  # possible values: na, info, red, yellow, green, fail
  tl <- if (nrow(table)) "info" else "na"

  # report text for each possible traffic light ----
  report <- c(
    na = "Not applicable",
    info = "This table is provided for your information",
    red = "This is a potential problem",
    yellow = "There may be a problem",
    green = "No problems found",
    fail = "The check failed, sorry"
  )

  # return a list ----
  list(
    table = table,
    summary = summary_table,
    na_replace = 0,
    traffic_light = tl,
    report = report[[tl]]
  )
}

Roxygen Documentation

The module file starts with standard function documentation using roxygen2. Roxygen documentation always starts with #'.

Title

On the first line, give your module a short title, which will be used as a section header in reports.

#' Module Name

Description

After a blank line, write a one-sentence description, which will be shown in module_list(). You can optionally start it with the @description tag.

#' @description
#' A short description of the module

Details

You can write more detailed help under the tag @details, which will be shown when calling module_help(). This is optional.

#' @details
#' Here is more information about the module to help you use or understand it.
#' 
#' You can skip more lines to break up paragraphs.
#' 
#' * make a list
#' * check it twice

If you have experience writing R functions with roxygen, you can also omit the @description and @details tags and rely on paragraph spacing to distinguish description from details.

Author

Include the module authors so they can get credit! Add a new @author tag for each author, and optionally add their email address.

#' @author Lisa DeBruine (\email{debruine@gmail.com})
#' @author Daniel Lakens (\email{lakens@gmail.com})

References

Optionally include references that you would want available to users. If you are building a module that uses citable resources, please list them here.

#' @references
#' The Retraction Watch Database [Internet].
#' New York: The Center for Scientific Integrity. 2018.
#' ISSN: 2692-4579. [Cited 2025-05-20].
#' Available from: http://retractiondatabase.org/.

Import

If you are using packages other than papercheck, add each with an @import statement.

#' @import dplyr
#' @import tidyr

Technically, you can then use functions from these packages in your function code without the package name prefix, but it is still best practice to use the package name prefix for all functions, like dplyr::case_when().

Parameters

Each argument of the function should be documented with a @param tag. All papercheck modules require the first argument to be paper. The last argument can optionally be `...`, which allows the module_run() function to pass any further arguments; your code can then use them by name (e.g., extra_args <- list(...)).

#' @param paper a paper object or paperlist object
#' @param ... further arguments (not used)
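
As a minimal sketch of how a module might read an optional argument passed through `...`, the hypothetical function below looks for an argument named pattern and falls back to a default. The function name pattern_module, the pattern option, and the trimmed-down return value are assumptions made for illustration; they are not part of papercheck.

```r
# Hypothetical module: reads an optional `pattern` argument from `...`
pattern_module <- function(paper, ...) {
  extra_args <- list(...)

  # use the supplied pattern if there is one, otherwise a default
  pattern <- extra_args[["pattern"]]
  if (is.null(pattern)) pattern <- "significant"

  # a real module would return the full list described in this guide
  list(pattern = pattern)
}

# `paper` is unused in this sketch, so NULL stands in for a paper object
default_result <- pattern_module(NULL)
custom_result <- pattern_module(NULL, pattern = "p <")
```

Using `[[` rather than `$` avoids partial matching of argument names, so an argument called pat would not be mistaken for pattern.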

Returns

It is good practice to explain what your function returns. This is usually the default list with table, summary, traffic light, and report text, but you can edit this. It’s just a human-readable string.

#' @returns a list with table, summary, traffic light, and report text

Examples

You can add an example of how to use this module with the module_run() function. Use a single paper or a short list of papers, so that the example demonstrates the purpose of the module without taking too much time to run.

#' @examples
#' module_run(psychsci, "module_name")

Function Code

The module function is written like any R package function, with the requirement that the first argument be paper. Set module_name to your module name, which must be a valid R variable name. Your module script should also have the same name, with a .R suffix (e.g., module_name.R).

module_name <- function(paper, ...) {
  # detailed table of results ----
  # summary output for paperlists ----
  # determine the traffic light ----
  # report text for each possible traffic light ----
  # return a list ----
}

You can define helper functions below your main module function, but the first function defined in the script is the one that will be run on the paper object.
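
A minimal sketch of that layout is shown here, assuming the whole script is evaluated before the module is called. The helper name format_report and the trimmed-down return list are hypothetical. R only looks up format_report when module_name() actually runs, so defining the helper below the main function works.

```r
# module_name.R (hypothetical layout)

# the first function in the script is the module itself
module_name <- function(paper, ...) {
  tl <- "info"

  # trimmed-down return list, for illustration only
  list(
    traffic_light = tl,
    report = format_report(tl)
  )
}

# helper functions go below the main module function
format_report <- function(tl) {
  paste0("Traffic light: ", tl)
}

# `paper` is unused in this sketch, so NULL stands in for a paper object
result <- module_name(NULL)
```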

A module can technically do anything you want with the paper input, but you will need to follow the template below for your module to work automatically with reports and the metascience workflow.

If you are using your modules to build a report, you need to specify what type of output corresponds to good practice or practice that may need improvement. We do this through “traffic_light” and “report”.

Table

Most modules will need to structure their output in a table that can be shown in a report. The search_text() function below creates a table with a row for each sentence that contains the word “significant”.

  # detailed table of results ----
  pattern <- "significant"
  table <- search_text(paper, pattern)

You will need to make sure that your module works with both single paper objects and lists of paper objects. The papercheck functions search_text() and llm() are already vectorised for paper lists.

Summary

For the metascience workflow, it is useful to create a table with a row for each paper in a list, and some columns that summarise the results. You can use nested tables if you want some of your cells to contain multiple values.

  # summary output for paperlists ----
  # must have id column as the id of each paper, one row per paper
  # further columns to be added to a master summary table
  summary_table <- dplyr::count(table, id, name = "n_significant")

Your summary table might omit some papers from the whole list because no relevant text was found. You don’t have to add them into your table, as the module_run() function will do that automatically for you. However, you may want the values of your summary variables to be something other than NA for these missing papers. You can set the value of na_replace in the return list (below) to this default value. For example, if you are returning a summary of the count of sentences with the word “significant”, you can replace NAs with 0.

If you are returning more than one summary column and have different replacement values, use a named list.

na_replace <- list(
  n_significant = 0,
  paper_type = "unknown"
)
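
To make the effect of na_replace concrete, here is a base R sketch of the kind of filling described above. It is an illustration under stated assumptions, not the actual module_run() implementation: the paper ids, the example values, and the merge-then-fill approach are all made up for this example.

```r
# all paper ids in a hypothetical paperlist
all_ids <- data.frame(
  id = c("paper_a", "paper_b", "paper_c"),
  stringsAsFactors = FALSE
)

# summary returned by a module; paper_b had no matching sentences
summary_table <- data.frame(
  id = c("paper_a", "paper_c"),
  n_significant = c(3, 1),
  paper_type = c("empirical", "review"),
  stringsAsFactors = FALSE
)

na_replace <- list(
  n_significant = 0,
  paper_type = "unknown"
)

# add the missing papers back in; their summary columns start as NA
full_summary <- merge(all_ids, summary_table, by = "id", all.x = TRUE)

# replace NAs column by column using the named list
for (col in names(na_replace)) {
  is_missing <- is.na(full_summary[[col]])
  full_summary[[col]][is_missing] <- na_replace[[col]]
}
```

After this runs, paper_b has n_significant equal to 0 and paper_type equal to "unknown", while the rows for the other two papers are unchanged.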

Traffic Light

The traffic lights are used in single-paper reports to give a quick visual overview of the module results. There are 6 kinds of traffic lights:

🟢 no problems detected;
🟡 something to check;
🔴 possible problems detected;
🔵 informational only;
⚪️ not applicable;
⚫️ check failed

You will need to write some code to determine which traffic light applies to your case. If you don’t include a traffic light, but do include a table in the returned list, the following default rule will be applied: “info” if the table has any rows, and “na” otherwise.

  # determine the traffic light ----
  # possible values: na, info, red, yellow, green, fail
  tl <- if (nrow(table)) "info" else "na"
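
A module that makes a judgement, rather than only reporting information, needs its own rule. As a minimal sketch, the hypothetical helper below chooses a traffic light from a count of flagged sentences. The function name choose_traffic_light and the thresholds are arbitrary assumptions for illustration; pick cut-offs that make sense for your own check.

```r
# Hypothetical helper: map a count of flagged sentences to a traffic light
choose_traffic_light <- function(n_flagged) {
  if (is.na(n_flagged)) {
    "fail"    # the count could not be computed
  } else if (n_flagged == 0) {
    "green"   # nothing flagged
  } else if (n_flagged <= 2) {
    "yellow"  # a few sentences to check
  } else {
    "red"     # many flagged sentences
  }
}

tl_none <- choose_traffic_light(0)
tl_few <- choose_traffic_light(2)
tl_many <- choose_traffic_light(5)
tl_fail <- choose_traffic_light(NA)
```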

Report Text

Reports need to explain concepts or give resources for further learning. This is often specific to the outcome of a check, so you can use the pattern below to customise the report text for each traffic light.

  # report text for each possible traffic light ----
  report <- c(
    na = "Not applicable",
    info = "This table is provided for your information",
    red = "This is a potential problem",
    yellow = "There may be a problem",
    green = "No problems found",
    fail = "The check failed, sorry"
  )

Return

Structure the returned values in a list, with the names table, summary, na_replace, traffic_light and report.

  # return a list ----
  list(
    table = table,
    summary = summary_table,
    na_replace = 0,
    traffic_light = tl,
    report = report[[tl]]
  )
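
While developing a module, it can help to check the returned list before using it in a report. The helper below is a hypothetical sketch in base R, not a papercheck function: it lists which of the expected names are absent and checks that the traffic light, if present, is one of the six values used in this guide.

```r
# Hypothetical checker for a module's returned list
check_module_output <- function(x) {
  expected <- c("table", "summary", "na_replace", "traffic_light", "report")
  lights <- c("na", "info", "red", "yellow", "green", "fail")
  tl <- x[["traffic_light"]]

  list(
    # names from this guide that the list does not contain
    absent = setdiff(expected, names(x)),
    # a missing traffic light is allowed; a present one must be a known value
    valid_light = is.null(tl) || (length(tl) == 1 && tl %in% lights)
  )
}

# example output, with made-up values
output <- list(
  table = data.frame(id = "paper_a", text = "A significant effect."),
  summary = data.frame(id = "paper_a", n_significant = 1),
  na_replace = 0,
  traffic_light = "info",
  report = "This table is provided for your information"
)

check <- check_module_output(output)
bad_check <- check_module_output(list(traffic_light = "purple"))
```

Here check reports no absent names and a valid traffic light, while bad_check reports four absent names and an invalid traffic light.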