Modules are user-created patterns for checking a paper or set of papers. Module specifications are written in JSON format.
{
  "title": "List All P-Values",
  "description": "List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table."
}
Search Strategies
Modules can use three search strategies, each of which has its own JSON specification. All three take either a paper object or a search table returned by a previous step as input, making it easy to chain modules together.
Text Search
For checks that can be done with a simple text search, you can create a module that provides the arguments to the search_text() function. The “text” key takes a dictionary of the arguments to the search_text() function.
For example, a specification that searches for all instances of p-values in the text and returns just the matched text supplies the “pattern”, “return” and “perl” arguments, as in the “List All P-Values” module shown in the Traffic Lights section below.
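To get a feel for what such a pattern matches, the p-value regular expression used by the “List All P-Values” module later in this document can be tried in base R. This is only a sketch and does not call search_text(): the sample sentence is made up, and the character class is restricted to the ASCII operators (an assumption, to keep the example independent of locale).

```r
# P-value pattern from the "List All P-Values" module, with the
# comparator class reduced to ASCII operators (assumption for this sketch).
pattern <- "\\bp-?(value)?\\s*[<>=]{1,2}\\s*(n\\.?s\\.?|\\d?\\.\\d+)(e-\\d+)?"

# Made-up sample text.
txt <- "We found an effect, p = 0.04, and another, p < .001."

# Extract every match using Perl-compatible regular expressions.
matches <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
matches
```

Testing a pattern this way before putting it in a specification makes it easier to spot cases the pattern misses.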
Code
For checks that require a bit more logic, you can include R code directly or (more likely) reference an external .R file. The “code” key takes a dictionary containing either “code” (the R code itself) or a “path” to a file containing the code. You can optionally list the “packages” required so users can be prompted to install any that are unavailable.
{
  "title": "Check Status of OSF Links",
  "description": "List all OSF links and whether they are open, closed, or do not exist.",
  "code": {
    "packages": ["papercheck", "httr", "dplyr"],
    "path": "osf-check.R"
  }
}
Code must end with a list that contains values to return. The item table should be the table of returned values, and the optional item traffic_light contains the classification of the result (see below).
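As a minimal sketch of that return convention, the script below ends with a list holding table and traffic_light. The links data frame, its columns and the red/green rule are hypothetical stand-ins for whatever a real module would compute.

```r
# Hypothetical stand-in for data a module might have gathered.
links <- data.frame(
  text   = c("osf.io/abc12", "osf.io/xyz89"),
  status = c("open", "closed"),
  stringsAsFactors = FALSE
)

# Keep only the rows worth reporting.
closed <- links[links$status != "open", ]

# Build the list to return: the table plus an optional classification.
result <- list(
  table = closed,
  traffic_light = if (nrow(closed) > 0) "red" else "green"
)

# The final expression of the script is the returned list.
result
```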
Large Language Models
The “llm” key takes a dictionary of the arguments to the llm() function.
{
  "title": "Summarise Sections",
  "description": "Generate a 1-sentence summary for each section",
  "llm": {
    "query": "Summarise this section briefly, in one sentence. Do not include any preamble explaining what you are going to do, just give the sentence."
  }
}
Chaining Modules
You can run a chain of modules by including them in order. The first module takes the paper(s) as input, and each later module takes the table returned by the one before it. Here, we want to group the text by section before sending it to the LLM.
{
  "title": "Summarise Sections",
  "description": "Generate a 1-sentence summary for each section",
  "text": {
    "return": "section"
  },
  "llm": {
    "query": "Summarise this section briefly, in one sentence. Do not include any preamble explaining what you are going to do, just give the sentence."
  }
}
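The chaining idea (each step receives the table returned by the previous one) can be illustrated with plain R functions. This is a conceptual sketch, not how the package runs a chain: group_by_section and count_words are hypothetical helpers, the word count merely stands in for an LLM call, and the column names are assumptions.

```r
# Toy search table: one row per sentence (column names are assumptions).
sentences <- data.frame(
  section = c("intro", "intro", "method"),
  text    = c("Sleep matters.", "We test this.", "We ran a survey."),
  stringsAsFactors = FALSE
)

# Step 1: collapse the sentences of each section into a single row.
group_by_section <- function(tbl) {
  parts <- split(tbl$text, tbl$section)
  data.frame(
    section = names(parts),
    text = vapply(parts, paste, character(1), collapse = " "),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Step 2: stand-in for the LLM step; adds one answer per row.
count_words <- function(tbl) {
  tbl$answer <- lengths(strsplit(tbl$text, "\\s+"))
  tbl
}

# Run the steps in order, feeding each one the previous result.
steps <- list(group_by_section, count_words)
result <- Reduce(function(input, step) step(input), steps, sentences)
result
```

Because every step takes a table and returns a table, steps can be reordered or extended without changing the code that runs the chain.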
Report Info
If you are using your modules to build a report, you need to specify which type of output corresponds to good practice and which to practice that may need improvement. We do this through the “traffic_light” and “report” keys.
Traffic Lights
There are 6 kinds of traffic lights:
🟢 no problems detected;
🟡 something to check;
🔴 possible problems detected;
🔵 informational only;
⚪️ not applicable;
⚫️ check failed
The simplest way to set traffic lights is to specify the meaning of “found” and “not_found”. If the module produces at least one row in the output table, the traffic light takes the “found” value; otherwise it takes the “not_found” value.
{
  "title": "List All P-Values",
  "description": "List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.",
  "text": {
    "pattern": "\\bp-?(value)?\\s*[<>=≤≥]{1,2}\\s*(n\\.?s\\.?|\\d?\\.\\d+)(e-\\d+)?",
    "return": "match",
    "perl": true
  },
  "traffic_light": {
    "found": "info",
    "not_found": "na"
  }
}
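The “found”/“not_found” rule can be written out as a short R function. This is a sketch of the rule as described here, not the package's internal code; resolve_traffic_light is a hypothetical name.

```r
# Hypothetical helper: pick the traffic light from the row count.
resolve_traffic_light <- function(table, spec) {
  if (nrow(table) > 0) spec$found else spec$not_found
}

# Mapping taken from the "traffic_light" key of the specification.
spec <- list(found = "info", not_found = "na")

# A table with matches and an empty table with the same columns.
hits <- data.frame(text = c("p = 0.04", "p < .001"))
no_hits <- hits[0, , drop = FALSE]

light_hits <- resolve_traffic_light(hits, spec)
light_none <- resolve_traffic_light(no_hits, spec)
```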
If you are using the “code” type, you can also specify the traffic light in the returned list.
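# code for imprecise-p module
# get the table of p-values from the all-p-values module
p <- module_run(paper, "all-p-values")$table
# extract the comparator (e.g., "<", "=", ">") from the matched text
p$p_comp <- gsub("p-?(value)?\\s*|\\s*\\d?\\.\\d+e?-?\\d*", "", p$text)
# extract the value; non-numeric values such as "n.s." become NA
p$p_value <- gsub("^p-?(value)?\\s*[<>=≤≥]{1,2}\\s*", "", p$text)
p$p_value <- suppressWarnings(as.numeric(p$p_value))
# imprecise: "p < x" with x > .001, any "p > x", or a non-numeric value
p$imprecise <- p$p_comp == "<" & p$p_value > .001
p$imprecise <- p$imprecise | p$p_comp == ">"
p$imprecise <- p$imprecise | is.na(p$p_value)
# columns to keep in the returned table
cols <- c("text", "section", "header", "div", "p", "s", "id")
# classify the result
if (nrow(p) == 0) {
  tl <- "na"
} else if (any(p$imprecise)) {
  tl <- "red"
} else if (!all(p$imprecise)) {
  tl <- "green"
} else {
  tl <- "yellow"
}
# return only the imprecise p-values, plus the traffic light
list(
  table = p[p$imprecise, cols],
  traffic_light = tl
)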
Report
Any text that you want to include in the report should be specified in the “report” key. You can set different text for each traffic light, and/or text to include in all reports.
{
  "title": "Reference Consistency",
  "description": "Check if all references are cited and all citations are referenced",
  "code": {
    "packages": ["dplyr"],
    "path": "ref-consistency.R"
  },
  "report": {
    "all": "This module relies on Grobid correctly parsing the references. There may be some false positives.",
    "red": "There are references that are not cited or citations that are not referenced",
    "green": "All references were cited and citations were referenced",
    "na": "No citations/references were detected"
  }
}
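One way to picture how the “report” key might be used is a function that combines the text for the current traffic light with the “all” text. This is a hypothetical sketch: build_report_text is not a package function, and the order in which the two pieces are joined is an assumption.

```r
# Hypothetical helper: light-specific text (if any) followed by "all".
build_report_text <- function(report, traffic_light) {
  # Missing entries are NULL and drop out of the vector.
  parts <- c(report[[traffic_light]], report[["all"]])
  paste(parts, collapse = " ")
}

# Shortened example texts (made up for this sketch).
report <- list(
  all   = "There may be some false positives.",
  red   = "Some references are not cited.",
  green = "All references were cited."
)

red_text    <- build_report_text(report, "red")
yellow_text <- build_report_text(report, "yellow")
```

Here a traffic light without its own entry, such as "yellow", falls back to the “all” text alone.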
Authors
You can also include author information in the following format:
{
  "title": "List All P-Values",
  "description": "List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.",
  "authors": [{
    "orcid": "0000-0002-7523-5539",
    "name": {
      "surname": "DeBruine",
      "given": "Lisa"
    },
    "email": "debruine@gmail.com"
  }],
  "text": {
    "pattern": "\\bp-?(value)?\\s*[<>=≤≥]{1,2}\\s*(n\\.?s\\.?|\\d?\\.\\d+)(e-\\d+)?",
    "return": "match",
    "perl": true
  },
  "traffic_light": {
    "found": "info",
    "not_found": "na"
  }
}