Installation
You can install the development version of papercheck from GitHub with:
# install.packages("devtools")
devtools::install_github("scienceverse/papercheck")
#>
#>
#> *******************************************
#> ✅ Welcome to PaperCheck
#> For support and examples visit:
#> https://scienceverse.github.io/papercheck/
#>
#> ⚠️ This is alpha software; please check any
#> results. False positives and negatives will
#> occur at unknown rates.
#> *******************************************
You can also launch an interactive Shiny app version of the functionality below.
Load from PDF
The function pdf2grobid() can read PDF files and save them in the TEI format created by grobid. This requires an internet connection and takes a few seconds per paper, so it should only be done once, with the results saved for later use. If the server is unavailable, you can use a grobid web interface.
pdf_file <- demopdf()
xml_file <- pdf2grobid(pdf_file)
You can set up your own local grobid server by following the instructions at https://grobid.readthedocs.io/; the easiest way is to use Docker. Then set your grobid_url to the local path http://localhost:8070.
xml_file <- pdf2grobid(pdf_file, grobid_url = "http://localhost:8070")
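For example, a local server can be started with Docker; the image name and tag below are illustrative, so check the grobid documentation for the currently recommended image:

```shell
# Run a local grobid server on port 8070
# (image/tag may differ; see https://grobid.readthedocs.io/)
docker run --rm --init -p 8070:8070 lfoppiano/grobid:0.8.0
```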
Load from XML
The function read_grobid() can read XML files parsed by grobid.
paper <- read_grobid(xml_file)
The function read_cermine() can read XML files parsed by cermine. Cermine is not as good as grobid at parsing papers, and omits figure and table captions.
cermine_xml_file <- system.file("psychsci/0956797620955209.cermine.xml",
package = "papercheck")
paper <- read_cermine(cermine_xml_file)
Load from non-PDF document
To take advantage of grobid’s ability to parse references and other aspects of papers, the best approach for now is to convert your papers to PDF. However, papercheck can read plain text from a character object or a text/docx file with read_text().
text <- "Abstract
This is my very short paper. It has two sentences."
shortpaper <- read_text(text, id = "shortpaper")
shortpaper$full_text
#> # A tibble: 3 × 7
#> text section header div p s id
#> <chr> <chr> <chr> <int> <dbl> <int> <chr>
#> 1 Abstract abstract Abstract 1 0 1 shortpaper
#> 2 This is my very short paper. abstract Abstract 1 1 1 shortpaper
#> 3 It has two sentences. abstract Abstract 1 1 2 shortpaper
filename <- system.file("extdata/to_err_is_human.docx",
package = "papercheck")
paper_from_doc <- read_text(filename)
Batch Processing
The functions pdf2grobid() and read_grobid() also work on a folder of files, returning a list of XML file paths or paper objects, respectively. The functions search_text(), expand_text() and llm() also work on a list of paper objects.
grobid_dir <- demodir()
papers <- read_grobid(grobid_dir)
hypotheses <- search_text(papers, "hypothesi",
section = "intro",
return = "paragraph")
Paper Components
Paper objects contain a lot of structured information, including info, references, and citations.
Info
paper$info
#> $filename
#> [1] "/private/var/folders/t6/7x6md_5s2j5bfb324s784yzw0000gn/T/RtmpLA1jfq/temp_libpath7e3d3b0651/papercheck/extdata/to_err_is_human.xml"
#>
#> $title
#> [1] "To Err is Human: An Empirical Investigation"
#>
#> $description
#> [1] "This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. All data are simulated. The paper shows examples of (1) open and closed OSF links; (2) citation of retracted papers; (3) missing/mismatched citations and references; (4) imprecise reporting of p-values; and (5) use of \"marginally significant\" to describe non-significant findings."
#>
#> $keywords
#> character(0)
References
References are provided in a tabular format.
paper$references
bib_id | ref | doi | bibtype | title | journal | year | authors |
---|---|---|---|---|---|---|---|
b0 | Gino F, Wiltermuth SS (2014). “Retracted: Evil Genius? How Dishonesty Can Lead to Greater Creativity.” Psychological Science, 25(4), 973-981. doi:10.1177/0956797614520714 https://doi.org/10.1177/0956797614520714. | 10.1177/0956797614520714 | Article | Retracted: Evil Genius? How Dishonesty Can Lead to Greater Creativity | Psychological Science | 2014 | F Gino, S S Wiltermuth |
b1 | Smith F (2021). “Human error is a symptom of a poor design.” Journal of Journals, 0(0), 0. doi:10.0000/0123456789 https://doi.org/10.0000/0123456789. | 10.0000/0123456789 | Article | Human error is a symptom of a poor design | Journal of Journals | 2021 | F Smith |
Citations
Citations are also provided in a tabular format, with a bib_id to match the references.
paper$citations
bib_id | text |
---|---|
b0 | Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. |
(Smithy, 2020) | From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020). |
Batch
There are functions to combine the information from a list of papers, like the psychsci built-in dataset of 250 open-access papers from Psychological Science.
info_table(psychsci[1:5], c("title", "doi"))
#> # A tibble: 5 × 3
#> id title doi
#> <chr> <chr> <chr>
#> 1 0956797613520608 Continuous Theta-Burst Stimulation Demonstrates a Caus… 10.1…
#> 2 0956797614522816 Beyond Gist: Strategic and Incremental Information Acc… 10.1…
#> 3 0956797614527830 Serotonin and Social Norms: Tryptophan Depletion Impai… 10.1…
#> 4 0956797614557697 Action-Specific Disruption of Perceptual Confidence 10.1…
#> 5 0956797614560771 Emotional Vocalizations Are Recognized Across Cultures… 10.1…
concat_tables(psychsci[1:5], "references") |>
dplyr::filter(!is.na(doi))
#> bib_id
#> 1 b40
#> 2 b0
#> 3 b1
#> 4 b2
#> 5 b3
#> 6 b4
#> ref
#> 1 Zylberberg A, Barttfeld P, Sigman M (2012). “The construction of confidence in a perceptual decision.” _Frontiers in Integrative Neuroscience_, *6*. doi:10.3389/fnint.2012.00079/full <https://doi.org/10.3389/fnint.2012.00079/full>.
#> 2 Ekman P (1994). “Strong evidence for universals in facial expressions: A reply to Russell's mistaken critique.” _Psychological Bulletin_, *115*, 268-287. doi:10.1037/0033-2909.115 <https://doi.org/10.1037/0033-2909.115>.
#> 3 Gendron M, Roberson D, Van Der Vyver JM, Barrett LF (2014). “Cultural relativity in perceiving emotion from vocalizations.” _Psychological Science_, *25*, 911-920. doi:10.1177/0956797613517239 <https://doi.org/10.1177/0956797613517239>.
#> 4 Russell JA (1994). “Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies.” _Psychological Bulletin_, *115*, 102-141. doi:10.1037/0033-2909.115.1.102 <https://doi.org/10.1037/0033-2909.115.1.102>.
#> 5 Sauter DA, Eisner F, Calder AJ, Scott SK (2010). “Perceptual cues in non-verbal vocal expressions of emotion.” _The Quarterly Journal of Experimental Psychology_, *63*, 2251-2272. doi:10.1080/17470211003721642 <https://doi.org/10.1080/17470211003721642>.
#> 6 Sauter DA, Eisner F, Ekman P, Scott SK (2010). “Crosscultural recognition of basic emotions through nonverbal emotional vocalizations.” _Proceedings of the National Academy of Sciences_, *107*, 2408-2412. doi:10.1073/pnas.0908239106 <https://doi.org/10.1073/pnas.0908239106>.
#> doi bibtype
#> 1 10.3389/fnint.2012.00079/full Article
#> 2 10.1037/0033-2909.115 Article
#> 3 10.1177/0956797613517239 Article
#> 4 10.1037/0033-2909.115.1.102 Article
#> 5 10.1080/17470211003721642 Article
#> 6 10.1073/pnas.0908239106 Article
#> title
#> 1 The construction of confidence in a perceptual decision
#> 2 Strong evidence for universals in facial expressions: A reply to Russell's mistaken critique
#> 3 Cultural relativity in perceiving emotion from vocalizations
#> 4 Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies
#> 5 Perceptual cues in non-verbal vocal expressions of emotion
#> 6 Crosscultural recognition of basic emotions through nonverbal emotional vocalizations
#> journal year
#> 1 Frontiers in Integrative Neuroscience 2012
#> 2 Psychological Bulletin 1994
#> 3 Psychological Science 2014
#> 4 Psychological Bulletin 1994
#> 5 The Quarterly Journal of Experimental Psychology 2010
#> 6 Proceedings of the National Academy of Sciences 2010
#> authors id
#> 1 A Zylberberg, P Barttfeld, M Sigman 0956797614557697
#> 2 P Ekman 0956797614560771
#> 3 M Gendron, D Roberson, J M Van Der Vyver, L F Barrett 0956797614560771
#> 4 J A Russell 0956797614560771
#> 5 D A Sauter, F Eisner, A J Calder, S K Scott 0956797614560771
#> 6 D A Sauter, F Eisner, P Ekman, S K Scott 0956797614560771
concat_tables(psychsci[1:40], "citations") |>
dplyr::filter(grepl("replicat", text)) |>
dplyr::count(id, text)
#> id
#> 1 0956797614560771
#> 2 0956797616665351
#> 3 0956797617702699
#> 4 0956797617705667
#> 5 0956797617716922
#> text
#> 1 1 We reanalyzed the data from the 29 Himba participants in our original study who had heard British vocalizations, the part of our study that Gendron et al. (2014) replicated.
#> 2 At a group level, therefore, the perceptual data replicated previous findings (Freeman et al., 2010;Hürlimann et al., 2002;Sotiropoulos et al., 2014;Stocker & Simoncelli, 2006;Thompson, 1982).
#> 3 A study by Papesh (2015) is particularly relevant: She failed to replicate the findings of Glenberg and Kaschak (2002)-a key study that we have highlighted here as well.
#> 4 This replicates the main finding of Beck et al. (2012), who also used displays with both relevant colors present.
#> 5 Though we await replication of our findings, we see the current data as promising initial evidence for the coevolved proximal mechanisms through which oxytocin facilitates potentially life-enhancing connections (Holt-Lunstad et al., 2010).
#> n
#> 1 1
#> 2 5
#> 3 2
#> 4 1
#> 5 1
Search Text
You can access a parsed table of the full text of the paper via paper$full_text, but you may find it more convenient to use the function search_text(). The defaults return a data table of each sentence, with the section type, header, div, paragraph and sentence numbers, and file name. (The section type is a best guess from the headers, so may not always be accurate.)
text <- search_text(paper)
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. | abstract | Abstract | 0 | 1 | 1 | to_err_is_human.xml |
Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. | intro | Introduction | 1 | 1 | 1 | to_err_is_human.xml |
In this study we examine whether automated checks reduce the amount of errors that researchers make in scientific manuscripts. | method | Method and Participants | 2 | 1 | 1 | to_err_is_human.xml |
All data needed to reproduce these analyses is available from https://osf.io/5tbm9 and code is available from https://osf.io/629bx. | results | Results | 3 | 1 | 1 | to_err_is_human.xml |
It seems automated tools can help prevent errors by providing researchers with feedback about potential mistakes, and researchers feel the app is useful. | discussion | Discussion | 4 | 1 | 1 | to_err_is_human.xml |
Pattern
You can search for a specific word or phrase by setting the pattern argument. The pattern is a regex string by default; set fixed = TRUE if you want to find exact text matches.
text <- search_text(paper, pattern = "papercheck")
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. | abstract | Abstract | 0 | 1 | 1 | to_err_is_human.xml |
In this study we examine the usefulness of Papercheck to improve best practices. | intro | Introduction | 1 | 1 | 4 | to_err_is_human.xml |
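The difference matters whenever your search term contains regex metacharacters. A quick base-R illustration, independent of papercheck:

```r
x <- c("p = 0.05", "p = 0x05")

# as a regex, "." matches any character, so both strings match
grepl("p = 0.05", x)
#> [1] TRUE TRUE

# with fixed matching, only the literal text matches
grepl("p = 0.05", x, fixed = TRUE)
#> [1]  TRUE FALSE
```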
Section
Set section to a vector of the sections to search in.
text <- search_text(paper, "papercheck",
section = "abstract")
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. | abstract | Abstract | 0 | 1 | 1 | to_err_is_human.xml |
Return
Set return to one of “sentence”, “paragraph”, “section”, or “match” to control what gets returned.
text <- search_text(paper, "papercheck",
section = "intro",
return = "paragraph")
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020). Automation can be use to check for errors in scientific manuscripts, and inform authors about possible corrections. In this study we examine the usefulness of Papercheck to improve best practices. | intro | Introduction | 1 | 1 | NA | to_err_is_human.xml |
Regex matches
You can also return just the matched text from a regex search by setting return = "match". The extra ... arguments in search_text() are passed to grep(), so perl = TRUE allows you to use more complex regex, like below.
pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.,-]*\\d"
text <- search_text(paper, pattern, return = "match", perl = TRUE)
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
M = 9.12 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
M = 10.9 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
t(97.7) = 2.9 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
p = 0.005 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
M = 5.06 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
M = 4.5 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
t(97.2) = -1.96 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
p = 0.152 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
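Because the extra arguments are passed to grep(), you can prototype a pattern in base R before handing it to search_text(). A quick check on a made-up sentence fragment (not part of papercheck):

```r
pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.,-]*\\d"
x <- "t(97.7) = 2.9, p = 0.005"

# extract every match, as return = "match" would
regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
#> [1] "t(97.7) = 2.9" "p = 0.005"
```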
Expand Text
You can expand the text returned by search_text() or a module with expand_text().
marginal <- search_text(paper, "marginal") |>
expand_text(paper, plus = 1, minus = 1)
marginal[, c("text", "expanded")]
#> # A tibble: 2 × 2
#> text expanded
#> <chr> <chr>
#> 1 "The paper shows examples of (1) open and closed OSF links; (2) cita… "All da…
#> 2 "On average researchers in the experimental condition found the app … "On ave…
Large Language Models
You can query the extracted text of papers with LLMs using Groq.
Setup
You will need to get your own API key from https://console.groq.com/keys. To avoid having to type it out, add it to the .Renviron file in the following format (you can use usethis::edit_r_environ() to access the .Renviron file).
# useful if you aren't sure where this file is
usethis::edit_r_environ()
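The .Renviron entry is a plain KEY=value line. The variable name below follows the Groq convention and is an assumption here; check ?llm for the exact name papercheck expects.

```
GROQ_API_KEY="your-key-here"
```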
You can get or set the default LLM model with llm_model(), and access a list of the currently available models with llm_model_list().
id | owned_by | created | context_window |
---|---|---|---|
mistral-saba-24b | Mistral AI | 2025-02-19 | 32768 |
qwen-2.5-coder-32b | Alibaba Cloud | 2025-02-14 | 131072 |
deepseek-r1-distill-qwen-32b | DeepSeek / Alibaba Cloud | 2025-02-07 | 131072 |
qwen-2.5-32b | Alibaba Cloud | 2025-02-05 | 131072 |
deepseek-r1-distill-llama-70b | DeepSeek / Meta | 2025-01-26 | 131072 |
llama-3.3-70b-specdec | Meta | 2024-12-06 | 8192 |
llama-3.3-70b-versatile | Meta | 2024-12-06 | 32768 |
llama-3.2-3b-preview | Meta | 2024-09-25 | 8192 |
llama-3.2-1b-preview | Meta | 2024-09-25 | 8192 |
llama-guard-3-8b | Meta | 2023-09-03 | 8192 |
mixtral-8x7b-32768 | Mistral AI | 2023-09-03 | 32768 |
llama-3.1-8b-instant | Meta | 2023-09-03 | 131072 |
llama3-8b-8192 | Meta | 2023-09-03 | 8192 |
gemma2-9b-it | Google | 2023-09-03 | 8192 |
llama3-70b-8192 | Meta | 2023-09-03 | 8192 |
LLM Queries
See ?llm for details of how to get and set up your API key, choose an LLM, and adjust settings.
Use search_text() first to narrow down the text to what you want to query. Below, we limited the search to the first ten papers’ method sections, returning sentences that contain the word “power” and at least one number. Then we asked an LLM to determine whether each sentence reports an a priori power analysis and, if so, to return the relevant values in a JSON-structured format.
power <- psychsci[1:10] |>
# sentences containing the word power
search_text("power", section = "method") |>
# and containing at least one number
search_text("[0-9]")
# ask a specific question with specific response format
query <- 'Does this sentence report an a priori power analysis? If so, return the test, sample size, critical alpha criterion, power level, effect size and effect size metric plus any other relevant parameters, in JSON format like:
{
"apriori": true,
"test": "paired samples t-test",
"sample": 20,
"alpha": 0.05,
"power": 0.8,
"es": 0.4,
"es_metric": "cohen\'s D"
}
If not, return {"apriori": false}
Answer only in valid JSON format, starting with { and ending with }.'
llm_power <- llm(power, query, seed = 8675309)
Expand JSON
It is useful to ask an LLM to return data in a JSON-structured format, but it can be frustrating to extract the data, especially when the LLM makes syntax mistakes. The function json_expand() tries to expand a column with a JSON-formatted response into columns, and deals gracefully with errors (setting an ‘error’ column to “parsing error”). It also fixes column data types, if possible.
llm_response <- json_expand(llm_power, "answer") |>
dplyr::select(text, apriori:es_metric)
text | apriori | test | sample | alpha | power | es | es_metric |
---|---|---|---|---|---|---|---|
Sample size was calculated with an a priori power analysis, using the effect sizes reported by Küpper et al. (2014), who used identical procedures, materials, and dependent measures. | TRUE | NA | NA | NA | NA | NA | NA |
We determined that a minimum sample size of 7 per group would be necessary for 95% power to detect an effect. | TRUE | NA | 7 | NA | 0.95 | NA | NA |
For the first part of the task, 11 static visual images, one from each of the scenes in the film were presented once each on a black background for 2 s using Power-Point. | FALSE | NA | NA | NA | NA | NA | NA |
A sample size of 26 per group was required to ensure 80% power to detect this difference at the 5% significance level. | TRUE | two-sample t-test | 26 | 0.050 | 0.80 | NA | NA |
A sample size of 18 per condition was required in order to ensure an 80% power to detect this difference at the 5% significance level. | TRUE | t-test | 18 | 0.050 | 0.80 | NA | NA |
The 13,500 selected loan requests conservatively achieved a power of .98 for an effect size of .07 at an alpha level of .05. | TRUE | NA | 13500 | 0.050 | 0.98 | 0.07 | NA |
On the basis of simulations over a range of expected effect sizes for contrasts of fMRI activity, we estimated that a sample size of 24 would provide .80 power at a conservative brainwide alpha threshold of .002 (although such thresholds ideally should be relaxed for detecting activity in regions where an effect is predicted). | TRUE | unknown | 24 | 0.002 | 0.80 | unknown | unknown |
Stimulus sample size was determined via power analysis of the sole existing similar study, which used neural activity to predict Internet downloads of music (Berns & Moore, 2012). | TRUE | NA | NA | NA | NA | NA | NA |
The effect size from that study implied that a sample size of 72 loan requests would be required to achieve .80 power at an alpha level of .05. | TRUE | NA | 72 | 0.050 | 0.80 | NA | NA |
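As a rough sketch of what such error handling involves, you can wrap jsonlite parsing in tryCatch() — this is an illustration only, not papercheck’s actual implementation:

```r
library(jsonlite)

# try to parse an LLM answer; fall back to an error marker
# instead of failing when the JSON is malformed
parse_answer <- function(answer) {
  tryCatch(
    as.data.frame(fromJSON(answer)),
    error = function(e) data.frame(error = "parsing error")
  )
}

parse_answer('{"apriori": true, "sample": 26}')  # one-row data frame
parse_answer('{"apriori": broken')               # error column instead
```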
Rate Limiting
Remember, the llm() function makes a separate query for each row in a data frame from search_text(). Free Groq accounts are rate limited, so we set the default limit to 30 queries, but you can change this:
llm_max_calls(30)
If you hit your rate limit, the llm() function will add a short delay between calls, so don’t worry if you notice the query speed slowing down after the first 30 calls.
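The throttling behaviour described above can be sketched in a few lines — an illustration of the idea, not papercheck’s actual code:

```r
# call each query function in turn, pausing once past the free-tier quota
run_throttled <- function(queries, max_calls = 30, delay = 2) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    if (i > max_calls) Sys.sleep(delay)  # short delay after the first 30 calls
    results[[i]] <- queries[[i]]()
  }
  results
}
```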
Modules
Papercheck is designed modularly, so you can add modules to check for anything. It comes with a set of pre-defined modules, and we hope people will share more modules.
Module List
You can see the list of built-in modules below.
#> * all_p_values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04')
#> and document location in a table.
#>
#> Note that this will not catch p-values reported like "the p-value is 0.03" because that results in a ton of false positives when papers discuss p-value thresholds. If you need to detect text like that, use `search_text()` function and a custom pattern like "\\bp(-| )?values?\\s+.{1,20}\\s+[0-9\\.]+"
#>
#> This will catch most comparators like =<>~≈≠≤≥≪≫ and most versions of scientific notation like 5.0 x 10^-2 or 5.0e-2. If you find any formats that are not correctly handled by this function, please contact the author.
#> * all_urls: List all the URLs in the main text.
#> * effect_size: Detect t-tests and F-tests with missing effect sizes
#> * exact_p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
#> * marginal: List all sentences that describe an effect as 'marginally significant'.
#> * osf_check: List all OSF links and whether they are open, closed, or do not exist.
#> * ref_consistency: Check if all references are cited and all citations are referenced
#> * retractionwatch: Flag any cited papers in the RetractionWatch database
#> * statcheck: Check consistency of p-values and test statistics
#>
#> Use `module_help("module_name")` for help with a specific module
Running modules
To run a built-in module on a paper, you can reference it by name.
p <- module_run(paper, "all_p_values")
text | section | header | div | p | s | id | p_comp | p_value |
---|---|---|---|---|---|---|---|---|
p = 0.005 | results | Results | 3 | 1 | 2 | to_err_is_human.xml | = | 0.005 |
p = 0.152 | results | Results | 3 | 2 | 1 | to_err_is_human.xml | = | 0.152 |
p > .05 | results | Results | 3 | 2 | 2 | to_err_is_human.xml | > | 0.050 |
Creating modules
You can create your own modules using R code. Modules can also contain instructions for reporting, to give “traffic lights” for whether a check passed or failed, and to include appropriate text feedback in a report. See the modules vignette for more details.
Reports
You can generate a report from any set of modules. The default set is c("exact_p", "marginal", "effect_size", "osf_check", "retractionwatch", "ref_consistency").
report(paper, output_format = "qmd")
See the example report.