
Installation

You can install the development version of papercheck from GitHub with:

# install.packages("devtools")
devtools::install_github("scienceverse/papercheck")
#> 
#> 
#> *******************************************
#> ✅ Welcome to PaperCheck
#> For support and examples visit:
#> https://scienceverse.github.io/papercheck/
#> 
#> ⚠️ This is alpha software; please check any
#> results. False positives and negatives will
#> occur at unknown rates.
#> *******************************************

You can also launch an interactive Shiny app version of the code below.

Load from PDF

The function pdf2grobid() can read PDF files and save them in the TEI format created by grobid. This requires an internet connection and takes a few seconds per paper, so it should only be done once, with the results saved for later use.

If the server is unavailable, you can use a grobid web interface.

pdf_file <- demopdf()
xml_file <- pdf2grobid(pdf_file)

You can set up your own local grobid server following instructions from https://grobid.readthedocs.io/. The easiest way is to use Docker.

docker run --rm --init --ulimit core=0 -p 8070:8070 lfoppiano/grobid:0.8.1

Then you can set the grobid_url argument to your local server, http://localhost:8070.

xml_file <- pdf2grobid(pdf_file, grobid_url = "http://localhost:8070")

Load from XML

The function read_grobid() can read XML files parsed by grobid.

paper <- read_grobid(xml_file)

The function read_cermine() can read XML files parsed by cermine. Cermine is not as good as grobid at parsing papers, and it omits figure and table captions.

cermine_xml_file <- system.file("psychsci/0956797620955209.cermine.xml",
                                package = "papercheck")
paper <- read_cermine(cermine_xml_file)

Load from non-PDF document

To take advantage of grobid’s ability to parse references and other aspects of papers, the best approach for now is to convert your papers to PDF. However, papercheck can read plain text from a character object, or from a text or docx file, with read_text().

text <- "Abstract

This is my very short paper. It has two sentences."
shortpaper <- read_text(text, id = "shortpaper")
shortpaper$full_text
#> # A tibble: 3 × 7
#>   text                         section  header     div     p     s id        
#>   <chr>                        <chr>    <chr>    <int> <dbl> <int> <chr>     
#> 1 Abstract                     abstract Abstract     1     0     1 shortpaper
#> 2 This is my very short paper. abstract Abstract     1     1     1 shortpaper
#> 3 It has two sentences.        abstract Abstract     1     1     2 shortpaper
filename <- system.file("extdata/to_err_is_human.docx", 
                        package = "papercheck")
paper_from_doc <- read_text(filename)

Batch Processing

The functions pdf2grobid() and read_grobid() also work on a folder of files, returning a list of XML file paths or paper objects, respectively. The functions search_text(), expand_text() and llm() also work on a list of paper objects.

grobid_dir <- demodir()

papers <- read_grobid(grobid_dir)

hypotheses <- search_text(papers, "hypothesi", 
                          section = "intro", 
                          return = "paragraph")

Paper Components

Paper objects contain a lot of structured information, including info, references, and citations.

Info

paper$info
#> $filename
#> [1] "/private/var/folders/t6/7x6md_5s2j5bfb324s784yzw0000gn/T/RtmpLA1jfq/temp_libpath7e3d3b0651/papercheck/extdata/to_err_is_human.xml"
#> 
#> $title
#> [1] "To Err is Human: An Empirical Investigation"
#> 
#> $description
#> [1] "This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. All data are simulated. The paper shows examples of (1) open and closed OSF links; (2) citation of retracted papers; (3) missing/mismatched citations and references; (4) imprecise reporting of p-values; and (5) use of \"marginally significant\" to describe non-significant findings."
#> 
#> $keywords
#> character(0)

References

References are provided in a tabular format.

paper$references
| bib_id | ref | doi | bibtype | title | journal | year | authors |
|--------|-----|-----|---------|-------|---------|------|---------|
| b0 | Gino F, Wiltermuth SS (2014). “Retracted: Evil Genius? How Dishonesty Can Lead to Greater Creativity.” Psychological Science, 25(4), 973-981. doi:10.1177/0956797614520714 https://doi.org/10.1177/0956797614520714. | 10.1177/0956797614520714 | Article | Retracted: Evil Genius? How Dishonesty Can Lead to Greater Creativity | Psychological Science | 2014 | F Gino, S S Wiltermuth |
| b1 | Smith F (2021). “Human error is a symptom of a poor design.” Journal of Journals, 0(0), 0. doi:10.0000/0123456789 https://doi.org/10.0000/0123456789. | 10.0000/0123456789 | Article | Human error is a symptom of a poor design | Journal of Journals | 2021 | F Smith |

Citations

Citations are also provided in a tabular format, with a bib_id to match the references.

paper$citations
| bib_id | text |
|--------|------|
| b0 | Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. |
| NA | From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020). |

Batch

There are functions to combine the information from a list of papers, such as the built-in psychsci dataset of 250 open-access papers from Psychological Science.

info_table(psychsci[1:5], c("title", "doi"))
#> # A tibble: 5 × 3
#>   id               title                                                   doi  
#>   <chr>            <chr>                                                   <chr>
#> 1 0956797613520608 Continuous Theta-Burst Stimulation Demonstrates a Caus… 10.1…
#> 2 0956797614522816 Beyond Gist: Strategic and Incremental Information Acc… 10.1…
#> 3 0956797614527830 Serotonin and Social Norms: Tryptophan Depletion Impai… 10.1…
#> 4 0956797614557697 Action-Specific Disruption of Perceptual Confidence     10.1…
#> 5 0956797614560771 Emotional Vocalizations Are Recognized Across Cultures… 10.1…
concat_tables(psychsci[1:5], "references") |>
  dplyr::filter(!is.na(doi))
#>   bib_id
#> 1    b40
#> 2     b0
#> 3     b1
#> 4     b2
#> 5     b3
#> 6     b4
#>                                                                                                                                                                                                                                                                                   ref
#> 1                                             Zylberberg A, Barttfeld P, Sigman M (2012). “The construction of confidence in a perceptual decision.” _Frontiers in Integrative Neuroscience_, *6*. doi:10.3389/fnint.2012.00079/full <https://doi.org/10.3389/fnint.2012.00079/full>.
#> 2                                                        Ekman P (1994). “Strong evidence for universals in facial expressions: A reply to Russell's mistaken critique.” _Psychological Bulletin_, *115*, 268-287. doi:10.1037/0033-2909.115 <https://doi.org/10.1037/0033-2909.115>.
#> 3                                        Gendron M, Roberson D, Van Der Vyver JM, Barrett LF (2014). “Cultural relativity in perceiving emotion from vocalizations.” _Psychological Science_, *25*, 911-920. doi:10.1177/0956797613517239 <https://doi.org/10.1177/0956797613517239>.
#> 4                             Russell JA (1994). “Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies.” _Psychological Bulletin_, *115*, 102-141. doi:10.1037/0033-2909.115.1.102 <https://doi.org/10.1037/0033-2909.115.1.102>.
#> 5                      Sauter DA, Eisner F, Calder AJ, Scott SK (2010). “Perceptual cues in non-verbal vocal expressions of emotion.” _The Quarterly Journal of Experimental Psychology_, *63*, 2251-2272. doi:10.1080/17470211003721642 <https://doi.org/10.1080/17470211003721642>.
#> 6 Sauter DA, Eisner F, Ekman P, Scott SK (2010). “Crosscultural recognition of basic emotions through nonverbal emotional vocalizations.” _Proceedings of the National Academy of Sciences_, *107*, 2408-2412. doi:10.1073/pnas.0908239106 <https://doi.org/10.1073/pnas.0908239106>.
#>                             doi bibtype
#> 1 10.3389/fnint.2012.00079/full Article
#> 2         10.1037/0033-2909.115 Article
#> 3      10.1177/0956797613517239 Article
#> 4   10.1037/0033-2909.115.1.102 Article
#> 5     10.1080/17470211003721642 Article
#> 6       10.1073/pnas.0908239106 Article
#>                                                                                                      title
#> 1                                                  The construction of confidence in a perceptual decision
#> 2             Strong evidence for universals in facial expressions: A reply to Russell's mistaken critique
#> 3                                             Cultural relativity in perceiving emotion from vocalizations
#> 4 Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies
#> 5                                               Perceptual cues in non-verbal vocal expressions of emotion
#> 6                    Crosscultural recognition of basic emotions through nonverbal emotional vocalizations
#>                                            journal year
#> 1            Frontiers in Integrative Neuroscience 2012
#> 2                           Psychological Bulletin 1994
#> 3                            Psychological Science 2014
#> 4                           Psychological Bulletin 1994
#> 5 The Quarterly Journal of Experimental Psychology 2010
#> 6  Proceedings of the National Academy of Sciences 2010
#>                                                 authors               id
#> 1                   A Zylberberg, P Barttfeld, M Sigman 0956797614557697
#> 2                                               P Ekman 0956797614560771
#> 3 M Gendron, D Roberson, J M Van Der Vyver, L F Barrett 0956797614560771
#> 4                                           J A Russell 0956797614560771
#> 5           D A Sauter, F Eisner, A J Calder, S K Scott 0956797614560771
#> 6              D A Sauter, F Eisner, P Ekman, S K Scott 0956797614560771
concat_tables(psychsci[1:40], "citations") |>
  dplyr::filter(grepl("replicat", text)) |>
  dplyr::count(id, text)
#>                 id
#> 1 0956797614560771
#> 2 0956797616665351
#> 3 0956797617702699
#> 4 0956797617705667
#> 5 0956797617716922
#>                                                                                                                                                                                                                                              text
#> 1                                                                 1 We reanalyzed the data from the 29 Himba participants in our original study who had heard British vocalizations, the part of our study that Gendron et al. (2014) replicated.
#> 2                                                At a group level, therefore, the perceptual data replicated previous findings (Freeman et al., 2010;Hürlimann et al., 2002;Sotiropoulos et al., 2014;Stocker & Simoncelli, 2006;Thompson, 1982).
#> 3                                                                       A study by Papesh (2015) is particularly relevant: She failed to replicate the findings of Glenberg and Kaschak (2002)-a key study that we have highlighted here as well.
#> 4                                                                                                                               This replicates the main finding of Beck et al. (2012), who also used displays with both relevant colors present.
#> 5 Though we await replication of our findings, we see the current data as promising initial evidence for the coevolved proximal mechanisms through which oxytocin facilitates potentially life-enhancing connections (Holt-Lunstad et al., 2010).
#>   n
#> 1 1
#> 2 5
#> 3 2
#> 4 1
#> 5 1

Search Text

You can access a parsed table of the full text of the paper via paper$full_text, but you may find it more convenient to use the function search_text(). The defaults return a data table of each sentence, with the section type, header, div, paragraph and sentence numbers, and file name. (The section type is a best guess from the headers, so may not always be accurate.)

text <- search_text(paper)
| text | section | header | div | p | s | id |
|------|---------|--------|-----|---|---|----|
| This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. | abstract | Abstract | 0 | 1 | 1 | to_err_is_human.xml |
| Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. | intro | Introduction | 1 | 1 | 1 | to_err_is_human.xml |
| In this study we examine whether automated checks reduce the amount of errors that researchers make in scientific manuscripts. | method | Method and Participants | 2 | 1 | 1 | to_err_is_human.xml |
| All data needed to reproduce these analyses is available from https://osf.io/5tbm9 and code is available from https://osf.io/629bx. | results | Results | 3 | 1 | 1 | to_err_is_human.xml |
| It seems automated tools can help prevent errors by providing researchers with feedback about potential mistakes, and researchers feel the app is useful. | discussion | Discussion | 4 | 1 | 1 | to_err_is_human.xml |

Pattern

You can search for a specific word or phrase by setting the pattern argument. The pattern is a regex string by default; set fixed = TRUE if you want to find exact text matches.

text <- search_text(paper, pattern = "papercheck")
| text | section | header | div | p | s | id |
|------|---------|--------|-----|---|---|----|
| This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. | abstract | Abstract | 0 | 1 | 1 | to_err_is_human.xml |
| In this study we examine the usefulness of Papercheck to improve best practices. | intro | Introduction | 1 | 1 | 4 | to_err_is_human.xml |
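The fixed/regex distinction matters when the search term contains regex metacharacters. Since search_text() passes the pattern on to base R's matching functions, you can see the same behaviour with grepl() alone (a standalone sketch using made-up strings, not papercheck output):

```r
s <- c("p = 0.05", "p = 0x05")
grepl("0.05", s)               # regex: "." matches any character, so both match
grepl("0.05", s, fixed = TRUE) # literal: only the first string matches
```

With the default regex interpretation, the dot in "0.05" happily matches the "x" in "0x05"; fixed = TRUE avoids that without needing to escape the pattern.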

Section

Set section to a vector of the sections to search in.

text <- search_text(paper, "papercheck", 
                    section = "abstract")
| text | section | header | div | p | s | id |
|------|---------|--------|-----|---|---|----|
| This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. | abstract | Abstract | 0 | 1 | 1 | to_err_is_human.xml |

Return

Set return to one of “sentence”, “paragraph”, “section”, or “match” to control what gets returned.

text <- search_text(paper, "papercheck", 
                    section = "intro", 
                    return = "paragraph")
| text | section | header | div | p | s | id |
|------|---------|--------|-----|---|---|----|
| Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020). Automation can be use to check for errors in scientific manuscripts, and inform authors about possible corrections. In this study we examine the usefulness of Papercheck to improve best practices. | intro | Introduction | 1 | 1 | NA | to_err_is_human.xml |

Regex matches

You can also return just the matched text from a regex search by setting return = "match". The extra ... arguments in search_text() are passed to grep(), so perl = TRUE allows you to use more complex regex, like below.

pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.,-]*\\d"
text <- search_text(paper, pattern, return = "match", perl = TRUE)
| text | section | header | div | p | s | id |
|------|---------|--------|-----|---|---|----|
| M = 9.12 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
| M = 10.9 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
| t(97.7) = 2.9 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
| p = 0.005 | results | Results | 3 | 1 | 2 | to_err_is_human.xml |
| M = 5.06 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
| M = 4.5 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
| t(97.2) = -1.96 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
| p = 0.152 | results | Results | 3 | 2 | 1 | to_err_is_human.xml |
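Because the extra arguments go straight to the base matching functions, you can prototype a pattern like this on plain strings before running it over a paper. A standalone sketch, using invented example sentences rather than papercheck output:

```r
pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.,-]*\\d"
sentences <- c(
  "The means were M = 9.12 and M = 10.9.",
  "The difference was significant, t(97.7) = 2.9, p = 0.005."
)
# extract every match from each sentence
matches <- regmatches(sentences, gregexpr(pattern, sentences, perl = TRUE))
unlist(matches)
# returns c("M = 9.12", "M = 10.9", "t(97.7) = 2.9", "p = 0.005")
```

This makes it easy to check that the pattern stops before trailing punctuation and captures test statistics like t(97.7) as well as simple means.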

Expand Text

You can expand the text returned by search_text() or a module with expand_text().

marginal <- search_text(paper, "marginal") |>
  expand_text(paper, plus = 1, minus = 1)

marginal[, c("text", "expanded")]
#> # A tibble: 2 × 2
#>   text                                                                  expanded
#>   <chr>                                                                 <chr>   
#> 1 "The paper shows examples of (1) open and closed OSF links; (2) cita… "All da…
#> 2 "On average researchers in the experimental condition found the app … "On ave…

Large Language Models

You can query the extracted text of papers with LLMs using Groq.


Setup

You will need to get your own API key from https://console.groq.com/keys. To avoid having to type it out, add it to the .Renviron file in the following format (you can use usethis::edit_r_environ() to access the .Renviron file).

GROQ_GPT_KEY="sk-proj-abcdefghijklmnopqrs0123456789ABCDEFGHIJKLMNOPQRS"
# useful if you aren't sure where this file is
usethis::edit_r_environ()

You can get or set the default LLM model with llm_model() and access a list of the current available models using llm_model_list().
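For example — a sketch assuming, per the sentence above, that llm_model() returns the default when called with no arguments and sets it when given a model id:

```r
llm_model()                           # get the current default model
llm_model("llama-3.3-70b-versatile")  # set a new default (an id from the table below)
models <- llm_model_list()            # table of currently available models
```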

| id | owned_by | created | context_window |
|----|----------|---------|----------------|
| mistral-saba-24b | Mistral AI | 2025-02-19 | 32768 |
| qwen-2.5-coder-32b | Alibaba Cloud | 2025-02-14 | 131072 |
| deepseek-r1-distill-qwen-32b | DeepSeek / Alibaba Cloud | 2025-02-07 | 131072 |
| qwen-2.5-32b | Alibaba Cloud | 2025-02-05 | 131072 |
| deepseek-r1-distill-llama-70b | DeepSeek / Meta | 2025-01-26 | 131072 |
| llama-3.3-70b-specdec | Meta | 2024-12-06 | 8192 |
| llama-3.3-70b-versatile | Meta | 2024-12-06 | 32768 |
| llama-3.2-3b-preview | Meta | 2024-09-25 | 8192 |
| llama-3.2-1b-preview | Meta | 2024-09-25 | 8192 |
| llama-guard-3-8b | Meta | 2023-09-03 | 8192 |
| mixtral-8x7b-32768 | Mistral AI | 2023-09-03 | 32768 |
| llama-3.1-8b-instant | Meta | 2023-09-03 | 131072 |
| llama3-8b-8192 | Meta | 2023-09-03 | 8192 |
| gemma2-9b-it | Google | 2023-09-03 | 8192 |
| llama3-70b-8192 | Meta | 2023-09-03 | 8192 |

LLM Queries

See ?llm for details of how to get and set up your API key, choose an LLM, and adjust settings.

Use search_text() first to narrow the text down to what you want to query. Below, we limited the search to the first ten papers’ method sections, and returned sentences that contain the word “power” and at least one number. Then we asked an LLM to determine whether each sentence reports an a priori power analysis and, if so, to return the relevant values in a JSON-structured format.

power <- psychsci[1:10] |>
  # sentences containing the word power
  search_text("power", section = "method") |>
  # and containing at least one number
  search_text("[0-9]") 

# ask a specific question with specific response format
query <- 'Does this sentence report an a priori power analysis? If so, return the test, sample size, critical alpha criterion, power level, effect size and effect size metric plus any other relevant parameters, in JSON format like:

{
  "apriori": true, 
  "test": "paired samples t-test", 
  "sample": 20, 
  "alpha": 0.05, 
  "power": 0.8, 
  "es": 0.4, 
  "es_metric": "cohen\'s D"
}

If not, return {"apriori": false}

Answer only in valid JSON format, starting with { and ending with }.'

llm_power <- llm(power, query, seed = 8675309)

Expand JSON

It is useful to ask an LLM to return data in a JSON-structured format, but it can be frustrating to extract the data, especially when the LLM makes syntax mistakes. The function json_expand() expands a column containing JSON-formatted responses into separate columns, and handles malformed responses gracefully (it sets an ‘error’ column to “parsing error”). It also fixes column data types, where possible.

llm_response <- json_expand(llm_power, "answer") |>
  dplyr::select(text, apriori:es_metric)
| text | apriori | test | sample | alpha | power | es | es_metric |
|------|---------|------|--------|-------|-------|----|-----------|
| Sample size was calculated with an a priori power analysis, using the effect sizes reported by Küpper et al. (2014), who used identical procedures, materials, and dependent measures. | TRUE | NA | NA | NA | NA | NA | NA |
| We determined that a minimum sample size of 7 per group would be necessary for 95% power to detect an effect. | TRUE | | 7 | NA | 0.95 | | |
| For the first part of the task, 11 static visual images, one from each of the scenes in the film were presented once each on a black background for 2 s using Power-Point. | FALSE | NA | NA | NA | NA | NA | NA |
| A sample size of 26 per group was required to ensure 80% power to detect this difference at the 5% significance level. | TRUE | two-sample t-test | 26 | 0.050 | 0.80 | NA | NA |
| A sample size of 18 per condition was required in order to ensure an 80% power to detect this difference at the 5% significance level. | TRUE | t-test | 18 | 0.050 | 0.80 | NA | NA |
| The 13,500 selected loan requests conservatively achieved a power of .98 for an effect size of .07 at an alpha level of .05. | TRUE | | 13500 | 0.050 | 0.98 | 0.07 | |
| On the basis of simulations over a range of expected effect sizes for contrasts of fMRI activity, we estimated that a sample size of 24 would provide .80 power at a conservative brainwide alpha threshold of .002 (although such thresholds ideally should be relaxed for detecting activity in regions where an effect is predicted). | TRUE | unknown | 24 | 0.002 | 0.80 | unknown | unknown |
| Stimulus sample size was determined via power analysis of the sole existing similar study, which used neural activity to predict Internet downloads of music (Berns & Moore, 2012). | TRUE | NA | NA | NA | NA | NA | NA |
| The effect size from that study implied that a sample size of 72 loan requests would be required to achieve .80 power at an alpha level of .05. | TRUE | | 72 | 0.050 | 0.80 | | |

Rate Limiting

Remember, the llm() function makes a separate query for each row in the data frame from search_text(). Free Groq accounts are rate limited, so the default limit is 30 queries, but you can change this.

If you hit your rate limit, the llm() function will add a short delay between calls, so don’t worry if you notice the query speed slowing down after the first 30 calls.

Modules

Papercheck is designed modularly, so you can add modules to check for anything. It comes with a set of pre-defined modules, and we hope people will share more modules.

Module List

You can see the list of built-in modules with the function below.
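The function in question is module_list():

```r
module_list()
```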

#>  * all_p_values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04')
#> and document location in a table.
#> 
#> Note that this will not catch p-values reported like "the p-value is 0.03" because that results in a ton of false positives when papers discuss p-value thresholds. If you need to detect text like that, use `search_text()` function and a custom pattern like "\\bp(-| )?values?\\s+.{1,20}\\s+[0-9\\.]+"
#> 
#> This will catch most comparators like =<>~≈≠≤≥≪≫ and most versions of scientific notation like 5.0 x 10^-2 or 5.0e-2. If you find any formats that are not correctly handled by this function, please contact the author.
#>  * all_urls: List all the URLs in the main text.
#>  * effect_size: Detect t-tests and F-tests  with missing effect sizes
#>  * exact_p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
#>  * marginal: List all sentences that describe an effect as 'marginally significant'.
#>  * osf_check: List all OSF links and whether they are open, closed, or do not exist.
#>  * ref_consistency: Check if all references are cited and all citations are referenced
#>  * retractionwatch: Flag any cited papers in the RetractionWatch database
#>  * statcheck: Check consistency of p-values and test statistics
#>  
#> Use `module_help("module_name")` for help with a specific module

Running modules

To run a built-in module on a paper, you can reference it by name.

p <- module_run(paper, "all_p_values")
| text | section | header | div | p | s | id | p_comp | p_value |
|------|---------|--------|-----|---|---|----|--------|---------|
| p = 0.005 | results | Results | 3 | 1 | 2 | to_err_is_human.xml | = | 0.005 |
| p = 0.152 | results | Results | 3 | 2 | 1 | to_err_is_human.xml | = | 0.152 |
| p > .05 | results | Results | 3 | 2 | 2 | to_err_is_human.xml | > | 0.050 |

Creating modules

You can create your own modules using R code. Modules can also contain instructions for reporting, to give “traffic lights” for whether a check passed or failed, and to include appropriate text feedback in a report. See the modules vignette for more details.

Reports

You can generate a report from any set of modules. The default set is c("exact_p", "marginal", "effect_size", "osf_check", "retractionwatch", "ref_consistency").

report(paper, output_format = "qmd")

See the example report.