papercheck • papercheck

Installation

You can install the development version of papercheck from GitHub with:

# install.packages("devtools")
devtools::install_github("scienceverse/papercheck")

library(papercheck)

#> 
#> 
#> *******************************************
#> ✅ Welcome to PaperCheck
#> For support and examples visit:
#> https://scienceverse.github.io/papercheck/
#> 
#> ⚠️ This is alpha software; please check any
#> results. False positives and negatives will
#> occur at unknown rates.
#> *******************************************

You can launch an interactive shiny app version of the code below with:

papercheck_app()

Load from PDF

The function pdf2grobid() can read PDF files and save them in the TEI format created by grobid. This requires an internet connection and takes a few seconds per paper, so should only be done once and the results saved for later use.

If the server is unavailable, you can use a grobid web interface.

pdf_file <- demopdf()
xml_file <- pdf2grobid(pdf_file)

You can set up your own local grobid server following instructions from https://grobid.readthedocs.io/. The easiest way is to use Docker.

docker run --rm --init --ulimit core=0 -p 8070:8070 lfoppiano/grobid:0.8.2

Then you can set your grobid_url to the local path http://localhost:8070.

xml_file <- pdf2grobid(pdf_file, grobid_url = "http://localhost:8070")

Load from XML

The function read() can read XML files parsed by grobid or cermine, plus any XML files in JATS-DTD APA or NLM formats.

paper <- read(xml_file)

XML files parsed by cermine are not as good as grobid at parsing papers, and omits figure and table captions.

cermine_xml_file <- system.file("psychsci/0956797620955209.cermine.xml",
                                package = "papercheck")
paper <- read(cermine_xml_file)

Load from non-PDF document

To take advantage of grobid’s ability to parse references and other aspects of papers, for now the best way is to convert your papers to PDF. However, papercheck can read in plain text from a text/docx file with read().

filename <- system.file("extdata/to_err_is_human.docx", 
                        package = "papercheck")
paper_from_doc <- read_text(filename)

Batch Processing

The functions pdf2grobid() and read() also work on a folder of files, returning a list of XML file paths or paper objects, respectively. The functions search_text(), expand_text() and llm() also work on a list of paper objects.

grobid_dir <- demodir()

papers <- read(grobid_dir)

hypotheses <- search_text(papers, "hypothesi", 
                          section = "intro", 
                          return = "paragraph")

Paper Components

Paper objects contain a lot of structured information, including info, references, and citations.

Info

paper$info

#> $title
#> [1] "To Err is Human: An Empirical Investigation"
#> 
#> $description
#> [1] "This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app. All data are simulated. The paper shows examples of (1) open and closed OSF links; (2) citation of retracted papers; (3) missing/mismatched citations and references; (4) imprecise reporting of p-values; and (5) use of \"marginally significant\" to describe non-significant findings."
#> 
#> $keywords
#> [1] ""
#> 
#> $doi
#> [1] ""
#> 
#> $submission
#> [1] ""
#> 
#> $filename
#> [1] "/private/var/folders/t6/7x6md_5s2j5bfb324s784yzw0000gn/T/Rtmp4hJX7T/temp_libpath137c2715ec1dd/papercheck/extdata/to_err_is_human.xml"

Bibliography

The bibliography is provided in a tabular format.

paper$bib

xref_id	ref	doi	bibtype	title	journal	year	authors	id
b0	Gino F, Wiltermuth SS (2014). “Retracted: Evil Genius? How Dishonesty Can Lead to Greater Creativity.” Psychological Science, 25(4), 973-981. doi:10.1177/0956797614520714 https://doi.org/10.1177/0956797614520714.	10.1177/0956797614520714	Article	Retracted: Evil Genius? How Dishonesty Can Lead to Greater Creativity	Psychological Science	2014	F Gino, S S Wiltermuth	to_err_is_human
b1	Smith F (2021). “Human error is a symptom of a poor design.” Journal of Journals, 0(0), 0. doi:10.0000/0123456789 https://doi.org/10.0000/0123456789.	10.0000/0123456789	Article	Human error is a symptom of a poor design	Journal of Journals	2021	F Smith	to_err_is_human

Cross-References

Cross-references are also provided in a tabular format, with xref_id to match the bibliography table.

paper$xrefs

xref_id	type	contents	text	id	section	div	p	s
b0	bibr	(Gino & Wiltermuth, 2014)	Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional.	to_err_is_human	intro	1	1	1
NA	bibr	(Smithy, 2020)	From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020).	to_err_is_human	intro	1	1	2

Batch

There are functions to combine the infomation from a list of papers, like the psychsci built-in dataset of 250 open access papers from Psychological Science.

info_table(psychsci[1:5], c("title", "doi"))

#> # A tibble: 5 × 3
#>   id               title                                                   doi  
#>   <chr>            <chr>                                                   <chr>
#> 1 0956797613520608 Continuous Theta-Burst Stimulation Demonstrates a Caus… 10.1…
#> 2 0956797614522816 Beyond Gist: Strategic and Incremental Information Acc… 10.1…
#> 3 0956797614527830 Serotonin and Social Norms: Tryptophan Depletion Impai… 10.1…
#> 4 0956797614557697 Action-Specific Disruption of Perceptual Confidence     10.1…
#> 5 0956797614560771 Emotional Vocalizations Are Recognized Across Cultures… 10.1…

concat_tables(psychsci[1:5], "bib") |>
  dplyr::filter(!is.na(doi))

#>   xref_id
#> 1     b40
#> 2      b0
#> 3      b1
#> 4      b2
#> 5      b3
#> 6      b4
#>                                                                                                                                                                                                                                                                                   ref
#> 1                                             Zylberberg A, Barttfeld P, Sigman M (2012). “The construction of confidence in a perceptual decision.” _Frontiers in Integrative Neuroscience_, *6*. doi:10.3389/fnint.2012.00079/full <https://doi.org/10.3389/fnint.2012.00079/full>.
#> 2                                                        Ekman P (1994). “Strong evidence for universals in facial expressions: A reply to Russell's mistaken critique.” _Psychological Bulletin_, *115*, 268-287. doi:10.1037/0033-2909.115 <https://doi.org/10.1037/0033-2909.115>.
#> 3                                        Gendron M, Roberson D, Van Der Vyver JM, Barrett LF (2014). “Cultural relativity in perceiving emotion from vocalizations.” _Psychological Science_, *25*, 911-920. doi:10.1177/0956797613517239 <https://doi.org/10.1177/0956797613517239>.
#> 4                             Russell JA (1994). “Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies.” _Psychological Bulletin_, *115*, 102-141. doi:10.1037/0033-2909.115.1.102 <https://doi.org/10.1037/0033-2909.115.1.102>.
#> 5                      Sauter DA, Eisner F, Calder AJ, Scott SK (2010). “Perceptual cues in non-verbal vocal expressions of emotion.” _The Quarterly Journal of Experimental Psychology_, *63*, 2251-2272. doi:10.1080/17470211003721642 <https://doi.org/10.1080/17470211003721642>.
#> 6 Sauter DA, Eisner F, Ekman P, Scott SK (2010). “Crosscultural recognition of basic emotions through nonverbal emotional vocalizations.” _Proceedings of the National Academy of Sciences_, *107*, 2408-2412. doi:10.1073/pnas.0908239106 <https://doi.org/10.1073/pnas.0908239106>.
#>                             doi bibtype
#> 1 10.3389/fnint.2012.00079/full Article
#> 2         10.1037/0033-2909.115 Article
#> 3      10.1177/0956797613517239 Article
#> 4   10.1037/0033-2909.115.1.102 Article
#> 5     10.1080/17470211003721642 Article
#> 6       10.1073/pnas.0908239106 Article
#>                                                                                                      title
#> 1                                                  The construction of confidence in a perceptual decision
#> 2             Strong evidence for universals in facial expressions: A reply to Russell's mistaken critique
#> 3                                             Cultural relativity in perceiving emotion from vocalizations
#> 4 Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies
#> 5                                               Perceptual cues in non-verbal vocal expressions of emotion
#> 6                    Crosscultural recognition of basic emotions through nonverbal emotional vocalizations
#>                                            journal year
#> 1            Frontiers in Integrative Neuroscience 2012
#> 2                           Psychological Bulletin 1994
#> 3                            Psychological Science 2014
#> 4                           Psychological Bulletin 1994
#> 5 The Quarterly Journal of Experimental Psychology 2010
#> 6  Proceedings of the National Academy of Sciences 2010
#>                                                 authors               id
#> 1                   A Zylberberg, P Barttfeld, M Sigman 0956797614557697
#> 2                                               P Ekman 0956797614560771
#> 3 M Gendron, D Roberson, J M Van Der Vyver, L F Barrett 0956797614560771
#> 4                                           J A Russell 0956797614560771
#> 5           D A Sauter, F Eisner, A J Calder, S K Scott 0956797614560771
#> 6              D A Sauter, F Eisner, P Ekman, S K Scott 0956797614560771

concat_tables(psychsci[1:40], "xrefs") |>
  dplyr::filter(grepl("replicat", text)) |>
  dplyr::count(id, text)

#> # A tibble: 11 × 3
#>    id               text                                                       n
#>    <chr>            <chr>                                                  <int>
#>  1 0956797614560771 1 We reanalyzed the data from the 29 Himba participan…     1
#>  2 0956797615617779 Experiment 1b replicated the results of Experiment 1a…     1
#>  3 0956797616647519 Note that the average expected ES found in Study 1 is…     1
#>  4 0956797616647519 This is worrisome, as the results of our first study …     2
#>  5 0956797616665351 At a group level, therefore, the perceptual data repl…     5
#>  6 0956797617693326 Experiment 2 replicated Experiment 1 under conditions…     1
#>  7 0956797617693326 Experiment 2 replicated these findings on a separate …     4
#>  8 0956797617702699 A study by Papesh (2015) is particularly relevant: Sh…     2
#>  9 0956797617705667 Experiment 3 therefore replicated the effects in a mi…     1
#> 10 0956797617705667 This replicates the main finding of Beck et al. (2012…     1
#> 11 0956797617716922 Though we await replication of our findings, we see t…     1

Search Text

You can access a parsed table of the full text of the paper via paper$full_text, but you may find it more convenient to use the function search_text(). The defaults return a data table of each sentence, with the section type, header, div, paragraph and sentence numbers, and file name. (The section type is a best guess from the headers, so may not always be accurate.)

text <- search_text(paper)

text	section	header	div	p	s	id
This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app.	abstract	Abstract	0	1	1	to_err_is_human
Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional.	intro	Introduction	1	1	1	to_err_is_human
In this study we examine whether automated checks reduce the amount of errors that researchers make in scientific manuscripts.	method	Method and Participants	2	1	1	to_err_is_human
All data needed to reproduce these analyses is available from https://osf.io/5tbm9 and code is available from https://osf.io/629bx.	results	Results	3	1	1	to_err_is_human
It seems automated tools can help prevent errors by providing researchers with feedback about potential mistakes, and researchers feel the app is useful.	discussion	Discussion	4	1	1	to_err_is_human

Pattern

You can search for a specific word or phrase by setting the pattern argument. The pattern is a regex string by default; set fixed = TRUE if you want to find exact text matches.

text <- search_text(paper, pattern = "papercheck")

text	section	header	div	p	s	id
This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app.	abstract	Abstract	0	1	1	to_err_is_human
In this study we examine the usefulness of Papercheck to improve best practices.	intro	Introduction	1	1	4	to_err_is_human

Section

Set section to a vector of the sections to search in.

text <- search_text(paper, "papercheck", 
                    section = "abstract")

text	section	header	div	p	s	id
This paper demonstrates some good and poor practices for use with the {papercheck} R package and Shiny app.	abstract	Abstract	0	1	1	to_err_is_human

Return

Set return to one of “sentence”, “paragraph”, “section”, or “match” to control what gets returned.

text <- search_text(paper, "papercheck", 
                    section = "intro", 
                    return = "paragraph")

text	section	header	div	p	s	id
Although intentional dishonestly might be a successful way to boost creativity (Gino & Wiltermuth, 2014), it is safe to say most mistakes researchers make are unintentional. From a human factors perspective, human error is a symptom of a poor design (Smithy, 2020). Automation can be use to check for errors in scientific manuscripts, and inform authors about possible corrections. In this study we examine the usefulness of Papercheck to improve best practices.	intro	Introduction	1	1	NA	to_err_is_human

Regex matches

You can also return just the matched text from a regex search by setting return = "match". The extra ... arguments in search_text() are passed to grep(), so perl = TRUE allows you to use more complex regex, like below.

pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.,-]*\\d"
text <- search_text(paper, pattern, return = "match", perl = TRUE)

text	section	header	div	p	s	id
M = 9.12	results	Results	3	1	2	to_err_is_human
M = 10.9	results	Results	3	1	2	to_err_is_human
t(97.7) = 2.9	results	Results	3	1	2	to_err_is_human
p = 0.005	results	Results	3	1	2	to_err_is_human
M = 5.06	results	Results	3	2	1	to_err_is_human
M = 4.5	results	Results	3	2	1	to_err_is_human
t(97.2) = -1.96	results	Results	3	2	1	to_err_is_human
p = 0.152	results	Results	3	2	1	to_err_is_human

Expand Text

You can expand the text returned by search_text() or a module with expand_text().

marginal <- search_text(paper, "marginal") |>
  expand_text(paper, plus = 1, minus = 1)

marginal[, c("text", "expanded")]

#> # A tibble: 2 × 2
#>   text                                                                  expanded
#>   <chr>                                                                 <chr>   
#> 1 "The paper shows examples of (1) open and closed OSF links; (2) cita… "All da…
#> 2 "On average researchers in the experimental condition found the app … "On ave…

Large Language Models

You can query the extracted text of papers with LLMs using groq.

Setup

You will need to get your own API key from https://console.groq.com/keys. To avoid having to type it out, add it to the .Renviron file in the following format (you can use usethis::edit_r_environ() to access the .Renviron file).

GROQ_GPT_KEY="sk-proj-abcdefghijklmnopqrs0123456789ABCDEFGHIJKLMNOPQRS"

# useful if you aren't sure where this file is
usethis::edit_r_environ()

You can get or set the default LLM model with llm_model() and access a list of the current available models using llm_model_list().

id	owned_by	created	context_window
mistral-saba-24b	Mistral AI	2025-02-19	32768
qwen-2.5-coder-32b	Alibaba Cloud	2025-02-14	131072
deepseek-r1-distill-qwen-32b	DeepSeek / Alibaba Cloud	2025-02-07	131072
qwen-2.5-32b	Alibaba Cloud	2025-02-05	131072
deepseek-r1-distill-llama-70b	DeepSeek / Meta	2025-01-26	131072
llama-3.3-70b-specdec	Meta	2024-12-06	8192
llama-3.3-70b-versatile	Meta	2024-12-06	32768
llama-3.2-3b-preview	Meta	2024-09-25	8192
llama-3.2-1b-preview	Meta	2024-09-25	8192
llama-guard-3-8b	Meta	2023-09-03	8192
mixtral-8x7b-32768	Mistral AI	2023-09-03	32768
llama-3.1-8b-instant	Meta	2023-09-03	131072
llama3-8b-8192	Meta	2023-09-03	8192
gemma2-9b-it	Google	2023-09-03	8192
llama3-70b-8192	Meta	2023-09-03	8192

LLM Queries

You can query the extracted text of papers with LLMs using groq. See ?llm for details of how to get and set up your API key, choose an LLM, and adjust settings.

Use search_text() first to narrow down the text into what you want to query. Below, we limited search to the first ten papers’ method sections, and returned sentences that contains the word “power” and at least one number. Then we asked an LLM to determine if this is an a priori power analysis, and if so, to return some relevant values in a JSON-structured format.

power <- psychsci[1:10] |>
  # sentences containing the word power
  search_text("power", section = "method") |>
  # and containing at least one number
  search_text("[0-9]") 

# ask a specific question with specific response format
query <- 'Does this sentence report an a priori power analysis? If so, return the test, sample size, critical alpha criterion, power level, effect size and effect size metric plus any other relevant parameters, in JSON format like:

{
  "apriori": true, 
  "test": "paired samples t-test", 
  "sample": 20, 
  "alpha": 0.05, 
  "power": 0.8, 
  "es": 0.4, 
  "es_metric": "cohen\'s D"
}

If not, return {"apriori": false}

Answer only in valid JSON format, starting with { and ending with }.'

llm_power <- llm(power, query, seed = 8675309)

Expand JSON

It is useful to ask an LLM to return data in JSON structured format, but can be frustrating to extract the data, especially where the LLM makes syntax mistakes. The function json_expand() tries to expand a column with a JSON-formatted response into columns and deals with it gracefully (sets an ‘error’ column to “parsing error”) if there are errors. It also fixes column data types, if possible.

llm_response <- json_expand(llm_power, "answer") |>
  dplyr::select(text, apriori:es_metric)

text	apriori	test	sample	alpha	power	es	es_metric
Sample size was calculated with an a priori power analysis, using the effect sizes reported by Küpper et al. (2014), who used identical procedures, materials, and dependent measures.	TRUE	NA	NA	NA	NA	NA	NA
We determined that a minimum sample size of 7 per group would be necessary for 95% power to detect an effect.	TRUE		7	NA	0.95
For the first part of the task, 11 static visual images, one from each of the scenes in the film were presented once each on a black background for 2 s using Power-Point.	FALSE	NA	NA	NA	NA	NA	NA
A sample size of 26 per group was required to ensure 80% power to detect this difference at the 5% significance level.	TRUE	two-sample t-test	26	0.050	0.80	NA	NA
A sample size of 18 per condition was required in order to ensure an 80% power to detect this difference at the 5% significance level.	TRUE	t-test	18	0.050	0.80	NA	NA
The 13,500 selected loan requests conservatively achieved a power of .98 for an effect size of .07 at an alpha level of .05.	TRUE		13500	0.050	0.98	0.07
On the basis of simulations over a range of expected effect sizes for contrasts of fMRI activity, we estimated that a sample size of 24 would provide .80 power at a conservative brainwide alpha threshold of .002 (although such thresholds ideally should be relaxed for detecting activity in regions where an effect is predicted).	TRUE	unknown	24	0.002	0.80	unknown	unknown
Stimulus sample size was determined via power analysis of the sole existing similar study, which used neural activity to predict Internet downloads of music (Berns & Moore, 2012).	TRUE	NA	NA	NA	NA	NA	NA
The effect size from that study implied that a sample size of 72 loan requests would be required to achieve .80 power at an alpha level of .05.	TRUE		72	0.050	0.80

Rate Limiting

Remember, the llm() function makes a separate query for each row in a data frame from search_text(). Free GROQ accounts are rate limited, so we set the default limits to 30 queries, but you can change this:

llm_max_calls(30)

If you hit your rate limit, the llm() function will add a short delay between calls, so don’t worry if you notice the query speed slowing down after the first 30 calls.

Modules

Papercheck is designed modularly, so you can add modules to check for anything. It comes with a set of pre-defined modules, and we hope people will share more modules.

Module List

You can see the list of built-in modules with the function below.

module_list()

#>  * all_p_values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.
#>  * all_urls: List all the URLs in the main text.
#>  * effect_size: Detect t-tests and F-tests with missing effect sizes
#>  * exact_p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
#>  * marginal: List all sentences that describe an effect as 'marginally significant'.
#>  * osf_check: List all OSF links and whether they are open, closed, or do not exist.
#>  * ref_consistency: Check if all references are cited and all citations are referenced
#>  * retractionwatch: Flag any cited papers in the RetractionWatch database
#>  * statcheck: Check consistency of p-values and test statistics
#>  
#> Use `module_help("module_name")` for help with a specific module

Running modules

To run a built-in module on a paper, you can reference it by name.

p <- module_run(paper, "all_p_values")

text	section	header	div	p	s	id	p_comp	p_value
p = 0.005	results	Results	3	1	2	to_err_is_human	=	0.005
p = 0.152	results	Results	3	2	1	to_err_is_human	=	0.152
p > .05	results	Results	3	2	2	to_err_is_human	>	0.050

Creating modules

You can create your own modules using R code. Modules can also contain instructions for reporting, to give “traffic lights” for whether a check passed or failed, and to include appropriate text feedback in a report. See the modules vignette for more details.

Reports

You can generate a report from any set of modules. The default set is c("exact_p", "marginal", "effect_size", "osf_check", "retractionwatch", "ref_consistency")

report(paper, output_format = "qmd")

See the example report.