
Papercheck is designed modularly, so you can add modules to check for anything. It comes with a set of pre-defined modules, and we hope people will share more modules.

Module List

The built-in modules are listed below.

  • all-p-values: List all p-values in the text, returning the matched text (e.g., ‘p = 0.04’) and document location in a table.
  • all-urls: List all the URLs in the main text.
  • imprecise-p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.).
  • llm-summarise: Generate a one-sentence summary for each section.
  • marginal: List all sentences that describe an effect as ‘marginally significant’.
  • osf-check: List all OSF links and whether they are open, closed, or do not exist.
  • ref-consistency: Check if all references are cited and all citations are referenced.
  • retractionwatch: Flag any cited papers in the RetractionWatch database.
  • statcheck: Check consistency of p-values and test statistics.

Built-in Modules

Below, we demonstrate each built-in module, first on a single paper and then on a list of papers: psychsci, a set of 250 open-access papers from Psychological Science.

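library(papercheck) # module_run(), expand_text() and the psychsci data
library(dplyr)      # count(), filter(), select() and mutate(), used below
paper <- psychsci$`0956797620955209`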

all-p-values

List all p-values in the text, returning the matched text (e.g., ‘p = 0.04’) and document location in a table.

all_p <- module_run(paper, "all-p-values")

all_p # print table
text      section  header          div  p  s
p = .003  method   Questionnaires    6  2  3
p = .08   method   Questionnaires    6  2  3
p < .001  results  Results           9  2  1
p < .025  results  Results           9  2  3
p = .040  results  Results           9  2  3
p = .173  results  Results           9  2  3
p = .006  results  Results           9  2  4
p = .02   results  Results           9  2  4
p = .691  results  Results           9  2  5
p = .303  results  Results           9  2  5
p = .023  results  Results           9  3  3
p < .001  results  Results           9  3  3
p = .006  results  Results           9  4  2
p = .037  results  Results           9  4  2
p = .038  results  Results           9  4  2
p = .358  results  Results           9  4  2
p < .001  results  Results           9  4  3
p = .127  results  Results           9  4  3
p = .062  results  Results           9  4  3
p = .047  results  Results           9  4  3

Showing 20 of 20 rows

If you run this module on all 250 papers, you will get more rows than you probably want to print, so by default only the first 20 rows are shown.

all_p_ps <- module_run(psychsci, "all-p-values")

all_p_ps # print table
text      section  header                   div  p  s  id
p < .001  results  Results                    8  1  1  0956797613520608
p < .001  results  Results                    8  1  2  0956797613520608
p < .001  results  Results                    8  1  3  0956797613520608
p = .016  results  Results                    8  1  4  0956797613520608
p < .001  results  Results                    8  2  1  0956797613520608
p < .001  results  Results                    8  2  1  0956797613520608
p = .005  results  Results                    7  2  1  0956797614522816
p < .001  results  Results                    7  2  2  0956797614522816
p = .153  results  Results                    7  2  2  0956797614522816
p < .001  results  Results                    7  2  6  0956797614522816
p < .001  results  Results                    7  2  7  0956797614522816
p = .315  results  Results                    7  2  7  0956797614522816
p = .024  results  Eye movement strategies    8  2  2  0956797614522816
p = .008  results  Eye movement strategies    8  3  2  0956797614522816
p = .09   results  Eye movement strategies    8  3  2  0956797614522816
p = .028  results  Eye movement strategies    8  3  2  0956797614522816
p = .122  results  Eye movement strategies    8  3  3  0956797614522816
p = .001  results  Eye movement strategies    8  3  4  0956797614522816
p = .237  results  Eye movement strategies    8  3  5  0956797614522816
p = .008  results  Gaze patterns              9  2  1  0956797614522816

Showing 20 of 4683 rows

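The full results are stored in the table element of the returned object, so you can process them further. For example, here are the most common p-value strings: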

all_p_ps$table |>
  count(text, sort = TRUE) |>
  head()
#> # A tibble: 6 × 2
#>   text          n
#>   <chr>     <int>
#> 1 p < .001   1511
#> 2 p < .05     152
#> 3 p < .01     143
#> 4 p = .001    123
#> 5 p = .002     99
#> 6 p < .0001    91
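
To illustrate the kind of matching all-p-values performs, here is a minimal base-R sketch. The regular expression (a whole-word “p”, then <, > or =, then a decimal number with or without a leading zero) and the helper name extract_p_values() are our own assumptions; papercheck's actual pattern may differ, for example in how it handles ≤ or unusual spacing.

```r
# Minimal sketch (not papercheck's actual pattern): find p-value strings such as
# "p = .040", "p < .001" or "p = 0.04" in a character vector of sentences.
extract_p_values <- function(sentences) {
  # "p" as a whole word, optional spaces, a comparison operator, optional spaces,
  # then a number that may omit the leading zero
  pattern <- "\\bp\\s*[<>=]\\s*0?\\.\\d+"
  matches <- regmatches(sentences, gregexpr(pattern, sentences, perl = TRUE))
  data.frame(
    text = as.character(unlist(matches)),
    s = rep(seq_along(sentences), lengths(matches)), # index of the matching sentence
    stringsAsFactors = FALSE
  )
}

sentences <- c(
  "The main effect was significant, F(1, 400) = 4.23, p = .040.",
  "The interaction was not significant (p = 0.173), but face type was (p < .001).",
  "This sentence reports no statistics."
)
extract_p_values(sentences)
```

Here the s column is simply the position in the input vector, a simplified stand-in for the sentence index in papercheck's output.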

all-urls

List all the URLs in the main text. There will, of course, be a few false positives when text in the paper happens to look like a URL, such as “3.9.1.7” in the table below.

all_urls <- module_run(paper, "all-urls")

all_urls # print table
text section header div p s
3.9.1.7 method Participants 3 1 5
https://osf.io/k2dbf method Analyses 8 1 1
https://osf.io/k2dbf funding Open Practices 14 1 1
https://osf.io/k2dbf funding Open Practices 14 2 1
http://www.psychologicalscience.org/publications/badges funding Open Practices 14 2 3

Showing 5 of 5 rows

all_urls_ps <- module_run(psychsci, "all-urls")

# show the most common URLs
all_urls_ps$table |>
  count(text, sort = TRUE) |>
  head()
#> # A tibble: 6 × 2
#>   text                                                        n
#>   <chr>                                                   <int>
#> 1 http://www.psychologicalscience.org/publications/badges   126
#> 2 http://pss.sagepub.com/content/by/supplemental-data        28
#> 3 https://osf.io/sh8qg                                       11
#> 4 https://osf.io/2j6ne                                       10
#> 5 https://osf.io/79kbm                                       10
#> 6 http://pss.sagepub.com/content/25/1/3.full                  9
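
The 3.9.1.7 false positive above shows the trade-off in URL matching. The minimal base-R sketch below uses a stricter pattern that only accepts strings starting with http://, https:// or www.; the pattern and the extract_urls() helper are hypothetical, not papercheck's implementation. It would not flag a number like 3.9.1.7, but it would also miss bare links such as osf.io/k2dbf written without a scheme.

```r
# Minimal sketch (an assumption, not papercheck's pattern): only strings that start
# with "http://", "https://" or "www." count as URLs.
extract_urls <- function(text) {
  pattern <- "(https?://|www\\.)[^\\s,;()]+"
  urls <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
  sub("\\.+$", "", urls) # drop sentence-final periods
}

text <- c(
  "All materials are available at https://osf.io/k2dbf.",
  "The value 3.9.1.7 is not a link.",
  "See www.psychologicalscience.org/publications/badges for details."
)
extract_urls(text)
```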

imprecise-p

List any p-values that may have been reported with insufficient precision (e.g., p < .05 or p = n.s.).

imprecise <- module_run(paper, "imprecise-p")

imprecise # print table

You may have reported some imprecise p-values

text      section  header   div  p  s
p < .025  results  Results    9  2  3

Showing 1 of 1 rows

You can use the expand_text() function to get the full sentence or paragraph for context. Here you can see that “p < .025” was not an imprecisely reported p-value, but a description of the preregistered alpha threshold.

exp <- expand_text(imprecise, paper)

exp$expanded # print expanded text
#> [1] "The main effect of illness recency did not meet our preregistered threshold (p < .025)-recently ill: M = 661 ms, SD = 197; not recently ill: M = 626 ms, SD = 153, F(1, 400) = 4.23, η p 2 = .010, 90% CI = [.000, .039], p = .040-nor did the interaction between illness recency and face type (disfigured vs. typical), F(1, 400) = 1.87, η p 2 = .005, 90% CI = [.000, .027], p = .173."

We can investigate the most common imprecise p-values in the PsychSci set. “p < .01” and “p < .05” probably often describe thresholds in figures or tables, but what is going on with “p > .25”?

imprecise_ps <- module_run(psychsci, "imprecise-p")

imprecise_ps$table |>
  count(text, sort = TRUE) |>
  head()

#> # A tibble: 6 × 2
#>   text         n
#>   <chr>    <int>
#> 1 p < .05    152
#> 2 p < .01    143
#> 3 p < .005    15
#> 4 p > .05     13
#> 5 p > .250    13
#> 6 p > .25     12

We can expand the text to check the context for “p > .25”.

gt.25 <- imprecise_ps$table |>
  filter(grepl("\\.25", text)) |>
  expand_text(paper = psychsci)

gt.25$expanded[1:3] # look at the first 3
#> [1] "Time increased negative attitudes among liberal participants, b = 0.13, SE = 0.06, 95% CI = [0.02, 0.24], t(1922) = 2.34, p = .020, but did not affect attitudes among conservative participants, b = -0.04, SE = 0.06, 95% CI = [-0.15, 0.07], t(1922) = -0.64, p > .250."                                                      
#> [2] "Time increased negative attitudes among liberal participants, b = 0.22, SE = 0.06, 95% CI = [0.10, 0.33], t(1922) = 3.63, p < .001, but did not affect attitudes among conservative participants, b = 0.03, SE = 0.06, 95% CI = [-0.09, 0.14], t(1922) = 0.45, p > .250."                                                        
#> [3] "The twoway interactions between knowledge distance and search effort (odds ratio = 1.67, p = .080) and between knowledge distance and search variation (odds ratio = 0.65, p = .077) were marginally significant, whereas the interaction between search effort and variation was not significant (odds ratio = 1.01, p > .250)."
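
The module's exact rule is not described here, but one rule consistent with the counts above (“p < .001” and “p < .0001” are common in the text yet absent from the most common flags, while “p > .25” and “p < .005” are flagged) is: flag “n.s.”, any “p >”, and any “p <” with a value above .001. A minimal base-R sketch of that assumed rule, using a hypothetical helper is_imprecise_p():

```r
# Minimal sketch of one possible "imprecise p" rule (our assumption, not the
# module's documented rule): flag "n.s.", any "p >", and any "p <" above .001.
is_imprecise_p <- function(p_text) {
  x <- gsub("\\s+", "", tolower(p_text))          # "p < .05" -> "p<.05"
  not_sig <- grepl("^p=n\\.?s\\.?$", x)           # "p = n.s."
  op <- sub("^p([<>=]).*$", "\\1", x)             # comparison operator
  value <- suppressWarnings(as.numeric(sub("^p[<>=]", "", x)))
  not_sig | op == ">" | (op == "<" & !is.na(value) & value > 0.001)
}

p_text <- c("p < .05", "p < .001", "p = .040", "p > .250", "p = n.s.", "p < .025")
data.frame(p_text, imprecise = is_imprecise_p(p_text))
```

Note that “p < .025” is flagged by this rule too: a pattern like this cannot tell a threshold from a result, which is why checking the context with expand_text() remains necessary.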

llm-summarise

Generate a one-sentence summary of each major section (abstract, intro, method, results, discussion, acknowledgements, appendices). This module calls the llm() function with the query “summarise this section briefly, in one sentence”.

summaries <- module_run(paper, "llm-summarise", seed = 8675309)

summaries # print default table
text section answer time tokens
Across taxa, animals possess a number of behavioral adaptations that function to mitigate the cos… abstract Humans have evolved behavioral adaptations to detect and avoid infectious microorganisms, includi… 0.1327031 373
the evolutionary-biology literature, which highlights the fact that organisms neutralize pathogen… intro The study aimed to replicate a previous finding that people who have recently been ill are more a… 0.2404155 1406
Method Except where noted, all methodological details-including all stimuli and dot-probe procedu… method The study replicated the methodological details of Miller and Maner (2011) to test the interactio… 0.3714075 2314
Results On the basis of our preregistered exclusion criteria, we removed the 9 participants with … results The study found a main effect of face type, with slower responses to disfigured faces, but no mai… 0.2508097 1227
Discussion The study replicated here has been interpreted as a key piece of evidence supporting a… discussion This study’s null findings do not provide strong evidence that pathogen avoidance does not vary a… 0.2339141 904
NA Funding This work was supported by Horizon 2020 European Research Council Grant No. StG-2015 6… funding This study was funded by a Horizon 2020 European Research Council Grant and has made its data, an… 0.1644160 210
Declaration of Conflicting Interests The author(s) declared that there were no conflicts of inter… annex The author(s) declared that there were no conflicts of interest with respect to the authorship or… 0.0778079 101

Showing 7 of 7 rows

The default table isn’t very readable here, so let’s customise the output. The object returned by a module usually contains a table element; here we show only its section and answer columns.

summaries$table[, c("section", "answer")]
#>      section
#> 1   abstract
#> 2      intro
#> 3     method
#> 4    results
#> 5 discussion
#> 6    funding
#> 7      annex
#>                                                                                                                                                                                                                                                                                                                    answer
#> 1                                                            Humans possess behavioral adaptations that help mitigate the costs of infectious microorganisms, including sensitivity to cues related to pathogens, which influence various psychological phenomena such as food learning, stigmatization, and mate choice.
#> 2                                                        The study aimed to replicate a previous finding that people who have recently been ill are more attentive to visual cues of pathogens, and to explore whether this effect is related to self-reported disgust sensitivity and visual attention to pathogen cues.
#> 3                                     The study replicated the methodological details of Miller and Maner (2011) to test the interaction between illness recency and face type on attentional biases, using a dot-probe task with 413 participants and identical stimuli, procedures, and analyses as the original study.
#> 4 The study found a main effect of face type, with slower responses to disfigured faces, but no main effect of illness recency or interaction between illness recency and face type, and failed to replicate the original study's finding of attentional bias towards disfigured faces only in recently ill participants.
#> 5              This study's null findings do not provide strong evidence that pathogen avoidance does not vary as a function of immunological resistance due to methodological limitations, and the field would benefit from better developing the validity of measures used to test behavioral-immune system hypotheses.
#> 6                                                                                                               This study was funded by a Horizon 2020 European Research Council Grant and has made its data, analysis code, and design/analysis plans publicly available, earning Open Data and Preregistration badges.
#> 7                                                                                                                                                                                      The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

If you try to run this on the whole PsychSci set, you will get an error, because the number of LLM calls needed is much higher than the maximum, which is set to 30 by default (the per-minute rate limit for the default model on a free Groq account). You can raise the maximum number of calls with llm_max_calls(); because llm() manages your rate limit, this would take nearly an hour at 30 calls per minute. We won’t do that here. (The error shown below is the one you get when no Groq API key is set.)

summaries_ps <- module_run(psychsci, "llm-summarise", seed = 8675309)
#> Error in (function (text, query, text_col = "text", model = llm_model(), : You need to include the argument API_KEY or set the variable GROQ_API_KEY in your Renviron
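
As a rough check on the “nearly an hour” estimate, the arithmetic is simple. The sketch assumes one call per section and about 7 summarised sections per paper, as in the example paper above; the real number varies from paper to paper.

```r
# Back-of-the-envelope estimate of an llm-summarise run on all of psychsci.
# Assumption: about 7 sections per paper, as in the example paper above.
n_papers <- 250
sections_per_paper <- 7
calls_per_minute <- 30 # default per-minute limit mentioned above

n_calls <- n_papers * sections_per_paper
minutes <- n_calls / calls_per_minute
c(calls = n_calls, minutes = round(minutes))
```

Under these assumptions this comes to 1750 calls and about 58 minutes.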

marginal

List all sentences that describe an effect as ‘marginally significant’.

marginal <- module_run(paper, "marginal")

marginal # print table

No effects were described as marginally/borderline/close to significant.

Let’s check how many are in the full set.

marginal_ps <- module_run(psychsci, "marginal")

marginal_ps # print table

You described effects as marginally/borderline/close to significant. It is better to write ‘did not reach the threshold alpha for significance’.

text section header div p s id
Although the PTSD group showed a significant facilitation effect (M = 0.68), F(1, 15) = 5.575, p … results Details. 11 1 4 0956797615569889
A marginally significant negative correlation was found between suppression-induced forgetting on… results Posttraumatic Stress Diagnostic Scale 13 1 4 0956797615569889
When we more closely matched depression symptoms by comparing the low-BDI-II PTSD group with the … results The effects of depressive symptoms 14 3 1 0956797615569889
In that analysis, the group difference in suppression-induced forgetting only approached signific… results Conditionalized final-test data 15 1 2 0956797615569889
The Congruency × Alignment interaction was marginally significant for faces, F(1, 19) = 4.12, p =… results Results 7 2 3 0956797615617779
The twoway interactions between knowledge distance and search effort (odds ratio = 1.67, p = .080… results Results 6 2 3 0956797616634665
An omnibus 3 (sex ratio) × 2 (participant sex) ANOVA on vaccine-investment variance revealed a ma… results Results 15 2 1 0956797616636631
Further, we observed a marginally significant indirect effect of target condition on endorsement … fig 3 NA 2 0956797617692000
The dagger and asterisks indicate marginally significant and significant paths ( † p < .10, **p <… fig Fig. 5 . 5 NA 7 0956797617692000
Given the unexpected nature of the finding and the marginally significant trend, we will not inte… annex Supplemental Material 20 1 2 0956797617716922
A marginally significant fit effect was found for conscientiousness, but this Response surface in… results Actual Personality 7 4 6 0956797617724435
A fit bonus was not calculated for emotional stability because the a4 parameter did not reach the… results Actual Personality 7 4 12 0956797617724435
The fit bonus is reported only for those traits that had at least a marginally significant a4 par… discussion Discussion 8 3 8 0956797617724435
Participant height (added as a fixed effect to the interceptonly model) had a marginally signific… results Results 9 6 1 0956797618772822
There was a marginally significant interaction between participant height and opponent height on … results Results 9 7 2 0956797618772822
SES was marginally significantly related to coins kept for self in the dictator game versus the u… results Results 9 10 6 0956797618772822
Subsequently, opponent height was added to the model, which was marginally significantly related … results Studies 1 and 2: Combined Data Analysis 14 2 2 0956797618772822
However, results revealed a marginally significant interaction between implicit partner evaluatio… results Implicit partner evaluations and nonverbal behaviors 13 2 3 0956797618785899
Although the main effect of implicit partner evaluations on nonverbal behavior remained significa… results Implicit partner evaluations and nonverbal behaviors 13 3 3 0956797618785899
Following up on the debate on the use of significance levels in psychology, we empirically examin… abstract Abstract 0 2 1 0956797619830326

Showing 20 of 97 rows

You can see that the default print method shows only the first 20 rows and truncates long cells. You can change this by calling print() explicitly and setting maxrows and trunc_cell.

marginal_ps |> print(maxrows = 5, trunc_cell = Inf)

You described effects as marginally/borderline/close to significant. It is better to write ‘did not reach the threshold alpha for significance’.

text section header div p s id
Although the PTSD group showed a significant facilitation effect (M = 0.68), F(1, 15) = 5.575, p = .032, η p 2 = .271, and the control group showed only a marginally significant effect (M = 0.56), F(1, 15) = 4.368, p = .054, η p 2 = .226, the difference between the groups was not significant, F(1, 30) = 0.100, p = .754, η p 2 = .003 (Fig. 3c). results Details. 11 1 4 0956797615569889
A marginally significant negative correlation was found between suppression-induced forgetting on the details measure and PDS scores in the PTSD group (Kendall’s τ = -.33, p = .07; Fig. 4b). results Posttraumatic Stress Diagnostic Scale 13 1 4 0956797615569889
When we more closely matched depression symptoms by comparing the low-BDI-II PTSD group with the high-BDI-II control group, we observed a marginally significant group-by-condition interaction for the details measure, F(1, 16) = 4.032, p = .062, η p 2 = .201. results The effects of depressive symptoms 14 3 1 0956797615569889
In that analysis, the group difference in suppression-induced forgetting only approached significance, F(1, 30) = 3.869, p = .058, η p 2 = .114. results Conditionalized final-test data 15 1 2 0956797615569889
The Congruency × Alignment interaction was marginally significant for faces, F(1, 19) = 4.12, p = .057, η p 2 = .18, and significant for line patterns, F(1, 19) = 85.53, p < .001, η p 2 = .82. results Results 7 2 3 0956797615617779

Showing 5 of 97 rows
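
The marginal module looks for wording like the phrases in its message. A minimal base-R sketch of such a phrase search follows; the phrase list and the flag_marginal() helper are our own assumptions, not papercheck's list, and a real list would need more variants.

```r
# Minimal sketch (our own phrase list, not papercheck's): flag sentences that
# describe a result as marginal, borderline, or approaching significance.
flag_marginal <- function(sentences) {
  pattern <- paste(
    "marginal(ly)?\\s+significan",          # marginal(ly) significant/significance
    "borderline\\s+significan",
    "close\\s+to\\s+(being\\s+)?significan",
    "approach(ed|es|ing)?\\s+significance",
    sep = "|"
  )
  sentences[grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)]
}

sentences <- c(
  "A marginally significant negative correlation was found (p = .07).",
  "The group difference only approached significance, p = .058.",
  "The facilitation effect was significant, p = .032.",
  "SES was marginally significantly related to coins kept for self."
)
flag_marginal(sentences)
```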

osf-check

List all OSF links and whether they are open, closed, or do not exist. This requires an internet connection to check the status of the link.

osf_check <- module_run(paper, "osf-check")

osf_check # print table

All OSF links are open

text                  section  header          div  p  s  status
https://osf.io/k2dbf  method   Analyses          8  1  1  open
https://osf.io/k2dbf  funding  Open Practices   14  1  1  open
https://osf.io/k2dbf  funding  Open Practices   14  2  1  open

Showing 3 of 3 rows

This would take a while to run for the 341 unique OSF links found in the PsychSci set (usually 1-2 seconds per link, so roughly 6 to 11 minutes), so we skip it here, but we can reveal that we found 10 closed links.

osf_check_ps <- module_run(psychsci, "osf-check")
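
Each link only needs to be checked once, which is why the number of unique links matters. The sketch below reduces OSF URLs to unique project IDs and reproduces the timing estimate. The normalisation steps (dropping the scheme, a leading www. and anything after the first path segment) and the osf_ids() helper are assumptions, not papercheck's code, and no network requests are made.

```r
# Minimal sketch (our assumptions, not papercheck's code): reduce OSF URLs to unique
# project IDs so each link is checked only once.
osf_ids <- function(urls) {
  x <- sub("^https?://", "", urls)  # drop the scheme
  x <- sub("^www\\.", "", x)        # drop a leading "www."
  x <- x[grepl("^osf\\.io/", x)]    # keep OSF links only
  x <- sub("^osf\\.io/", "", x)
  unique(sub("/.*$", "", x))        # keep the first path segment, e.g. "k2dbf"
}

urls <- c(
  "https://osf.io/k2dbf",
  "https://osf.io/k2dbf/",
  "http://osf.io/sh8qg",
  "http://www.psychologicalscience.org/publications/badges"
)
ids <- osf_ids(urls)
ids

# At 1-2 seconds per link, 341 unique links take roughly 5.7 to 11.4 minutes
341 * c(1, 2) / 60
```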

ref-consistency

Check if all references are cited and all citations are referenced.

ref_consistency <- module_run(paper, "ref-consistency")

ref_consistency # print table

This module relies on Grobid correctly parsing the references. There may be some false positives. There are references that are not cited or citations that are not referenced

bib_id ref missing
b0 (Illness recency. 95 .05 .03 .06 .08 5. Latencies (typical faces; ms) 633.90 163.50. 06 .03 .07 …. citation
b17 Jones BC, Hahn AC, Fisher CI, Wang H, Kandrik M, Tybur JM, Debruine LM (2018). “Hormonal correlat… citation
b18 Jones BC, Hahn AC, Fisher CI, Wang H, Kandrik M, Tybur JM, Debruine LM (2018). “Reply to Fleischm… citation
b32 Tybur JM, Inbar Y, Aarøe L, Barclay P, Barlow FK, De Barra M, Žeželj .., I (2016). “Parasite stre… citation
b35 Tybur JM, Lieberman D, Kurzban R, Descioli P (2013). “Disgust: Evolved function and structure.” _… citation

Showing 5 of 5 rows

It looks like some references have no matching citations. The first one (b0) is not actually a reference; Grobid often parses tables oddly. For the others, you would need to look at the original PDF to see whether the citations are really missing or whether these are false positives. Here, they are all false positives: Grobid could not match the in-text citations to the reference list.
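
Conceptually, the check is a pair of set differences between the bib_ids in the reference list and the bib_ids that in-text citations are linked to. The minimal sketch below uses hypothetical inputs and a hypothetical check_ref_consistency() helper, not papercheck's internals. It also shows why a Grobid matching failure produces a false positive: if a citation is not linked to its bib_id, that reference looks uncited.

```r
# Minimal sketch (hypothetical inputs, not papercheck's data structures): compare the
# bib_ids in the reference list with the bib_ids that in-text citations point to.
check_ref_consistency <- function(ref_ids, cite_ids) {
  uncited <- setdiff(ref_ids, cite_ids)       # references never cited in the text
  unreferenced <- setdiff(cite_ids, ref_ids)  # citations with no matching reference
  data.frame(
    bib_id = c(uncited, unreferenced),
    missing = rep(c("citation", "reference"), c(length(uncited), length(unreferenced))),
    stringsAsFactors = FALSE
  )
}

ref_ids <- c("b0", "b1", "b2", "b17")  # parsed reference list
cite_ids <- c("b1", "b2", "b2", "b5")  # bib_ids linked from in-text citations
check_ref_consistency(ref_ids, cite_ids)
```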

retractionwatch

Flag any cited papers in the RetractionWatch database. Querying Crossref for each reference takes a long time, so papercheck includes a summary of the database, which is updated regularly (most recently on 2025-02-28).

rw <- module_run(paper, "retractionwatch")

rw # print table

You cited no papers in the Retraction Watch database (as of 2025-02-28)

In the full PsychSci set, three papers from the RetractionWatch database are cited (one with a retraction and two with corrections), across four PsychSci papers.

rw_ps <- module_run(psychsci, "retractionwatch")

print(rw_ps, trunc_cell = 30) # print table

You cited some papers in the Retraction Watch database (as of 2025-02-28). These may be retracted, have corrections, or expressions of concern.

bib_id  doi                         ref                           id                 retractionwatch  text
b15     10.1177/0956797612470827    Kok BE, Coffey KA, Cohn MA,…  09567976221150616  Correction       The vagus nerve controls re…
b26     10.1186/gb-2013-14-10-r115  Horvath S (2013). “DNA meth…  09567976231194221  Correction       Epigenetic aging is one suc…
b34     10.1038/s41562-023-01749-9  Protzko J, Krosnick J, Nels…  09567976241249183  Retraction       One may also ask whether ou…
b25     10.1186/gb-2013-14-10-r115  Horvath S (2013). “DNA meth…  09567976241260247  Correction       Discrepancies between someo…
b25     10.1186/gb-2013-14-10-r115  Horvath S (2013). “DNA meth…  09567976241260247  Correction       EAA from the Horvath PanTis…
b25     10.1186/gb-2013-14-10-r115  Horvath S (2013). “DNA meth…  09567976241260247  Correction       The replication of results …

Showing 6 of 6 rows

Check the context of the citations in the text column:

rw_ps$table$text
#> [1] "The vagus nerve controls respiratory sinus arrhythmia (RSA), the covariation of heart rate and respiration, known to be an index of cardiac health and predictor of enhanced capacity for emotion regulation (Butler et al., 2006;Kok et al., 2013)."
#> [2] "Epigenetic aging is one such indicator of accelerated biological aging (Horvath, 2013)."                                                                                                                                                             
#> [3] "One may also ask whether our results can be generalized from our CloudResearch and student samples to broader populations in the United States or elsewhere (Hanel & Vione, 2016;Protzko et al., 2023)."                                             
#> [4] "Discrepancies between someone's biological and chronological age translate into \"age acceleration,\" with older biological age relative to chronological age indicating greater acceleration (Horvath, 2013)."                                      
#> [5] "EAA from the Horvath PanTissue clock (Horvath, 2013) was calculated using the methylclock package (Pelegí-Sisó et al., 2021)."                                                                                                                       
#> [6] "The replication of results across the PedBE and Horvath PanTissue clocks increases our confidence in study findings because these tools share no overlapping DNAm sites (Horvath, 2013;L."
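
Because the database summary ships with papercheck, flagging cited papers amounts to a lookup by DOI. Below is a minimal base-R sketch of that idea. The lookup table, its column names, the reference list and the flag_retractionwatch() helper are hypothetical; only the two listed DOIs and their statuses are taken from the output above, and 10.1000/example.123 is a placeholder DOI.

```r
# Minimal sketch (hypothetical lookup table, not papercheck's bundled data):
# flag references whose DOI appears in a local Retraction Watch summary.
rw_lookup <- data.frame(
  doi = c("10.1038/s41562-023-01749-9", "10.1186/gb-2013-14-10-r115"),
  retractionwatch = c("Retraction", "Correction"),
  stringsAsFactors = FALSE
)

refs <- data.frame(
  bib_id = c("b3", "b25", "b34"),
  doi = c("10.1000/example.123", "10.1186/GB-2013-14-10-R115", "10.1038/s41562-023-01749-9"),
  stringsAsFactors = FALSE
)

flag_retractionwatch <- function(refs, lookup) {
  # DOIs are case-insensitive, so compare lower-cased keys
  refs$key <- tolower(refs$doi)
  lookup$key <- tolower(lookup$doi)
  hits <- merge(refs, lookup[, c("key", "retractionwatch")], by = "key")
  hits[, c("bib_id", "doi", "retractionwatch")]
}

flag_retractionwatch(refs, rw_lookup)
```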

statcheck

Check the consistency of p-values and test statistics using functions from the statcheck package.

statcheck <- module_run(paper, "statcheck")

statcheck # print table

We detected possible errors in test statistics

test_type          t
df2                248.4
test_comp          =
test_value         2.01
p_comp             =
reported_p         0.023
computed_p         0.0455124
raw                t(248.4) = 2.01, p = .023
error              TRUE
decision_error     FALSE
one_tailed_in_txt  FALSE
apa_factor         1
text               Yes-the 90% confidence intervals of the difference in attentional bias for participants who were …
section            results
header             Results
div                9
p                  3
s                  3

Showing 1 of 1 rows

Here we see a false positive: the paper reported the results of an equivalence test, which uses one-tailed tests, but statcheck did not detect that this test was one-tailed. The reported p = .023 is about half of the two-tailed computed_p of .046.
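
The numbers in the flagged row can be checked directly with pt(), which gives the lower-tail probability of the t distribution and accepts non-integer degrees of freedom. The two-tailed p-value matches computed_p, and the one-tailed value matches what the paper reported:

```r
# Recompute the p-value for t(248.4) = 2.01 from the flagged sentence.
t_value <- 2.01
df <- 248.4

p_two_tailed <- 2 * pt(-abs(t_value), df) # matches computed_p above
p_one_tailed <- pt(-abs(t_value), df)     # appropriate for a one-sided test

round(c(two_tailed = p_two_tailed, one_tailed = p_one_tailed), 3)
```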

In the full PsychSci set, there are more than 27K sentences with numbers to check, so this takes about a minute to run.

statcheck_ps <- module_run(psychsci, "statcheck")

Of course, some of the 151 flagged values in the full set will be false positives. Let’s look only at the flagged values where the computed p-value is about double the reported p-value and the difference changes the significance decision (at an alpha of .05). Like the example above, these may be one-tailed tests that were not reported as such.

statcheck_ps$table |>
  filter(decision_error, 
         round(computed_p/reported_p, 1) == 2.0) |>
  select(reported_p, computed_p, raw) |>
  mutate(computed_p = round(computed_p, 4))
#>   reported_p computed_p                        raw
#> 1     0.0290     0.0589 F(1, 361) = 3.59, p = .029
#> 2     0.0470     0.0947     t(24) = 1.74, p = .047
#> 3     0.0270     0.0547     t(24) = 2.02, p = .027
#> 4     0.0400     0.0797     t(24) = 1.83, p = .040
#> 5     0.0480     0.0962    t(240) = 1.67, p = .048
#> 6     0.0460     0.0915     t(32) = 1.74, p = .046
#> 7     0.0420     0.0846     t(21) = 1.81, p = .042
#> 8     0.0343     0.0686    t(10) = 2.04, p = .0343
#> 9     0.0330     0.0654     t(55) = 1.88, p = .033
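
For each t-test in this table, halving the computed two-tailed p-value gives (after rounding) the reported value, which suggests an unreported one-tailed test. A quick base-R check on the t-test rows, with the values copied by hand from the output above (we leave out the F-test row and keep only the t-tests):

```r
# Reported t-tests from the table above (the F-test row is omitted).
flagged <- data.frame(
  df = c(24, 24, 24, 240, 32, 21, 10, 55),
  t = c(1.74, 2.02, 1.83, 1.67, 1.74, 1.81, 2.04, 1.88),
  reported_p = c(.047, .027, .040, .048, .046, .042, .0343, .033)
)

flagged$two_tailed <- 2 * pt(-abs(flagged$t), flagged$df)
flagged$one_tailed <- pt(-abs(flagged$t), flagged$df)

# TRUE when the one-tailed p-value agrees with the reported one,
# allowing a small tolerance for rounding to three decimals
flagged$matches_one_tailed <- abs(flagged$one_tailed - flagged$reported_p) < 0.0006
flagged
```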