Papercheck

https://scienceverse.github.io/talks/2025-papercheck-intro/

Lisa DeBruine

Abstract

In this talk, I will introduce Papercheck, a new tool that leverages text search, code, and large language models to extract and supplement information from scientific documents (including manuscripts, submitted or published articles, or preregistration documents) and provides automated suggestions for improvement.

Inspired by practices in software development, where automated checks (e.g., CRAN checks for R packages) are used to identify issues before release, Papercheck aims to screen scientific manuscripts to identify potential issues or areas for improvement and guide researchers in adopting best practices. It can also assist with processing large numbers of papers for metascientific enquiry.

The Problem

Best Practices are Rapidly Evolving

Un-FAIR Meta-Data

  • All research outputs should be FAIR
  • PDFs are where data goes to die
  • Meta-data use cases:
    • facilitating meta-analyses
    • improving the re-use of reliable measures
    • meta-scientific research

Solutions

Checklists?

Reporting guidelines, such as CONSORT, PRISMA, and JARS, often provide extensive checklists.

  • Time-consuming
  • Requires expertise
  • Can be vague
  • Who checks the checklist?

Automated Checks

  • Time-efficient
  • Requires less expertise
  • Reproducible
  • Generates machine-readable metadata

Automation Strategies

Grobid: machine-learning software for extracting structured information from scholarly documents

And then…

Text Search

Code

LLM

R Package

Paper Import

file <- "papers/debruine-fret.pdf"
xml <- pdf2grobid(file, 
                  consolidateCitations = TRUE, 
                  consolidateHeader = TRUE)
paper <- read(xml)
-------------
debruine-fret
-------------

Facial resemblance enhances trust

* Sections: 8
* Sentences: 174
* References: 22
* Citations: 25
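
Once imported, the paper object can be queried straight away; a minimal sketch using search_text() (introduced later in this talk), assuming a local Grobid server is available for pdf2grobid():

```r
library(papercheck)

# parse one PDF with Grobid and load it as a paper object
xml   <- pdf2grobid("papers/debruine-fret.pdf")
paper <- read(xml)

# find every sentence in the method section that contains a number
hits <- search_text(paper, "\\d+", section = "method")
head(hits$text)
```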

Batch Import

papers <- read("papers")
--------------
debruine-child
--------------

Resemblance to self increases the appeal of child faces to both men and women

* Sections: 14
* Sentences: 184
* References: 23
* Citations: 48
-------------
debruine-fret
-------------

Facial resemblance enhances trust

* Sections: 8
* Sentences: 174
* References: 22
* Citations: 25
------------
debruine-sex
------------

Facial resemblance increases the attractiveness of same-sex faces more than other-sex faces

* Sections: 11
* Sentences: 179
* References: 40
* Citations: 48
------------
debruine-tnl
------------

Trustworthy but not lust-worthy: context-specific effects of facial resemblance

* Sections: 6
* Sentences: 101
* References: 32
* Citations: 39

LLM

query <- 'How many subjects were in the studies in total? 
Return your answer in JSON format giving the total and 
any subgroupings by gender, e.g.:
{"total": 100, "men": 42, "women": 58}. 
Only return valid JSON, no notes.'

llm_subjects <- papers |> 
  search_text("\\d+", section = "method") |>
  search_text(return = "section") |>
  llm(query)

llm_subjects |> json_expand()
id answer total men women
debruine-child {"total": 71, "men": 32, "women": 39} 71 32 39
debruine-fret {"total": 48, "men": 24, "women": 24} 48 24 24
debruine-sex {"total": 136, "men": 86, "women": 50} 136 86 50
debruine-tnl {"total": 144, "men": 66, "women": 78} 144 66 78

OSF Functions

# find all OSF links in the papers
links <- osf_links(psychsci)

# get info about a link and its children
osf <- osf_retrieve(links$text[1], recursive = TRUE)

# download all files
osf_file_download(links$text[1])
osf_id name osf_type public parent size downloads
e2aks Action-specific disruption of perceptual confidence nodes TRUE NA NA NA
7jh5v Data nodes TRUE e2aks NA NA
pj4e8 Analysis scripts nodes TRUE e2aks NA NA
553e58658c5e4a219919a629 allData_orientation.txt files NA 7jh5v 1096113 41
553e58658c5e4a219919a62a allData_contrast_M1.txt files NA 7jh5v 691560 68
553e58658c5e4a219919a62c Mratio_contrast_M1.txt files NA 7jh5v 5508 31
553e58658c5e4a219919a628 Mratio_all.txt files NA 7jh5v 15326 34
553e58658c5e4a219919a62b allData_contrast_PMC.txt files NA 7jh5v 712804 59
553e7e168c5e4a21991a4dac tms_analysis.R files NA pj4e8 5670 66

Modules

module_list()
 * all_p_values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.
 * all_urls: List all the URLs in the main text.
 * effect_size: Detect t-tests and F-tests with missing effect sizes.
 * exact_p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.).
 * marginal: List all sentences that describe an effect as 'marginally significant'.
 * osf_check: List all OSF links and whether they are open, closed, or do not exist.
 * ref_consistency: Check if all references are cited and all citations are referenced.
 * retractionwatch: Flag any cited papers in the RetractionWatch database.
 * statcheck: Check consistency of p-values and test statistics.
 
Use `module_help("module_name")` for help with a specific module

Modules: Effect Sizes

mod <- module_run(
  paper = psychsci,
  module = "effect_size"
)
id ttests_n ttests_with_es ttests_without_es Ftests_n Ftests_with_es Ftests_without_es
0956797613520608 0 0 0 5 5 0
0956797614522816 5 0 5 20 20 0
0956797614527830 0 0 0 0 0 0
0956797614557697 1 0 1 5 5 0
0956797614560771 2 2 0 0 0 0
0956797614566469 0 0 0 0 0 0
0956797615569001 2 1 1 0 0 0
0956797615569889 0 0 0 12 12 0
0956797615583071 10 6 4 4 2 2
0956797615588467 7 4 3 1 0 1
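
module_run() returns more than the printed summary; a hedged sketch assuming the result exposes a $summary table (as the retractionwatch example later in this talk does):

```r
mod <- module_run(psychsci, "effect_size")

# papers reporting at least one t-test without an effect size
dplyr::filter(mod$summary, ttests_without_es > 0)
```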

Modules: StatCheck

This module uses the {statcheck} package to check the consistency of p-values and test statistics.

module_run(psychsci[6:10], "statcheck")

We detected possible errors in test statistics

id stats_found stats_error decision_error
0956797614566469 0 0 0
0956797615569001 2 0 0
0956797615569889 19 0 0
0956797615583071 22 0 0
0956797615588467 5 1 0

Showing 5 of 5 rows

It currently works only for APA-formatted statistics, but we are working on an extended version.

Modules: Exact P-Values

This module scans the text for all p-values and flags those reported inexactly, such as p < .01, p < .10, or p = n.s.

module_run(psychsci[1:10], "exact_p")

You may have reported some imprecise p-values

id exact_p imprecise_p
0956797613520608 6 0
0956797614522816 39 0
0956797614527830 10 3
0956797614557697 16 11
0956797614560771 3 1
0956797614566469 0 0
0956797615569001 2 23
0956797615569889 24 2
0956797615583071 21 3
0956797615588467 13 8

Showing 10 of 10 rows

Modules: Marginal Significance

This module searches the text for phrases such as “marginally significant” or “borderline significance” and flags them.

module_run(psychsci[8], "marginal")

You described effects as marginally/borderline/close to significant. It is better to write ‘did not reach the threshold alpha for significance’.

text section div p s
Although the PTSD group showed a significant facilitation effect (M = 0.68), F(1, 15) = 5.575, p … results 11 1 4
A marginally significant negative correlation was found between suppression-induced forgetting on… results 13 1 4
When we more closely matched depression symptoms by comparing the low-BDI-II PTSD group with the … results 14 3 1
In that analysis, the group difference in suppression-induced forgetting only approached signific… results 15 1 2

Showing 4 of 4 rows

Modules: Inaccessible Resources

This module scans the text for references to OSF projects and checks their status, flagging any links that are broken or that lead to inaccessible private projects.

module_run(psychsci[1:10], "osf_check")
id osf.open osf.closed osf.invalid
0956797613520608 0 0 0
0956797614522816 0 0 0
0956797614527830 0 0 0
0956797614557697 2 0 0
0956797614560771 0 0 0
0956797614566469 0 0 0
0956797615569001 1 0 0
0956797615569889 0 1 0
0956797615583071 2 0 0
0956797615588467 0 0 0

Modules: Reference Consistency

This module checks for missing references or citations.

module_run(papers[[1]], "ref_consistency")

This module relies on Grobid correctly parsing the references. There may be some false positives.

There are references that are not cited or citations that are not referenced

bib_id ref bibtype title journal year authors missing
b3 Bressan P, Grassi M (Bressan, P., & Grassi, M. Parental. resemblance in one-year-olds and the Gau… Article resemblance in one-year-olds and the Gaussian curve Evolution and Human Behavior Bressan, P., & Grassi, M. Parental. resemblance in one-year-olds and the Gaussian curve. Evolutio… P Bressan, M Grassi citation
b8 Debruine LM (Submitted for publication). “Context-specific responses to a cue of kinship: Facial … Article Context-specific responses to a cue of kinship: Facial resemblance increases the attractiveness o… Proceedings of the Royal Society of London B Submitted for publication L M Debruine citation

Showing 2 of 2 rows

Modules: Retracted Papers

This module searches the RetractionWatch database for all cited references in a paper and flags those that have been retracted.

mod <- module_run(psychsci, "retractionwatch")

mod$summary |> dplyr::filter(rw_Correction + rw_Retraction > 0)
                 id rw_Correction rw_Retraction
1 09567976221150616             1             0
2 09567976231194221             1             0
3 09567976241249183             0             1
4 09567976241260247             3             0

Promoting Adoption

Center for Open Science

Workflows

Individual

Automated

Meta-Science

Systemic

Caveats

  • Validation
  • Sustainability
  • Inappropriate Use

Thank You!

papercheck - download the package or submit issues

VeriSci - join a community to create or test modules

@debruine - see what else I’m up to