Papercheck

https://scienceverse.github.io/talks/2025-papercheck-intro/

Lisa DeBruine

Abstract

In this talk, I will introduce Papercheck, a new tool that leverages text search, code, and large language models to extract and supplement information from scientific documents (including manuscripts, submitted or published articles, or preregistration documents) and provides automated suggestions for improvement.

Inspired by practices in software development, where automated checks (e.g., CRAN checks for R packages) are used to identify issues before release, Papercheck aims to screen scientific manuscripts to identify potential issues or areas for improvement and guide researchers in adopting best practices. It can also assist with processing large numbers of papers for metascientific enquiry.

The Problem

Best Practices are Rapidly Evolving

Un-FAIR Meta-Data

All research outputs should be FAIR
PDFs are where data goes to die
Meta-data use cases:
- facilitating meta-analyses
- improving the re-use of reliable measures
- meta-scientific research

Solutions

Checklists?

Reporting guidelines such as CONSORT, PRISMA, and JARS often provide extensive checklists.

Time-consuming
Requires expertise
Can be vague
Who checks the checklist?

[Images: JARS-1, JARS-2, JARS-3]

Automated Checks

Time-efficient
Requires less expertise
Reproducible
Generates machine-readable metadata

Automation Strategies

Grobid: A machine learning software for extracting structured information from scholarly documents

And then…

R Package

Paper Import

file <- "papers/debruine-fret.pdf"
xml <- pdf2grobid(file, 
                  consolidateCitations = TRUE, 
                  consolidateHeader = TRUE)
paper <- read(xml)

-------------
debruine-fret
-------------

Facial resemblance enhances trust

* Sections: 8
* Sentences: 174
* References: 22
* Citations: 25

Batch Import

papers <- read("papers")

--------------
debruine-child
--------------

Resemblance to self increases the appeal of child faces to both men and women

* Sections: 14
* Sentences: 184
* References: 23
* Citations: 48

-------------
debruine-fret
-------------

Facial resemblance enhances trust

* Sections: 8
* Sentences: 174
* References: 22
* Citations: 25

------------
debruine-sex
------------

Facial resemblance increases the attractiveness of same-sex faces more than other-sex faces

* Sections: 11
* Sentences: 179
* References: 40
* Citations: 48

------------
debruine-tnl
------------

Trustworthy but not lust-worthy: context-specific effects of facial resemblance

* Sections: 6
* Sentences: 101
* References: 32
* Citations: 39

Text Search

search_text(paper, "Canadian")

text	section	header	div	p	s	id
The monetary value of X ranged from (Canadian) $2 to $5 and was balanced between self and non-self morph trials.	intro	INTRODUCTION	1	4	5	debruine-fret
In the pay-off structure of the current game, the expected effect of such an evolved psychology would be both to raise the incentive to trust from (Canadian) $1 to $(1 + r) and to lower the cost of betrayal from $1 to $(1 - r).	intro	INTRODUCTION	1	6	2	debruine-fret
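
The idea behind a sentence-level text search can be sketched in base R. This is a simplified stand-in, not papercheck's implementation: the `find_text()` helper and the example rows are hypothetical, and only the `text` and `section` column names are borrowed from the output above.

```r
# Hypothetical sentence table; the rows are made up for illustration.
sentences <- data.frame(
  text = c(
    "Participants were paid in Canadian dollars.",
    "Faces were morphed with a composite image.",
    "The canadian sample was tested first."
  ),
  section = c("method", "method", "results"),
  stringsAsFactors = FALSE
)

# Keep the rows whose text matches a pattern (hypothetical helper).
find_text <- function(tbl, pattern, ignore_case = TRUE) {
  hits <- grepl(pattern, tbl$text, ignore.case = ignore_case)
  tbl[hits, , drop = FALSE]
}

canadian <- find_text(sentences, "Canadian")
print(canadian)
```

Because each match keeps its row, the location columns travel with the matched text, which is what makes the results easy to filter by section afterwards.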

Regex Text Search

search_text(paper, "\\b\\S+\\s*=\\s*\\S+\\b", return = "match")

text	section	header	div	p	s	id
n = 11	method	METHODS (a) Preparation of facial stimuli	2	3	4	debruine-fret
n = 9	method	METHODS (a) Preparation of facial stimuli	2	3	4	debruine-fret
n = 4).	method	METHODS (a) Preparation of facial stimuli	2	3	4	debruine-fret
n = 11	results	RESULTS	6	1	1	debruine-fret
n = 13).	results	RESULTS	6	1	1	debruine-fret
10 = 2.63	results	RESULTS	6	1	2	debruine-fret
p = 0.013	results	RESULTS	6	1	2	debruine-fret
10 = 0.00	results	RESULTS	6	1	2	debruine-fret
12 = 2.13	results	RESULTS	6	1	3	debruine-fret
p = 0.028	results	RESULTS	6	1	3	debruine-fret
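
To see what returning only the matched text means, here is a minimal base R sketch that applies the same pattern to a single made-up sentence. It uses `gregexpr()` and `regmatches()` with PCRE matching, which is an assumption about the regex engine and may differ from what papercheck uses internally.

```r
# Made-up sentence for illustration only.
sentence <- "The sample was small, n = 24, and the effect was reliable, p = 0.013, in both groups."

# Same pattern as above: a token, optional spaces, "=", optional spaces, a token.
pattern <- "\\b\\S+\\s*=\\s*\\S+\\b"

# gregexpr() finds every match position; regmatches() extracts the matched text.
matches <- regmatches(sentence, gregexpr(pattern, sentence, perl = TRUE))[[1]]
print(matches)
```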

LLM

query <- 'How many subjects were in the studies in total? 
Return your answer in JSON format giving the total and 
any subgroupings by gender, e.g.:
{"total": 100, "men": 42, "women": 58}, 
Only return valid JSON, no notes.'

llm_subjects <- papers |> 
  search_text("\\d+", section = "method") |>
  search_text(return = "section") |>
  llm(query)

llm_subjects |> json_expand()

id	answer	total	men	women
debruine-child	{"total": 71, "men": 32, "women": 39}	71	32	39
debruine-fret	{"total": 48, "men": 24, "women": 24}	48	24	24
debruine-sex	{"total": 136, "men": 86, "women": 50}	136	86	50
debruine-tnl	{"total": 144, "men": 66, "women": 78}	144	66	78
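
LLM answers should be sanity-checked before use. A minimal sketch in base R, with the values copied from the table above: it only checks that the subgroup counts add up to the reported total, which is a necessary but not sufficient test of whether the numbers were extracted correctly.

```r
# Values copied from the table above.
llm_subjects <- data.frame(
  id    = c("debruine-child", "debruine-fret", "debruine-sex", "debruine-tnl"),
  total = c(71, 48, 136, 144),
  men   = c(32, 24, 86, 66),
  women = c(39, 24, 50, 78)
)

# Flag rows where the subgroup counts do not add up to the reported total.
llm_subjects$consistent <- with(llm_subjects, men + women == total)
print(llm_subjects)
```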

OSF Functions

# find all OSF links in the papers
links <- osf_links(psychsci)

# get info about a link and its children
osf <- osf_retrieve(links$text[1], recursive = TRUE)

# download all files
osf_file_download(links$text[1])

osf_id	name	osf_type	public	parent	size	downloads
e2aks	Action-specific disruption of perceptual confidence	nodes	TRUE	NA	NA	NA
7jh5v	Data	nodes	TRUE	e2aks	NA	NA
pj4e8	Analysis scripts	nodes	TRUE	e2aks	NA	NA
553e58658c5e4a219919a629	allData_orientation.txt	files	NA	7jh5v	1096113	41
553e58658c5e4a219919a62a	allData_contrast_M1.txt	files	NA	7jh5v	691560	68
553e58658c5e4a219919a62c	Mratio_contrast_M1.txt	files	NA	7jh5v	5508	31
553e58658c5e4a219919a628	Mratio_all.txt	files	NA	7jh5v	15326	34
553e58658c5e4a219919a62b	allData_contrast_PMC.txt	files	NA	7jh5v	712804	59
553e7e168c5e4a21991a4dac	tms_analysis.R	files	NA	pj4e8	5670	66

Modules

module_list()

 * all_p_values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.
 * all_urls: List all the URLs in the main text.
 * effect_size: Detect t-tests and F-tests with missing effect sizes
 * exact_p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
 * marginal: List all sentences that describe an effect as 'marginally significant'.
 * osf_check: List all OSF links and whether they are open, closed, or do not exist.
 * ref_consistency: Check if all references are cited and all citations are referenced
 * retractionwatch: Flag any cited papers in the RetractionWatch database
 * statcheck: Check consistency of p-values and test statistics
 
Use `module_help("module_name")` for help with a specific module

Modules: Effect Sizes

mod <- module_run(
  paper = psychsci,
  module = "effect_size"
)

id	ttests_n	ttests_with_es	ttests_without_es	Ftests_n	Ftests_with_es	Ftests_without_es
0956797613520608	0	0	0	5	5	0
0956797614522816	5	0	5	20	20	0
0956797614527830	0	0	0	0	0	0
0956797614557697	1	0	1	5	5	0
0956797614560771	2	2	0	0	0	0
0956797614566469	0	0	0	0	0	0
0956797615569001	2	1	1	0	0	0
0956797615569889	0	0	0	12	12	0
0956797615583071	10	6	4	4	2	2
0956797615588467	7	4	3	1	0	1
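
A rough sketch of the kind of check involved, in base R. The sentences and both patterns are made up for illustration and are much cruder than what the module would need in practice; this is not the module's actual logic.

```r
# Made-up sentences for illustration only.
sentences <- c(
  "The groups differed, t(28) = 2.20, p = .036, d = 0.80.",
  "Ratings were higher for self morphs, t(23) = 3.10, p = .005.",
  "There was no main effect of sex."
)

# A sentence reports a t-test if it contains something like "t(28) =".
has_ttest <- grepl("\\bt\\s*\\(\\s*\\d+(\\.\\d+)?\\s*\\)\\s*=", sentences, perl = TRUE)

# Very rough effect-size detector: a standalone d, g, or r followed by "=".
has_es <- grepl("\\b(d|g|r)\\s*=", sentences, perl = TRUE)

result <- data.frame(
  has_ttest  = has_ttest,
  missing_es = has_ttest & !has_es
)
print(result)
```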

Modules: StatCheck

This module uses the {statcheck} package to check the consistency of p-values and test statistics.

module_run(psychsci[6:10], "statcheck")

We detected possible errors in test statistics

id	stats_found	stats_error
0956797614566469	0	0
0956797615569001	2	0
0956797615569889	19	0
0956797615583071	22	0
0956797615588467	5	1

It currently only works for APA-formatted stats, but we are working on an extended version.
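
The core idea of a consistency check can be sketched in base R: recompute the p-value from the test statistic and degrees of freedom, then compare it with the reported value. This is a simplified illustration, not how {statcheck} works internally; the `check_t()` helper and the example values are hypothetical, and it assumes a two-tailed t-test.

```r
# Recompute a two-tailed p-value from a t statistic and its degrees of
# freedom, then compare it with the reported p-value after rounding.
check_t <- function(t, df, reported_p, digits = 2) {
  recomputed <- 2 * pt(-abs(t), df)
  data.frame(
    t = t, df = df, reported_p = reported_p,
    recomputed_p = recomputed,
    consistent = abs(round(recomputed, digits) - reported_p) < 1e-8
  )
}

# Hypothetical reported results (not taken from any paper).
ok  <- check_t(t = 2.048, df = 28, reported_p = 0.05)
bad <- check_t(t = 2.048, df = 28, reported_p = 0.01)
print(rbind(ok, bad))
```

A real checker also has to allow for rounding in the reported test statistic and for one-tailed tests, which this sketch ignores.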

Modules: Exact P-Values

This module scans the text for all p-values and flags those reported inexactly, such as p < .01, p < .10, or p = n.s.

module_run(psychsci[1:10], "exact_p")

You may have reported some imprecise p-values

id	exact_p	imprecise_p
0956797613520608	6	0
0956797614522816	39	0
0956797614527830	10	3
0956797614557697	16	11
0956797614560771	3	1
0956797614566469	0	0
0956797615569001	2	23
0956797615569889	24	2
0956797615583071	21	3
0956797615588467	13	8
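
A minimal base R sketch of the classification step, assuming the p-value strings have already been extracted. The pattern is an illustrative guess, not the module's actual rule.

```r
# Example p-value strings for illustration only.
p_values <- c("p = 0.013", "p < .05", "p = n.s.", "p = .028", "p > .10")

# Imprecise: reported with an inequality or as "n.s." instead of a number.
imprecise <- grepl("[<>]|n\\.?\\s*s\\.?", p_values, ignore.case = TRUE, perl = TRUE)

counts <- table(ifelse(imprecise, "imprecise_p", "exact_p"))
print(counts)
```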

Modules: Marginal Significance

This module searches the text for phrases such as “marginally significant” or “borderline significance” and flags them.

module_run(psychsci[8], "marginal")

You described effects as marginally/borderline/close to significant. It is better to write ‘did not reach the threshold alpha for significance’.

text	section	div	p	s
Although the PTSD group showed a significant facilitation effect (M = 0.68), F(1, 15) = 5.575, p …	results	11	1	4
A marginally significant negative correlation was found between suppression-induced forgetting on…	results	13	1	4
When we more closely matched depression symptoms by comparing the low-BDI-II PTSD group with the …	results	14	3	1
In that analysis, the group difference in suppression-induced forgetting only approached signific…	results	15	1	2
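
A phrase search like this can be sketched with a single regular expression in base R. The pattern and sentences below are hypothetical and cover only a few phrasings; the module's real pattern may be broader.

```r
# Made-up sentences for illustration only.
sentences <- c(
  "The effect was marginally significant, p = .06.",
  "The interaction approached significance.",
  "The main effect was significant, p = .01.",
  "This difference was of borderline significance."
)

# Illustrative pattern covering three common phrasings.
pattern <- paste(
  "marginal(ly)?\\s+significan",
  "borderline\\s+significan",
  "approach(ed|ing)?\\s+significan",
  sep = "|"
)

flagged <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)
print(sentences[flagged])
```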

Modules: Inaccessible Resources

This module scans the text for references to OSF projects and checks their status, alerting users to any links that are broken or lead to inaccessible private projects.

module_run(psychsci[1:10], "osf_check")

id	osf.open	osf.closed
0956797613520608	0	0
0956797614522816	0	0
0956797614527830	0	0
0956797614557697	2	0
0956797614560771	0	0
0956797614566469	0	0
0956797615569001	1	0
0956797615569889	0	1
0956797615583071	2	0
0956797615588467	0	0
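
The first step, finding OSF links in text, can be sketched in base R. The sentences are made up, and the pattern assumes five-character OSF identifiers like the ones shown earlier (e.g., e2aks). Checking whether each link is open or closed requires a request to OSF and is not shown here.

```r
# Made-up sentences; the OSF ids are the ones shown earlier in this talk.
text <- c(
  "Data and scripts are available at https://osf.io/e2aks/.",
  "Materials can be found at osf.io/7jh5v and in the supplement.",
  "No online materials are available."
)

# Assumes five-character alphanumeric OSF identifiers.
pattern <- "osf\\.io/[a-z0-9]{5}\\b"

links <- unlist(regmatches(
  text,
  gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
))
print(links)
```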

Modules: Reference Consistency

This module checks for missing references or citations.

module_run(papers[[1]], "ref_consistency")

This module relies on Grobid correctly parsing the references. There may be some false positives.

There are references that are not cited or citations that are not referenced

bib_id	ref	bibtype	title	journal	year	authors	missing
b3	Bressan P, Grassi M (Bressan, P., & Grassi, M. Parental. resemblance in one-year-olds and the Gau…	Article	resemblance in one-year-olds and the Gaussian curve	Evolution and Human Behavior	Bressan, P., & Grassi, M. Parental. resemblance in one-year-olds and the Gaussian curve. Evolutio…	P Bressan, M Grassi	citation
b8	Debruine LM (Submitted for publication). “Context-specific responses to a cue of kinship: Facial …	Article	Context-specific responses to a cue of kinship: Facial resemblance increases the attractiveness o…	Proceedings of the Royal Society of London B	Submitted for publication	L M Debruine	citation
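
Once references and in-text citations share an identifier, the check itself reduces to two set differences. A minimal base R sketch with hypothetical ids (the `b0`, `b1`, ... style follows the `bib_id` column above):

```r
# Hypothetical ids: bibliography entries and the entries cited in the text.
reference_ids <- c("b0", "b1", "b2", "b3")
citation_ids  <- c("b0", "b1", "b1", "b2", "b4")

# References that are never cited, and citations with no reference entry.
uncited_references     <- setdiff(reference_ids, citation_ids)
unreferenced_citations <- setdiff(citation_ids, reference_ids)

print(uncited_references)
print(unreferenced_citations)
```

The hard part in practice is the matching step that produces these ids, which is why parsing errors can show up as false positives.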

Modules: Retracted Papers

This module searches the RetractionWatch database for all cited references in a paper and flags those that have been retracted.

mod <- module_run(psychsci, "retractionwatch")

mod$summary |> dplyr::filter(rw_Correction + rw_Retraction > 0)

                 id rw_Correction rw_Retraction
1 09567976221150616             1             0
2 09567976231194221             1             0
3 09567976241249183             0             1
4 09567976241260247             3             0

Promoting Adoption

Center for Open Science

Workflows

Caveats

Validation
Sustainability
Inappropriate Use

Thank You!

papercheck - download the package or submit issues

VeriSci - join a community to create or test modules

@debruine - see what else I’m up to