https://scienceverse.github.io/talks/2025-papercheck-intro/
In this talk, I will introduce Papercheck, a new tool that leverages text search, code, and large language models to extract and supplement information from scientific documents (including manuscripts, submitted or published articles, or preregistration documents) and provides automated suggestions for improvement.
Inspired by practices in software development, where automated checks (e.g., CRAN checks for R packages) are used to identify issues before release, Papercheck aims to screen scientific manuscripts to identify potential issues or areas for improvement and guide researchers in adopting best practices. It can also assist with processing large numbers of papers for metascientific enquiry.
Grobid: A machine learning software for extracting structured information from scholarly documents
And then…
-------------
debruine-fret
-------------
Facial resemblance enhances trust
* Sections: 8
* Sentences: 174
* References: 22
* Citations: 25
--------------
debruine-child
--------------
Resemblance to self increases the appeal of child faces to both men and women
* Sections: 14
* Sentences: 184
* References: 23
* Citations: 48
-------------
debruine-fret
-------------
Facial resemblance enhances trust
* Sections: 8
* Sentences: 174
* References: 22
* Citations: 25
------------
debruine-sex
------------
Facial resemblance increases the attractiveness of same-sex faces more than other-sex faces
* Sections: 11
* Sentences: 179
* References: 40
* Citations: 48
------------
debruine-tnl
------------
Trustworthy but not lust-worthy: context-specific effects of facial resemblance
* Sections: 6
* Sentences: 101
* References: 32
* Citations: 39
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
The monetary value of X ranged from (Canadian) $2 to $5 and was balanced between self and non-self morph trials. | intro | INTRODUCTION | 1 | 4 | 5 | debruine-fret |
In the pay-off structure of the current game, the expected effect of such an evolved psychology would be both to raise the incentive to trust from (Canadian) $1 to $(1 + r) and to lower the cost of betrayal from $1 to $(1 − r). | intro | INTRODUCTION | 1 | 6 | 2 | debruine-fret |
text | section | header | div | p | s | id |
---|---|---|---|---|---|---|
n = 11 | method | METHODS (a) Preparation of facial stimuli | 2 | 3 | 4 | debruine-fret |
n = 9 | method | METHODS (a) Preparation of facial stimuli | 2 | 3 | 4 | debruine-fret |
n = 4). | method | METHODS (a) Preparation of facial stimuli | 2 | 3 | 4 | debruine-fret |
n = 11 | results | RESULTS | 6 | 1 | 1 | debruine-fret |
n = 13). | results | RESULTS | 6 | 1 | 1 | debruine-fret |
10 = 2.63 | results | RESULTS | 6 | 1 | 2 | debruine-fret |
p = 0.013 | results | RESULTS | 6 | 1 | 2 | debruine-fret |
10 = 0.00 | results | RESULTS | 6 | 1 | 2 | debruine-fret |
12 = 2.13 | results | RESULTS | 6 | 1 | 3 | debruine-fret |
p = 0.028 | results | RESULTS | 6 | 1 | 3 | debruine-fret |
query <- 'How many subjects were in the studies in total?
Return your answer in JSON format giving the total and
any subgroupings by gender, e.g.:
{"total": 100, "men": 42, "women": 58},
Only return valid JSON, no notes.'
llm_subjects <- papers |>
search_text("\\d+", section = "method") |>
search_text(return = "section") |>
llm(query)
llm_subjects |> json_expand()
id | answer | total | men | women |
---|---|---|---|---|
debruine-child | {"total": 71, "men": 32, "women": 39} | 71 | 32 | 39 |
debruine-fret | {"total": 48, "men": 24, "women": 24} | 48 | 24 | 24 |
debruine-sex | {"total": 136, "men": 86, "women": 50} | 136 | 86 | 50 |
debruine-tnl | {"total": 144, "men": 66, "women": 78} | 144 | 66 | 78 |
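The expansion step can be approximated outside papercheck. The sketch below is not how `json_expand()` is implemented. It assumes the `jsonlite` package is installed, and the `answers` data frame is a hypothetical copy of two rows from the table above. It also adds a quick check that the subgroups sum to the reported total, which is a cheap way to catch some inconsistent LLM answers.

```r
library(jsonlite)

# Hypothetical copy of two LLM answers from the table above
answers <- data.frame(
  id = c("debruine-child", "debruine-fret"),
  answer = c('{"total": 71, "men": 32, "women": 39}',
             '{"total": 48, "men": 24, "women": 24}'),
  stringsAsFactors = FALSE
)

# Parse each JSON string into a one-row data frame, then stack the rows
parsed <- lapply(answers$answer, function(x) as.data.frame(fromJSON(x)))
expanded <- cbind(answers, do.call(rbind, parsed))

# Sanity check: the subgroups should add up to the reported total
expanded$consistent <- expanded$men + expanded$women == expanded$total
print(expanded)
```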
osf_id | name | osf_type | public | parent | size | downloads |
---|---|---|---|---|---|---|
e2aks | Action-specific disruption of perceptual confidence | nodes | TRUE | NA | NA | NA |
7jh5v | Data | nodes | TRUE | e2aks | NA | NA |
pj4e8 | Analysis scripts | nodes | TRUE | e2aks | NA | NA |
553e58658c5e4a219919a629 | allData_orientation.txt | files | NA | 7jh5v | 1096113 | 41 |
553e58658c5e4a219919a62a | allData_contrast_M1.txt | files | NA | 7jh5v | 691560 | 68 |
553e58658c5e4a219919a62c | Mratio_contrast_M1.txt | files | NA | 7jh5v | 5508 | 31 |
553e58658c5e4a219919a628 | Mratio_all.txt | files | NA | 7jh5v | 15326 | 34 |
553e58658c5e4a219919a62b | allData_contrast_PMC.txt | files | NA | 7jh5v | 712804 | 59 |
553e7e168c5e4a21991a4dac | tms_analysis.R | files | NA | pj4e8 | 5670 | 66 |
* all_p_values: List all p-values in the text, returning the matched text (e.g., 'p = 0.04') and document location in a table.
* all_urls: List all the URLs in the main text.
* effect_size: Detect t-tests and F-tests with missing effect sizes.
* exact_p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
* marginal: List all sentences that describe an effect as 'marginally significant'.
* osf_check: List all OSF links and whether they are open, closed, or do not exist.
* ref_consistency: Check if all references are cited and all citations are referenced.
* retractionwatch: Flag any cited papers in the RetractionWatch database.
* statcheck: Check consistency of p-values and test statistics.
Use `module_help("module_name")` for help with a specific module
id | ttests_n | ttests_with_es | ttests_without_es | Ftests_n | Ftests_with_es | Ftests_without_es |
---|---|---|---|---|---|---|
0956797613520608 | 0 | 0 | 0 | 5 | 5 | 0 |
0956797614522816 | 5 | 0 | 5 | 20 | 20 | 0 |
0956797614527830 | 0 | 0 | 0 | 0 | 0 | 0 |
0956797614557697 | 1 | 0 | 1 | 5 | 5 | 0 |
0956797614560771 | 2 | 2 | 0 | 0 | 0 | 0 |
0956797614566469 | 0 | 0 | 0 | 0 | 0 | 0 |
0956797615569001 | 2 | 1 | 1 | 0 | 0 | 0 |
0956797615569889 | 0 | 0 | 0 | 12 | 12 | 0 |
0956797615583071 | 10 | 6 | 4 | 4 | 2 | 2 |
0956797615588467 | 7 | 4 | 3 | 1 | 0 | 1 |
This module uses the {statcheck} package to check the consistency of p-values and test statistics.
We detected possible errors in test statistics
id | stats_found | stats_error | decision_error |
---|---|---|---|
0956797614566469 | 0 | 0 | 0 |
0956797615569001 | 2 | 0 | 0 |
0956797615569889 | 19 | 0 | 0 |
0956797615583071 | 22 | 0 | 0 |
0956797615588467 | 5 | 1 | 0 |
The statcheck module currently only works for APA-formatted statistics, but we are working on an extended version.
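The idea behind this kind of consistency check can be sketched in base R: recompute the p-value from the reported test statistic and degrees of freedom, then compare it with the reported p-value. This is a simplified illustration, not the {statcheck} algorithm, which handles more test types and reporting details. The values in `stats` are made up, and the tolerance assumes p-values reported to three decimals.

```r
# Hypothetical reported t-tests (not taken from any paper)
stats <- data.frame(
  t = c(2.5, 2.5, 1.9),
  df = c(20, 20, 20),
  reported_p = c(0.021, 0.010, 0.040)
)

# Recompute the two-tailed p-value from t and df
stats$computed_p <- 2 * pt(-abs(stats$t), stats$df)

# Flag a possible error when the difference exceeds what rounding
# to three decimals could explain
stats$error <- abs(stats$computed_p - stats$reported_p) > 0.0005

# Flag a decision error when the reported and recomputed values
# fall on opposite sides of the significance threshold
alpha <- 0.05
stats$decision_error <- (stats$computed_p < alpha) != (stats$reported_p < alpha)

print(stats)
```

The second row is inconsistent but leads to the same conclusion at alpha = .05. The third row is inconsistent and the conclusion changes, which is the more serious case.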
This module scans the text for all p-values and flags those reported inexactly, such as p < .01, p < .10, or p = n.s.
You may have reported some imprecise p-values
id | exact_p | imprecise_p |
---|---|---|
0956797613520608 | 6 | 0 |
0956797614522816 | 39 | 0 |
0956797614527830 | 10 | 3 |
0956797614557697 | 16 | 11 |
0956797614560771 | 3 | 1 |
0956797614566469 | 0 | 0 |
0956797615569001 | 2 | 23 |
0956797615569889 | 24 | 2 |
0956797615583071 | 21 | 3 |
0956797615588467 | 13 | 8 |
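A check like this can be approximated with a regular expression in base R. The pattern and the classification rule below are assumptions for illustration, not the module's actual implementation, and the sentences are invented. The rule is deliberately crude: every inequality counts as imprecise, including cases such as p < .001 that many style guides accept.

```r
# Hypothetical sentences standing in for a paper's full text
sentences <- c(
  "The effect was significant, t(28) = 2.20, p = 0.036.",
  "Group A scored higher than group B, p < .05.",
  "The interaction was not reliable, p = n.s."
)

# Match "p", a comparator, and either a number or "n.s."
pattern <- "\\bp\\s*[<>=]\\s*(n\\.s\\.|[01]?\\.\\d+)"
hits <- unlist(regmatches(sentences, gregexpr(pattern, sentences, perl = TRUE)))

# Treat inequalities and "n.s." as imprecise; everything else as exact
p_values <- data.frame(
  text = hits,
  imprecise = grepl("[<>]|n\\.s\\.", hits),
  stringsAsFactors = FALSE
)
print(p_values)
```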
This module searches the text for phrases such as “marginally significant” or “borderline significance” and flags them.
You described effects as marginally/borderline/close to significant. It is better to write ‘did not reach the threshold alpha for significance’.
text | section | div | p | s |
---|---|---|---|---|
Although the PTSD group showed a significant facilitation effect (M = 0.68), F(1, 15) = 5.575, p … | results | 11 | 1 | 4 |
A marginally significant negative correlation was found between suppression-induced forgetting on… | results | 13 | 1 | 4 |
When we more closely matched depression symptoms by comparing the low-BDI-II PTSD group with the … | results | 14 | 3 | 1 |
In that analysis, the group difference in suppression-induced forgetting only approached signific… | results | 15 | 1 | 2 |
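A phrase search of this kind needs little more than `grepl()`. The sketch below uses a short, hypothetical list of phrasings and invented sentences; a real module would likely need a broader pattern, and this is not the pattern papercheck uses.

```r
# Hypothetical sentences standing in for a results section
sentences <- c(
  "A marginally significant negative correlation was found.",
  "The group difference only approached significance.",
  "The main effect of condition was significant."
)

# A few common phrasings, combined into one case-insensitive pattern
pattern <- paste(
  "marginal(ly)? significan",
  "borderline significan",
  "approach(ed|ing)? significan",
  "close to significan",
  sep = "|"
)

# Keep only the sentences that contain one of the phrasings
flagged <- sentences[grepl(pattern, sentences, ignore.case = TRUE)]
print(flagged)
```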
This module scans the text for references to OSF projects and checks their status, flagging any links that are broken or that lead to inaccessible private projects.
id | osf.open | osf.closed | osf.invalid |
---|---|---|---|
0956797613520608 | 0 | 0 | 0 |
0956797614522816 | 0 | 0 | 0 |
0956797614527830 | 0 | 0 | 0 |
0956797614557697 | 2 | 0 | 0 |
0956797614560771 | 0 | 0 | 0 |
0956797614566469 | 0 | 0 | 0 |
0956797615569001 | 1 | 0 | 0 |
0956797615569889 | 0 | 1 | 0 |
0956797615583071 | 2 | 0 | 0 |
0956797615588467 | 0 | 0 | 0 |
This module checks for missing references or citations.
This module relies on Grobid correctly parsing the references. There may be some false positives.
There are references that are not cited or citations that are not referenced
bib_id | ref | bibtype | title | journal | year | authors | missing |
---|---|---|---|---|---|---|---|
b3 | Bressan P, Grassi M (Bressan, P., & Grassi, M. Parental. resemblance in one-year-olds and the Gau… | Article | resemblance in one-year-olds and the Gaussian curve | Evolution and Human Behavior | Bressan, P., & Grassi, M. Parental. resemblance in one-year-olds and the Gaussian curve. Evolutio… | P Bressan, M Grassi | citation |
b8 | Debruine LM (Submitted for publication). “Context-specific responses to a cue of kinship: Facial … | Article | Context-specific responses to a cue of kinship: Facial resemblance increases the attractiveness o… | Proceedings of the Royal Society of London B | Submitted for publication | L M Debruine | citation |
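Once references and in-text citations have been parsed into identifiers, the consistency check reduces to two set differences. The sketch below assumes that parsing has already happened. The `references` and `cited_ids` objects are hypothetical and only mirror the `bib_id` naming in the table above.

```r
# Hypothetical parsed bibliography and in-text citation targets
references <- data.frame(
  bib_id = c("b0", "b1", "b2", "b3"),
  stringsAsFactors = FALSE
)
cited_ids <- c("b0", "b1", "b1", "b2", "b9")

# References that never appear as a citation target
missing_citation <- setdiff(references$bib_id, cited_ids)

# Citations that point to no entry in the reference list
missing_reference <- setdiff(unique(cited_ids), references$bib_id)

print(missing_citation)
print(missing_reference)
```

Because both sets depend on how well the PDF was parsed, a flagged entry is a prompt to look at the manuscript rather than proof of an error.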
This module searches the RetractionWatch database for all cited references in a paper and flags those that have been retracted.
Center for Open Science
papercheck - download the package or submit issues
VeriSci - join a community to create or test modules
@debruine - see what else I’m up to