Validation • papercheck

library(papercheck)
library(dplyr)

This is a demo of the workflow for module validation. We are still piloting this workflow and it is likely to change.

Validation Sample

Create a paper list object for the papers in your validation sample. Here, we’ll just use the first 10 papers in the psychsci set, but in practice you will need many more papers.

sample_papers <- psychsci[1:10]

Expected Results

Create objects for the expected results of the module you're validating. You can test any or all of the tables a module typically returns (the results table and the summary table), as well as any other custom results.

This usually requires quite a lot of manual work to determine the ground truth for each paper in your validation sample.

Results Table

For returned tables, the columns should have the same names as the columns returned by the module. You can omit any columns (except id) and they will not be checked in the validation. Here, we will validate only the text column.
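Here is a minimal sketch of the shape such an expected table might take: a data frame with an `id` column plus only the columns you want checked, here `text`. The ids and sentences are hypothetical placeholders, not ground truth for the psychsci sample. The object is called `example_table` so that it is not confused with the `exp_table` passed to validate() later. In a real validation, the ids would need to match those in your paper list.

```r
# Hypothetical expected results for a sentence-level module such as
# `marginal`. Only `id` and `text` are included, so only `text` would be
# checked. The ids and sentences are placeholders, not psychsci content.
example_table <- data.frame(
  id   = c("paper_A", "paper_A", "paper_B"),
  text = c(
    "The effect was marginally significant, p = .06.",
    "A marginal interaction emerged, p = .08.",
    "This difference was marginally significant (p = .07)."
  )
)

# One paper can contribute several rows (paper_A has two)
str(example_table)
```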

[!NOTE] You can use search_text(sample_papers) to get a table of all sentences in the sample, or narrow it down to sentences that match a search term. This gives you a starting table that is easier to code for expected results.
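The coding step can be sketched in base R. The `sentences` data frame is a hypothetical stand-in for a table of sentences from search_text(); we assume it has at least `id` and `text` columns. The `keep` values represent the manual judgement you would record for each candidate sentence.

```r
# Hypothetical stand-in for a table of sentences; assumed to have
# at least `id` and `text` columns
sentences <- data.frame(
  id   = c("paper_A", "paper_A", "paper_B", "paper_C"),
  text = c(
    "The effect was marginally significant, p = .06.",
    "Participants were paid for their time.",
    "Marginal means are shown in Table 2.",
    "No marginal effects were observed."
  )
)

# Narrow the table to candidate sentences that mention "marginal"
candidates <- sentences[grepl("marginal", sentences$text, ignore.case = TRUE), ]

# Manual coding step: TRUE only if the sentence describes an effect
# as marginally significant
candidates$keep <- c(TRUE, FALSE, FALSE)

# Keep the coded matches, with only the columns you want validated
example_expected <- candidates[candidates$keep, c("id", "text")]
rownames(example_expected) <- NULL
example_expected
```

A keyword search casts a wide net (here it also catches "marginal means"), so the manual coding column is what turns the candidate list into ground truth.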

Summary Table

We can also check the summary table. This is likely to be more useful for reporting validation statistics, because you can report the percentage of papers for which each column matches the expected value.
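If the summary you are validating is a per-paper count of matching sentences, you could derive the expected summary from the expected table. Treating the `marginal` summary column as such a count is an assumption made here for illustration, so check what the module actually returns. The sketch keeps papers with zero matches, since a summary table has one row per paper id. All ids and sentences are hypothetical.

```r
# Hypothetical inputs: every paper id in the validation sample, and a
# hand-coded expected table with one row per matching sentence
all_ids <- c("paper_A", "paper_B", "paper_C")
example_table <- data.frame(
  id   = c("paper_A", "paper_A", "paper_B"),
  text = c("Sentence 1", "Sentence 2", "Sentence 3")
)

# Assumption: the summary column counts matching sentences per paper.
# A factor with all ids as levels keeps papers that have zero matches.
counts <- table(factor(example_table$id, levels = all_ids))
example_summary <- data.frame(
  id       = names(counts),
  marginal = as.integer(counts)
)
example_summary
```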

Run Validation

The validate() function takes the paper list as its first argument, the module name or path as its second argument, and then the expected results. The expected-result arguments must be named, and their names must match the results the module returns, such as table, summary, or traffic_light.

v <- validate(sample_papers, 
              module = "marginal", 
              table = exp_table, 
              summary = exp_summary)

If you print the result, it will give you a text summary of the validation.

v
#>  Validated matches for module `marginal`:
#> 
#> * N in validation sample: 10
#> * table: 
#>   * true_positive: 4
#>   * false_positive: 0
#>   * false_negative: 0
#> * summary: 
#>   * marginal: 1

Results List

Under the hood, the result is a list containing the module name, the observed module results for each expected return object, match information for each expected return object, and statistics summarizing those matches.

sapply(v, names) |> str()
#> List of 4
#>  $ module  : NULL
#>  $ observed: chr [1:2] "table" "summary"
#>  $ matches : chr [1:2] "table" "summary"
#>  $ stats   : chr [1:3] "n_papers" "table" "summary"

Non-Summary Tables

For tables that can have zero or more rows per paper id, the matches table gives you expected, observed, and match columns.

v$matches$table

The stats for such tables give the number of true positives, false positives, and false negatives. These counts are computed across all checked columns together rather than column by column, since a paper id may have multiple rows.

v$stats$table
#> $true_positive
#> [1] 4
#> 
#> $false_positive
#> [1] 0
#> 
#> $false_negative
#> [1] 0
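To make the expected, observed, and match columns and the three counts concrete, here is a minimal base R sketch that full-joins hypothetical expected and observed rows on every checked column and then tallies the outcomes. It illustrates the logic only and is not necessarily how validate() computes its matches internally.

```r
# Hypothetical expected (hand-coded) and observed (module output) rows
expected <- data.frame(
  id   = c("paper_A", "paper_A", "paper_B"),
  text = c("Sentence 1", "Sentence 2", "Sentence 3")
)
observed <- data.frame(
  id   = c("paper_A", "paper_B", "paper_C"),
  text = c("Sentence 1", "Sentence 3", "Sentence 4")
)

# Flag the source of each row, then full-join on all checked columns
expected$expected <- TRUE
observed$observed <- TRUE
matches <- merge(expected, observed, by = c("id", "text"), all = TRUE)

# A row absent from one side gets NA for that flag; treat NA as FALSE
matches$expected <- !is.na(matches$expected)
matches$observed <- !is.na(matches$observed)
matches$match    <- matches$expected & matches$observed

# Tally across all rows, not per column
counts <- list(
  true_positive  = sum(matches$match),
  false_positive = sum(matches$observed & !matches$expected),
  false_negative = sum(matches$expected & !matches$observed)
)
str(counts)
```

Because a row only matches when every checked column agrees, a single typo in `text` turns one true positive into both a false positive and a false negative.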

[!NOTE] The stats section does not report true negatives because the total sample N can differ from module to module. For example, a module that identifies sentences that describe an effect as ‘marginally significant’ has a total sample N of all the sentences in all the papers. By contrast, a module that identifies whether each paper reports at least one power analysis has a total sample N equal to the number of papers.
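Although true negatives are not reported, the three counts are enough for precision and recall, neither of which depends on the total sample N. A small helper (hypothetical, not part of papercheck) might look like this:

```r
# Precision and recall use only TP, FP, and FN, so they do not need the
# module-specific total N that true negatives would require
precision_recall <- function(tp, fp, fn) {
  c(
    precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall    = if (tp + fn > 0) tp / (tp + fn) else NA_real_
  )
}

# Counts reported by v$stats$table in the example above
precision_recall(tp = 4, fp = 0, fn = 0)
```

With a real validation object, you could pass v$stats$table$true_positive, v$stats$table$false_positive, and v$stats$table$false_negative instead of typing the numbers.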

Summary Tables

For summary tables, where there is one row per paper id, the matches table is a little different. For each non-id column, it returns the expected and observed values, plus a column stating whether they match.

v$matches$summary

The stats give the proportion of papers (from 0 to 1) for which each column matches the expected value.

v$stats$summary
#> $marginal
#> [1] 1
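The per-column statistic can be sketched as pairing expected and observed values by paper id and averaging the matches. The tables below are hypothetical, and the sketch does not handle NA values, which the package may treat differently.

```r
# Hypothetical expected and observed summary tables (one row per paper id)
exp_sum <- data.frame(id = c("paper_A", "paper_B", "paper_C"),
                      marginal = c(2, 1, 0))
obs_sum <- data.frame(id = c("paper_A", "paper_B", "paper_C"),
                      marginal = c(2, 0, 0))

# Pair expected and observed values for each paper
paired <- merge(exp_sum, obs_sum, by = "id",
                suffixes = c("_expected", "_observed"))

# For each non-id column, the proportion of papers whose values match
cols <- setdiff(names(exp_sum), "id")
prop_match <- sapply(cols, function(col) {
  mean(paired[[paste0(col, "_expected")]] == paired[[paste0(col, "_observed")]])
})
prop_match
```

Here paper_B disagrees, so `marginal` matches for two of three papers; a value of 1, as in the validation output above, means every paper matched.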