This is a demo of the workflow for module validation. We are still piloting this workflow and it is likely to change.
Validation sample
Create a paper list object for the papers in your validation sample.
Here, we’ll just use the first 10 papers in the `psychsci` set, but in practice you will need many more papers.
sample_papers <- psychsci[1:10]
Expected Results
Create objects for the expected results of the module you’re validating. You can test any or all of the typically returned `table` or `summary` tables, as well as any other custom results.
This usually requires quite a lot of manual work to determine the ground truth for each paper in your validation sample.
Results Table
For returned tables, the columns should have the same names as the columns returned by the module. You can omit any columns (except `id`) and they will not be checked in the validation. Here, we will validate only the `text` column.
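As a rough illustration of the shape such an expected table could take, here is a minimal sketch in base R. The ids and sentences are hypothetical placeholders, not real values from the psychsci set; in practice they come from your manual coding of each paper.

```r
# Hypothetical expected-results table: the ids and sentences below are
# placeholders for illustration, not real entries from the psychsci set.
exp_table <- data.frame(
  id   = c("paper_01", "paper_01", "paper_02"),
  text = c(
    "The effect was marginally significant (p = .06).",
    "A marginal interaction emerged, p = .08.",
    "This difference approached significance, p = .07."
  )
)

# One row per expected result; a paper may contribute zero or more rows
table(exp_table$id)
```

Only the `id` and `text` columns are included, so under the rule described above only `text` would be checked.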
[!NOTE] You can use `search_text(sample_papers)` to get a list of all sentences in the sample, or narrow it down to sentences that match a search term. This can give you a starting table that you can code more easily for expected results.
Run Validation
The `validate()` function takes the paper list as the first argument, the module name or path as the second argument, and then the expected results. The expected-result arguments must be named and have the same names as the results returned from the module, such as `table`, `summary`, or `traffic_light`.
v <- validate(sample_papers,
              module = "marginal",
              table = exp_table,
              summary = exp_summary)
If you print the result, it will give you a text summary of the validation.
v
#> Validated matches for module `marginal`:
#>
#> * N in validation sample: 10
#> * table:
#> * true_positive: 4
#> * false_positive: 0
#> * false_negative: 0
#> * summary:
#> * marginal: 1
Results List
The result is actually a list with the module name, the observed results of the module for each expected return object, a list of match information for each expected return object, and stats for this match information.
Non-Summary Tables
For tables where there are zero or more rows possible per id, the `matches` table gives you `expected`, `observed`, and `match` columns.
v$matches$table
The stats for such tables give you the number of true positives, false positives, and false negatives. These counts compare whole rows across all checked columns, not column by column, since there may be multiple rows per paper id.
v$stats$table
#> $true_positive
#> [1] 4
#>
#> $false_positive
#> [1] 0
#>
#> $false_negative
#> [1] 0
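To make the three counts concrete, the following base R sketch shows one way row-level matching could be tallied. It is not the package's implementation: the `expected` and `observed` data frames, their contents, and the key-building approach are assumptions for illustration, and duplicate rows are not handled.

```r
# Hypothetical expected and observed rows (placeholder ids and text)
expected <- data.frame(
  id   = c("p1", "p1", "p2"),
  text = c("sentence a", "sentence b", "sentence c")
)
observed <- data.frame(
  id   = c("p1", "p2", "p2"),
  text = c("sentence a", "sentence c", "sentence d")
)

# Build one key per row so rows are compared across all columns at once
exp_key <- paste(expected$id, expected$text, sep = "||")
obs_key <- paste(observed$id, observed$text, sep = "||")

counts <- list(
  true_positive  = sum(obs_key %in% exp_key),     # observed and expected
  false_positive = sum(!(obs_key %in% exp_key)),  # observed, not expected
  false_negative = sum(!(exp_key %in% obs_key))   # expected, not observed
)
str(counts)
```

Because every row is either matched or unmatched as a whole, a row that differs in any checked column counts as both a false positive (the observed version) and a false negative (the expected version).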
[!NOTE] The stats section does not report true negatives because the total sample N can differ from module to module. For example, a module that identifies any sentences that describe an effect as ‘marginally significant’ has a total sample N of all the sentences in all the papers. Alternatively, a module that identifies whether each paper reports at least one power analysis has a total sample N of the number of papers.
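Some common summary measures do not need true negatives at all. As a sketch (these measures are not part of the printed output, and the `stats` list here is a hand-made copy of the counts shown above rather than the validation object itself), precision and recall can be derived from the three reported counts:

```r
# Counts copied by hand from the printed v$stats$table output above
stats <- list(true_positive = 4, false_positive = 0, false_negative = 0)

# Precision: share of the module's detections that were expected
precision <- stats$true_positive /
  (stats$true_positive + stats$false_positive)

# Recall: share of the expected results that the module detected
recall <- stats$true_positive /
  (stats$true_positive + stats$false_negative)

c(precision = precision, recall = recall)
```

Both values are undefined (division by zero) when the relevant denominator is 0, for example when the module returns no rows at all.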
Summary Tables
For summary tables, where there is one row per paper id, the matches table is a little different. For each non-id column, it returns the expected and observed values, plus a column stating whether these match.
v$matches$summary
The stats give you the proportion of matches for each column; here, a value of 1 means every expected value matched the observed value.
v$stats$summary
#> $marginal
#> [1] 1
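The per-column figure can be understood as the share of papers whose observed value equals the expected one. A minimal base R sketch of that idea follows; the data, the suffixed column names, and the use of `merge()` are assumptions for illustration and may not mirror how the package builds its matches table.

```r
# Hypothetical expected and observed summaries: one row per paper id
exp_summary <- data.frame(id = c("p1", "p2", "p3", "p4"),
                          marginal = c(2, 0, 1, 0))
obs_summary <- data.frame(id = c("p1", "p2", "p3", "p4"),
                          marginal = c(2, 0, 0, 0))

# Align by id, keeping expected and observed values side by side
m <- merge(exp_summary, obs_summary, by = "id",
           suffixes = c(".expected", ".observed"))
m$marginal.match <- m$marginal.expected == m$marginal.observed

# Proportion of papers where the observed value equals the expected one
mean(m$marginal.match)
```

In this made-up example one of four papers disagrees, so the proportion falls below 1; the validation output above shows 1 because all values in that sample matched.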