A step back

Replication Crisis

Causes:

  • Questionable Research Practices (QRPs, John et al., 2012)

  • Publication bias

  • Publish or perish! (Callard, 2022)

  • Researcher degrees of freedom (Simmons et al., 2011)

Inflation of false positives

Replication Crisis

Some proposed solutions:

  • Open science

  • Pre-registration

  • Registered Reports

  • Multiverse Analysis (Steegen et al., 2016)

Multiverse Analysis Why?

Multiverse Analysis What?

 

Multiverse of analytical scenarios
(data collection, coding and analysis)

Analysis and presentation of results
from every plausible scenario

↙                ↘

Explorative methods

Inferential methods (PIMA)

PIMA

  • Innovative permutation-based method

    sign-flipping score test

  • Strong Family Wise Error Rate (FWER) control

  • Good statistical power

  • p-value adjustment for multiple comparisons (maxT)

    → allowing for selective inference

  • Applies to Generalized Linear Models (GLMs)

PIMA - Key Strengths

  • Distribution-Free:
    → No assumptions about data normality (non-parametric)

  • Handles Dependencies:
    → Accounts for correlations between multiverse specifications

The Multiple Comparisons Problem

  • Coin Toss Example:

    What is the probability of getting at least one Head?


    1. One Toss: \(P(\text{Head}) = 0.50\)


    2. Ten Tosses: \(P(\ge 1 \text{ Head}) = 1 - (1 - 0.5)^{10} \approx 0.999\)


    The probability of getting what we want approaches 100%!
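
The two probabilities above can be checked in a few lines. This is a minimal sketch; the helper name `prob_at_least_one` is illustrative and not part of any library.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability of at least one success in n independent trials,
    each succeeding with probability p."""
    return 1.0 - (1.0 - p) ** n


one_toss = prob_at_least_one(0.5, 1)     # single toss
ten_tosses = prob_at_least_one(0.5, 10)  # ten independent tosses

print(f"One toss:   {one_toss:.3f}")
print(f"Ten tosses: {ten_tosses:.3f}")
```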

Psychology’s “Coin Toss”

  • In Psychology, our coin lands on Head with probability \(5\%\) (the \(\alpha\) level)

  • Then we can keep tossing this coin until we win

  • We eventually get our significant \(p < .05^*\)

The problem is that Head = False Positives…
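
To make the analogy concrete, the sketch below simulates researchers who keep running tests on pure noise until one reaches \(p < .05\). It assumes independent tests and a true null, so every p-value is uniform on [0, 1]. The function name, the seed and the number of simulated researchers are illustrative choices.

```python
import random


def tries_until_significant(rng: random.Random, alpha: float = 0.05) -> int:
    """Count tests run on pure noise until the first p < alpha.

    Under a true null hypothesis a p-value is uniform on [0, 1],
    so each test is 'significant' with probability alpha.
    """
    tries = 1
    while rng.random() >= alpha:
        tries += 1
    return tries


rng = random.Random(2024)  # fixed seed so the run is reproducible
n_researchers = 10_000
tries = [tries_until_significant(rng) for _ in range(n_researchers)]

mean_tries = sum(tries) / n_researchers
share_within_14 = sum(t <= 14 for t in tries) / n_researchers

print(f"Average number of tests until p < .05: {mean_tries:.1f}")
print(f"Share 'winning' within 14 tests: {share_within_14:.2f}")
```

Every simulated researcher eventually "wins" although no effect exists. On average it takes about 1 / 0.05 = 20 tries, and roughly half of them get there within 14 tests.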

The Bonferroni Fix

Easy fix for keeping the family-wise false positive rate \(\le 5\%\)

\[ \alpha_{adj} = \frac{\alpha_{standard}}{\text{number of tries (k)}} \]

Example: if we do ten different tests on different data
\[ \alpha_{adj} = \frac{0.05}{10} = 0.005\]

Now we are safe from false positives…

but it’s incredibly hard to find true effects!
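
The trade-off can be sketched numerically. The example below computes the adjusted threshold for ten tests and compares the power of a two-sided z-test before and after the correction. The true effect (an expected z-statistic of 2.8) is a hypothetical value chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()


def two_sided_power(delta: float, alpha: float) -> float:
    """Power of a two-sided z-test when the statistic is N(delta, 1)."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return (1 - std_normal.cdf(z_crit - delta)) + std_normal.cdf(-z_crit - delta)


alpha = 0.05
k = 10
alpha_adj = alpha / k  # Bonferroni-adjusted threshold

# Family-wise error rate for k independent true nulls tested at alpha_adj
fwer = 1 - (1 - alpha_adj) ** k

# Hypothetical true effect: expected z-statistic of 2.8 (an assumption)
delta = 2.8
power_unadjusted = two_sided_power(delta, alpha)
power_bonferroni = two_sided_power(delta, alpha_adj)

print(f"alpha_adj = {alpha_adj:.3f}, FWER = {fwer:.3f}")
print(f"Power at alpha:     {power_unadjusted:.2f}")
print(f"Power at alpha_adj: {power_bonferroni:.2f}")
```

Under these assumptions the family-wise error rate stays just below 5%, but power for the same effect drops from about 80% to about 50%.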

The Multiverse Reality

  • Bonferroni is calibrated for independent tests (like coin tosses)

  • But Multiverse specifications are highly correlated:
    → similar tests on the same data

  • Bonferroni ignores the correlations and becomes overly conservative
    → Massive Power Loss (High Type II Error)

  • max-T adjustment:
    1. empirically models the correlation structure via permutations
    2. corrects for multiplicity without killing statistical power
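
A small simulation can show how much Bonferroni over-corrects when tests are correlated. The sketch below draws ten null z-statistics that share a common component, so any two tests have correlation `rho`. The equicorrelated setup and the value 0.9 are assumptions for illustration, not a model of a real multiverse.

```python
import random
from statistics import NormalDist


def bonferroni_fwer(rho: float, m: int = 10, alpha: float = 0.05,
                    n_sim: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo FWER of Bonferroni for m equicorrelated null z-tests.

    Each statistic is sqrt(rho) * shared + sqrt(1 - rho) * own noise,
    so any two tests have correlation rho.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (alpha / m) / 2)
    a, b = rho ** 0.5, (1 - rho) ** 0.5
    false_alarms = 0
    for _ in range(n_sim):
        shared = rng.gauss(0, 1)
        # A family-wise error occurs if any of the m tests is rejected
        if any(abs(a * shared + b * rng.gauss(0, 1)) > z_crit for _ in range(m)):
            false_alarms += 1
    return false_alarms / n_sim


fwer_independent = bonferroni_fwer(rho=0.0)
fwer_correlated = bonferroni_fwer(rho=0.9)

print(f"Bonferroni FWER, independent tests: {fwer_independent:.3f}")
print(f"Bonferroni FWER, rho = 0.9:         {fwer_correlated:.3f}")
```

With strongly correlated tests the realised error rate falls well below the nominal 5%. That unused margin is the power that a correlation-aware adjustment such as maxT aims to recover.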

Bonferroni vs max-T

Courtesy of Dr. Gambarota.

Multiverse
Meta-Analysis

PIMMA - Case Study Application

Dataset

  • RCTs on psychotherapy effectiveness for depression (Plessen et al., 2023)

  • n of primary studies = 124

  • Population = adults

Multiverse Meta-Analysis How?

Step 1. Creating the Multiverse

Scenario        Model   Therapy   Format       Bias   Diagnosis
\(m_1\)         EE      CBT       Individual   High   Clinical
\(m_2\)         RE      Non-CBT   Group        Low    Cut-off
\(m_{1920}\)    RE      All       All          All    All

  • Compute Meta-Analysis for each scenario (\(m_i\))

  • Include Meta-Analyses with at least 10 studies (\(k \geq 10\))
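
Step 1 can be sketched as a full crossing of factor levels followed by the \(k \geq 10\) filter. The levels below are only those visible in the example rows, so this toy grid has 162 scenarios rather than 1920, and the study attributes are random placeholders, not the Plessen et al. (2023) data.

```python
import itertools
import random

# Illustrative factor levels taken from the example rows; the real
# multiverse has more factors/levels (1920 scenarios), so this grid is smaller.
FACTORS = {
    "therapy": ["CBT", "Non-CBT", "All"],
    "format": ["Individual", "Group", "All"],
    "bias": ["High", "Low", "All"],
    "diagnosis": ["Clinical", "Cut-off", "All"],
}
MODELS = ["EE", "RE"]

# Toy stand-in for the primary studies (random attributes, NOT real data).
rng = random.Random(0)
studies = [
    {name: rng.choice(levels[:-1]) for name, levels in FACTORS.items()}
    for _ in range(124)
]


def select_studies(scenario: dict) -> list:
    """Indices of studies matching a scenario; 'All' matches any level."""
    return [
        i for i, study in enumerate(studies)
        if all(scenario[f] in ("All", study[f]) for f in FACTORS)
    ]


# One scenario per combination of model and factor levels
scenarios = [
    dict(zip(["model", *FACTORS], combo))
    for combo in itertools.product(MODELS, *FACTORS.values())
]

MIN_STUDIES = 10  # inclusion rule from the slide: k >= 10
kept = [(s, select_studies(s)) for s in scenarios]
kept = [(s, idx) for s, idx in kept if len(idx) >= MIN_STUDIES]

print(f"{len(scenarios)} scenarios specified, {len(kept)} retained")
```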

Step 2. Score calculation

  • Compute score (\(z_k\)) for every study k in each scenario \(m_i\)
\[ z_k = \frac{y_k}{v_k + \tau_0^2} \]

\(y_k\) = effect size estimate from study k

\(v_k\) = variance of study k

\(\tau_0^2\) = between-study variance under the null (\(H_0\))
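
The score formula translates directly into code. This is a minimal sketch with made-up numbers; the function name `study_score` and the input values are illustrative and not taken from the case study.

```python
def study_score(y_k: float, v_k: float, tau2_0: float) -> float:
    """Score contribution z_k = y_k / (v_k + tau2_0) of one study."""
    if v_k + tau2_0 <= 0:
        raise ValueError("total variance must be positive")
    return y_k / (v_k + tau2_0)


# Toy values (NOT taken from the case study)
effects = [0.50, 0.20, -0.10]    # y_k
variances = [0.04, 0.10, 0.02]   # v_k
tau2_0 = 0.06                    # between-study variance under H0

scores = [study_score(y, v, tau2_0) for y, v in zip(effects, variances)]
print([round(z, 2) for z in scores])
```

Studies with a smaller total variance contribute larger scores for the same effect size, as in inverse-variance weighting.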

Step 2. Score Matrix

  • Store the scores \(z_k\) in a matrix [k × m]
    → Rows = primary studies (scores \(z_k\))
    → Columns = scenarios/meta-analyses (\(m_i\))

              \(m_1\)   \(m_2\)   \(m_i\)   \(m_{1144}\)
\(k_1\)       0.34      0.28      …         0.00
\(k_2\)       -0.25     0.00      …         -0.25
\(k_{124}\)   0.00      0.52      …         0.48

Zero entries (0.00) indicate that study \(k_i\) was not included in meta-analysis \(m_i\)
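
The matrix layout can be sketched as follows. The inputs are toy values, and passing one \(\tau_0^2\) per scenario is an assumption of this sketch; `build_score_matrix` is a hypothetical helper, not a function from an existing package.

```python
def build_score_matrix(effects, variances, membership, tau2_0):
    """Return a [studies x scenarios] matrix of scores.

    membership[j] is the set of study indices included in scenario j and
    tau2_0[j] is that scenario's between-study variance under H0.
    Studies outside a scenario get a score of 0.0.
    """
    n_studies, n_scenarios = len(effects), len(membership)
    matrix = [[0.0] * n_scenarios for _ in range(n_studies)]
    for j, included in enumerate(membership):
        for k in included:
            matrix[k][j] = effects[k] / (variances[k] + tau2_0[j])
    return matrix


# Toy inputs (NOT the case-study data): 3 studies, 2 scenarios
effects = [0.50, 0.20, -0.10]
variances = [0.04, 0.10, 0.02]
membership = [{0, 1}, {1, 2}]  # scenario 0 drops study 2, scenario 1 drops study 0
tau2_0 = [0.06, 0.00]

Z = build_score_matrix(effects, variances, membership, tau2_0)
for row in Z:
    print([round(z, 2) for z in row])
```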

Step 3. Permutation-based Inference

  • Sign-flipping score test (see Girardi et al., 2024)
  1. Randomly multiply each row \(k_i\) by +1 or -1 (sign-flipping)
    → equivalent to re-sampling under the null hypothesis
  2. Compute the test statistic for each permuted scenario \(m_i\)
  3. Repeat B times → null distribution of each scenario \(m_i\)
  4. Compare the observed test statistics with the permutation null distribution
    → raw & adjusted p-values (maxT)
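
The four steps above can be sketched in plain Python. This is one common single-step maxT formulation under stated assumptions, not necessarily the exact procedure of Girardi et al. (2024): the standardized statistic, the function name `sign_flip_maxt` and the toy score matrix are all illustrative.

```python
import random


def sign_flip_maxt(Z, n_flips=999, seed=0):
    """Raw and maxT-adjusted two-sided p-values from a score matrix Z.

    Z[k][j] is the score of study k in scenario j (0.0 if excluded).
    Statistic (an assumption of this sketch):
    |sum_k Z[k][j]| / sqrt(sum_k Z[k][j]^2).
    """
    rng = random.Random(seed)
    n, m = len(Z), len(Z[0])
    # The denominator is unchanged by sign-flipping, so compute it once;
    # an all-zero column falls back to 1.0 to avoid division by zero.
    scale = [sum(Z[k][j] ** 2 for k in range(n)) ** 0.5 or 1.0 for j in range(m)]

    def stats(signs):
        return [abs(sum(signs[k] * Z[k][j] for k in range(n))) / scale[j]
                for j in range(m)]

    observed = stats([1] * n)
    raw_hits = [1] * m  # the identity flip counts as one permutation
    adj_hits = [1] * m
    for _ in range(n_flips):
        # One sign per study (row), shared across all scenarios (columns),
        # which preserves the correlation between scenarios.
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        flipped = stats(signs)
        max_flipped = max(flipped)
        for j in range(m):
            raw_hits[j] += flipped[j] >= observed[j]
            adj_hits[j] += max_flipped >= observed[j]
    total = n_flips + 1
    return [h / total for h in raw_hits], [h / total for h in adj_hits]


# Toy score matrix (NOT real data): 30 studies, 3 scenarios.
# Scenario 0 has an effect, scenario 1 is null, scenario 2 excludes some studies.
rng = random.Random(42)
Z = [[rng.gauss(1.0, 0.5), rng.gauss(0.0, 1.0),
      rng.gauss(1.0, 0.5) if k % 3 else 0.0]
     for k in range(30)]

raw_p, adj_p = sign_flip_maxt(Z)
print("raw p-values:     ", [round(p, 3) for p in raw_p])
print("adjusted p-values:", [round(p, 3) for p in adj_p])
```

Because each adjusted p-value compares the observed statistic with the maximum over all scenarios, it can never be smaller than the corresponding raw p-value.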

Case Study - Results

p-value adjustment (maxT)


Significant Meta-Analyses

Never = 8 (0.7%)

Before correction = 1136 (99.3%)

After correction = 1030 (90%)

Summary effects (k = 1144)


Mdn = 0.59

\(\boldsymbol{\bar{x}}\) = 0.63

Min-Max = [0.28, 1.61]

Threshold for clinical significance: \(\geq\) 0.24
(Cuijpers et al., 2014)

Implications

  • Inferential Multiverse Meta-Analysis

    → to enhance transparency and robustness

  • Addressing selective reporting and p-hacking

  • Relative stability of findings on the effectiveness of psychotherapies for depression

Limitations

  • Simplification of dataset and analyses

  • No multilevel meta-analyses

  • No quantitative assessment of publication bias

Future directions

  • PIMMA to consolidate knowledge and evidence in psychology

  • Extend the method to multilevel and/or multivariate meta-analyses

Multiverse - Key Takeaways

  • Theory comes first!

    → Ground every model in a solid theoretical framework

  • Be parsimonious

    → Include only well-justified models to preserve statistical power

  • Be exhaustive

    → Account for all relevant variables to avoid inflated false positive rates