Tutorial: Public Schizophrenia Example

This article documents a real trait2gene analysis run using public data:

  • the public PoPS schizophrenia example data from the FinucaneLab repository

  • a schizophrenia association table downloaded from the GWAS Catalog API

  • a locus file derived from public MAGMA gene-level output

The goal is not to rerun MAGMA locally. MAGMA was not installed in this environment, so the run uses public precomputed MAGMA outputs and still executes the rest of the trait2gene pipeline for real.

Data sources

  • PoPS upstream example repository: https://github.com/FinucaneLab/pops

  • GWAS Catalog schizophrenia associations API: https://www.ebi.ac.uk/gwas/rest/api/v2/associations?efo_id=MONDO_0005090&page=0&size=200

Files used in this run

  • real_runs/schizophrenia_public_example/config.yaml

  • scripts/prepare_public_schizophrenia_example.py

  • scripts/run_public_schizophrenia_example.py

  • real_runs/schizophrenia_public_example/schizophrenia_gwascatalog_hits.tsv

  • real_runs/schizophrenia_public_example/schizophrenia_magma_top_loci.tsv

  • real_runs/schizophrenia_public_example/verification_summary.json

  • real_runs/schizophrenia_public_example/result_snapshot.json

One-shot reproduction

python scripts/run_public_schizophrenia_example.py

This command:

  • clones the public FinucaneLab/pops example repository into real_data/

  • downloads 200 schizophrenia associations from the GWAS Catalog API

  • derives 8 non-overlapping loci from strong non-HLA MAGMA hits

  • generates real_runs/schizophrenia_public_example/config.yaml

  • runs the full trait2gene pipeline

  • writes verification and snapshot JSON sidecars

Manual step-by-step reproduction

python scripts/prepare_public_schizophrenia_example.py
trait2gene validate real_runs/schizophrenia_public_example/config.yaml
trait2gene doctor --config real_runs/schizophrenia_public_example/config.yaml
trait2gene run real_runs/schizophrenia_public_example/config.yaml

validate passes cleanly in this mode because the public example uses resources.precomputed_magma_prefix instead of requiring a local MAGMA binary.

What the pipeline did

  1. validated the config and downloaded public inputs

  2. wrote the resolved manifest and software metadata

  3. copied public PASS_Schizophrenia.genes.out and .genes.raw into work/magma/

  4. ran vendored munge_feature_directory.py on the public raw feature files

  5. ran vendored upstream pops.py

  6. prioritized genes within 8 loci

  7. wrote HTML, JSON, TSV, and metadata outputs

Output locations

The run outputs live under:

  • real_runs/schizophrenia_public_example/results

Important outputs:

  • work/pops/schizophrenia.preds

  • work/pops/schizophrenia.coefs

  • work/pops/schizophrenia.marginals

  • tables/prioritized_genes.tsv

  • tables/all_genes_ranked.tsv

  • tables/top_features.tsv

  • reports/report.html

  • metadata/run_metadata.json

Runtime summary

From result_snapshot.json and metadata/run_metadata.json:

  • total status: ok

  • feature_prep: about 2.51 seconds

  • pops: about 60.57 seconds

  • whole pipeline: about 63.47 seconds

Result snapshot

The run produced:

  • 8 prioritized genes

  • 103 ranked genes across the selected loci

  • 20 top features in the summary table

Top prioritized genes:

Locus

Top gene

magma_top_01

GBF1

magma_top_02

CACNA1C

magma_top_03

TCF4

magma_top_04

IREB2

magma_top_05

PPP1R16B

magma_top_06

PTBP2

magma_top_07

TNRC6B

magma_top_08

KLC1

Top feature names started with:

  • GTEx.53

  • mouse_brain2_projected_pcaloadings_clusters.59

  • mouse_brain2_projected_pcaloadings_clusters.25

  • mouse_brain2_projected_pcaloadings.84

  • mouse_brain2_projected_pcaloadings_clusters.74

Consistency check against upstream PoPS output

As a sanity check, the trait2gene PoPS score output was compared with the public upstream example output:

  • compared genes: 18,383

  • Pearson correlation: ~1.0

  • maximum absolute score difference: 2.47e-14

  • mean absolute score difference: 2.73e-15

That means the wrapped trait2gene execution reproduced the public upstream PoPS example essentially exactly while still producing standardized downstream artifacts and reports.

Why this tutorial uses locus_file

The public PoPS example repository ships:

  • raw features

  • pre-munged features

  • precomputed MAGMA gene outputs

It does not ship the original SNP-level summary statistics used to generate those MAGMA outputs. For that reason:

  • a real schizophrenia association table from GWAS Catalog was downloaded as the config input

  • locus prioritization used a locus file derived from the public MAGMA results