Tutorial: Public Schizophrenia Example¶
This article documents a real trait2gene analysis run using public data:
the public PoPS schizophrenia example data from the FinucaneLab repository
a schizophrenia association table downloaded from the GWAS Catalog API
a locus file derived from public MAGMA gene-level output
The goal is not to rerun MAGMA locally. MAGMA was not installed in this
environment, so the run uses public precomputed MAGMA outputs and still
executes the rest of the trait2gene pipeline for real.
Data sources¶
PoPS upstream example repository:
https://github.com/FinucaneLab/pops
GWAS Catalog schizophrenia associations API:
https://www.ebi.ac.uk/gwas/rest/api/v2/associations?efo_id=MONDO_0005090&page=0&size=200
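The association query above can be assembled and fetched with the Python standard library. This is only a minimal sketch, not the code in scripts/prepare_public_schizophrenia_example.py: the helper names `build_associations_url` and `fetch_associations` are hypothetical, and the layout of the returned JSON is not described in this tutorial, so the payload is returned undecoded beyond JSON parsing.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the data source URL listed above.
GWAS_API = "https://www.ebi.ac.uk/gwas/rest/api/v2/associations"


def build_associations_url(efo_id, page=0, size=200):
    """Return the associations URL for one trait identifier and one result page."""
    query = urllib.parse.urlencode({"efo_id": efo_id, "page": page, "size": size})
    return f"{GWAS_API}?{query}"


def fetch_associations(efo_id, page=0, size=200):
    """Fetch one page of associations and return the parsed JSON payload.

    Hypothetical helper: the response structure is not assumed here.
    """
    request = urllib.request.Request(
        build_associations_url(efo_id, page, size),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `build_associations_url("MONDO_0005090")` reproduces the URL shown above; `fetch_associations` needs network access and is left uncalled here.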
Files used in this run¶
real_runs/schizophrenia_public_example/config.yaml
scripts/prepare_public_schizophrenia_example.py
scripts/run_public_schizophrenia_example.py
real_runs/schizophrenia_public_example/schizophrenia_gwascatalog_hits.tsv
real_runs/schizophrenia_public_example/schizophrenia_magma_top_loci.tsv
real_runs/schizophrenia_public_example/verification_summary.json
real_runs/schizophrenia_public_example/result_snapshot.json
One-shot reproduction¶
python scripts/run_public_schizophrenia_example.py
This command:
clones the public FinucaneLab/pops example repository into real_data/
downloads 200 schizophrenia associations from the GWAS Catalog API
derives 8 non-overlapping loci from strong non-HLA MAGMA hits
generates real_runs/schizophrenia_public_example/config.yaml
runs the full trait2gene pipeline
writes verification and snapshot JSON sidecars
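The locus-derivation step can be pictured as a greedy selection over gene-level hits. The sketch below is one plausible way to do it, not the logic of the preparation script, which this tutorial does not show. The `GeneHit` record, the p-value threshold, the 500 kb flank, and the chr6:25-34 Mb HLA interval are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class GeneHit(NamedTuple):
    """Hypothetical gene-level record: gene ID, chromosome, bounds, p-value."""
    gene: str
    chrom: str
    start: int
    end: int
    p: float


# Assumed extended HLA/MHC interval; the bounds actually used are not documented here.
HLA = ("6", 25_000_000, 34_000_000)


def in_hla(hit):
    """Return True when the gene overlaps the assumed HLA interval."""
    chrom, lo, hi = HLA
    return hit.chrom == chrom and hit.end >= lo and hit.start <= hi


def select_loci(hits, max_loci=8, p_threshold=5e-8, flank=500_000):
    """Greedily keep the strongest hits whose padded windows do not overlap."""
    loci = []
    for hit in sorted(hits, key=lambda h: h.p):  # strongest association first
        if hit.p > p_threshold or in_hla(hit):
            continue
        lo, hi = max(0, hit.start - flank), hit.end + flank
        overlaps = any(c == hit.chrom and lo <= e and s <= hi for c, s, e, _ in loci)
        if not overlaps:
            loci.append((hit.chrom, lo, hi, hit.gene))
        if len(loci) == max_loci:
            break
    return loci
```

Each returned tuple is `(chromosome, window start, window end, lead gene)`. Sorting by p-value first means that when two windows collide, the stronger hit keeps the locus.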
Manual step-by-step reproduction¶
python scripts/prepare_public_schizophrenia_example.py
trait2gene validate real_runs/schizophrenia_public_example/config.yaml
trait2gene doctor --config real_runs/schizophrenia_public_example/config.yaml
trait2gene run real_runs/schizophrenia_public_example/config.yaml
validate passes cleanly in this mode because the public example uses
resources.precomputed_magma_prefix instead of requiring a local MAGMA binary.
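A check of this kind can be sketched as follows. It is illustrative only and is not how trait2gene validate is implemented; the function name `missing_magma_outputs` is hypothetical. The only grounded detail is that a precomputed prefix is expected to resolve to a `.genes.out` and a `.genes.raw` file, as in the PASS_Schizophrenia example used in this run.

```python
from pathlib import Path

# The two MAGMA gene-level files the public example provides for a prefix.
MAGMA_SUFFIXES = (".genes.out", ".genes.raw")


def missing_magma_outputs(prefix):
    """Return the precomputed MAGMA files that are absent for a given prefix."""
    candidates = [Path(f"{prefix}{suffix}") for suffix in MAGMA_SUFFIXES]
    return [path for path in candidates if not path.is_file()]
```

An empty list means both files are present, which is the situation where a local MAGMA binary would not be needed.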
What the pipeline did¶
validated the config and downloaded public inputs
wrote the resolved manifest and software metadata
copied public PASS_Schizophrenia.genes.out and .genes.raw into work/magma/
ran vendored munge_feature_directory.py on the public raw feature files
ran vendored upstream pops.py
prioritized genes within 8 loci
wrote HTML, JSON, TSV, and metadata outputs
Output locations¶
The run outputs live under:
real_runs/schizophrenia_public_example/results
Important outputs:
work/pops/schizophrenia.preds
work/pops/schizophrenia.coefs
work/pops/schizophrenia.marginals
tables/prioritized_genes.tsv
tables/all_genes_ranked.tsv
tables/top_features.tsv
reports/report.html
metadata/run_metadata.json
Runtime summary¶
From result_snapshot.json and metadata/run_metadata.json:
total status: ok
feature_prep: about 2.51 seconds
pops: about 60.57 seconds
whole pipeline: about 63.47 seconds
Result snapshot¶
The run produced:
8 prioritized genes
103 ranked genes across the selected loci
20 top features in the summary table
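These counts can be re-derived from the TSV outputs. The sketch below assumes that each table has exactly one header row and one record per line, and that the three counts correspond to prioritized_genes.tsv, all_genes_ranked.tsv, and top_features.tsv respectively. Neither assumption is stated by the pipeline itself, and the helper names are hypothetical.

```python
import csv
from pathlib import Path

TABLE_NAMES = ("prioritized_genes.tsv", "all_genes_ranked.tsv", "top_features.tsv")


def count_rows(tsv_path):
    """Count data rows in a tab-separated file, excluding the header line."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the assumed single header row
        return sum(1 for row in reader if row)


def summarize_tables(results_dir):
    """Map each summary table under <results_dir>/tables to its row count."""
    tables = Path(results_dir) / "tables"
    return {name: count_rows(tables / name) for name in TABLE_NAMES}
```

Pointing `summarize_tables` at real_runs/schizophrenia_public_example/results gives a quick way to compare a fresh run against the numbers listed above.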
Top prioritized genes:
| Locus | Top gene |
|---|---|
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
|  |  |
Top feature names started with:
GTEx.53
mouse_brain2_projected_pcaloadings_clusters.59
mouse_brain2_projected_pcaloadings_clusters.25
mouse_brain2_projected_pcaloadings.84
mouse_brain2_projected_pcaloadings_clusters.74
Consistency check against upstream PoPS output¶
As a sanity check, the trait2gene PoPS score output was compared with the
public upstream example output:
compared genes: 18,383
Pearson correlation: ~1.0
maximum absolute score difference: 2.47e-14
mean absolute score difference: 2.73e-15
Differences of this size are consistent with floating-point round-off, so the
wrapped trait2gene execution reproduced the public upstream PoPS example to
within numerical precision while still producing standardized downstream
artifacts and reports.
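The comparison statistics above can be computed from two gene-to-score mappings as in the following sketch. It is not the verification code that wrote verification_summary.json. The function name `compare_scores` and the idea of first loading each score file into a dictionary keyed by gene ID are assumptions made for illustration.

```python
import math


def compare_scores(ours, upstream):
    """Compare two gene -> score mappings on the genes they share."""
    genes = sorted(set(ours) & set(upstream))
    n = len(genes)
    if n < 2:
        raise ValueError("need at least two shared genes to compare")
    x = [ours[g] for g in genes]
    y = [upstream[g] for g in genes]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Pearson correlation from centered sums of products and squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores are constant; correlation is undefined")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return {
        "compared_genes": n,
        "pearson_r": cov / math.sqrt(var_x * var_y),
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / n,
    }
```

Restricting the comparison to shared genes keeps the statistics well defined even if one file contains genes the other lacks.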
Why this tutorial uses locus_file¶
The public PoPS example repository ships:
raw features
pre-munged features
precomputed MAGMA gene outputs
It does not ship the original SNP-level summary statistics used to generate those MAGMA outputs. For that reason:
a real schizophrenia association table from GWAS Catalog was downloaded as the config input
locus prioritization used a locus file derived from the public MAGMA results