Tutorial: Real Paths and End-to-End Runs
This walkthrough shows how to run trait2gene with real filesystem paths
instead of auto, so the pipeline can execute without extra path guessing.
Two common patterns
1. Full run with external MAGMA and pre-munged features
Use this when trait2gene run should execute MAGMA itself and you already have
a PoPS-ready feature prefix.
Start from:
src/trait2gene/data/examples/config.realpaths.template.yaml
The fields that usually need real absolute paths are:
input.sumstats
resources.magma_bin
resources.reference_panel
resources.gene_locations
resources.gene_annotation
resources.feature_matrix_prefix
resources.control_features_path
output.outdir
Minimal example:
project: bmi_realpaths
trait: bmi
input:
  sumstats: /data/gwas/bmi/bmi.sumstats.tsv.gz
  genome_build: GRCh37
  ancestry: EAS
resources:
  magma_bin: /opt/magma/magma
  reference_panel: /data/reference/1000g/eas/g1000_eas
  gene_locations: /data/reference/magma/gene_loc_build37.txt
  gene_annotation: /data/reference/pops/gene_annot.tsv
  feature_matrix_prefix: /data/reference/pops/features/default_full_v1/pops_features
  control_features_path: /data/reference/pops/features/features_jul17_control.txt
output:
  outdir: /data/projects/trait2gene/results/bmi_realpaths
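Before running anything, it can help to confirm that none of the path fields listed above still holds auto or a relative path. The following is a minimal sketch, not part of trait2gene: it assumes the config has already been loaded into a nested dict (for example with a YAML parser), and the helper names lookup and non_absolute_fields are hypothetical.

```python
import os

# Dotted keys copied from the field list in this tutorial.
PATH_FIELDS = [
    "input.sumstats",
    "resources.magma_bin",
    "resources.reference_panel",
    "resources.gene_locations",
    "resources.gene_annotation",
    "resources.feature_matrix_prefix",
    "resources.control_features_path",
    "output.outdir",
]


def lookup(config, dotted_key):
    """Return the value at a dotted key such as 'input.sumstats', or None."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def non_absolute_fields(config, fields=PATH_FIELDS):
    """List (key, value) pairs that are missing or not absolute paths."""
    problems = []
    for key in fields:
        value = lookup(config, key)
        # 'auto', relative paths, and missing values all end up here.
        if not isinstance(value, str) or not os.path.isabs(value):
            problems.append((key, value))
    return problems
```

An empty list means every listed field is an absolute path string. The check does not test whether the paths exist.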
Run order:
trait2gene validate config.yaml
trait2gene doctor --config config.yaml
trait2gene run config.yaml
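If you script this sequence, the later steps should not run when an earlier one fails. A minimal sketch of that idea, assuming trait2gene is on PATH and returns a non-zero exit code on failure (an assumption; the helper run_in_order is hypothetical):

```python
import subprocess

# The three commands from the run order above, as argument lists.
STEPS = [
    ["trait2gene", "validate", "config.yaml"],
    ["trait2gene", "doctor", "--config", "config.yaml"],
    ["trait2gene", "run", "config.yaml"],
]


def run_in_order(commands):
    """Run each command in turn and stop at the first non-zero exit code.

    Returns (completed_commands, failed_command_or_None).
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return completed, cmd
        completed.append(cmd)
    return completed, None
```

Calling run_in_order(STEPS) returns the commands that finished and, if any step failed, the command that stopped the sequence.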
2. Reuse precomputed MAGMA and pre-munged features
Use this when MAGMA has already been run elsewhere and you only want PoPS, locus prioritization, and reporting.
Start from:
src/trait2gene/data/examples/config.precomputed.template.yaml
Key path fields:
resources.precomputed_magma_prefix
resources.gene_annotation
resources.feature_matrix_prefix
resources.control_features_path
analysis.prioritization.locus_file
output.outdir
Example commands:
trait2gene validate config.precomputed.yaml
trait2gene run-pops config.precomputed.yaml
trait2gene prioritize config.precomputed.yaml
trait2gene report config.precomputed.yaml
What each path must point to
resources.reference_panel
This is a PLINK prefix, not a directory. If the prefix is
/data/reference/1000g/eas/g1000_eas, then these files must exist:
/data/reference/1000g/eas/g1000_eas.bed
/data/reference/1000g/eas/g1000_eas.bim
/data/reference/1000g/eas/g1000_eas.fam
resources.precomputed_magma_prefix
This is also a prefix. If it is /data/projects/cad/magma/cad, then these
files must exist:
/data/projects/cad/magma/cad.genes.out
/data/projects/cad/magma/cad.genes.raw
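Both prefix rules reduce to the same check: append each expected suffix to the prefix and confirm the file exists. The sketch below illustrates that idea. It is not trait2gene's own validation code, and missing_prefix_files is a hypothetical helper.

```python
from pathlib import Path

# Suffixes named in the two sections above.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")
MAGMA_SUFFIXES = (".genes.out", ".genes.raw")


def missing_prefix_files(prefix, suffixes):
    """Return the expected files (prefix + suffix) that do not exist."""
    return [prefix + s for s in suffixes if not Path(prefix + s).is_file()]
```

Passing the parent directory instead of the prefix reports every suffix as missing, which is the usual symptom of the directory-versus-prefix mix-up.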
resources.feature_matrix_prefix
This is the prefix written by munge_feature_directory.py. If it is
/data/reference/pops/features/default_full_v1/pops_features, then there
should be:
pops_features.rows.txt
pops_features.cols.0.txt, pops_features.cols.1.txt, …
pops_features.mat.0.npy, pops_features.mat.1.npy, …
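A quick way to sanity-check a feature prefix is to look for the rows file and compare the shard numbers of the cols and mat files. The sketch below assumes that every cols.N.txt shard should have a matching mat.N.npy shard. That pairing is inferred from the file listing above rather than stated by it, and the helper names are hypothetical.

```python
import re
from pathlib import Path


def shard_indices(prefix, kind, ext):
    """Return sorted N for files named <prefix>.<kind>.N.<ext>."""
    p = Path(prefix)
    if not p.parent.is_dir():
        return []
    pattern = re.compile(re.escape(p.name) + r"\." + kind + r"\.(\d+)\." + ext)
    hits = (pattern.fullmatch(f.name) for f in p.parent.iterdir())
    return sorted(int(m.group(1)) for m in hits if m)


def check_feature_prefix(prefix):
    """Return a list of problems found for a munged feature prefix."""
    problems = []
    if not Path(prefix + ".rows.txt").is_file():
        problems.append("missing " + prefix + ".rows.txt")
    cols = shard_indices(prefix, "cols", "txt")
    mat = shard_indices(prefix, "mat", "npy")
    if not cols:
        problems.append("no .cols.N.txt shards found")
    if cols != mat:  # assumed pairing between cols and mat shards
        problems.append(f"cols shards {cols} do not match mat shards {mat}")
    return problems
```

The check looks only at file names. It does not open the .npy files or compare their shapes with the cols files.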
resources.raw_feature_dir
This should be a directory of tab-separated feature files. Each file must:
have a header
use ENSGID as the first column
have unique feature names across all files
If you set resources.raw_feature_dir, trait2gene prep-features runs the
vendored upstream munging script and copies the resulting prefix into
work/features/pops_features.
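The three file requirements for a raw feature directory can be checked from the header lines alone before running prep-features. This is a minimal sketch that assumes uncompressed, tab-separated text files. check_raw_feature_dir is a hypothetical helper, not a trait2gene command.

```python
import csv
from pathlib import Path


def check_raw_feature_dir(directory):
    """Return a list of header problems in a directory of feature files."""
    problems = []
    seen = {}  # feature name -> file that first defined it
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with open(path, newline="") as handle:
            header = next(csv.reader(handle, delimiter="\t"), None)
        if not header:
            problems.append(f"{path.name}: empty file or missing header")
            continue
        if header[0] != "ENSGID":
            problems.append(
                f"{path.name}: first column is {header[0]!r}, expected 'ENSGID'"
            )
        # Feature names are every column after the first.
        for name in header[1:]:
            if name in seen:
                problems.append(
                    f"{path.name}: feature {name!r} already defined in {seen[name]}"
                )
            else:
                seen[name] = path.name
    return problems
```

A file whose first line is data rather than a header is reported through the ENSGID check, because its first field will be a gene ID rather than the column name.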
Recommended validation sequence
Before a full run:
trait2gene validate config.yaml
trait2gene doctor --config config.yaml
trait2gene fetch-resources config.yaml
trait2gene inspect outputs /data/projects/trait2gene/results/bmi_realpaths
Standard outputs after success
work/magma/*.genes.out and *.genes.raw
work/features/pops_features.*
work/pops/*.preds, *.coefs, *.marginals
tables/prioritized_genes.tsv
tables/all_genes_ranked.tsv
reports/report.html
reports/summary.json
metadata/run_metadata.json
Troubleshooting
Validation says a prefix is unresolved
Usually the path points to the right directory but the wrong prefix basename. Check whether you supplied the full prefix and not only the parent directory.
run-magma fails even though MAGMA exists
Check:
resources.magma_bin is executable
resources.reference_panel points to a PLINK prefix
resources.gene_locations matches the same genome build as your summary statistics
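The first item in this checklist is easy to get wrong when the binary exists but has lost its execute bit, for example after being copied from an archive. A small sketch of that check follows; is_executable_file is a hypothetical helper.

```python
import os


def is_executable_file(path):
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

If this returns False for a file that exists, restoring the execute permission with chmod +x on the binary is usually enough.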
run-pops fails after feature preparation
Check:
resources.gene_annotation contains ENSGID, CHR, and TSS
work/features/pops_features.rows.txt exists
work/magma/<trait>.genes.out and .genes.raw exist
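The gene annotation requirement can be checked by reading only the header line. The sketch below assumes the annotation file is tab-separated, as the .tsv extension in the earlier example suggests, and that ENSGID, CHR, and TSS appear as header names. missing_annotation_columns is a hypothetical helper.

```python
import csv

REQUIRED_COLUMNS = ("ENSGID", "CHR", "TSS")


def missing_annotation_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the file's header."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

An empty result means all three columns are present. Extra columns are ignored.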