Tutorial: Real Paths and End-to-End Runs

This walkthrough shows how to run trait2gene with real filesystem paths instead of leaving path fields set to auto, so the pipeline can execute without having to guess where its inputs are.

Two common patterns

1. Full run with external MAGMA and pre-munged features

Use this when trait2gene run should execute MAGMA itself and you already have a PoPS-ready feature prefix.

Start from:

  • src/trait2gene/data/examples/config.realpaths.template.yaml

The fields that usually need real absolute paths are:

  • input.sumstats

  • resources.magma_bin

  • resources.reference_panel

  • resources.gene_locations

  • resources.gene_annotation

  • resources.feature_matrix_prefix

  • resources.control_features_path

  • output.outdir
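Before running validate, it can help to list which of these fields are still set to auto or hold a relative path. This is a minimal sketch, assuming the YAML has already been parsed into a plain dict (for example with a YAML library) and that paths are POSIX-style; lookup and non_absolute_fields are hypothetical helpers, not part of trait2gene.

```python
from pathlib import PurePosixPath

# Dotted config keys from the list above (full-run pattern).
PATH_FIELDS = [
    "input.sumstats",
    "resources.magma_bin",
    "resources.reference_panel",
    "resources.gene_locations",
    "resources.gene_annotation",
    "resources.feature_matrix_prefix",
    "resources.control_features_path",
    "output.outdir",
]


def lookup(config: dict, dotted_key: str):
    """Follow a dotted key such as 'resources.magma_bin' through nested dicts; None if absent."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def non_absolute_fields(config: dict, fields=PATH_FIELDS) -> list[str]:
    """Return fields that are missing, still 'auto', or not absolute paths (in list order)."""
    problems = []
    for key in fields:
        value = lookup(config, key)
        if not isinstance(value, str) or value == "auto" or not PurePosixPath(value).is_absolute():
            problems.append(key)
    return problems
```

For the minimal example config that follows, every listed field is absolute, so non_absolute_fields should return an empty list.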

Minimal example (save it as config.yaml, the file name used in the run order below):

project: bmi_realpaths
trait: bmi

input:
  sumstats: /data/gwas/bmi/bmi.sumstats.tsv.gz
  genome_build: GRCh37
  ancestry: EAS

resources:
  magma_bin: /opt/magma/magma
  reference_panel: /data/reference/1000g/eas/g1000_eas
  gene_locations: /data/reference/magma/gene_loc_build37.txt
  gene_annotation: /data/reference/pops/gene_annot.tsv
  feature_matrix_prefix: /data/reference/pops/features/default_full_v1/pops_features
  control_features_path: /data/reference/pops/features/features_jul17_control.txt

output:
  outdir: /data/projects/trait2gene/results/bmi_realpaths

Run order:

trait2gene validate config.yaml
trait2gene doctor --config config.yaml
trait2gene run config.yaml
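If you script this run order, it is worth stopping at the first failing step so that run never starts on a config that failed validation. A minimal sketch, assuming trait2gene is on PATH, config.yaml is in the working directory, and each command exits with a nonzero status on failure (the usual CLI convention, not confirmed here); run_in_order is a hypothetical helper.

```python
import subprocess
import sys

# Pattern 1 run order from above.
STEPS = [
    ["trait2gene", "validate", "config.yaml"],
    ["trait2gene", "doctor", "--config", "config.yaml"],
    ["trait2gene", "run", "config.yaml"],
]


def run_in_order(steps) -> int:
    """Run commands one at a time; return the index of the first failing step, or -1 if all succeed."""
    for index, cmd in enumerate(steps):
        print("$", " ".join(cmd), file=sys.stderr)
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            # The executable itself could not be found on PATH.
            return index
        if result.returncode != 0:
            return index
    return -1
```

Calling run_in_order(STEPS) returns 0 if validation fails, 1 if doctor fails, 2 if the run itself fails, and -1 if everything succeeds.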

2. Reuse precomputed MAGMA and pre-munged features

Use this when MAGMA has already been run elsewhere and you only want PoPS, locus prioritization, and reporting.

Start from:

  • src/trait2gene/data/examples/config.precomputed.template.yaml

Key path fields:

  • resources.precomputed_magma_prefix

  • resources.gene_annotation

  • resources.feature_matrix_prefix

  • resources.control_features_path

  • analysis.prioritization.locus_file

  • output.outdir

Example commands:

trait2gene validate config.precomputed.yaml
trait2gene run-pops config.precomputed.yaml
trait2gene prioritize config.precomputed.yaml
trait2gene report config.precomputed.yaml

What each path must point to

resources.reference_panel

This is a PLINK prefix, not a directory. If the prefix is /data/reference/1000g/eas/g1000_eas, then these files must exist:

  • /data/reference/1000g/eas/g1000_eas.bed

  • /data/reference/1000g/eas/g1000_eas.bim

  • /data/reference/1000g/eas/g1000_eas.fam
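The prefix-plus-suffix rule can be checked directly. A minimal sketch; missing_prefix_files is a hypothetical helper, not a trait2gene API.

```python
from pathlib import Path

PLINK_SUFFIXES = [".bed", ".bim", ".fam"]


def missing_prefix_files(prefix: str, suffixes: list[str]) -> list[str]:
    """Return each expected 'prefix + suffix' file that does not exist on disk."""
    # Plain string concatenation: Path.with_suffix would replace an existing
    # dot-separated part of the basename instead of appending to it.
    return [prefix + s for s in suffixes if not Path(prefix + s).is_file()]
```

For example, missing_prefix_files("/data/reference/1000g/eas/g1000_eas", PLINK_SUFFIXES) returns an empty list when all three files exist, and otherwise names exactly the files that are missing.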

resources.precomputed_magma_prefix

This is also a prefix. If it is /data/projects/cad/magma/cad, then these files must exist:

  • /data/projects/cad/magma/cad.genes.out

  • /data/projects/cad/magma/cad.genes.raw
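A common mistake is pasting a full file path such as /data/projects/cad/magma/cad.genes.out where a prefix is expected. The sketch below strips one known suffix to recover the prefix. to_prefix and its suffix list are illustrative assumptions; trait2gene may or may not normalize paths this way itself.

```python
# Suffixes that are sometimes pasted by mistake instead of the bare prefix.
KNOWN_SUFFIXES = (".genes.out", ".genes.raw", ".bed", ".bim", ".fam")


def to_prefix(path: str) -> str:
    """Strip one known file suffix so a full file path becomes a bare prefix."""
    for suffix in KNOWN_SUFFIXES:
        if path.endswith(suffix):
            return path[: -len(suffix)]
    return path  # already a prefix (or an unrecognized file type)
```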

resources.feature_matrix_prefix

This is the prefix written by munge_feature_directory.py, the PoPS feature-munging script. If it is /data/reference/pops/features/default_full_v1/pops_features, then /data/reference/pops/features/default_full_v1/ should contain:

  • pops_features.rows.txt

  • pops_features.cols.0.txt, pops_features.cols.1.txt, …

  • pops_features.mat.0.npy, pops_features.mat.1.npy, …
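To confirm a feature prefix is complete, you can check that the rows file exists and that the cols and mat chunks pair up with indices 0, 1, 2, and so on. A minimal sketch, assuming exactly the naming scheme listed above; feature_chunks is a hypothetical helper, and it checks only file names, not the contents or shapes of the .npy matrices.

```python
import re
from pathlib import Path


def feature_chunks(prefix: str) -> list[int]:
    """Return chunk indices that have both <prefix>.cols.<i>.txt and <prefix>.mat.<i>.npy.

    Raises FileNotFoundError if <prefix>.rows.txt is missing, and ValueError if
    no chunks exist, chunks are unpaired, or indices are not 0..n-1.
    """
    base = Path(prefix)
    if not Path(prefix + ".rows.txt").is_file():
        raise FileNotFoundError(prefix + ".rows.txt")
    name = re.escape(base.name)
    cols_re = re.compile(name + r"\.cols\.(\d+)\.txt")
    mat_re = re.compile(name + r"\.mat\.(\d+)\.npy")
    cols, mats = set(), set()
    for entry in base.parent.iterdir():
        if m := cols_re.fullmatch(entry.name):
            cols.add(int(m.group(1)))
        elif m := mat_re.fullmatch(entry.name):
            mats.add(int(m.group(1)))
    if cols != mats:
        raise ValueError(f"unpaired chunks: cols-only {sorted(cols - mats)}, mat-only {sorted(mats - cols)}")
    indices = sorted(cols)
    if not indices:
        raise ValueError("no cols/mat chunk files found")
    if indices != list(range(len(indices))):
        raise ValueError(f"chunk indices are not contiguous from 0: {indices}")
    return indices
```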

resources.raw_feature_dir

This should be a directory of tab-separated feature files. Each file must:

  • have a header row

  • use ENSGID as the first column

  • use feature names (the non-ENSGID column headers) that are unique across all files
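These three rules can be checked before calling prep-features. A minimal sketch, assuming every regular file in the directory (including hidden files, so remove stray ones first) is an uncompressed tab-separated feature file; check_raw_feature_dir is a hypothetical helper and reads only each file's header line.

```python
import csv
from pathlib import Path


def check_raw_feature_dir(directory: str) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    seen = {}  # feature name -> file that first defined it
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    if not files:
        return [f"{directory}: no feature files found"]
    for path in files:
        with path.open(newline="") as handle:
            header = next(csv.reader(handle, delimiter="\t"), None)
        if not header:
            problems.append(f"{path.name}: empty file, no header")
            continue
        if header[0] != "ENSGID":
            problems.append(f"{path.name}: first column is {header[0]!r}, expected 'ENSGID'")
        for feature in header[1:]:
            if feature in seen:
                problems.append(f"{path.name}: feature {feature!r} already defined in {seen[feature]}")
            else:
                seen[feature] = path.name
    return problems
```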

If you set resources.raw_feature_dir, trait2gene prep-features runs the vendored upstream munging script and copies the resulting files into work/features/ under the prefix pops_features.

Standard outputs after a successful run

  • work/magma/*.genes.out and *.genes.raw

  • work/features/pops_features.*

  • work/pops/*.preds, *.coefs, *.marginals

  • tables/prioritized_genes.tsv

  • tables/all_genes_ranked.tsv

  • reports/report.html

  • reports/summary.json

  • metadata/run_metadata.json
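A quick post-run check can confirm these outputs exist. A minimal sketch, assuming the paths above are relative to output.outdir; missing_outputs is a hypothetical helper. When reusing precomputed MAGMA results, some entries (such as the work/magma ones) may not be written, so you may need to drop them from the list.

```python
from pathlib import Path

# Patterns from the list above, assumed to be relative to output.outdir.
EXPECTED_OUTPUTS = [
    "work/magma/*.genes.out",
    "work/magma/*.genes.raw",
    "work/features/pops_features.*",
    "work/pops/*.preds",
    "work/pops/*.coefs",
    "work/pops/*.marginals",
    "tables/prioritized_genes.tsv",
    "tables/all_genes_ranked.tsv",
    "reports/report.html",
    "reports/summary.json",
    "metadata/run_metadata.json",
]


def missing_outputs(outdir: str, patterns=EXPECTED_OUTPUTS) -> list[str]:
    """Return the patterns that match no file under outdir."""
    root = Path(outdir)
    return [pattern for pattern in patterns if not any(root.glob(pattern))]
```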

Troubleshooting

Validation says a prefix is unresolved

Usually the path points to the right directory but uses the wrong basename. Make sure you supplied the full prefix (directory plus basename, without a file extension such as .bed or .genes.out), not only the parent directory.
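To see which prefixes a directory actually offers, list the basenames for which every expected file is present. A minimal sketch; candidate_prefixes is a hypothetical helper that defaults to the PLINK suffixes and can be pointed at MAGMA results by passing (".genes.out", ".genes.raw").

```python
from pathlib import Path


def candidate_prefixes(directory: str, suffixes=(".bed", ".bim", ".fam")) -> list[str]:
    """List full prefixes in directory for which every suffix file exists."""
    root = Path(directory)
    first = suffixes[0]
    candidates = []
    for path in sorted(root.glob("*" + first)):
        prefix = str(path)[: -len(first)]
        if all(Path(prefix + s).is_file() for s in suffixes):
            candidates.append(prefix)
    return candidates
```

Any string this returns can be pasted as-is into resources.reference_panel (or resources.precomputed_magma_prefix, with the MAGMA suffixes).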

run-magma fails even though MAGMA exists

Check:

  • resources.magma_bin is executable

  • resources.reference_panel points to a PLINK prefix (not a directory) with matching .bed, .bim, and .fam files

  • resources.gene_locations uses the same genome build as your summary statistics (input.genome_build)
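The first item can be checked before calling run-magma. A minimal sketch; magma_bin_problems is a hypothetical helper, and it does not compare genome builds, which this sketch leaves to you.

```python
import os
from pathlib import Path


def magma_bin_problems(magma_bin: str) -> list[str]:
    """Return reasons why resources.magma_bin cannot be executed (empty list if it looks fine)."""
    path = Path(magma_bin)
    if not path.is_absolute():
        return [f"{magma_bin}: not an absolute path"]
    if not path.is_file():
        return [f"{magma_bin}: file does not exist"]
    if not os.access(path, os.X_OK):
        return [f"{magma_bin}: file is not executable (try chmod +x)"]
    return []
```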

run-pops fails after feature preparation

Check:

  • resources.gene_annotation has ENSGID, CHR, and TSS columns

  • work/features/pops_features.rows.txt exists

  • work/magma/<trait>.genes.out and work/magma/<trait>.genes.raw both exist
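The first item can be checked by reading the annotation's header line. A minimal sketch, assuming the file is tab-separated with column names on the first line (as the .tsv extension in the example suggests); missing_annotation_columns is a hypothetical helper.

```python
import csv

REQUIRED_COLUMNS = ("ENSGID", "CHR", "TSS")


def missing_annotation_columns(path: str) -> list[str]:
    """Return the required columns absent from the gene annotation's header line."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [col for col in REQUIRED_COLUMNS if col not in header]
```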