Architecture

trait2gene is structured as a thin Python orchestration layer around external MAGMA and upstream-compatible PoPS components.

Core responsibilities

The package owns:

  • the CLI surface and command routing

  • config parsing and validation

  • resource resolution and manifest writing

  • standardized output layout

  • provenance and run metadata

  • locus prioritization and report generation

MAGMA stays external, and PoPS compatibility is preserved through vendored upstream scripts plus a small compatibility layer.
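As an illustration of the "resource resolution and manifest writing" responsibility, the sketch below records each resolved resource with its size and SHA-256 checksum in a JSON manifest. It is a minimal sketch, not the package's implementation: the function names, the manifest fields, and the JSON format are all assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large resources are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(resources: dict[str, Path], manifest_path: Path) -> dict:
    """Write a JSON manifest of resolved resources (hypothetical layout)."""
    entries = {
        name: {
            "path": str(path),
            "bytes": path.stat().st_size,
            "sha256": file_sha256(path),
        }
        for name, path in sorted(resources.items())
    }
    manifest = {"resources": entries}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Recording checksums next to paths lets a later run confirm that it used the same inputs, which is what makes a manifest useful for provenance.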

Execution model

The full pipeline follows this high-level sequence:

  1. validate config and input contracts

  2. resolve resources and write manifests

  3. run MAGMA or copy precomputed MAGMA outputs

  4. prepare feature matrices

  5. execute PoPS

  6. prioritize genes within loci

  7. write tables, metadata, and reports
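The sequence above can be sketched as an ordered list of stage functions that share a context. This is a rough sketch under stated assumptions: the stage names, the `Context` type, and `run_pipeline` are hypothetical, and each stage is a placeholder that only records that it ran.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, List[str]]
Stage = Callable[[Context], None]

# Hypothetical stage names mirroring the seven steps listed above.
STAGE_NAMES = [
    "validate",
    "resolve_resources",
    "magma",
    "prepare_features",
    "pops",
    "prioritize",
    "report",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its own name."""
    def stage(ctx: Context) -> None:
        ctx.setdefault("completed", []).append(name)
    return stage


PIPELINE: List[Tuple[str, Stage]] = [(name, make_stage(name)) for name in STAGE_NAMES]


def run_pipeline(ctx: Context, stages: List[Tuple[str, Stage]] = PIPELINE) -> List[str]:
    """Run stages strictly in order and stop at the first failure."""
    for name, stage in stages:
        try:
            stage(ctx)
        except Exception as exc:
            # Name the failing stage so the error points at one step.
            raise RuntimeError(f"stage {name!r} failed") from exc
    return ctx.get("completed", [])
```

Calling `run_pipeline({})` returns the stage names in execution order. Validation runs first so that configuration errors surface before any heavy step starts.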

Package layout

Important package areas:

  • trait2gene.cli: CLI commands and Typer app wiring

  • trait2gene.config: Pydantic models and config loaders

  • trait2gene.resources: manifest and resource resolution helpers

  • trait2gene.engine: planning, logging, provenance, and pipeline orchestration

  • trait2gene.workflows: stage implementations

  • trait2gene.domain: loci, ranking, and QC helpers

  • trait2gene.report: HTML and table rendering

  • trait2gene.vendor: vendored upstream PoPS scripts

Why the stages stay separate

Each stage is exposed as its own command so you can:

  • validate configs without running heavy steps

  • reuse precomputed MAGMA outputs

  • prepare features independently of PoPS

  • rerun reporting or prioritization on existing outputs
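To show what one command per stage looks like in practice, here is a small sketch using `argparse` subcommands from the standard library. It is a stand-in for illustration only: the package wires its CLI with Typer, and the command names, the `--config` option, and the program name below are assumptions rather than the real interface.

```python
import argparse

# Hypothetical command names; the actual trait2gene commands may differ.
STAGE_COMMANDS = {
    "validate": "check config and input contracts without running heavy steps",
    "magma": "run MAGMA or copy precomputed MAGMA outputs",
    "features": "prepare feature matrices",
    "pops": "execute PoPS",
    "prioritize": "prioritize genes within loci",
    "report": "write tables, metadata, and reports",
}


def build_parser() -> argparse.ArgumentParser:
    """Register one subcommand per stage so each can run on its own."""
    parser = argparse.ArgumentParser(prog="trait2gene-sketch")
    subcommands = parser.add_subparsers(dest="stage", required=True)
    for name, help_text in STAGE_COMMANDS.items():
        command = subcommands.add_parser(name, help=help_text)
        command.add_argument("--config", required=True, help="path to the run config")
    return parser
```

With this layout, `build_parser().parse_args(["report", "--config", "run.yaml"])` selects only the reporting stage, which matches the use case of rerunning reports on existing outputs.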