# Quickstart
Before running Episodic, we recommend reading Tay et al. (2022 and 2023) and becoming familiar with Fixed Local Clock (FLC) models.
## Installation
Install Episodic with pip:
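Assuming the package is published on PyPI under the name `episodic`:

```shell
pip install episodic
```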
Episodic uses Snakemake, with conda environments providing the workflow tools. After installing the package, create the workflow environments with the `episodic env` command.
See Installation for more detail.
## CLI overview
The main entry point is:
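```shell
episodic run
```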
For a minimal FLC analysis, provide an alignment and one or more foreground groups:
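```shell
# alignment.fasta and the group string "bird" are illustrative placeholders
episodic run --alignment alignment.fasta --group bird
```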
List-valued options can be repeated. For example, multiple alignment partitions are supplied by repeating `--alignment`, and multiple clock models are supplied by repeating `--clock`:

```shell
episodic run \
    --alignment HA1.fasta \
    --alignment HA2.fasta \
    --group bird \
    --clock strict \
    --clock flc-stem
```
Episodic accepts both workflow options and Snakemake options. Any unrecognised arguments are passed through to Snakemake:

```shell
episodic run \
    --config config.yaml \
    --profile slurm \
    --set-resources beast:mem_mb=16G \
    --set-resources beast:runtime=04:00:00
```
Useful general options:
| CLI option | Description |
|---|---|
| `--config FILE` | Load a YAML workflow config. CLI values override matching config values. |
| `--profile, -p` | Use a Snakemake profile, for example `slurm` or `slurm-gpu`. |
| `--dry, -n` | Dry-run the workflow and print what would run. |
| `--dag, -d` | Write the workflow DAG to `.pdf`, `.png`, or `.svg`. |
| `--cores, -c` | Number of local cores Snakemake may use. |
| `--no-conda` | Disable workflow conda environments. |
| `--verbose, -v` | Run with verbose output. |
| `--help-snakemake, -hs` | Show Snakemake help. |
Other top-level commands include:
| Command | Description |
|---|---|
| `episodic info` | Show workflow information. |
| `episodic config` | Print the workflow configuration schema/defaults. |
| `episodic template` | Print the bundled BEAST XML template. |
| `episodic env` | Manage workflow conda environments. |
| `episodic script` | Run workflow helper scripts directly. |
| `episodic profile` | Show bundled Snakemake profiles. |
## Configuration files
You can run the workflow entirely from command-line options, but a YAML config is usually easier to reproduce:
```yaml
alignment:
  - data.fasta
group:
  - group1
  - group2
clock:
  - flc-stem
  - relaxed
rate_gamma_prior_shape: 0.5
rate_gamma_prior_scale: 0.1
date_delimiter: "@"
date_index: -1
output:
  dir: results
  dated: false
beast:
  chain_length: 10000000
  samples: 10000
  seed: 12345
  duplicates: 2
  threads: 4
  args: "-beagle -beagle_CPU"
marginal_likelihood:
  estimate: false
```
Run it with:
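```shell
episodic run --config config.yaml
```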
CLI options override values from the config file:
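```shell
# overrides the clock list from config.yaml, running only a strict clock
episodic run --config config.yaml --clock strict
```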
## Input naming
Episodic extracts sampling dates from sequence names. By default, headers are split on `@`, and the last field is interpreted as the date:

```
sample1@bird@2023.75
```
Use `--date-delimiter` and `--date-index` if your headers use a different format.
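The default parsing convention can be sketched in Python (`extract_date` is an illustrative helper, not part of the Episodic API, and the header is hypothetical):

```python
def extract_date(header: str, delimiter: str = "@", index: int = -1) -> str:
    """Split a sequence header on the delimiter and return the field at
    date_index; -1 selects the last field, matching Episodic's default."""
    return header.split(delimiter)[index]

# hypothetical header with the date as the last '@'-separated field
print(extract_date("sample1@bird@2023.75"))  # → 2023.75
```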
For FLC models, `--group` values are matched against sequence headers to define foreground lineages. At least one group is required when any selected clock contains `flc`.
## All Workflow Parameters
The table below lists the workflow config keys, their CLI flags, and defaults.
| Config key | CLI flag | Default | Description |
|---|---|---|---|
| `alignment` | `--alignment, -a` | `null` | Path to a FASTA alignment partition. Repeat for multiple partitions. Required for real runs. |
| `group` | `--group, -g` | `null` | Header-matching group string used to define FLC foreground lineages. Repeat for multiple groups. Required for FLC clocks. |
| `foreground_label` | `--foreground-label` | `null` | Optional label for shared foreground/local FLC rate parameters and plot labels. With partitions, the label follows the partition, for example `HA2.Bird.clade.rate`. |
| `background_label` | `--background-label` | `background` | Optional label for background clock-rate parameters and plot labels. With partitions, the label follows the partition, for example `HA2.Human.clock.rate`; without this option the default is `HA2.background.clock.rate`. |
| `clock` | `--clock` | `flc-stem` | Clock model to run. Repeat for multiple models. Options: `strict`, `relaxed`, `flc-stem`, `flc-shared-stem`, `flc-clade`, `flc-shared-clade`, `flc-stem-and-clade`, `flc-shared-stem-and-clade`. |
| `rate_gamma_prior_shape` | `--rate-gamma-prior-shape, -shape` | `0.5` | Shape parameter for the gamma prior on clock rates. |
| `rate_gamma_prior_scale` | `--rate-gamma-prior-scale, -scale` | `0.1` | Scale parameter for the gamma prior on clock rates. |
| `mcc_tree.heights` | `--mcc-tree-heights` | `mean` | Node-height summary for MCC trees. Repeat for multiple values. Options include `mean`, `median`, `keep`, and `ca`. |
| `date_delimiter` | `--date-delimiter` | `@` | Delimiter used to split dates from sequence headers. |
| `date_index` | `--date-index` | `-1` | Zero-based field index containing the sampling date after splitting the header. `-1` means the last field. |
| `newick` | `--newick` | `null` | Optional Newick tree. If provided, the topology is fixed. |
| `output.dir` | `--output-dir, -o` | `.` | Output directory. |
| `output.dated` | `--output-dated, --no-output-dated` | `false` | Create a timestamped subdirectory under `output.dir`. |
| `beast.chain_length` | `--beast-chain-length, -cl` | `10000000` | BEAST MCMC chain length. |
| `beast.samples` | `--beast-samples, -s` | `10000` | Number of samples to draw from the BEAST chain. |
| `beast.seed` | `--beast-seed` | `null` | Optional base seed used to derive stable, job-specific BEAST seeds. |
| `beast.duplicates` | `--beast-duplicates` | `2` | Number of duplicate BEAST runs for convergence checks. |
| `beast.template` | `--beast-template` | bundled template | Optional Episodic BEAST XML template. |
| `beast.fit_clocks` | `--beast-fit-clocks, --no-beast-fit-clocks` | `true` | Run BEAST clock fitting. Disable to run only other requested workflow branches, such as MLE. |
| `beast.threads` | `--beast-threads` | `4` | Threads passed to BEAST. |
| `beast.args` | `--beast-args` | `-beagle -beagle_CPU` | Extra command-line arguments passed to BEAST. |
| `beast.envmodules` | `--beast-envmodules` | `GCC/11.3.0`, `beagle-lib/4.0.1-CUDA-12.2.0` | Environment modules to load for BEAST when Snakemake module loading is enabled. Repeat for multiple modules. |
| `marginal_likelihood.estimate` | `--marginal-likelihood-estimate, -mle` | `false` | Run path-sampling/stepping-stone marginal likelihood estimation. |
| `marginal_likelihood.path_steps` | `--marginal-likelihood-path-steps` | `100` | Number of path steps for marginal likelihood estimation. |
| `marginal_likelihood.chain_length` | `--marginal-likelihood-chain-length` | `1000000` | Chain length for marginal likelihood estimation. |
| `marginal_likelihood.log_every` | `--marginal-likelihood-log-every` | `10000` | Logging interval for marginal likelihood estimation. |
| `marginal_likelihood.duplicates` | `--marginal-likelihood-duplicates` | `3` | Number of duplicate marginal likelihood runs. |
## BEAST seeds
Episodic always passes an explicit -seed to BEAST. This avoids BEAST assigning the same time-based seed to jobs that start at the same time on a cluster.
Seeds are derived deterministically from:
- `beast.seed`, if provided, otherwise an internal default base value.
- The clock wildcard.
- The run name wildcard, including duplicate number.
This means duplicate jobs submitted at the same time receive different seeds, while rerunning the same analysis with the same config and output naming uses the same seeds. To start a new independent set of chains, change beast.seed:
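```yaml
beast:
  seed: 67890  # any value different from the previous run
```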
## Outputs
By default, results are written to output.dir. If output.dated: true, a timestamped subdirectory is created under that directory.
For a full list of output files and naming patterns, see Workflow Outputs.