Differential expression

rnaseq-de-flow

This flow turns expression counts plus sample metadata into DESeq2 statistical results, normalized matrices, gene lists, and diagnostic plots. It is the point where expression measurement becomes an explicit comparison between biological conditions.

Version 0.2.0-r1. Runs after the expression or count flow.

Counts-first command

taf-rnaseq-de-flow \
  --counts expression-out/03_results/matrices/gene_counts.tsv \
  --metadata metadata.tsv \
  --design '~ condition' \
  --contrast condition:treated:control \
  --outdir de-out

Inputs, design, and contrast

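The count matrix contains one gene column and one column per sample. Metadata sample IDs must match the count matrix column names exactly. --design is the DESeq2 model formula; --contrast uses factor:numerator:denominator, where a positive log2 fold-change means expression is higher in numerator than in denominator. With the example metadata below, the matching contrast would be condition:SNF2KO:WT; the condition:treated:control contrast in the command above assumes levels named treated and control.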

gene_counts.tsv

gene_id	WT_01	WT_02	SNF2_01	SNF2_02
YAL001C	10	12	40	42
YAL002W	5	8	7	9

metadata.tsv

sample	condition	batch
WT_01	WT	b1
WT_02	WT	b2
SNF2_01	SNF2KO	b1
SNF2_02	SNF2KO	b2
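
The matching rules above can be checked before a run. The following is a minimal sketch, not part of taf-rnaseq-de-flow: `check_inputs` is a hypothetical helper, the tables are inline copies of the two example files, and it assumes that "match exactly" means the two sets of sample names are identical (it does not check column order).

```python
import csv
import io

# Inline copies of the example tables above (tab-separated).
COUNTS_TSV = (
    "gene_id\tWT_01\tWT_02\tSNF2_01\tSNF2_02\n"
    "YAL001C\t10\t12\t40\t42\n"
    "YAL002W\t5\t8\t7\t9\n"
)
METADATA_TSV = (
    "sample\tcondition\tbatch\n"
    "WT_01\tWT\tb1\n"
    "WT_02\tWT\tb2\n"
    "SNF2_01\tSNF2KO\tb1\n"
    "SNF2_02\tSNF2KO\tb2\n"
)


def check_inputs(counts_text, metadata_text, contrast,
                 sample_column="sample", gene_column="gene_id"):
    """Hypothetical pre-flight check; returns a list of problem strings."""
    problems = []

    # Sample IDs are every header column of the count matrix except the gene column.
    header = next(csv.reader(io.StringIO(counts_text), delimiter="\t"))
    count_ids = {name for name in header if name != gene_column}

    rows = list(csv.DictReader(io.StringIO(metadata_text), delimiter="\t"))
    meta_ids = {row[sample_column] for row in rows}

    # Assumption: an exact match means identical sets of sample names.
    for sample in sorted(count_ids - meta_ids):
        problems.append(f"in counts but not in metadata: {sample}")
    for sample in sorted(meta_ids - count_ids):
        problems.append(f"in metadata but not in counts: {sample}")

    # The contrast has the form factor:numerator:denominator.
    parts = contrast.split(":")
    if len(parts) != 3:
        problems.append("contrast must be factor:numerator:denominator")
        return problems
    factor, numerator, denominator = parts
    if rows and factor not in rows[0]:
        problems.append(f"factor column missing from metadata: {factor}")
        return problems
    levels = {row[factor] for row in rows}
    for level in (numerator, denominator):
        if level not in levels:
            problems.append(f"level not found in {factor}: {level}")
    return problems


if __name__ == "__main__":
    for contrast in ("condition:SNF2KO:WT", "condition:treated:control"):
        print(contrast, check_inputs(COUNTS_TSV, METADATA_TSV, contrast))
```

With the example tables, the contrast condition:SNF2KO:WT passes, while condition:treated:control is reported because neither level appears in the condition column.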

Use --design '~ condition' for a simple two-group comparison. Use --design '~ batch + condition' only when batch is known, recorded in the metadata, and sufficiently balanced across conditions for its effect to be estimated. Reliable DESeq2 analysis requires biological replication.
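
Whether batch can be estimated alongside condition depends on how samples are spread across the two factors. The sketch below is an illustration only, with hypothetical helper names. It tabulates batch against condition for the example metadata and flags the fully confounded case, where every batch contains a single condition and '~ batch + condition' cannot separate the two effects.

```python
from collections import Counter, defaultdict

# The example metadata table, as a list of records.
SAMPLES = [
    {"sample": "WT_01", "condition": "WT", "batch": "b1"},
    {"sample": "WT_02", "condition": "WT", "batch": "b2"},
    {"sample": "SNF2_01", "condition": "SNF2KO", "batch": "b1"},
    {"sample": "SNF2_02", "condition": "SNF2KO", "batch": "b2"},
]


def crosstab(samples, row_key="batch", col_key="condition"):
    """Count samples for each (batch, condition) pair."""
    return Counter((s[row_key], s[col_key]) for s in samples)


def replicates_per_level(samples, key="condition"):
    """Count samples per level of one factor (biological replicates per group)."""
    return Counter(s[key] for s in samples)


def fully_confounded(samples, batch_key="batch", condition_key="condition"):
    """True when every batch holds exactly one condition.

    In that case condition is determined by batch, so a model with both
    terms is not full rank and the condition effect cannot be estimated.
    """
    conditions_in_batch = defaultdict(set)
    for s in samples:
        conditions_in_batch[s[batch_key]].add(s[condition_key])
    return all(len(conds) == 1 for conds in conditions_in_batch.values())


if __name__ == "__main__":
    for (batch, condition), n in sorted(crosstab(SAMPLES).items()):
        print(batch, condition, n)
    print("replicates:", dict(replicates_per_level(SAMPLES)))
    print("fully confounded:", fully_confounded(SAMPLES))
```

In the example metadata each batch contains one WT and one SNF2KO sample, so the layout is balanced. A layout with all WT samples in b1 and all SNF2KO samples in b2 would be flagged as fully confounded.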

Complete parameter reference

Parameter	Required	Default	Description
`--counts`	yes	none	Gene count matrix. Default gene column is `gene_id`; remaining columns are sample IDs.
`--metadata`	yes	none	Sample metadata table. Default sample column is `sample`.
`--design`	yes	none	DESeq2 formula such as `'~ condition'` or `'~ batch + condition'`. It defines the statistical model.
`--contrast`	yes	none	Contrast in `factor:numerator:denominator` form, for example `condition:treated:control`.
`--outdir`, `-o`	yes	none	Dedicated output directory. Existing directories are refused unless `--force` is used.
`--sample-column`	no	`sample`	Sample ID column in metadata. Change it when your metadata uses another column name.
`--gene-column`	no	`gene_id`	Gene/feature ID column in the count matrix.
`--padj-cutoff`	no	0.05	Adjusted p-value cutoff for significant gene lists and volcano highlighting.
`--lfc-cutoff`	no	1	Absolute log2 fold-change cutoff. A value of 1 means at least a two-fold change.
`--fit-type`	no	`parametric`	DESeq2 dispersion fit: `parametric`, `local`, or `mean`. Change only when diagnostics suggest the default fit is poor.
`--lfc-shrink`	no	`none`	Optional log2FC shrinkage: `none`, `ashr`, or `apeglm`.
`--coef`	conditional	none	DESeq2 coefficient name. Required only for `--lfc-shrink apeglm`.
`--min-count`	no	1	Count threshold for low-expression prefiltering.
`--min-samples`	no	2	Minimum number of samples that must meet `--min-count`.
`--top-var`	no	500	Number of top variable genes used for PCA.
`--top-heatmap`	no	50	Number of top variable genes shown in the heatmap; the top-gene expression plot shows up to the 12 strongest DE genes.
`--force`	no	off	Replace standard outputs inside an existing output directory.
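
To make the filtering parameters concrete, here is a rough sketch of how `--min-count`, `--min-samples`, `--padj-cutoff`, and `--lfc-cutoff` could be read, using the defaults from the table. The function names are hypothetical, and the comparison operators (inclusive or strict) are assumptions; the flow's actual boundary handling is not stated here.

```python
def passes_prefilter(counts, min_count=1, min_samples=2):
    """Keep a gene when at least `min_samples` samples reach `min_count`.

    Assumption: "meet" is read as count >= min_count.
    """
    return sum(1 for c in counts if c >= min_count) >= min_samples


def is_significant(log2fc, padj, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Apply both cutoffs to one gene.

    Assumptions: padj must be strictly below the cutoff and |log2FC| must
    reach the cutoff. DESeq2 can report NA for padj, represented here as
    None and treated as not significant.
    """
    if padj is None:
        return False
    return padj < padj_cutoff and abs(log2fc) >= lfc_cutoff


def fold_change(log2fc):
    """Convert a log2 fold-change to a linear fold-change."""
    return 2 ** log2fc


if __name__ == "__main__":
    print(passes_prefilter([10, 12, 40, 42]))     # counts for YAL001C in the example matrix
    print(passes_prefilter([0, 0, 3, 0]))         # only one sample reaches min_count
    print(is_significant(log2fc=-1.5, padj=0.01))
    print(is_significant(log2fc=0.5, padj=0.001)) # small effect despite low padj
    print(fold_change(1))                         # the default --lfc-cutoff as a fold-change
```

The absolute value in `is_significant` means the cutoff applies to both up- and down-regulated genes, which matches the description of `--lfc-cutoff` as an absolute threshold.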

How it connects

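By default, standard-flow passes the expression-flow count matrix to this flow. With --route both --de-source featurecounts, the same DE flow can instead consume the featureCounts matrix produced by count-flow. Downstream, the significant and ranked gene lists written by this flow feed taf-rnaseq-enrichment-flow.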

taf-rnaseq-enrichment-flow \
  --gene-list de-out/03_results/gene_lists/significant_genes.tsv \
  --ranked-genes de-out/03_results/gene_lists/ranked_genes.tsv \
  --gene-sets gene_sets.gmt \
  --background background.tsv \
  --outdir enrichment-out

Key outputs and limits

Key outputs include results.tsv, normalized_counts.tsv, ranked_genes.tsv, significant gene lists, PCA, sample correlation, MA, volcano, heatmap, distribution, and top-gene plots. The flow cannot fix weak experimental design or missing biological replication.
