Differential expression
rnaseq-de-flow
This flow takes a gene count matrix and sample metadata and produces DESeq2 statistical results, normalized matrices, gene lists, and diagnostic plots. It is the point where expression measurement becomes an explicit comparison between biological conditions.
Counts-first command
taf-rnaseq-de-flow \
--counts expression-out/03_results/matrices/gene_counts.tsv \
--metadata metadata.tsv \
--design '~ condition' \
--contrast condition:treated:control \
--outdir de-out
Inputs, design, and contrast
The count matrix contains one gene ID column and one column per sample. Sample IDs in the metadata must match the count matrix column names exactly. --design is the DESeq2 model formula. --contrast takes the form factor:numerator:denominator, where a positive log2 fold-change means expression is higher in numerator than in denominator. For the example files below, a knockout-versus-wild-type contrast would be written condition:SNF2KO:WT.
gene_counts.tsv
gene_id WT_01 WT_02 SNF2_01 SNF2_02
YAL001C 10 12 40 42
YAL002W 5 8 7 9
metadata.tsv
sample condition batch
WT_01 WT b1
WT_02 WT b2
SNF2_01 SNF2KO b1
SNF2_02 SNF2KO b2
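The matching rules above can be checked before running the flow. The following is a minimal sketch, not part of the tool: `check_inputs` is a hypothetical helper, the two example tables are assumed to be tab-separated, and only set equality of sample IDs is tested (whether the flow also requires the same column order is not stated here).

```python
import csv
import io

# The two example tables above, written as tab-separated text (assumed delimiter).
COUNTS_TSV = (
    "gene_id\tWT_01\tWT_02\tSNF2_01\tSNF2_02\n"
    "YAL001C\t10\t12\t40\t42\n"
    "YAL002W\t5\t8\t7\t9\n"
)
METADATA_TSV = (
    "sample\tcondition\tbatch\n"
    "WT_01\tWT\tb1\n"
    "WT_02\tWT\tb2\n"
    "SNF2_01\tSNF2KO\tb1\n"
    "SNF2_02\tSNF2KO\tb2\n"
)


def check_inputs(counts_text, metadata_text, contrast,
                 gene_column="gene_id", sample_column="sample"):
    """Hypothetical pre-flight check. Returns a list of problems (empty = OK)."""
    problems = []

    # Sample IDs in the count matrix are every header cell except the gene column.
    header = next(csv.reader(io.StringIO(counts_text), delimiter="\t"))
    count_samples = {name for name in header if name != gene_column}

    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    rows = list(reader)
    columns = reader.fieldnames or []
    meta_samples = {row[sample_column] for row in rows}

    # IDs must match exactly, including case and stray whitespace.
    for name in sorted(count_samples - meta_samples):
        problems.append(f"in counts but not in metadata: {name!r}")
    for name in sorted(meta_samples - count_samples):
        problems.append(f"in metadata but not in counts: {name!r}")

    # The contrast has the form factor:numerator:denominator.
    parts = contrast.split(":")
    if len(parts) != 3:
        problems.append(f"contrast needs three ':'-separated fields: {contrast!r}")
        return problems
    factor, numerator, denominator = parts
    if factor not in columns:
        problems.append(f"factor {factor!r} is not a metadata column")
        return problems
    levels = {row[factor] for row in rows}
    for level in (numerator, denominator):
        if level not in levels:
            problems.append(f"level {level!r} not found in column {factor!r}")
    return problems


print(check_inputs(COUNTS_TSV, METADATA_TSV, "condition:SNF2KO:WT"))
for problem in check_inputs(COUNTS_TSV, METADATA_TSV, "condition:treated:control"):
    print(problem)
```

The first call reports no problems for the example files. The second call flags `treated` and `control`, because those levels do not occur in the example `condition` column.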
Use --design '~ condition' for a simple two-group comparison. Use --design '~ batch + condition' only when batch is known, recorded in the metadata, and balanced well enough across conditions for the batch effect to be estimated. Reliable DESeq2 analysis requires biological replication.
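One way to make "balanced well enough" concrete is to look at how conditions are spread across batches. The sketch below is a conservative heuristic, not the flow's own logic: `suggest_design` is a hypothetical helper that proposes the batch-aware formula only when every batch contains every condition level, and it notes any condition with fewer than two samples.

```python
from collections import Counter, defaultdict

# Example metadata from above, as a list of records.
SAMPLES = [
    {"sample": "WT_01", "condition": "WT", "batch": "b1"},
    {"sample": "WT_02", "condition": "WT", "batch": "b2"},
    {"sample": "SNF2_01", "condition": "SNF2KO", "batch": "b1"},
    {"sample": "SNF2_02", "condition": "SNF2KO", "batch": "b2"},
]


def suggest_design(samples, factor="condition", batch="batch"):
    """Hypothetical heuristic. Returns (design_formula, notes)."""
    notes = []

    # Biological replication: at least two samples per condition level.
    replicates = Counter(s[factor] for s in samples)
    for level, n in sorted(replicates.items()):
        if n < 2:
            notes.append(f"{level}: only {n} sample, no biological replication")

    # Which condition levels occur in each batch?
    levels_per_batch = defaultdict(set)
    for s in samples:
        if s.get(batch):
            levels_per_batch[s[batch]].add(s[factor])

    if len(levels_per_batch) < 2:
        # No batch column, or a single batch: nothing to model.
        return f"~ {factor}", notes

    all_levels = set(replicates)
    if all(levels == all_levels for levels in levels_per_batch.values()):
        return f"~ {batch} + {factor}", notes

    notes.append("at least one batch lacks a condition level; "
                 "the batch effect may not be separable from the condition effect")
    return f"~ {factor}", notes


print(suggest_design(SAMPLES))
```

For the example metadata, both batches contain both conditions, so the sketch proposes `~ batch + condition` with no notes. If all WT samples were in b1 and all SNF2KO samples in b2, batch and condition would be fully confounded and the sketch would fall back to `~ condition` with a warning.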
Complete parameter reference
| Parameter | Required | Default | Meaning and when to change it |
|---|---|---|---|
| --counts | yes | none | Gene count matrix. The default gene column is gene_id; all remaining columns are sample IDs. |
| --metadata | yes | none | Sample metadata table. The default sample column is sample. |
| --design | yes | none | DESeq2 formula such as '~ condition' or '~ batch + condition'. It defines the statistical model. |
| --contrast | yes | none | Contrast in factor:numerator:denominator form, for example condition:treated:control. |
| --outdir, -o | yes | none | Dedicated output directory. An existing directory is refused unless --force is used. |
| --sample-column | no | sample | Sample ID column in the metadata. Change it when your metadata uses another column name. |
| --gene-column | no | gene_id | Gene/feature ID column in the count matrix. |
| --padj-cutoff | no | 0.05 | Adjusted p-value cutoff for significant gene lists and volcano highlighting. |
| --lfc-cutoff | no | 1 | Absolute log2 fold-change cutoff. 1 means at least a two-fold change. |
| --fit-type | no | parametric | DESeq2 dispersion fit: parametric, local, or mean. Change it only when diagnostics suggest the default fit is poor. |
| --lfc-shrink | no | none | Optional log2FC shrinkage: none, ashr, or apeglm. |
| --coef | conditional | none | DESeq2 coefficient name. Required only with --lfc-shrink apeglm. |
| --min-count | no | 1 | Count threshold for low-expression prefiltering. |
| --min-samples | no | 2 | Minimum number of samples that must reach --min-count. |
| --top-var | no | 500 | Number of top variable genes used for PCA. |
| --top-heatmap | no | 50 | Number of top variable genes shown in the heatmap. The top-gene expression plot shows up to 12 of the strongest DE genes. |
| --force | no | off | Replace standard outputs inside an existing output directory. |
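To make the filtering parameters concrete, here is a minimal sketch of how --min-count, --min-samples, --padj-cutoff, and --lfc-cutoff could be read. It is an illustration under stated assumptions, not the flow's implementation: both function names are hypothetical, and the exact comparison operators (for example `<` versus `<=` on the adjusted p-value) are assumed rather than taken from the tool.

```python
def passes_prefilter(counts, min_count=1, min_samples=2):
    """Keep a gene when at least min_samples samples have count >= min_count.

    Assumed reading of --min-count and --min-samples.
    """
    return sum(1 for c in counts if c >= min_count) >= min_samples


def is_significant(padj, log2fc, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Call a gene significant when padj is below the cutoff and the absolute
    log2 fold-change reaches the cutoff.

    padj may be missing (DESeq2 reports NA for some genes); a missing value
    is treated as not significant here.
    """
    if padj is None:
        return False
    return padj < padj_cutoff and abs(log2fc) >= lfc_cutoff


# Example rows from gene_counts.tsv above.
print(passes_prefilter([10, 12, 40, 42]))  # four samples reach the threshold
print(passes_prefilter([0, 0, 0, 3]))      # only one sample reaches it

# A log2 fold-change of 1 corresponds to a 2**1 = 2-fold change, in either direction.
print(is_significant(0.01, 1.5))
print(is_significant(0.01, -1.5))
print(is_significant(0.01, 0.5))
```

Because the cutoff is applied to the absolute log2 fold-change, both up- and down-regulated genes pass. Raising --lfc-cutoff to 2 would require at least a four-fold change.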
How it connects
By default, the standard flow feeds this flow the count matrix produced by the expression flow. With --route both --de-source featurecounts, the same DE flow can instead consume the featureCounts matrix produced by the count flow. Downstream, the gene lists written by this flow are the inputs to the enrichment flow:
taf-rnaseq-enrichment-flow \
--gene-list de-out/03_results/gene_lists/significant_genes.tsv \
--ranked-genes de-out/03_results/gene_lists/ranked_genes.tsv \
--gene-sets gene_sets.gmt \
--background background.tsv \
--outdir enrichment-out
Key outputs and limits
Key outputs include results.tsv, normalized_counts.tsv, ranked_genes.tsv, and significant gene lists, along with PCA, sample correlation, MA, volcano, heatmap, distribution, and top-gene plots. The flow cannot compensate for a weak experimental design or for missing biological replication.