Functional enrichment
rnaseq-enrichment-flow
This flow interprets differential expression (DE) results at the gene-set level. Using offline GMT resources, it runs over-representation analysis (ORA) on the significant genes and preranked gene set enrichment analysis (GSEA) on the ranked genes, then renders a consistently styled plot suite.
Typical command
taf-rnaseq-enrichment-flow \
    --gene-list de-out/03_results/gene_lists/significant_genes.tsv \
    --ranked-genes de-out/03_results/gene_lists/ranked_genes.tsv \
    --gene-sets gene_sets.gmt \
    --background background.tsv \
    --outdir enrichment-out
Input requirements
Provide a GMT gene-set file and at least one analysis input: --gene-list for ORA, --ranked-genes for preranked GSEA, or both. Gene IDs are matched as exact strings, so the DE output, the GMT file, and the background must all use the same ID system. Columns in the TSV and GMT files are tab-separated.
gene_list.tsv
    gene_id
    YAL001C
    YBR160W
ranked_genes.tsv
    gene_id score
    YAL001C 4.2
    YBR160W -1.7
gene_sets.gmt
    GO_BP_RIBOSOME ribosome biogenesis YAL001C YBR160W YDL014W
    GO_BP_STRESS stress response YER103W YML007W
background.tsv
    gene_id
    YAL001C
    YBR160W
    YDL014W
    YER103W
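Because IDs are matched as exact strings, a quick overlap check before running the flow can catch a mixed ID system early. The following is a minimal sketch and not part of the flow itself: the helper names read_ids, read_gmt, and overlap_report are hypothetical, and the inline data simply mirrors the example files above with tab separators.

```python
import csv
import io

def read_ids(tsv_text, id_column="gene_id"):
    """Return the set of IDs found in one column of a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column] for row in reader if row[id_column]}

def read_gmt(gmt_text):
    """Parse GMT text into {set_id: set of gene IDs}; field 2 is the description."""
    gene_sets = {}
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets

def overlap_report(gene_list, background, gene_sets):
    """Summarise how well the three ID sources agree (exact string matching)."""
    gmt_ids = set().union(*gene_sets.values())
    return {
        "list_not_in_background": sorted(gene_list - background),
        "list_in_gmt": len(gene_list & gmt_ids),
        "background_in_gmt": len(background & gmt_ids),
    }

# Illustrative data mirroring the example files (assumed tab-separated).
GENE_LIST = "gene_id\nYAL001C\nYBR160W\n"
BACKGROUND = "gene_id\nYAL001C\nYBR160W\nYDL014W\nYER103W\n"
GMT = (
    "GO_BP_RIBOSOME\tribosome biogenesis\tYAL001C\tYBR160W\tYDL014W\n"
    "GO_BP_STRESS\tstress response\tYER103W\tYML007W\n"
)

report = overlap_report(read_ids(GENE_LIST), read_ids(BACKGROUND), read_gmt(GMT))
print(report)
```

A list gene missing from the background, or an overlap count near zero, usually points to an ID-system mismatch (for example, systematic names in one file and gene symbols in another).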
--background should represent the gene universe that was actually tested, such as the genes that passed expression filtering in DESeq2. Without it, ORA may test against an overly broad database universe, which can inflate significance and weakens interpretation.
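To see why the universe matters, consider the one-sided hypergeometric test that ORA is commonly based on. This is a sketch under that assumption; the flow's exact ORA implementation is not described here, ora_pvalue is a hypothetical helper, and the counts in the second example are invented for illustration.

```python
from math import comb

def ora_pvalue(universe_size, set_size, list_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size: genes in the background
    set_size:      genes of the gene set that are in the background
    list_size:     significant genes that are in the background
    overlap:       significant genes that belong to the gene set
    """
    total = comb(universe_size, list_size)
    upper = min(set_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy case from the example files: 4 background genes, 3 of them in
# GO_BP_RIBOSOME, 2 significant genes, both in the set.
print(ora_pvalue(4, 3, 2, 2))

# Invented counts: the same overlap looks more significant against a
# broader universe, which is the risk of omitting --background.
p_tested = ora_pvalue(6000, 40, 100, 8)
p_database = ora_pvalue(20000, 40, 100, 8)
print(p_tested, p_database)
```

In practice the gene-set size also changes once the set is restricted to the background, so the two p-values are a simplification of the real effect.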
Complete parameter reference
| Parameter | Required | Default | Meaning and when to change it |
|---|---|---|---|
| --gene-sets | yes | none | Standard GMT file. Each row is a set ID, a description, then the gene IDs. |
| --gene-list | at least one input | none | Gene list for ORA. Can be a one-column list or a TSV containing --id-column. |
| --ranked-genes | at least one input | none | Ranked gene table for GSEA. Must contain gene IDs and numeric scores. |
| --background | recommended | none | ORA universe. Usually the set of genes that were actually tested or detectable in the DE analysis. |
| --outdir, -o | yes | none | Dedicated output directory. An existing directory is refused unless --force is used. |
| --id-column | no | gene_id | Name of the gene ID column in the gene list, background, and ranked tables. |
| --score-column | no | score | Name of the numeric score column in the ranked table. The score is often a signed test statistic or a signed log10 p-value (for example, -log10(p) carrying the sign of the fold change). |
| --min-size | no | 2 | Minimum gene-set size after filtering. Raise it to remove very small, unstable sets. |
| --max-size | no | 500 | Maximum gene-set size after filtering. Lower it to remove overly broad terms. |
| --pvalue-cutoff | no | 1 | Raw p-value cutoff for retained rows. The default keeps all rows so that filtering can happen later, during reporting. |
| --padj-method | no | BH | ORA p-value adjustment method: holm, hochberg, hommel, bonferroni, BH, BY, fdr, or none. |
| --top-n | no | 20 | Number of terms drawn in the main plots. It does not change the full result tables. |
| --seed | no | 1 | Random seed for the stochastic fgsea/GSEA steps. |
| --force | no | off | Replace the standard outputs inside an existing output directory. |
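The default --padj-method, BH, refers to the Benjamini-Hochberg procedure. A minimal sketch of that adjustment, for readers who want to check adjusted values by hand; bh_adjust is a hypothetical helper, not the flow's own code, and the p-values are invented.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(raw))
```

Each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase. Because the adjustment depends on the number of tests m, changing --min-size or --max-size changes the adjusted values even when the raw p-values stay the same.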
How it connects
This flow consumes DE output, and report-flow collects its results when run with --enrichment-out enrichment-out. In standard-flow, it runs whenever gene-set inputs are provided.
Key outputs and limits
Outputs include ora_results.tsv, gsea_results.tsv, readable dotplots, classic (original-style) dotplots, an ORA barplot, a GSEA NES plot, enrichment curves, and plot provenance tables. Enrichment results suggest biological hypotheses; they do not prove that a pathway is active.
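An enrichment curve plots a running sum over the ranked gene list. The sketch below assumes the classic weighted running-sum statistic of preranked GSEA; it does not reproduce fgsea's implementation, and it omits the permutation step that yields NES and p-values. The function name running_enrichment and the toy data are illustrative only.

```python
def running_enrichment(ranked, gene_set, weight=1.0):
    """Return (enrichment score, running-sum curve) for one gene set.

    ranked:   iterable of (gene_id, score) pairs
    gene_set: set of gene IDs
    """
    # Rank genes from the highest to the lowest score.
    ordered = sorted(ranked, key=lambda item: item[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ordered]
    n_miss = len(ordered) - sum(hits)
    hit_weight = sum(
        abs(score) ** weight for (_, score), hit in zip(ordered, hits) if hit
    )
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    curve = []
    running = 0.0
    for (_, score), hit in zip(ordered, hits):
        if hit:
            running += abs(score) ** weight / hit_weight  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        curve.append(running)

    # The enrichment score is the point of maximum deviation from zero.
    es = max(curve, key=abs)
    return es, curve

toy_ranked = [("A", 3.0), ("B", 2.0), ("C", -1.0), ("D", -2.5)]
es, curve = running_enrichment(toy_ranked, {"A", "B"})
print(es, curve)
```

A curve that peaks early indicates that the set's genes cluster at the top of the ranking; a trough near the end indicates clustering at the bottom.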