Flow Map

RNA-seq flow family

The TAFFISH RNA-seq suite is a family of small, versioned, auditable flow apps. The reference route starts from a genome and annotation; the de novo route starts from reads and builds the transcript feature space first. Both routes keep command provenance, logs, output contracts, and static reports explicit.

How the flows connect

Reference mode remains the default for projects with a reliable genome and annotation. The optional alignment lane adds genome-position evidence. De novo mode is an explicit no-reference route that replaces the reference index and expression steps with assembly, de novo quantification, and annotation.

rnaseq-standard-flow: one command that orchestrates reference or de novo mode.

Reference expression route: FASTQ plus genome and annotation to expression, DE, enrichment, and report.

  rnaseq-index-flow (reference package) -> rnaseq-expression-flow (Salmon + tximport) -> rnaseq-de-flow -> rnaseq-enrichment-flow -> rnaseq-report-flow

Optional alignment and count evidence route: enabled by --route both; DE can switch to featureCounts with --de-source featurecounts.

  rnaseq-index-flow (HISAT2 index) -> rnaseq-alignment-flow (sorted BAM) -> rnaseq-count-flow (featureCounts) + rnaseq-alignment-qc-flow (BAM/RNA-seq QC) -> rnaseq-report-flow (collected evidence)

Explicit de novo route: enabled by --mode denovo; no genome or annotation is required, but annotation resources must be supplied for functional interpretation.

  rnaseq-denovo-assembly-flow -> rnaseq-denovo-expression-flow (Salmon on transcripts) -> rnaseq-denovo-annotation-flow (TransDecoder + DIAMOND) -> rnaseq-de-flow (transcript-level DE) -> rnaseq-enrichment-flow (when GMT exists) -> rnaseq-report-flow (de novo report)
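The routes in the flow map can be read as an ordered plan of subflows. The sketch below is one way to express that ordering. The flow names come from the map, while the function `plan_subflows`, its parameters `route_both` and `has_gene_sets`, and the exact position of the alignment steps are assumptions made for illustration, not the orchestrator's actual code.

```python
from typing import List


def plan_subflows(mode: str = "reference",
                  route_both: bool = False,
                  has_gene_sets: bool = False) -> List[str]:
    """Return an ordered list of subflow names for one run (illustrative only)."""
    if mode == "reference":
        plan = ["rnaseq-index-flow", "rnaseq-expression-flow"]
        if route_both:
            # Optional alignment lane: HISAT2 BAM, featureCounts, BAM QC.
            plan += ["rnaseq-alignment-flow",
                     "rnaseq-count-flow",
                     "rnaseq-alignment-qc-flow"]
    elif mode == "denovo":
        if route_both:
            raise ValueError("--route both is only available in reference mode")
        plan = ["rnaseq-denovo-assembly-flow",
                "rnaseq-denovo-expression-flow",
                "rnaseq-denovo-annotation-flow"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    plan.append("rnaseq-de-flow")
    if has_gene_sets:
        # Enrichment only runs when a GMT (gene or transcript sets) is available.
        plan.append("rnaseq-enrichment-flow")
    plan.append("rnaseq-report-flow")
    return plan


if __name__ == "__main__":
    print(" -> ".join(plan_subflows(mode="denovo", has_gene_sets=True)))
```

Keeping the plan as plain data makes the selected route easy to log and to show in the final report.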

The standard flow is an umbrella, not a black box

rnaseq-standard-flow is designed for users who want to start from local FASTQ files and receive a coherent analysis directory plus a bilingual HTML report. In reference mode, the inputs are FASTQ, genome FASTA, annotation, metadata, and optional gene sets. In de novo mode, the inputs are FASTQ, metadata, a protein database for homology annotation, and an optional protein-to-GO map for enrichment.

The default behavior remains reference mode, Salmon-first. De novo mode is never entered silently just because a genome was omitted; users must select --mode denovo. This protects routine reference analyses from accidental mode switches and makes the interpretation boundaries of a no-reference analysis visible in the final report.
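The "never switch silently" rule can be illustrated with a small validation sketch. Everything here is hypothetical: `resolve_mode` and its arguments are names chosen for the example and do not describe the real command-line parser. The point is only that a missing genome in reference mode produces an error rather than a fallback to de novo.

```python
from typing import Optional


def resolve_mode(mode: str = "reference",
                 genome: Optional[str] = None,
                 annotation: Optional[str] = None) -> str:
    """Validate the requested mode without ever changing it implicitly."""
    if mode == "denovo":
        # De novo was requested explicitly; no genome or annotation needed.
        return "denovo"
    if mode != "reference":
        raise ValueError(f"unknown mode: {mode!r}")

    missing = [name for name, value in (("genome", genome),
                                        ("annotation", annotation))
               if not value]
    if missing:
        # Fail loudly instead of falling back to de novo.
        raise ValueError(
            "reference mode requires " + " and ".join(missing)
            + "; pass --mode denovo explicitly for a no-reference analysis"
        )
    return "reference"
```

With this shape, `resolve_mode(genome="genome.fa", annotation="genes.gtf")` returns `"reference"`, while calling it with no genome raises an error that points the user to `--mode denovo`.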

If you plan to analyze your own data, the most complete command-line manual, including a parameter-by-parameter reference and full reference and de novo examples, is the executable flow documentation in the rnaseq-standard-flow GitHub docs.

Flow responsibilities

rnaseq-index-flow

0.1.0-r1

Builds the reusable reference contract from a genome and annotation: standardized annotation, transcript FASTA, tx2gene.tsv, Salmon/Kallisto indexes, and an optional HISAT2 genome index.

rnaseq-expression-flow

0.1.0-r1

Quantifies reads against a reference transcriptome with Salmon, summarizes transcript-level estimates into gene-level matrices through tximport, and reports read and quantification QC.

rnaseq-denovo-assembly-flow

0.1.0-r1

Starts the no-reference route. It performs read QC and trimming, Trinity-first transcriptome assembly, transcript filtering, assembly statistics, optional offline BUSCO, and read-support summaries.

rnaseq-denovo-expression-flow

0.1.0-r1

Builds a Salmon index from the assembled transcripts, quantifies each sample, and emits transcript-level count and TPM matrices. Gene-level or cluster-level matrices are produced only when an explicit mapping is supplied.
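The "only when an explicit mapping is supplied" behavior can be sketched as follows. This is a simplified illustration, not the flow's implementation: `aggregate_counts` and the nested-dictionary matrix layout are assumptions, and plain summation stands in for whatever summarization the flow actually applies. Rejecting unmapped transcripts is likewise a choice made for the example.

```python
from collections import defaultdict
from typing import Dict, Optional

# feature ID -> sample ID -> count
Matrix = Dict[str, Dict[str, float]]


def aggregate_counts(tx_counts: Matrix,
                     tx2group: Optional[Dict[str, str]] = None) -> Optional[Matrix]:
    """Sum transcript counts per gene/cluster, or return None without a mapping."""
    if tx2group is None:
        # No mapping supplied: stay at transcript level, produce no grouped matrix.
        return None

    unmapped = sorted(set(tx_counts) - set(tx2group))
    if unmapped:
        raise KeyError(
            f"{len(unmapped)} transcript(s) lack a mapping, e.g. {unmapped[0]}"
        )

    grouped = defaultdict(lambda: defaultdict(float))
    for transcript, per_sample in tx_counts.items():
        group = tx2group[transcript]
        for sample, value in per_sample.items():
            grouped[group][sample] += value
    return {group: dict(samples) for group, samples in grouped.items()}
```

For example, two transcripts mapped to the same cluster are summed per sample, while calling the function without a mapping yields `None`, so downstream steps keep working in transcript space.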

rnaseq-denovo-annotation-flow

0.1.0-r1

Predicts ORFs with TransDecoder, searches the predicted proteins against a user-provided protein database with DIAMOND, writes annotation tables, and can derive transcript-space GMT and background files from a GO map.

rnaseq-de-flow

0.1.0-r2

Runs DESeq2 on a count matrix and sample metadata. In reference mode this is usually gene-level DE; in de novo mode it is usually transcript-level DE unless a stable mapping is supplied.

rnaseq-enrichment-flow

0.1.0-r3

Interprets DE results at the gene-set or transcript-set level through offline GMT-based ORA and GSEA. The input universe (background) must match the feature ID space used for DE.
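A quick way to see why the universe must match the DE feature space is to measure how many GMT member IDs actually occur among the DE feature IDs. The sketch below assumes the standard tab-separated GMT layout (set name, description, then members); the helper names `parse_gmt` and `universe_overlap` are hypothetical and not part of the flow.

```python
from typing import Dict, Iterable, Set


def parse_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse GMT lines: set name, description, then one member ID per field."""
    gene_sets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {member for member in fields[2:] if member}
    return gene_sets


def universe_overlap(gene_sets: Dict[str, Set[str]],
                     de_ids: Iterable[str]) -> float:
    """Fraction of distinct GMT member IDs that occur among the DE feature IDs."""
    members = set().union(*gene_sets.values())
    if not members:
        return 0.0
    return len(members & set(de_ids)) / len(members)


if __name__ == "__main__":
    gmt = ["GO:0001\texample set\tt1\tt2\n",
           "GO:0002\texample set\tt3\tgeneX\n"]
    sets = parse_gmt(gmt)
    # geneX is a gene ID in a transcript-level analysis, so it cannot match.
    print(universe_overlap(sets, ["t1", "t2", "t3"]))
```

A very low overlap usually means the GMT uses gene IDs while DE was run on transcript IDs, or the reverse.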

rnaseq-alignment-flow

0.1.0-r1

Aligns reads to the reference genome with HISAT2 and produces sorted BAM files. The output is useful for genome browser inspection, alignment evidence, featureCounts, and BAM QC.

rnaseq-count-flow

0.1.0-r1

Converts aligned BAM evidence into a featureCounts gene-level matrix. The matrix can serve as an alignment-derived DE source and can be cross-checked against Salmon/tximport counts.
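One simple form of cross-checking the two count sources is a per-sample correlation of log-transformed counts over the genes they share. The sketch below is an illustration under stated assumptions: `compare_counts` and the dictionary inputs are hypothetical, and Pearson correlation on log1p counts is just one reasonable choice, not necessarily the comparison the flow reports.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def compare_counts(salmon: Dict[str, float],
                   featurecounts: Dict[str, float]) -> float:
    """Correlate log1p counts of one sample over genes present in both sources."""
    shared = sorted(set(salmon) & set(featurecounts))
    if len(shared) < 2:
        raise ValueError("need at least two shared gene IDs to correlate")
    x = [math.log1p(salmon[gene]) for gene in shared]
    y = [math.log1p(featurecounts[gene]) for gene in shared]
    return pearson(x, y)
```

Genes present in only one source are ignored here, so the number of shared IDs is worth reporting alongside the correlation.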

rnaseq-alignment-qc-flow

0.1.0-r1

Evaluates BAM-level evidence with SAMtools, RSeQC, Qualimap, and MultiQC. It indicates whether the alignment branch is technically trustworthy.

rnaseq-report-flow

0.2.0-r2

Collects upstream outputs into a bilingual static report and interpretation guide. It supports reference and de novo modules, links the QC HTML sub-reports, and preserves provenance without rerunning any analysis.

rnaseq-standard-flow

0.2.0-r2

The public end-to-end entry point. It composes the selected subflows, keeps their individual output blocks, collects the split PNG/PDF plots, and generates the final project report.

How to choose a route

Use reference mode when

  • the organism has a reliable genome and annotation;
  • gene-level expression, DE, and enrichment are the primary deliverables;
  • speed, interpretability, and compatibility with existing databases matter.

Use de novo mode when

  • there is no trusted reference genome or annotation;
  • the project can tolerate transcript-level feature interpretation;
  • users can provide offline protein and GO resources for annotation.

Use --route both only in reference mode. De novo mode does not create genome-aligned BAM files, featureCounts matrices, or reference-based BAM QC.
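The option constraints can be summarized in a small consistency check. This sketch is hypothetical: `check_route_options` is not a real function of the suite, `"salmon"` is an assumed name for the default DE source, and the rule that `--de-source featurecounts` needs `--route both` is inferred from the fact that featureCounts matrices only exist on the alignment route.

```python
from typing import List


def check_route_options(mode: str,
                        route_both: bool,
                        de_source: str = "salmon") -> List[str]:
    """Return human-readable problems with an option combination (empty if fine)."""
    problems: List[str] = []
    if route_both and mode != "reference":
        problems.append(
            "--route both requires reference mode: de novo runs produce no "
            "genome-aligned BAM, featureCounts matrix, or BAM QC"
        )
    if de_source == "featurecounts" and not route_both:
        # Assumption: the featureCounts matrix only exists on the alignment route.
        problems.append("--de-source featurecounts requires --route both")
    return problems


if __name__ == "__main__":
    for problem in check_route_options("denovo", route_both=True):
        print("error:", problem)
```

Returning a list rather than raising on the first problem lets a caller show every conflicting option at once.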