Usage Manual

rnaseq-standard-flow

rnaseq-standard-flow is the public umbrella command for bulk RNA-seq. Version 0.2.0-r2 supports the original reference-based route and an explicit de novo (no-reference) route. It composes stable subflows rather than reimplementing them, preserves each subflow's output block, collects the split PNG and PDF plots, and renders the final bilingual project report.


Reference route command

Use this mode when a genome FASTA and a matching annotation are available. It is the default mode and keeps the user experience of the previous standard flow.

taf-rnaseq-standard-flow \
  --samples samples.tsv \
  --genome genome.fa \
  --annotation genes.gff3 \
  --metadata metadata.tsv \
  --design '~ condition' \
  --contrast condition:treated:control \
  --gene-sets gene_sets.gmt \
  --background background.tsv \
  --outdir rnaseq-standard-out \
  --threads 8

Add --route both to also produce HISAT2 BAM files, featureCounts counts, and alignment QC evidence. Add --de-source featurecounts on top of --route both only when DESeq2 should use featureCounts counts instead of Salmon/tximport counts.
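
For reference, the combined form might look like the sketch below. Every flag comes from this manual; the run() wrapper is a hypothetical helper, not part of the flow, that prints the command instead of executing it unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command (dry run) unless DRY_RUN=0.
# Note that the printed form is for reading only; it does not re-quote
# arguments such as '~ condition'.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Reference mode with HISAT2 alignment, alignment QC, and featureCounts,
# and with featureCounts counts feeding DESeq2.
run taf-rnaseq-standard-flow \
  --samples samples.tsv \
  --genome genome.fa \
  --annotation genes.gff3 \
  --metadata metadata.tsv \
  --design '~ condition' \
  --contrast condition:treated:control \
  --gene-sets gene_sets.gmt \
  --background background.tsv \
  --route both \
  --de-source featurecounts \
  --outdir rnaseq-standard-out \
  --threads 8
```

Dropping the --de-source line keeps the alignment evidence while DESeq2 continues to use the Salmon/tximport counts.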

De novo route command

Use this mode only when a reliable genome and annotation are unavailable or are intentionally not used. It must be selected explicitly with --mode denovo, and it does not create genome-aligned BAM or featureCounts outputs.

taf-rnaseq-standard-flow \
  --mode denovo \
  --samples samples.tsv \
  --metadata metadata.tsv \
  --design '~ condition' \
  --contrast condition:treated:control \
  --protein-db proteins.faa \
  --go-map protein_go_map.tsv \
  --outdir rnaseq-denovo-standard-out \
  --threads 16 \
  --max-memory 64G

--protein-db is a local protein FASTA used for homology annotation. --go-map maps protein IDs to GO terms so that the annotation flow can derive a transcript-space GMT and background for enrichment. Neither file comes off the sequencer; both are external biological reference resources chosen by the analyst.

Input files and how to build them

samples.tsv

One biological sample per row. Required columns are sample_id and read1; paired-end data adds read2. Relative paths are resolved against the directory that contains the table.

metadata.tsv

One row per sample. The default sample column is sample. Every variable used in --design and --contrast must be a column in this table.

Reference inputs

--genome and --annotation must come from the same release. Gene IDs from the annotation define the ID space used for expression, DE, background, and enrichment.
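
A quick way to catch mismatched files is to confirm that every sequence ID in the annotation has a FASTA header. The sketch below is one possible check, not part of the flow; check_seqids is a hypothetical name, and the check assumes uncompressed files. It can reveal mixed naming styles (for example chr1 versus 1) but cannot prove that both files come from the same release.

```shell
# Sketch: report GFF3 sequence IDs (column 1) that have no FASTA header.
check_seqids() {
  genome=$1; annotation=$2
  awk -F'\t' '
    NR == FNR {                       # first file: genome FASTA
      if (/^>/) {
        id = substr($0, 2)            # drop ">"
        sub(/[ \t].*/, "", id)        # keep the ID, drop the description
        fa[id] = 1
      }
      next
    }
    /^#/ || NF < 9 { next }           # skip comments and non-feature lines
    !($1 in fa) && !($1 in miss) {
      miss[$1] = 1
      print "not in FASTA: " $1
      bad = 1
    }
    END { exit bad + 0 }
  ' "$genome" "$annotation"
}

# Demo on toy files in a scratch directory.
demo=$(mktemp -d)
printf '>chr1 demo sequence\nACGT\n>chr2\nACGT\n' > "$demo/genome.fa"
printf '##gff-version 3\n' > "$demo/genes.gff3"
printf 'chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1\n' >> "$demo/genes.gff3"
check_seqids "$demo/genome.fa" "$demo/genes.gff3" && echo "seqids OK"
```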

De novo annotation inputs

--protein-db provides homology evidence. --go-map enables GO-derived enrichment. Without a GO map, the report can still summarize assembly, expression, and DE, but enrichment may be unavailable.

FASTQ sample table

sample_id	read1	condition
WT_01	reads/WT_01.fq.gz	control
KO_01	reads/KO_01.fq.gz	treated
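
For paired-end projects the table can be generated rather than typed. The sketch below scans a directory and writes sample_id, read1, and read2. The file naming pattern (<sample>_R1.fq.gz and <sample>_R2.fq.gz) and the function name make_samples_tsv are assumptions for illustration; adjust them to your own layout.

```shell
# Sketch: write a paired-end samples.tsv by scanning a FASTQ directory.
# Assumption: mates are named <sample>_R1.fq.gz and <sample>_R2.fq.gz.
make_samples_tsv() {
  fastq_dir=$1
  printf 'sample_id\tread1\tread2\n'
  for r1 in "$fastq_dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue              # glob matched nothing
    r2=${r1%_R1.fq.gz}_R2.fq.gz
    sample=$(basename "$r1" _R1.fq.gz)
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      printf 'warning: no mate for %s, skipping\n' "$r1" >&2
    fi
  done
}

# Demo on empty placeholder files. Running from the table's directory keeps
# the paths relative to the table, which is how the flow resolves them.
demo=$(mktemp -d)
mkdir "$demo/reads"
touch "$demo/reads/WT_01_R1.fq.gz" "$demo/reads/WT_01_R2.fq.gz" \
      "$demo/reads/KO_01_R1.fq.gz" "$demo/reads/KO_01_R2.fq.gz"
(cd "$demo" && make_samples_tsv reads > samples.tsv)
cat "$demo/samples.tsv"
```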

Metadata and design

sample	condition	batch
WT_01	control	A
KO_01	treated	A
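
Because design variables must come from metadata.tsv, it can help to check the two tables against each other before a long run. The following is a minimal sketch under stated assumptions: both files are tab-separated with a header row, the sample columns use the documented names (sample_id and sample), and check_metadata is a hypothetical helper that tests one design variable at a time.

```shell
# Sketch: verify that metadata.tsv has the design variable as a column and
# that every sample_id in samples.tsv has a metadata row.
check_metadata() {
  samples=$1; metadata=$2; variable=$3
  awk -F'\t' -v var="$variable" '
    NR == FNR && FNR == 1 {                 # metadata header
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample" in col)) { print "metadata.tsv: no sample column"; bad = 1; exit }
      if (!(var in col)) { print "metadata.tsv: no column named " var; bad = 1; exit }
      next
    }
    NR == FNR { known[$(col["sample"])] = 1; next }   # metadata rows
    FNR == 1 {                              # samples header
      for (i = 1; i <= NF; i++) if ($i == "sample_id") sid = i
      if (!sid) { print "samples.tsv: no sample_id column"; bad = 1; exit }
      next
    }
    !($sid in known) { print "not in metadata.tsv: " $sid; bad = 1 }
    END { exit bad + 0 }
  ' "$metadata" "$samples"
}

# Demo with the two example tables from this manual.
demo=$(mktemp -d)
printf 'sample_id\tread1\tcondition\n' > "$demo/samples.tsv"
printf 'WT_01\treads/WT_01.fq.gz\tcontrol\n' >> "$demo/samples.tsv"
printf 'KO_01\treads/KO_01.fq.gz\ttreated\n' >> "$demo/samples.tsv"
printf 'sample\tcondition\tbatch\n' > "$demo/metadata.tsv"
printf 'WT_01\tcontrol\tA\nKO_01\ttreated\tA\n' >> "$demo/metadata.tsv"
check_metadata "$demo/samples.tsv" "$demo/metadata.tsv" condition && echo "metadata OK"
```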

GMT and background

GO_TERM	description	GENE1	GENE2

gene_id
GENE1
GENE2
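
Gene-set resources are often distributed as long term-to-gene tables rather than GMT. The sketch below collapses such a table into one GMT line per term. The three-column input layout (term_id, description, gene_id, with a header) and the name long_to_gmt are assumptions for illustration. The background file is deliberately not derived here, since it should list the genes that were actually testable, not just those that appear in gene sets.

```shell
# Sketch: collapse a long term-to-gene table into GMT.
# Assumption: tab-separated input with header term_id, description, gene_id.
long_to_gmt() {
  awk -F'\t' '
    NR == 1 { next }                                  # skip header
    !($1 in desc) { desc[$1] = $2; order[++n] = $1 }  # remember first-seen order
    !(($1, $3) in dup) {                              # drop repeated term/gene pairs
      dup[$1, $3] = 1
      genes[$1] = genes[$1] "\t" $3
    }
    END {
      for (i = 1; i <= n; i++)
        print order[i] "\t" desc[order[i]] genes[order[i]]
    }
  ' "$1"
}

# Demo on a toy table.
demo=$(mktemp -d)
printf 'term_id\tdescription\tgene_id\n' > "$demo/terms.tsv"
printf 'GO:0006412\ttranslation\tGENE1\n' >> "$demo/terms.tsv"
printf 'GO:0006412\ttranslation\tGENE2\n' >> "$demo/terms.tsv"
printf 'GO:0005737\tcytoplasm\tGENE1\n' >> "$demo/terms.tsv"
long_to_gmt "$demo/terms.tsv" > "$demo/gene_sets.gmt"
cat "$demo/gene_sets.gmt"
```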

Protein-to-GO map

protein_id	go_id
P12345	GO:0006412
P12345	GO:0005737
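
One possible source for this map is a GO annotation file in GAF format, where lines starting with "!" are comments, column 2 holds the DB object ID, and column 5 holds the GO ID. The sketch below extracts those two columns. It assumes the object IDs match the sequence IDs in your --protein-db FASTA (remap them first if they do not), and gaf_to_go_map is a hypothetical name.

```shell
# Sketch: derive a protein-to-GO map from a GAF 2.x annotation file.
gaf_to_go_map() {
  printf 'protein_id\tgo_id\n'
  awk -F'\t' '!/^!/ && $2 != "" && $5 ~ /^GO:/ { print $2 "\t" $5 }' "$1" | sort -u
}

# Demo on toy rows trimmed to five columns (real GAF rows have 17).
demo=$(mktemp -d)
printf '!gaf-version: 2.2\n' > "$demo/toy.gaf"
printf 'UniProtKB\tP12345\tSYM1\tinvolved_in\tGO:0006412\n' >> "$demo/toy.gaf"
printf 'UniProtKB\tP12345\tSYM1\tlocated_in\tGO:0005737\n' >> "$demo/toy.gaf"
printf 'UniProtKB\tP12345\tSYM1\tinvolved_in\tGO:0006412\n' >> "$demo/toy.gaf"
gaf_to_go_map "$demo/toy.gaf" > "$demo/protein_go_map.tsv"
cat "$demo/protein_go_map.tsv"
```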

Parameter reference

Parameter	Mode	Default	Meaning
--mode	all	reference	Analysis mode. Use reference when a genome and annotation are available; select denovo explicitly for no-reference analysis.
--samples, --metadata, --design, --contrast, --outdir	all	none	Core inputs: sample table, sample metadata, statistical design, comparison, and output directory.
--genome, --annotation	reference	none	Required in reference mode. Both must come from the same genome release.
--gene-sets, --background	reference	none	Offline enrichment resources. A background is strongly recommended for ORA.
--route	reference	salmon	salmon runs the lightweight expression route; both adds HISAT2 alignment, alignment QC, and featureCounts.
--de-source	reference	salmon	DESeq2 count source. featurecounts requires --route both. De novo mode uses de novo transcript counts.
--protein-db, --go-map	denovo	none	Protein FASTA and protein-to-GO map for de novo annotation and enrichment.
--assembler, --max-memory, --min-contig-len, --ss-lib-type	denovo	trinity / 16G / 200 / auto	Assembly controls passed to rnaseq-denovo-assembly-flow.
--denovo-min-orf-aa, --denovo-evalue, --denovo-max-target-seqs	denovo	100 / 1e-5 / 1	ORF and homology-search controls passed to the de novo annotation flow.
--threads	all	2	Threads for subflows. De novo assembly usually needs more CPU, memory, and disk than reference expression.
--project-name, --force	all	RNA-seq project / off	Report title and switch for replacing existing output.
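
The table implies a few combinations that cannot work, most notably --de-source featurecounts without --route both. A wrapper script could reject these before starting a long run. The sketch below encodes only the rules stated in this manual; check_options is a hypothetical helper, and the flow's own validation may differ.

```shell
# Sketch: reject option combinations ruled out by the parameter table.
# Arguments: mode, route, de_source (defaults follow the table).
check_options() {
  mode=${1:-reference}; route=${2:-salmon}; de_source=${3:-salmon}
  case $mode in
    reference|denovo) ;;
    *) echo "unknown --mode: $mode" >&2; return 1 ;;
  esac
  if [ "$mode" = denovo ]; then
    # --route and --de-source apply to reference mode only;
    # de novo mode uses de novo transcript counts.
    return 0
  fi
  if [ "$de_source" = featurecounts ] && [ "$route" != both ]; then
    echo "--de-source featurecounts requires --route both" >&2
    return 1
  fi
  return 0
}

check_options reference both featurecounts && echo "combination OK"
```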

Key outputs

Shared report outputs

  • 04_reports/rnaseq_report.html
  • 04_reports/report_interpretation.html
  • 04_reports/commands.sh, versions.tsv, flow_summary.tsv
  • 03_results/plots/png/ and 03_results/plots/pdf/

Reference blocks

  • 03_results/reference/
  • 03_results/expression/
  • 03_results/alignment/, count/, alignment_qc/ when --route both

De novo blocks

  • 03_results/denovo_assembly/
  • 03_results/denovo_expression/
  • 03_results/denovo_annotation/

Statistical blocks

  • 03_results/de/
  • 03_results/enrichment/ when gene sets are available
  • run.manifest.json
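
After a run, a short script can confirm that the shared report outputs are in place. The sketch below checks only paths that the lists above spell out in full; check_outputs is a hypothetical helper, and mode-specific blocks would need their own entries.

```shell
# Sketch: confirm the shared report outputs exist under an output directory.
check_outputs() {
  outdir=$1; status=0
  for path in \
    04_reports/rnaseq_report.html \
    04_reports/report_interpretation.html \
    04_reports/commands.sh \
    03_results/plots/png \
    03_results/plots/pdf
  do
    if [ ! -e "$outdir/$path" ]; then
      echo "missing: $path" >&2
      status=1
    fi
  done
  return $status
}

# Demo on a mock output tree.
demo=$(mktemp -d)
mkdir -p "$demo/04_reports" "$demo/03_results/plots/png" "$demo/03_results/plots/pdf"
touch "$demo/04_reports/rnaseq_report.html" \
      "$demo/04_reports/report_interpretation.html" \
      "$demo/04_reports/commands.sh"
check_outputs "$demo" && echo "shared outputs present"
```

On a real project, pass the directory given to --outdir, for example check_outputs rnaseq-standard-out.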

Boundaries

During normal execution the flow does not download reference genomes, annotations, protein databases, GO resources, BUSCO lineages, or other large biological resources. De novo mode does not produce reference-gene-level conclusions unless a stable mapping to reference genes exists. For expression-only projects with no groups to compare, use the expression matrix outputs and do not over-interpret the DE and enrichment sections.