Expression quantification

rnaseq-expression-flow

This flow quantifies transcript-level expression from FASTQ files and summarizes it into gene-level matrices. It is the default abundance route upstream of DESeq2 and reporting.

Version 0.1.0-r1. Runs after index-flow.

Typical command

taf-rnaseq-expression-flow \
  --samples samples.tsv \
  --index ref-out/03_results/salmon_index \
  --tx2gene ref-out/03_results/tx2gene.tsv \
  --outdir expression-out \
  --threads 8

Input requirements

samples.tsv is the central FASTQ manifest. Each row describes one final biological sample, not a sequencing lane: merge multi-lane data before this step, or otherwise make sure every row already corresponds to a final sample. Relative paths are resolved against the directory containing samples.tsv. The Salmon index and tx2gene.tsv should come from the same rnaseq-index-flow output.
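
The path rule above can be checked before a run with a few lines of Python. This is a minimal sketch, not the flow's own code: the function name resolve_manifest_paths is hypothetical, and it only assumes the sample_id, read1, and read2 columns described in this section.

```python
import csv
from pathlib import Path


def resolve_manifest_paths(manifest_path):
    """Return {sample_id: [FASTQ paths]} with relative paths resolved
    against the directory that contains the manifest (hypothetical helper)."""
    manifest = Path(manifest_path).resolve()
    base_dir = manifest.parent
    resolved = {}
    with manifest.open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            paths = []
            for column in ("read1", "read2"):
                value = (row.get(column) or "").strip()
                if value:
                    # Joining onto base_dir leaves absolute paths unchanged.
                    paths.append(base_dir / value)
            resolved[row["sample_id"]] = paths
    return resolved
```

Calling resolve_manifest_paths("samples.tsv") and testing each returned path with Path.exists() is one way to catch a manifest that was moved away from its FASTQ files.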

Single-end samples.tsv

sample_id	read1
WT_01	reads/WT_01.fq.gz
WT_02	reads/WT_02.fq.gz

Paired-end samples.tsv

sample_id	read1	read2
WT_01	reads/WT_01_R1.fq.gz	reads/WT_01_R2.fq.gz
WT_02	reads/WT_02_R1.fq.gz	reads/WT_02_R2.fq.gz

sample_id must be unique and should contain only letters, digits, dots, underscores, or hyphens. Output matrix column names are taken directly from sample_id, so choose stable, readable analysis names that will not need renaming later.
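
The sample_id rules can be pre-checked with a short script. The sketch below is an illustration only: check_manifest is a hypothetical helper, and it treats the character set as a warning because the text above states it as a recommendation, while duplicates are reported as errors.

```python
import csv
import io
import re

# Letters, digits, dots, underscores, hyphens (the recommended character set).
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def check_manifest(text):
    """Return (layout, problems) for the text of a samples.tsv file.

    layout is "single", "paired", or None when required columns are missing.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    problems = [
        f"error: missing column {name}"
        for name in ("sample_id", "read1")
        if name not in columns
    ]
    if problems:
        return None, problems
    layout = "paired" if "read2" in columns else "single"
    seen = set()
    for line_number, row in enumerate(reader, start=2):
        sample_id = row.get("sample_id") or ""
        if not SAMPLE_ID_PATTERN.fullmatch(sample_id):
            problems.append(f"warning: line {line_number}: unusual sample_id {sample_id!r}")
        if sample_id in seen:
            problems.append(f"error: line {line_number}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
    return layout, problems
```

An empty problems list means the manifest passes these two checks; it says nothing about whether the FASTQ files themselves exist or are readable.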

tx2gene.tsv

The transcript-to-gene table maps Salmon transcript IDs to gene IDs and is used to summarize transcript-level results to the gene level. It must match the transcript FASTA used to build the Salmon index.

tx_id	gene_id
TX1	GENE1
TX2	GENE1
TX3	GENE2
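
To see why the mapping has to match the index, the sketch below sums transcript-level values to genes with the example table above. It is a simplified illustration with made-up abundance values and a hypothetical function name; the flow itself relies on tximport, which does more than a plain sum. The point is only that a transcript missing from tx2gene.tsv cannot contribute to any gene.

```python
from collections import defaultdict


def sum_to_genes(transcript_values, tx2gene):
    """Sum per-transcript values into per-gene totals.

    Returns (gene_totals, unmapped), where unmapped lists transcript IDs
    that have no entry in the tx2gene mapping.
    """
    gene_totals = defaultdict(float)
    unmapped = []
    for tx_id, value in transcript_values.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            unmapped.append(tx_id)
            continue
        gene_totals[gene_id] += value
    return dict(gene_totals), unmapped


# Mapping taken from the tx2gene.tsv example; abundances are invented.
tx2gene = {"TX1": "GENE1", "TX2": "GENE1", "TX3": "GENE2"}
abundance = {"TX1": 10.0, "TX2": 5.0, "TX3": 2.5, "TX4": 1.0}

totals, unmapped = sum_to_genes(abundance, tx2gene)
print(totals)    # {'GENE1': 15.0, 'GENE2': 2.5}
print(unmapped)  # ['TX4']
```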

Complete parameter reference

Parameter	Required	Default	Meaning and when to change it
--samples	yes	none	FASTQ sample table. Required columns are sample_id and read1; add read2 for paired-end reads.
--index	yes	none	Salmon transcriptome index directory, usually ref-out/03_results/salmon_index. Must contain info.json.
--tx2gene	yes	none	Transcript-to-gene TSV with tx_id and gene_id columns. Needed for gene-level matrices.
--outdir, -o	yes	none	Dedicated output directory. Existing directories are refused unless --force is used.
--threads, -t	no	1	Threads for FastQC, fastp, and Salmon. Increase for many samples or large FASTQ files.
--library-type	no	A	Salmon library type. A lets Salmon infer strandedness and is safest when the library protocol is uncertain.
--quantifier	no	salmon	Quantifier selector. r1 accepts only salmon.
--trim	no	off	Run fastp and quantify the cleaned FASTQ. Enable for raw reads that have not already been adapter/quality trimmed.
--skip-fastqc	no	off	Skip FastQC on the raw FASTQ. Use only when equivalent QC already exists and speed matters.
--min-assigned-frags	no	10	Passed to Salmon as --minAssignedFrags. Keep the default unless tiny fixtures or special low-depth tests require relaxing it.
--counts-from-abundance	no	no	tximport count handling: no, scaledTPM, lengthScaledTPM, or dtuScaledTPM. Change only when the statistical plan explicitly calls for TPM-scaled counts.
--force	no	off	Replace standard outputs inside an existing output directory.
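
When the flow is launched from a script, the parameters above can be assembled into an argument list. The following is a sketch under stated assumptions: build_expression_command is a hypothetical wrapper, and it assumes that --trim, --skip-fastqc, and --force are value-less switches, which the "off" defaults suggest but this page does not state outright.

```python
import shlex


def build_expression_command(samples, index, tx2gene, outdir, threads=1,
                             library_type="A", trim=False,
                             skip_fastqc=False, force=False):
    """Assemble an argument list for taf-rnaseq-expression-flow."""
    command = [
        "taf-rnaseq-expression-flow",
        "--samples", str(samples),
        "--index", str(index),
        "--tx2gene", str(tx2gene),
        "--outdir", str(outdir),
        "--threads", str(threads),
        "--library-type", library_type,
    ]
    # Assumption: these three options are boolean switches without a value.
    if trim:
        command.append("--trim")
    if skip_fastqc:
        command.append("--skip-fastqc")
    if force:
        command.append("--force")
    return command


command = build_expression_command(
    "samples.tsv",
    "ref-out/03_results/salmon_index",
    "ref-out/03_results/tx2gene.tsv",
    "expression-out",
    threads=8,
)
print(shlex.join(command))
```

The list form can be passed directly to subprocess.run(command, check=True), which avoids shell quoting problems with unusual paths.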

How it connects

taf-rnaseq-de-flow \
  --counts expression-out/03_results/matrices/gene_counts.tsv \
  --metadata metadata.tsv \
  --design '~ condition' \
  --contrast condition:treated:control \
  --outdir de-out

The report flow can collect this step's results through --expression-out expression-out.
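
Before handing the count matrix to the DE flow, it can help to confirm that the matrix columns and the metadata describe the same samples. The sketch below rests on two assumptions that this page does not confirm: that the first column of gene_counts.tsv holds gene IDs with one sample_id column per sample after it, and that metadata.tsv has a column named sample_id. The helper name compare_samples is hypothetical.

```python
import csv
import io


def compare_samples(counts_header_line, metadata_text, id_column="sample_id"):
    """Compare sample names in a count-matrix header with a metadata table.

    Returns (missing_in_metadata, missing_in_counts).
    """
    # Assumption: first header field labels the gene ID column.
    count_samples = counts_header_line.rstrip("\r\n").split("\t")[1:]
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    metadata_samples = [row[id_column] for row in reader]
    missing_in_metadata = [s for s in count_samples if s not in metadata_samples]
    missing_in_counts = [s for s in metadata_samples if s not in count_samples]
    return missing_in_metadata, missing_in_counts


header = "gene_id\tWT_01\tWT_02\tKO_01\n"
metadata = "sample_id\tcondition\nWT_01\tcontrol\nWT_02\tcontrol\nKO_02\ttreated\n"
print(compare_samples(header, metadata))  # (['KO_01'], ['KO_02'])
```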

Key outputs and limits

Use 03_results/matrices/gene_counts.tsv for DESeq2 and gene_tpm.tsv for expression-level inspection. This flow does not decide significance and does not replace group-level statistical testing.
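
As one example of expression-level inspection, the sketch below counts how many genes reach a TPM threshold in each sample. It is descriptive only and makes no significance call. The matrix layout (gene IDs in the first column, one column per sample_id) and the threshold of 1.0 are assumptions chosen for illustration, and genes_detected is a hypothetical helper.

```python
import csv
import io


def genes_detected(tpm_text, threshold=1.0):
    """Count genes per sample whose TPM is at or above the threshold."""
    reader = csv.reader(io.StringIO(tpm_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumption: first column holds gene IDs
    detected = {sample: 0 for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            if float(value) >= threshold:
                detected[sample] += 1
    return detected


# Invented values in the assumed gene_tpm.tsv layout.
example = "gene_id\tWT_01\tWT_02\nGENE1\t15.0\t0.2\nGENE2\t2.5\t1.0\n"
print(genes_detected(example))  # {'WT_01': 2, 'WT_02': 1}
```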