Gene counting

rnaseq-count-flow

This flow uses featureCounts to convert aligned BAM files into a gene-level count matrix. It is part of the optional alignment/count route and can feed DESeq2 when explicitly selected.

Version 0.1.0-r1. Runs after alignment-flow.

Typical command

taf-rnaseq-count-flow \
  --bams align-out/04_reports/bam_files.tsv \
  --annotation ref-out/03_results/annotation/genes.gtf \
  --outdir count-out \
  --threads 8 \
  --strand 0

Input requirements

--bams should be the bam_files.tsv written by alignment-flow. The annotation should match the reference used to build the HISAT2 index and should usually be ref-out/03_results/annotation/genes.gtf. BAM files should be coordinate-sorted; BAI indexes are recommended.
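One way to check the coordinate-sorting requirement is to read the SO tag on the @HD line of the BAM header. The sketch below only parses header text; it assumes you obtain that text yourself, for example with `samtools view -H`. The function name is hypothetical and is not part of the flow.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD line of a SAM/BAM header, or None.

    header_text is the plain-text header, e.g. the output of
    `samtools view -H sample.sorted.bam` (obtaining it is left to the caller).
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None  # @HD line present but no SO tag
    return None  # no @HD line at all


def is_coordinate_sorted(header_text):
    """True only when the header explicitly declares SO:coordinate."""
    return sort_order_from_header(header_text) == "coordinate"
```

A header without an SO tag is treated as not sorted here. That is a conservative choice made for this sketch, not a documented behaviour of the flow.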

bam_files.tsv

sample_id	bam	bai
WT_01	bam/WT_01.sorted.bam	bam/WT_01.sorted.bam.bai
SNF2_01	bam/SNF2_01.sorted.bam	bam/SNF2_01.sorted.bam.bai

Optional sample columns

sample_id	bam	condition	batch	strandedness
WT_01	bam/WT_01.sorted.bam	WT	b1	unstranded
SNF2_01	bam/SNF2_01.sorted.bam	SNF2KO	b1	unstranded

Relative BAM and BAI paths are resolved from the directory containing bam_files.tsv. The flow does not modify input BAM files; generated summaries and temporary files go under --outdir.
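The path rule above can be sketched in a few lines of Python. This is a minimal illustration, not the flow's own code. It assumes a tab-separated table with a header row, and the function name `load_bam_table` is hypothetical.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("sample_id", "bam")


def load_bam_table(tsv_path):
    """Read a bam_files.tsv-style table and resolve relative BAM/BAI paths."""
    tsv_path = Path(tsv_path)
    # Relative paths are anchored at the directory containing the table.
    base_dir = tsv_path.resolve().parent
    with tsv_path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"missing required column(s): {', '.join(missing)}")
        rows = []
        for row in reader:
            for key in ("bam", "bai"):
                value = row.get(key)
                if value:  # bai is optional and may be absent or empty
                    path = Path(value)
                    row[key] = str(path if path.is_absolute() else base_dir / path)
            rows.append(row)
    return rows
```

Calling `load_bam_table("align-out/04_reports/bam_files.tsv")` would return one dictionary per sample, with `bam` (and `bai`, when present) rewritten to absolute paths. Optional columns such as condition or batch pass through unchanged.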

Complete parameter reference

Parameter	Required	Default	Meaning and when to change it
--bams	yes	none	BAM sample table with sample_id and bam columns; optional bai, condition, batch, and strandedness columns are accepted.
--annotation	yes	none	GTF/GFF annotation for featureCounts. Use the same reference release as the alignments.
--outdir, -o	yes	none	Dedicated output directory. Existing directories are refused unless --force is used.
--threads, -t	no	1	featureCounts thread count. Increase for many BAM files or large genomes.
--strand	no	0	featureCounts strand mode: 0 unstranded, 1 stranded, 2 reversely stranded. Match the library protocol.
--feature-type	no	exon	Feature type passed to featureCounts -t. For gene-level RNA-seq, exon is the usual choice.
--attribute	no	gene_id	Annotation attribute passed to featureCounts -g. Change it only when the annotation uses another gene identifier key.
--min-mapq	no	0	Minimum mapping quality. Applied as the featureCounts MAPQ threshold only when greater than zero; 0 disables the filter.
--paired	no	off	Enable paired-end counting with -p --countReadPairs. Use only for paired-end BAMs when fragment-level counting is desired.
--min-assigned-reads	no	0	Fail if total assigned reads are below this value. Useful for smoke tests or strict QC gates.
--force	no	off	Replace standard outputs inside an existing output directory.
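To make the parameter table concrete, the sketch below shows how these options could translate into a featureCounts argument list. Only the -t, -g, and -p --countReadPairs mappings are stated in the table. Using -a, -o, -T, -s, and -Q for the remaining options follows the standard featureCounts command line and is an assumption about the flow's internals. The function name and the output file name are hypothetical.

```python
def build_featurecounts_args(bams, annotation, output, threads=1, strand=0,
                             feature_type="exon", attribute="gene_id",
                             min_mapq=0, paired=False):
    """Assemble a featureCounts argv list from flow-style parameters."""
    if strand not in (0, 1, 2):
        raise ValueError("strand must be 0, 1, or 2")
    args = [
        "featureCounts",
        "-a", str(annotation),   # annotation file (assumed mapping)
        "-o", str(output),       # raw featureCounts output (assumed mapping)
        "-T", str(threads),      # thread count (assumed mapping)
        "-s", str(strand),       # strand mode (assumed mapping)
        "-t", feature_type,      # stated in the table
        "-g", attribute,         # stated in the table
    ]
    if min_mapq > 0:
        # The threshold is only applied when greater than zero.
        args += ["-Q", str(min_mapq)]
    if paired:
        args += ["-p", "--countReadPairs"]  # stated in the table
    return args + [str(bam) for bam in bams]


if __name__ == "__main__":
    # Hypothetical output name; the flow chooses its own file layout.
    print(" ".join(build_featurecounts_args(
        ["bam/WT_01.sorted.bam", "bam/SNF2_01.sorted.bam"],
        "ref-out/03_results/annotation/genes.gtf",
        "count-out/featurecounts.txt",
        threads=8,
    )))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting.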

How it connects

taf-rnaseq-de-flow \
  --counts count-out/03_results/matrices/gene_counts.tsv \
  --metadata metadata.tsv \
  --design '~ condition' \
  --contrast condition:treated:control \
  --outdir de-out

In standard-flow, use --route both --de-source featurecounts to select this matrix for DE.
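Before handing the matrix to the DE step, it can help to confirm that the count matrix and the metadata table describe the same samples. The sketch below rests on two assumptions about file layout that this page does not state: the first column of gene_counts.tsv holds gene identifiers and every other column is named by sample ID, and metadata.tsv has a sample_id column. The function name is hypothetical.

```python
import csv
import io


def compare_samples(counts_text, metadata_text):
    """Return (samples missing from metadata, samples missing from counts).

    Assumes a tab-separated count matrix whose header is
    <gene id column>, <sample 1>, <sample 2>, ... and a tab-separated
    metadata table with a sample_id column (both are assumptions).
    """
    counts_header = next(csv.reader(io.StringIO(counts_text), delimiter="\t"))
    count_samples = counts_header[1:]
    metadata_reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    metadata_ids = [row["sample_id"] for row in metadata_reader]
    missing_in_metadata = [s for s in count_samples if s not in metadata_ids]
    missing_in_counts = [s for s in metadata_ids if s not in count_samples]
    return missing_in_metadata, missing_in_counts
```

Two empty lists mean the sample sets agree. Anything else is worth resolving before running the DE flow.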

Key outputs and limits

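The main output is 03_results/matrices/gene_counts.tsv, accompanied by featureCounts summaries and read-assignment summaries. The counts depend strongly on the annotation, the strandedness setting, and the paired-end setting, so matrices produced with different choices are not directly comparable.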
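The kind of check behind --min-assigned-reads can be illustrated against a featureCounts summary file. The sketch assumes the usual featureCounts .summary layout: tab-separated, a header row starting with Status, then one row per status such as Assigned, with one column per BAM. It also assumes that "total" means the sum over all BAM columns. The function names are hypothetical and this is not the flow's actual implementation.

```python
import csv
import io


def total_assigned(summary_text):
    """Sum the 'Assigned' row of a featureCounts summary across all BAMs."""
    reader = csv.reader(io.StringIO(summary_text), delimiter="\t")
    next(reader, None)  # skip the header row (Status, <bam 1>, <bam 2>, ...)
    for row in reader:
        if row and row[0] == "Assigned":
            return sum(int(value) for value in row[1:])
    raise ValueError("no 'Assigned' row found in summary")


def check_min_assigned(summary_text, min_assigned_reads=0):
    """Raise if the total assigned reads fall below the threshold."""
    total = total_assigned(summary_text)
    if total < min_assigned_reads:
        raise RuntimeError(
            f"assigned reads {total} below threshold {min_assigned_reads}"
        )
    return total
```

With the default threshold of 0 the check always passes, which matches the default behaviour described for the parameter.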
