Alignment QC

rnaseq-alignment-qc-flow

This flow evaluates BAM-level RNA-seq evidence with SAMtools, RSeQC, Qualimap, and MultiQC. It asks whether the alignment branch is technically trustworthy before BAM-derived results are interpreted.

Version 0.1.0-r1. Runs after alignment-flow.

Typical command

taf-rnaseq-alignment-qc-flow \
  --bams align-out/04_reports/bam_files.tsv \
  --gtf ref-out/03_results/annotation/genes.gtf \
  --outdir alignment-qc-out \
  --threads 8 \
  --sequencing-protocol non-strand-specific
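
When the flow is launched from a script, the same command can be assembled programmatically. The following is a minimal sketch, not part of the tool: the helper name build_alignment_qc_command is hypothetical, it uses only flags documented on this page, and it assumes that --paired is a plain switch that takes no value.

```python
import shlex

# Protocol names as listed in the parameter reference on this page.
ALLOWED_PROTOCOLS = (
    "non-strand-specific",
    "strand-specific-forward",
    "strand-specific-reverse",
)


def build_alignment_qc_command(bams, gtf, outdir,
                               protocol="non-strand-specific",
                               paired=False, annotation_bed=None):
    """Return the argument list for one alignment QC run (hypothetical helper)."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unknown sequencing protocol: {protocol!r}")
    argv = [
        "taf-rnaseq-alignment-qc-flow",
        "--bams", bams,
        "--gtf", gtf,
        "--outdir", outdir,
        "--sequencing-protocol", protocol,
    ]
    if paired:
        # Assumption: --paired is a switch without a value.
        argv.append("--paired")
    if annotation_bed:
        argv += ["--annotation-bed", annotation_bed]
    return argv


if __name__ == "__main__":
    argv = build_alignment_qc_command(
        "align-out/04_reports/bam_files.tsv",
        "ref-out/03_results/annotation/genes.gtf",
        "alignment-qc-out",
    )
    # Print a shell-quoted command line for inspection or logging.
    print(shlex.join(argv))
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting problems with unusual paths.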

Input requirements

Use the BAM table from alignment-flow and the matching GTF from index-flow. The BAM table must contain the columns sample_id and bam. A bai column is recommended; without it, an index file named BAM.bai (the BAM path with .bai appended) must already exist next to each input BAM. If you already have a curated RSeQC BED, pass it with --annotation-bed; otherwise the flow derives one from the GTF.

bam_files.tsv

sample_id	bam	bai
WT_01	bam/WT_01.sorted.bam	bam/WT_01.sorted.bam.bai
SNF2_01	bam/SNF2_01.sorted.bam	bam/SNF2_01.sorted.bam.bai
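
The table requirements can be checked before a run. The sketch below is an illustration, not the flow's own validation: the function name check_bam_table is hypothetical, the table is assumed to be tab-separated with a header row, and relative paths are assumed to resolve against the directory that holds the table (pass base_dir if your paths resolve elsewhere).

```python
import csv
from pathlib import Path


def check_bam_table(table_path, base_dir=None):
    """Return a list of problems found in a BAM sample table (empty if none)."""
    table_path = Path(table_path)
    # Assumption: relative paths resolve against the table's own directory.
    base = Path(base_dir) if base_dir is not None else table_path.parent
    problems = []
    with table_path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        missing = [c for c in ("sample_id", "bam") if c not in columns]
        if missing:
            return [f"missing required column(s): {', '.join(missing)}"]
        for row in reader:
            sample = (row.get("sample_id") or "").strip()
            bam_value = (row.get("bam") or "").strip()
            if not bam_value:
                problems.append(f"{sample}: empty bam field")
                continue
            bam = base / bam_value
            if not bam.is_file():
                problems.append(f"{sample}: BAM not found: {bam}")
            # Use the bai column when it is filled in; otherwise expect <bam>.bai.
            bai_value = (row.get("bai") or "").strip()
            bai = base / bai_value if bai_value else Path(str(bam) + ".bai")
            if not bai.is_file():
                problems.append(f"{sample}: index not found: {bai}")
    return problems
```

An empty result means every row has a BAM and an index on disk; anything else is worth fixing before starting the flow.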

Annotation inputs

--gtf ref-out/03_results/annotation/genes.gtf
# optional:
--annotation-bed curated_rseqc_genes.bed

Complete parameter reference

| Parameter | Required | Default | Meaning and when to change it |
| --- | --- | --- | --- |
| --bams | yes | none | BAM sample table with sample_id and bam; bai is recommended. |
| --gtf | yes | none | Gene annotation in GTF format. Qualimap uses it directly; RSeQC BED can be derived from it. |
| --outdir, -o | yes | none | Dedicated output directory. Existing directories are refused unless --force is used. |
| --annotation-bed | no | derived | Optional BED gene model for RSeQC. Provide a curated BED when automatic GTF conversion is not suitable. |
| --threads, -t | no | 1 | Recorded thread count. r1 runs per-sample QC commands serially. |
| --java-mem-size | no | 4G | Qualimap Java memory setting. Increase for large BAM files when Qualimap reports memory errors. |
| --mapq | no | 30 | MAPQ cutoff for RSeQC bam_stat.py and infer_experiment.py. |
| --infer-sample-size | no | 200000 | Reads sampled by RSeQC strandedness inference. Increase for noisy or very large datasets. |
| --sequencing-protocol | no | non-strand-specific | Qualimap protocol: non-strand-specific, strand-specific-forward, or strand-specific-reverse. Match the library protocol. |
| --paired | no | off | Enable Qualimap paired-end mode when BAMs came from paired-end reads. |
| --force | no | off | Replace standard outputs inside an existing output directory. |
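
When the library protocol is not known in advance, the two strand fractions printed by RSeQC infer_experiment.py can guide the choice of --sequencing-protocol. The sketch below is a heuristic under stated assumptions, not the flow's logic: the function name suggest_protocol and all three thresholds are illustrative, and the caller is assumed to supply the fractions by hand. Here frac_sense is the fraction explained by "1++,1--,2+-,2-+" (or "++,--" for single-end reads) and frac_antisense is the fraction explained by "1+-,1-+,2++,2--" (or "+-,-+").

```python
def suggest_protocol(frac_sense, frac_antisense,
                     dominant=0.8, balanced=0.1, min_explained=0.5):
    """Suggest a --sequencing-protocol value, or None if the evidence is unclear.

    All thresholds are illustrative assumptions, not values used by the flow.
    """
    # Too many undetermined reads: do not guess.
    if frac_sense + frac_antisense < min_explained:
        return None
    # Reads mostly on the same strand as the gene: forward-stranded library.
    if frac_sense >= dominant:
        return "strand-specific-forward"
    # Reads mostly on the opposite strand: reverse-stranded library.
    if frac_antisense >= dominant:
        return "strand-specific-reverse"
    # Roughly equal split: unstranded library.
    if abs(frac_sense - frac_antisense) <= balanced:
        return "non-strand-specific"
    # Intermediate split: ambiguous, inspect the library prep instead.
    return None
```

A None result means the fractions do not clearly support any of the three protocols, so the library preparation records should decide.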

How it connects

report-flow collects this flow's results when given --alignment-qc-out alignment-qc-out. In standard-flow, this flow runs only when --route both is enabled.

Key outputs and limits

Outputs include SAMtools stats, RSeQC results, Qualimap HTML reports, rnaseq_qc_summary.tsv, and a MultiQC report. This flow does not quantify expression; it reports how reliable the alignment evidence is.