Genome alignment

rnaseq-alignment-flow

This optional flow aligns FASTQ reads to the genome with HISAT2 and produces coordinate-sorted BAM files. Use it when you need genome-level evidence, genome browser inspection, featureCounts-based counting, or alignment-level QC.


Version 0.1.0-r1, optional evidence route

Typical command


taf-rnaseq-alignment-flow \
  --samples samples.tsv \
  --index ref-out/03_results/hisat2_index/genome \
  --outdir align-out \
  --threads 8

Input requirements


The sample table follows the same FASTQ contract as expression-flow: one row per final sample, with sample_id and read1 required and read2 added for paired-end data. Relative FASTQ paths are resolved against the directory that contains the sample table. Build the HISAT2 index with rnaseq-index-flow --genome-indexer hisat2, using the same genome and annotation release as the rest of the analysis.


samples.tsv

sample_id	read1	read2
WT_01	reads/WT_01_R1.fq.gz	reads/WT_01_R2.fq.gz
SNF2_01	reads/SNF2_01_R1.fq.gz	reads/SNF2_01_R2.fq.gz
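The FASTQ contract above can be checked before launching a run. The sketch below is a hypothetical helper (load_samples is not part of the flow) that assumes a tab-separated file, treats a missing or empty read2 as single-end, and, as an extra assumption drawn from "one row per final sample", rejects duplicate sample_id values. It resolves relative FASTQ paths against the sample table's directory, as the contract describes.

```python
import csv
from pathlib import Path


def load_samples(table_path):
    """Parse a sample table that follows the FASTQ contract.

    Hypothetical helper, not the flow's own code. Returns one dict per row
    with sample_id, resolved read1/read2 paths, and the inferred layout.
    """
    table_path = Path(table_path)
    base_dir = table_path.parent  # relative FASTQ paths are resolved from here
    samples, seen = [], set()
    with table_path.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = {"sample_id", "read1"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing required column(s): {sorted(missing)}")
        for row in reader:
            sample_id = (row.get("sample_id") or "").strip()
            read1 = (row.get("read1") or "").strip()
            read2 = (row.get("read2") or "").strip()  # optional column
            if not sample_id or not read1:
                raise ValueError(f"row lacks sample_id or read1: {row}")
            if sample_id in seen:  # assumption: one row per final sample
                raise ValueError(f"duplicate sample_id: {sample_id}")
            seen.add(sample_id)
            samples.append({
                "sample_id": sample_id,
                # Path / absolute_path yields absolute_path, so absolute
                # FASTQ paths pass through unchanged.
                "read1": base_dir / read1,
                "read2": base_dir / read2 if read2 else None,
                "layout": "paired" if read2 else "single",
            })
    return samples
```

For example, load_samples("samples.tsv") on the table above would report both rows as paired-end, with FASTQ paths under the table's own reads/ directory.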

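HISAT2 index path

Pass the shared prefix (the first path below) to --index, not one of the numbered .ht2 files that HISAT2 writes under it.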

ref-out/03_results/hisat2_index/genome
ref-out/03_results/hisat2_index/genome.1.ht2
ref-out/03_results/hisat2_index/genome.2.ht2
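The --index option accepts either a prefix or a directory holding exactly one prefix. A minimal sketch of that resolution, assuming an index is recognised by its first file <prefix>.1.ht2 (or .1.ht2l, the extension HISAT2 uses for large indexes); resolve_hisat2_prefix is a hypothetical name, not the flow's actual function.

```python
from pathlib import Path

# Assumption: an index is recognised by its first file, <prefix>.1.ht2
# (HISAT2 uses .ht2l instead of .ht2 for large indexes).
FIRST_FILE_SUFFIXES = (".1.ht2", ".1.ht2l")


def _prefix_of(filename):
    """Return the index prefix if filename is a first index file, else None."""
    for suffix in FIRST_FILE_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def resolve_hisat2_prefix(index_arg):
    """Turn an --index value into a HISAT2 prefix (hypothetical helper)."""
    path = Path(index_arg)
    if path.is_dir():
        names = (f.name for f in path.iterdir())
        prefixes = sorted({p for p in map(_prefix_of, names) if p})
        if len(prefixes) != 1:
            raise ValueError(f"expected exactly one HISAT2 prefix in {path}, found {prefixes}")
        return path / prefixes[0]
    # Otherwise treat the value as a prefix and check that its first file exists.
    if any(Path(f"{path}{suffix}").exists() for suffix in FIRST_FILE_SUFFIXES):
        return path
    raise FileNotFoundError(f"no HISAT2 index files found for prefix {path}")
```

With the layout above, both ref-out/03_results/hisat2_index and ref-out/03_results/hisat2_index/genome would resolve to the same prefix; a directory holding two indexes is rejected rather than guessed.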

Complete parameter reference

Parameter          Required  Default  Meaning and when to change it
--samples          yes       none     FASTQ sample table with sample_id, read1, and optional read2.
--index            yes       none     HISAT2 index prefix, or a directory containing exactly one HISAT2 index prefix. With index-flow output, pass ref-out/03_results/hisat2_index/genome.
--outdir, -o       yes       none     Dedicated output directory. An existing directory is refused unless --force is given.
--aligner          no        hisat2   Aligner selector. Release 0.1.0-r1 supports only hisat2.
--threads, -t      no        2        Threads for HISAT2, SAMtools, and fastp. Increase for larger FASTQ files.
--trim             no        off      Run fastp before alignment. Enable when the raw FASTQ has not already been cleaned.
--rna-strandness   no        none     HISAT2 RNA strandness: none, F, R, FR, or RF (F/R for single-end, FR/RF for paired-end libraries). Set it only when the library protocol is known.
--min-mapq         no        0        When greater than 0, drop final BAM records with MAPQ below this value. Keep 0 if downstream tools should see all mapped reads.
--keep-sam         no        off      Keep intermediate SAM files for debugging. This increases disk usage substantially.
--force            no        off      Replace standard outputs inside an existing output directory.
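Two of the options above have constraints worth checking up front. The sketch below is an illustration only: check_options and keep_record are hypothetical names, the strandness/layout pairing follows the HISAT2 manual's convention (F/R for single-end, FR/RF for paired-end) rather than any documented check in this flow, and the MAPQ rule assumes samtools view -q style semantics (keep records with MAPQ at or above the threshold).

```python
VALID_STRANDNESS = {"none", "F", "R", "FR", "RF"}


def check_options(paired, rna_strandness="none", min_mapq=0, aligner="hisat2"):
    """Sanity-check option values before a run (hypothetical helper)."""
    if aligner != "hisat2":
        raise ValueError("0.1.0-r1 supports only --aligner hisat2")
    if rna_strandness not in VALID_STRANDNESS:
        raise ValueError(f"--rna-strandness must be one of {sorted(VALID_STRANDNESS)}")
    # HISAT2 convention: F/R describe single-end reads, FR/RF paired-end reads.
    if paired and rna_strandness in {"F", "R"}:
        raise ValueError("paired-end data needs FR or RF, not F or R")
    if not paired and rna_strandness in {"FR", "RF"}:
        raise ValueError("single-end data needs F or R, not FR or RF")
    if min_mapq < 0:
        raise ValueError("--min-mapq must be 0 or greater")


def keep_record(sam_line, min_mapq):
    """MAPQ filter sketch: keep everything at 0, else require MAPQ >= min_mapq."""
    if min_mapq <= 0:
        return True
    mapq = int(sam_line.split("\t")[4])  # MAPQ is the 5th SAM column
    return mapq >= min_mapq
```

The default of 0 short-circuits the filter entirely, which matches the table's advice to keep 0 when downstream tools should see every mapped read.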

How it connects


taf-rnaseq-count-flow \
  --bams align-out/04_reports/bam_files.tsv \
  --annotation ref-out/03_results/annotation/genes.gtf \
  --outdir count-out

taf-rnaseq-alignment-qc-flow \
  --bams align-out/04_reports/bam_files.tsv \
  --gtf ref-out/03_results/annotation/genes.gtf \
  --outdir alignment-qc-out
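Both downstream flows read align-out/04_reports/bam_files.tsv, so a quick pre-flight check can catch missing files before a long counting or QC run. This is a hedged sketch: the column layout of bam_files.tsv is not documented here, so check_bam_manifest (a hypothetical helper) treats any cell ending in .bam as a BAM path, uses paths as written (relative to the working directory), and accepts either <name>.bam.bai or <name>.bai as the index.

```python
import csv
from pathlib import Path


def check_bam_manifest(manifest):
    """Return a list of problems for BAM paths listed in a bam_files.tsv."""
    problems, bams = [], []
    with open(manifest, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Assumption: any cell ending in .bam is a BAM path; the real
            # column names of bam_files.tsv are not documented here.
            bams.extend(
                Path(v) for v in row.values() if isinstance(v, str) and v.endswith(".bam")
            )
    if not bams:
        problems.append(f"no .bam paths found in {manifest}")
    for bam in bams:
        if not bam.exists():
            problems.append(f"missing BAM: {bam}")
        elif not (Path(f"{bam}.bai").exists() or bam.with_suffix(".bai").exists()):
            problems.append(f"missing BAI index for: {bam}")
    return problems
```

An empty result means every listed BAM exists and has an index next to it; anything else is worth fixing before running count-flow or alignment-qc-flow.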

Key outputs and limits


Key outputs are coordinate-sorted BAM files, their BAI indexes, aligner logs, alignment_summary.tsv, and bam_files.tsv. Alignment takes longer and uses more disk space than Salmon quantification, and it is not required for the default expression route.
