Genome alignment
rnaseq-alignment-flow
This optional flow aligns FASTQ reads to the genome with HISAT2 and produces coordinate-sorted BAM files. Use it when you need genome-level evidence, inspection in a genome browser, input for featureCounts, or alignment-level QC.
Typical command
taf-rnaseq-alignment-flow \
  --samples samples.tsv \
  --index ref-out/03_results/hisat2_index/genome \
  --outdir align-out \
  --threads 8
Input requirements
The sample table follows the same FASTQ contract as expression-flow: one row per final sample, with sample_id and read1 required and read2 optional for paired-end data. Relative FASTQ paths are resolved from the directory that contains the sample table. Build the HISAT2 index with rnaseq-index-flow --genome-indexer hisat2, using the same genome and annotation release as the other steps.
samples.tsv
sample_id read1 read2
WT_01 reads/WT_01_R1.fq.gz reads/WT_01_R2.fq.gz
SNF2_01 reads/SNF2_01_R1.fq.gz reads/SNF2_01_R2.fq.gz
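The table rules above can be checked before launching a run. The following is a minimal sketch, not part of the toolkit: the function name check_samples is hypothetical, and it assumes a tab-separated file whose columns appear in the order sample_id, read1, read2 as in the example. It resolves relative FASTQ paths from the table's directory and reports any file that does not exist.

```shell
# check_samples TABLE
# Returns 0 when the header starts with sample_id and read1 and every
# listed FASTQ file exists; prints problems to stderr and returns 1 otherwise.
check_samples() {
  table=$1
  dir=$(dirname "$table")          # relative paths are resolved from here
  tab=$(printf '\t')
  status=0
  {
    IFS= read -r header
    case "$header" in
      "sample_id${tab}read1"*) ;;
      *) echo "header must start with sample_id and read1" >&2; return 1 ;;
    esac
    while IFS="$tab" read -r sample_id read1 read2; do
      [ -n "$sample_id" ] || continue            # skip blank lines
      if [ -z "$read1" ]; then
        echo "$sample_id: read1 is required" >&2
        status=1
        continue
      fi
      for fq in "$read1" "$read2"; do
        [ -n "$fq" ] || continue                 # read2 is optional
        case "$fq" in
          /*) path=$fq ;;                        # absolute path, use as-is
          *)  path=$dir/$fq ;;                   # relative to the table
        esac
        [ -f "$path" ] || { echo "$sample_id: missing $path" >&2; status=1; }
      done
    done
  } < "$table"
  return "$status"
}

# Usage: check_samples samples.tsv && echo "sample table looks usable"
```

The check covers only the contract stated here; the flow itself may apply further validation.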
HISAT2 index path
ref-out/03_results/hisat2_index/genome
ref-out/03_results/hisat2_index/genome.1.ht2
ref-out/03_results/hisat2_index/genome.2.ht2
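The value passed to --index is the prefix, not one of the .ht2 files. As a rough pre-flight check, the sketch below confirms that a prefix has its index files on disk. It assumes the usual hisat2-build layout of eight files numbered 1 to 8 with the .ht2 suffix (or .ht2l for large indexes); the function name check_hisat2_index is hypothetical.

```shell
# check_hisat2_index PREFIX
# Returns 0 when PREFIX.1.ht2 ... PREFIX.8.ht2 (or the .ht2l variants) all exist.
check_hisat2_index() {
  prefix=$1
  for i in 1 2 3 4 5 6 7 8; do
    if [ ! -f "$prefix.$i.ht2" ] && [ ! -f "$prefix.$i.ht2l" ]; then
      echo "missing index file: $prefix.$i.ht2" >&2
      return 1
    fi
  done
  return 0
}

# Usage: check_hisat2_index ref-out/03_results/hisat2_index/genome
```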
Complete parameter reference
| Parameter | Required | Default | Meaning and when to change it |
|---|---|---|---|
| --samples | yes | none | FASTQ sample table with sample_id, read1, and optional read2. |
| --index | yes | none | HISAT2 index prefix or a directory with exactly one prefix. From index-flow, pass ref-out/03_results/hisat2_index/genome. |
| --outdir, -o | yes | none | Dedicated output directory. Existing directories are refused unless --force is used. |
| --aligner | no | hisat2 | Aligner selector. r1 supports only hisat2. |
| --threads, -t | no | 2 | Threads for HISAT2, SAMtools, and fastp. Increase for larger FASTQ files. |
| --trim | no | off | Run fastp before alignment. Enable when raw FASTQ has not already been cleaned. |
| --rna-strandness | no | none | HISAT2 RNA strandness: none, F, R, FR, or RF. Set only when the library protocol is known. |
| --min-mapq | no | 0 | Filter final BAM records by MAPQ when greater than zero. Keep 0 if downstream tools should see all mapped reads. |
| --keep-sam | no | off | Keep intermediate SAM files for debugging. It increases disk usage substantially. |
| --force | no | off | Replace standard outputs inside an existing output directory. |
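
The --rna-strandness values split by library layout: in the HISAT2 manual, F and R apply to single-end reads, while FR and RF apply to paired-end reads. The sketch below encodes that pairing as a small guard. The function name valid_strandness and the layout labels single and paired are hypothetical, and whether the flow itself rejects a mismatched value is not stated here.

```shell
# valid_strandness LAYOUT VALUE
# LAYOUT is "single" or "paired"; VALUE is one of none, F, R, FR, RF.
# Returns 0 when the value fits the layout, 1 otherwise.
valid_strandness() {
  layout=$1
  value=$2
  case "$layout:$value" in
    single:none|single:F|single:R)   return 0 ;;  # single-end choices
    paired:none|paired:FR|paired:RF) return 0 ;;  # paired-end choices
    *)                               return 1 ;;
  esac
}

# Usage: valid_strandness paired RF && echo "ok to pass --rna-strandness RF"
```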
How it connects
taf-rnaseq-count-flow \
  --bams align-out/04_reports/bam_files.tsv \
  --annotation ref-out/03_results/annotation/genes.gtf \
  --outdir count-out

taf-rnaseq-alignment-qc-flow \
  --bams align-out/04_reports/bam_files.tsv \
  --gtf ref-out/03_results/annotation/genes.gtf \
  --outdir alignment-qc-out
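Alignment and the two downstream flows can be chained in one script. The sketch below is one possible arrangement, using only the commands and paths shown on this page. The run helper and the DRY_RUN variable are illustrative and not part of the toolkit: by default the script only prints each command, and setting DRY_RUN=0 executes them.

```shell
set -eu   # stop at the first failing step or unset variable

# run CMD ARGS...: print the command when DRY_RUN=1 (default), execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

REF=ref-out/03_results                      # reference outputs from index-flow
BAMS=align-out/04_reports/bam_files.tsv     # BAM list written by alignment-flow

run taf-rnaseq-alignment-flow --samples samples.tsv \
  --index "$REF/hisat2_index/genome" --outdir align-out --threads 8

run taf-rnaseq-count-flow --bams "$BAMS" \
  --annotation "$REF/annotation/genes.gtf" --outdir count-out

run taf-rnaseq-alignment-qc-flow --bams "$BAMS" \
  --gtf "$REF/annotation/genes.gtf" --outdir alignment-qc-out
```

Because set -e aborts on the first non-zero exit status, the count and QC steps start only after alignment has finished successfully.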
Key outputs and limits
Key outputs are coordinate-sorted BAM files, BAI indexes, aligner logs, alignment_summary.tsv, and bam_files.tsv. Alignment takes more time and disk space than Salmon quantification and is not required for the default expression route.
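To confirm that an output BAM is coordinate-sorted, its header can be inspected: the SAM specification records the sort order in the SO tag of the @HD line. The sketch below is a small filter that reads header text on standard input. The function name is_coordinate_sorted is hypothetical, and the usage line assumes samtools is installed and that the variable bam holds the path to one of the output BAM files.

```shell
# is_coordinate_sorted
# Reads SAM header lines on stdin; returns 0 when the @HD line carries SO:coordinate.
is_coordinate_sorted() {
  awk '$0 ~ /^@HD/ && $0 ~ /SO:coordinate/ { found = 1 } END { exit !found }'
}

# Usage (requires samtools; "$bam" is a path you supply):
#   samtools view -H "$bam" | is_coordinate_sorted && echo "coordinate-sorted"
```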