De novo route
rnaseq-denovo-assembly-flow
This flow is the first subflow of the reference-free (de novo) RNA-seq route. It starts from a FASTQ sample table, performs read QC and optional trimming, assembles transcripts with a Trinity-first route, filters short contigs, summarizes assembly statistics, optionally runs BUSCO offline, and records provenance.
Minimal command
taf-rnaseq-denovo-assembly-flow \
--samples samples.tsv \
--outdir denovo-assembly-out \
--threads 16 \
--max-memory 64G
Input requirements
samples.tsv uses the same FASTQ table contract as the reference-based expression and alignment flows: one biological sample per row, with sample_id and read1 required and read2 added for paired-end data. All samples in one run should use the same layout; a table that mixes single-end and paired-end samples should be split before assembly.
Single-end
sample_id read1 condition
WT_01 reads/WT_01.fq.gz WT
KO_01 reads/KO_01.fq.gz KO
Paired-end
sample_id read1 read2 condition
WT_01 reads/WT_01_R1.fq.gz reads/WT_01_R2.fq.gz WT
KO_01 reads/KO_01_R1.fq.gz reads/KO_01_R2.fq.gz KO
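The layout rule above can be checked before a run. The following is a minimal sketch, not the flow's own validator: it assumes the table is tab-separated (as the .tsv name suggests) and that a row counts as paired-end when its read2 field is non-empty. The function name detect_layout is hypothetical.

```python
import csv
import io


def detect_layout(table_text: str) -> str:
    """Return 'single' or 'paired' for a tab-separated sample table.

    Raises ValueError when required columns are missing, a row lacks
    sample_id or read1, the table is empty, or layouts are mixed.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    columns = reader.fieldnames or []
    for required in ("sample_id", "read1"):
        if required not in columns:
            raise ValueError(f"missing required column: {required}")

    layouts = set()
    for row in reader:
        if not row["sample_id"] or not row["read1"]:
            raise ValueError("every row needs sample_id and read1")
        # Assumption: a non-empty read2 field marks a paired-end sample.
        read2 = (row.get("read2") or "").strip()
        layouts.add("paired" if read2 else "single")

    if not layouts:
        raise ValueError("sample table has no rows")
    if len(layouts) > 1:
        raise ValueError("mixed single-end and paired-end rows; split the table")
    return layouts.pop()
```

Calling detect_layout(open("samples.tsv").read()) on either example table above should return its layout, while a table mixing both kinds of rows raises ValueError.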
Parameter reference
| Parameter | Required | Default | Meaning |
|---|---|---|---|
| --samples | yes | none | FASTQ sample table. Relative paths are resolved from the table location. |
| --outdir | yes | none | Dedicated output directory. The flow refuses to write into existing outputs unless --force is used. |
| --threads | no | 2 | Threads for QC, assembly, and summaries. Trinity benefits from more CPU, but memory and I/O often become the limiting factors. |
| --max-memory | no | 16G | Memory limit passed to Trinity. Real 24-sample runs usually need a much larger value than toy smoke tests. |
| --assembler | no | trinity | Assembler choice. trinity is the default; rnaspades is an explicit alternate path. |
| --min-contig-len | no | 200 | Minimum transcript length retained after assembly. Raise it to remove more short fragments. |
| --ss-lib-type | no | auto | Strand-specific library type for Trinity. Set a fixed value only when the library protocol is known. |
| --busco-lineage | no | none | Local BUSCO lineage path or name. The flow does not download lineage data during normal execution. |
| --trim, --skip-fastqc, --no-normalize | no | off | Read preprocessing and Trinity normalization switches. Choose them according to library quality and resource planning. |
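To make the effect of --min-contig-len concrete, here is a minimal sketch of a length filter over FASTA records. It is an illustration, not the flow's implementation: the helper names read_fasta and filter_by_length are hypothetical, and the threshold is assumed to be inclusive (a contig of exactly 200 bases is kept at the default).

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_len: int = 200
) -> List[Tuple[str, str]]:
    """Keep records whose sequence has at least min_len bases (assumed inclusive)."""
    return [(header, seq) for header, seq in records if len(seq) >= min_len]
```

Raising min_len drops more short fragments, at the cost of discarding genuinely short transcripts.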
Key outputs
03_results/transcripts/assembled_transcripts.fa
03_results/transcripts/assembled_transcripts.filtered.fa
03_results/assembly_qc/assembly_stats.tsv
03_results/assembly_qc/busco_summary.tsv
03_results/assembly_qc/read_support.tsv
04_reports/multiqc_report.html
04_reports/assembly_summary.tsv
04_reports/commands.sh
run.manifest.json
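The exact columns of assembly_stats.tsv are not specified here. As an illustration of the kind of summary an assembly statistics step typically reports, the sketch below computes contig count, total length, longest contig, and N50 from a list of contig lengths. The function name and the returned keys are assumptions, not the flow's schema.

```python
from typing import Dict, Sequence


def assembly_stats(lengths: Sequence[int]) -> Dict[str, int]:
    """Return basic contig statistics for a list of sequence lengths."""
    if not lengths:
        return {"n_contigs": 0, "total_bp": 0, "longest": 0, "n50": 0}

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)

    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # N50: length of the contig at which the running sum, taken from
        # longest to shortest, first reaches half of the total assembly size.
        if running * 2 >= total:
            n50 = length
            break

    return {
        "n_contigs": len(ordered),
        "total_bp": total,
        "longest": ordered[0],
        "n50": n50,
    }
```

For transcriptome assemblies N50 is only a rough indicator, since many short contigs can be real transcripts; completeness checks such as BUSCO and read support complement it.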
How it connects
The filtered transcript FASTA is the primary downstream contract. Pass it to rnaseq-denovo-expression-flow --transcripts and rnaseq-denovo-annotation-flow --transcripts. In rnaseq-standard-flow --mode denovo, this handoff is performed automatically.
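When the flows are run separately, the handoff amounts to pointing the downstream flows at the filtered FASTA. The sketch below builds only the --transcripts argument pair. It assumes the output paths listed under Key outputs are relative to the directory given as --outdir; the helper name is hypothetical, and any other arguments the downstream flows require are not covered here.

```python
from pathlib import Path
from typing import List

# Relative location of the filtered assembly, as listed under Key outputs.
FILTERED_FASTA = Path("03_results/transcripts/assembled_transcripts.filtered.fa")


def transcripts_argument(assembly_outdir: str) -> List[str]:
    """Return the ['--transcripts', path] pair for a downstream flow.

    Assumption: output paths are relative to the assembly --outdir.
    """
    fasta = Path(assembly_outdir) / FILTERED_FASTA
    return ["--transcripts", fasta.as_posix()]
```

For the minimal command above, transcripts_argument("denovo-assembly-out") yields the argument pair to append to the expression or annotation flow command.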