De novo route

rnaseq-denovo-assembly-flow

This flow is the first no-reference RNA-seq subflow. It starts from a FASTQ sample table, performs read QC and optional trimming, assembles transcripts with a Trinity-first route, filters short contigs, summarizes assembly statistics, optionally runs offline BUSCO, and records provenance.

0.1.0-r1 · FASTQ to assembled transcriptome

Minimal command

taf-rnaseq-denovo-assembly-flow \
  --samples samples.tsv \
  --outdir denovo-assembly-out \
  --threads 16 \
  --max-memory 64G
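
When the flow is launched from a script rather than typed by hand, the same invocation can be assembled as an argument list. The sketch below is only an illustration and not part of the flow: build_assembly_command is a hypothetical helper, and it uses nothing beyond the executable name and the four flags shown in the command above.

```python
import shlex


def build_assembly_command(samples, outdir, threads=16, max_memory="64G"):
    """Return the minimal command as an argv list (hypothetical helper)."""
    return [
        "taf-rnaseq-denovo-assembly-flow",
        "--samples", str(samples),
        "--outdir", str(outdir),
        "--threads", str(threads),
        "--max-memory", str(max_memory),
    ]


cmd = build_assembly_command("samples.tsv", "denovo-assembly-out")
# Print a copy-pasteable shell line; nothing is executed here.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would start the run; keeping the arguments as a list avoids shell quoting problems when paths contain spaces.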

Input requirements

samples.tsv follows the same FASTQ table contract as the reference-based expression and alignment flows: a tab-separated table with one biological sample per row, where sample_id and read1 are required and read2 is added for paired-end data. All samples in one run should share the same layout; a table that mixes single-end and paired-end samples should be split into separate runs before assembly.

Single-end

sample_id	read1	condition
WT_01	reads/WT_01.fq.gz	WT
KO_01	reads/KO_01.fq.gz	KO

Paired-end

sample_id	read1	read2	condition
WT_01	reads/WT_01_R1.fq.gz	reads/WT_01_R2.fq.gz	WT
KO_01	reads/KO_01_R1.fq.gz	reads/KO_01_R2.fq.gz	KO
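
A table can be checked against these rules before a long assembly is started. The following is a minimal sketch under stated assumptions, not the flow's own validator: check_sample_table is a hypothetical name, the uniqueness check on sample_id is inferred from the "one biological sample per row" rule, and the function does not verify that the FASTQ files exist.

```python
import csv
from pathlib import Path


def check_sample_table(path):
    """Return 'single' or 'paired' for a samples.tsv; raise ValueError on problems."""
    with Path(path).open(newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    if not rows:
        raise ValueError("sample table has no data rows")
    for column in ("sample_id", "read1"):
        if column not in rows[0]:
            raise ValueError(f"missing required column: {column}")
    for row in rows:
        if not (row["sample_id"] or "").strip() or not (row["read1"] or "").strip():
            raise ValueError("sample_id and read1 must be filled in on every row")
    ids = [row["sample_id"] for row in rows]
    if len(set(ids)) != len(ids):
        # Assumption: one biological sample per row implies unique sample_id values.
        raise ValueError("sample_id values must be unique")
    # A row counts as paired-end when it has a non-empty read2 value.
    layouts = {"paired" if (row.get("read2") or "").strip() else "single" for row in rows}
    if len(layouts) > 1:
        raise ValueError("mixed single-end and paired-end rows; split the table first")
    return layouts.pop()
```

Calling check_sample_table("samples.tsv") on either example table above should return the matching layout name, and a table with read2 filled in for only some rows raises an error instead of reaching the assembler.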

Parameter reference

Parameter	Required	Default	Meaning
--samples	yes	none	FASTQ sample table. Relative paths are resolved from the table location.
--outdir	yes	none	Dedicated output directory; existing outputs are refused unless --force is used.
--threads	no	2	Threads for QC, assembly, and summaries. Trinity benefits from more CPU, but memory and I/O often become limiting.
--max-memory	no	16G	Memory limit passed to Trinity. Real 24-sample runs usually need a much larger value than toy smoke tests.
--assembler	no	trinity	Assembler choice. trinity is the default; rnaspades is an explicit alternate path.
--min-contig-len	no	200	Minimum transcript length retained after assembly. Raise it to remove more short fragments.
--ss-lib-type	no	auto	Strand-specific library type for Trinity. Use a fixed value only when the library protocol is known.
--busco-lineage	no	none	Local BUSCO lineage path or name. The flow does not download lineage data during normal execution.
--trim, --skip-fastqc, --no-normalize	no	off	Read preprocessing and Trinity normalization switches. Use them according to library quality and resource planning.
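
To make the effect of --min-contig-len concrete, the sketch below applies a length cutoff to FASTA records in plain Python. It is an illustration only and does not reproduce the flow's filtering step: read_fasta and filter_by_length are hypothetical helpers, and treating the cutoff as inclusive (a contig of exactly the minimum length is kept) is an assumption based on the wording "minimum transcript length retained".

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_contig_len=200):
    """Keep records with at least min_contig_len bases (inclusive cutoff, assumed)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_contig_len]


# Toy input: tx1 has 240 bases, tx2 has 40 bases.
example = [">tx1", "ACGT" * 60, ">tx2", "ACGT" * 10]
kept = filter_by_length(read_fasta(example), min_contig_len=200)
print([name for name, _ in kept])  # ['tx1']
```

Raising the cutoff to 300 in this toy input would drop tx1 as well, which mirrors the trade-off in the table: a higher value removes more short fragments but can also discard genuinely short transcripts.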

Key outputs

  • 03_results/transcripts/assembled_transcripts.fa
  • 03_results/transcripts/assembled_transcripts.filtered.fa
  • 03_results/assembly_qc/assembly_stats.tsv
  • 03_results/assembly_qc/busco_summary.tsv
  • 03_results/assembly_qc/read_support.tsv
  • 04_reports/multiqc_report.html, 04_reports/assembly_summary.tsv, 04_reports/commands.sh, run.manifest.json
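
After a run, a short script can confirm that the files listed above are present before anything is handed downstream. This is a sketch under assumptions rather than a documented check: missing_outputs is a hypothetical helper, all paths are assumed to be relative to --outdir, run.manifest.json is assumed to sit at the top of that directory, and busco_summary.tsv is only expected when BUSCO was requested.

```python
from pathlib import Path

# Paths copied from the output list; assumed to be relative to --outdir.
EXPECTED_OUTPUTS = [
    "03_results/transcripts/assembled_transcripts.fa",
    "03_results/transcripts/assembled_transcripts.filtered.fa",
    "03_results/assembly_qc/assembly_stats.tsv",
    "03_results/assembly_qc/read_support.tsv",
    "04_reports/multiqc_report.html",
    "04_reports/assembly_summary.tsv",
    "04_reports/commands.sh",
    "run.manifest.json",  # assumed location: top level of the output directory
]
# BUSCO is optional, so its summary is checked only on request (assumption).
BUSCO_OUTPUT = "03_results/assembly_qc/busco_summary.tsv"


def missing_outputs(outdir, busco=False):
    """Return the expected output paths that are not present under outdir."""
    outdir = Path(outdir)
    expected = EXPECTED_OUTPUTS + ([BUSCO_OUTPUT] if busco else [])
    return [rel for rel in expected if not (outdir / rel).is_file()]
```

An empty list from missing_outputs("denovo-assembly-out") means every expected file was found; switches such as --skip-fastqc may change which reports are written, so a non-empty list is a prompt to look at the run, not proof of failure.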

How it connects

The filtered transcript FASTA is the primary downstream contract. Pass it to rnaseq-denovo-expression-flow --transcripts and rnaseq-denovo-annotation-flow --transcripts. In rnaseq-standard-flow --mode denovo, this handoff is performed automatically.
