De novo route

rnaseq-denovo-annotation-flow

This flow gives assembled transcripts biological context. It predicts ORFs with TransDecoder, searches predicted proteins against a user-provided protein database with DIAMOND, writes transcript annotation and ID mapping tables, and can build GO-derived GMT/background files for enrichment when a protein-to-GO map is available.

0.2.0-r1: Homology annotation and GO resources

Minimal command

taf-rnaseq-denovo-annotation-flow \
  --transcripts denovo-assembly-out/03_results/transcripts/assembled_transcripts.filtered.fa \
  --protein-db proteins.faa \
  --go-map protein_go_map.tsv \
  --outdir denovo-annotation-out \
  --threads 8

Input requirements

`--transcripts`

The assembled transcript FASTA. Transcript IDs are preserved as the feature space for annotation tables and optional GO gene sets.

`--protein-db`

A local protein FASTA database, such as a curated proteome from a related species or a project-approved database. The flow does not download or bundle large annotation databases.

`--go-map`

Optional protein-to-GO mapping table. It lets the GO terms of each best-hit protein ID be transferred to the matching transcript, producing transcript-space GMT/background files for enrichment.
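
The transfer of GO terms from best-hit proteins to transcripts can be sketched as follows. This is a minimal illustration, not the flow's actual implementation: the function names, the in-memory dictionaries, and the choice of background (transcripts that received at least one GO term) are all assumptions made for the example. The only fixed convention used is the GMT layout itself: set name, description, then members, separated by tabs.

```python
from collections import defaultdict


def build_go_gene_sets(best_hits, protein_go):
    """Transfer GO terms from best-hit proteins to transcripts.

    best_hits:  dict of transcript_id -> best-hit protein_id (hypothetical form)
    protein_go: dict of protein_id -> iterable of GO term IDs (hypothetical form)
    Returns (gene_sets, background), where gene_sets maps each GO term to a
    sorted list of transcript IDs and background is a sorted list of all
    transcripts that received at least one GO term (an assumed definition).
    """
    members_by_term = defaultdict(set)
    for transcript_id, protein_id in best_hits.items():
        # Transcripts whose best hit has no GO entry contribute nothing.
        for term in protein_go.get(protein_id, ()):
            members_by_term[term].add(transcript_id)
    gene_sets = {term: sorted(members_by_term[term]) for term in sorted(members_by_term)}
    background = sorted({t for members in gene_sets.values() for t in members})
    return gene_sets, background


def to_gmt_lines(gene_sets, description="na"):
    """Format gene sets as GMT lines: name, description, members (tab-separated)."""
    return ["\t".join([term, description, *members]) for term, members in gene_sets.items()]


if __name__ == "__main__":
    hits = {"tx1": "P1", "tx2": "P1", "tx3": "P2", "tx4": "P9"}
    go_map = {"P1": ["GO:0008150", "GO:0003674"], "P2": ["GO:0008150"]}
    sets, background = build_go_gene_sets(hits, go_map)
    print("\n".join(to_gmt_lines(sets)))
    print(background)
```

In this example `tx4` is absent from both outputs because its best hit `P9` has no entry in the GO map, which illustrates why the coverage of the protein-to-GO table limits what enrichment can test.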

Interpretation boundary

Homology evidence is not a manually curated gene model. Treat annotation and enrichment as support for hypotheses, not final functional proof.

Parameter reference

Parameter	Required	Default	Meaning
`--transcripts`	yes	none	Assembled transcript FASTA to annotate.
`--outdir`	yes	none	Dedicated output directory.
`--protein-db`	recommended	none	Local protein FASTA for DIAMOND search. Without it, only ORF prediction and the basic annotation structure are produced.
`--go-map`	optional	none	Protein ID to GO term mapping. Required if the annotation flow should emit `denovo_go.gmt` and `denovo_background.tsv`.
`--threads`	no	2	Threads for TransDecoder support steps and DIAMOND search.
`--min-orf-aa`	no	100	Minimum predicted ORF length in amino acids. Lower values retain more short ORFs; higher values reduce fragments.
`--evalue`	no	1e-5	DIAMOND e-value cutoff for retained hits.
`--max-target-seqs`	no	1	Number of target hits retained per query. The r1 report route is designed around best-hit style summaries.
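
To make the effect of `--evalue` and `--max-target-seqs 1` concrete, the sketch below picks one best hit per query from DIAMOND-style tabular output. It assumes the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); whether `protein_hits.tsv` uses exactly these columns is not stated here. The helper names are hypothetical, and the `.pN` suffix handling follows the usual TransDecoder ORF naming rather than anything confirmed for this flow.

```python
import re


def select_best_hits(lines, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query among hits passing the e-value cutoff.

    lines: iterable of tab-separated rows in 12-column BLAST tabular layout
           (an assumed layout for this illustration).
    Returns dict of query_id -> (subject_id, evalue, bitscore).
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the e-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def orf_to_transcript(orf_id):
    """Strip a trailing '.p<N>' ORF suffix to recover the transcript ID (assumed naming)."""
    return re.sub(r"\.p\d+$", "", orf_id)


if __name__ == "__main__":
    rows = [
        "tx1.p1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
        "tx1.p1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
        "tx2.p1\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
    ]
    for orf_id, (subject, evalue, bitscore) in select_best_hits(rows).items():
        print(orf_to_transcript(orf_id), subject, evalue, bitscore)
```

Here `tx2.p1` is dropped because its only hit has an e-value above the cutoff, and `tx1.p1` keeps `P1` because it has the higher bitscore.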

Key outputs

03_results/coding/longest_orfs.pep
03_results/coding/cds.fa and 03_results/coding/proteins.fa
03_results/annotation/protein_hits.tsv
03_results/annotation/transcript_annotation.tsv
03_results/annotation/id_mapping.tsv
03_results/gene_sets/denovo_go.gmt and denovo_background.tsv when GO mapping is available
04_reports/annotation_summary.tsv, 04_reports/commands.sh, run.manifest.json

How it connects

The annotation table and ID mapping are consumed by the final report. If `denovo_go.gmt` and `denovo_background.tsv` exist, they can be passed to `rnaseq-enrichment-flow` or used automatically by `rnaseq-standard-flow --mode denovo`. The IDs in these files must match the feature IDs in the DE results, because enrichment looks up DE features in the gene sets by ID.
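
A quick check of the ID-space requirement before running enrichment might look like the sketch below. It is an illustrative helper only: the function name and the idea of passing plain lists of IDs are assumptions, and how IDs are read from the DE table or from `denovo_background.tsv` depends on file layouts not described here.

```python
def id_overlap(de_ids, background_ids):
    """Summarize how many DE feature IDs also appear in the enrichment background.

    de_ids:         iterable of feature IDs from the DE results
    background_ids: iterable of IDs from the background file
    Both inputs are assumed to have been extracted from their files already.
    """
    de, background = set(de_ids), set(background_ids)
    shared = de & background
    return {
        "shared": len(shared),
        "de_only": len(de - background),
        # Fraction of DE features that enrichment could actually look up.
        "fraction_de_in_background": len(shared) / len(de) if de else 0.0,
    }


if __name__ == "__main__":
    summary = id_overlap(["tx1", "tx2", "geneA"], ["tx1", "tx2", "tx3"])
    print(summary)
```

A fraction near zero usually signals mismatched ID spaces, for example gene-level DE results tested against transcript-level gene sets.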
