De novo route
rnaseq-denovo-annotation-flow
This flow adds biological context to assembled transcripts. It predicts ORFs with TransDecoder, searches the predicted proteins against a user-provided protein database with DIAMOND, and writes transcript annotation and ID-mapping tables. When a protein-to-GO map is available, it can also build GO-derived GMT and background files for enrichment analysis.
Minimal command
```bash
taf-rnaseq-denovo-annotation-flow \
  --transcripts denovo-assembly-out/03_results/transcripts/assembled_transcripts.filtered.fa \
  --protein-db proteins.faa \
  --go-map protein_go_map.tsv \
  --outdir denovo-annotation-out \
  --threads 8
```
Input requirements
--transcripts
The assembled transcript FASTA. Transcript IDs are preserved and used as the feature space for the annotation tables and the optional GO gene sets.
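Because transcript IDs become the feature space, duplicated IDs in the input FASTA would be ambiguous downstream. The following is a minimal pre-flight sketch, not part of the flow. The function names fasta_ids and check_unique_ids are hypothetical, and it assumes (not confirmed by this page) that an ID is the first whitespace-delimited token of each FASTA header.

```bash
# Hypothetical pre-flight helpers; not part of rnaseq-denovo-annotation-flow.

# Print one ID per FASTA record: the first whitespace-delimited token after '>'.
# ASSUMPTION: the flow derives transcript IDs the same way.
fasta_ids() {
  awk '/^>/ { sub(/^>/, ""); print $1 }' "$1"
}

# Print duplicated IDs to stderr and return 1 if any exist; otherwise return 0.
check_unique_ids() {
  local dups
  dups=$(fasta_ids "$1" | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'duplicate transcript IDs:\n%s\n' "$dups" >&2
    return 1
  fi
  return 0
}
```

For example, `check_unique_ids denovo-assembly-out/03_results/transcripts/assembled_transcripts.filtered.fa` returns non-zero and lists the offending IDs if any header token repeats.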
--protein-db
A local protein FASTA database, such as a curated proteome from a related species or a project-approved database. The flow does not download or bundle large annotation databases.
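Since the predicted proteins are searched against this database, passing a nucleotide FASTA by mistake is a cheap error to catch before a run. A rough heuristic sketch, not part of the flow: the function name is hypothetical, and the cutoff mentioned after the block is an arbitrary assumption rather than a documented threshold.

```bash
# Hypothetical heuristic: fraction of sequence letters that are NOT A/C/G/T/U/N.
# Nucleotide FASTA scores near 0; protein FASTA usually scores much higher.
non_nucleotide_fraction() {
  awk '
    !/^>/ {
      s = toupper($0)
      gsub(/[^A-Z]/, "", s)          # keep letters only
      total += length(s)
      gsub(/[ACGTUN]/, "", s)        # drop nucleotide and N letters
      other += length(s)
    }
    END { if (total == 0) print "0.000"; else printf "%.3f\n", other / total }
  ' "$1"
}
```

Running `non_nucleotide_fraction proteins.faa` on a real proteome should give a clearly non-zero value; a result close to 0 (say below 0.1, an assumed cutoff) suggests the file holds nucleotide sequences.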
--go-map
Optional protein-to-GO mapping table. It lets the GO terms of each transcript's best-hit protein be transferred to that transcript, producing transcript-space GMT/background files for enrichment.
Interpretation boundary
Homology evidence is not a manually curated gene model. Treat annotation and enrichment results as support for hypotheses, not as final proof of function.
Parameter reference
| Parameter | Required | Default | Meaning |
|---|---|---|---|
| --transcripts | yes | none | Assembled transcript FASTA to annotate. |
| --outdir | yes | none | Dedicated output directory. |
| --protein-db | recommended | none | Local protein FASTA for the DIAMOND search. Without it, only ORF prediction and the basic annotation structure are produced. |
| --go-map | optional | none | Protein ID to GO term mapping. Required if the flow should emit denovo_go.gmt and denovo_background.tsv. |
| --threads | no | 2 | Threads for the TransDecoder support steps and the DIAMOND search. |
| --min-orf-aa | no | 100 | Minimum predicted ORF length, in amino acids. Lower values retain more short ORFs; higher values reduce fragments. |
| --evalue | no | 1e-5 | DIAMOND e-value cutoff for retained hits. |
| --max-target-seqs | no | 1 | Number of target hits retained per query. The r1 report route is designed around best-hit summaries. |
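DIAMOND has its own --evalue and --max-target-seqs search options; that the flow's parameters of the same names are passed through to them is an assumption. The sketch below only illustrates what "best hit under an e-value cutoff" means if you re-filter a tabular hit file yourself. It assumes DIAMOND's default --outfmt 6 columns (evalue in column 11, bitscore in column 12); whether protein_hits.tsv uses that layout is not stated here, and the function name is hypothetical.

```bash
# Hypothetical re-filter: keep hits with evalue <= cutoff ($2, default 1e-5),
# then the highest-bitscore hit per query. Ties keep the first hit seen.
# ASSUMPTION: input is DIAMOND --outfmt 6 default columns
# (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
best_hits() {
  awk -F'\t' -v OFS='\t' -v max_e="${2:-1e-5}" '
    ($11 + 0) <= (max_e + 0) {
      if (!($1 in best) || $12 + 0 > score[$1]) { best[$1] = $2; score[$1] = $12 + 0 }
    }
    END { for (q in best) print q, best[q] }
  ' "$1" | sort
}
```

Output is one query<TAB>target line per query, e.g. `best_hits hits.tsv 1e-10` for a stricter cutoff.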
Key outputs
- 03_results/coding/longest_orfs.pep
- 03_results/coding/cds.fa and 03_results/coding/proteins.fa
- 03_results/annotation/protein_hits.tsv
- 03_results/annotation/transcript_annotation.tsv
- 03_results/annotation/id_mapping.tsv
- 03_results/gene_sets/denovo_go.gmt and denovo_background.tsv, when GO mapping is available
- 04_reports/annotation_summary.tsv, 04_reports/commands.sh, run.manifest.json
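Recent TransDecoder releases typically name predicted proteins `<transcript_id>.p<N>`, which is what makes a protein-to-transcript mapping recoverable from the IDs alone. A sketch assuming that naming convention; whether id_mapping.tsv is actually built this way is not stated here, and the function name is hypothetical.

```bash
# Hypothetical helper: read protein IDs on stdin (first token per line) and print
# "protein_id<TAB>transcript_id" by stripping a trailing ".p<N>" suffix.
# ASSUMPTION: IDs follow the TransDecoder "<transcript>.p<N>" convention;
# IDs without that suffix are passed through unchanged as their own transcript.
protein_to_transcript() {
  awk -v OFS='\t' '{ tx = $1; sub(/\.p[0-9]+$/, "", tx); print $1, tx }'
}
```

For example, `grep '^>' 03_results/coding/proteins.fa | cut -c2- | protein_to_transcript` would list each protein next to its inferred transcript, provided the headers follow the assumed convention.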
How it connects
The annotation table and ID mapping are read by the final report. If denovo_go.gmt and denovo_background.tsv exist, they can be passed to rnaseq-enrichment-flow or used automatically by rnaseq-standard-flow --mode denovo. In either case, the ID space must match the feature IDs in the DE results.
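A quick way to check the ID-space requirement before running enrichment is to count how many DE feature IDs appear in the background. This is a hypothetical sketch: it assumes the background has one ID per line in column 1 with no header, and the DE table is a TSV with the feature ID in column 1 and one header row; adjust to the real layouts.

```bash
# Hypothetical check: how many DE feature IDs appear in the background list?
# $1: background, ID in column 1, no header (assumed layout)
# $2: DE results TSV, feature ID in column 1, one header row (assumed layout)
id_overlap() {
  awk -F'\t' '
    FILENAME == ARGV[1] { bg[$1] = 1; next }
    FNR == 1 { next }                       # skip the DE header row
    { total++; if ($1 in bg) hit++ }
    END { printf "%d/%d DE features found in background\n", hit, total }
  ' "$1" "$2"
}
```

Partial overlap is expected if the background covers only annotated transcripts; zero or near-zero overlap usually means the two sides use different ID spaces, for example gene-level IDs in the DE table and transcript-level IDs in the gene sets.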