Inputs

Prepare analysis-ready inputs

The sample table names biological samples and points to local FASTQ files. The same table feeds reference expression, reference alignment, and de novo assembly/expression routes.

sample_id	read1	read2	condition
WT_01	reads/WT_01_R1.fq.gz	reads/WT_01_R2.fq.gz	control
KO_01	reads/KO_01_R1.fq.gz	reads/KO_01_R2.fq.gz	treated
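Before launching any route, it can help to confirm that the sample table is internally consistent. The following is a minimal sketch, not part of the pipeline: it assumes the table is tab-separated with the four columns shown above, and the function name `check_sample_table` and its `base_dir` argument are hypothetical.

```python
import csv
from pathlib import Path

# Column names taken from the example table above.
REQUIRED = ("sample_id", "read1", "read2", "condition")


def check_sample_table(path, base_dir="."):
    """Return a list of problems found in a tab-separated sample table."""
    problems = []
    seen = set()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]
        for row in reader:
            sid = row["sample_id"]
            # Each biological sample should appear exactly once.
            if sid in seen:
                problems.append(f"duplicate sample_id: {sid}")
            seen.add(sid)
            # FASTQ paths are resolved relative to base_dir (an assumption).
            for col in ("read1", "read2"):
                if not (Path(base_dir) / row[col]).is_file():
                    problems.append(f"{sid}: {col} not found: {row[col]}")
    return problems
```

An empty list means every sample ID is unique and every listed FASTQ file exists on disk.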

Metadata

Metadata drives the statistical design. Keep sample IDs identical to the expression matrix sample names and make the contrast explicit.

sample	condition	batch
WT_01	control	A
KO_01	treated	A
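The two requirements above (matching sample IDs and an explicit contrast) can be checked mechanically. This is a hedged sketch under stated assumptions: the metadata rows are already parsed into dictionaries with `sample` and `condition` keys, the contrast is written as a `(test_level, reference_level)` pair, and `check_design` is a hypothetical helper rather than a pipeline function.

```python
def check_design(metadata_rows, matrix_samples, contrast):
    """Compare metadata against expression matrix sample names.

    metadata_rows  -- list of dicts with 'sample' and 'condition' keys
    matrix_samples -- sample (column) names of the expression matrix
    contrast       -- (test_level, reference_level), e.g. ("treated", "control")
    """
    problems = []
    meta_ids = {row["sample"] for row in metadata_rows}
    matrix_ids = set(matrix_samples)
    # IDs must match exactly, including case and suffixes.
    for sid in sorted(meta_ids - matrix_ids):
        problems.append(f"in metadata but not in matrix: {sid}")
    for sid in sorted(matrix_ids - meta_ids):
        problems.append(f"in matrix but not in metadata: {sid}")
    # Both contrast levels must occur in the condition column.
    levels = {row["condition"] for row in metadata_rows}
    for level in contrast:
        if level not in levels:
            problems.append(f"contrast level not in metadata: {level}")
    return problems
```

With the example metadata above, `check_design(rows, ["WT_01", "KO_01"], ("treated", "control"))` returns an empty list.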

Reference genome and annotation

Use genome FASTA and annotation from the same release. Sequence IDs in the annotation must match the FASTA headers used by the index flow. These files are used only in reference mode.
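A quick way to catch mismatched releases (for example `chr1` in the FASTA but `1` in the annotation) is to compare the two sets of sequence IDs before indexing. The sketch below assumes a GTF or GFF-style annotation whose first tab-separated column is the sequence ID, and takes the FASTA ID to be the header text up to the first whitespace; the function names are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs: header text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def annotation_seqids(lines):
    """Collect column-1 sequence IDs from GTF/GFF-style lines."""
    ids = set()
    for line in lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(fasta_lines, annotation_lines):
    """Return annotation sequence IDs that have no FASTA header."""
    return sorted(annotation_seqids(annotation_lines) - fasta_ids(fasta_lines))
```

Both functions accept any iterable of lines, so open file handles can be passed directly. A non-empty result suggests that the FASTA and the annotation come from different releases or naming schemes.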

Gene sets and background

Reference-mode enrichment uses offline GMT files. A background gene list is recommended for ORA so the tested universe matches the experiment and annotation space.
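To illustrate why the background matters, the sketch below parses GMT lines (set name, description, then member genes, all tab-separated) and computes a one-sided hypergeometric p-value after restricting both the hit list and the gene set to the background. The hypergeometric test is a common choice for ORA; whether the pipeline uses exactly this statistic is not stated here, and the function names are hypothetical.

```python
from math import comb


def parse_gmt(lines):
    """Parse GMT lines into {set_name: set_of_genes}."""
    sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need name, description, and at least one gene
        sets[fields[0]] = set(fields[2:]) - {""}
    return sets


def ora_pvalue(hits, gene_set, background):
    """One-sided hypergeometric p-value, P(overlap >= observed)."""
    universe = set(background)
    # Genes outside the background are not part of the tested universe.
    hits = set(hits) & universe
    members = set(gene_set) & universe
    N, K, n = len(universe), len(members), len(hits)
    k = len(hits & members)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

Supplying a larger universe than the experiment can detect (for example, every annotated gene) inflates `N` and tends to make the same overlap look more significant, which is why a matched background is recommended.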

De novo protein database and GO map

De novo analysis has no known gene models from a genome annotation, so functional interpretation depends on homology resources chosen by the analyst. --protein-db should point to a local protein FASTA file. --go-map is a tabular mapping from protein IDs to GO IDs (two columns are sufficient); after best-hit annotation, it is used to build the transcript-space GMT and background files.

Protein FASTA

>P12345
MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPR...

GO map

protein_id	go_id
P12345	GO:0006412
P12345	GO:0005737
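The step from a GO map to transcript-space gene sets can be sketched as follows. This is an illustration under assumptions, not the pipeline's implementation: best hits are given as a hypothetical `{transcript_id: protein_id}` dictionary, the GMT description column is filled with the placeholder `NA`, and the background is taken to be every transcript that received at least one GO term.

```python
from collections import defaultdict


def read_go_map(lines):
    """Parse tab-separated 'protein_id<TAB>go_id' lines into {protein: {GO IDs}}."""
    mapping = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short lines and the header row shown in the example.
        if len(fields) < 2 or fields[0] == "protein_id":
            continue
        mapping[fields[0]].add(fields[1])
    return mapping


def build_gmt(best_hits, go_map):
    """Return (gmt_lines, background) in transcript space.

    best_hits -- {transcript_id: protein_id}, one best hit per transcript
    go_map    -- {protein_id: set of GO IDs}
    """
    term_to_tx = defaultdict(set)
    for transcript, protein in best_hits.items():
        # Transcripts inherit the GO terms of their best-hit protein.
        for go_id in go_map.get(protein, ()):
            term_to_tx[go_id].add(transcript)
    gmt_lines = [
        "\t".join([go_id, "NA", *sorted(transcripts)])
        for go_id, transcripts in sorted(term_to_tx.items())
    ]
    background = sorted({tx for txs in term_to_tx.values() for tx in txs})
    return gmt_lines, background
```

Transcripts whose best hit has no GO annotation drop out of both the GMT and the background, so the tested universe stays limited to what could have been annotated.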

These files are not raw sequencing outputs. They are biological reference resources, so their species, database version, license, and provenance should be recorded in the project notes.
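One way to keep this record is a small structured entry per resource file. The sketch below is only a suggestion: the field names are hypothetical, and the SHA-256 checksum is an addition beyond what is required above, included so that a later reader can confirm the file has not changed.

```python
import hashlib
import json


def provenance_record(path, species, database_version, license_name, source):
    """Build a provenance entry for one reference resource file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large FASTA files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file": str(path),
        "sha256": digest.hexdigest(),
        "species": species,
        "database_version": database_version,
        "license": license_name,
        "source": source,
    }


def to_notes(record):
    """Serialize a record as indented JSON for the project notes."""
    return json.dumps(record, indent=2, sort_keys=True)
```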
