A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
/home/kyhan/test/rna-seq/yeast-standard-out/03_results/alignment_qc
General Statistics
| Sample Name | 5'-3' bias | M Aligned | Exonic | Intronic | Intergenic | Overlapping Exon | Reads | Reads mapped | % Reads mapped |
|---|---|---|---|---|---|---|---|---|---|
| SNF2KO_01 | 1.25 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 97.5% |
| SNF2KO_02 | 1.82 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.2% |
| SNF2KO_03 | 2.54 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 97.1% |
| SNF2KO_04 | 1.07 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 98.6% |
| SNF2KO_05 | 1.17 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 97.0% |
| SNF2KO_06 | 1.70 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 96.9% |
| SNF2KO_07 | 1.64 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 96.8% |
| SNF2KO_08 | 1.70 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 96.8% |
| SNF2KO_09 | 2.11 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 97.1% |
| SNF2KO_10 | 3.11 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 97.1% |
| SNF2KO_11 | 1.12 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 96.5% |
| SNF2KO_12 | 1.00 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.5M | 97.3% |
| WT_01 | 3.23 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 96.9% |
| WT_02 | 1.51 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.4% |
| WT_03 | 3.66 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 96.5% |
| WT_04 | 1.17 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.5% |
| WT_05 | 1.93 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.3% |
| WT_06 | 1.57 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.8% |
| WT_07 | 1.82 | 0.5M | 0.4M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.4% |
| WT_08 | 2.32 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.4% |
| WT_09 | 10.15 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.6% |
| WT_10 | 1.60 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.2% |
| WT_11 | 6.38 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.5% |
| WT_12 | 1.84 | 0.5M | 0.3M | 0.0M | 0.0M | 0.0M | 0.6M | 0.6M | 97.0% |
QualiMap
2.3
Quality control of alignment data and its derivatives like feature counts. URL: http://qualimap.bioinfo.cipf.es. DOI: 10.1093/bioinformatics/btv566; 10.1093/bioinformatics/bts503
Genomic origin of reads
Classification of mapped reads as originating in exonic, intronic or intergenic regions. These can be displayed as either the number or percentage of mapped reads.
There are currently three main approaches to map reads to transcripts in an RNA-seq experiment: mapping reads to a reference genome to identify expressed transcripts that are annotated (and discover those that are unknown), mapping reads to a reference transcriptome, and de novo assembly of transcript sequences (Conesa et al. 2016).
For RNA-seq QC analysis, QualiMap can be used to assess alignments produced by the first of these approaches. For input, it requires a GTF annotation file along with a reference genome, which can be used to reconstruct the exon structure of known transcripts. This allows mapped reads to be grouped by whether they originate in an exonic region (for QualiMap, this may include 5′ and 3′ UTR regions as well as protein-coding exons), an intron, or an intergenic region (see the Qualimap 2 documentation).
The inferred genomic origins of RNA-seq reads are presented here as a bar graph showing either the number or percentage of mapped reads in each read dataset that have been assigned to each type of genomic region. This graph can be used to assess the proportion of useful reads in an RNA-seq experiment. That proportion can be reduced by the presence of intron sequences, especially if depletion of ribosomal RNA was used during sample preparation (Sims et al. 2014). It can also be reduced by off-target transcripts, which are detected in greater numbers at the sequencing depths needed to detect poorly-expressed transcripts (Tarazona et al. 2011).
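The switch between the number and the percentage view of this bar graph can be illustrated with a minimal sketch. The function name and the read counts below are hypothetical and are not taken from this report; the sketch only assumes that each mapped read has been assigned to exactly one region type.

```python
def origin_percentages(counts):
    """Convert per-region read counts into percentages of all classified reads."""
    total = sum(counts.values())
    if total == 0:
        # No classified reads: report zero for every region instead of dividing by zero.
        return {region: 0.0 for region in counts}
    return {region: 100.0 * n / total for region, n in counts.items()}


# Hypothetical counts for a single sample (illustrative values only).
example = {"exonic": 400_000, "intronic": 5_000, "intergenic": 20_000}
percentages = origin_percentages(example)
```

A low exonic percentage in such a summary would be the signal, described above, that the proportion of useful reads is reduced.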
Gene Coverage Profile
Mean distribution of coverage depth across the length of all mapped transcripts.
There are currently three main approaches to map reads to transcripts in an RNA-seq experiment: mapping reads to a reference genome to identify expressed transcripts that are annotated (and discover those that are unknown), mapping reads to a reference transcriptome, and de novo assembly of transcript sequences (Conesa et al. 2016).
For RNA-seq QC analysis, QualiMap can be used to assess alignments produced by the first of these approaches. For input, it requires a GTF annotation file along with a reference genome, which can be used to reconstruct the exon structure of known transcripts. QualiMap uses this information to calculate the depth of coverage along the length of each annotated transcript. For a set of reads mapped to a transcript, the depth of coverage at a given base position is the number of high-quality reads that map to the transcript at that position (Sims et al. 2014).
QualiMap calculates coverage depth at every base position of each annotated transcript. To enable meaningful comparison between transcripts, base positions are rescaled to relative positions expressed as percentage distance along each transcript (0%, 1%, …, 99%). For the set of transcripts with at least one mapped read, QualiMap plots the cumulative mapped-read depth (y-axis) at each relative transcript position (x-axis). This plot shows the gene coverage profile across all mapped transcripts for each read dataset. It provides a visual way to assess positional biases, such as an accumulation of mapped reads at the 3′ end of transcripts, which may indicate poor RNA quality in the original sample (Conesa et al. 2016).
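The rescaling step can be sketched as follows. This is an illustration of the idea rather than QualiMap's actual implementation: the function name, the integer binning rule, and the input layout (one list of per-base depths per transcript) are all assumptions.

```python
def coverage_profile(transcript_depths, n_bins=100):
    """Sum per-base depth into relative-position bins across all transcripts.

    transcript_depths: iterable of lists, one list of per-base depths per transcript.
    Returns a list of n_bins cumulative depths (bin 0 is the 5' end).
    """
    profile = [0.0] * n_bins
    for depths in transcript_depths:
        length = len(depths)
        if length == 0 or sum(depths) == 0:
            continue  # only transcripts with at least one mapped read contribute
        for pos, depth in enumerate(depths):
            # Assumed binning rule: map base position to a bin index in 0 .. n_bins - 1.
            bin_index = pos * n_bins // length
            profile[bin_index] += depth
    return profile


# Two hypothetical transcripts of different lengths, plus one with no coverage.
example_profile = coverage_profile([[1] * 200, [3] * 100, [0] * 50])
```

With this simple rule, transcripts shorter than `n_bins` leave some bins empty; a real implementation would need to decide how to handle that case.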
The Normalised plot is calculated by MultiQC to enable comparison of samples with varying sequencing depth. For each sample, the cumulative mapped-read depth at each relative transcript position is divided by that sample's total depth across all positions of the averaged transcript.
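The normalisation amounts to dividing each position by the sample total, as in the sketch below. The function name is hypothetical, and whether the result is shown as a fraction or scaled to a percentage is a presentation choice not specified here.

```python
def normalise_profile(profile):
    """Divide each position's cumulative depth by the sample's total depth."""
    total = sum(profile)
    if total == 0:
        # A sample with no coverage has no meaningful profile shape.
        return [0.0] * len(profile)
    return [value / total for value in profile]


# Two hypothetical samples with the same profile shape but different depth.
shallow = normalise_profile([1, 1, 2])
deep = normalise_profile([2, 2, 4])
```

Both samples yield the same normalised profile, which is what makes profiles from samples of different sequencing depth directly comparable.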
RSeQC
Evaluates high throughput RNA-seq data. URL: http://rseqc.sourceforge.net. DOI: 10.1093/bioinformatics/bts356
Read Distribution
Read Distribution calculates how mapped reads are distributed over genome features.
Infer experiment
Infer experiment counts the percentage of reads and read pairs that match the strandedness of overlapping transcripts. It can be used to infer whether RNA-seq library preps are stranded (sense or antisense).
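One way to turn the two reported fractions into a strandedness call is sketched below. The function name and the 0.8 threshold are assumptions made for illustration; they are not defined by RSeQC or MultiQC, and a suitable cut-off depends on the library protocol.

```python
def classify_strandedness(sense_fraction, antisense_fraction, threshold=0.8):
    """Classify a library from the fractions of reads matching each strand.

    sense_fraction: fraction of reads on the same strand as the overlapping transcript.
    antisense_fraction: fraction of reads on the opposite strand.
    threshold: hypothetical cut-off on the share of strand-determined reads.
    """
    determined = sense_fraction + antisense_fraction
    if determined == 0:
        return "undetermined"
    sense_share = sense_fraction / determined
    if sense_share >= threshold:
        return "sense"
    if sense_share <= 1 - threshold:
        return "antisense"
    return "unstranded"


# Hypothetical fractions: a roughly even split suggests an unstranded library.
call = classify_strandedness(0.49, 0.49)
```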
Bam Stat
All numbers reported in millions.
Samtools
Toolkit for interacting with BAM/CRAM files. URL: http://www.htslib.org. DOI: 10.1093/bioinformatics/btp352
Flagstat
This module parses the output from samtools flagstat.
Flagstat: Percentage of total
This module parses the output from samtools flagstat, with each count shown as a percentage of the total.
Mapped reads per contig
The samtools idxstats tool counts the number of mapped reads per chromosome / contig. Chromosomes with < 0.1% of the total aligned reads are omitted from this plot.
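The 0.1% filter can be sketched as below, assuming the tab-separated idxstats layout of reference name, sequence length, mapped count, and unmapped count, with a final `*` line for reads that have no reference. The function name and the example numbers are hypothetical, and this is not MultiQC's own code.

```python
def mapped_per_contig(idxstats_text, min_fraction=0.001):
    """Return mapped-read counts per contig, omitting contigs below min_fraction."""
    counts = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # reads not assigned to any reference sequence
        counts[name] = int(mapped)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep only contigs holding at least min_fraction (0.1%) of all mapped reads.
    return {name: n for name, n in counts.items() if n / total >= min_fraction}


# Hypothetical idxstats output (illustrative values only).
example_text = "chrI\t230218\t50000\t0\nchrII\t813184\t49990\t0\nchrM\t85779\t10\t0\n*\t0\t0\t1200\n"
kept = mapped_per_contig(example_text)
```

In this example `chrM` holds 0.01% of the mapped reads, so it would be left out of the plot.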
Software Versions
Software Versions lists versions of software tools extracted from file contents.
| Group | Software | Version |
|---|---|---|
| QualiMap | RNASeq | 2.3 |