Genome assembly with SPAdes

Slide 1: Genome assembly with SPAdes
Center for Algorithmic Biotechnology
SPbU

Slide 2: Introduction

Slides 3-5: Why assemble?
Sequencing data
Billions of short reads
Sequencing errors
Contaminants
Hard to analyze directly

Assembly
Corrects sequencing errors
Much longer sequences
Each genomic region is represented only once
May introduce errors

Slide 6: Assembly basics

Slide 7: Assembly in a perfect world

Slide 8: Assembly in the real world

Slides 9-10: De novo whole genome assembly

Slide 11: Genomic repeats
Genome:
TATTCTTCCACGTAGGGCCTTCCACGCTTCG

Slide 12: Genomic repeats
Reads:
TATTCTTC
CTTCCACG
CACGTAGG
GGCCTTCC
CTTCCACG
CACGCTTCG
Genome:
TATTCTTCCACGTAGGGCCTTCCACGCTTCG

Slide 13: Genomic repeats
Reads:
TATTCTTC
CTTCCACG
CACGTAGG
GGCCTTCC
CTTCCACG
CACGCTTCG

Slide 14: Genomic repeats
TATTCTTCCACGTAGG
GGCCTTCCACGCTTCG

TATTCTTCCACGCTTCG
GGCCTTCCACGTAGG
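The two pairs of sequences on slide 14 illustrate why repeats are hard: both pairs are consistent with the same short reads. The sketch below checks this with a plain substring test. The function name `explains_all_reads` is made up for illustration, and a substring test is only a stand-in for what a real assembler does.

```python
# Reads from slides 12-13; the read CTTCCACG comes from a repeat and occurs twice.
READS = ["TATTCTTC", "CTTCCACG", "CACGTAGG", "GGCCTTCC", "CTTCCACG", "CACGCTTCG"]

# The two candidate sets of sequences shown on slide 14.
CANDIDATE_A = ["TATTCTTCCACGTAGG", "GGCCTTCCACGCTTCG"]
CANDIDATE_B = ["TATTCTTCCACGCTTCG", "GGCCTTCCACGTAGG"]


def explains_all_reads(contigs, reads):
    """Return True if every read occurs as a substring of at least one contig."""
    return all(any(read in contig for contig in contigs) for read in reads)


if __name__ == "__main__":
    print("Candidate A explains all reads:", explains_all_reads(CANDIDATE_A, READS))
    print("Candidate B explains all reads:", explains_all_reads(CANDIDATE_B, READS))
```

Since both candidates pass the check, reads of this length alone cannot tell which reconstruction is the real genome.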

Slide 15: Genomic repeats
Longer reads:
TATTCTTCCACGTAGG
ACGTAGGGCCTT
GCCTTCCACGCTTCG
Genome:
TATTCTTCCACGTAGGGCCTTCCACGCTTCG

Slide 16: Genomic repeats
Longer reads:
TATTCTTCCACGTAGG
ACGTAGGGCCTT
GCCTTCCACGCTTCG

Slide 17: SPAdes assembler

Slides 18-20: SPAdes first steps
spades.py
spades.py --help
spades.py --test
-o

Slide 21: Input data formats
FASTA: .fasta / .fa
FASTQ: .fastq / .fq
Gzipped: .gz

Slide 22: Input data options
Unpaired reads
Illumina unpaired
-s single.fastq
-s single1.fastq -s single2.fastq ...

Slide 23: Input data options
Paired-end reads
Interlaced pairs in one file
>left_read_id
ACGTGCAGG…
>right_read_id
GCTTCGAGG…

Separate files
file1.fastq        file2.fastq
>left_read_id      >right_read_id
ACGTGCAGG…         GCTTCGAGG…
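The two layouts on slide 23 hold the same information. As a rough illustration, the sketch below parses a small interlaced FASTA text and splits it into left and right reads. The helper names `parse_fasta` and `split_interlaced` are invented for this example, and it assumes the reads strictly alternate left, right, left, right.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (read_id, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # sequences may span several lines
        else:
            raise ValueError("sequence data found before the first header")
    return [(read_id, seq) for read_id, seq in records]


def split_interlaced(records):
    """Split alternating left/right records into two parallel lists."""
    if len(records) % 2 != 0:
        raise ValueError("interlaced input must contain an even number of reads")
    return records[0::2], records[1::2]


if __name__ == "__main__":
    text = ">left_read_id\nACGTGCAGG\n>right_read_id\nGCTTCGAGG\n"
    left, right = split_interlaced(parse_fasta(text))
    print(left, right)
```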

Slides 24-25: Input data options
Paired-end reads
Interlaced pairs in one file
--pe1-12 file.fastq

Separate files
--pe1-1 file1.fastq --pe1-2 file2.fastq
--pe1-s unpaired.fastq

Slide 26: SPAdes performance options
Number of threads
-t N
Maximal available RAM (GB); SPAdes will terminate if it is exceeded
-m M

Slide 27: Pipeline options
Run only the assembler (input reads are already corrected or quality-trimmed)
--only-assembler
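The options from slides 20 and 24-27 can be combined into a single command line. The sketch below only builds the argument list and does not run SPAdes. The function `build_spades_command` and its parameter names are hypothetical, while the flags themselves are the ones shown on the slides.

```python
def build_spades_command(output_dir, left=None, right=None, interlaced=None,
                         unpaired=None, threads=None, memory_gb=None,
                         only_assembler=False):
    """Assemble a spades.py argument list from the options shown on the slides."""
    if bool(left) != bool(right):
        raise ValueError("left and right read files must be given together")
    cmd = ["spades.py", "-o", output_dir]
    if interlaced:
        cmd += ["--pe1-12", interlaced]            # interlaced pairs in one file
    if left and right:
        cmd += ["--pe1-1", left, "--pe1-2", right]  # pairs in separate files
    if unpaired:
        cmd += ["--pe1-s", unpaired]                # unpaired reads of the same library
    if threads is not None:
        cmd += ["-t", str(threads)]
    if memory_gb is not None:
        cmd += ["-m", str(memory_gb)]
    if only_assembler:
        cmd.append("--only-assembler")
    return cmd


if __name__ == "__main__":
    cmd = build_spades_command("out", left="file1.fastq", right="file2.fastq",
                               threads=8, memory_gb=32)
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where SPAdes is installed. The thread count and memory limit above are arbitrary example values.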

Slide 28: Input data options
Mate-pair reads
Cannot be used on their own
Interlaced pairs in one file
--mp1-12 mp.fastq
Separate files
--mp1-1 mp1.fastq --mp1-2 mp2.fastq

Slide 29: Hybrid assembly options
PacBio CLR
--pacbio pb.fastq
Oxford Nanopore reads
--nanopore nanopore_reads.fastq

Slide 30: Restarting SPAdes
If SPAdes or the system crashed:
--continue -o your_output_dir

Slide 31: Genome assembly evaluation with QUAST
Center for Algorithmic Biotechnology
SPbU

Slide 32: In reality
SPAdes
ABySS
IDBA
Ray
Velvet
…

Slide 33: Which assembler to use?
ABySS
ALLPATHS-LG
CLC
IDBA-UD
MaSuRCA
MIRA
Ray
SOAPdenovo
SPAdes
Velvet
and many more...

Slide 34: Which assembler to use?
Different technologies (Illumina, 454, IonTorrent, ...)
Genome type and size (bacteria, insects, mammals, plants, ...)
Type of prepared libraries (single reads, paired-end, mate-pairs, combinations)
Type of data (multicell, metagenomic, single-cell)

Slide 35: There is no best assembler

Slide 36: Which assembler to use?
Assemblathon 1 & 2
Simulated and real datasets
More than 30 teams competing

Independent studies
Papers (GAGE, GAGE-B, GABenchToB)
Websites (nucleotid.es, …)
Surveys

Genome assembly evaluation tools
QUAST
GAGE

Slide 37: Assembly evaluation
Basic evaluation
No extra input
Very quick

Reference-based evaluation
A lot of metrics
Very accurate

De novo evaluation
Advanced analysis of de novo assemblies

Slide 38: Basic statistics
Only assemblies are needed (no additional input)
Very fast to compute

Slides 39-42: Contig sizes
Number of contigs
Number of large contigs (i.e. > 1000 bp)
Largest contig length
Total assembly length
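These four numbers need only the list of contig lengths. A minimal sketch follows, assuming the lengths have already been read from the assembly file. The function name and dictionary keys are made up for this example, and the 1000 bp threshold follows the slide.

```python
def contig_size_stats(lengths, large_threshold=1000):
    """Compute the basic contig size statistics listed on the slide."""
    return {
        "num_contigs": len(lengths),
        # "large" means strictly longer than the threshold, as on the slide
        "num_large_contigs": sum(1 for n in lengths if n > large_threshold),
        "largest_contig": max(lengths, default=0),
        "total_length": sum(lengths),
    }


if __name__ == "__main__":
    # Hypothetical contig lengths in bp.
    print(contig_size_stats([5000, 1200, 1000, 300]))
```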

Slides 43-50: N50
The maximum length X for which the collection of all contigs of length >= X covers at least 50% of the assembly

N50 = 60

Slides 51-52: L50
The minimum number X such that the X longest contigs cover at least 50% of the assembly

L50 = 3

Slides 53-56: N50 variations
N25, N75
L25, L50, L75
Nx, Lx

N25 = 100, N75 = 40
L25 = 1, L75 = 5
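The definitions of N50 and L50 generalize directly to Nx and Lx. The sketch below computes both in one pass over the contigs sorted from longest to shortest. The contig lengths in the example are hypothetical: the figure with the real lengths did not survive in this transcript, so they were picked only because they reproduce the values quoted on the slides (N50 = 60, L50 = 3, N25 = 100, N75 = 40, L25 = 1, L75 = 5).

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of contig lengths and a percentage x."""
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    covered = 0
    for count, length in enumerate(ordered, start=1):
        covered += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= total * x:
            return length, count
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Hypothetical lengths chosen to match the numbers on the slides.
    lengths = [100, 80, 60, 40, 40, 30, 30, 20]
    for x in (25, 50, 75):
        n, l = nx_lx(lengths, x)
        print(f"N{x} = {n}, L{x} = {l}")
```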

Slides 57-59: Other
Number of N's per 100 kbp
GC %
Distributions of GC % in small windows:
GC=37
GC=44
GC=41
GC=...
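All three statistics can be computed from the assembled sequence alone. The sketch below is one possible reading of them, not necessarily how QUAST computes them. It assumes that GC % is taken over A, C, G and T only, and that windows are non-overlapping with a trailing partial window dropped. All function names are made up.

```python
def gc_percent(seq):
    """GC content in percent, ignoring characters other than A, C, G, T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def ns_per_100kbp(seq):
    """Number of N characters, scaled to a sequence length of 100 kbp."""
    if not seq:
        return 0.0
    return 100000.0 * seq.upper().count("N") / len(seq)


def windowed_gc(seq, window):
    """GC content of consecutive non-overlapping windows of the given size."""
    return [gc_percent(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


if __name__ == "__main__":
    print(gc_percent("GGCCAATT"))
    print(ns_per_100kbp("ACGTNACGTN"))
    print(windowed_gc("GGGGAAAA", 4))
```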

Slide 60: Other

Slide 61: Reference-based metrics
A lot of metrics
Accurate assessment

Slide 62: Basic reference statistics
Reference length
Reference GC %
Number of chromosomes

Slides 63-65: Basic reference statistics
NGx, LGx

NG50 = 40
LG50 = 4
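NGx and LGx are computed like Nx and Lx, except that the threshold is a percentage of the reference genome length instead of the assembly length. The sketch below assumes this standard definition. The contig lengths and the reference length of 500 are hypothetical values picked so that the result matches the NG50 = 40 and LG50 = 4 quoted on the slide.

```python
def ngx_lgx(lengths, reference_length, x=50):
    """Return (NGx, LGx), or None if the assembly covers less than x% of the reference."""
    ordered = sorted(lengths, reverse=True)
    covered = 0
    for count, length in enumerate(ordered, start=1):
        covered += length
        # The threshold depends on the reference length, not the assembly length.
        if covered * 100 >= reference_length * x:
            return length, count
    return None


if __name__ == "__main__":
    # Hypothetical contig lengths and reference length.
    lengths = [100, 80, 60, 40, 40, 30, 30, 20]
    print(ngx_lgx(lengths, reference_length=500))
```

Returning `None` when the assembly is too short is a design choice of this sketch: in that case no contig length reaches the threshold, so the metric is undefined.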

Slides 66-67: Alignment statistics
(Figure: an assembly aligned to a reference genome)

Slides 68-74: Alignment statistics
Genome fraction %
Duplication ratio
Number of gaps
Largest alignment length
Number of unaligned contigs (full & partial)
Number of mismatches/indels per 100 kbp
Number of genes/operons (full & partial)
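Two of these statistics can be illustrated with interval arithmetic on the reference coordinates of the alignments. The sketch below is a simplified reading, not QUAST's implementation. It assumes a single reference sequence and half-open `(start, end)` intervals, takes genome fraction as the covered share of the reference, and takes duplication ratio as aligned bases divided by covered reference bases. All names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def alignment_stats(alignments, reference_length):
    """Compute simple alignment statistics from reference intervals."""
    covered = sum(end - start for start, end in merge_intervals(alignments))
    aligned = sum(end - start for start, end in alignments)
    return {
        "genome_fraction_percent": 100.0 * covered / reference_length,
        "duplication_ratio": aligned / covered if covered else 0.0,
        "largest_alignment": max((end - start for start, end in alignments), default=0),
    }


if __name__ == "__main__":
    # Hypothetical alignments on a reference of length 1000.
    print(alignment_stats([(0, 400), (300, 600), (800, 1000)], 1000))
```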

Slides 75-76: Misassemblies
(Figure: a contig aligned to a reference genome with Chromosome 1 and Chromosome 2)
Relocation (> 1 kbp)
Inversion
Translocation
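The three misassembly types can be told apart by comparing the two alignments that flank a breakpoint in a contig. The sketch below is a deliberately simplified classifier and not QUAST's actual logic. It assumes that different chromosomes mean a translocation, different strands mean an inversion, and a gap or overlap of more than 1 kbp on the reference means a relocation. It also ignores the order of the alignments along the reference. The `Alignment` class and `classify_breakpoint` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    chromosome: str
    ref_start: int  # half-open reference interval [ref_start, ref_end)
    ref_end: int
    strand: str     # "+" or "-"


def classify_breakpoint(left, right, max_distance=1000):
    """Classify the junction between two flanking alignments of one contig."""
    if left.chromosome != right.chromosome:
        return "translocation"
    if left.strand != right.strand:
        return "inversion"
    # Positive value: gap between the intervals; negative value: overlap.
    distance = max(left.ref_start, right.ref_start) - min(left.ref_end, right.ref_end)
    if abs(distance) > max_distance:
        return "relocation"
    return None  # close enough to count as a correct junction in this sketch


if __name__ == "__main__":
    a = Alignment("chr1", 0, 5000, "+")
    print(classify_breakpoint(a, Alignment("chr1", 9000, 12000, "+")))
    print(classify_breakpoint(a, Alignment("chr2", 0, 3000, "+")))
```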

Slide 77: There is no best metric
NB!

Slides 78-81: NA50
(Figure: Assembly A and Assembly B aligned to the reference genome; contig lengths 200 and 100)

Assembly A
N50 = 200
# misassemblies = 2
NA50 = 100

Assembly B
N50 = 100
# misassemblies = 0
NA50 = 100
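NA50 can be thought of as N50 computed after every contig is broken at its misassembly points. The sketch below follows that idea under a simplifying assumption: every contig aligns completely to the reference, so only the breakpoints matter. The example assemblies are hypothetical and were built only to reproduce the numbers on the slide. The function names are made up.

```python
def n50(lengths):
    """N50 of a list of lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    covered = 0
    for length in ordered:
        covered += length
        if covered >= half:
            return length
    raise ValueError("no contigs given")


def break_contig(length, breakpoints):
    """Split a contig of the given length at its misassembly positions."""
    cuts = [0] + sorted(breakpoints) + [length]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


def na50(contigs):
    """N50 over the blocks obtained by breaking contigs at misassemblies.

    contigs is a list of (length, misassembly_positions) pairs.
    """
    blocks = []
    for length, breakpoints in contigs:
        blocks.extend(break_contig(length, breakpoints))
    return n50(blocks)


if __name__ == "__main__":
    # Hypothetical data: A has two long contigs, each with one misassembly.
    assembly_a = [(200, [100]), (200, [100])]
    assembly_b = [(100, []), (100, []), (100, []), (100, [])]
    for name, asm in (("A", assembly_a), ("B", assembly_b)):
        print(name, "N50 =", n50([length for length, _ in asm]), "NA50 =", na50(asm))
```

In this example Assembly A looks twice as good by N50, but the two assemblies are equal by NA50.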

Slide 82: QUality ASsessment Tool for Genome Assemblies

Slide 83: QUAST
Assembly statistics
Basic statistics
Reference-based evaluation
Simple de novo evaluation

Available as a web-based and a command-line tool
quast.sf.net

Slides 84-85: QUAST basics
quast.py
quast.py --help
quast.py contigs.fasta
quast.py [options] contigs.fasta
quast.py -o out_dir contigs.fasta

Slide 86: Reference options
Reference genome
-R reference.fasta
Gene annotation
-G genes.gff
Operon annotation
-O operons.gff
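The QUAST options from slides 85-86 can be combined in the same way as the SPAdes ones. The sketch below only builds the argument list and does not run QUAST. The function `build_quast_command` is hypothetical, and the flags are spelled exactly as on the slides. Other QUAST releases may spell them differently (for example a lowercase `-r` for the reference), so it is worth checking `quast.py --help` for the installed version.

```python
def build_quast_command(contigs, output_dir=None, reference=None,
                        genes=None, operons=None):
    """Assemble a quast.py argument list from the options shown on the slides."""
    cmd = ["quast.py"]
    if output_dir:
        cmd += ["-o", output_dir]
    if reference:
        cmd += ["-R", reference]   # flag spelling taken from the slides
    if genes:
        cmd += ["-G", genes]
    if operons:
        cmd += ["-O", operons]
    cmd += list(contigs)           # one or more assemblies to evaluate
    return cmd


if __name__ == "__main__":
    print(" ".join(build_quast_command(["contigs.fasta"], output_dir="out_dir",
                                       reference="reference.fasta")))
```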

Slide 87: QUAST output
Reports in different formats
Plain text table
Tab-separated values (Excel, Google Spreadsheets)
Interactive HTML

Plots (PDF/PNG/SVG)
Nx, NGx, NAx
Genes
Cumulative length

Interactive contig viewers (Icarus)
Contig alignment viewer
Contig size viewer

Slides 88-89: Contig alignment viewer
All alignments for each contig
Misassembly details
Contig ordering along the genome
Overlaps / gaps

Slides 90-91: Contig size viewer
Contigs ordered from longest to shortest
N50, N75 (NG50, NG75)
Filtering by contig size
Gene prediction results
Available without a reference

Slide 92: De novo evaluation

Slides 93-94: Read-based statistics
Number of aligned/unaligned reads
% of assembly covered by reads

Points with low coverage
Points with multiple read clipping
Points with incorrect insert sizes

Slides 95-97: Annotation-based statistics
Number of ORFs

Number of gene/operon-like regions
GeneMarkS (Borodovsky et al.)
GlimmerHMM (Majoros et al.)

Number of conserved genes
BUSCO (Simão et al.)
CEGMA (Korf et al., no longer supported)

Slide 98: Thank you!
Questions?
