Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, i.e., predominantly around 10 Kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiency of the long reads in these aspects using isogenic C. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic region. Second, the reads are able to reliably detect missing but not extra sequences in the C. Third, the reads of smaller size are more capable of recovering repetitive sequences than those of bigger size. Fourth, at least 40 Kbp missing genomic sequences are recovered in the C. Finally, an N50 contig size of at least 86 Kbp can be achieved with 24× reads but with substantial mis-assembly errors, highlighting a need for novel assembly algorithms for the long reads.
High-throughput sequencing technology, also referred to as Next Generation Sequencing (NGS), has transformed biomedical research from genetics to developmental biology. The capacity to generate a large volume of sequencing reads in a short period of time enables genome assembly, genotyping, expression profiling and systematic identification of DNA binding sites in a way that would otherwise be difficult or impractical. One critical issue associated with all NGS platforms is the read length, on top of read throughput and accuracy. The relatively short length of sequencing reads produced by most NGS platforms [1] limits their use, particularly in genome assembly and finishing. For example, ambiguity often remains when the short reads are mapped against a reference genome or against one another, which is further complicated by the accuracy of the read sequence. The short length also makes variation calling and de novo genome assembly problematic. Despite the substantial efforts made in the past decade to increase the read length, for example from 22 bp to up to 300 bp on the Illumina platform [2], these lengths are still unsatisfactory for many applications, including de novo genome assembly, genome gap finishing and identification of complex structural variations in a draft genome. Therefore, a tradeoff has to be made across different NGS platforms to balance read length and yield. For example, the 454 platform can produce reads of up to 1 Kbp, which is useful in resolving genomic gaps [3]. This tradeoff has also catalyzed third-generation sequencing, a term coined for sequencing methods capable of producing reads of unusual length. One example is single molecule real-time (SMRT) sequencing from PacBio, which is able to generate sequencing reads of up to 30 Kbp and has been demonstrated to be useful in resolving complex genomic regions [4]. However, most of the single-pass reads suffer from a high error rate of up to 15-18% [5] and thus need to be corrected before being used for genome assembly and other applications. The high false positive rate of indels (insertions and deletions) also hinders the use of PacBio reads in variation calling [5].
Illumina has recently released Synthetic Long-Read technology, which allows construction of synthetic long reads from the short sequencing reads generated with its existing HiSeq platform. The surprisingly long read length, together with its high accuracy, is poised to affect de novo genome assembly or gap finishing in a draft genome. This technology, also known as Moleculo, has been demonstrated through de novo genome assembly of Botryllus schlosseri, a star ascidian [6], and whole-genome haplotyping of humans [7]. The long reads have also been used to resolve a subset of highly repetitive transposons in the genome of Drosophila melanogaster [8]. However, a systematic evaluation of the reads' ability to resolve repetitive sequences and identify complex genomic variations remains to be seen. In particular, the deficiency of the reads in genome finishing and de novo genome assembly has not been characterized.
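For readers unfamiliar with the N50 contig size quoted in the summary above, the following is a minimal sketch of how that statistic is commonly computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative assumptions, not taken from the study or from any particular assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # total defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in Kbp, for illustration only.
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

Note that N50 only summarizes contiguity. It says nothing about correctness, which is why a large N50 can coexist with substantial mis-assembly errors.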