Long-read sequencing of Eucalyptus tree genomes
Eucalypt trees are foundation species that have important roles in regenerative agriculture, ecosystem restoration and carbon sequestration. The phenotypic diversity of eucalypts is tremendous, with >800 recognized species; however, their genetic divergence is largely unknown. Current draft genomes can consist of thousands to hundreds of thousands of contigs rather than chromosomes, and contain misassemblies, gaps and errors. Without quality genomes, large-scale genomic studies are limited in output and accuracy. To sequence Eucalyptus genomes, we adopted the portable MinION sequencer from Oxford Nanopore Technologies. As a native DNA molecule passes through a nanopore, changes in electrical current are measured to determine the nucleotide sequence. This enables read lengths of 1 kb to 1 Mb and can identify epigenetic marks. As long reads can span repetitive and duplicated regions, it is becoming possible to resolve complex genomes, including polyploid plant genomes.
For long-read sequencing, a large quantity of high-quality, intact DNA is required. First, we optimised a density gradient and detergent-based nuclei extraction to limit reads from high copy number plastid genomes. Second, we optimised a gentle high molecular weight gDNA extraction that avoids columns and high-speed centrifugation. The latest protocol developments can be found on Protocols.io: Jones, A. and Borevitz, J. (2018). Nuclear DNA purification from recalcitrant plant species for long-read sequencing.
Nuclear DNA purification yielded intact high molecular weight DNA, with fragments predominantly 20-140 kb in length (Figure 2). From native DNA libraries, we have reproducibly obtained >9 gigabases of sequence from a single MinION flow cell, with read length N50 values reaching >30 kb (Table 1 and Figure 3). This includes reliably obtaining reads over 200 kb in length; with further optimisation, we envisage longer reads and a higher N50.
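For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the length L such that reads (or contigs) of length L or longer contain at least half of all sequenced bases. The function name `n50` and the example lengths are illustrative assumptions, not data or code from this project.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half of
        # all bases is the N50.
        if running >= half_total:
            return length


# Hypothetical read lengths in bases, for illustration only.
example_read_lengths = [80_000, 70_000, 50_000, 40_000, 30_000, 20_000, 10_000]
print(n50(example_read_lengths))
```

The same calculation applies to assembly contigs; only the input lengths change.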
Using the de novo assembler Canu, we created a draft genome for E. melliodora (Figure 4). The assembly comprised 1,133 contigs with an N50 of 1.22 Mb, and the largest contig reached 4.72 Mb. In comparison, the published E. grandis reference, created from short-read Sanger sequencing, is highly fragmented, with 4,951 scaffolds consisting of 32,724 contigs.
Long-read sequences are approaching the contig and scaffold sizes of published draft genomes generated from short-read data. Accordingly, long-read sequencing has the potential to create high-quality genomes, resolving chromosomes from telomere to telomere without the need for scaffolding.
Sequencing data and reference genomes generated in this project are being made publicly available. Eucalyptus sequencing reads are available on the Sequence Read Archive (SRA) under BioProject PRJNA509734. Other work on Acacia genomes is available under BioProject PRJNA510265.
Tissue used for creating draft genomes was kindly provided by the Australian National Botanic Gardens, Canberra, Australia. Thank you to Tom North and Caroline Chong for their support.
We also thank Dean Nicolle, owner of the Currency Creek Arboretum, South Australia, for providing samples and support for this project.
This research is funded by The Australian Research Council Centre of Excellence in Plant Energy Biology and an Australian Research Council Discovery Grant.