Long-read sequencing of Eucalyptus tree genomes

Figure 1. Collecting over 5,000 Eucalyptus accessions at Currency Creek Arboretum, South Australia. Image by Helen Bothwell.

Introduction

Eucalypt trees are foundation species with important roles in regenerative agriculture, ecosystem restoration and carbon sequestration. Eucalypts are phenotypically diverse, with more than 800 recognized species; however, the genetic divergence between them is largely unknown. Current draft genomes can consist of thousands to hundreds of thousands of contigs rather than chromosome-scale sequences, and they contain misassemblies, gaps and errors. Without high-quality genomes, large-scale genomic studies are limited in output and accuracy. To sequence Eucalyptus genomes, we adopted the portable MinION sequencer from Oxford Nanopore Technologies. As a native DNA molecule passes through a nanopore, changes in electrical current are measured to determine the nucleotide sequence. This enables read lengths from 1 kb to 1 Mb and allows epigenetic marks to be identified. Because long reads can span repetitive and duplicated regions, it is becoming possible to resolve complex genomes, including polyploid plant genomes.

Methods

Long-read sequencing requires a large quantity of high-quality, intact DNA. First, we optimised a density-gradient and detergent-based nuclei extraction to limit reads from high-copy-number plastid genomes. Second, we optimised a gentle high molecular weight gDNA extraction that avoids columns and high-speed centrifugation. The latest version of the protocol can be found on Protocols.io: Jones, A. and Borevitz, J. (2018). Nuclear DNA purification from recalcitrant plant species for long-read sequencing.

Results

Nuclear DNA purification yielded intact high molecular weight DNA, with fragments predominantly 20-140 kb in length (Figure 2). Using native DNA libraries, we have reproducibly obtained >9 gigabases of sequence from a single MinION flow cell, with read length N50 values reaching >30 kb (Table 1 and Figure 3). Reads over 200 kb in length are reliably obtained, and with further optimisation we expect longer reads and a higher N50.
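Read length N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases. As an illustration only, the Python sketch below computes total yield, read N50 and the number of reads over 200 kb from the sequences in a FASTQ file. The function names are hypothetical, it assumes uncompressed four-line FASTQ records, and it is not necessarily how the values in Table 1 were computed.

```python
from typing import Iterable, List


def fastq_read_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each read in a FASTQ stream.

    Assumption: standard four-line records (header, sequence, '+', quality).
    """
    lengths = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            lengths.append(len(line.strip()))
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that reads >= L hold at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def run_summary(lengths: List[int], long_read_cutoff: int = 200_000) -> dict:
    """Summarise one flow cell run (hypothetical helper)."""
    return {
        "reads": len(lengths),
        "yield_gb": sum(lengths) / 1e9,  # total bases, in gigabases
        "read_n50": n50(lengths),
        "reads_over_cutoff": sum(1 for x in lengths if x > long_read_cutoff),
    }
```

For a run, something like `run_summary(fastq_read_lengths(open("reads.fastq")))` would apply it, where `reads.fastq` is a placeholder file name. Weighting by bases rather than by read count is what makes N50 sensitive to the longest reads, which is why it rises as DNA integrity improves.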

Figure 2. Analysis of DNA quality, with Eucalyptus melliodora shown as a representative example. (A) 50 ng of DNA separated by electrophoresis on a 1% agarose gel. (B) 300 ng of DNA separated by pulsed-field gel electrophoresis. (C) Spectrophotometer results from a Thermo Scientific NanoDrop 1000. Figure generated by Ashley Jones.



Table 1. Sequencing results per sample, each from a single Oxford Nanopore MinION flow cell (FLO-MIN106 R9.4.1 revC), except for the first entry, which shows the results of a PromethION flow cell run at a sequencing facility. For ligation-kit libraries, the first number is the initial DNA input and the second is the amount recovered after all clean-up steps. Table generated by Ashley Jones.


Figure 3. Read length histograms generated while running an Oxford Nanopore MinION. An E. melliodora DNA prep was processed with rapid transposase (A) and end-ligation (B) library preps. (C) E. albens processed with a ligation library prep. Insets show pore usage. Figure generated by Ashley Jones.

Using the de novo assembler Canu, we created a draft genome for E. melliodora (Figure 4). The assembly comprised 1,133 contigs, with an N50 of 1.22 Mb and a largest contig of 4.72 Mb. In comparison, the published E. grandis reference, created from comparatively short Sanger reads, is highly fragmented, with 4,951 scaffolds consisting of 32,724 contigs.
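The assembly statistics quoted above (contig count, N50 and largest contig) can be recomputed from an assembly FASTA file. Below is a minimal Python sketch, assuming a plain-text FASTA whose sequences may be wrapped over several lines. The function names are hypothetical, and this is independent of any report Canu itself produces.

```python
from typing import Iterable, List


def fasta_contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each FASTA record, in file order."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # header line: start a new contig
        elif lengths:
            lengths[-1] += len(line)  # sequence may be wrapped over several lines
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Smallest contig length L such that contigs >= L hold at least half the assembly."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def assembly_summary(lengths: List[int]) -> dict:
    """Contig count, total size, N50 and largest contig (hypothetical helper)."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50": n50(lengths),
        "largest": max(lengths, default=0),
    }
```

Applied to a contig FASTA (for example `assembly_summary(fasta_contig_lengths(open("contigs.fasta")))`, with a placeholder file name), this gives the kind of figures used to compare a long-read assembly against a fragmented reference.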

Figure 4. Density plot of contig sizes of E. melliodora assembled with Oxford Nanopore long reads, compared to the publicly available scaffolds of E. grandis, which was sequenced with Sanger technology. Dotted lines represent the N50 values. Figure generated by Scott Ferguson.

Conclusions

Long-read sequences are approaching the contig and scaffold sizes of published draft genomes generated from short-read data. Long-read sequencing therefore has the potential to create high-quality genomes, resolving chromosomes from telomere to telomere without the need for scaffolding.

Data Access

Sequencing data and reference genomes generated in this project are being made publicly available. Eucalyptus sequencing reads are available on the Sequence Read Archive (SRA) under BioProject PRJNA509734. Other work on Acacia genomes is available under BioProject PRJNA510265.

Acknowledgements

Tissue used for creating draft genomes was kindly provided by the Australian National Botanic Gardens, Canberra, Australia. Thank you to Tom North and Caroline Chong for their support.

We also thank Dean Nicolle, owner of the Currency Creek Arboretum, South Australia, for providing samples and support for this project.

This research is funded by The Australian Research Council Centre of Excellence in Plant Energy Biology and an Australian Research Council Discovery Grant.