PROJECT TITLE: Shuffling the deck: The impact of transposon variation on the evolvability of grasses

AIMS AND BACKGROUND

The continued development of advanced plant germplasm to feed and provision our world requires an understanding of the heritable underpinnings of agronomically valuable traits. To date, this has often been done by associating genetic markers such as Single Nucleotide Polymorphisms (SNPs) with measured phenotypes. Although this has led to some success, many agronomically relevant, heritable phenotypes are still not adequately described using these markers alone. This ‘missing heritability’ (Eichler et al., 2010; Brachi et al., 2011) has been attributed partly to the activity of transposable elements (TEs) within the genome (Dubin et al., 2015; Stuart et al., 2016). Transposons, originally discovered in the cereal Zea mays, are mobile units of DNA that can ‘transpose’ throughout the genome, leading to positional rearrangements and/or entire copies of elements being placed in other areas of an organism’s genome. TEs have been implicated as a major source of genetic variability that can rapidly evolve in particular genetic lineages and be stimulated by stressful environmental conditions (Vitte et al., 2014; Rey et al., 2016). Although extremely common within many crop genomes, their repetitive nature and ability to mobilise have made them extremely difficult targets to profile accurately, even with past advances in sequencing technology. This limitation no longer applies, allowing an entirely new level of genomic variation to be profiled using novel sequence (re)analysis methodology.

The team organised for this project have applied novel genomic analysis techniques (as part of CI Eichten’s DECRA) to identify polymorphic transposable elements within Arabidopsis thaliana (Stuart et al., 2016) and Brachypodium reference lines (Eichten et al., 2016) showing TE variation to be pervasive. Their impact has been overlooked in traditional SNP datasets. New TE variants act as novel, functionally disruptive genetic markers for trait association. Beyond this, TE variation can also be used as a measure of an individual’s ‘evolvability’: The overall TE variation within an individual that can generate phenotypic outcomes available for natural or artificial selection. In this way, the evolvability among different genetic backgrounds is a molecular phenotype that may be due to underlying genetic differences. TE variation may itself also correlate with observable phenotypic outcomes. This proposal aims to robustly identify TE variation across hundreds of individuals and assess its influence on phenotypic outcomes in two key grass species:

AIM 1: Profile the extent of transposon diversity in clonal and diverse populations of the model cereal Brachypodium distachyon and associate them with phenomics-based juvenile cereal developmental traits. OUTCOME: This large-scale study within and between accessions will reveal associations of TEs with early growth traits, and trait variability, important to cereal development. It will utilize cutting-edge phenomic tools and be a proof of concept in how TE variation associates with growth traits.

AIM 2: Apply TE identification methodology to Sorghum diversity and mapping germplasm to associate TE variation to agronomic traits in the field for integration into marker-based breeding strategies. OUTCOME: An expanded understanding of how novel TE variants accumulate and associate with field-based agronomic traits for this staple feed-crop. Provide direct contrast to AIM 1 results in order to generalise the influence TE variation has within other important grass crops.

AIM 3: Validate associations, perform functional assessment of TE variants, and examine germplasm evolvability in relation to intra-plot variation using high-throughput phenomic measurements. OUTCOME: Novel phenotypes displayed within an inbred accession will be related to emerging TE variation to confirm the cause of the trait variability. The relationship between evolvability and phenomic traits will guide germplasm selection and additional research into genetic predisposition for high evolvability for integration into breeding pipelines.

These aims will increase the predictive ability of genomic selection and reveal key functional variants that have been overlooked to date, exposing the functional relevance of what was once viewed as ‘junk’ DNA within the genome.

What makes this strategy unique and what will be the breakthrough findings?

Recent work has made the identification and assessment of transposon variation possible using short-read sequencing data. This data is often generated for other uses (e.g. identification of SNPs) and can act as a previously untapped resource to identify transposons. In this way, we can leverage multiple large phenomic and sequencing datasets on these species that have already been generated (terraref.org) and directly integrate new results with previously analysed work.

Transposons are a major component of grass genomes (upwards of 80% of genome content in some cases) and commonly vary within populations, but their relative influence on phenotypic traits is unknown. A potential genetic polymorphism can be ‘unlocked’ when changes in element repression occur. As such, transposons have the ability to rapidly mobilise and may act as a novel source of heritable variation that can then be acted upon by natural or artificial selection. This project aims to specifically address this question of ‘evolvability’ by looking for new transposition occurring within standing plots of inbred backgrounds in Brachypodium and Sorghum (Figure 1). This work may change the paradigm regarding the impact and importance of transposable elements within plant lineages and highlight their level of functional relevance in relation to standing (SNP based) genetic variation.

Figure 1: (A) Model cereal Brachypodium and staple feed grass Sorghum (photo via P. Savoury, UQ) will be used to identify novel transposon insertions and deletions across accessions (lower panel). (B) Phenotypic measurements in hand will be combined with TE variants and existing SNP data to perform genome-wide association studies (bottom panel). GWAS will examine where (i) novel TE variants have trait associations missed by SNPs or (ii) TE variants linked to nearby SNPs identify a functional variant. (C) Confirmation of phenotypic association in field plots. Interplot phenotypic measurements for agronomic traits (gradient across squares A through I) may be associated with specific TE states (open/closed triangles above each plot). Sibling plants may display intraplot variation (high ‘evolvability’) in which young transpositions may be segregating. Here individuals will be assessed for phenotype and transposon variant identification to dissect relationship between TE state and observed phenotypic variation (bottom panel).

Why work with two species?

This project will focus on both basic and applied understanding of transposon variation within grasses. Different grass species have variable levels of transposon load, overall genome size, and selection histories. Because of this, we will use two key grass species to allow for species contrasts and a more general understanding of transposon impact within monocots that will facilitate potential application to other valuable species. Brachypodium distachyon is an established grass model species (IBI, 2010; Gordon et al., 2014) for which we have ample (epi)genomic data, developed in CI Eichten’s current DECRA project, against which to compare variants and dissect their impact on gene function. We will also work with Sorghum bicolor, a staple crop in Africa as well as a commercially grown feed source in Australia, which recently surpassed wheat as the most valuable cereal grown in Queensland (Felton-Taylor, 2015). For research, sorghum has robust phenotypic projects, genome resequencing efforts already underway (Mace et al., 2013), and a more compact genome compared to other commercial grasses such as wheat or maize. Sorghum is also a bioenergy target crop in the USA with massive (>$10M) genomic and phenomic studies underway (terraref.org). These data are openly available through our collaborators at the Danforth Plant Science Center, who have agreed to include our TE finding strategy in joint studies. The research aims proposed here will determine the impact of transposable element variation as a rapidly evolving source of genetic variation contributing to agronomic traits such as early vigor, heading date, height, and yield.

INVESTIGATORS

The project brings together global leaders in extended genotype profiling, genomic trait association, Brachypodium, and Sorghum genomic research. CI Eichten will coordinate and advise research experiments regarding the project and integrate results into publishable outputs. This will be his major project following current work on extended genotype profiling (DE150101206) within Brachypodium, which ends in 2018. The proposed project would commence in April of 2018 to prevent conflict with the completion of Eichten’s DECRA commitments. CI Borevitz will lead the investigation of evolvability and phenotypic adaptation. With expertise in association studies and phenomics research, he will advise and coordinate result dissemination with his international collaborators at Danforth. Previous research by these project leaders has highlighted the need to expand our view of genetic variation to include transposable element polymorphisms, chromatin factors, as well as sources of genome shock such as polyploidization (Eichten & Borevitz, 2013). CI Ivakov is leading research into hyperspectral imaging and analysis of grasses and novel techniques to use multivariate data for genetic mapping studies to identify ‘cryptic’ phenotypes previously unobserved. Ivakov is a member of the ARC Centre of Excellence in Translational Photosynthesis working on the development of hyperspectral indexing for trait associations with SNPs. His work with this project will build from this to perform phenomic analysis of Sorghum traits through hyperspectral imaging of both field and growth chamber experiments and associate them with identified genetic markers and transposon locations. CI van Oosterom is a sorghum researcher focusing on the dissection of crop physiological processes that underpin adaptation to abiotic stresses, linking these processes to their genetic control, and incorporating these insights into predictive crop growth simulation models. His experience with Sorghum research at the University of Queensland provides invaluable knowledge of experimental designs and trait associations within this crop. He will facilitate Sorghum growth and phenomic assessment at UQ and interact with other UQ Sorghum researchers who strongly support this project (David Jordan, Ian Godwin).

An A4 level postdoctoral researcher will be brought on in the first year to lead molecular and computational work across all aims of this proposal. This includes gathering published sequencing data, performing transposon identification, and annotating and analysing trait associations. The postdoc will then lead resequencing and functional validation work in AIM 3 by working with a PhD student (APA funded) and Honours student (project funded) who will join in the second year of the project.

PROJECT QUALITY AND INNOVATION

Heritability and variation: The cornerstone of genetics is to understand the heritable information underlying phenotypic outcomes under given environmental conditions. The advent of low cost DNA sequencing has allowed for the identification of small variants within the genome, such as single nucleotide polymorphisms (SNPs) and small insertions or deletions (InDels). Common genetic variants such as these, typed in large populations, have enabled genome wide association studies (GWAS). Although this technique has led to moderate success in some plant systems, substantial variation remains to be explained. This ‘missing heritability’ (Eichler et al., 2010) could be due to unstable alleles and multiple rare variants within diverse populations. Another hypothesis is the impact of epigenetic phenomena in which heritable variation is unlinked to the underlying genetic state of an organism. Although techniques have been developed to profile possible epigenetic signals, such as DNA methylation (Hardcastle, 2013), many proposed ‘epigenetic’ signals are in fact tied to underlying genetic variation such as transposable elements (Eichten et al., 2014; Dubin et al., 2015; Mirouze & Vitte, 2014; Quadrana et al., 2016; Stuart et al., 2016). Consequently, a direct assessment of transposable elements and their level of variation across populations will be crucial to dissect the remaining functional causes of traits that are not adequately explained via SNP analysis.

Transposons and their genomic impact: The past decade of chromatin studies across plant populations has often led back to transposable element sequences within plant genomes as likely targets and/or causes of chromatin variation. TEs themselves are ubiquitous in all eukaryotes as DNA sequences that can become, or were in the past, mobile (Lisch, 2013). These elements act either through cut-and-paste (type II) or copy-and-paste via an RNA intermediate (type I) mechanisms to achieve mobility within the genome. In grasses specifically, TEs account for the vast majority of overall genome size variation, with some species showing extreme amplification of specific element families (Fedoroff, 2012; Tenaillon et al., 2010; Lisch, 2013; Vitte et al., 2014).

Transposons have multiple mechanisms to influence other features of the genome. These include direct disruption of coding sequences, positional effects through redefined borders of heterochromatin and euchromatin (Lippman et al., 2004; Hollister & Gaut, 2009; Hollister et al., 2011; Eichten et al., 2012), and the introduction of regulatory sequences that can influence surrounding genes (Rebollo et al., 2012; Vitte et al., 2014). Given the disruptive potential of TEs, organisms have developed complex mechanisms to silence transposition, largely through repressive chromatin modifications (Lisch, 2009; Bucher et al., 2012). DNA methylation and histone modifications are strongly targeted to transposons and other repetitive sequences within the genomes of grasses. The disruption of these silencing pathways can lead to the expression, and possible transposition, of previously silenced elements (Miura et al., 2001; Jia et al., 2009; Reinders et al., 2009; Bennetzen & Wang, 2014; Rigal et al., 2016). From this, it is clear that active transposons within an organism’s genome can lead to novel genetic variation and regulation. When transposition occurs in germline tissue, it can give rise to heritable variation segregating in the next generation. These young alleles, variable within inbred lines, have not been investigated as a source of phenotypic variation in plants. In human studies, however, multiple rare variants (TEs or SNPs) within different families are showing promise toward explaining missing heritability (Robinson et al., 2014).

Because TEs exist in multiple copies, often with high levels of sequence similarity, the gains of short-read technology for TE identification have been limited by ambiguous mapping of reads to the genome. However, recent advances in analysis and sequencing methods now allow for the identification of excised or inserted transposons from paired-end sequencing data (Nakagome et al., 2014; Quadrana et al., 2016; Stuart et al., 2016). In addition, long-read sequencing technologies commercialized by Oxford Nanopore or Pacific Biosciences eliminate the ambiguity of short reads, which do not read through most TEs that are > 300bp (Jiao et al., 2016). Recent work by the CIs has applied these methods to identify tens of thousands of polymorphic transposable elements within sequenced populations of Arabidopsis thaliana (Stuart et al., 2016) and a set of resequenced Brachypodium distachyon reference lines (Eichten et al., 2016). A key finding was that many of the transposon variants identified were not in linkage disequilibrium with nearby SNPs, indicating that traditional SNP genotyping, even with imputation, will miss many of these likely functional variants that influence the surrounding genes.

Second order phenotype – Evolvability: With an increasingly variable climate predicted for many growing regions of Australia, traditional breeding populations may be unable to cope. An exciting possibility is that new mutations may lead to more variance in phenotypic outcomes (Kooke et al., 2015; Wicker et al., 2016). Evidence has shown that specific transposable elements can be activated under stress conditions (Le et al., 2014; Makarevitch et al., 2015). Active transposition could lead to rapid adaptation, but also to many deleterious changes. However, the capacity for, or frequency of, these new heritable changes may be of overall benefit in a breeding population. This second order phenotype is the variability in a trait’s value within lines rather than the average measure itself (Rey et al., 2016). By producing variable phenotypes, some individuals may perform better under new environmental conditions. By first selecting for high evolvability to generate variance and then eliminating it from the improved lines, a new breeding program could rapidly advance into novel conditions. This higher risk approach is tractable given that we can now quantify and track TE variation genome wide. The consideration of evolvability is grounded in theory from natural populations. However, agriculture often uses monocultures to try to eliminate phenotypic variance where possible. This practice has its own concerns regarding limited genetic diversity (Wolfe, 2000) and does not provide for flexibility in how field-grown crops can best adapt to each growing season. Preliminary studies in crop species (Wei et al., 2016) have shown how direct studies of transposon variation can reveal missed variation.

PROJECT AIMS

AIM 1 Profile the extent of transposon diversity in clonal and diverse populations of the model cereal Brachypodium distachyon and associate them with phenomics-based juvenile cereal developmental traits. Previous work has described the chromatin landscape of this species for both genetically diverse and genetically similar populations (Eichten et al., 2016). This aim expands experiments from CI Eichten’s DECRA work to identify a comprehensive set of transposon variants across the species and act as a proof of concept in relating transposons to SNPs as well as early growth phenotypic traits. The data generated will allow for direct relation to DNA methylation datasets as well as SNP diversity to frame the impact of transposable element variation against these heritable features.

AIM 1.1 Identify TE variation across diverse mapping germplasm and within families. Genetic profiling of thousands of Brachypodium distachyon has been performed in the Borevitz lab allowing the identification of both a core diversity set (72 accessions) as well as populations of low diversity (24 accessions). Paired-end sequencing of the core diversity set (4-10x coverage) has been completed for SNP-genotyping and will be reanalysed to identify novel transposable element variants between accessions. This will identify transposons that have historical variation within the species. Preliminary scans indicate hundreds of variants present per line when compared to the reference annotation. To investigate recent variation, additional paired-end Illumina sequencing will be performed on 96 low diversity (based on SNP diversity) accessions to 4-10x coverage. Transposon identification (Stuart et al., 2016) will be performed to identify both new insertions and deletions compared to the Bd21 reference accession. As the reference annotation cannot directly identify phased variants (e.g. true deletions vs rare insertions in the reference), an outgroup species, B. stacei, will be used to determine whether an ancestral TE has jumped out or whether a new insertion has occurred. The frequency of TE variation determined within lines is the second order evolvability phenotype. We hypothesise that the genetic cause underlying evolvability will be an active TE locus; however, other chromatin factors may play a role and multiple loci may come up in a genome scan for evolvability.
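
The outgroup-based polarisation described in AIM 1.1 can be illustrated with a minimal sketch. It assumes that presence/absence calls at an orthologous site are already available for the reference (Bd21), a resequenced accession, and the outgroup (B. stacei), and it applies a simple parsimony rule in which the outgroup state is taken as ancestral. The function name and the returned labels are hypothetical and are not part of the TE-calling method of Stuart et al. (2016).

```python
from typing import Optional


def polarise_te_variant(
    present_in_reference: bool,
    present_in_accession: bool,
    present_in_outgroup: Optional[bool],
) -> str:
    """Classify a TE presence/absence difference using an outgroup.

    Assumption (illustrative only): the outgroup state is ancestral,
    and None means the outgroup site could not be scored.
    """
    if present_in_reference == present_in_accession:
        return "no_variant"
    if present_in_outgroup is None:
        return "unpolarised"
    if present_in_outgroup:
        # Ancestral state is TE present: the lineage lacking it lost the element.
        if present_in_reference:
            return "deletion_in_accession"
        return "deletion_in_reference"
    # Ancestral state is TE absent: the lineage carrying it gained the element.
    if present_in_reference:
        return "insertion_in_reference"
    return "insertion_in_accession"
```

Under this rule, a TE annotated in Bd21, missing from an accession, and missing from the outgroup would be labelled as an insertion in the reference lineage rather than a deletion in the accession.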

AIM 1.2 Define relationships between TE variation and known genotype markers (SNPs). A key step in understanding the role of transposons in phenotypic variation is to determine their relationship to standing genetic diversity between samples. Previous work by CIs Eichten and Borevitz in Arabidopsis thaliana has shown that many transposons are ‘unlinked’ to nearby SNP variation (Stuart et al., 2016). To investigate this relationship within grasses, all accessions previously genotyped for SNPs will be compared with identified TE variants to determine the linkage disequilibrium between nearby SNP-TE pairings. This will result in a subset of variable transposons that are not tagged by nearby SNPs and are therefore novel markers, distinct from previous genotyping assays, allowing trait associations that were previously unavailable.
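
As an illustration of the SNP-TE comparison in AIM 1.2, the sketch below computes the squared correlation (r^2) between a TE presence/absence call and each nearby SNP, then reports TEs that no nearby SNP tags. It assumes inbred accessions so that each marker can be coded 0/1 per accession. The function names, the input dictionaries, and the 0.8 threshold are assumptions made for illustration, not values taken from this proposal.

```python
from typing import Dict, List, Sequence


def r_squared(te: Sequence[int], snp: Sequence[int]) -> float:
    """r^2 between two biallelic markers coded 0/1 across the same accessions."""
    n = len(te)
    if n == 0 or n != len(snp):
        raise ValueError("marker vectors must be non-empty and of equal length")
    p_te = sum(te) / n
    p_snp = sum(snp) / n
    p_both = sum(t * s for t, s in zip(te, snp)) / n
    denom = p_te * (1 - p_te) * p_snp * (1 - p_snp)
    if denom == 0:
        return 0.0  # a monomorphic marker carries no LD information
    d = p_both - p_te * p_snp  # disequilibrium coefficient D
    return d * d / denom


def untagged_tes(
    te_calls: Dict[str, Sequence[int]],
    snp_calls: Dict[str, Sequence[int]],
    nearby_snps: Dict[str, List[str]],
    threshold: float = 0.8,  # assumed tagging cut-off, illustrative only
) -> List[str]:
    """Return TE identifiers whose best r^2 with any nearby SNP is below threshold."""
    untagged = []
    for te_id, te_vector in te_calls.items():
        best = max(
            (r_squared(te_vector, snp_calls[s]) for s in nearby_snps.get(te_id, [])),
            default=0.0,
        )
        if best < threshold:
            untagged.append(te_id)
    return untagged
```

TEs returned by `untagged_tes` correspond to the ‘unlinked’ variants that a SNP-only genotyping assay would not capture.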

AIM 1.3 Associate early growth phenotypes to novel TE markers. The diversity set of Brachypodium has been extensively phenotyped for early growth traits such as leaf width, leaf length, growth stage, and germination in the Borevitz lab. These phenotypes have been successfully used in SNP-based genome wide association studies (GWAS) under multiple simulated environmental conditions. Novel transposon variant markers will be integrated with SNP data to determine if GWAS hits can show increased resolution and novel associations compared to SNP-only association scans. TEs also provide candidate causal variants for Quantitative Trait Loci linked to SNPs.  In this proposal we will expand the set of phenotypes to include hyperspectral image data. This deep phenotype data contains a large amount of information (thousands of pixels with thousands of wave bands per plant) and will give us the best chance to find phenotypes associated with novel TE variation between or within lines (Figure 1C). It will also allow us to visualize sectoring within a plant underlying mitotic transposition. SNPs will be used as a control for hyperspectral trait associations. Defined spectral indices are regularly used to summarise hyperspectral data, e.g. normalized difference vegetation index (NDVI). This will provide a readily interpretable list of image traits as an additional control in our study.
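
The NDVI mentioned above is defined as (NIR - Red) / (NIR + Red). The following sketch shows one way it could be derived from a per-plant hyperspectral reflectance spectrum. The wavelength windows used for the red and near-infrared bands are assumptions chosen for illustration, and the function names are hypothetical; the windows appropriate to a given camera may differ.

```python
from typing import Sequence, Tuple


def band_mean(
    wavelengths_nm: Sequence[float],
    reflectance: Sequence[float],
    low_nm: float,
    high_nm: float,
) -> float:
    """Mean reflectance over all wavebands inside [low_nm, high_nm]."""
    values = [r for w, r in zip(wavelengths_nm, reflectance) if low_nm <= w <= high_nm]
    if not values:
        raise ValueError("no wavebands fall inside the requested window")
    return sum(values) / len(values)


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from two band reflectances."""
    total = nir + red
    if total == 0:
        raise ValueError("NDVI is undefined when NIR + Red is zero")
    return (nir - red) / total


def ndvi_from_spectrum(
    wavelengths_nm: Sequence[float],
    reflectance: Sequence[float],
    red_window: Tuple[float, float] = (620.0, 700.0),  # assumed red window
    nir_window: Tuple[float, float] = (760.0, 900.0),  # assumed NIR window
) -> float:
    """Summarise one spectrum as a single NDVI value."""
    red = band_mean(wavelengths_nm, reflectance, *red_window)
    nir = band_mean(wavelengths_nm, reflectance, *nir_window)
    return ndvi(nir, red)
```

Each plant's spectrum is reduced to one interpretable value, which can then be treated like any other quantitative trait in an association scan.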

The results of AIM 1 will provide a robust measurement of transposon diversity across the Brachypodium distachyon species under controlled conditions in the lab. It will act as a direct comparison to the dicot model Arabidopsis thaliana (Stuart et al., 2016; Quadrana et al., 2016) and provide a grass comparison for subsequent work in Sorghum bicolor in the field.

AIM 2 Apply TE identification methodology to Sorghum diversity and mapping germplasm to associate TE variation to agronomic traits in the field for integration into marker-based breeding strategies. To expand on the results of the model monocot Brachypodium, a direct assessment of transposon diversity within the feed crop Sorghum bicolor will be conducted. Large panels of diverse germplasm, as well as multi-year field phenotyping data for agronomic traits, are available for over 500 accessions from collaborators in the TerraRef project and the public Sorghum pre-breeding program at the University of Queensland. These data will be mined and reanalysed with the addition of novel transposon variant information gleaned from resequencing data.

AIM 2.1 Mine Sorghum resequencing data to identify TE variants. A set of 384 sequenced accessions from the Sorghum BioEnergy Association panel (BEA; Brenton et al., 2016) will be used as the basis for analysis. This will be combined with an additional 44 accessions previously sequenced as part of the 336 accession Sorghum Association Panel (SAP; Mace et al., 2013) and will be reanalysed to identify transposon variants. An additional 96 previously unsequenced Sorghum accessions from the SAP will be selected for additional 4-10x paired-end sequencing. This will bring the total number of diverse Sorghum accessions to at least 524 for this study. Sequencing data will be analysed using TEPID to identify inserted and deleted transposons across accessions. Transposon classification will be conducted to determine if specific element families are enriched for species variation.

AIM 2.2 Identify associations between TE variants and agronomic field-measured traits. Measurements by CI van Oosterom and collaborators (Jordan and Mace at UQ in the Centre of Excellence in Translational Photosynthesis) for agronomic traits (examples include early vigor, greenness, tillering, heading date, height, and yield) have been conducted across the SAP panel. These will be combined with multiple, replicated phenomic analyses of the BEA lines being conducted with collaborators Nadia Shakoor and Todd Mockler at the Danforth Plant Science Center as part of the TerraRef crop analytics project (terraref.org). Novel TE variants (including those unlinked to SNPs) will be integrated with current SNP assessments (Figure 1B; Brenton et al., 2016) to expand the resolution and power of current association scans. Comparisons between SNP-only and TE-SNP GWAS approaches will be performed and candidate loci selected as putative trait markers for molecular validation.
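
The comparison between SNP-only and TE-SNP scans can be sketched in a deliberately simplified form. The example below ranks markers by the proportion of trait variance each explains in a single-marker regression (the squared Pearson correlation between a 0/1 genotype and the phenotype). This is an assumption made for illustration: a real GWAS would account for population structure and kinship, which this sketch does not attempt. All names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def variance_explained(genotype: Sequence[int], phenotype: Sequence[float]) -> float:
    """Squared Pearson correlation between a 0/1 marker and a quantitative trait."""
    n = len(genotype)
    if n < 2 or n != len(phenotype):
        raise ValueError("need at least two accessions and equal-length inputs")
    g_bar = mean(genotype)
    y_bar = mean(phenotype)
    s_gy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotype, phenotype))
    s_gg = sum((g - g_bar) ** 2 for g in genotype)
    s_yy = sum((y - y_bar) ** 2 for y in phenotype)
    if s_gg == 0 or s_yy == 0:
        return 0.0  # a constant marker or trait explains nothing
    return (s_gy * s_gy) / (s_gg * s_yy)


def rank_markers(
    markers: Dict[str, Sequence[int]], phenotype: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (marker, variance explained) pairs, strongest association first."""
    scores = [(name, variance_explained(calls, phenotype)) for name, calls in markers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Running `rank_markers` once on the SNP set alone and once on the merged SNP and TE set shows whether any TE variant outranks every SNP for a given trait, which is the kind of candidate locus that would be carried forward to molecular validation.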

AIM 3 Validate associations, perform functional assessment of TE variants, and examine germplasm evolvability in relation to intra-plot variation using high-throughput phenomic measurements. Although novel TE variants can be used as markers without an understanding of their functional impact, the results from AIM 1 and AIM 2 can allow for direct study of individual variants and how they impact their genomic region to provide clear examples of how TE state can influence genes leading to the observed phenotypic output. AIM 3 moves beyond short-read analysis to independently verify TE variant states and investigate their relationship to nearby gene expression.

AIM 3.1 Molecular validation and functional assessment of TE variants in Sorghum. Although read-based identification of TE variants is robust (Stuart et al., 2016), novel variants will be confirmed using localised PCR validation as well as the Oxford Nanopore MinION long-read technology. Individual long reads can span the entire distance of novel TE insertions and are expected to allow for confirmation of complete element transposition for elements with minimal overall sequencing (0.5-1x coverage). Successful validation assays may be expanded into additional germplasm not initially sequenced to confirm the TE allele frequency across a larger panel of accessions. As TE variants may influence their local genetic regulation in a variety of ways, we will also select TE variants associated with measured traits in both species to perform in-depth genomic profiling. This will focus on the expression patterns of nearby genes for accessions displaying the various TE states (present or absent). Although these cis-acting regulations may be expected, we will also scan for trans-acting associations between TE variant states and gene expression throughout the genome as well as associations to other TE states (long-range LD). As with most emerging technologies, the use of novel long-read sequencing for variant validation as well as phenomic trait analysis may also be difficult to perform. If there are difficulties using long-read technology, localised PCR validation flanking insertion sites to define amplicon size will be performed on a subset of loci and accessions as previously shown (Stuart et al., 2016).

AIM 3.2 Examine plot variation in relationship to TE variation as ‘evolvability’ measure. New image-based remote sensing techniques, specifically drone-based hyperspectral imaging, will be used to obtain field plot measurements that include within-plot variation of traits. Hyperspectral images can reveal ‘cryptic’ phenotypes, capturing trait variation that was previously unseen. The standard between-plot GWAS will be performed using both SNP and TE sources of genetic variation. In addition, within-plot spectral trait variability will be used as a proxy for phenotypic evolvability. We hypothesise that lines with high TE frequency will also have high phenotype variability within plots. Individual plant spectral traits will be extracted for 48-96 individuals within the plot. Tissue from these plants will be used for paired-end genomic sequencing and TE variant calling. The within-plot genotype and phenotype data will be used to quickly validate TE trait association. Seeds from individual lines will be expanded for validation in the subsequent generation in the next field trial. This intra-plot variation analysis, when successful, will save a year in validation. Seeds from variable maternal lines within plots will also be planted into plots in subsequent years for replication.
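
The hypothesis that lines with high TE frequency show high within-plot variability can be expressed as a simple calculation. The sketch below uses the coefficient of variation of a spectral trait within each plot as the variability measure and correlates it with the number of new TE variants called in that plot. Both choices, and all names, are assumptions made for illustration; the proposal does not specify the statistic to be used.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the absolute mean."""
    if len(values) < 2:
        raise ValueError("need at least two plants per plot")
    centre = mean(values)
    if centre == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(centre)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_xy / sqrt(s_xx * s_yy)


def variability_vs_te_load(
    plot_traits: Dict[str, Sequence[float]],  # plot id -> per-plant trait values
    new_te_counts: Dict[str, int],  # plot id -> number of new TE variants called
) -> float:
    """Correlate within-plot trait variability with TE variant counts across plots."""
    plots = sorted(plot_traits)
    variability = [coefficient_of_variation(plot_traits[p]) for p in plots]
    te_load = [float(new_te_counts[p]) for p in plots]
    return pearson(te_load, variability)
```

A positive correlation across plots would be consistent with the hypothesis, although it would not by itself establish that segregating TE variants cause the observed variability.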

FEASIBILITY

The proposed project fits specifically with the expertise of all named investigators from extended genotype profiling (Eichten), association studies (Borevitz), and phenomic association (Ivakov, van Oosterom). The investigators have a strong record of scientific excellence and are world-leaders in their respective fields. The researchers are based at the ANU with access to world-leading molecular labspace, sequencing facilities, plant growth facilities (NCRIS), plant phenotyping capability and computing infrastructure. Field and greenhouse work for sorghum will be conducted at the University of Queensland, where the sorghum research group supporting this project is world-leading in its integration of crop physiology and crop modelling (CI van Oosterom) with genomics (Ian Godwin) and crop improvement (David Jordan). The requested project budget covers all required data generation costs for additional sequencing and phenotyping to meet the project aims based on current costs. Beyond this, all preexisting sequencing and phenomic data is available through our collaborators at UQ as well as TerraRef through the Mockler Lab at the Danforth Plant Science Center. We have existing relationships in place with our collaborators to obtain and use these data within the context of this proposal. The ANU, UQ, and TerraRef collaboration thus provides a research environment perfectly suited to accomplish the stated aims.

However, innovative research does have risks. It is possible that the investigation will find minimal transposon variation across the selected germplasm, limiting its availability for association scans, although preliminary scans of Brachypodium indicate that there are at least hundreds of novel variants. If little variation is found, focus will shift to understanding the molecular function of the variants discovered while also investigating why so few variants are found within the selected species, given that TEs often make up the majority of grass genomes by proportion. This may lead to expansion of the analysis to more diverse lineages to determine when transposon variation occurred within grasses. AIM 2 also relies on phenotypic assessment of field-based plants. Although many phenotypes successfully measured in past growing seasons will be used, there is always a risk that field plots will be atypical due to weather and other annual variation. This may be especially important for within-plot variation (AIM 3), where transposition events happened in a previous generation and are segregating among siblings. All efforts will be made to measure traits and collect tissue for sequencing under current conventions and best practices. Any direct field-based result will be replicated in a subsequent growing year and/or validated within a greenhouse environment whenever possible.

BENEFIT

The outcomes of this proposal will provide valuable insights into the importance of transposon variation within grasses. Model system work will highlight the underlying functional impact that new insertions or deletions can have on gene expression and regulation within the genome. This information will inform future research decisions in plant genetics by highlighting the importance of transposon assessment in population studies. Beyond basic knowledge, work with sorghum will provide a more robust view of transposon influence in grasses generally and directly show the impact of novel transposition within a crop species. With a current sorghum market of over $400 million in Australia, novel transposon markers for agronomic traits could influence breeding decisions and help to develop the next generation of germplasm for the nation.

 

A key advantage of this project is that it uses already available sequencing data and phenotypic measurements for hundreds of samples. Public data allow the project to be completed at scale (hundreds of accessions for two species), with sequencing costs budgeted only for validation and for the selective inclusion of additional diverse accessions. In this way, we can leverage public sequencing data to obtain results not otherwise possible due to the high cost of data generation.

COMMUNICATION OF RESULTS

Raw data will be made available in real time on the borevitzlab.anu.edu.au website. The advances resulting from this work will be promoted through open-access primary scientific publications in high-impact journals as well as at international conferences. Transposon variants identified in Brachypodium will be provided to the international community (brachypodium.org). Similarly, sorghum results will be disseminated alongside current SNP marker sets throughout the sorghum breeding and genomics community through our collaboration with David Jordan at UQ in the Centre of Excellence in Translational Photosynthesis.

MANAGEMENT OF DATA

A strength of this project is the public availability of sequencing datasets to reanalyse for transposon variation. The investigators are strong proponents of open-access datasets as well as agreed-upon data standards. All work resulting from this project will be published under open-access agreements, and source code for all analyses will be provided. All sequence data generated from this work will be made publicly available through the Sequence Read Archive (SRA), and curated datasets will conform to commonly used formats and be made available through primary publications.

REFERENCES 

Bennetzen JL, Wang H. 2014. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol 65: 505–530.

Brachi B, Morris GP, Borevitz JO. 2011. Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol 12: 232.

Brenton ZW, Cooper EA, Myers MT, et al. 2016. A Genomic Resource for the Development, Improvement, and Exploitation of Sorghum for Bioenergy. Genetics 204.

Bucher E, Reinders J, Mirouze M. 2012. Epigenetic control of transposon transcription and mobility in Arabidopsis. Curr Opin Plant Biol 15: 503–10.

Dubin MJ, Zhang P, Meng D, et al. 2015. DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. Elife 4: e05255.

Eichler EE, Flint J, Gibson G, et al. 2010. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 11: 446–450.

Eichten S, Borevitz J. 2013. Epigenomics: Methylation’s mark on inheritance. Nature 495: 181–182.

Eichten SR, Ellis NA, Makarevitch I, et al. 2012. Spreading of heterochromatin is limited to specific families of maize retrotransposons. PLoS Genet 8: e1003127.

Eichten SR, Schmitz RJ, Springer NM. 2014. Epigenetics: Beyond Chromatin Modifications and Complex Genetic Regulation. Plant Physiol 165: 933–947.

Eichten SR, Stuart T, Srivastava A, et al. 2016. DNA methylation profiles of diverse Brachypodium distachyon align with underlying genetic diversity. Genome Res 26: 1520–1531.

Fedoroff NV. 2012. Transposable Elements, Epigenetics, and Genome Evolution. Science 338: 758–767.

Felton-Taylor A. 2015. Sorghum was Queensland’s most valuable crop last summer. http://www.abc.net.au/news/2015-07-13/sorghum-was-queensland%27s-most-valuable-crop-last-summer/6615916 (Accessed January 5, 2017).

Gordon SP, Priest H, Des Marais DL, et al. 2014. Genome diversity in Brachypodium distachyon: deep sequencing of highly diverse inbred lines. Plant J 79: 361–74.

Hollister JD, Gaut BS. 2009. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res 19: 1419–28.

Hollister JD, Smith LM, Guo Y-L, et al. 2011. Transposable elements and small RNAs contribute to gene expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. Proc Natl Acad Sci U S A 108: 2322–7.

The International Brachypodium Initiative. 2010. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463: 763–8.

Jia Y, Lisch DR, Ohtsu K, et al. 2009. Loss of RNA-dependent RNA polymerase 2 (RDR2) function causes widespread and unexpected changes in the expression of transposons, genes, and 24-nt small RNAs. PLoS Genet 5: e1000737.

Jiao Y, Peluso P, Shi J, et al. 2016. The complex sequence landscape of maize revealed by single molecule technologies. bioRxiv.

Kim MY, Zilberman D. 2014. DNA methylation as a system of plant genomic immunity. Trends Plant Sci 19: 320–6.

Kooke R, Johannes F, Wardenaar R, et al. 2015. Epigenetic Basis of Morphological Variation and Phenotypic Plasticity in Arabidopsis thaliana. Plant Cell 27: 337–348.

Le T-N, Schumann U, Smith NA, et al. 2014. DNA demethylases target promoter transposable elements to positively regulate stress responsive genes in Arabidopsis. Genome Biol 15: 458.

Lippman Z, Gendrel A-V, Black M, et al. 2004. Role of transposable elements in heterochromatin and epigenetic control. Nature 430: 471–476.

Lisch D. 2009. Epigenetic regulation of transposable elements in plants. Annu Rev Plant Biol 60: 43–66.

Lisch D. 2013. How important are transposons for plant evolution? Nat Rev Genet 14: 49–61.

Lisch D, Bennetzen JL. 2011. Transposable element origins of epigenetic gene regulation. Curr Opin Plant Biol 14: 156–61.

Mace ES, Tai S, Gilding EK, et al. 2013. Whole-genome sequencing reveals untapped genetic potential in Africa’s indigenous cereal crop sorghum. Nat Commun 4: 2320.

Makarevitch I, Waters AJ, West PT, et al. 2015. Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genet 11: e1004915.

Mirouze M, Vitte C. 2014. Transposable elements, a treasure trove to decipher epigenetic variation: insights from Arabidopsis and crop epigenomes. J Exp Bot 65: 2801–12.

Miura A, Yonebayashi S, Watanabe K, et al. 2001. Mobilization of transposons by a mutation abolishing full DNA methylation in Arabidopsis. Nature 411: 212–214.

Nakagome M, Solovieva E, Takahashi A, et al. 2014. Transposon Insertion Finder (TIF): a novel program for detection of de novo transpositions of transposable elements. BMC Bioinformatics 15: 71.

Quadrana L, Bortolini Silveira A, Mayhew GF, et al. 2016. The Arabidopsis thaliana mobilome and its impact at the species level. Elife 5: e15716.

Rebollo R, Romanish MT, Mager DL. 2012. Transposable Elements: An Abundant and Natural Source of Regulatory Sequences for Host Genes. Annu Rev Genet 46: 21–42.

Reinders J, Wulff BBH, Mirouze M, et al. 2009. Compromised stability of DNA methylation and transposon immobilization in mosaic Arabidopsis epigenomes. Genes Dev 23: 939–950.

Rey O, Danchin E, Mirouze M, et al. 2016. Adaptation to Global Change: A Transposable Element-Epigenetics Perspective. Trends Ecol Evol 31: 514–526.

Rigal M, Becker C, Pélissier T, et al. 2016. Epigenome confrontation triggers immediate reprogramming of DNA methylation and transposon silencing in Arabidopsis thaliana F1 epihybrids. Proc Natl Acad Sci U S A 113: E2083–92.

Robinson MR, Wray NR, Visscher PM. 2014. Explaining additional genetic variation in complex traits. Trends Genet 30: 124–132.

Stuart T, Eichten SR, Cahn J, et al. 2016. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. Elife 5: e20777.

Tenaillon MI, Hollister JD, Gaut BS. 2010. A triptych of the evolution of plant transposable elements. Trends Plant Sci 15: 471–478.

Vitte C, Fustier M-A, Alix K, et al. 2014. The bright side of transposons in crop evolution. Brief Funct Genomics 13: 276–95.

Wei B, Liu H, Liu X, et al. 2016. Genome-wide characterization of non-reference transposons in crops suggests non-random insertion. BMC Genomics 17: 536.

Weigel D, Colot V. 2012. Epialleles in plant evolution. Genome Biol 13: 249.

Wicker T, Yu Y, Haberer G, et al. 2016. DNA transposon activity is associated with increased mutation rates in genes of rice and other grasses. Nat Commun 7: 12790.

Wolfe MS. 2000. Crop strength through diversity. Nature 406: 681–682.