Understanding Asteraceae: Validation of a Hyb-Seq probe set for evolutionary studies
To accurately reconstruct the relationships between species, researchers need to analyze sequences from a carefully selected, and preferably large, sample of genes. Hyb-Seq is a powerful tool for obtaining these gene sequences, but it must be calibrated for each group analyzed to ensure that an informative sample of genes is sequenced. Researchers must weigh a variety of factors when choosing which genes to sequence, and those choices can influence the outcome of the analysis. In a recent issue of Applications in Plant Sciences, Katy Jones and her colleagues evaluated the performance of a Hyb-Seq probe set designed for the large and diverse family Asteraceae and found it effective in reconstructing relationships at different taxonomic levels, from subspecies to tribe.
Genes that are informative in one taxonomic group may not be informative in another, for several reasons: the gene may not be present in all species, may evolve too slowly in that group to add meaningful information to a phylogenetic analysis, or may have duplicated to create several paralogues. The varied evolutionary histories within a large group such as the Asteraceae make it difficult to select which genes to sequence. "Asteraceae is the largest family of angiosperms and the Asteraceae COS probe set contains 1061 loci, some of which may be informative for some tribes/genera but not for others, for example because of potential paralogy in some groups but not in others," says Dr. Jones, corresponding author of the manuscript, who carried out the work during her postdoctoral research at the Botanischer Garten und Botanisches Museum Berlin.
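The three reasons a gene can fail to be informative (absent in some species, too little variation, paralogy) can be pictured as a simple filter over loci. The following is a minimal sketch only, not the method used in the study: the `LocusSummary` fields, the function names, and both thresholds are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocusSummary:
    """Hypothetical per-locus summary; field names are illustrative only."""
    name: str
    taxa_recovered: int      # samples in which the locus was captured
    variable_sites: int      # alignment columns with more than one base
    alignment_length: int    # total number of alignment columns
    paralog_flagged: bool    # True if several divergent copies were seen


def is_informative(locus, total_taxa, min_occupancy=0.75,
                   min_variable_fraction=0.05):
    """Return True if the locus passes all three illustrative filters.

    The default thresholds are assumptions, not values from the paper.
    """
    if locus.paralog_flagged:
        return False  # duplicated gene: copies may not be orthologous
    if locus.alignment_length == 0:
        return False  # nothing was aligned for this locus
    occupancy = locus.taxa_recovered / total_taxa
    variable_fraction = locus.variable_sites / locus.alignment_length
    return (occupancy >= min_occupancy
            and variable_fraction >= min_variable_fraction)


def select_loci(loci, total_taxa, **thresholds):
    """Return the names of the loci that pass the filters."""
    return [l.name for l in loci if is_informative(l, total_taxa, **thresholds)]


# Made-up example data for four loci sequenced in 20 samples.
loci = [
    LocusSummary("locus_A", 20, 120, 1000, False),  # passes every filter
    LocusSummary("locus_B", 8, 150, 1000, False),   # missing in most samples
    LocusSummary("locus_C", 20, 10, 1000, False),   # evolves too slowly
    LocusSummary("locus_D", 20, 200, 1000, True),   # potential paralogy
]
selected = select_loci(loci, total_taxa=20)
print(selected)
```

Because the thresholds are parameters, the same locus can pass in one group and fail in another, which is the situation Dr. Jones describes for different tribes and genera.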
Dr. Jones and her colleagues were interested in how the genes sampled in the 1061-locus Asteraceae Hyb-Seq probe set would perform in phylogenetic analyses at different taxonomic levels. The researchers tested the probe set on one Asteraceae tribe, the Cichorieae: "We wanted to know how analyzing a dataset containing many species of a large tribe, compared to a dataset containing only a small species complex, can influence phylogenetic inference in this small species complex," said Jones. "It was quite exploratory at the beginning and over time, the questions, ideas and number of different taxonomic groups have increased!"
The researchers found that the Hyb-Seq probe set produced sequence data that accurately reconstructed species relationships at multiple levels, but that the way the data were subsampled and analyzed mattered and influenced the results. For example, phylogenetic analyses using coalescent-based tree approaches yielded different results from those obtained with maximum likelihood methods when long branches (loci that have undergone considerable evolution) were not removed.
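To make the idea of removing long branches concrete, here is a minimal sketch under stated assumptions; it is not the authors' pipeline. It assumes each gene tree has already been reduced to a mapping from tip name to terminal branch length, and it flags any tip whose branch is more than a chosen multiple of that locus's median. The function name, the `factor` parameter, and the example values are all hypothetical.

```python
from statistics import median


def flag_long_branches(tip_lengths, factor=5.0):
    """Return the tips whose terminal branch exceeds factor x the locus median.

    tip_lengths maps tip name to terminal branch length for one gene tree.
    The median-based cutoff is an illustrative choice, not the study's rule.
    """
    if not tip_lengths:
        return set()
    cutoff = factor * median(tip_lengths.values())
    return {tip for tip, length in tip_lengths.items() if length > cutoff}


# Made-up terminal branch lengths (substitutions per site) for two loci.
gene_trees = {
    "locus_A": {"sp1": 0.010, "sp2": 0.012, "sp3": 0.011, "sp4": 0.009},
    "locus_B": {"sp1": 0.020, "sp2": 0.018, "sp3": 0.400, "sp4": 0.022},
}

for locus, tips in gene_trees.items():
    outliers = flag_long_branches(tips)
    kept = sorted(set(tips) - outliers)
    print(locus, "removed:", sorted(outliers), "kept:", kept)
```

In this toy data only `sp3` in `locus_B` stands out. Whether to drop the single sequence or the whole locus is a separate decision that a real analysis would need to make and report.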
As part of this work, Dr. Jones and her colleagues present an optimized pipeline for preparing and analyzing Hyb-Seq data and discuss different wet-lab approaches that could influence the results, streamlining the process for other research groups. This was a direct response to their own experience with Hyb-Seq: "We often sent emails about different things, for example if someone discovered that they hadn't captured the off-target plastome properly or that they had captured more of it than in previous runs," said Jones. "We were talking about our wet-lab steps or testing pipelines." Dr. Jones noted the support of the Asteraceae community in this work, and in particular that of the late Vicki Funk.
Jones hopes that working out the nuances of this type of analysis will allow powerful tools such as Hyb-Seq to be used more widely in the future: "I hope this article will encourage more people to use Hyb-Seq data for their research questions, as phylogenetic methods are becoming even more accessible," said Jones.