Seed World

According to Study, Data Mining Brings New Clarity To Plant Breeding

Data mining methods that draw on computer science and statistics can predict traits in corn with better accuracy than previous prediction methods.

The immense number of possible hybrids that can be created from inbred corn plants can leave plant breeders wondering where to start when attempting to produce new crop varieties with desirable traits. But new research from an Iowa State University (ISU) agronomist shows how advanced data mining techniques can enhance the efficiency of the process.

The study, published recently in the peer-reviewed academic journal Molecular Plant, outlines several data management approaches that can help plant breeders predict the traits of potential hybrids faster and more cheaply than by growing and testing the plants. Jianming Yu, professor of agronomy and Pioneer Distinguished Chair in Maize Breeding, says the study shows how genomic selection, or making accurate predictions of how a plant will perform based on its genotype, can inform plant breeding decisions.

Recent advances in biotechnology have allowed scientists to compile massive amounts of genomic data, mapping much of the genetic material present in various organisms, Yu says. But matching the genotype of an organism with its real-world, observable traits, or phenotype, presents a different set of challenges.

The ISU researchers worked with a multi-institution team, including scientists from the University of Missouri, the University of North Carolina and the University of Delaware. The team started the project by generating phenotypic metrics for 276 corn hybrids. The team then applied a range of data mining approaches to a dataset containing genomic information as well as the phenotypic metrics of these hybrids. Using this suite of methods, they predicted flowering time, ear height and yield for each hybrid, then compared their predictions to the observed results and found their method was more accurate than previous prediction models.

“With these new methods, training samples for building prediction models were more informative so that prediction accuracy was higher,” says Tingting Guo, first author of the paper and a postdoctoral research associate in agronomy.

The team applied the methods to two other crops, wheat and rice, for which experimental records were available for a large number of hybrids from earlier studies. The approach also worked well for these other crops.

The research demonstrated that effective genomic prediction models can be established with a training set that is 2 to 13 percent of the size of the whole set, enabling an efficient exploration of many genetic combinations.
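To make the idea of a small training set concrete, here is a minimal sketch in Python. It is not the study's method: the genotypes and trait values are simulated, ridge regression stands in for whatever prediction model the team used, and every name in it (`simulate_hybrids`, `ridge_fit`, `prediction_accuracy`) is hypothetical. It fits a model on a fraction of the hybrids and reports how well the predictions correlate with the held-out trait values.

```python
import numpy as np


def simulate_hybrids(n_hybrids=276, n_markers=50, seed=0):
    """Simulated data only: allele counts (0/1/2) and a noisy additive trait."""
    rng = np.random.default_rng(seed)
    genotypes = rng.integers(0, 3, size=(n_hybrids, n_markers)).astype(float)
    effects = rng.normal(0.0, 1.0, size=n_markers)
    signal = genotypes @ effects
    noise = rng.normal(0.0, 0.5 * signal.std(), size=n_hybrids)
    return genotypes, signal + noise


def ridge_fit(X, y, penalty):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: solve an n_train x n_train system, cheap when n_train is small.
    gram = Xc @ Xc.T + penalty * np.eye(len(y))
    weights = Xc.T @ np.linalg.solve(gram, yc)
    return weights, y_mean - x_mean @ weights


def prediction_accuracy(genotypes, trait, train_fraction, penalty=10.0, seed=1):
    """Train on a random fraction of hybrids; correlate predictions with the rest."""
    rng = np.random.default_rng(seed)
    n = len(trait)
    n_train = max(2, int(round(train_fraction * n)))
    order = rng.permutation(n)
    train, test = order[:n_train], order[n_train:]
    weights, intercept = ridge_fit(genotypes[train], trait[train], penalty)
    predicted = genotypes[test] @ weights + intercept
    return float(np.corrcoef(predicted, trait[test])[0, 1])


if __name__ == "__main__":
    genotypes, trait = simulate_hybrids()
    for fraction in (0.02, 0.13, 0.50):
        acc = prediction_accuracy(genotypes, trait, fraction)
        print(f"training fraction {fraction:.2f}: accuracy {acc:.2f}")
```

The training hybrids here are drawn at random. The study's point is that choosing them more deliberately makes the same small fraction more informative.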

The data mining methods tested in the study drew on several disciplines, including computer science and statistics. The techniques lean on clustering and graphic networking, but Yu described the approach as a way of recognizing patterns in vast amounts of data.

“Data mining is another way of saying pattern finding,” Yu says. “We looked at a large dataset and found patterns linking genotype to phenotype. We then answer the question, ‘Which way of selecting the samples would allow breeders to exploit the patterns with minimal initial efforts?’”
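One way to picture clustering-based sample selection is sketched below in Python. The article does not describe the team's actual selection procedure, so this is an assumption throughout: a plain k-means pass groups hybrids by genotype, and the hybrid nearest each cluster center is proposed for the training set. The function names `kmeans` and `pick_training_set` are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centers = points[start].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    # Final assignment against the updated centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def pick_training_set(genotypes, n_train, seed=0):
    """Propose one representative hybrid per genotype cluster (illustrative only)."""
    centers, _ = kmeans(genotypes, n_train, seed=seed)
    dists = ((genotypes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=0)  # index of the hybrid closest to each center
    return np.unique(nearest)       # sorted, duplicates removed


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Simulated allele counts for 276 hybrids at 50 markers.
    genotypes = rng.integers(0, 3, size=(276, 50)).astype(float)
    chosen = pick_training_set(genotypes, n_train=28)
    print(f"{len(chosen)} hybrids proposed for field testing")
```

Because the chosen hybrids are spread across the genotype clusters, they cover more of the genetic variation than a random draw of the same size would be expected to. Whether that raises prediction accuracy on real data is the question the study addresses.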

Yu says this line of research could bridge the gap between predicting traits and designing better crops. Optimizing the process of how plant breeders select genetic lines to test will allow for more strategic use of time and effort, leading to more rapid progress in developing crop varieties with desirable traits.

Funding for the research came from the National Science Foundation and the ISU Plant Sciences Institute. Other ISU scientists contributing to the research include Xiaoqing Yu, a former ISU postdoctoral research associate; Xianran Li, an adjunct associate professor of agronomy; Haozhe Zhang, a graduate assistant in statistics; and Chengsong Zhu, a former ISU postdoctoral research associate.