May 1, 2009 Research Highlight Biology

Building a soy gene catalogue

A RIKEN-led consortium of scientists has compiled a massive collection of complete gene sequences for the invaluable soybean plant

Figure 1: The life and times of the soybean plant—a soybean crop, flowers, maturing pods and roots and nodules (from top left to bottom right).

In the thousands of years since the soybean was first cultivated, it has only grown in usefulness and importance, providing nutrition for billions of humans and animals as well as raw material for numerous industrial applications, including lubricants, inks and plastics. Valuable as this crop already is, a better understanding of its genomic content could enable scientists to breed still more useful strains that are hardier or better suited to specific applications (Fig. 1).

A first draft of the soybean genome was recently made publicly available, but an even more useful resource would be a complete database of full-length gene sequences, containing not only the protein-coding regions but also the untranslated regions at either end, which carry regulatory sequences that influence how, when and where a protein is produced. A consortium of scientists from across Japan, led by Kazuo Shinozaki and colleagues at the RIKEN Plant Science Center in Yokohama, has pooled its resources to tackle this task and recently announced a major step forward: the successful sequencing of more than 6,500 complete gene transcripts¹.

They began by pooling RNA isolated from plants grown under a wide variety of conditions, such as low temperature or high salt, to capture as many expressed genes as possible. They then converted these RNAs into complementary DNA (cDNA), a form suitable for cloning and sequencing. After sequencing nearly 40,000 clones, they computationally assembled the resulting sequences into overlapping ‘sequence scaffolds’, from which they identified a total of 6,570 full-length cDNAs.
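The idea behind assembling overlapping sequences can be pictured with a toy example. The sketch below is a hypothetical illustration, not the consortium's actual pipeline: it uses a classic textbook greedy strategy, repeatedly merging the pair of sequences with the longest suffix-prefix overlap until no overlap of at least `min_overlap` bases remains. The function names, the threshold and the short example reads are all assumptions chosen for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of a that is also a prefix of b,
    or 0 if no such overlap is at least min_len bases long."""
    start = 0
    while True:
        # Look for b's first min_len bases inside a; the leftmost hit whose
        # remaining suffix is also a prefix of b gives the longest overlap.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Hypothetical toy assembler: greedily merge the best-overlapping pair.

    Ignores sequencing errors, repeats and reverse complements, which real
    assembly software must handle.
    """
    seqs = list(reads)
    while True:
        # Drop duplicates and any read wholly contained in a longer one.
        seqs = list(dict.fromkeys(seqs))
        seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]

        # Find the ordered pair (a, b) with the longest suffix-prefix overlap.
        best_len, best_pair = 0, None
        for a, b in permutations(seqs, 2):
            olen = overlap(a, b, min_overlap)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)

        if best_pair is None:
            return seqs  # no remaining overlaps: these are the final contigs

        a, b = best_pair
        seqs.remove(a)
        seqs.remove(b)
        seqs.append(a + b[best_len:])  # join, keeping the shared bases once


if __name__ == "__main__":
    # Three made-up overlapping fragments of one short sequence.
    print(greedy_assemble(["ATGCGTAC", "CGTACGGA", "ACGGATTC"]))
    # ['ATGCGTACGGATTC']
```

Greedy merging is easy to follow but can mis-join reads that share repeated sequence, which is one reason production assemblers use more elaborate graph-based methods.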

The resulting dataset is important not only for its size but also for its novelty. “Our collection is the first full-length cDNA resource of soybean in the world,” explains Taishi Umezawa, co-lead author of the study with Tetsuya Sakurai. Many of the sequences represent previously uncharacterized transcripts, and a substantial number appear to be unique to soybean: from the raw sequence data, Shinozaki’s team identified more than 500 sequences with no apparent equivalent in other plant species.

The team has deposited its data with Japan’s National Bioresource Project (NBRP), making the sequences publicly available for broader analysis, and is also working with American researchers to annotate the genomic data. The work has also borne commercial fruit in the form of soybean-specific ‘DNA chips’ (microarrays), now available to the scientific community from Agilent Technologies. “These will be useful for studying gene expression profiles in soybean,” says Umezawa, “and we are using them to investigate environmental stress-responsive gene expression.”

References

1. Umezawa, T., Sakurai, T., Totoki, Y., Toyoda, A., Seki, M., Ishiwata, A., Akiyama, K., Kurotani, A., Yoshida, T., Mochida, K. et al. Sequencing and analysis of approximately 40 000 soybean cDNA clones from a full-length-enriched cDNA library. DNA Research 15, 333–346 (2008). doi: 10.1093/dnares/dsn024