Nov. 10, 2006 Research Highlight Biology

3-D protein models make grand debut

A database containing approximately 6 million predicted protein structures promises to boost biological and pharmaceutical research

Figure 1: Six million protein structure predictions—such as this model of a human oncogene (Brk tyrosine kinase)—should aid the scientific community in the search for new biological insights and better leads to new drugs.

On September 28, 2006, the RIKEN Genomic Sciences Center announced the release of FAMSBASE, the largest database of predicted protein structures assembled to date. The launch is the culmination of a five-year effort carried out under the auspices of Japan’s ‘Protein 3000 Project’ and promises to open a new era of protein-related research. Accurate structure predictions are crucial to a better understanding of protein-protein and protein-drug interactions, and of protein function in general.

A team led by Hideaki Umeyama has assembled a collection of approximately 6 million protein structure predictions obtained through homology modeling (Fig. 1). Umeyama is a protein chemist at the Genomic Sciences Center in Yokohama and a professor at the School of Pharmaceutical Sciences, Kitasato University.

The researchers constructed 3-D models of proteins of unknown structure by mimicking the structures of sequence-homologous proteins whose 3-D structures had previously been determined using nuclear magnetic resonance (NMR) or x-ray crystallography. Modeling was done using an algorithm, FAMS (Full Automatic Modeling System), that was developed by Umeyama and his colleagues¹.
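As a rough illustration of the template-matching idea behind homology modeling, the sketch below scores pre-aligned sequences by percent identity and picks the closest template. It is not the FAMS algorithm, which the cited paper describes as combining database searches with simulated annealing; the function names `percent_identity` and `pick_template` and the toy sequences are hypothetical and chosen only for this example.

```python
from __future__ import annotations


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap ('-').
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pick_template(target: str, templates: dict[str, str]) -> tuple[str, float]:
    """Return (name, identity) of the template most similar to the target."""
    scored = {name: percent_identity(target, seq) for name, seq in templates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical, pre-aligned toy sequences (not real proteins).
target = "MKT-AYIAKQR"
templates = {
    "template_A": "MKTLAYIGKQR",
    "template_B": "MRS-AYLAHQK",
}

name, identity = pick_template(target, templates)
print(f"best template: {name} ({identity:.1f}% identity)")
```

In a real pipeline the chosen template's solved structure would then serve as the scaffold onto which the target sequence is modeled; this sketch covers only the selection step and assumes the alignment has already been computed.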

While determining a 3-D structure by NMR spectroscopy or x-ray crystallography is still the best approach to understanding the molecular mechanisms underlying a protein’s function, these methods are complex to implement and not always applicable. As a result, the structures of only about 39,000 proteins had been deposited in the Protein Data Bank (PDB) as of September 2006. This highlights the need for good modeling tools that can predict the 3-D structures of the many proteins for which a sequence is available but no 3-D structure has yet been solved.

In a pilot study completed about a year ago, Umeyama and colleagues tested their approach by developing a database of 1,603 models of neuraminidases, enzymes that play an essential role in the life cycle of all influenza viruses².

The current version of FAMSBASE contains 793,612 newly predicted human protein models; 505,628 protein models for organisms important in drug research, such as mouse and rat; and 3,368,709 protein models for other species whose genome sequences have been determined. “This database should serve as a boon to scientists working on basic protein science as well as to those trying to design new or improved drugs,” says Umeyama. The RIKEN FAMSBASE is freely available and will be updated automatically as new structures become available.

References

1. Ogata, K. & Umeyama, H. An automatic homology modeling method consisting of database searches and simulated annealing. Journal of Molecular Graphics and Modelling 18, 258–272 (2000). doi: 10.1016/s1093-3263(00)00037-1
2. Takeda-Shitaka, M., Terashi, G., Takaya, D., Kanou, K., Iwadate, M. & Umeyama, H. Protein structure prediction in CASP6 using CHIMERA and FAMS. Proteins 61 (Suppl 7), 122–127 (2005). doi: 10.1002/prot.20728