Gene annotation, protein structure analysis, plant ontologies, transcriptomes: dramatic increases in the size, variety and complexity of data resources in the life sciences have accentuated the challenges of data analysis in the information age. Adding to these challenges, much of the data handled at each step of the research process is private, making integration with public data more difficult and hindering collaboration. Overcoming these challenges requires systems for securely integrating data resources and making their information widely available through a flexible interface.
The RIKEN Bioinformatics And Systems Engineering (BASE) division, Japan's leading research institute focusing on the integration and publication of life science research data, has now developed such an interface. Called Semantic-JSON, the interface accesses the Scientists' Networking System (SciNetS), a "virtual laboratory cloud centre" also developed at BASE, which as of May 2011 brings together a total of 192 public database projects both internal and external to RIKEN. SciNetS creates common ground for sharing life science data resources by linking these resources together in a network of semantic relationships based on standardized Semantic Web techniques.
RIKEN has already applied Semantic-JSON successfully to a number of projects, including international data collaborations on mouse phenotypes, domestic integrated database projects, and the GenoCon International Rational Genome Design Contest. Looking ahead, RIKEN plans to use the interface to share life science data across its research centres and with international collaborators via the SciNetS project. Doing so will broaden the life-sciences Semantic Web data universe, and promises not just a comprehensive understanding of various life phenomena but also collaborative breakthroughs for medicine, industry and the environment.
This research result will appear in the online version of the British scientific journal Nucleic Acids Research on June 1.
Life science research depends crucially on the availability of informatics infrastructure for systematically storing and integrating vast amounts of diverse bioinformatics data. Indeed, a deep understanding of data collected using today's cutting-edge bioinformatics technologies is impossible without this infrastructure, yet conventional databases are limited in the types of data they can handle. For more sophisticated processing and analysis, infrastructure is needed that can simultaneously sort and organize the vast variety of different types of life science data and make this data available for public use.
At the RIKEN Bioinformatics And Systems Engineering (BASE) division, researchers have developed a novel research infrastructure around a set of virtual laboratories (collaboration via the cloud) that allows researchers to store massive amounts of life-sciences data and to organise relationships between individual records, schematically and semantically, in a virtually constructed, closed, secure data space. This collaboration centre, the Scientists' Networking System (SciNetS), does more than just publish data from RIKEN to the web. As an infrastructure for life science data sharing, it also encourages new forms of research collaboration, enabling scientific discoveries not possible through individual research activities alone.
Fully exploiting this collaborative potential, however, requires that SciNetS data be made available on the web through an easy-to-use interface, so that it can be accessed and analysed with commonly used programming languages. Semantic-JSON is the technological innovation that makes this possible.
Semantic-JSON also achieves a second major advance in life science research by bridging the gap between public data available for general use, and private data held by individual researchers or research groups. Researchers often need to unite public and private data for analysis; yet doing so is far from trivial due to differences in access permissions across virtual laboratories. Freely releasing such data, on the other hand, poses significant security issues. What is thus needed is a technology to enable virtual laboratories to manage their own data access permissions in a secure way, while also accessing relationship information and merging (public and private) original data from different virtual labs.
To accomplish this union of data, Semantic-JSON employs a trick similar to the URL shortening tools used on common social media services such as Twitter. The Semantic-JSON interface shrinks URLs for data internal and external to SciNetS into shorter identifiers, and uses these to look up permissions for specific data, returning only the data appropriate to the access privileges of a given user. Unlike conventional URL shortening services, however, a short identifier in Semantic-JSON points not only to a URL but also to a wealth of relationships between data, thus realising a unified domain semantic web structure.
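The short-identifier idea described above can be illustrated with a minimal Python sketch. It is not the Semantic-JSON implementation or its API: the class `ShortIdRegistry`, its methods, the identifier format (`s1`, `s2`, ...), the lab names and the `example.org` URLs are all hypothetical, chosen only to show how one short identifier might carry a URL, relationship links and a permission check together.

```python
import json
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Record:
    url: str                 # original (long) URL of the data item
    lab: str                 # hypothetical virtual laboratory that owns it
    public: bool             # True if anyone may read it
    links: dict = field(default_factory=dict)  # relation name -> list of short IDs


class ShortIdRegistry:
    """Hypothetical registry: short ID -> URL, relationships and permissions."""

    def __init__(self):
        self._records = {}
        self._by_url = {}
        self._next = count(1)

    def shorten(self, url, lab, public):
        # Reuse the existing short ID if this URL was registered before.
        if url not in self._by_url:
            short_id = f"s{next(self._next)}"
            self._by_url[url] = short_id
            self._records[short_id] = Record(url, lab, public)
        return self._by_url[url]

    def link(self, source_id, relation, target_id):
        # Record a named relationship from one short ID to another.
        self._records[source_id].links.setdefault(relation, []).append(target_id)

    def _visible(self, short_id, user_labs):
        record = self._records[short_id]
        return record.public or record.lab in user_labs

    def resolve(self, short_id, user_labs=frozenset()):
        # Return None for unknown or unreadable records; otherwise return the
        # URL plus only those links whose targets the user may also read.
        if short_id not in self._records or not self._visible(short_id, user_labs):
            return None
        record = self._records[short_id]
        links = {}
        for relation, targets in record.links.items():
            allowed = [t for t in targets if self._visible(t, user_labs)]
            if allowed:
                links[relation] = allowed
        return {"id": short_id, "url": record.url, "links": links}


registry = ShortIdRegistry()
gene = registry.shorten("https://example.org/plant/gene/AT1G01010", "plant_lab", public=True)
pheno = registry.shorten("https://example.org/mouse/phenotype/42", "mouse_lab", public=False)
registry.link(gene, "related_to", pheno)

# An anonymous user sees the public gene but not its link to private data.
print(json.dumps(registry.resolve(gene)))
# A member of the (hypothetical) mouse_lab also sees the private link.
print(json.dumps(registry.resolve(gene, {"mouse_lab"})))
```

In this sketch the permission check happens at lookup time, so the same short identifier can be handed to any user and the response simply differs according to their privileges.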
By incorporating such security considerations, Semantic-JSON achieves a form of data access not implemented in conventional Semantic Web data tools. Researchers can thus access both public and private original data on SciNetS under data access control, and use Semantic-JSON to traverse individual virtual labs, obtaining relationships not only for public data but for private data as well. Simply by selecting a single data item, a user can access related public and (depending on their privileges) private data from different data constellations, enabling deeper integration of widely dispersed data resources.
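As a rough illustration of starting from a single data item and reaching related public and private data, the following Python sketch walks a small relationship graph breadth-first, following only records the user is allowed to read. The record identifiers, lab names and the functions `can_read` and `related` are assumptions made for this example and do not come from SciNetS or Semantic-JSON.

```python
from collections import deque

# Hypothetical records spread over two virtual laboratories.
RECORDS = {
    "g1": {"lab": "plant_lab", "public": True, "links": ["p1", "e1"]},
    "p1": {"lab": "mouse_lab", "public": False, "links": ["g2"]},
    "e1": {"lab": "plant_lab", "public": True, "links": []},
    "g2": {"lab": "mouse_lab", "public": True, "links": []},
}


def can_read(record, user_labs):
    # Public records are readable by anyone; private ones only by lab members.
    return record["public"] or record["lab"] in user_labs


def related(start, user_labs):
    """Breadth-first walk from `start`, skipping records the user cannot read."""
    if not can_read(RECORDS[start], user_labs):
        return []
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for target in RECORDS[current]["links"]:
            if target in seen or not can_read(RECORDS[target], user_labs):
                continue
            seen.add(target)
            found.append(target)
            queue.append(target)
    return found


print(related("g1", set()))          # anonymous user
print(related("g1", {"mouse_lab"}))  # member of the hypothetical mouse_lab
```

One design choice in this sketch is that the walk never passes through a private record the user cannot read, so public data reachable only via private data (here `g2` via `p1`) stays hidden from that user. Whether the real system behaves this way is not stated in the source.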
RIKEN BASE has already applied Semantic-JSON to the implementation of a tool that allows users to create programs on their web browsers by accessing SciNetS data. This tool was successfully employed in 2010 by contestants in GenoCon, the first International Rational-Genome-Design Contest, for designing Arabidopsis plant genome sequences using data managed on SciNetS.
Since its foundation in 2008, RIKEN BASE has focused its research on developing, through SciNetS, a virtual laboratory centre: an infrastructure for enabling collaboration between researchers. Internationally, BASE has played a key role in the release of data in Japan for an international collaboration on Arabidopsis and mouse phenotypes. In Japan, BASE is one of the core institutions supporting activities of the Japan Science and Technology Agency (JST) Bio-sciences Database Centre. In each of these roles, the interface for data interchange is of key importance. By enabling this interchange for data published from virtual laboratories on SciNetS, Semantic-JSON achieves a major milestone, opening the door to data sharing via a variety of different devices such as mobile phones and PCs.
Through the use of SciNetS and Semantic-JSON, RIKEN aims to broaden the application of research results to society, developing the life-sciences information infrastructure necessary to accelerate data schematisation research both in Japan and across the world.