

September 6, 2012

Comprehensive transcriptome analysis of human ENCODE cells

Transcription start sites identified using the CAGE technique provide powerful data sets for delineating functional elements across the human genome

ENCODE, an international research project led by the National Human Genome Research Institute (NHGRI), has produced and analyzed 1649 data sets designed to annotate the functional elements of the entire human genome. Data on transcription start sites (TSSs) contributed by a research team at the RIKEN Omics Science Center provided key anchor points linking the epigenetic status of genes observed at the 5' end directly to their RNA output.

The ENCODE (Encyclopedia of DNA Elements) project aims to delineate all functional elements encoded in the human genome. Thirty-two institutes from five countries have contributed to the project, each providing its own technologies and expertise. The project has developed methods and performed a large number of sequence-based studies mapping functional elements, including RNA-transcribed regions, protein-coding regions, transcription-factor-binding sites, chromatin structure, and DNA methylation sites.

A team of researchers at the RIKEN Omics Science Center led by Dr. Piero Carninci contributed to the mapping of RNA-transcribed regions by identifying TSSs using RIKEN's original CAGE (cap analysis of gene expression) technique. Cells from 15 cell lines were fractionated into compartments (whole cell, nucleus, and cytosol) before RNA isolation. For one cell line (K562), the nucleus was further fractionated to obtain chromatin, nucleoplasm, and nucleoli.

The isolated RNAs were then divided by length, and the long RNAs were further fractionated into polyadenylated and non-polyadenylated long RNAs. Each RNA fraction was then characterized for functional analysis.

The data set was integrated with data sets provided by other research groups for further analysis, which included modeling transcription levels from histone modification and transcription-factor-binding patterns, and predicting transcription activity at distal enhancer regions. Overall, these comprehensive data, together with the other data sets, contributed to assigning biochemical functions to 80% of the human genome, particularly in areas outside the well-studied protein-coding regions. Another striking result is the pervasive presence of lowly expressed RNA transcripts whose localization is restricted to the cell nucleus.

"Scientists at the RIKEN Omics Science Center are particularly pleased with this work because the CAGE technology, developed earlier, was employed as one of the standard technologies for analyzing the output of the genome," Dr. Carninci said. "This international collaboration is in line with the OSC mission to understand the function of the genome. OSC has pioneered the field with the FANTOM project, which provided a first comprehensive annotation of the mouse and human genome using CAGE, and identified a transcriptional network that controls the cell fate. The current ENCODE dataset provides a comprehensive set of data that strengthens and complements our previous and current work, aimed at understanding the function and regulation of the genome in health and disease states. OSC is committed to further characterizing the genome output for a much larger number of cells."


  • The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature, 2012. doi: 10.1038/nature11247
  • S. Djebali et al., “Landscape of transcription in human cells,” Nature, 2012. doi: 10.1038/nature11233


Piero Carninci
LSA Technology Development Group
RIKEN Omics Science Center

Jens Wilkinson
RIKEN Global Relations and Research Coordination Office
Tel: +81-(0)48-462-1225 / Fax: +81-(0)48-463-3687