Lenhard Sars/CBU Group
Bioinformatics of Transcription and Transcriptional
(2006 - 2011)
Transcription start site usage and nucleotide composition anisotropy in mouse bidirectional promoters (from Engström et al., PLoS Genet. 2(4):e47, 2006).
This group had a joint appointment between the Bergen Centre for Computational Science (BCCS) and the Sars Centre, both units of Uni Research AS.
Transcriptional regulation has been identified as the top-priority research subject of advanced post-genome bioinformatics. Binding site specificities of individual eukaryotic transcription factors are notoriously low, which precludes their application to genome-wide analysis. The only available biologically meaningful transcription factor binding site data come from relatively rare experimental analyses of inferred regulatory regions. Some progress was made upon the observation that some tissue-specific genes are regulated by cis-regulatory modules (CRMs), i.e. clusters of binding sites for tissue-specific regulatory elements. Another major leap was facilitated by cross-species comparisons of regulatory sequences (phylogenetic footprinting). This step has been successfully combined with other methods such as CRM detection, resulting in a significant increase in the specificity of predictions.
We are working on several new approaches to harness the new discoveries and newly available data into a next-generation gene regulation bioinformatics platform. We assembled JASPAR, the world's first open-access database of transcription factor binding site profiles from higher eukaryotes. In addition, we developed a computational framework for transcription factor binding site analysis (TFBS) and applied it to demonstrate quantitatively that cross-species comparisons drastically improve the detection rate of transcription factor binding sites. ConSite is our web-based application for phylogenetic-footprinting-enhanced detection of transcription factor binding sites.
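The idea of combining binding site profiles with cross-species comparison can be illustrated with a minimal sketch. It is not the TFBS framework or the ConSite implementation: the count matrix is a made-up 4-bp profile (not a real JASPAR entry), the function names are hypothetical, and the two sequences are assumed to be already aligned and of equal length. A site is kept only if it scores above the threshold in both species and the aligned windows are sufficiently similar.

```python
import math

# Hypothetical count matrix for a 4-bp motif, one dict per position.
# Illustrative values only; this is not a real JASPAR profile.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows with gaps or ambiguous characters.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def conserved_hits(aligned_a, aligned_b, pwm, threshold, min_identity=0.75):
    """Keep hits found at the same alignment column in both sequences.

    Assumes aligned_a and aligned_b are rows of a pairwise alignment
    (equal length, '-' for gaps); coordinates are alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    width = len(pwm)
    hits_b = dict(scan(aligned_b, pwm, threshold))
    kept = []
    for start, score_a in scan(aligned_a, pwm, threshold):
        if start not in hits_b:
            continue
        window_a = aligned_a[start:start + width]
        window_b = aligned_b[start:start + width]
        identity = sum(x == y for x, y in zip(window_a, window_b)) / width
        if identity >= min_identity:
            kept.append((start, score_a, hits_b[start]))
    return kept
```

Scanning a single sequence with `scan` reports every window that resembles the profile, while `conserved_hits` discards those not supported by the second species. This is the sense in which the comparison raises the specificity of predictions.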
Based on a conceptual framework inherited from studies in bacteria and yeast, previous methods focused primarily on regions 5' upstream of the inferred transcription start sites. Newly emerging data are starting to invalidate this as a general approach: many genes are regulated by non-coding elements distributed along the entire length of the gene. Our analysis of the genomic context and organization of 3583 ultra-conserved non-coding regions (UCRs) in the human genome revealed that they tend to cluster in and around genes involved in fundamental developmental processes in vertebrates, genes which most often have known homologs in invertebrates (e.g. in Drosophila). The genomic organization of UCR clusters revealed a striking array of long-range enhancers around key genes, sometimes spanning areas of more than 1 Mb. This discovery provides an argument against focusing on proximal promoter regions in the search for key regulatory elements, and implies the existence of long-range, chromatin-level regulatory mechanisms. We continue to explore long-range regulatory elements across higher eukaryotes.
Another exciting area of research we are involved in is the analysis of the mammalian transcriptome. In collaboration with RIKEN Genome Science Center (Japan), we are dissecting loci with demonstrated complex transcription patterns, including occurrences of natural antisense transcription, bidirectional promoters and cis-regulatory chains.
Future projects and goals:
- Elucidation of the genomic organization of long-range regulatory elements in metazoan genomes and inference of their physiological function from hints provided by their sequence, genomic organization and evolution
- Building predictive models for regulatory determinants of context-specific gene expression that include long-range regulatory elements
- Bioinformatics of vertebrate development - transcriptional regulatory network approach to vertebrate embryonic development circuitry
- Exploring the structure of vertebrate core promoters and transcription start sites and establishing a classification scheme for them
- Development of methods for predicting the effects of regulatory variation in the genome