
Identification of Conserved Promoter Motifs and Transcription Factor Binding Sites in Plant Promoters
Explore the identification of conserved promoter motifs and transcription factor binding sites in plant promoters through wet-lab and in silico methods. Discover the significance of transcription factor binding sites, experimentally verified sites, de novo motif discovery, and real promoter structure in understanding gene regulation. Utilize databases of orthologous promoters for comparative analysis and annotation of transcription start sites in plants and chordates.
Presentation Transcript
Identifying conserved promoter motifs and transcription factor binding sites in plant promoters
Endre Sebestyén, ARI-HAS, Martonvásár, Hungary
26 November 2009, RCPGD Annual Meeting
Transcription factor binding sites
- TFs bind short, often degenerate DNA sequences
- Promoters are variable-length 5' sequences containing TFBSs
- TFBSs are usually conserved within a non-conserved surrounding sequence
- Some well-known TFBSs
  - TATA box
  - GC box
  - CpG island
- Many other, less general TFBSs
- Similarly expressed genes, or homologues, should contain similar TFBSs
TFBS search and promoter analysis
- Wet-lab methods
  - DNase footprinting
  - Electrophoretic mobility shift assay
  - ChIP-chip, ChIP-seq
- In silico methods
  - Experimentally verified sites
    - Consensus sequences
    - Consensus matrices
  - De novo motif discovery
    - Oligo frequency
    - Phylogenetic footprinting
    - Other methods
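Searching a promoter with a consensus sequence can be sketched by expanding IUPAC ambiguity codes into a regular expression and scanning both strands. This is an illustration only: the function names and the TATAWAW consensus are assumptions chosen for the example and are not taken from TRANSFAC, JASPAR, PLACE or PlantCARE. The DRE element GCCGAC is the one named in the maize example of this talk.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (IUPAC codes allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile a consensus into a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan(sequence, consensus):
    """Return sorted (position, strand, matched_text) tuples.

    Positions are 0-based on the forward strand. A '-' hit means the
    reverse complement of the consensus matched at that position.
    """
    sequence = sequence.upper()
    hits = [(m.start(), "+", m.group(1))
            for m in consensus_to_regex(consensus).finditer(sequence)]
    rc = reverse_complement(consensus)
    if rc != consensus.upper():  # skip palindromic sites to avoid double counting
        hits += [(m.start(), "-", m.group(1))
                 for m in consensus_to_regex(rc).finditer(sequence)]
    return sorted(hits)


# Hypothetical toy sequences, not real promoters.
print(scan("GGCTATAAATAGCC", "TATAWAW"))  # [(3, '+', 'TATAAAT')]
print(scan("AAGTCGGCTT", "GCCGAC"))       # [(2, '-', 'GTCGGC')]
```

A consensus matrix would replace the exact character classes with per-position scores and a threshold, which tolerates weak mismatches at the cost of more false positives.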
Experimentally verified sites
- TRANSFAC
- JASPAR
- PLACE
- PlantCARE
De novo motif discovery
- Orthologous gene groups
  - Evolutionarily conserved functional sites
- Co-regulated genes
  - Same tissue, body part
  - Same developmental stage
  - Etc.
Real promoter structure
- No general motifs
  - No TATA box, GC box, etc.
- Lots of false-positive TFBSs
  - With both wet-lab and in silico methods
- Sometimes no apparent common TFBSs between co-regulated genes
Database of Orthologous Promoters (DoOP)
- Orthologous promoter sequence collections
- Based on a BLAST search with first exons of reference species
  - Plants (Viridiplantae), reference species: Arabidopsis thaliana
  - Chordates, reference species: Homo sapiens
- 500/1000/3000 bp 5' upstream regions
- Conserved sequence regions
- Annotations
  - Xrefs to other databases
  - Annotated transcription start sites
DoOP subsets
- Cluster > Subset
- Subset: collection of evolutionarily monophyletic sequences in a cluster
- Plant subsets
  - Brassicaceae: Arabidopsis thaliana, Brassicaceae species
  - Eudicotyledons: grape, Solanum species, papaya, tobacco
  - Magnoliophyta: maize, rice
  - Viridiplantae
[Figure: chart of per-species counts in DoOP versions v1.5, v1.6 and v1.8; y-axis from 0 to 45000. Legend: Arabidopsis thaliana, Brassica rapa, Brassica oleracea, Boechera stricta, Carica papaya, Populus trichocarpa, Ricinus communis, Vitis vinifera, Medicago truncatula, Lotus japonicus, Brassica napus, Nicotiana tabacum, Solanum lycopersicum, Oryza sativa, Zea mays, Glycine max, Capsella rubella, Physcomitrella patens, Sorghum bicolor, Arabidopsis lyrata, Solanum tuberosum, Other]
Gene types: Gene Ontology
- Standardized annotation for genes
- Biological process: what does it do?
  - Transcription, translation, stress response, etc.
- Cellular component: where is it located?
  - Membrane, ribosome, cytosol, etc.
- Molecular function: how does it work?
  - Dehydrogenase, ATP binding, etc.
Gene types: Gene Ontology
- 500 bp promoters
- Search for significantly enriched terms in the annotation
- Subsets: Brassicaceae, Eudicotyledons, Magnoliophyta, Viridiplantae
- BP: transcription, translation, protein folding, stress response
- CC: plasma membrane, ribosome parts
- MF: ATP/GTP binding, DNA binding, ribosome parts
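The slide does not say which statistic was used to call a term significantly enriched. A common choice is a one-sided hypergeometric test, sketched below with a Bonferroni correction; both the test and the correction are assumptions for illustration, and the gene and term names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enriched_terms(term_to_genes, study_genes, background_genes, alpha=0.05):
    """Return (term, hits, adjusted_p) for terms enriched in the study set.

    term_to_genes maps a GO term to the genes annotated with it.
    P-values are Bonferroni-adjusted by the number of terms tested
    (an assumed, deliberately conservative correction).
    """
    background = set(background_genes)
    study = set(study_genes) & background
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = len(annotated & study)
        if hits == 0:
            continue  # a term absent from the study set cannot be enriched
        p = hypergeom_sf(hits, len(background), len(annotated), len(study))
        p_adj = min(1.0, p * len(term_to_genes))
        if p_adj <= alpha:
            results.append((term, hits, p_adj))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes, 5 of them in the study set.
background = [f"g{i}" for i in range(20)]
annotation = {
    "transcription": [f"g{i}" for i in range(5)],
    "translation": [f"g{i}" for i in range(5, 15)],
}
study = [f"g{i}" for i in range(5)]
for term, hits, p_adj in enriched_terms(annotation, study, background):
    print(term, hits, p_adj)
```

Here the study set would be the genes whose promoters carry conserved motifs in a given subset, and the background would be all annotated genes of the reference species.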
Motif generation
- Phylogenetic footprinting
  - Functional TFBSs should be conserved
- Local sequence alignment
  - Define conserved regions
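The slides do not name the aligner used to define conserved regions. As an illustration of the idea, the sketch below uses the Smith-Waterman local alignment algorithm on two toy sequences; the scoring values and sequences are assumptions, and a real pipeline would more likely use an optimized alignment tool.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b, start_a, start_b), where the
    start values are 0-based offsets of the aligned region in a and b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), i, j


# Hypothetical orthologous promoter fragments sharing one conserved site.
print(smith_waterman("AAAAGCCGACAAAA", "TTTTGCCGACTTTT"))
# (12, 'GCCGAC', 'GCCGAC', 4, 4)
```

The aligned segment and its offsets give a candidate conserved region, which matches the premise on the slide that functional TFBSs stay conserved while the surrounding promoter sequence diverges.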
Motif generation
[Figure: panels labelled eudicotyledons, Magnoliophyta and Brassicaceae]
Motif statistics: motif number by promoter length
Subset           500 bp    1000 bp   3000 bp
Brassicaceae     323411    410720    893788
eudicotyledons   13863     20192     34353
Magnoliophyta    2009      2211      1938
Viridiplantae    589       565       372
Motif statistics: % conserved
Promoter length   Brassicaceae   eudicotyledons   Magnoliophyta   Viridiplantae
500 bp            22             5                6               4
1000 bp           19             3                5               2
3000 bp           16             2                2               1
Motif statistics: average length
Promoter length   Brassicaceae   eudicotyledons   Magnoliophyta   Viridiplantae
500 bp            9              7                8               9
1000 bp           9              7                9               9
3000 bp           9              7                8               9
TFBS databases
Database     TFBSs
TRANSFAC     977
JASPAR       18
PLACE        416
PlantCARE    646
ABS          650
AGRIS        72
- Lots of redundant data
- Low quality, not updated
- More than 100 different versions of the TATA box
Synthetic biology
- Synthetic biology
  - iGEM competition
  - BioBricks
  - MIT Registry of Standard Biological Parts
    - UV-responsive promoter
    - Promoter expressed in roots
    - Etc.
- Synthetic promoters
  - Define basic promoter elements
  - Build and use custom-made promoters
  - Gene expression more or less when and where you want it
SNP conservation
- Gene expression levels change because
  - Regulatory elements change
  - Usually NOT protein-coding regions
- Conserved promoter regions might be functional regulatory elements
- Search for SNPs in these regions
- These SNPs might be interesting for breeders, as they are likely to be functional
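Searching for SNPs inside conserved promoter regions amounts to an interval lookup. The sketch below assumes 0-based, half-open, non-overlapping regions on a single sequence; the coordinates and the function name are hypothetical and not part of DoOP.

```python
from bisect import bisect_right


def snps_in_conserved_regions(snp_positions, regions):
    """Return (snp_position, region) pairs for SNPs inside a conserved region.

    regions is a list of (start, end) tuples, 0-based and half-open,
    assumed not to overlap each other.
    """
    regions = sorted(regions)
    starts = [start for start, _ in regions]
    hits = []
    for pos in sorted(snp_positions):
        # Rightmost region whose start is <= pos, if any.
        idx = bisect_right(starts, pos) - 1
        if idx >= 0 and pos < regions[idx][1]:
            hits.append((pos, regions[idx]))
    return hits


# Hypothetical conserved regions and SNP positions within one promoter.
conserved = [(10, 20), (50, 58)]
snps = [5, 10, 19, 20, 55, 100]
print(snps_in_conserved_regions(snps, conserved))
# [(10, (10, 20)), (19, (10, 20)), (55, (50, 58))]
```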
A real example
Vilmos Soós, Endre Sebestyén, Angéla Juhász, János Pintér, Marnie E. Light, Johannes Van Staden, Ervin Balázs (2009) Stress-related genes define essential steps in the response of maize seedlings to smoke-water. Functional and Integrative Genomics, Volume 9, Number 2, Pages 231-242; doi:10.1007/s10142-008-0105-8
- Microarray experiments
  - Maize kernels (Mv 540)
  - 24 and 48 h, control vs smoke-treated samples
  - Up- and downregulated genes
- Promoter sequences up to 1500 bp were extracted if available
Analysis of promoters
- TRANSFAC database version 12.1
  - Collection of TFBSs
  - More than 100 plant TFBSs
  - DRE element: GCCGAC
- Scan for the TFBSs in the maize promoters
  - Up- and downregulated
- Also count the frequencies of all 5- to 8-mer sequences
  - In all available maize promoters, not only the up- or downregulated ones
- Calculate the over- or underrepresentation of a TFBS as follows
  - Observed frequency in up- or downregulated promoters divided by the expected frequency in all promoters
  - If ratio > 1: overrepresented
  - If ratio < 1: underrepresented
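The ratio described above can be sketched as follows. The slide does not define "frequency" precisely, so the sketch assumes occurrences per possible window position on the forward strand; the function names and toy sequences are hypothetical, and only the DRE element GCCGAC comes from the talk.

```python
from collections import Counter


def count_occurrences(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_frequency(promoters, motif):
    """Occurrences per possible window position (assumed frequency definition)."""
    k = len(motif)
    hits = sum(count_occurrences(p.upper(), motif) for p in promoters)
    windows = sum(max(len(p) - k + 1, 0) for p in promoters)
    return hits / windows if windows else 0.0


def representation_ratio(regulated, all_promoters, motif):
    """Observed frequency in regulated promoters over expected in all promoters.

    Returns None when the motif never occurs in the full promoter set.
    A ratio above 1 means overrepresented, below 1 underrepresented.
    """
    expected = motif_frequency(all_promoters, motif)
    if expected == 0:
        return None
    return motif_frequency(regulated, motif) / expected


def kmer_counts(promoters, k_min=5, k_max=8):
    """Count every k-mer with k_min <= k <= k_max across the promoters."""
    counts = Counter()
    for p in promoters:
        p = p.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(p) - k + 1):
                counts[p[i:i + k]] += 1
    return counts


# Hypothetical toy promoters, not real maize sequences.
all_promoters = ["AAGCCGACAA", "AAAAAAAAAA", "TTTTTTTTTT", "CCCCCCCCCC"]
upregulated = ["AAGCCGACAA"]
print(representation_ratio(upregulated, all_promoters, "GCCGAC"))  # 4.0
```

With real data the background counts from `kmer_counts` only need to be computed once for all promoters and can then be reused for every TFBS of matching length.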
Analysis of promoters: results
- Binding sites related to
  - Organogenesis
  - Meristem development
  - Housekeeping functions
  - Biotic stress
  - Cold and dehydration stress
  - ABA-related motifs