Combined PCA Analysis of Human Coronary Artery Smooth Muscle Cells
This presentation combines ENCODE DNase-Seq and ATAC-Seq data to examine how human coronary artery smooth muscle cells cluster. Signal is normalized across datasets, and Principal Component Analysis (PCA) is then used for clustering.
Presentation Transcript
Combined ENCODE DNase-Seq and ATAC-Seq PCA analysis
Normalization across datasets
Clustering example of human coronary artery smooth muscle cells
Milos Pjanic
Stanford University School of Medicine, Cardiovascular Institute
Stanford, 2016
PRINCIPAL COMPONENT ANALYSIS
AoSMC: aortic smooth muscle cells
FibroP: fibroblasts taken from individuals with Parkinson's disease; AG20443, AG08395 and AG08396 were pooled for this sample
Fibrobl: child fibroblasts
HCF: cardiac fibroblasts
HCFaa: cardiac fibroblasts, adult atrial
HPAF: pulmonary artery fibroblasts
HCM: cardiac myocytes
HUVEC: umbilical vein endothelial cells
HPAEC: pulmonary artery endothelial cells
HRGEC: renal glomerular endothelial cells
HMVEC-dAd: adult dermal microvascular endothelial cells
H7-hESC: undifferentiated embryonic stem cells

The ENCODE Analysis Working Group (AWG) has performed uniform processing on datasets produced by multiple data production groups in the ENCODE Consortium. This track represents a uniform set of open chromatin elements (DNaseI hypersensitive sites) in 125 ENCODE cell types, based on DNase-seq data produced by the "Open Chromatin" (Duke/UNC/UT-A) and University of Washington (UW) ENCODE groups from the project inception in 2007 through the ENCODE January 2011 data freeze. The AWG uniform datasets are used in downstream analysis pipelines by members of the ENCODE Consortium and are one of the primary sources of data referenced in the 2012 ENCODE integrative analysis paper (ENCODE Project Consortium 2012).
Composite number of DNase regions in the human genome from ENCODE: 2,233,542

samtools flagstat CA2305-LV422-2_S3_concat_001.sam
139667614 + 0 in total (QC-passed reads + QC-failed reads)
88230538 + 0 mapped (63.17%:-nan%)

samtools flagstat CA2305-LV424-1_S4_concat_001.sam
110640448 + 0 in total (QC-passed reads + QC-failed reads)
73873513 + 0 mapped (66.77%:-nan%)

89968 wgEncodeAwgDnaseUwSknshraUniPk.narrowPeak.cut.merge
91100 wgEncodeAwgDnaseUwTh2UniPk.narrowPeak.cut.merge
92709 wgEncodeAwgDnaseUwGm06990UniPk.narrowPeak.cut.merge
97313 wgEncodeAwgDnaseDukeCllUniPk.narrowPeak.cut.merge
110108 wgEncodeAwgDnaseUwCd20UniPk.narrowPeak.cut.merge
114060 wgEncodeAwgDnaseUwHct116UniPk.narrowPeak.cut.merge
116642 wgEncodeAwgDnaseUwPanc1UniPk.narrowPeak.cut.merge
116901 wgEncodeAwgDnaseDukeHelas3ifna4hUniPk.narrowPeak.cut.merge
117217 wgEncodeAwgDnaseDukeGm18507UniPk.narrowPeak.cut.merge
122479 wgEncodeAwgDnaseUwCaco2UniPk.narrowPeak.cut.merge
123918 wgEncodeAwgDnaseUwHpaecUniPk.narrowPeak.cut.merge
125234 wgEncodeAwgDnaseUwHmvecdadUniPk.narrowPeak.cut.merge
147429 020805.2_L1_TAAGGCGA_L001_peaks.SERUMFREE.bed.cut
165147 020805.2_L1_CGTACTAG_L001_peaks.PDGF-BB.bed.cut
166500 CA1508_L2_TAAGGCGA_L002_peaks_SERUMFREE.bed.cut
175639 020805.2_L1_AGGCAGAA_L001_peaks.PDGF-DD.bed.cut
185467 CA1508_L1_CGTACTAG_L001_peaks_PDGF-BB.bed.cut
198030 CA1508_L2_TCCTGAGC_L002_peaks_TGF-B1.bed.cut
198976 020805.2_L1_TCCTGAGC_L001_peaks.TGF-B1.bed.cut
205872 CA1508_L1_AGGCAGAA_L001_peaks.PDGF-DD.bed.cut
12441 CA2305-LV422-2_S3_concat_001_peaks_TCF21overex_CONTROL_cut.bed
14770 CA2305-LV424-1_S4_concat_001_peaks_TCF21overex_LV_cut.bed
17424 CA2356-LV520-1_S5_concat_001_peaks_TCF21kd_CONTROLSHRNA_cut.bed
20756 CA2356-LV521-2_S6_concat_001_peaks_TCF21kd_SHRNA_cut.bed
227086 wgEncodeAwgDnaseUwHrpepicUniPk.narrowPeak.cut.merge
227644 wgEncodeAwgDnaseDukeMedulloUniPk.narrowPeak.cut.merge
230696 wgEncodeAwgDnaseUwNhdfadUniPk.narrowPeak.cut.merge
258032 wgEncodeAwgDnaseDukeFibropUniPk.narrowPeak.cut.merge
258188 wgEncodeAwgDnaseUwdukeH1hescUniPk.narrowPeak.cut.merge
266618 wgEncodeAwgDnaseUwH7hescUniPk.narrowPeak.cut.merge
271891 wgEncodeAwgDnaseDukePhteUniPk.narrowPeak.cut.merge
285745 wgEncodeAwgDnaseDukeMelanoUniPk.narrowPeak.cut.merge
305056 wgEncodeAwgDnaseUwdukeHsmmUniPk.narrowPeak.cut.merge
308446 wgEncodeAwgDnaseUwdukeLncapUniPk.narrowPeak.cut.merge
310034 wgEncodeAwgDnaseUwdukeTh1UniPk.narrowPeak.cut.merge
318528 wgEncodeAwgDnaseUwdukeHsmmtubeUniPk.narrowPeak.cut.merge
339865 wgEncodeAwgDnaseUwdukeHmecUniPk.narrowPeak.cut.merge
378892 wgEncodeAwgDnaseDukeOsteoblUniPk.narrowPeak.cut.merge
406115 wgEncodeAwgDnaseDukeFibroblUniPk.narrowPeak.cut.merge
105484 L1_TAAGGCGA_L001_peaks_normalized_to_ENCODE_cut.bed
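The mapped percentages in the two samtools flagstat reports can be checked directly from the read counts. The snippet below is a minimal sketch: the function name mapped_fraction and the dictionary keys are illustrative, and the report text is copied from the slide rather than produced by running samtools.

```python
import re

def mapped_fraction(flagstat_text):
    """Parse total and mapped read counts from flagstat-style text.

    Returns (total, mapped, percent_mapped). Only the two lines shown
    on the slide ("in total" and "mapped") are needed.
    """
    total = int(re.search(r"(\d+) \+ \d+ in total", flagstat_text).group(1))
    mapped = int(re.search(r"(\d+) \+ \d+ mapped", flagstat_text).group(1))
    return total, mapped, 100.0 * mapped / total

# Report text copied from the slide, keyed by SAM file name.
reports = {
    "CA2305-LV422-2_S3_concat_001.sam": (
        "139667614 + 0 in total (QC-passed reads + QC-failed reads)\n"
        "88230538 + 0 mapped (63.17%:-nan%)"
    ),
    "CA2305-LV424-1_S4_concat_001.sam": (
        "110640448 + 0 in total (QC-passed reads + QC-failed reads)\n"
        "73873513 + 0 mapped (66.77%:-nan%)"
    ),
}

percentages = {}
for name, text in reports.items():
    total, mapped, pct = mapped_fraction(text)
    percentages[name] = round(pct, 2)
    print(f"{name}: {mapped} / {total} reads mapped ({pct:.2f}%)")
```

Both recomputed values agree with the percentages printed by flagstat (63.17% and 66.77%).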
UNIONBEDG

Problem - differently defined regions in different tracks.
125 ENCODE tracks > 20,231,734 regions in a composite file - PCA difficult, influence on variance

cat 1.bg
chr1 1000 1500 10
chr1 2000 2100 20

cat 2.bg
chr1 900 1600 60
chr1 1700 2050 50

cat 3.bg
chr1 1980 2070 80
chr1 2090 2100 20

bedtools unionbedg -i 1.bg 2.bg 3.bg
chr1 900 1000 0 60 0
chr1 1000 1500 10 60 0
chr1 1500 1600 0 60 0
chr1 1700 1980 0 50 0
chr1 1980 2000 0 50 80
chr1 2000 2050 20 50 80
chr1 2050 2070 20 0 80
chr1 2070 2090 20 0 0
chr1 2090 2100 20 0 20

Filter by size of the region >100bp
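To make the splitting logic concrete, here is a minimal pure-Python sketch that reproduces the unionbedg example above and then applies a size filter. It is not the bedtools implementation. The function names are illustrative, intervals within one track are assumed not to overlap, and reading ">100bp" as strictly greater than 100 and applying it to the union segments are both assumptions.

```python
def union_bedgraph(tracks):
    """Split the genome at every interval boundary found in any track and
    report one value per track for each segment (0 where a track has no
    interval). Segments covered by no track are skipped, as in the
    bedtools output shown above.
    """
    breakpoints = {}
    for track in tracks:
        for chrom, start, end, _ in track:
            breakpoints.setdefault(chrom, set()).update((start, end))

    rows = []
    for chrom in sorted(breakpoints):
        points = sorted(breakpoints[chrom])
        for seg_start, seg_end in zip(points, points[1:]):
            values = []
            covered = False
            for track in tracks:
                value = 0
                for c, s, e, v in track:
                    # A segment never straddles a boundary, so it is either
                    # fully inside an interval or fully outside it.
                    if c == chrom and s <= seg_start and seg_end <= e:
                        value = v
                        covered = True
                        break
                values.append(value)
            if covered:
                rows.append((chrom, seg_start, seg_end, *values))
    return rows


def filter_by_size(rows, min_size=100):
    """Keep segments strictly longer than min_size bp (assumed reading)."""
    return [row for row in rows if row[2] - row[1] > min_size]


bg1 = [("chr1", 1000, 1500, 10), ("chr1", 2000, 2100, 20)]
bg2 = [("chr1", 900, 1600, 60), ("chr1", 1700, 2050, 50)]
bg3 = [("chr1", 1980, 2070, 80), ("chr1", 2090, 2100, 20)]

union = union_bedgraph([bg1, bg2, bg3])
kept = filter_by_size(union)

for row in union:
    print(*row, sep="\t")
```

With these three inputs the union has nine segments, matching the listing above, and only chr1 1000-1500 and chr1 1700-1980 pass the size filter under the strict reading.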
k = mean(signal, ATAC-Seq) / mean(signal, ENCODE AoSMC)
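A small sketch of how such a factor could be computed and applied. The slide defines only k itself, so dividing the ATAC-Seq signal by k to bring it onto the ENCODE AoSMC scale is an assumption, and the signal values and function names below are hypothetical.

```python
from statistics import mean

def scaling_factor(atac_signal, encode_aosmc_signal):
    """k = mean(ATAC-Seq signal) / mean(ENCODE AoSMC signal)."""
    return mean(atac_signal) / mean(encode_aosmc_signal)

def rescale(atac_signal, k):
    """Divide the ATAC-Seq signal by k (assumed use of the factor)."""
    return [value / k for value in atac_signal]

# Hypothetical per-region signal values, for illustration only.
atac = [40.0, 80.0, 120.0]
encode_aosmc = [10.0, 20.0, 30.0]

k = scaling_factor(atac, encode_aosmc)
atac_scaled = rescale(atac, k)

print("k =", k)
print("rescaled ATAC-Seq signal:", atac_scaled)
```

After rescaling, the mean ATAC-Seq signal equals the mean ENCODE AoSMC signal, which is the property the ratio is built to give.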
DESeq clustering of RNA-Seq data for the top 100 expressed genes in human coronary artery smooth muscle cells from two different individuals, treated with TGF-beta for 1 hour or 6 hours.
Batch effect
Cell line effect