
Statistical Analysis Laboratory Data May 12, 2015
Slides from SPH 247, Statistical Analysis of Laboratory Data (May 12, 2015), on using AnnotationDbi to annotate RNA-Seq results. The examples map Ensembl gene IDs to gene symbols, gene names, KEGG pathways, and GO terms for the mouse genome.
SPH 247 Statistical Analysis of Laboratory Data
May 12, 2015
AnnotationDbi and RNA-Seq
> # Current Bioconductor releases install packages with BiocManager::install() rather than biocLite()
> source("http://bioconductor.org/biocLite.R")
> biocLite("AnnotationDbi")
> biocLite("org.Mm.eg.db")
> library(AnnotationDbi)
> library(org.Mm.eg.db)  # Mouse genome; use org.Hs.eg.db for human
> columns(org.Mm.eg.db)
 [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"      "ACCNUM"
 [6] "ALIAS"        "CHR"          "CHRLOC"       "CHRLOCEND"    "ENZYME"
[11] "PATH"         "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"
[16] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"     "UNIPROT"
[21] "GO"           "EVIDENCE"     "ONTOLOGY"     "GOALL"        "EVIDENCEALL"
[26] "ONTOLOGYALL"  "MGI"
> keytypes(org.Mm.eg.db)
 [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"      "ACCNUM"
 [6] "ALIAS"        "ENZYME"       "PATH"         "PMID"         "REFSEQ"
[11] "SYMBOL"       "UNIGENE"      "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
[16] "GENENAME"     "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"
[21] "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"  "MGI"
> help("SYMBOL")
> head(keys(org.Mm.eg.db,keytype="SYMBOL"))
[1] "Pzp"   "Aanat" "Aatk"  "Abca1" "Abca4" "Abca2"
> head(keys(org.Mm.eg.db,keytype="ENSEMBL"))
[1] "ENSMUSG00000030359" "ENSMUSG00000020804" "ENSMUSG00000025375"
[4] "ENSMUSG00000015243" "ENSMUSG00000028125" "ENSMUSG00000026944"
> length(keys(org.Mm.eg.db,keytype="ENSEMBL"))
[1] 23098
> head(count.data)
                   SRX033480 SRX033488 SRX033481 SRX033489 SRX033482 SRX033490
ENSMUSG00000000001       369       744       287       769       348       803
ENSMUSG00000000028         0         1         0         1         1         1
ENSMUSG00000000037         0         1         1         5         0         4
ENSMUSG00000000056        21        46        20        36        12        55
ENSMUSG00000000058        15        43        12        34        14        32
ENSMUSG00000000078       517       874       340       813       378       860
> hk <- head(keys(org.Mm.eg.db,keytype="ENSEMBL"))
> select(org.Mm.eg.db,keys=hk,keytype="ENSEMBL",columns=c("SYMBOL","GENENAME"))
  ENSEMBL            SYMBOL GENENAME
1 ENSMUSG00000030359 Pzp    pregnancy zone protein
2 ENSMUSG00000020804 Aanat  arylalkylamine N-acetyltransferase
3 ENSMUSG00000025375 Aatk   apoptosis-associated tyrosine kinase
4 ENSMUSG00000015243 Abca1  ATP-binding cassette, sub-family A (ABC1), member 1
5 ENSMUSG00000028125 Abca4  ATP-binding cassette, sub-family A (ABC1), member 4
6 ENSMUSG00000026944 Abca2  ATP-binding cassette, sub-family A (ABC1), member 2
> k <- rownames(count.data)[order(pv2)[1:10]]
> select(org.Mm.eg.db,keys=k,keytype="ENSEMBL",columns=c("SYMBOL","GENENAME"))
   ENSEMBL            SYMBOL  GENENAME
1  ENSMUSG00000023236 Scg5    secretogranin V
2  ENSMUSG00000030532 Hddc3   HD domain containing 3
3  ENSMUSG00000028393 Alad    aminolevulinate, delta-, dehydratase
4  ENSMUSG00000015484 Fam163a family with sequence similarity 163, member A
5  ENSMUSG00000024248 Cox7a2l cytochrome c oxidase subunit VIIa polypeptide 2-like
6  ENSMUSG00000074064 Mlycd   malonyl-CoA decarboxylase
7  ENSMUSG00000054354 <NA>    <NA>
8  ENSMUSG00000033585 Ndn     necdin
9  ENSMUSG00000056592 Zfp658  zinc finger protein 658
10 ENSMUSG00000024026 Glo1    glyoxalase 1
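The `<NA>` entries for ENSMUSG00000054354 show that an Ensembl ID in the count data need not have a symbol in org.Mm.eg.db. One simple way to keep such genes identifiable in tables or plots is to fall back to the Ensembl ID as a label. The following is a minimal sketch, not part of the original slides: the data frame `res` and the column `label` are hypothetical names, and three rows of the output above are copied by hand so that the example runs without Bioconductor.

```r
# Hand-copied subset of the select() output (illustrative only).
res <- data.frame(
  ENSEMBL = c("ENSMUSG00000074064", "ENSMUSG00000054354", "ENSMUSG00000033585"),
  SYMBOL  = c("Mlycd", NA, "Ndn"),
  stringsAsFactors = FALSE
)

# Use the gene symbol where one exists, otherwise the Ensembl ID.
res$label <- ifelse(is.na(res$SYMBOL), res$ENSEMBL, res$SYMBOL)

print(res)
```

With real data, `res` would be the data frame returned by `select()`.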
> select(org.Mm.eg.db,keys=k,keytype="ENSEMBL",columns=c("SYMBOL","GENENAME","PATH"))
   ENSEMBL            SYMBOL  GENENAME                                             PATH
1  ENSMUSG00000023236 Scg5    secretogranin V                                      <NA>
2  ENSMUSG00000030532 Hddc3   HD domain containing 3                               <NA>
3  ENSMUSG00000028393 Alad    aminolevulinate, delta-, dehydratase                 00860
4  ENSMUSG00000028393 Alad    aminolevulinate, delta-, dehydratase                 01100
5  ENSMUSG00000015484 Fam163a family with sequence similarity 163, member A        <NA>
6  ENSMUSG00000024248 Cox7a2l cytochrome c oxidase subunit VIIa polypeptide 2-like 00190
7  ENSMUSG00000024248 Cox7a2l cytochrome c oxidase subunit VIIa polypeptide 2-like 04260
8  ENSMUSG00000024248 Cox7a2l cytochrome c oxidase subunit VIIa polypeptide 2-like 05010
9  ENSMUSG00000024248 Cox7a2l cytochrome c oxidase subunit VIIa polypeptide 2-like 05012
10 ENSMUSG00000024248 Cox7a2l cytochrome c oxidase subunit VIIa polypeptide 2-like 05016
11 ENSMUSG00000074064 Mlycd   malonyl-CoA decarboxylase                            00410
12 ENSMUSG00000074064 Mlycd   malonyl-CoA decarboxylase                            00640
13 ENSMUSG00000074064 Mlycd   malonyl-CoA decarboxylase                            01100
14 ENSMUSG00000074064 Mlycd   malonyl-CoA decarboxylase                            04146
15 ENSMUSG00000054354 <NA>    <NA>                                                 <NA>
16 ENSMUSG00000033585 Ndn     necdin                                               <NA>
17 ENSMUSG00000056592 Zfp658  zinc finger protein 658                              <NA>
18 ENSMUSG00000024026 Glo1    glyoxalase 1                                         00620
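Adding PATH turns 10 genes into 18 rows, because `select()` returns one row per gene-pathway pair. If one row per gene is wanted, the pathway IDs can be pasted together. The sketch below is one possible approach using base R's `tapply()`, not something shown in the original slides. The names `ann`, `collapse_paths`, and `one_per_gene` are hypothetical, and a few rows of the output above are copied by hand so that the example runs without Bioconductor.

```r
# Hand-copied subset of the select() output with PATH (illustrative only).
ann <- data.frame(
  ENSEMBL = c("ENSMUSG00000023236", "ENSMUSG00000028393",
              "ENSMUSG00000028393", "ENSMUSG00000024026"),
  SYMBOL  = c("Scg5", "Alad", "Alad", "Glo1"),
  PATH    = c(NA, "00860", "01100", "00620"),
  stringsAsFactors = FALSE
)

# Join all non-missing pathway IDs of one gene; a gene with none stays NA.
collapse_paths <- function(p) {
  p <- p[!is.na(p)]
  if (length(p) == 0) NA_character_ else paste(p, collapse = ";")
}

# Apply the helper within each Ensembl ID.
paths <- tapply(ann$PATH, ann$ENSEMBL, collapse_paths)

# Rebuild a data frame with one row per gene and look the symbol back up.
one_per_gene <- data.frame(ENSEMBL = names(paths),
                           PATH    = as.vector(paths),
                           stringsAsFactors = FALSE)
one_per_gene$SYMBOL <- ann$SYMBOL[match(one_per_gene$ENSEMBL, ann$ENSEMBL)]

print(one_per_gene)
```

The `;` separator is an arbitrary choice; any character that does not occur in the pathway IDs would do.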
> library(GO.db)
> select(org.Mm.eg.db,keys=k[5],keytype="ENSEMBL",columns=c("SYMBOL","GO"))
  ENSEMBL            SYMBOL  GO         EVIDENCE ONTOLOGY
1 ENSMUSG00000024248 Cox7a2l GO:0002082 IMP      BP
2 ENSMUSG00000024248 Cox7a2l GO:0004129 IEA      MF
3 ENSMUSG00000024248 Cox7a2l GO:0005739 IDA      CC
4 ENSMUSG00000024248 Cox7a2l GO:0005743 IEA      CC
5 ENSMUSG00000024248 Cox7a2l GO:0005746 IEA      CC
6 ENSMUSG00000024248 Cox7a2l GO:0009055 IEA      MF
7 ENSMUSG00000024248 Cox7a2l GO:0016020 IEA      CC
8 ENSMUSG00000024248 Cox7a2l GO:0097249 IDA      CC
9 ENSMUSG00000024248 Cox7a2l GO:0097250 IDA      BP
> kgo <- select(org.Mm.eg.db,keys=k[5],keytype="ENSEMBL",columns=c("SYMBOL","GO"))$GO
> columns(GO.db)
[1] "GOID"       "TERM"       "ONTOLOGY"   "DEFINITION"
> kgo <- select(org.Mm.eg.db,keys=k[5],keytype="ENSEMBL",columns=c("SYMBOL","GO"))$GO
> select(GO.db,keys=kgo[1],columns="DEFINITION",keytype="GOID")
        GOID
1 GO:0002082
  DEFINITION
1 Any process that modulates the frequency, rate or extent of the chemical reactions and pathways resulting in the phosphorylation of ADP to ATP that accompanies the oxidation of a metabolite through the operation of the respiratory chain.

We can also use the appropriate TxDb package to annotate genomic position.
biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")
We will not pursue this further in this course.