Scalable and Robust Dimension Reduction and Clustering
This project focuses on developing a method called DACIDR for scalable dimension reduction and clustering of large-scale bioinformatics data. The methodology involves deterministic annealing clustering and interpolative dimension reduction techniques, along with visualization methods such as PlotViz3 for cluster identification in 3D. The project aims to enable faster observation and verification of genomic data through efficient data processing and visualization strategies.
Presentation Transcript
Scalable and Robust Dimension Reduction and Clustering. Yang Ruan, advised by Geoffrey Fox.
Motivation: Bioinformatics data deluge; large-scale data clustering; large-scale data visualization; enable faster observation and verification. Figure: sample sequence input to DACIDR, where each ID line is followed by its sequence:
>SRR042318.5
GAGTTTAGCCTTGCG
>SRR042318.32
GAGTTTAGCCTTGCG
>SRR042318.70
GAGTTTTAGCCTTGCGG
>SRR042318.81
GTTTAGCCTTGC
Overview of DACIDR: Deterministic Annealing Clustering and Interpolative Dimension Reduction Method (DACIDR). Split the input set into in-samples and out-of-samples. Apply full pairwise clustering and multidimensional scaling to the in-samples. Use the in-sample result to interpolate the out-of-samples. Figure: Simplified Flow Chart of DACIDR (boxes: Pairwise Clustering, All-Pair Sequence Alignment, Visualization, Interpolation, Multidimensional Scaling).
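The split, embed, and interpolate steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DACIDR implementation: it substitutes classical (Torgerson) MDS for the in-sample embedding and a closed-form least-squares placement for the interpolation, it skips the pairwise clustering step, and it uses synthetic Euclidean distances instead of sequence-alignment distances. The function names `classical_mds` and `interpolate` are hypothetical.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Embed in-sample points from their full pairwise distance matrix.

    Assumption: classical (Torgerson) MDS stands in for whatever MDS
    variant the real pipeline uses.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Scale the leading eigenvectors; clip tiny negative eigenvalues.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def interpolate(in_coords, dist_to_in):
    """Place out-of-sample points using only distances to in-samples.

    in_coords:  (n, dim) centred in-sample embedding.
    dist_to_in: (m, n) distances from each out-of-sample point
                to each in-sample point.
    """
    sq_norms = np.sum(in_coords ** 2, axis=1)
    # For a centred embedding, each new point y solves, in the
    # least-squares sense, X y = 0.5 * (|x_i|^2 - d_i^2).
    rhs = 0.5 * (sq_norms[:, None] - dist_to_in.T ** 2)
    coords, *_ = np.linalg.lstsq(in_coords, rhs, rcond=None)
    return coords.T


# Synthetic stand-in for a pairwise distance matrix (hypothetical data).
rng = np.random.default_rng(0)
points = rng.normal(size=(60, 3))
full = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

# Step 1: split into in-samples and out-of-samples.
in_idx, out_idx = np.arange(40), np.arange(40, 60)
# Step 2: full MDS on the in-samples only.
in_coords = classical_mds(full[np.ix_(in_idx, in_idx)], dim=3)
# Step 3: interpolate the out-of-samples from the in-sample result.
out_coords = interpolate(in_coords, full[np.ix_(out_idx, in_idx)])
```

The point of the split is cost: the full pairwise step touches only the in-sample block of the distance matrix, while each out-of-sample point needs just its distances to the in-samples.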
Clustering Visualization: PlotViz3 is used to visualize the result in 3D, with each identified cluster shown in a different color. DACIDR is parallelized using Twister and MPI. Figure panels: Metagenomics, hmp16SrRNA, COG Protein.
Phylogenetic Tree Visualization: The Spherical Phylogram is visualized using the phylogenetic tree generated by RAxML from the representative sequences and reference sequences; the color scheme is the same as in the left figure. The RAxML result is also shown in 2D as a Rectangular Phylogram.
Flowchart of the Process to Generate Spherical Phylogram