
Exploring Research Directions in Big Data Graph Analytics
Discover the latest trends and challenges in the field of big data graph analytics through this comprehensive research overview, covering graph analytics concepts, problems, applications, and computational models.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Research Directions for Big Data Graph Analytics John A. Miller, Lakshmish Ramaswamy, Krys J. Kochut and Arash Fard
Contents Introduction What is Graph Analytics? Problems in Graph Analytics Path Problems Pattern Problems Pattern Matching Models Applications Computational Models & Frameworks Conclusion
What ,Why and Where ? Large Data Graphs, query processing and Iterative algorithms. Big Data Graph Analytics Wide Ranging Domain Page Rank Algorithm, Relationship Analysis, Maps and Bio-chemistry. Traditional graph computation algorithms. Scalable graph analytics is still a challenge designing algorithms, performance, indexing, geographic distribution
Graph Analytics data items :labeled vertices, & relationships : labeled edges Labelled Multi-Digraphs - Vertex labeled Digraphs - directed graph with edge labels - Fully labelled Multi-Digraph - adds edge labels - allows multiple edges between 2 vertices if labels are distinct - G(V, E, L, l) : vertices, labeled edges, label set, vertex labeling
Problems in Graph Analytics Problems can broadly classified in three categories Path Problems Reachability Shortest Path Pattern Problems Graph Simulation Graph Morphisms Partition Problems
Path Problems Reachability Given a graph G and two vertices, u, w G.V, find a path (sequence of edges) connecting them path(u, w) = uv1l i1, v1v2l i2, . . . , vnw lin+1 G.E Reachability is simply reach(u, w) = path(u, w) Shortest Path Given k vertices, find a minimum distance path that includes all k vertices. Algorithms: Dijkstra, Bellman-Ford
Pattern Problems Pattern Matching Model Given a query graph Q, match its labeled vertices to corresponding labeled vertices in a data graph G : Q.V 2G.V All the pattern matching models start with matching vertex labels For each vertex u in Q.V, require u' (u), l(u') = l(u) Graph Simulation For each label matching pair (u, u') in Q.V G.V, require matching children v child Q(u), v' (v) such that u v' G.E
Pattern Problems Dual Simulation For each vertex pair (u, u') in Q.V G.V, also require matching parents w parent Q(u), w (w) such that w u' G.E Strong Simulation Matching patterns in G must be contained in some ball B radius(B) = diameter(Q)
Pattern Problems Strict Simulation only balls containing vertices in an initial dual match Tight Simulation only balls with centers corresponding to a central vertex in Q radius(B) = diameter(Q) CAR-tight Simulation child and parent match must satisfy
Pattern Problems Graph Homomorphism Mapping function f: Q.V G.V such that u Q, u' = f(u), l(u') = l(u) u v Q.E implies u'v G.E Sub-graph Isomorphism - If we further require the mapping functions to be bijections between Q.V and G .V , where G is a sub-graph of G (G G) Difference : correspondence between vertices, one-to-one correspondence
Applications Graph Databases Neo4j, Orient DB and Titan Semantic Web SPARQL Queries (subject, predicate, object) subject -edge-> object e.g. SELECT ?x, ?y, ?z WHERE { x a Lawyer . y a Doctor . z a Lawyer . x friend y . x competes_with z . y friend z } Social Media - Facebook, Twitter, LinkedIn, Amazon
Computational Models & Frameworks MapReduce/Hadoop Apache Storm Apache Spark Bulk Synchronous Parallel (BSP) - BSP can be quite useful for implementing graph algorithms - vertex-centric, has become popular for big data graph analytics. - significant overhead in the synchronization
Conclusion Graphs are now becoming an important form of representation for data. Implementation of improved sequential algorithms Adding of Extensions.