
Twitter Social Network Community Detection and Clustering Analysis
An overview of community detection and clustering analysis on the Twitter social network, covering the unbalanced bipartite structure of the graph, kernel identification, ego-net extraction, clustering algorithms, personalized PageRank, dense communities, and cluster validation.
Presentation Transcript
Community detection for the Twitter social network. Team Members: Nikhil Chadda, Anupam Prakash, Achal Soni.
Twitter Social Network: An unbalanced bipartite graph, average degrees shown. Kernel (labelled "Kernel-115k" in the figure): Content producers, very popular users. Auxiliary Communities (labelled "41M" in the figure): Follow kernel users. Throw away edges coming into auxiliary communities. [Figure: average-degree labels 377, 12.5, 825 and 18.3.]
Finding the kernel. Infochimps: TrustRank and follower credibility scores. A single-machine, memory-based approach scales to bigger kernels: 400,000 nodes.
Ego-net extraction. Ego-net: Sub-graph of the Twitter network consisting of a user and their friends up to two levels out. Find the smallest k such that the k-core has at most 10,000 nodes, typically a few million edges. Cluster the ego-net to find communities.
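The k-core step can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes the ego-net is treated as an undirected graph stored as a dictionary of neighbour sets, and the function names `k_core` and `smallest_core_within` are hypothetical.

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph.

    adj maps each node to the set of its neighbours (assumed symmetric).
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    # Start by queueing every node whose degree is already below k.
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled off earlier
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


def smallest_core_within(adj, max_nodes):
    """Find the smallest k whose k-core has at most max_nodes nodes."""
    k = 0
    while True:
        core = k_core(adj, k)
        if len(core) <= max_nodes:
            return k, core
        k += 1


# Toy ego-net: a triangle (1, 2, 3) with one pendant node (4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
k, core = smallest_core_within(adj, max_nodes=3)
```

On the slides the size cap is 10,000 nodes; the toy call uses 3 only to keep the example small. Recomputing the core from scratch for each k is wasteful on large graphs, where a single peeling pass that records each node's core number would be preferable.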
Clustering Algorithms. Different algorithms find differently structured communities. Relevant measures for Twitter ego-nets: Conductance: boundary edges / total number of edges. Density: number of edges / maximum possible edges. Local PageRank-based partitioning [ACL 06]. Weight balancing algorithm [Wang et al 11].
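The two measures can be written down directly. The sketch below makes two assumptions that the slide does not state: "total number of edges" is read as the edges with at least one endpoint in the cluster, and the graph is directed without self-loops, so a cluster of n nodes has at most n(n-1) internal edges. The function names are illustrative.

```python
def conductance(edges, cluster):
    """Boundary edges divided by edges touching the cluster.

    edges: iterable of distinct directed (u, v) pairs.
    Assumption: 'total' means edges with at least one endpoint inside.
    """
    cluster = set(cluster)
    boundary = 0
    touching = 0
    for u, v in edges:
        inside = (u in cluster) + (v in cluster)
        if inside == 1:
            boundary += 1  # exactly one endpoint inside: a boundary edge
        if inside >= 1:
            touching += 1
    return boundary / touching if touching else 0.0


def density(edges, cluster):
    """Internal edges divided by the maximum possible n * (n - 1)."""
    cluster = set(cluster)
    n = len(cluster)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in edges if u in cluster and v in cluster)
    return internal / (n * (n - 1))


edges = [(1, 2), (2, 1), (2, 3), (3, 4)]
c = conductance(edges, {1, 2})
d = density(edges, {1, 2})
```

A good community scores low on conductance and high on density. The textbook form of conductance normalizes by the smaller of the two side volumes instead, which gives different values on unbalanced cuts.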
Personalized PageRank. Find PageRank vectors: (i) p: uniformly random seed. (ii) q: central user as seed. Sort nodes in decreasing order of q[i] / p[i]. Sweep through the sorted nodes to identify clusters. Use second derivatives to find transition points.
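A minimal sketch of the ranking step, under stated assumptions: "uniformly random seed" is read as a uniform teleport distribution, PageRank is computed by plain power iteration, and the teleport probability `alpha=0.15` and iteration count are illustrative values that do not come from the slides. The second-derivative rule for choosing the cut point is not shown.

```python
def personalized_pagerank(out_links, seed, alpha=0.15, iters=50):
    """Power iteration for PageRank with teleport distribution `seed`.

    out_links: node -> list of nodes it links to (every node is a key).
    seed: node -> teleport probability, summing to 1.
    alpha: teleport probability (assumed value).
    """
    nodes = list(out_links)
    rank = {v: seed.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: alpha * seed.get(v, 0.0) for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = (1 - alpha) * rank[v] / len(targets)
                for u in targets:
                    nxt[u] += share
            else:
                # Dangling node: return its mass to the seed distribution.
                for u, w in seed.items():
                    nxt[u] += (1 - alpha) * rank[v] * w
        rank = nxt
    return rank


def sweep_order(out_links, center):
    """Nodes sorted by decreasing q[i] / p[i]."""
    nodes = list(out_links)
    uniform = {v: 1.0 / len(nodes) for v in nodes}
    p = personalized_pagerank(out_links, uniform)        # global importance
    q = personalized_pagerank(out_links, {center: 1.0})  # seeded at the user
    return sorted(nodes, key=lambda v: q[v] / p[v], reverse=True)


# Two triangles joined by a single link between nodes 2 and 3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
order = sweep_order(graph, center=0)
```

Dividing q by p discounts nodes that are popular everywhere, so the front of the ordering holds nodes that are specifically close to the central user. A sweep would then evaluate each prefix of `order` as a candidate cluster.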
Dense communities. Greedy: Add the node with the highest in-degree to the cluster. Select the cluster size that achieves the maximum density. Refine the cluster by swapping nodes to improve density. Performance: roughly a 4x to 8x density gain.
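The greedy growth step might look like the sketch below. Several details are assumptions, since the slide leaves them open: in-degree is counted only from current cluster members, ties are broken by node order, a hypothetical `min_size` parameter keeps trivial two-node clusters from winning, and the swap-based refinement is omitted.

```python
def greedy_dense_cluster(edges, seed, min_size=3):
    """Grow a cluster from `seed`; return the densest prefix and its density.

    edges: set of directed (follower, followed) pairs, no self-loops.
    min_size: smallest cluster size considered (assumed parameter).
    """
    nodes = {u for e in edges for u in e}
    cluster = [seed]
    members = {seed}
    internal = 0  # directed edges with both endpoints in the cluster
    best_density, best_size = 0.0, 1
    while len(members) < len(nodes):
        # In-degree of an outside node, counted from current members only.
        def in_degree(v):
            return sum((m, v) in edges for m in members)

        candidate = max(sorted(nodes - members), key=in_degree)
        internal += sum(((m, candidate) in edges) + ((candidate, m) in edges)
                        for m in members)
        cluster.append(candidate)
        members.add(candidate)
        n = len(cluster)
        d = internal / (n * (n - 1))
        if n >= min_size and d > best_density:
            best_density, best_size = d, n
    return cluster[:best_size], best_density


# Mutual-follow triangle (0, 1, 2) with a tail 2 -> 3 -> 4.
edges = {(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (2, 3), (3, 4)}
best, best_d = greedy_dense_cluster(edges, seed=0)
```

Growing to the full node set and then cutting back to the densest prefix mirrors the "select cluster size that achieves the maximum density" step. Each iteration rescans every outside node, which is fine for a sketch but would need incremental in-degree counts on an ego-net with thousands of nodes.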
Validating clusters. C++ implementation: given a kernel user, it finds PageRank and dense clusters. The Infochimps rare-words API is used to validate clusters; results to be presented on the 10th. Proof of concept: With a larger kernel and an inverted index of rare words, do we find topic-specific communities?
Clusters in a random sample. Education cluster 500: admissions, alumni, student, campus, faculty, professor. Hawaiian cluster 550: Honolulu, aloha, hawaii, waikiki, mahalo, oahu. Ruby programmers 100: railsconf, rubyconf, ruby, rails, rspec, clojure, jruby, github. Nigerian music, tourism, financial advice, realtors, yoga, fishing, rap music, medicine, Israel, telephony, health, politics, design.
Thank You Questions?