
Frequent Pattern Analysis in Data Mining
Discover the importance of frequent pattern analysis in data mining, including its applications in market basket analysis, association rule mining, and more. Learn about basic concepts, efficient mining methods, and the significance of identifying frequent patterns in data sets.
Knowledge Data Discovery TOPIC 6 - Association Analysis Antoni Wibowo
COURSE OUTLINE
1. BASIC CONCEPTS AND A ROAD MAP
2. EFFICIENT AND SCALABLE FREQUENT ITEMSET MINING METHODS
Note: These slides are based on the additional material provided with the textbooks that we use: J. Han, M. Kamber and J. Pei, "Data Mining: Concepts and Techniques", and P. Tan, M. Steinbach, and V. Kumar, "Introduction to Data Mining".
What Is Frequent Pattern Analysis?
A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was first proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining.
Motivation: finding inherent regularities in data
- What products were often purchased together? Beer and diapers?!
- What are the subsequent purchases after buying a PC?
- What kinds of DNA are sensitive to this new drug?
- Can we automatically classify web documents?
Applications: basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click-stream) analysis, and DNA sequence analysis.
Mining Frequent Patterns, Apriori Method, Frequent Pattern (FP) Growth Method (3/20/2025)
Market Basket Analysis
- Mining frequent patterns, associations, and correlations
- General data characteristics
Why Is Frequent Pattern Mining Important?
- It discloses an intrinsic and important property of data sets
- It forms the foundation for many essential data mining tasks:
  - Association, correlation, and causality analysis
  - Sequential and structural (e.g., sub-graph) patterns
  - Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  - Classification: associative classification
  - Cluster analysis: frequent-pattern-based clustering
  - Data warehousing: iceberg cube and cube-gradient
- Broad applications
Basic Concepts: Frequent Patterns and Association Rules
Given an itemset X = {x1, ..., xk}, find all the rules X => Y with minimum support and confidence:
- support, s: the probability that a transaction contains X ∪ Y
- confidence, c: the conditional probability that a transaction containing X also contains Y

  Transaction-id | Items bought
  10             | A, B, D
  20             | A, C, D
  30             | A, D, E
  40             | B, E, F
  50             | B, C, D, E, F

Let min_sup = 50% and min_conf = 50%.
Frequent patterns: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
- A => D (support = 60%, confidence = 100%)
- D => A (support = 60%, confidence = 75%)
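As a concrete illustration, the support and confidence definitions above can be computed directly for the five-transaction table. This is a minimal Python sketch; the function names are illustrative, not from the slides.

```python
# Sketch of the support/confidence definitions, using the
# five-transaction example table above.
transactions = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]

def support(itemset, db):
    """s(X): fraction of transactions that contain every item of X."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    """c(X => Y) = s(X ∪ Y) / s(X)."""
    return support(x | y, db) / support(x, db)

print(support({"A", "D"}, transactions))       # 0.6, i.e. sup(A => D) = 60%
print(confidence({"A"}, {"D"}, transactions))  # 1.0, i.e. conf(A => D) = 100%
print(confidence({"D"}, {"A"}, transactions))  # conf(D => A) = 75% (up to float rounding)
```

This reproduces the A => D and D => A figures stated above: {A, D} appears in 3 of 5 transactions (60%), every A-transaction contains D (100%), and 3 of the 4 D-transactions contain A (75%).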
Scalable Methods for Mining Frequent Patterns
The downward closure property of frequent patterns: any subset of a frequent itemset must be frequent. If {beer, diaper, nuts} is frequent, so is {beer, diaper}; i.e., every transaction containing {beer, diaper, nuts} also contains {beer, diaper}.
Scalable mining methods: three major approaches
- Apriori (Agrawal & Srikant @VLDB'94)
- Frequent pattern growth (FP-growth, Han, Pei & Yin @SIGMOD'00)
- Vertical data format approach (CHARM, Zaki & Hsiao @SDM'02)
Apriori: A Candidate Generation-and-Test Approach
Apriori pruning principle: if any itemset is infrequent, its supersets should not be generated or tested! (Agrawal & Srikant @VLDB'94; Mannila et al. @KDD'94)
Method:
1. Initially, scan the DB once to get the frequent 1-itemsets
2. Generate length-(k+1) candidate itemsets from the length-k frequent itemsets
3. Test the candidates against the DB
4. Terminate when no frequent or candidate set can be generated
The Apriori Algorithm
Pseudo-code (Ck: candidate itemsets of size k; Lk: frequent itemsets of size k):

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t;
        Lk+1 = candidates in Ck+1 with min_support;
    end
    return ∪k Lk;
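The pseudo-code above can be sketched in Python as follows. This is a simplified illustration, not a reference implementation: it generates candidates by enumerating item combinations and pruning with the downward closure property, rather than by the explicit self-join shown on the next slide.

```python
from itertools import combinations

def apriori(db, min_support):
    """Level-wise Apriori sketch: returns {frozenset: support count}."""
    db = [set(t) for t in db]
    threshold = min_support * len(db)

    # Scan 1: frequent 1-itemsets (L1)
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= threshold}
    result = dict(frequent)

    k = 1
    while frequent:
        items = sorted({i for s in frequent for i in s})
        # Generate (k+1)-candidates; keep only those whose k-subsets are all frequent
        candidates = [
            frozenset(c) for c in combinations(items, k + 1)
            if all(frozenset(s) in frequent for s in combinations(c, k))
        ]
        # One database pass per level to count the candidates
        counts = {c: sum(c <= t for t in db) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= threshold}
        result.update(frequent)
        k += 1
    return result
```

On the five-transaction example from the earlier slide with min_sup = 50%, this returns exactly the frequent patterns listed there: A:3, B:3, D:4, E:3, and AD:3.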
How to Generate Candidates?
Suppose the items in Lk-1 are listed in an order.
Step 1: self-joining Lk-1

    insert into Ck
    select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

Step 2: pruning

    forall itemsets c in Ck do
        forall (k-1)-subsets s of c do
            if (s is not in Lk-1) then delete c from Ck
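A direct translation of these two steps might look like the sketch below. Representing itemsets as sorted tuples is my own choice for the example; the function name is illustrative.

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Self-join then prune, mirroring the two steps above.
    L_prev is the set of frequent (k-1)-itemsets as sorted tuples."""
    k_prev = len(next(iter(L_prev)))
    # Step 1: self-join -- merge pairs that agree on the first k-2 items
    joined = {p + (q[-1],)
              for p in L_prev for q in L_prev
              if p[:-1] == q[:-1] and p[-1] < q[-1]}
    # Step 2: prune -- drop candidates having an infrequent (k-1)-subset
    return {c for c in joined
            if all(s in L_prev for s in combinations(c, k_prev))}

L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")}
print(apriori_gen(L3))  # {('a', 'b', 'c', 'd')}
```

This reproduces the worked example on the next slide: the join yields abcd and acde, and pruning removes acde because ade is not in L3.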
Important Details of Apriori
- How to generate candidates? Step 1: self-join Lk; Step 2: pruning
- How to count supports of candidates?
Example of candidate generation:
- L3 = {abc, abd, acd, ace, bcd}
- Self-joining L3 * L3: abcd from abc and abd; acde from acd and ace
- Pruning: acde is removed because ade is not in L3
- C4 = {abcd}
Illustrating Apriori Principle
[Figure: the itemset lattice over items A-E, from the single items down to ABCDE. Once an itemset is found to be infrequent, all of its supersets are pruned from the search.]
The Apriori Algorithm: An Example (min_sup = 2)
Generating Association Rules from Frequent Itemsets
Association rules can be generated as follows:
1. For each frequent itemset l, generate all nonempty proper subsets of l
2. For every nonempty subset s of l, output the rule s => (l - s) if support_count(l) / support_count(s) >= min_conf

Example: suppose the data contain the frequent itemset l = {I1, I2, I5}. The nonempty proper subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}. The resulting association rules are:

  Rule            | Confidence
  I1 ^ I2 => I5   | 2/4 = 50%
  I1 ^ I5 => I2   | 2/2 = 100%
  I2 ^ I5 => I1   | 2/2 = 100%
  I1 => I2 ^ I5   | 2/6 = 33%
  I2 => I1 ^ I5   | 2/7 = 29%
  I5 => I1 ^ I2   | 2/2 = 100%

With min_conf = 70%, only the three rules with 100% confidence are output.
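The two-step procedure above can be sketched as follows. The function name and the precomputed count table are illustrative; the counts are taken from the worked example.

```python
from itertools import combinations

def gen_rules(l, support_count, min_conf):
    """Output every rule s => (l - s) whose confidence meets min_conf.
    `support_count` maps frozensets to their (precomputed) counts."""
    l = frozenset(l)
    rules = []
    for r in range(1, len(l)):                 # all nonempty proper subsets
        for s in combinations(sorted(l), r):
            s = frozenset(s)
            conf = support_count[l] / support_count[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

# Counts taken from the worked example in the slides
counts = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}
for lhs, rhs, conf in gen_rules({"I1", "I2", "I5"}, counts, 0.7):
    print(lhs, "=>", rhs, f"{conf:.0%}")
```

Running this prints exactly the three 100%-confidence rules from the table; the rules with 50%, 33%, and 29% confidence are filtered out by min_conf = 70%.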
How to Count Supports of Candidates?
Why is counting supports of candidates a problem?
- The total number of candidates can be very large
- One transaction may contain many candidates
Method:
- Candidate itemsets are stored in a hash-tree
- A leaf node of the hash-tree contains a list of itemsets and counts
- An interior node contains a hash table
- Subset function: finds all the candidates contained in a transaction
Challenges of Frequent Pattern Mining
Challenges:
- Multiple scans of the transaction database
- Huge number of candidates
- Tedious workload of support counting for candidates
Improving Apriori: general ideas
- Reduce the number of transaction database scans
- Shrink the number of candidates
- Facilitate support counting of candidates
Mining Frequent Patterns Without Candidate Generation
Frequent-pattern growth, or FP-growth, mines the complete set of frequent itemsets without candidate generation. FP-growth adopts a divide-and-conquer strategy as follows:
1. First, it compresses the database representing frequent items into a frequent-pattern tree, or FP-tree, which retains the itemset association information.
2. It then divides the compressed database into a set of conditional databases (a special kind of projected database), each associated with one frequent item or pattern fragment, and mines each such database separately.
FP-Growth (Example)
Let's use the same transaction database that was used in the Apriori method. First, scan the database once to get the set of frequent items (1-itemsets) and their support counts. Let min_sup = 2. The set of frequent items is sorted in descending order of support count. The resulting list is denoted L = {{I2: 7}, {I1: 6}, {I3: 6}, {I4: 2}, {I5: 2}}.
FP-Tree
An FP-tree is then constructed as follows: first, create the root of the tree, labeled with "null". Then scan database D a second time. The items in each transaction are processed in L order (i.e., sorted by descending support count), and a branch is created for each transaction.
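A minimal sketch of this two-scan construction is shown below. The nine-transaction database is my assumption of the example these slides use (it is the classic textbook dataset whose support counts match the L list above: I2:7, I1:6, I3:6, I4:2, I5:2). A full FP-tree also maintains a header table with node-links for mining; that is omitted here for brevity.

```python
class FPNode:
    """One node of the FP-tree: an item, a count, and child links."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(db, min_count):
    """Build an FP-tree in two scans: count item supports, then insert
    each transaction with its frequent items in descending-support order."""
    counts = {}
    for t in db:                      # Scan 1: item support counts
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    freq = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)         # the "null" root
    for t in db:                      # Scan 2: insert transactions
        ordered = sorted((i for i in t if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root

# Assumed database matching L = {I2:7, I1:6, I3:6, I4:2, I5:2}
db = [["I1","I2","I5"], ["I2","I4"], ["I2","I3"], ["I1","I2","I4"],
      ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I1","I2","I3","I5"],
      ["I1","I2","I3"]]
root = build_fp_tree(db, 2)
print(root.children["I2"].count)  # 7 -- seven transactions share the I2 prefix
```

Shared prefixes collapse into shared branches, which is what compresses the database: here all seven I2-transactions hang off a single I2 child of the root.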
FP-Tree Mining
Why Is FP-Growth the Winner?
Divide-and-conquer: it decomposes both the mining task and the DB according to the frequent patterns obtained so far, which leads to a focused search of smaller databases.
Other factors:
- No candidate generation, no candidate test
- Compressed database: the FP-tree structure
- No repeated scan of the entire database
- The basic operations are counting local frequent items and building sub-FP-trees; there is no pattern search and matching
Summary
- A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set.
- Frequent patterns disclose an intrinsic and important property of data sets.
- The downward closure property of frequent patterns: any subset of a frequent itemset must be frequent.
- There are three major approaches among scalable mining methods: Apriori (Agrawal & Srikant @VLDB'94), frequent pattern growth (FP-growth, Han, Pei & Yin @SIGMOD'00), and the vertical data format approach (CHARM, Zaki & Hsiao @SDM'02).
- Association rules can be generated from frequent itemsets.
References
1. Han, J., Kamber, M., & Pei, J. (2006). Data Mining: Concepts and Techniques. 3rd ed. Morgan Kaufmann, San Francisco.
2. Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Addison-Wesley.
3. Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. Morgan Kaufmann, San Francisco.
Thank You