Density-based Anomaly Detection Using StrOUD and LOF Strangeness

Learn about density-based anomaly detection with the Local Outlier Factor (LOF) and StrOUD, the Strangeness-based Outlier Detection algorithm, as presented in a video by Kevin Molloy. The density anomaly score identifies outliers as points whose density is lower than that of the other data points. Applications include credit card fraud detection, network intrusion detection, and medical diagnosis from MRI scans.

  • Anomaly Detection
  • StrOUD
  • LOF
  • Density-Based
  • Outlier Detection

Presentation Transcript


  1. Density-based Anomaly Detection using StrOUD and LOF. StrOUD: Strangeness-based Outlier Detection algorithm. Video by Kevin Molloy. From the paper "Detecting Outliers using Transduction and Statistical Testing" by Barbará, Domeniconi, and Rogers (KDD '06 Proceedings).

  2. Anomaly Detection applications: credit card fraud, network intrusion detection, medical diagnosis from MRI scans.

  3. Density Anomaly Score. Idea: outliers have lower density than other points, because they have few close neighbors, if any.

  4. Density Anomaly Score. $\text{Dist-k}(x, k)$ = the distance from $x$ to its $k$-th nearest neighbor.

  5. Density Anomaly Score. $\text{Density}(x, k) = \dfrac{1}{\text{Dist-k}(x, k)}$

  6–7. Density Anomaly Score: worked example, starting with Point 0.

  8. Density Anomaly Score, Point 0, with k = 3: $\text{Dist-k}(0, 3) = 0.43$, so $\text{Density}(0, 3) = 1 / 0.43 \approx 2.33$.

  9. Density Anomaly Score, Point 25, with k = 3: $\text{Dist-k}(25, 3) = 1.75$, so $\text{Density}(25, 3) = 1 / 1.75 \approx 0.57$. This is far lower than $\text{Density}(0, 3) \approx 2.33$, marking Point 25 as the likelier outlier.

  10. Density Anomaly Score (recap): $\text{Dist-k}(x, k)$ is the distance from $x$ to its $k$-th nearest neighbor and $\text{Density}(x, k) = 1 / \text{Dist-k}(x, k)$; low-density points are outlier candidates.
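A minimal NumPy sketch of the density anomaly score from slides 4–5, assuming Euclidean distance and distinct points (a duplicated point would have a Dist-k of 0 and therefore an infinite density). The function names and the toy data are illustrative, not taken from the video.

```python
import numpy as np

def k_distance(X, k):
    """Dist-k(x, k): distance from each row of X to its k-th nearest neighbor."""
    # Pairwise Euclidean distances, shape (n, n)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    return np.sort(dists, axis=1)[:, k]

def density(X, k):
    """Density(x, k) = 1 / Dist-k(x, k); low values suggest outliers."""
    return 1.0 / k_distance(X, k)

# Hypothetical toy data: a tight square cluster plus one isolated point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
print(density(X, k=3))
```

The isolated point at (3, 3) gets by far the smallest density, matching the idea on slide 3.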

  11. Local Outlier Factor. Idea: compare the density estimate of a point to the density estimates of its neighbors.

  12. Local Outlier Factor. Reachability distance: $\text{Reach-d}(a, b, k) = \max\big(\text{Dist-k}(b, k),\ \text{dist}(a, b)\big)$

  13. Local Outlier Factor. Local reachability density: $\text{lrd}(a, k) = 1 \Big/ \dfrac{\sum_{b \in N_k(a)} \text{Reach-d}(a, b, k)}{|N_k(a)|}$, where $N_k(a)$ is the set of the $k$ nearest neighbors of $a$.
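The reachability distance and local reachability density can be sketched in Python as below. This is a minimal illustration, assuming Euclidean distance and exactly k neighbors per point; ties at the k-th distance are broken arbitrarily, whereas the original LOF definition can include all tied points. The helper names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix, shape (n, n)."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def knn(D, k):
    """Indices of each point's k nearest neighbors (self excluded) and its Dist-k."""
    order = np.argsort(D, axis=1)
    neighbors = order[:, 1:k + 1]                  # column 0 is the point itself
    dist_k = D[np.arange(len(D)), order[:, k]]     # distance to the k-th neighbor
    return neighbors, dist_k

def reach_d(D, dist_k, a, b):
    """Reach-d(a, b, k) = max(Dist-k(b, k), dist(a, b))."""
    return max(dist_k[b], D[a, b])

def lrd(D, neighbors, dist_k, a):
    """lrd(a, k) = 1 / (mean Reach-d(a, b, k) over the k nearest neighbors b of a)."""
    reach = [reach_d(D, dist_k, a, b) for b in neighbors[a]]
    return 1.0 / np.mean(reach)

# Hypothetical toy data: a tight cluster plus one isolated point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
D = pairwise_distances(X)
neighbors, dist_k = knn(D, k=3)
print([round(lrd(D, neighbors, dist_k, a), 2) for a in range(len(X))])
```

Taking the max with Dist-k(b, k) smooths out very small distances inside dense regions, so points deep in a cluster all get similar reachability distances.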

  14. Local Outlier Factor, worked example with k = 3. The three nearest neighbors of point 0 are points 1, 2, and 3, so $\text{lrd}(0, 3) = 1 \Big/ \dfrac{\text{Reach-d}(0, 1, 3) + \text{Reach-d}(0, 2, 3) + \text{Reach-d}(0, 3, 3)}{3}$, with $\text{Reach-d}(0, 1, 3) = 0.44$.

  15. $\text{Reach-d}(0, 2, 3) = 0.45$.

  16. $\text{Reach-d}(0, 3, 3) = 0.66$, giving $\text{lrd}(0, 3) = 1 \Big/ \dfrac{0.44 + 0.45 + 0.66}{3}$.

  17. $\text{lrd}(0, 3) = 3 / 1.55 \approx 1.94$.
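A quick Python check of the lrd(0, 3) arithmetic on slides 14–17, using the three reachability distances shown there:

```python
# Reach-d values from point 0 to its neighbors (points 1, 2, 3), as read from the slides
reach = [0.44, 0.45, 0.66]

# lrd = 1 / (mean reachability distance)
lrd_0 = 1 / (sum(reach) / len(reach))
print(round(lrd_0, 2))  # 1.94
```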

  18. Local Outlier Factor: $\text{LOF}(a, k) = \dfrac{\sum_{b \in N_k(a)} \text{lrd}(b, k) \,/\, |N_k(a)|}{\text{lrd}(a, k)}$, i.e., the average lrd of $a$'s neighbors divided by $a$'s own lrd.

  19. Local reachability densities for k = 3:

      Point   lrd(p, 3)
      0       1.94
      1       1.72
      2       1.98
      3       1.99

  20. $\text{LOF}(0, 3) = \dfrac{(1.72 + 1.98 + 1.99) / 3}{1.94}$

  21. $\text{LOF}(0, 3) = 1.90 / 1.94 \approx 0.98$

  22. Interpretation: LOF < 1 means the point is denser than its neighbors (inlier); LOF > 1 means it is less dense than its neighbors (outlier). Values close to 1 indicate a density similar to the neighbors'. Here $\text{LOF}(0, 3) \approx 0.98$, so point 0 is an inlier.
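Putting slides 12–22 together, the sketch below computes LOF with plain NumPy and compares it with scikit-learn's `LocalOutlierFactor` on synthetic data. It assumes Euclidean distance and distinct points; the random data and the function name `lof_scores` are illustrative. scikit-learn's implementation may differ by tiny floating-point amounts, so the comparison uses `np.allclose` rather than exact equality.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_scores(X, k):
    """LOF of every row of X: mean lrd of its k nearest neighbors / its own lrd."""
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))          # Euclidean distance matrix
    order = np.argsort(D, axis=1)
    neighbors = order[:, 1:k + 1]                    # k nearest neighbors, self excluded
    dist_k = D[np.arange(len(X)), order[:, k]]       # Dist-k(x, k)
    # Reach-d(a, b, k) = max(Dist-k(b, k), dist(a, b)) for each neighbor b of a
    reach = np.maximum(dist_k[neighbors], D[np.arange(len(X))[:, None], neighbors])
    lrd = 1.0 / reach.mean(axis=1)                   # local reachability density
    return lrd[neighbors].mean(axis=1) / lrd

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(30, 2)), [[6.0, 6.0]]])  # one planted outlier

ours = lof_scores(X, k=3)
# negative_outlier_factor_ stores -LOF, so negate it
ref = -LocalOutlierFactor(n_neighbors=3).fit(X).negative_outlier_factor_
print(np.allclose(ours, ref))
print(ours[-1])  # LOF of the planted point
```

The planted point at (6, 6) should receive an LOF well above 1, while most cluster points stay near 1.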

  23. Local Outlier Factor: is Point 24 an outlier? (Same Reach-d and lrd definitions, with k = 3.)

  25. Local Outlier Factor, Point 24 (k = 3). Its three nearest neighbors are points 12, 14, and 17:

      Point   lrd(p, 3)
      24      1.42
      12      9.82
      14      7.34
      17      3.63

      $\text{LOF}(24, 3) = \dfrac{(9.82 + 7.34 + 3.63) / 3}{1.42} = 6.93 / 1.42 \approx 4.88$, well above 1, so Point 24 is an outlier.
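The LOF arithmetic for points 0 and 24 can be checked directly from the lrd values listed on the slides. The helper name `lof_from_lrd` is illustrative.

```python
def lof_from_lrd(own_lrd, neighbor_lrds):
    """LOF(a, k) = (mean lrd of a's k nearest neighbors) / lrd(a, k)."""
    return (sum(neighbor_lrds) / len(neighbor_lrds)) / own_lrd

# lrd values read from the slides (k = 3)
lof_0 = lof_from_lrd(1.94, [1.72, 1.98, 1.99])    # point 0; neighbors 1, 2, 3
lof_24 = lof_from_lrd(1.42, [9.82, 7.34, 3.63])   # point 24; neighbors 12, 14, 17
print(round(lof_0, 2), round(lof_24, 2))  # 0.98 4.88
```

Point 24's neighbors are several times denser than point 24 itself, which is exactly what pushes its LOF far above 1.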

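  26. Local Outlier Factor with Python:

      import numpy as np
      import matplotlib.pyplot as plt
      from sklearn.neighbors import LocalOutlierFactor

      np.random.seed(17)

      # Make data points: two clusters plus two hand-placed outliers
      x1, y1 = np.random.normal(2, 0.5, 12), np.random.normal(2, 0.5, 12)
      x2, y2 = np.random.normal(5, 0.1, 12), np.random.normal(4, 0.1, 12)
      x3, y3 = np.array([4.3]), np.array([4.3])  # outlier
      x4, y4 = np.array([3.0]), np.array([4.0])  # outlier

      # Stack all points into one (n_points, 2) array
      allPoints = np.column_stack((np.concatenate((x1, x2, x3, x4)),
                                   np.concatenate((y1, y2, y3, y4))))

      k = 3
      lof = LocalOutlierFactor(n_neighbors=k)
      lof.fit(allPoints)

      # negative_outlier_factor_ stores -LOF, so negate it to color points by LOF
      sc = plt.scatter(allPoints[:, 0], allPoints[:, 1],
                       c=(-1 * lof.negative_outlier_factor_), s=40, cmap='jet')
      plt.title('LOF Plot (k=' + str(k) + ')')
      plt.colorbar(sc, aspect=10)
      plt.show()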

  27. StrOUD: Strangeness-based Outlier Detection algorithm (Barbará et al., 2006). It requires a strangeness measure, for example the density-based Local Outlier Factor. To decide whether a point p is an outlier:
      1. Sort the training data by strangeness; call this the baseline.
      2. Count the values in the baseline with strangeness higher than that of p; call this count b.
      3. Compute the p-value: $\dfrac{b + 1}{N + 1}$, where $N$ is the number of points in the baseline.
      4. Compare the p-value with (1 - confidence). If the p-value is smaller, then label p an outlier.
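The four StrOUD steps can be sketched as a small Python function. This assumes the strangeness values have already been computed (for example, as LOF scores) and counts only baseline values strictly greater than p's strangeness, as step 2 describes; the function names are hypothetical.

```python
from bisect import bisect_right

def stroud_p_value(baseline, strangeness_p):
    """p-value = (b + 1) / (N + 1), b = number of baseline values > strangeness_p."""
    baseline = sorted(baseline)                                  # step 1: sorted baseline
    b = len(baseline) - bisect_right(baseline, strangeness_p)    # step 2: count larger values
    return (b + 1) / (len(baseline) + 1)                         # step 3: p-value

def is_outlier(baseline, strangeness_p, confidence=0.95):
    """Step 4: flag p as an outlier if its p-value is below 1 - confidence."""
    return stroud_p_value(baseline, strangeness_p) < 1 - confidence

# LOF values of the training points in the slide-28 example
baseline = [1.08, 1.05, 1.12, 1.12, 1.13, 5.10, 2.55, 0.77, 1.05]
print(stroud_p_value(baseline, 1.39), is_outlier(baseline, 1.39))  # 0.3 False
```

Because b is at least 0, the smallest possible p-value is 1/(N + 1). With a baseline of only N = 9 points the p-value can never drop below 0.10, so no point can be flagged at 95% confidence; a larger baseline is needed for that.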

  28. StrOUD example: is point p = (4, 3) an anomaly?
      1. Take the array of training points: [[1, 1], [2, 3], [1, 0.75], [0, 0], [2, 2], [10, 10], [5, 5], [1, 2], [3, 2]].
      2. Compute the LOF (strangeness) of each point by fitting lof = LocalOutlierFactor(n_neighbors=3, novelty=True). Recall that you need to multiply the negative_outlier_factor_ values by -1 to get the LOF: [1.08, 1.05, 1.12, 1.12, 1.13, 5.10, 2.55, 0.77, 1.05].
      3. Sort the array and call it the baseline: [0.77, 1.05, 1.05, 1.08, 1.12, 1.12, 1.13, 2.55, 5.10].
      4. Compute the LOF of [4, 3] with -1 * lof.score_samples([[4, 3]]), which gives 1.39.
      5. b = 2, because the baseline contains two values greater than 1.39 (2.55 and 5.10).
      6. p-value = $\dfrac{b + 1}{N + 1} = \dfrac{2 + 1}{9 + 1} = 0.30$.
      7. Is the p-value < 1 - 0.95? 0.30 < 0.05? No, so point p is not an outlier.
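The StrOUD example on slide 28 can be reproduced with scikit-learn roughly as follows. This is a sketch: the LOF values it computes come from scikit-learn's implementation and may not match the slide's rounded numbers exactly, so no output is asserted here beyond the structure of the test.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

train = np.array([[1, 1], [2, 3], [1, 0.75], [0, 0], [2, 2],
                  [10, 10], [5, 5], [1, 2], [3, 2]])
p = np.array([[4, 3]])

# novelty=True lets us score a new point against the fitted training data
lof = LocalOutlierFactor(n_neighbors=3, novelty=True).fit(train)

baseline = np.sort(-lof.negative_outlier_factor_)   # training LOF values, sorted
strangeness_p = -lof.score_samples(p)[0]            # LOF of the new point
b = int(np.sum(baseline > strangeness_p))           # baseline values stranger than p
p_value = (b + 1) / (len(baseline) + 1)

confidence = 0.95
print(b, p_value, p_value < 1 - confidence)
```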
