Data Mining Methods in Web Analysis by Prof. Kristina Machova, PhD
Delve into the world of data mining methods in web analysis with a focus on machine learning techniques like decision trees, ensemble learning, and more. Explore the applications of machine learning, including sentiment analysis and identifying authorities and trolls. Dive deep into classification tasks, feature spaces, and different versions of machine learning algorithms. Understand the concept of decision trees and their representation, induction process, and algorithmic application. Gain insights into the fascinating realm of machine learning and its various domains.
Presentation Transcript
Data Mining Methods in Web Analysis, prof. Ing. Kristína Machová, PhD.
Content
o Machine learning
o Decision trees
o Ensemble learning
   - Bagging
   - Boosting
   - Random forests
o Applications of machine learning
   - Sentiment analysis
   - Authorities and trolls identification
Machine Learning
o Task for learning:
   - Classification task (assigning an object to a class) - version space (concept definitions and relations between them), feature space
   - Sequence task (searching for a sequence of steps towards a goal) - state space
o Representation of input/output: training set, testing set
o Learning algorithms: version space search, decision trees, Naive Bayes classifier, SVM (Support Vector Machines)
o Supervised (controlled) vs. unsupervised (uncontrolled) learning
Machine Learning - Version space (figure)
Machine Learning - Feature space (figure)
Decision Trees (ML - DT)
Decision Trees - Representation
o Highly illustrative and interpretable, good user acceptance
o Acyclic graph - a tree
o Nodes:
   - Leaf nodes represent classes (results of classification)
   - Root and intermediate nodes test attributes
o Edges represent values of the testing attributes
Using: IF a new example fulfils all conditions on the path from the root node to a leaf node, THEN the example is classified into the class of the given leaf node.
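This IF-THEN use of a tree can be made concrete with a short sketch. It assumes a hypothetical representation in which intermediate nodes are nested dicts mapping a testing attribute to its value branches and leaves are class labels; this representation is an illustration, not the lecture's notation.

```python
def classify(tree, example):
    """Follow the path from the root to a leaf: at each node test the attribute,
    take the edge whose value matches the example, and return the leaf's class."""
    while isinstance(tree, dict):                 # intermediate node tests an attribute
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]       # follow the edge for the example's value
    return tree                                   # leaf node: the class
```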
Decision Trees - Induction
o Application of the Separate and Conquer approach
o Ending Condition (EC):
   - perfect classification - each subspace contains only examples of one and the same class
   - non-perfect classification - each subspace contains e.g. 90% examples of one class
o General algorithm (see the sketch below):
   IF EC is fulfilled for each subspace THEN end
   ELSE
   1. Choose a subspace with examples of different classes
   2. Select a still unused testing attribute TA
   3. Split the subspace into further subspaces according to the values of TA
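A minimal sketch of this general induction loop, under the same nested-dict assumptions as the classification sketch above. The choice of the testing attribute is left as a placeholder, since the lecture introduces the selection criterion only with ID3.

```python
def induce_tree(examples, attributes, stopping_fraction=1.0):
    """Recursive decision-tree induction: stop when the ending condition (EC) holds,
    otherwise split the subspace on a still unused testing attribute (TA).
    stopping_fraction=1.0 is perfect classification; e.g. 0.9 is non-perfect."""
    classes = [cls for _, cls in examples]
    majority = max(set(classes), key=classes.count)

    # Ending condition: the subspace is (sufficiently) pure, or no attributes remain.
    purity = classes.count(majority) / len(classes)
    if purity >= stopping_fraction or not attributes:
        return majority                          # leaf node labelled with a class

    ta = attributes[0]                           # placeholder: select a still unused TA
    rest = [a for a in attributes if a != ta]

    # Split the subspace according to the values of TA and recurse on each part.
    tree = {ta: {}}
    for value in {feats[ta] for feats, _ in examples}:
        subset = [(f, c) for f, c in examples if f[ta] == value]
        tree[ta][value] = induce_tree(subset, rest, stopping_fraction)
    return tree
```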
Decision Trees - ID3
o Algorithm ID3 (Iterative Dichotomizer), Ross Quinlan, 1979
o Performs perfect classification
o Non-incremental
o Selection of TA based on Shannon's theory of information: decreases entropy H, increases information gain IG
o Generates the minimal tree under these conditions:
   - non-contradictory examples
   - non-redundant examples
   - independent attributes
o Entropy of the set S over its J classes, with p_j the share of class j:
   H(S) = -\sum_{j=1}^{J} p_j \log_2 p_j
o Weighted entropy after splitting S on attribute A into subsets S_j:
   H(S, A) = \sum_j p(S_j) H(S_j)
o Information gain: IG(A) = H(S) - H(S, A)
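The entropy and information gain formulas above translate directly into code. A sketch, assuming the training set is represented as a list of (feature-dict, class) pairs:

```python
from collections import Counter
from math import log2

def entropy(examples):
    """H(S) = -sum_j p_j * log2(p_j), where p_j is the share of class j in S."""
    counts = Counter(cls for _, cls in examples)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(examples, attribute):
    """IG(A) = H(S) - H(S, A), where H(S, A) is the weighted entropy after splitting on A."""
    total = len(examples)
    h_after = 0.0
    for value in {feats[attribute] for feats, _ in examples}:
        subset = [(f, c) for f, c in examples if f[attribute] == value]
        h_after += len(subset) / total * entropy(subset)
    return entropy(examples) - h_after
```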
Decision Trees - C4.5
o Ross Quinlan, 1993
o Modification of the ID3 algorithm
o Non-incremental induction
o Attributes:
   - discrete (nominal, binary)
   - continuous (real, integer numbers)
o Ratio information gain (IGr)
o New possibilities to process:
   - continuous attributes
   - unknown values of attributes
Decision Trees - Ratio Information Gain (IGr)
o ID3 prefers testing attributes with more values
o C4.5 solution: normalisation using the ratio entropy H_r and the ratio information gain IG_r
o The ratio entropy increases with the number of branches of a tree and leads to decision trees that are smaller in width
o For an attribute A with m values a_j:
   H_r(S, A) = -\sum_{j=1}^{m} p(a_j) \log_2 p(a_j)
   IG_r(S, A) = IG(S, A) / H_r(S, A)
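A sketch of the ratio entropy and ratio information gain under the same data assumptions as above. To keep the snippet self-contained, the plain information-gain function (e.g. the one sketched for ID3) is passed in as a parameter:

```python
from collections import Counter
from math import log2

def ratio_entropy(examples, attribute):
    """H_r(S, A) = -sum_j p(a_j) * log2(p(a_j)) over the m values a_j of attribute A."""
    counts = Counter(feats[attribute] for feats, _ in examples)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def ratio_information_gain(examples, attribute, information_gain):
    """IG_r(S, A) = IG(S, A) / H_r(S, A); penalises attributes with many values."""
    h_r = ratio_entropy(examples, attribute)
    return information_gain(examples, attribute) / h_r if h_r > 0 else 0.0
```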
Decision Trees - Continuous Attributes
o Values of a continuous attribute are ordered
o A threshold value is computed for each pair of neighbouring values as their arithmetic mean (average)
o There are m-1 thresholds for m values of the attribute
o One threshold must be selected to divide the space of examples into two subspaces
o The threshold with the maximal ratio information gain IGr is selected
o The space is divided into two subspaces according to the selected threshold:
   - first subspace - examples with attribute values less than the selected threshold
   - second subspace - examples with attribute values greater than the threshold
Decision Trees - Continuous Attributes (example tree): prediction of salary into three classes (Low, Good, Very good) from two continuous attributes, YEARS (number of years of playing basketball) and HITS (number of hits). In each test node the left branch covers values below the threshold (<), the right branch values above it. Discrete and continuous attributes can be combined.
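The threshold selection described above could be sketched as follows. The scoring function for a candidate split (e.g. a ratio-information-gain scorer over the two subspaces) is assumed to be supplied by the caller:

```python
def candidate_thresholds(values):
    """m sorted values yield m-1 candidate thresholds (midpoints of neighbouring values)."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

def best_threshold(examples, attribute, score):
    """Pick the threshold maximising score(left, right) for the binary split
    value < threshold vs. value >= threshold."""
    values = [feats[attribute] for feats, _ in examples]
    best = None
    for t in candidate_thresholds(values):
        left = [(f, c) for f, c in examples if f[attribute] < t]
        right = [(f, c) for f, c in examples if f[attribute] >= t]
        s = score(left, right)
        if best is None or s > best[1]:
            best = (t, s)
    return best  # (threshold, score), or None if the attribute has a single value
```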
Ensemble Learning
o Too small or low-quality training set (irrelevant attributes, non-typical examples of a class)
o Weak classifier - low effectiveness
o Various samples of the training set are formed
o On each sample, one particular weak classifier is learned
o The resulting classification is created by voting among all particular classifiers
o Approaches: Stacking, Bagging, Boosting, Random forests
Ensemble Learning - Bagging
o Bootstrap AGGregatING [Breiman, 1994]
o Forms m = 1...M different samples of the training set (TS)
o A particular classifier H_m: D -> {-1, 1} is learned on each sample using an ML method: a document d_i is (+1) / is not (-1) classified into a class c_j
o Voting of the particular classifiers, with weights \alpha_m enforcing the influence of more precise classifiers on the resulting prediction:
   H(d_i, c_j) = sign( \sum_{m=1}^{M} \alpha_m H_m(d_i, c_j) )
Ensemble Learning - Bagging-like strategies:
o Disjoint partitions - each example appears only once, each subset forms a 1/M part of the original TS
o Small bags - random selection with replication
o No-replication small bags - random selection without replication
o Disjoint bags - each subset holds 1/M of the TS, at least one randomly selected element is repeated, and the number of replications in each subset is the same
(The slide illustrates each strategy on an original TS of 16 examples A...P split into M = 4 subsets.)
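A minimal sketch of bagging with the basic bootstrap strategy (random selection with replacement) and the sign-of-the-sum voting from the previous slide. The learning procedure and the weak classifiers H_m are hypothetical callables:

```python
import random

def bagging_train(train_set, learn, M=10, bag_size=None):
    """Form M bootstrap samples (random selection with replacement) and
    learn one weak classifier H_m on each of them."""
    n = bag_size or len(train_set)
    return [learn([random.choice(train_set) for _ in range(n)]) for _ in range(M)]

def bagging_classify(classifiers, document, category):
    """Voting of the particular classifiers: each H_m returns +1 or -1,
    the ensemble returns the sign of the sum of the votes."""
    votes = sum(h(document, category) for h in classifiers)
    return 1 if votes >= 0 else -1
```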
Ensemble Learning - Boosting
o Boosting is an improved bagging [Schapire-Singer, 1999] based on the weighting of training examples
o Forms m = 1...M samples from the original TS
o The samples differ only in the weights of the training examples
o On every sample a weak classifier H_m: D -> {-1, 1} is trained using the same selected ML algorithm
o In every iteration:
   - weights of wrongly/correctly classified examples are increased/decreased
   - the modified sample is used for learning the new classifier
o Error-driven ensemble learning - better results
o The result of classification is the result of voting of the particular classifiers (similar to bagging)
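An AdaBoost-style sketch of the weight updates described above; the slide does not fix a particular boosting variant, so the exponential update below is an assumption. Training examples are (x, y) pairs with y in {-1, +1}, and learn_weighted is a hypothetical procedure that trains a weak classifier on a weighted sample:

```python
import math

def boosting_train(train_set, learn_weighted, M=10):
    """Samples differ only in example weights; weights of wrongly/correctly
    classified examples are increased/decreased after every iteration."""
    n = len(train_set)
    weights = [1.0 / n] * n
    ensemble = []                                   # list of (alpha_m, H_m) pairs
    for _ in range(M):
        h = learn_weighted(train_set, weights)      # weak classifier on the weighted sample
        wrong = [h(x) != y for x, y in train_set]
        eps = sum(w for w, bad in zip(weights, wrong) if bad)
        if eps == 0 or eps >= 0.5:                  # perfect or too weak: stop early
            break
        alpha = 0.5 * math.log((1 - eps) / eps)     # more precise classifiers get more weight
        weights = [w * math.exp(alpha if bad else -alpha)
                   for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]      # renormalise for the next iteration
        ensemble.append((alpha, h))
    return ensemble

def boosting_classify(ensemble, x):
    """Weighted voting of the particular classifiers, as in bagging."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```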
Ensemble Learning - Random Forests
o Random Forests [Breiman, 2001] is a modification of bagging
o The ensemble is based exclusively on the decision tree ML method
o Generates a set of de-correlated trees - a forest of trees
o Ensemble classifier:
   - averages the results of the particular decision trees (regression DT)
   - creates the result by voting of the particular DT classifiers (classification DT)
o The particular classifiers have to be independent (de-correlated) - random selection of a subset of attributes for every tree
o Independent trees can be generated in parallel
o A random training subset of the training set is selected for the generation of every decision tree
Ensemble Learning - Random Forests
Random forests are a fast, highly precise and often used approach.
o Initialisation phase - setting parameters:
   - number of decision trees in the model
   - number of randomly selected attributes considered in every tree
o When selecting the testing attribute in a node of the generated tree, only m attributes out of the whole number of p attributes are taken into account
o One attribute from the m attributes is selected
o In another sub-node a different subset of m attributes is considered and one attribute is selected again
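The two initialisation parameters map directly onto common random-forest implementations. A minimal scikit-learn sketch on a toy synthetic dataset; the data and parameter values are illustrative, not from the lecture:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of decision trees in the model,
# max_features = number m of randomly selected attributes considered at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", n_jobs=-1)
forest.fit(X_train, y_train)              # independent trees can be grown in parallel
print(forest.score(X_test, y_test))       # accuracy on the testing set
```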
Ensemble Learning - Random Forests
o At every splitting of the space of training examples it is forbidden to use some attributes (those not selected into the subset of m attributes). Why? Is it irrational?
o IF there is one strong predictor in a group of only slightly weaker predictors, THEN the strong predictor would be used for the generation of all trees
o This leads to similar trees - strong correlation of the trees; averaging a forest of highly correlated trees will not bring any benefit compared to using only one tree
o Principle of de-correlation of trees: on average (p-m)/p of the splits will not even consider the strong predictor, so the other predictors get more chance
Ensemble Learning - Random Forests
o The main difference between the random forest and bagging is the selection of a subset of m predictors
o IF m = p THEN the random forest is equal to bagging
o A smaller value of m leads to a more reliable random forest (mainly in the case of many correlated predictors)
o It is easier to train a random forest than to tune boosting
o Popular, available in many ML software packages
Ensemble Learning - Comparison
o Ensemble learning increases the precision of classification by voting over a set of weak classifiers (a strong classifier gains little from it)
o Cons:
   - loss of clarity - the result of learning is less illustrative
   - increased computational complexity
o The achieved results favour boosting over bagging
o Error-driven ensemble learning in boosting not only refines the results but also accelerates the learning
o Random forests do not require much tuning
o They achieve only 4.88% classification error (bagging 5.4%)
o They cannot be over-learned
Applications of Machine Learning
o With a labeled training set, any machine learning method can be used
o Example reviews and their classes:
   1. This mobile is marvelous, its functioning is reliable. - Positive
   2. Never buy this mobile. - Negative
   3. This mobile is totally conforming. - Positive
   4. I am very satisfied with my mobile. - Positive
   5. It really drives me mad. - Negative
o Preprocessing uses the TF-IDF weighting scheme; the lexical profile is really big
o Document-term matrix: rows are the reviews, columns are terms such as marvelous, reliable, never, totally, conforming, very, satisfied, really, mad
o Representation: binary (1 or 0) or vector representation with weights w_ij
Applications of Machine Learning
Vector representation of texts: reviews are short documents d_i containing terms (words) t_j with weights w_ij:

A = \begin{pmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,j} & \cdots & w_{1,n} \\
w_{2,1} & w_{2,2} & \cdots & w_{2,j} & \cdots & w_{2,n} \\
\vdots  &         &        &         &        & \vdots  \\
w_{i,1} & w_{i,2} & \cdots & w_{i,j} & \cdots & w_{i,n} \\
\vdots  &         &        &         &        & \vdots  \\
w_{m,1} & w_{m,2} & \cdots & w_{m,j} & \cdots & w_{m,n}
\end{pmatrix}

where row i is the vector of document d_i and column j corresponds to term t_j.
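A sketch of building such a matrix A with the TF-IDF weighting scheme, reusing the five example reviews from the earlier slide; scikit-learn's TfidfVectorizer is used here purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "This mobile is marvelous, its functioning is reliable.",
    "Never buy this mobile.",
    "This mobile is totally conforming.",
    "I am very satisfied with my mobile.",
    "It really drives me mad.",
]
labels = ["Positive", "Negative", "Positive", "Positive", "Negative"]  # target classes

# Rows of A are documents d_i, columns are terms t_j, entries are TF-IDF weights w_ij.
vectorizer = TfidfVectorizer()
A = vectorizer.fit_transform(reviews)
print(A.shape, vectorizer.get_feature_names_out()[:5])
```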
Applications of Machine Learning
o The labeled training set has to contain the definition of CLASSES
o Sentiment Analysis (SA) tasks and their classes:
   - opinion polarity: Positive / Negative
   - opinion degree: -3, -2, -1, 1, 2, 3
   - emotions classifying: Happiness, Sadness, Surprise, Fear, Disgust, Anger
   - troll reviews recognition: Toxic / Non-toxic
   - authority identification: Authority / Non-authority
   - troll identifying: Troll / Non-troll
Applications of Machine Learning
o Opinion polarity, opinion degree, emotions classifying and troll reviews recognition learn from short texts
o Authority identification and troll identifying learn from the features of a reviewer - the attributes (predictors) have to be defined
Applications of Machine Learning
o Authority identification - definition of attributes:
   - NC is the number of reviews
   - ANR is the average number of reactions to one review
   - AL is the average number of all layers on which the comments are situated
   - NCH is the number of characters, which represents the length of a comment
   - K is the karma of a contributor in the form of a number from 0 to 200
   - AE is the average evaluation of the comment - a number from 0 to 80
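A hedged sketch of assembling these attributes into a predictor vector for a classifier; the dictionary field names are hypothetical and stand in for whatever raw statistics are collected about a contributor:

```python
def authority_features(reviewer):
    """Build the predictor vector (NC, ANR, AL, NCH, K, AE) for one contributor.
    `reviewer` is assumed to be a dict with the corresponding raw statistics."""
    return [
        reviewer["num_reviews"],        # NC  - number of reviews
        reviewer["avg_reactions"],      # ANR - average reactions to one review
        reviewer["avg_layers"],         # AL  - average number of comment layers
        reviewer["avg_chars"],          # NCH - length of comments in characters
        reviewer["karma"],              # K   - karma, a number from 0 to 200
        reviewer["avg_evaluation"],     # AE  - average evaluation, 0 to 80
    ]
```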
Applications of Machine Learning
o Troll identifying - definition of attributes?
o Types of trolls:
   - Abusive troll
   - Insistent troll
   - Troll who controls the grammar
   - Always injured troll
   - Troll who knows everything
   - Troll who is beyond the theme
   - Troll who is a continuous spammer