
Automatic Key Term Extraction Using Branching Entropy - NTU Study
An approach from National Taiwan University for automatic key term extraction from spoken course lectures using branching entropy and prosodic, lexical, and semantic features. The slides cover the proposed methodology, the learning methods, and the experiments and evaluation.
Presentation Transcript
National Taiwan University, Taiwan
Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features
Speaker:

Outline
- Introduction
- Proposed Approach
  - Branching Entropy
  - Feature Extraction
  - Learning Methods
- Experiments & Evaluation
- Conclusion

Introduction

Definition
- Key term: a term with higher term frequency that carries the core content of a document
- Two types: keywords and key phrases
- Advantages: useful for indexing and retrieval, and for capturing the relations between key terms and segments of documents

Introduction

[Figure: a lecture transcript with candidate terms highlighted, e.g., bigram, language model, n-gram, hmm, acoustic model, hidden Markov model, phone]

Target: extract key terms from course lectures.

Proposed Approach

Automatic Key Term Extraction

[System diagram: archive of spoken documents (speech signals) → ASR → ASR transcriptions → branching entropy → feature extraction → learning methods (1. K-means exemplar, 2. AdaBoost, 3. neural network) → key terms (e.g., entropy, acoustic model, ...)]

The system works in two stages:
- Phrase identification: branching entropy is first applied to the ASR transcriptions to identify phrases.
- Key term extraction: learning methods use features of each candidate term to extract the key terms.

Branching Entropy

How to decide the boundary of a phrase?

[Figure: context tree around "hidden Markov model"; many different words (represent, is, of, in, can, ...) follow the full phrase]

- "hidden" is almost always followed by the same word
- "hidden Markov" is almost always followed by the same word
- "hidden Markov model" is followed by many different words → a likely boundary

Define branching entropy to decide possible boundaries.

Branching Entropy

Definition of right branching entropy:
- Probability of a child x_i of X (x_i extends X by one word): P(x_i|X) = f(x_i) / f(X), where f(·) is the corpus frequency
- Right branching entropy of X: H_r(X) = -\sum_i P(x_i|X) \log P(x_i|X)

Branching Entropy

Decision of the right boundary:
- Find the right boundary located between X and x_i where the entropy rises, i.e., H_r(x_i) > H_r(X): inside a phrase the next word is predictable (low entropy), and the entropy jumps once the phrase is complete

Branching Entropy

Decision of the left boundary:
- Define the left branching entropy H_l(X) in the same way over the reversed token stream (X: "model Markov hidden"), and find the left boundary located between X and x_i where H_l(x_i) > H_l(X)
- Implemented using a PAT tree

Branching Entropy

Implementation in the PAT tree:
- Each node stores a word sequence and its corpus frequency, e.g., X: "hidden Markov" with children x1: "hidden Markov model" and x2: "hidden Markov chain"
- Probability of a child x_i of X: P(x_i|X) = f(x_i) / f(X)
- Right branching entropy for X is computed from these child probabilities

[Figure: PAT tree fragment over hidden / Markov / model / chain / state / variable / distribution, with a frequency count at each node]

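As a minimal illustration of the idea (not the authors' PAT-tree implementation), the right branching entropy can be computed directly from n-gram counts over a toy corpus; the left variant is the same computation on the reversed token stream:

```python
from collections import Counter
from math import log2

def followers(tokens, prefix):
    """Count the words that immediately follow `prefix` in the token stream."""
    n = len(prefix)
    return Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tokens[i:i + n] == prefix
    )

def entropy(counts):
    """Shannon entropy of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# Toy corpus: "hidden Markov model" behaves like a phrase, so the
# entropy is low inside it and jumps right after it.
corpus = ("hidden Markov model is used ; "
          "hidden Markov model can represent speech ; "
          "hidden Markov model of phones ;").split()

for k in (1, 2, 3):
    prefix = ["hidden", "Markov", "model"][:k]
    print(" ".join(prefix), "-> H_r =", round(entropy(followers(corpus, prefix)), 3))
```

On this toy corpus H_r stays at 0 inside "hidden Markov model" and jumps to log2(3) ≈ 1.585 right after it, which is exactly the rise the boundary decision looks for.
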
Automatic Key Term Extraction

[System diagram, as above] Next, extract features for each candidate term.

Feature Extraction

Prosodic features, extracted for each candidate term at its first occurrence:
- Speakers tend to use longer duration to emphasize key terms
- Higher pitch may represent significant information
- Higher energy emphasizes important information

Feature Name      Feature Description
Duration (I-IV)   normalized duration (max, min, mean, range); the duration of each phone a is normalized by the average duration of phone a
Pitch (I-IV)      F0 (max, min, mean, range)
Energy (I-IV)     energy (max, min, mean, range)

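A sketch of how the Duration (I-IV) features could be computed under the normalization stated above (each phone's duration divided by that phone's corpus-wide average); the phone labels and input format are illustrative assumptions, not the paper's actual data structures:

```python
from statistics import mean

def duration_features(phones, avg_phone_duration):
    """Duration (I-IV): max, min, mean, range of normalized phone durations.

    `phones` is a list of (phone_label, duration_sec) pairs for one
    candidate term; `avg_phone_duration` maps each phone label to its
    average duration over the whole corpus (illustrative inputs).
    """
    normalized = [d / avg_phone_duration[p] for p, d in phones]
    return {
        "dur_max": max(normalized),
        "dur_min": min(normalized),
        "dur_mean": mean(normalized),
        "dur_range": max(normalized) - min(normalized),
    }

# Example: the term's phones are stretched relative to the corpus
# average, the emphasis cue the prosodic features try to capture.
print(duration_features(
    [("HH", 0.09), ("IH", 0.12), ("D", 0.08)],
    {"HH": 0.06, "IH": 0.08, "D": 0.07},
))
```
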
Feature Extraction

Lexical features: some well-known lexical features for each candidate term.

Feature Name   Feature Description
TF             term frequency
IDF            inverse document frequency
TFIDF          tf * idf
PoS            the PoS tag

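For the lexical features, a plain tf*idf computation looks like the following (a standard formulation; the slides do not spell out the exact variant used):

```python
from math import log

def tf_idf(term, doc_tokens, all_docs):
    """tf * idf for one candidate term in one document."""
    tf = doc_tokens.count(term)
    df = sum(1 for d in all_docs if term in d)     # document frequency
    idf = log(len(all_docs) / df) if df else 0.0   # inverse document frequency
    return tf * idf

docs = [["hidden", "markov", "model", "phone"],
        ["language", "model", "bigram"],
        ["acoustic", "model", "hidden", "markov", "model"]]
print(tf_idf("markov", docs[2], docs))  # rarer terms score higher
```
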
Feature Extraction

Semantic features: key terms tend to focus on limited topics.
- Probabilistic Latent Semantic Analysis (PLSA)

[Figure: PLSA graphical model linking documents D_1 ... D_N to terms t_1 ... t_n through latent topics T_1 ... T_K, with probabilities P(T_k|D_i) and P(t_j|T_k); t_j: terms, D_i: documents, T_k: latent topics]

Feature Extraction

- Latent topic probability: the latent topic distribution of a term from PLSA; key terms concentrate their probability mass on a few topics, while non-key terms spread it across many
- How to use it? Describe the probability distribution by its statistics: LTP (I-III) = mean, variance, and standard deviation

Feature Extraction

- Latent topic significance (LTS): the ratio of a term's within-topic frequency to its out-of-topic frequency; key terms show much sharper peaks over topics than non-key terms
- LTS (I-III) = mean, variance, and standard deviation of the significance values

Feature Extraction

- Latent topic entropy (LTE): the entropy of a term's latent topic distribution, LTE(t_j) = -\sum_k P(T_k|t_j) \log P(T_k|t_j)
- Key terms concentrate on few topics and therefore have lower LTE; non-key terms have higher LTE

Feature Name   Feature Description
LTP (I-III)    latent topic probability (mean, variance, standard deviation)
LTS (I-III)    latent topic significance (mean, variance, standard deviation)
LTE            term entropy over latent topics

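A minimal sketch of the LTP and LTE features, assuming a trained PLSA model already supplies the latent topic distribution P(T_k|t_j) for each term (the distributions below are illustrative placeholders; LTS would analogously use within-topic and out-of-topic frequencies):

```python
import numpy as np

def semantic_features(p_topic_given_term):
    """LTP (I-III) and LTE for one term from its distribution P(T_k | t_j).

    Assumes the input sums to 1; for a proper distribution over K topics
    the mean is fixed at 1/K, so variance/std and LTE carry the signal.
    """
    p = np.asarray(p_topic_given_term, dtype=float)
    return {
        "ltp_mean": p.mean(),
        "ltp_var": p.var(),
        "ltp_std": p.std(),
        "lte": -(p * np.log(p + 1e-12)).sum(),  # lower for key terms
    }

key_term = semantic_features([0.85, 0.10, 0.03, 0.01, 0.01])
non_key  = semantic_features([0.22, 0.20, 0.20, 0.19, 0.19])
print(key_term["lte"] < non_key["lte"])  # True: key terms concentrate on few topics
```
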
Automatic Key Term Extraction

[System diagram, as above] Finally, use learning approaches to extract the key terms.

Learning Methods

Unsupervised learning: K-means exemplar (see the sketch below)
- Transform each term into a vector in LTS (latent topic significance) space
- Run K-means; the terms in the same cluster focus on a single topic
- Take the term closest to each cluster centroid as a key term: it represents the cluster's topic, and the other terms in the cluster are related to it

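A sketch of the K-means exemplar idea under the description above, using scikit-learn's KMeans; this is an illustrative re-implementation, not the authors' code:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_exemplars(terms, lts_vectors, n_key_terms):
    """Cluster terms in LTS space; return the term nearest each centroid."""
    X = np.asarray(lts_vectors, dtype=float)
    km = KMeans(n_clusters=n_key_terms, n_init=10, random_state=0).fit(X)
    exemplars = []
    for k, center in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == k)[0]
        nearest = members[np.argmin(np.linalg.norm(X[members] - center, axis=1))]
        exemplars.append(terms[nearest])
    return exemplars

# Toy example: two topics, one exemplar (key term) per topic.
terms = ["hmm", "viterbi", "bigram", "smoothing"]
vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
print(kmeans_exemplars(terms, vecs, 2))
```
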
Learning Methods

Supervised learning (see the sketch below):
- Adaptive boosting (AdaBoost)
- Neural network
Both automatically adjust the weights of the features to produce a classifier.

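The supervised setting can be sketched as binary classification over the candidate-term feature vectors; the data below is random stand-in data, and the scikit-learn models are generic counterparts of the AdaBoost and neural network classifiers named in the slides:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Each row is one candidate term's prosodic + lexical + semantic
# features; the label says whether it is a key term.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))              # 12 illustrative features
y = (X[:, 0] + X[:, 3] > 0.5).astype(int)   # stand-in labels

for clf in (AdaBoostClassifier(), MLPClassifier(max_iter=1000)):
    # 3-fold cross validation, matching the evaluation setup below.
    scores = cross_val_score(clf, X, y, cv=3, scoring="f1")
    print(type(clf).__name__, scores.mean().round(3))
```
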
Experiments & Evaluation

Experiments

Corpus: NTU lecture corpus
- Mandarin Chinese with embedded English words, e.g., a sentence mixing Mandarin with English terms such as "solution" and "viterbi algorithm" ("Our solution is the Viterbi algorithm")
- Single speaker
- 45.2 hours

Experiments

ASR system:
- Acoustic model: bilingual AM (Chinese + English), speaker-independent model adapted with some data from the target speaker
- Language model: background trigram LM trained on out-of-domain corpora, interpolated with an adaptive LM from the in-domain corpus

Language   Char Acc (%)
Mandarin   78.15
English    53.44
Overall    76.26

Experiments

Reference key terms:
- Annotations from 61 students who had taken the course
  - If the k-th annotator labeled N_k key terms, each of them received a score of 1/N_k, and all other terms a score of 0
  - Rank the terms by the sum of the scores given by all annotators
  - Choose the top N terms from the list, where N is the average of the N_k
- N = 154 key terms: 59 key phrases and 95 keywords

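A sketch of the reference-list construction described above (the 1/N_k score is a reconstruction of the slide's scoring formula, which did not survive extraction intact):

```python
from collections import defaultdict

def reference_key_terms(annotations):
    """Aggregate annotator labels: each of the k-th annotator's N_k terms
    scores 1/N_k; keep the top N terms, where N is the average N_k."""
    scores = defaultdict(float)
    for labeled_terms in annotations:
        for term in labeled_terms:
            scores[term] += 1.0 / len(labeled_terms)
    n = round(sum(len(a) for a in annotations) / len(annotations))
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(reference_key_terms([
    ["hmm", "viterbi", "bigram"],
    ["hmm", "viterbi"],
    ["hmm", "smoothing", "bigram", "phone"],
]))
```
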
Experiments

Evaluation:
- Unsupervised learning: set the number of extracted key terms to N
- Supervised learning: 3-fold cross validation

Experiments

Feature effectiveness (neural network, keywords from ASR transcriptions):

[Bar chart of F-measure. Pr: prosodic, Lx: lexical, Sm: semantic. The individual feature sets score 20.78, 35.63, and 42.86; prosodic + lexical together reach 48.15; all three sets combined reach 56.55]

- Each feature set alone gives an F1 between roughly 20% and 42%
- Prosodic features and lexical features are additive
- All three sets of features are useful

Experiments

Overall performance on manual transcriptions (F-measure; U: unsupervised, S: supervised, AB: AdaBoost, NN: neural network):

Approach      F-measure
Baseline      23.38
U: TFIDF      51.95
U: K-means    55.84
S: AB         62.39
S: NN         67.31

The baseline uses conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering.
- Branching entropy performs well
- K-means exemplar outperforms TFIDF
- Supervised approaches are better than unsupervised approaches

Experiments

Overall performance, manual vs. ASR transcriptions (F-measure):

Approach      Manual   ASR
Baseline      23.38    20.78
U: TFIDF      51.95    43.51
U: K-means    55.84    52.60
S: AB         62.39    57.68
S: NN         67.31    62.70

- Performance on ASR transcriptions is slightly worse than on manual transcriptions, but still reasonable
- Supervised learning with a neural network gives the best results

Conclusion

Conclusion
- We proposed a new approach to extract key terms from spoken course lectures
- The performance can be improved by
  - identifying phrases with branching entropy
  - combining prosodic, lexical, and semantic features
- The results are encouraging