
Automatic Key Term Extraction and Summarization in Spoken Course Lectures
This presentation by Yun-Nung Chen of National Taiwan University describes research on extracting key terms and summaries from spoken course lectures. Key terms support indexing and retrieval and capture the relations between key terms and segments of documents; summaries help users understand a document efficiently. The study covers information extraction methods based on branching entropy, AdaBoost, and neural networks.
Presentation Transcript
Automatic Key Term Extraction and Summarization from Spoken Course Lectures
Speaker: Yun-Nung Chen
Advisor: Prof. Lin-Shan Lee
National Taiwan University
Introduction
Target: extract key terms and summaries from course lectures.
- Key terms: used for indexing and retrieval, and for capturing the relations between key terms and segments of documents.
- Summaries: help users efficiently understand a document.
Both tasks are instances of information extraction, and both relate to document understanding and to the semantics of the document.
Automatic Key Term Extraction
Definition
A key term has a higher term frequency and carries the core content of a document. There are two types:
- Keyword, e.g., "entropy"
- Key phrase, e.g., "acoustic model"
Automatic Key Term Extraction: System Overview
Archive of spoken documents → ASR (speech signal → ASR transcriptions) → Phrase Identification (branching entropy) → Feature Extraction → Key Term Extraction (learning methods: 1) AdaBoost, 2) Neural Network) → key terms (e.g., entropy, acoustic model, ...).
Branching entropy is first used to identify phrases; learning methods then extract key terms based on a set of features.
Branching Entropy
How do we decide the boundary of a phrase? Consider all occurrences of "hidden Markov model" in the corpus, together with the words that precede them (represent, is, of, ...) and the words that follow them (is, in, can, ...).
- Inside the phrase, the next word is nearly deterministic, so the entropy of the following word is low.
- At the boundary of the phrase, many different words can follow, so the entropy increases.
Branching entropy is defined to detect such possible boundaries.
Definition of right branching entropy: for a word sequence X followed by words x_i with probability P(x_i | X) of x_i given X, the right branching entropy of X is
H_r(X) = -Σ_i P(x_i | X) log P(x_i | X).
Decision of the right boundary: the right boundary is located between X and x_i where the branching entropy rises as the sequence is extended; the left boundary is decided symmetrically using left branching entropy. A PAT tree is used to implement the counting efficiently; a minimal sketch of the computation follows.
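To make the boundary decision concrete, here is a minimal Python sketch of right branching entropy computed by brute-force n-gram counting over a toy corpus; the corpus, the tokenization, and the simple entropy comparison are illustrative assumptions, not the thesis implementation (which counts with a PAT tree):

```python
import math
from collections import Counter

def right_branching_entropy(corpus, prefix):
    """H_r(X) = -sum_i P(x_i|X) log P(x_i|X), where x_i ranges over the
    words observed immediately after the word sequence X (the prefix)."""
    followers = Counter()
    n = len(prefix)
    for sentence in corpus:
        for i in range(len(sentence) - n):
            if tuple(sentence[i:i + n]) == tuple(prefix):
                followers[sentence[i + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in followers.values())

# Toy corpus: "hidden markov model" always occurs as a unit, but with
# varied preceding and following words.
corpus = [
    "we represent the hidden markov model in this way".split(),
    "the hidden markov model is a generative model".split(),
    "a hidden markov model can be trained easily".split(),
]

# Low entropy inside the phrase: "markov" always follows "hidden" ...
print(right_branching_entropy(corpus, ["hidden"]))
print(right_branching_entropy(corpus, ["hidden", "markov"]))
# ... and entropy rises after "model", signaling a right boundary.
print(right_branching_entropy(corpus, ["hidden", "markov", "model"]))
```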
Automatic Key Term Extraction (continued)
With phrase candidates identified, prosodic, lexical, and semantic features are extracted for each candidate term.
Feature Extraction: Prosodic Features
For each candidate term, prosodic features are computed at its first appearance. The intuitions: speakers tend to use longer duration to emphasize key terms, higher pitch may mark significant information, and higher energy emphasizes important information.

Feature Name | Feature Description
Duration (I - IV) | normalized duration (max, min, mean, range); the duration of each phone a is normalized by the average duration of phone a, and four statistics summarize the term
Pitch (I - IV) | F0 (max, min, mean, range)
Energy (I - IV) | energy (max, min, mean, range)
Feature Extraction: Lexical Features
Several well-known lexical features are used for each candidate term (a sketch follows):

Feature Name | Feature Description
TF | term frequency
IDF | inverse document frequency
TFIDF | TF × IDF
PoS | the PoS tag
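As a hedged illustration of these lexical features, the sketch below computes TF, IDF, and TFIDF for a term over a toy document collection; the exact IDF variant and any normalization used in the thesis may differ, and the PoS tag would come from an external tagger:

```python
import math
from collections import Counter

def lexical_features(term, doc, docs):
    """TF, IDF, and TFIDF for `term` in document `doc` (a token list),
    given the collection `docs`. IDF uses the common log(N / df) form."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return {"TF": tf, "IDF": idf, "TFIDF": tf * idf}

docs = [
    "the acoustic model maps speech frames to phones".split(),
    "the language model scores word sequences".split(),
    "entropy measures uncertainty of a distribution".split(),
]
print(lexical_features("model", docs[0], docs))    # frequent but widespread
print(lexical_features("entropy", docs[2], docs))  # rare and concentrated
```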
Feature Extraction: Semantic Features
Key terms tend to focus on limited topics. Probabilistic Latent Semantic Analysis (PLSA) relates terms t_j and documents D_i through latent topics T_k, with parameters P(T_k | D_i) and P(t_j | T_k).
- Latent Topic Probability (LTP): the distribution of latent topics given a term; a key term concentrates its probability on a few topics, while a non-key term spreads it across many.
- Latent Topic Significance (LTS): the within-topic to out-of-topic frequency ratio of a term; key terms have high within-topic frequency relative to out-of-topic frequency.
- Latent Topic Entropy (LTE): the entropy of the term's latent topic distribution; a key term has lower LTE, a non-key term has higher LTE.

Feature Name | Feature Description
LTP (I - III) | latent topic probability (mean, variance, standard deviation)
LTS (I - III) | latent topic significance (mean, variance, standard deviation)
LTE | term entropy over latent topics
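The three semantic features can be summarized in a short sketch. Assuming PLSA training has already produced, for each term, a topic distribution P(T_k | t) and per-topic frequencies, the hypothetical helpers below compute the LTP statistics, a simple within-topic/out-of-topic LTS ratio, and LTE; the exact estimators in the thesis may differ:

```python
import math

def ltp_stats(p_topic_given_term):
    """LTP (I - III): mean, variance, and standard deviation of P(T_k | t)."""
    p = p_topic_given_term
    mean = sum(p) / len(p)
    var = sum((x - mean) ** 2 for x in p) / len(p)
    return mean, var, math.sqrt(var)

def lts(term_topic_freq, k):
    """Within-topic to out-of-topic frequency ratio for topic k (one simple
    realization of the ratio described on the slide)."""
    within = term_topic_freq[k]
    avg_out = (sum(term_topic_freq) - within) / (len(term_topic_freq) - 1)
    return within / avg_out if avg_out else float("inf")

def lte(p_topic_given_term):
    """Latent Topic Entropy: -sum_k P(T_k | t) log P(T_k | t)."""
    return -sum(p * math.log(p) for p in p_topic_given_term if p > 0)

# A key term concentrates on few topics (low LTE); a non-key term
# spreads over many topics (high LTE).
key_term = [0.85, 0.10, 0.05]
non_key_term = [0.34, 0.33, 0.33]
print(lte(key_term), lte(non_key_term))
print(lts([40, 3, 2], 0))  # term heavily used within topic 0
```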
Automatic Key Term Extraction (continued)
Supervised learning methods then extract the key terms from the feature vectors.
Learning Methods
- Adaptive Boosting (AdaBoost)
- Neural Network
Both automatically adjust the weights of the features to train a classifier; a minimal sketch follows.
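Here is a minimal, hedged sketch of the two learners using scikit-learn; the feature matrix and labels are random placeholders, and the thesis' actual model configurations (number of weak learners, network architecture, and so on) are not reproduced:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # placeholder: prosodic/lexical/semantic features per candidate term
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder labels: 1 = key term, 0 = not

ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, y)
print("AdaBoost training accuracy:", ada.score(X, y))
print("Neural network training accuracy:", mlp.score(X, y))
```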
Experiments: Automatic Key Term Extraction
Experiments: Corpus and ASR System
- NTU lecture corpus: Mandarin Chinese with embedded English words, a single speaker, 45.2 hours.
- ASR system: bilingual acoustic model with model adaptation [1]; language model adaptation using random forests [2].

Language | Mandarin | English | Overall
Character Accuracy (%) | 78.15 | 53.44 | 76.26

[1] Ching-Feng Yeh, "Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery," Master Thesis, 2011.
[2] Chao-Yu Huang, "Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests," Master Thesis, 2011.
Experiments: Reference Key Terms
- Annotations were collected from 61 students who had taken the course.
- If an annotator labeled 150 key terms, each of them received a score of 1/150, and all other terms received 0.
- Terms are ranked by the sum of the scores given by all annotators.
- The top N terms are chosen from the list, where N is the average number of key terms per annotator: N = 154 key terms (59 key phrases and 95 keywords).
- Evaluation: 3-fold cross-validation.
A minimal sketch of this scoring scheme follows.
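The scheme maps directly to a few lines of Python; the annotation lists below are toy data, not the real annotations:

```python
from collections import defaultdict

def rank_reference_terms(annotations, n):
    """Each annotator who labels m key terms gives each of them 1/m
    (and 0 to all other terms); terms are ranked by the summed score,
    and the top n become the reference key terms."""
    scores = defaultdict(float)
    for labeled_terms in annotations:
        for term in labeled_terms:
            scores[term] += 1.0 / len(labeled_terms)
    return sorted(scores, key=scores.get, reverse=True)[:n]

annotations = [  # toy data; the thesis has 61 annotators and N = 154
    ["entropy", "acoustic model", "viterbi"],
    ["entropy", "acoustic model"],
    ["entropy", "language model"],
]
print(rank_reference_terms(annotations, 2))  # ['entropy', 'acoustic model']
```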
Experiments: Feature Effectiveness
Neural network for keywords from ASR transcriptions (Pr: prosodic, Lx: lexical, Sm: semantic):

Features | F-measure (%)
Pr | 20.78
Lx | 35.63
Sm | 42.86
Pr + Lx | 48.15
Pr + Lx + Sm | 56.55

Each feature set alone gives an F1 between roughly 20% and 42%; prosodic and lexical features are additive; all three sets of features are useful.
Experiments: Overall Performance (Keywords & Key Phrases)
F-measure (%) on ASR and manual transcriptions:

Method | ASR | Manual
Baseline: N-Gram + TFIDF | 23.44 | 32.19
Branching Entropy + TFIDF | 52.60 | 55.84
Branching Entropy + AdaBoost | 57.68 | 62.39
Branching Entropy + Neural Network | 62.70 | 67.31

Branching entropy performs well; performance on manual transcriptions is only slightly better than on ASR transcriptions; supervised learning with the neural network gives the best results.
Automatic Summarization
Introduction
- Extractive summary: select the important sentences in the document.
- Computing the importance of sentences: statistical measure, linguistic measure, confidence score, N-gram score, grammatical structure score.
- Rank sentences by importance and decide the ratio of the summary.
We propose a better statistical measure of a term.
Statistical Measure of a Term
- LTE-based statistical measure (baseline): score a term by its latent topic entropy LTE(t) = -Σ_k P(T_k | t) log P(T_k | t) over topics T_{k-1}, T_k, T_{k+1}, ...; terms concentrated on a few topics are more important.
- Key-term-based statistical measure: consider only the key terms t_i. Key terms can represent the core content of the document, and the latent topic probability can be estimated more accurately from key terms alone. Each key term is additionally weighted by its LTS.
Importance of the Sentence
- Original importance: computed from either the LTE-based or the key-term-based statistical measure of the terms in the sentence (a loose sketch follows).
- New importance: combine the original importance with the similarity to other sentences; sentences similar to more sentences should get higher importance.
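Since the slides state the two measures only at a high level, the following is a loose sketch of how a sentence's original importance might be computed under each; the actual formulas in the thesis may differ:

```python
def lte_based_importance(sentence, lte):
    """Baseline: terms with lower latent topic entropy contribute more
    (realized here as a sum of 1 / LTE(t); an assumed form)."""
    return sum(1.0 / lte[t] for t in sentence if t in lte)

def key_term_based_importance(sentence, key_term_lts):
    """Proposed direction: count only key terms, each weighted by its LTS."""
    return sum(key_term_lts[t] for t in sentence if t in key_term_lts)

lte = {"entropy": 0.3, "model": 1.2, "the": 2.4}  # toy LTE values
key_term_lts = {"entropy": 3.1, "acoustic": 2.2}  # toy LTS weights for key terms
s = "the entropy of the acoustic model".split()
print(lte_based_importance(s, lte), key_term_based_importance(s, key_term_lts))
```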
Random Walk on a Graph
Idea: sentences similar to more important sentences should themselves be more important.
Graph construction: each node is a sentence in the document; each edge is weighted by the similarity between the two sentences it connects.
Node score: interpolate two scores, the normalized original score r(i) of sentence S_i and the scores propagated from its neighbors according to the edge weights p(j, i):
v(i) ← (1 - α) r(i) + α Σ_j p(j, i) v(j),
where v(i) is the score of S_i updated at each iteration. Nodes connecting to more nodes with higher scores thus receive higher scores.
Random Walk on a Graph: Topical Similarity between Sentences
The edge weight sim(S_i, S_j) from sentence S_i to sentence S_j is a topical similarity: it is computed from the latent topic probabilities of the sentences over topics T_{k-1}, T_k, T_{k+1}, ..., with the terms t_i, t_j, t_k weighted by their Latent Topic Significance (LTS).
Random Walk on a Graph: Scores of Sentences
- Converged equation: v(i) = (1 - α) r(i) + α Σ_j p(j, i) v(j).
- Matrix form: v = (1 - α) r + α Pᵀ v; the solution v is the dominant eigenvector of the combined matrix.
- The converged scores are integrated with the original importance. A minimal sketch of the iteration follows.
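A hedged sketch of the iteration (the similarity matrix, restart weight α, and original scores are toy values):

```python
import numpy as np

def random_walk_scores(sim, r, alpha=0.9, iters=100):
    """Iterate v <- (1 - alpha) * r + alpha * P^T v, where P row-normalizes
    the sentence-similarity matrix and r is the normalized original
    importance; v converges to the dominant eigenvector of the combined matrix."""
    P = sim / sim.sum(axis=1, keepdims=True)  # p(j, i): transition probabilities
    r = r / r.sum()
    v = np.full(len(r), 1.0 / len(r))
    for _ in range(iters):
        v = (1 - alpha) * r + alpha * P.T @ v
    return v

sim = np.array([[1.0, 0.8, 0.1],   # toy topical similarities between 3 sentences
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
r = np.array([0.5, 0.3, 0.2])      # toy original importance
print(random_walk_scores(sim, r))  # sentences similar to important ones gain score
```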
Experiments: Automatic Summarization
Experiments
- Same corpus and ASR system as before: the NTU lecture corpus.
- Reference summaries: two human-produced reference summaries for each document; annotators ranked sentences from the most important down to those of average importance.
- Evaluation metrics: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L (longest common subsequence, LCS). A minimal sketch of these metrics follows.
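For reference, a minimal sketch of ROUGE-N recall and LCS-based ROUGE-L (standard definitions; the official ROUGE toolkit adds stemming, stopword options, and F-measure variants):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: overlapping n-grams / n-grams in the reference."""
    def ngrams(s):
        return Counter(tuple(s[i:i + n]) for i in range(len(s) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    return sum((cand & ref).values()) / max(sum(ref.values()), 1)

def rouge_l(candidate, reference):
    """ROUGE-L recall: longest common subsequence (LCS) / reference length."""
    m, k = len(candidate), len(reference)
    dp = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(k):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if candidate[i] == reference[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][k] / max(k, 1)

ref = "the acoustic model maps speech to phones".split()
cand = "the acoustic model maps frames to phones".split()
print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```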
Evaluation
[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores at 10%, 20%, and 30% summarization ratios, comparing the LTE-based baseline (LTE) and the key-term-based measure (Key) on ASR transcriptions.]
The key-term-based statistical measure is helpful.