N-Gram Based Hybrid Optimization Algorithm for Spell Checking

Explore a novel N-Gram based edit distance technique for spell checking, comparing it with traditional methods like edit distance and N-Gram. The hybrid algorithm shows superior efficiency and accuracy in detecting and correcting spelling errors, offering a comprehensive approach to enhancing text accuracy and clarity in various applications.

  • Spell Checking
  • N-Gram
  • Hybrid Algorithm
  • Optimization

Presentation Transcript


  1. Hybrid Optimization Algorithm Using N-Gram Based Edit Distance. M. Priya, Dr. R. Kalpana and T. Srisupriya. International Conference on Communication and Signal Processing, April 6 - April 8, 2017, Chennai, India. Speaker: Guan-Zhi Chen. Date: 2018.09.18

  2. Abstract (1/2) A spell-checking method detects and corrects misspelled words in a document. To offer the best possible corrections, the incorrect words must first be detected. Once the misspelled words have been detected, several spell-correcting techniques are used to produce the most accurate corrections or alternative words. Spell checking is used in word processing programs, cell phones, email programs, and many other applications such as blogs and forums. Classifying error detection and correction provides a better understanding of the different types of techniques.

  3. Abstract (2/2) There are two basic approaches to spelling correction: the edit distance approach and the N-gram approach. This paper presents a new technique, N-Gram based Edit distance (NGE), for the transformation of strings. For rule generation, the probabilities are estimated using a Log model. The proposed method is compared with the other two approaches, edit distance and N-gram. The hybrid algorithm yields better results than both in terms of efficiency and accuracy.

  4. Spell errors. Spell errors are either real-word or non-word errors. Error types with examples: typographic error (oan - on), cognitive error (existance - existence), phonetic error (privelege - privilege).

  5. Classification of spell checking

  6. Edit distance (1/3) Operations: insertion, deletion, substitution, transposition. [Figure: example S and T string pairs over the letters A, B, C illustrating insertion, substitution, transposition, and deletion]

  7. Edit distance (2/3) Operations: insertion, deletion, substitution, transposition. S: football, T: Football, distance = 0. S: foetball, T: Football, distance = 1.

  8. Edit distance (3/3) S: Kernal, T: Kernel. Insertion = 1, deletion = 1, substitution = deletion + insertion = 1 + 1 = 2. Similarity coefficient = 2/6. Different lengths of S and T: sequence or latent alignment.

  9. N-Gram. n = 1: unigram; n = 2: bigram; n = 3: trigram. S: Kernal, T: Kernel. Similarity coefficient = 3/7.

  10. Hybrid algorithm (N-Gram + edit distance) (1/2) Flowchart: training data -> get the string pairs (si, ti) -> apply bigrams to each pair (si, ti) -> find the common bigrams of (si, ti) -> find the union of the bigrams of (si, ti) -> apply n-gram based edit distance and create the transformation rules -> end. Stages shown: error detection, error correction. Similarity coefficient = 5/7.

  11. Hybrid algorithm (N-Gram + edit distance) (2/2) Maximum likelihood estimation

  12. Experimental results
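
The figures quoted on the edit distance and N-Gram slides (distance 0 and 1 for the football examples, 2/6 and 3/7 for Kernal vs. Kernel) can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, comparison is assumed to be case-insensitive (inferred from football vs. Football giving distance 0), the bigram similarity is assumed to be common bigrams over the union of bigrams, and transposition is left out of the edit distance for brevity. The hybrid NGE score (5/7) is not reproduced because the slides do not define it.

```python
def edit_distance(s: str, t: str, sub_cost: int = 1) -> int:
    """Levenshtein distance with unit insertion/deletion cost.

    sub_cost is the price of a substitution; slide 8 counts it as
    deletion + insertion = 2, slide 7 appears to count it as 1.
    Assumption: comparison ignores letter case.
    """
    s, t = s.lower(), t.lower()
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else sub_cost
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def bigrams(word: str) -> set:
    """Set of adjacent character pairs (assumption: case-insensitive)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def bigram_similarity(s: str, t: str) -> tuple:
    """Return (common bigrams, bigrams in the union) for the pair."""
    a, b = bigrams(s), bigrams(t)
    return len(a & b), len(a | b)


print(edit_distance("football", "Football"))         # 0
print(edit_distance("foetball", "Football"))         # 1
print(edit_distance("Kernal", "Kernel", sub_cost=2))  # 2, i.e. 2/6 over the word length
print(bigram_similarity("Kernal", "Kernel"))         # (3, 7), i.e. 3/7
```

The pair returned by bigram_similarity is kept unreduced so that it reads the same way as the coefficient on the slide.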
