Understanding Finite State Automata (FSA) in Computational Linguistics

ling c sc psyc 438 538 n.w
1 / 24
Embed
Share

Explore Finite State Automata (FSA) and its relevance in computational linguistics. Learn about the formal definitions, implementations in Perl and Python, and the distinction between deterministic and non-deterministic FSAs. Discover the natural correspondence between components of an FSA and the regex defining the language.

  • Computational Linguistics
  • FSA
  • Regular Expressions
  • Perl
  • Python

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong

  2. Today's Topics Regex example: xkcd simplewriter FSA contd. formal definition Perl and Python implementations e-transitions single vs. multiple start states

  3. xkcd:simplewriter https://xkcd.com/simplewriter/

  4. xkcd:simplewriter

  5. xkcd:simplewriter

  6. xkcd:simplewriter grep -o '|' words.js | wc -l 3633 (3634 words)

  7. Finite State Automata (FSA) L = { a+b+ } can be also be generated by the following FSA a a s x > b Conventions (used here): 1. > Indicates start state 2. Red circle indicates end (accepting) state 3. we accept a input string only when we re in an end state and we re at the end of the string b y

  8. Finite State Automata (FSA) L = { a+b+ } can be also be generated by the following FSA a a s x > b There is a natural correspondence between components of the FSA and the regex defining L Note: L = {a+b+} L = {aa*bb*} b y

  9. Finite State Automata (FSA) L = { a+b+ } can be also be generated by the following FSA Note: multiple exiting arrows, i.e. two paths, but still deterministic! a a s x > b deterministic FSA (DFSA) no ambiguity about where to go at any given state i.e. for each input symbol in the alphabet at any given state, there is a unique action to take. b y non-deterministic FSA (NDFSA) no restriction on ambiguity (surprisingly, no increase in power)

  10. Finite State Automata (FSA) more formally (Q,s,f, , ) 1. set of states (Q): {s,x,y} 2. start state (s): s 3. end state(s) (f): y 4. alphabet ( ): {a, b} 5. transition function : signature: character state state (a,s)=x (a,x)=x (b,x)=y (b,y)=y must be a finite set a s x > a b y b

  11. Finite State Automata (FSA) In Perl transition function : (a,s)=x (a,x)=x (b,x)=y (b,y)=y We can simulate our 2D transition table using a hash table whose elements are themselves also hash tables (anonymized; note: {..} = hashes) %transitiontable = ( s => { a => "x" }, x => { a => "x", b => "y" }, y => { b => "y" } ); a s x > a Syntactic sugar for %transitiontable = ( "s", { "a", "x", }, "x", { "a", "x" , "b", "y" }, "y", { "b", "y" }, ); b y b Example: print "$transitiontable{s}{a}\n";

  12. Finite State Automata (FSA) Given transition table encoded as a (nested) hash How to build a decider (Accept/Reject) in Perl? Complications to think about: How about -transitions? Multiple end states? Multiple start states? Non-deterministic FSA?

  13. Finite State Automata (FSA) Example runs: perl fsm.prl a b a b Reject perl fsm.prl a a a b b Accept %transitiontable = ( s => {a => "x"}, x => {a => "x", b => "y"}, y => {b => "y"} ); $state = "s"; foreach $c (@ARGV) { $state = $transitiontable{$state}{$c}; } if ($state eq "y") { print "Accept\n"; } else { print "Reject\n" }

  14. Finite State Automata (FSA) Perl one-liner: perl -le '%h=(s=>{a=>"x"},x=>{a=>"x",b=>"y"},y=>{b=>"y"}); $s="s"; for $c (@ARGV) {$s=$h{$s}{$c}}; print "Accept" if $s eq "y"'

  15. Finite State Automata (FSA) Perl one-liner examples: perl -le '%h=(s=>{a=>"x"},x=>{a=>"x",b=>"y"},y=>{b=>"y"}); $s="s"; for $c (@ARGV) {$s=$h{$s}{$c}}; print "Accept" if $s eq "y"' a perl -le '%h=(s=>{a=>"x"},x=>{a=>"x",b=>"y"},y=>{b=>"y"}); $s="s"; for $c (@ARGV) {$s=$h{$s}{$c}}; print "Accept" if $s eq "y"' a b Accept

  16. Finite State Automata (FSA) this is justpseudo-code not any real programming language but can be easily translated

  17. In Python 1. Python dictionary = Perl hash 1. key:value sys.argv = @ARGV (but numbered from 1, not 0) [1:] slices the command line 2. 3.

  18. In Python Python has a data structure called a tuple: (e1,..,en) Note: Python lists use [..] In Python, crucially tuples (but not lists) can also be dictionary keys Note: Many other ways of encoding FSA in Python, e.g. using object-oriented programming (classes) https://wiki.python.org/moin/FiniteStateMachine#FSA_-_Finite_State_Automation_in_Python

  19. Finite State Automata (FSA) Practical applications can be encoded and run efficiently on a computer widely used encode regular expressions (e.g. Perl regex) morphological analyzers Different word forms, e.g. want, wanted, unwanted (suffixation/prefixation) see chapter 3 of textbook speech recognizers Markov models = FSA + probabilities and much more

  20. -transitions jump from state to another state with the empty character -transition (textbook) or -transition no increase in expressive power (meaning we could do without the -transition) examples a 2 3 > b a b 1 2 > 1 3 b a b > 1 3 2 what s the equivalent without the -transition? 20

  21. -transitions Can be used to help encode: 1. Multiple start states 2. Multiple end states Next time, we'll see: Then we can get rid of the -transition (by construction)

  22. Backreferences and FSA a s x > a Deep question: why are backreferences impossible in FSA? b Example: Suppose you wanted a machine that accepted /(a+b+)\1/ One idea: link two copies of the machine together y y b a x2 a Doesn t work! Why? b y2 b

  23. Backreferences and FSA fsa.perl Perl: note line 10: next state is a function of previous state and current symbol ONLY # of a's and b's in the two halves don't have to match: perl fsa.perl aabba Reject perl fsa.perl aabbaaaabbbb Accept perl fsa.perl aabbaaaab Accept

  24. Multiple start states Example: simulate this by using an e-transition: Multiple final states vs. a single state: also same expressive power. Doesn't have to have any final states at all: L(machine) = {} What's the simplest possible FSA? < > s x a b a b z y a a b b

Related


More Related Content