
Computational Morphology and Transducer Tools for Developing SFST Programming Language
Explore Finite State Morphology with SFST programming language, creating transducers, compiling test programs, and analyzing transducer variables. Additionally, solve a fun homework assignment involving families and children. Find solutions including mapping letters and classifying children based on family, gender, and birth order.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Finite State Morphology Alexander Fraser & Luisa Berlanda fraser@cis.uni-muenchen.de CIS, Ludwig-Maximilians-Universit t M nchen Computational Morphology and Electronic Dictionaries SoSe 2016 2016-25-05
SFST programming language for developing finite-state transducers compiler which translates programs to transducers tools for applying transducers printing transducers comparing transducers
SFST Example Session > echo "Hello\ World\!" > test.fst storing a small test program > fst-compiler test.fst test.a test.fst: 2 calling the compiler > fst-mor test.a reading transducer... finished. analyze> Hello World! Hello World! analyze> Hello World no result for Hello World analyze> q interactive transducer usage transducer is loaded input recognised another input not recognised terminate program
Transducer Variables $Vroot$ = walk | talk | bark $Vinfl$ = <V>:<> (\ [<inf><n3s>]:<> |\ {<3s>}:{s} |\ {<ger>}:{ing} |\ {<past>}:{ed}) % list of verbs with regular inflection % regular verbal inflection $Nroot$ = hat | head | trick $Ninfl$ = <N>:<> (\ {<sg>}:{} |\ {<pl>}:{s}) % list of nouns with regular inflection % regular nominal inflection $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$ % combine stems and inflectional endings
Homework Write a pipeline that maps all letters to lowercase and orders them backwards Family Huber has three children. Their first child is called Mia, the next one Toni and the last one Pia. Family Band has three children as well. Michael, Paul and Pia. Write a program, that can tell us the following details about the children: which family does he/she belong to, is it a son or a daugter, was he/she the first, second or third child. Output format: <family><family name><first, second child><gender>
Solution Write a pipeline that maps all letters to lowercase and orders them backwards [a-z]:[A-Z]* || [ZYXWVUTSRQPONMLKJIHFECBA]:[A-Z]*
Solution {<family><huber>}:{} ({<1><daughter>}:{mia} |{<1><son>}:{toni} |{<2><daughter>}:{pia}) |\ {<family><band>}:{} ({<1><son>}:{michael} |{<2><son>}:{paul} |{<1><daughter>}:{pia})
Lexicon Files $Vroot$ = verb.lex $Nroot$ = noun.lex The command filename reads the respective file line by line and forms the disjunction of all lines. Only the symbols : \ < > % are treated as operators.
Lexicon Files $Vroot$ = verb.lex $Vinfl$ = <V>:<> (\ [<inf><n3s>]:<> |\ {<3s>}:{s} |\ {<ger>}:{ing} |\ {<past>}:{ed}) $Nroot$ = noun.lex $Ninfl$ = <N>:<> (\ {<sg>}:{} |\ {<pl>}:{s}) $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$
Symbol Set Variables #cons# = bcdfghjklmnpqrstvwxz #CONS# = BCDFGHJKLMNPQRSTVWXZ #Cons# = #cons# #CONS# #vowel# = aeiou #VOWEL# = AEIOU #Vowel# = #vowel# #VOWEL# #letter# = #vowel# #cons# #LETTER# = #VOWEL# #CONS# [#LETTER# #letter#]:[#letter# #LETTER#]* What would you get for Hallo and Ru ?
Solution What would you get for Hallo and Ru ? hALLO rU
Alphabet The alphabet defines the set of available symbol pairs which is relevant for the wildcard symbol . , the negation operator ! and the replacement operators (introduced later). ALPHABET = [A-Z] [A-Z]:[a-z] The expression on the right-hand side is compiled into a transducer and the set of character pairs is extracted from its transitions. A:. is here identical to A:[Aa] . is identical to .:. [^A-Z] all characters appearing in the alphabet which are not uppercase letters, i.e. the set of lowercase letters. .* maps mixed letter sequences to all uppercase letter sequences (analysis)
Alphabet fullform.lex: house<N><sg> house<>:s<N><pl> walk<V><inf> walk<>:i<>:n<>:g<V><ger> emorph.fst: ALPHABET = [a-zA-Z] [<V><N><sg><pl><ger><inf>]:<> fullform.lex || .* reads the lexicon and deletes the grammatical markers on the surface side.
Orthographic Rules Replace operator: t ^ > (l _ r) applies the mapping implemented in the transducer t in the left context l and right context r. l and r are automata (i.e. transducers mapping strings to themselves.) e-elision: bake<V>ing baking $Morph$ = bake<V>{<ger>}:{ing} ALPHABET = [A-Za-z] <V> $e-elision$ = e:<> ^ > (_ _<V> [ei] ) $Morph$ = $Morph$ || $e-elision$ % delete e before <V>e or <V>i % apply the rule ALPHABET = [A-Za-z] <V>:<> $Morph$ || .* % delete the <V> marker
Orthographic Rules What does this program? $Morph$ = bake<V>{<ger>}:{ing} | crash<V>{<3><sg>}:s | happy<ADJ>{<comp>}:{er} | fly<V>{<3><sg>}:s ALPHABET = [A-Za-z] <V><N><ADJ> $e-elision$ = e:<> ^-> (__<V> [ei]) $e-epenthesis$ = (<V> <>:e) ^-> ([sh]__ s) $y2i$ = y:i ^-> ([^ae] __[<ADJ><V>] e) $y2ie$ = y:{ie} ^-> ([^ae] __ [<V><N>] s) $Morph$ = $Morph$ || $e-elision$ || $e-epenthesis$ || $y2i$ || $y2ie$ ALPHABET = [A-Za-z] [<V><N><ADJ>]:<> $Morph$ || .*
Solution e-epenthesis: crash<V>s crashes y to i: happy<ADJ>er happier fly<V>s flies only in the analyze mode!
Agreement Variables $Morph$ = big<ADJ>{<comp>}:{er} | fat<ADJ>{<comp>}:{er} $cons$ = [bcdfghjklmnpqrstvwxz] $vowel$ = [aeiouy] #=g# = bdglmnpt $g$ = [#=g#] <>:[#=g#] ALPHABET = [A-Za-z] <V><N><ADJ> $gemination$ = $g$ ^-> ($cons$ $vowel$ __<ADJ> e) $Morph$ = $Morph$ || $gemination$ ALPHABET = [A-Za-z] [<V><N><ADJ>]:<> $Morph$ || .*
Solution analyze> bigger big<ADJ><comp> analyze> fater no result for fater analyze> fatter fat<ADJ><comp>