Understanding Regular Expressions and Their Applications

Regular expressions (regex) go beyond simple string matching, offering a powerful way to search and manipulate text in various programs. They are based on finite state machines and are not tied to a specific programming language. Learn how regex is used in editors, shells, and programs like grep. Explore the special characters and their functionalities in regex to enhance your text searching capabilities.

  • Regular Expressions
  • Text Searching
  • Programming Language
  • Finite State Machines


Presentation Transcript


  1. Regular expressions: a way to match strings that goes beyond asking whether two strings are equal/identical. They are equivalent to abstract machines called finite state machines.
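
To make the contrast with plain equality concrete, here is a minimal sketch using Python's standard re module. The word list and the helper name contains are illustrative assumptions, not part of the slides.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """Return True if the regular expression matches anywhere in text."""
    return re.search(pattern, text) is not None

# Equality only answers "are these two strings identical?"
equal = ("printer" == "pri")

# A regular expression can ask "does pri occur anywhere in this string?"
found = contains(r"pri", "printer")

# The same pattern applied to a (hypothetical) list of words.
words = ["print", "prize", "spring", "pear"]
found_family = [w for w in words if contains(r"pri", w)]

print(equal, found, found_family)
```

The equality test fails because the strings differ, while the pattern succeeds on every word that contains the three characters p, r, i in a row.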

  2. Regular expressions are not bound to any single program or any particular programming language. The code to handle regular expressions lives in a library called regex (on Unix-like systems, typically the POSIX routines declared in regex.h). This library, or one like it, is used by most software which lets the user deal with strings.

  3. Regular expressions: an editor lets you use them to locate text within the file. The shell uses them (in the simplified form of wildcard patterns, or globs) to find/name files within a directory. The program grep (fgrep, egrep) uses them to look for a string in a file but, unlike a text editor, it does so from the outside.

  4. Regular expressions: normal characters match themselves, but these are special: . { ( ) \ ^ $ | ? * +
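
The difference between a special character and the same character taken literally can be sketched in Python as follows. The sample strings are made up for illustration; re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

# Unescaped, . is special: it matches any single character, so a.c matches abc.
dot_is_wildcard = re.fullmatch(r"a.c", "abc") is not None

# Escaped with a backslash, \. matches only a literal dot.
dot_rejects_b = re.fullmatch(r"a\.c", "abc") is None
dot_accepts_dot = re.fullmatch(r"a\.c", "a.c") is not None

# re.escape() backslash-escapes every special character in a string,
# so the result matches the original text literally.
price = "cost: $5 (approx.)"   # hypothetical text full of special characters
matches_itself = re.fullmatch(re.escape(price), price) is not None

print(dot_is_wildcard, dot_rejects_b, dot_accepts_dot, matches_itself)
```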

  5. Regular expressions: using the vi text editor, searching for the string pri. / searches forward, ? searches backward. The next instance of the string pri will be highlighted; if you hit Enter, the cursor/current position will jump to it. p matches p, r matches r, i matches i (this world is case-sensitive: A does not match a).

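  6. Regular expressions: not all programs allowing regular expressions will necessarily allow the full set of possibilities. Here is an example of the shell using them, and the use of the first special character, *. In a shell wildcard pattern, * means 0 or more of anything; in a full regular expression, as used by grep, * means 0 or more of the preceding item, so .* is what means 0 or more of anything.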

  7. file is a program which tries to figure out what is in a file (and not by using the file name/extension). Another shell example: b* matches bigl and bigl2, so the loop happens two times, once with i equal to bigl and a second time with i equal to bigl2.
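
The shell's wildcard matching can be imitated in Python with the standard fnmatch module. This is only a sketch: the directory contents below are hypothetical except for bigl and bigl2, which come from the slide.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing; only bigl and bigl2 appear in the slides.
names = ["bigl", "bigl2", "notes.txt", "abc"]

# The shell expands b* to every name that starts with b, in sorted order.
matched = sorted(n for n in names if fnmatchcase(n, "b*"))

# The loop body then runs once per expanded name.
for i in matched:
    print("loop iteration with i =", i)
```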

  8. The special character $ matches the end of the line, so nologin$ is only matched when it shows up at the end of a line. The meaning of the rest is:
      1,$ - from the first line to the last line of the file
      s/string1/string2/ - the leading s stands for substitute: if you find string1, change it to string2
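
The same substitution can be sketched in Python with re.sub. The replacement text bash and the second sample line are assumptions made for illustration; the slides do not show what string2 was.

```python
import re

lines = [
    # Taken from the grep output shown in the slides.
    "Debian-exim:x:111:117::/var/spool/exim4:/usr/sbin/nologin",
    # Hypothetical line where nologin appears, but not at the end.
    "nologin-docs:x:1001:1001::/home/nologin-docs:/bin/bash",
]

REPLACEMENT = "bash"  # assumed string2, not from the slides

# Like 1,$ s/nologin$/bash/ : visit every line, substitute only at end of line.
updated = [re.sub(r"nologin$", REPLACEMENT, line) for line in lines]

for line in updated:
    print(line)
```

Only the first line changes, because in the second line nologin is not the last thing on the line.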

  9. The program grep

  10. Show me all lines which start with r in the file /etc/passwd. ^ matches the beginning of the line.

      $ grep ^r /etc/passwd
      root:x:0:0:root:/root:/bin/bash
      rick:x:1000:1000:,,,:/home/rick:/bin/bash
      $ grep ^r.*oot /etc/passwd
      root:x:0:0:root:/root:/bin/bash
      $
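
What grep does with these two patterns can be approximated in a few lines of Python. The helper name grep and the in-memory list standing in for /etc/passwd are assumptions of this sketch; the three sample lines are the ones shown in the slides.

```python
import re

# Stand-in for /etc/passwd, using only lines that appear in the slides.
PASSWD = [
    "root:x:0:0:root:/root:/bin/bash",
    "rick:x:1000:1000:,,,:/home/rick:/bin/bash",
    "Debian-exim:x:111:117::/var/spool/exim4:/usr/sbin/nologin",
]

def grep(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines in which the pattern matches somewhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

starts_with_r = grep(r"^r", PASSWD)        # lines beginning with r
r_then_oot = grep(r"^r.*oot", PASSWD)      # r, then anything, then oot

print(starts_with_r)
print(r_then_oot)
```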

  11. $ grep ^D.*ogi.*n$ /etc/passwd
      Debian-exim:x:111:117::/var/spool/exim4:/usr/sbin/nologin

      This regular expression says: find a string which starts with a capital D, where that D is the first letter on the line.
        ^ matches the start of the line
        D matches itself
      Followed by zero or more of any character:
        . matches any single character
        * is zero or more
      Followed by the exact string ogi:
        o matches o, g matches g, i matches i

  12. $ grep ^D.*ogi.*n$ /etc/passwd
      Debian-exim:x:111:117::/var/spool/exim4:/usr/sbin/nologin

      Followed by zero or more of any character:
        . matches any single character
        * is zero or more
      Followed by the exact string n:
        n matches n, and that n is the last character on the line
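
The piece-by-piece reading above can be written down directly using Python's re.VERBOSE flag, which allows whitespace and comments inside a pattern. This is a sketch of the same expression, not of how grep itself is implemented; the two non-matching test lines are hypothetical.

```python
import re

PATTERN = re.compile(
    r"""
    ^       # the start of the line
    D       # a capital D, which matches itself
    .*      # zero or more of any character
    ogi     # the exact string ogi
    .*      # zero or more of any character
    n       # the letter n ...
    $       # ... which must be the last character on the line
    """,
    re.VERBOSE,
)

line = "Debian-exim:x:111:117::/var/spool/exim4:/usr/sbin/nologin"
matches = PATTERN.search(line) is not None

# Hypothetical variations that break one requirement each.
not_at_start = PATTERN.search("x" + line) is not None   # D is no longer first
wrong_ending = PATTERN.search(line + "!") is not None   # n is no longer last

print(matches, not_at_start, wrong_ending)
```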

  13. $ ls
      CA.pl.in        ct_log_list.cnf  fipsinstall.c    passwd.c      req.pem       speed.c
      asn1pars.c      demoSRP          gendsa.c         pca-cert.srl  rsa.c         spkac.c
      build.info      dgst.c           genpkey.c        pca-key.pem   rsa8192.pem   srp.c
      ca-cert.srl     dhparam.c        genrsa.c         pca-req.pem   rsautl.c      storeutl.c
      ca-key.pem      dsa-ca.pem       include          pkcs12.c      s1024key.pem  testCA.pem
      ca-req.pem      dsa-pca.pem      info.c           pkcs7.c       s1024req.pem  testdsa.h
      ca.c            dsa.c            insta.ca.crt     pkcs8.c       s512-key.pem  testrsa.h
      cert.pem        dsa1024.pem      kdf.c            pkey.c        s512-req.pem  timeouts.h
      ciphers.c       dsa512.pem       lib              pkeyparam.c   s_client.c    ts.c
      client.pem      dsap.pem         list.c           pkeyutl.c     s_server.c    tsget.in
      cmp.c           dsaparam.c       mac.c            prime.c       s_time.c      verify.c
      cmp_mock_srv.c  ec.c             nseq.c           privkey.pem   server.pem    version.c
      cmp_mock_srv.h  ecparam.c        ocsp.c           progs.pl      server.srl    vms_decc_init.c
      cms.c           enc.c            openssl-vms.cnf  rand.c        server2.pem   x509.c
      crl.c           engine.c         openssl.c        rehash.c      sess_id.c
      crl2p7.c        errstr.c         openssl.cnf      req.c         smime.c
      $ cat o*.c

      This cat will print the file ocsp.c followed by the file openssl.c. The shell expands the string o*.c to ocsp.c openssl.c, so the command becomes cat ocsp.c openssl.c
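
The expansion step can be imitated in Python, which also makes it easy to see that the shell's o*.c is a wildcard pattern and not a grep-style regular expression. The sketch below uses a handful of names from the listing; fnmatchcase stands in for the shell's matching and sorted for its ordering, both simplifying assumptions.

```python
import re
from fnmatch import fnmatchcase

# A few names taken from the ls output in the slide.
names = ["nseq.c", "ocsp.c", "openssl-vms.cnf", "openssl.c", "openssl.cnf", "rand.c"]

# Shell-style expansion: o, then anything, then a literal .c at the end.
expanded = sorted(n for n in names if fnmatchcase(n, "o*.c"))
command = ["cat", *expanded]

# Read as a regular expression, o*.c means something else entirely:
# zero or more o, then any one character, then c.
as_regex = [n for n in names if re.fullmatch(r"o*.c", n)]

# The regular expression that says what the wildcard pattern says.
equivalent = [n for n in names if re.fullmatch(r"o.*\.c", n)]

print(command)
print(as_regex, equivalent)
```

Note that openssl.cnf and openssl-vms.cnf are not picked up, because the pattern requires the name to end in .c.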

  14. The backslash \ is called the escape character (not ASCII 27, the ESC control character), so when a / is preceded by \ we say the / is escaped. In other words, we are saying: do not treat the following character as a special character.

  15. Search the entire file system for all files with a name that ends in .c. Both the shell and find(1) allow regular expressions (here in the form of wildcard patterns). We escape the * so the shell will leave it alone and find will get it.
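
A rough Python counterpart of this kind of search walks a directory tree and applies the wildcard pattern to each file name. The function name find_by_name is an assumption of this sketch, and the demonstration runs on a small temporary directory with made-up files rather than on the whole file system.

```python
import os
import tempfile
from fnmatch import fnmatchcase

def find_by_name(root: str, pattern: str) -> list[str]:
    """Return paths under root whose file name matches the wildcard pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatchcase(name, pattern):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)

# Demonstration on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ["a.c", "notes.txt", os.path.join("sub", "b.c")]:
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("")
    demo_hits = [os.path.relpath(p, root) for p in find_by_name(root, "*.c")]

print(demo_hits)
```

Here the pattern is passed as an ordinary Python string, so nothing expands it before the matching code sees it. That is the situation the backslash creates on the command line.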

  16. So what was the Finite State Machine part?

  17. You may recognize FSMs from other contexts: GUI design, protocols, other human interfaces, and lots of stuff in CS/logic.

  18. Examples: a simple FSM for an elevator (driven by the floor you are on and the button you press), and an FSM to recognize 111 in binary input. (Andrew Tuline, tuline.com)
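
A machine that recognizes 111 in binary input can be sketched as a small transition table. The state numbering below (the state is the count of consecutive 1s seen so far, capped at 3) is an assumption of this sketch and may not match the diagram on the slide.

```python
# (current state, input symbol) -> next state
# States 0, 1, 2 count the consecutive 1s just seen; state 3 means
# "111 has been seen" and is never left again.
TRANSITIONS = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 0, (1, "1"): 2,
    (2, "0"): 0, (2, "1"): 3,
    (3, "0"): 3, (3, "1"): 3,
}
START = 0
ACCEPTING = {3}

def contains_111(bits: str) -> bool:
    """Run the machine over bits; accept if it ends in an accepting state."""
    state = START
    for symbol in bits:
        if (state, symbol) not in TRANSITIONS:
            return False  # no branch for this symbol, so reject
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

print(contains_111("0111"), contains_111("110110"))
```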

  19. finite state machines: https://www.youtube.com/watch?v=vhiiia1_hC4
      regular expressions: https://youtu.be/528Jc3q86F8?t=54

  20. Formal definition of a Finite State Machine. A finite state machine is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$ where
      $Q$ denotes the finite set of states
      $\Sigma$ represents the alphabet that consists of input symbols
      $\delta: Q \times \Sigma \to Q$ denotes the transition function that defines the transition from $q_i$ to $q_j$ for each input symbol, where $q_i, q_j \in Q$
      $q_0$ denotes the start state
      $F$ denotes the set of final states

  21. Formal definition of a Finite State Machine
      $Q$ denotes the finite set of states
      $\Sigma$ represents the alphabet that consists of input symbols
      $\delta: Q \times \Sigma \to Q$ denotes the transition function that defines the transition from $q_i$ to $q_j$ for each input symbol, where $q_i, q_j \in Q$
      $q_0$ denotes the start state
      $F$ denotes the set of final states
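
The 5-tuple translates almost word for word into a data structure. The sketch below is one possible encoding; the class name FSM, the function accepts, and the example machine (binary strings with an even number of 1s) are all illustrative assumptions and do not come from the slides.

```python
from typing import NamedTuple

class FSM(NamedTuple):
    states: frozenset     # Q, the finite set of states
    alphabet: frozenset   # Sigma, the input symbols
    delta: dict           # transition function: (state, symbol) -> state
    start: str            # q0, the start state
    finals: frozenset     # F, the set of final states

def accepts(machine: FSM, text: str) -> bool:
    """Feed text to the machine one symbol at a time and report acceptance."""
    state = machine.start
    for symbol in text:
        if symbol not in machine.alphabet:
            return False
        state = machine.delta[(state, symbol)]
    return state in machine.finals

# Hypothetical example machine: accepts binary strings with an even number of 1s.
even_ones = FSM(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset("01"),
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    start="even",
    finals=frozenset({"even"}),
)

print(accepts(even_ones, "1001"), accepts(even_ones, "1"))
```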

  22. FSM to recognize my oddball integers:
      optional +,-
      first digit can't be zero
      any length
      optional E followed by exponent
      (Diagram: states 1 to 4, with arrows labeled +,- and 1-9 and 0-9 and E.)
      Examples: +02 no; 4 yes; -2013e5 yes
      To test: eat a character, follow the branch, and continue until all characters are eaten. Are you in an accepting state? If so, it is valid; if not, or if no branch existed somewhere along the way, then it is not valid.

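  23. For the oddball-integer machine:
      $Q$, the set of states = the circles, 1,2,3,4
      $\Sigma$, the alphabet = +,-,E,0,1,2,3,4,5,6,7,8,9
      $\delta$, the transition function = the arrows
      $q_0$, the start state = state 1 here
      $F$, the final states = states 1 & 2 here
      (Diagram: states 1 to 4, with arrows labeled +,- and 1-9 and 0-9 and E.)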
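
The rules for the oddball integers can also be written as a regular expression. This is a sketch under stated assumptions, because the slides do not spell out the exponent syntax: here the exponent is taken to be an optional sign followed by one or more digits, and a lowercase e is accepted alongside E because -2013e5 is listed as valid. The name is_oddball_integer is hypothetical.

```python
import re

ODDBALL = re.compile(
    r"""
    [+-]?          # optional + or -
    [1-9]          # first digit can't be zero
    [0-9]*         # any length
    (              # optional exponent part (syntax assumed, see lead-in)
        [Ee]       #   E, with lowercase e also accepted
        [+-]?      #   optional sign
        [0-9]+     #   one or more digits
    )?
    """,
    re.VERBOSE,
)

def is_oddball_integer(text: str) -> bool:
    """True if the whole string follows the rules listed on the slide."""
    return ODDBALL.fullmatch(text) is not None

for sample in ["+02", "4", "-2013e5"]:
    print(sample, "yes" if is_oddball_integer(sample) else "no")
```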

  24. Finite state machine to recognize an identifier. (Diagram: state 1 goes to state 2 on a-z, A-Z; state 2 loops on a-z, A-Z, 0-9.)
      start in the start state
      eat through the string a character at a time
      take the branch corresponding to the char; if there isn't one, stop and say NO
      after you take the branch for the last character, look down: are you in an accepting state? if so, say YES, else say NO
      Examples: name3 - yes; x - yes; 3n - no
      Equivalent RE to recognize an identifier: (a-z,A-Z)(a-z,A-Z,0-9)*, which in standard regex syntax is [a-zA-Z][a-zA-Z0-9]*
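
The procedure and the equivalent RE can be placed side by side in Python. This is a minimal sketch that assumes state 2 is the only accepting state, which is what the RE implies; the function names are hypothetical.

```python
import re
import string

LETTERS = set(string.ascii_letters)   # a-z, A-Z
DIGITS = set(string.digits)           # 0-9

def is_identifier_fsm(text: str) -> bool:
    """Hand-written machine: state 1 is the start, state 2 is accepting."""
    state = 1
    for ch in text:
        if state == 1 and ch in LETTERS:
            state = 2                 # first character must be a letter
        elif state == 2 and (ch in LETTERS or ch in DIGITS):
            state = 2                 # later characters: letters or digits
        else:
            return False              # no branch for this character: say NO
    return state == 2                 # accepting state after the last char?

IDENTIFIER_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

def is_identifier_re(text: str) -> bool:
    """The same language, expressed as a regular expression."""
    return IDENTIFIER_RE.fullmatch(text) is not None

for sample in ["name3", "x", "3n"]:
    print(sample, is_identifier_fsm(sample), is_identifier_re(sample))
```

Both functions give the same answer on every input, which is the sense in which a regular expression and a finite state machine are equivalent.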

  25. Most programs which let you deal with strings also allow you to use regular expressions. The domain of discourse varies:
      for an editor, it is the text in the file you are editing
      for programs like find/grep, it is the file(s) given as command line arguments (their contents for grep, the file names found beneath them for find)
      for tools like awk and sed, it is the data over which the program will be run
      for a shell, it is the names of the files in the current directory (or a directory specified in a path)
      To tell a program not to treat a special RE character as an RE character, you need to escape it with \
