Using Regular Expressions in Python

Learn about the power of regular expressions in Python and how they can be used to search, replace, and parse text with complex patterns of characters. Explore the features and general uses of regular expressions, along with detailed examples and explanations. Discover how regular expressions can be a valuable tool in constructing compilers, interpreters, and text editors.

  • Regular Expressions
  • Python
  • Text Patterns
  • Search
  • Parsing

Presentation Transcript


  1. Regular Expressions in Python By Dr. Ziad Al-Sharif

  2. What is a Regular Expression?
     Regular expressions are:
       • a powerful language for matching text patterns, and
       • a standardized way of searching, replacing, and parsing text with complex patterns of characters.
     Most modern languages have similar library packages for regular expressions. For example, Python has the built-in re module. Other popular programming languages with regex capabilities include Perl, JavaScript, Ruby, Tcl, C++, Java, and C#.
     Regular expression features:
       • Used to construct compilers, interpreters, and text editors
       • Used to search and match text patterns
       • Used to validate text data formats, especially input data

  3. General uses of Regular Expressions
       • Search a string (search and match)
       • Replace parts of a string (sub)
       • Break a string into smaller pieces (split)
       • Find all occurrences of a pattern in a string (findall)
     In Python, you must import the library with "import re" before using regular expressions in your program.
     RE notations:
       Operator               Interpretation
       |                      Alternative
       ( )                    Grouping
       ?  *  +  {m,n}         Quantification
       ^  $                   Anchors
       .  [ ]  [ - ]  [^ ]    Meta-characters
       \d  \D  \w  \W  ...    Character classes
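
The four general uses listed on this slide can be sketched with the corresponding functions of the re module. The sample text and the date pattern are assumptions made up for illustration; they are not part of the original slides.

```python
import re

# Hypothetical sample text for illustration
text = "Order 66 shipped on 2014-07-20; order 67 ships on 2014-08-02."
date = r"\d{4}-\d{2}-\d{2}"

# search: find the first location where the pattern matches
first = re.search(date, text)

# match: succeed only if the pattern matches at the start of the string
at_start = re.match(r"Order", text)

# sub: replace every match with a replacement string
masked = re.sub(date, "<date>", text)

# split: break the string into pieces on a delimiter pattern
parts = re.split(r";\s*", text)

# findall: list all non-overlapping matches
dates = re.findall(date, text)

print(first.group())   # 2014-07-20
print(masked)
print(parts)
print(dates)           # ['2014-07-20', '2014-08-02']
```

Note that search and match return a match object (or None), while sub, split, and findall return plain strings or lists.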

  4. Introduction to Computing Using Python
     Example: General uses of Regular Expressions
     Suppose we need to find all email addresses in a web page. How do we recognize email addresses? What string pattern do email addresses exhibit?
     An email address string pattern, informally. An email address consists of:
       • a user ID, that is, a sequence of "allowed" characters,
       • followed by the @ symbol,
       • followed by a hostname, that is, a dot-separated sequence of allowed characters.
     A regular expression is a formal way to describe a string pattern. It is a string that consists of characters and regular expression operators.
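
The informal email pattern above can be turned into a rough regular expression. This is only a minimal sketch: the sets of "allowed" characters chosen here are an assumption, the pattern is deliberately simplified, and it does not cover every address that real mail systems accept.

```python
import re

# Assumed "allowed" characters: letters, digits, and a few punctuation marks
USER_ID = r"[A-Za-z0-9._%+-]+"
# Hostname: labels of allowed characters separated by dots (at least one dot)
HOSTNAME = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
EMAIL = USER_ID + "@" + HOSTNAME

# Hypothetical page text for illustration
page = "Contact alice@example.com or bob.smith@mail.example.org, not user@localhost"

emails = re.findall(EMAIL, page)
print(emails)   # ['alice@example.com', 'bob.smith@mail.example.org']
```

The group in HOSTNAME is written as a non-capturing group, (?:...), so that findall returns whole addresses rather than only the captured part.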

  5. Regular Expression Operators
       Operator   Interpretation
       .          Matches any character except a newline character (\n)
       *          Matches 0 or more repetitions of the regular expression immediately preceding it. So in the regular expression ab*, operator * matches 0 or more repetitions of b, not ab
       +          Matches 1 or more repetitions of the regular expression immediately preceding it
       ?          Matches 0 or 1 repetitions of the regular expression immediately preceding it
       []         Matches any character in the set of characters listed within the square brackets; a range of characters can be specified using the first and last character in the range and putting - in between
       ^          If S is a set or range of characters, then [^S] matches any character not in S
       |          If A and B are regular expressions, A|B matches any string that is matched by A or B
       {}         Number of occurrences of the preceding RE to match
       ()         Encloses a group of REs
       $          Matches the end of the string
       \          Drops the special meaning of the character following it
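
The remark that * in ab* applies to b and not to ab can be checked directly. The sketch below uses re.fullmatch, which succeeds only when the whole string matches; the test strings are chosen for illustration.

```python
import re

# ab* : the star applies only to the b immediately before it
star_on_b = re.fullmatch(r"ab*", "abbb") is not None    # a followed by three b's
not_abab = re.fullmatch(r"ab*", "abab") is not None     # second a is not allowed

# (ab)* : grouping makes the star apply to the whole of ab
grouped = re.fullmatch(r"(ab)*", "abab") is not None

print(star_on_b, not_abab, grouped)   # True False True
```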

  6. Examples: Regular Expression
       Operator       Interpretation
       [Pp]ython      Match "Python" or "python"
       [aeiou]        Match any one lowercase vowel
       [0-9]          Match any digit
       [a-z]          Match any lowercase ASCII letter
       [A-Z]          Match any uppercase ASCII letter
       [a-zA-Z0-9]    Match any lowercase letter, uppercase letter, or digit
       [^aeiou]       Match anything other than a lowercase vowel
       [^0-9]         Match anything other than a digit

       Operator       Interpretation
       \d{3}          Match exactly 3 digits
       \d{3,}         Match 3 or more digits
       \d{3,5}        Match 3, 4, or 5 digits

       Operator       Interpretation
       .              Match any character except newline
       \d             Match a digit: [0-9]
       \D             Match a nondigit: [^0-9]
       \s             Match a whitespace character: [ \t\r\n\f]
       \S             Match nonwhitespace: [^ \t\r\n\f]
       \w             Match a single word character: [A-Za-z0-9_]
       \W             Match a nonword character: [^A-Za-z0-9_]
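
A few rows of these tables can be tried out with findall. The sample strings are made up for illustration.

```python
import re

# {3,5} is greedy: it takes up to five digits, and the leftover "67" is too short
runs = re.findall(r"\d{3,5}", "ids: 12 123 1234567")

# A negated class: every character that is neither a digit nor whitespace
non_digits = re.findall(r"[^0-9\s]", "a1 b2")

# A character set in the first position accepts either capitalization
case_variants = re.findall(r"[Pp]ython", "Python and python, not PYTHON")

print(runs)            # ['123', '12345']
print(non_digits)      # ['a', 'b']
print(case_variants)   # ['Python', 'python']
```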

  7. Regular Expression Operators (1)
     Regular expression without operators
       Regular expression   Matching strings
       best                 best
     Operator .
       Regular expression   Matching strings
       be.t                 best, belt, beet, bezt, be3t, be!t, be t, ...
     Operators * + ?
       Regular expression   Matching strings
       be*t                 bt, bet, beet, beeet, beeeet, ...
       be+t                 bet, beet, beeet, beeeet, ...
       bee?t                bet, beet

  8. Regular Expression Operators (2)
     Operator []
       Regular expression   Matching strings
       be[ls]t              belt, best
       be[l-o]t             belt, bemt, bent, beot
       be[a-cx-z]t          beat, bebt, bect, bext, beyt, bezt
     Operator ^
       Regular expression   Matching strings
       be[^0-9]t            belt, best, be#t, ... (but not be4t)
       be[^xyz]t            belt, be5t, ... (but not bext, beyt, and bezt)
       be[^a-zA-Z]t         be!t, be5t, be t, ... (but not beat)

  9. Regular Expression Operators (3)
     Operator |
       Regular expression   Matching strings
       hello|Hello          hello, Hello
       a+|b+                a, b, aa, bb, aaa, bbb, aaaa, bbbb, ...
       ab+|ba+              ab, abb, abbb, ..., and ba, baa, baaa, ...
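
The "matching strings" columns in the three operator tables can be verified with a small helper. The function name matches is hypothetical and introduced here only for illustration; it relies on re.fullmatch so that a candidate counts only when the pattern covers the entire string.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

dot = matches(r"be.t", ["best", "be t", "bet", "beast"])
star = matches(r"be*t", ["bt", "bet", "beeet", "bat"])
plus = matches(r"be+t", ["bt", "bet", "beeet"])
optional = matches(r"bee?t", ["bt", "bet", "beet", "beeet"])
char_range = matches(r"be[l-o]t", ["belt", "bent", "best"])
negated = matches(r"be[^0-9]t", ["belt", "be#t", "be4t"])
either = matches(r"ab+|ba+", ["ab", "abbb", "baa", "aba"])

print(dot)         # ['best', 'be t']
print(star)        # ['bt', 'bet', 'beeet']
print(plus)        # ['bet', 'beeet']
print(optional)    # ['bet', 'beet']
print(char_range)  # ['belt', 'bent']
print(negated)     # ['belt', 'be#t']
print(either)      # ['ab', 'abbb', 'baa']
```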

  10. Regular Expression Escape Sequences
      Regular expression operators have special meaning inside regular expressions, so they cannot be used directly to match the characters '*', '.', or '['. The escape character \ must be used instead: the regular expression '\*\[' matches the string '*['.
      \ may also signal a regular expression special sequence:
        Operator   Interpretation
        \d         Matches any decimal digit; equivalent to [0-9]
        \D         Matches any nondigit character; equivalent to [^0-9]
        \s         Matches any whitespace character, including the blank space, the tab \t, the new line \n, and the carriage return \r
        \S         Matches any non-whitespace character
        \w         Matches any alphanumeric character or underscore; equivalent to [a-zA-Z0-9_]
        \W         Matches any nonalphanumeric character; equivalent to [^a-zA-Z0-9_]
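
The effect of escaping can be sketched as follows. Writing patterns as raw strings (r"...") is a common convention, assumed here, that keeps Python from interpreting the backslashes before the regex engine sees them. re.escape is a standard re function that escapes literal text automatically.

```python
import re

# Escaped operators match the literal characters * and [
literal = re.findall(r"\*\[", "a*[b]*[")

# Without the escape, the dot matches any character
unescaped = re.findall(r"3.14", "3.14 3x14")
# With the escape, only a literal dot matches
escaped = re.findall(r"3\.14", "3.14 3x14")

# re.escape builds a pattern that matches the given text literally
auto = re.escape("*[")
auto_ok = re.fullmatch(auto, "*[") is not None

print(literal)     # ['*[', '*[']
print(unescaped)   # ['3.14', '3x14']
print(escaped)     # ['3.14']
print(auto_ok)     # True
```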

  11. More Examples

  12. Alternative:
        Eg: "cat|mat"             "cat" or "mat"
            "python|jython"       "python" or "jython"
      Grouping:
        Eg: "gr(e|a)y"            "grey" or "gray"
            "ra(mil|n(ny|el))"    "ramil" or "ranny" or "ranel"
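
The grouping examples can be checked with re.fullmatch. The last line is an added illustration, not from the slide, of what happens when the parentheses are left out.

```python
import re

grouped = [w for w in ["grey", "gray", "griy"]
           if re.fullmatch(r"gr(e|a)y", w)]
nested = [w for w in ["ramil", "ranny", "ranel", "ran"]
          if re.fullmatch(r"ra(mil|n(ny|el))", w)]

# Without parentheses, | splits the whole pattern into "gre" or "ay",
# so "grey" no longer matches in full
ungrouped_fails = re.fullmatch(r"gre|ay", "grey") is None

print(grouped)          # ['grey', 'gray']
print(nested)           # ['ramil', 'ranny', 'ranel']
print(ungrouped_fails)  # True
```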

  13. Quantification:
      ?       zero or one of the preceding element
        Eg: "rani?el"       "raniel" or "ranel"
            "colou?r"       "colour" or "color"
      *       zero or more of the preceding element
        Eg: "fo*ot"         "fot" or "foot" or "foooooot"
            "94*9"          "99" or "9449" or "9444449"
      +       one or more of the preceding element
        Eg: "too+fan"       "toofan" or "tooooofan"
            "36+40"         "3640" or "3666640"
      {m,n}   m to n times of the preceding element
        Eg: "go{2,3}gle"    "google" or "gooogle"
            "6{3}"          "666"
            "s{2,}"         "ss" or "sss" or "ssss", ...

  14. Anchors:
      ^   matches the starting position within the string
        Eg: "^obje"    "object" or "object oriented"
            "^2014"    "2014" or "2014/20/07"
      $   matches the ending position within the string
        Eg: "gram$"    "program" or "kilogram"
            "2014$"    "20/07/2014" or "2013-2014"
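
Anchors are easiest to see with re.search, which otherwise accepts a match anywhere in the string. The candidate strings below are made up for illustration.

```python
import re

# ^ : the match must begin at the start of the string
starts = [s for s in ["object oriented", "an object"] if re.search(r"^obje", s)]

# $ : the match must finish at the end of the string
ends = [s for s in ["kilogram", "grammar"] if re.search(r"gram$", s)]

# Both anchors together require the whole string to fit the pattern
whole = re.search(r"^\d{4}$", "2014") is not None
partial = re.search(r"^\d{4}$", "2013-2014") is not None

print(starts)          # ['object oriented']
print(ends)            # ['kilogram']
print(whole, partial)  # True False
```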

  15. Meta-characters:
      . (dot)   matches any single character
        Eg: "bat."           "bats" or "bata" (but not "bat" alone, since the dot needs a character to match)
            "87.1"           "8741" or "8751" or "8761"
      [ ]       matches a single character that is contained within the brackets
        Eg: "[xyz]"          "x" or "y" or "z"
            "[aeiou]"        any lowercase vowel
            "[0123456789]"   any digit
      [ - ]     matches a single character that is contained within the brackets and the specified range
        Eg: "[a-c]"          "a" or "b" or "c"
            "[a-zA-Z]"       all letters (lower & upper)
            "[0-9]"          all digits
      [^ ]      matches a single character that is not contained within the brackets
        Eg: "[^aeiou]"       any character that is not a lowercase vowel
            "[^0-9]"         any non-digit
            "[^xyz]"         any character, but not "x", "y", or "z"

  16. Summary: Character Classes
      Character classes specify a group of characters to match in a string.
        \d     Matches a decimal digit [0-9]
        \D     Matches non-digits
        \s     Matches a single whitespace character [space, \t tab, \n newline, \r return, \v vertical tab, \f form feed]
        \S     Matches any non-whitespace character
        \w     Matches alphanumeric character class ([a-zA-Z0-9_])
        \W     Matches non-alphanumeric character class ([^a-zA-Z0-9_])
        \w+    Matches one or more word characters
        \b     Matches word boundaries when outside brackets; matches backspace when inside brackets
        \B     Matches nonword boundaries
        \A     Matches beginning of string
        \Z     Matches end of string
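
The boundary sequences \b and \B are the least intuitive entries in this summary, so here is a short sketch of how they restrict matches. The sample sentence is an assumption for illustration.

```python
import re

text = "cat concat category cat."

# Plain pattern: matches "cat" wherever it occurs, even inside other words
anywhere = re.findall(r"cat", text)

# \b on both sides: only "cat" standing as a whole word
whole_words = re.findall(r"\bcat\b", text)

# \B in front: only "cat" that does NOT start at a word boundary (inside "concat")
inside = re.findall(r"\Bcat", text)

# \w+ collects runs of word characters, which splits the text into words
words = re.findall(r"\w+", text)

print(len(anywhere))   # 4
print(whole_words)     # ['cat', 'cat']
print(inside)          # ['cat']
print(words)           # ['cat', 'concat', 'category', 'cat']
```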

  17. RE in Python

  18. RE Functions in Python
      Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.
        import re
        p1 = re.compile('ab*')
        p2 = re.compile('ab*', re.IGNORECASE)
      Method/Attribute   Purpose
      compile()          Compiles the RE into a pattern object, which has various methods
      findall()          Finds all substrings where the RE matches, and returns them as a list
      finditer()         Finds all substrings where the RE matches, and returns them as an iterator
      split()            Splits the string by the occurrences of a character or a pattern
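
The two pattern objects from this slide can be used as follows. The subject string is an assumption added for illustration; the patterns and the re.IGNORECASE flag come from the slide.

```python
import re

p1 = re.compile(r"ab*")
p2 = re.compile(r"ab*", re.IGNORECASE)

text = "ab ABB a Abbb"

# The compiled pattern carries its flags, so the calls look identical
case_sensitive = p1.findall(text)
case_insensitive = p2.findall(text)

print(case_sensitive)     # ['ab', 'a']
print(case_insensitive)   # ['ab', 'ABB', 'a', 'Abbb']
```

Compiling is mainly a convenience when the same pattern is applied many times; the module-level functions such as re.findall accept the pattern string directly.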

  19. The findall() Function
      This function finds every match of an RE pattern in a subject string, with optional flags. It returns all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, it returns a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
      Syntax: re.findall(pattern, string, flags=0)
        pattern   The regular expression to be matched.
        string    The string that is searched for matches of the pattern; matches may occur anywhere in it.
        flags     Optional modifiers; several flags can be combined using bitwise OR (|). They are listed under "RE flags in Python".
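
How the return value of findall changes with the number of groups is easy to miss, so here is a small sketch. The sample string is made up for illustration.

```python
import re

text = "x=10, y=25, z=7"

# No groups: a list of the full matches
no_group = re.findall(r"[a-z]=\d+", text)

# One group: a list of what that group captured
one_group = re.findall(r"[a-z]=(\d+)", text)

# Two groups: a list of tuples, one tuple per match
two_groups = re.findall(r"([a-z])=(\d+)", text)

print(no_group)     # ['x=10', 'y=25', 'z=7']
print(one_group)    # ['10', '25', '7']
print(two_groups)   # [('x', '10'), ('y', '25'), ('z', '7')]
```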

  20. Standard Library module re
      The Standard Library module re contains regular expression tools. Function findall() takes a regular expression pattern and a string text as input and returns a list of all substrings of text, from left to right, that match the regular expression pattern.
        >>> from re import findall
        >>> findall('best', 'beetbtbelt?bet, best')
        ['best']
        >>> findall('be.t', 'beetbtbelt?bet, best')
        ['beet', 'belt', 'best']
        >>> findall('be?t', 'beetbtbelt?bet, best')
        ['bt', 'bet']
        >>> findall('be*t', 'beetbtbelt?bet, best')
        ['beet', 'bt', 'bet']
        >>> findall('be+t', 'beetbtbelt?bet, best')
        ['beet', 'bet']

  21. The finditer() Function
      This function finds every match of an RE pattern in a subject string, with optional flags. It returns an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.
      Syntax: re.finditer(pattern, string, flags=0)
        pattern   The regular expression to be matched.
        string    The string that is searched for matches of the pattern; matches may occur anywhere in it.
        flags     Optional modifiers; several flags can be combined using bitwise OR (|). They are listed under "RE flags in Python".
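
Because finditer yields match objects rather than strings, each result also carries its position. A minimal sketch, reusing the subject string from the findall examples in this presentation:

```python
import re

text = "beetbtbelt?bet, best"

# Each match object exposes the matched text and its start/end offsets
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"be.t", text)]

print(spans)   # [('beet', 0, 4), ('belt', 6, 10), ('best', 16, 20)]
```

The iterator produces matches lazily, which can be useful for long inputs where building a full list is unnecessary.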

  22. The split() Function
      Splits a string into a list, using the occurrences of a character or a pattern as the delimiter. If the number of splits is limited, the remaining characters of the string are returned as the last element of the resulting list. This method is invaluable for converting textual data into data structures that can be easily read and modified by Python.
      Syntax: re.split(pattern, string, maxsplit=0, flags=0)
        pattern    The regular expression that describes the delimiter.
        string     The string to be split; the delimiter may occur anywhere in it.
        maxsplit   The maximum number of splits to perform; 0 (the default) means no limit.
        flags      Optional modifiers; several flags can be combined using bitwise OR (|). They are listed under "RE flags in Python".

  23. split() Example
      Eg:
        >>> import re
        >>> p = re.compile(r'\W+')
        >>> p.split('This is my first split example string')
        ['This', 'is', 'my', 'first', 'split', 'example', 'string']
        >>> p.split('This is my first split example string', 3)
        ['This', 'is', 'my', 'first split example string']
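
As a sketch of how split can turn textual data into a data structure, the example below parses a small key=value record into a dictionary. The record format and field names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical record with inconsistent spacing around the separators
record = "name=Alice; age=30;city = Amman"

# First split on semicolons, ignoring surrounding whitespace
fields = re.split(r"\s*;\s*", record)

# Then split each field once on "=", so values may themselves contain "="
parsed = dict(re.split(r"\s*=\s*", field, maxsplit=1) for field in fields)

print(fields)   # ['name=Alice', 'age=30', 'city = Amman']
print(parsed)   # {'name': 'Alice', 'age': '30', 'city': 'Amman'}
```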

  24. RE flags in Python
      The modifiers are specified as an optional flag:
        re.I   Performs case-insensitive matching.
        re.S   Makes a period (dot) match any character, including a newline.
        re.U   Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
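
The effect of re.I and re.S, alone and combined with bitwise OR, can be sketched as follows. The sample text is an assumption for illustration.

```python
import re

text = "Python\npython PYTHON"

# Default matching is case-sensitive
default = re.findall(r"python", text)
ignore_case = re.findall(r"python", text, re.I)

# By default the dot does not match the newline between the first two words
dot_default = re.search(r"Python.python", text)
dot_all = re.search(r"Python.python", text, re.S)

# Flags are combined with bitwise OR
combined = re.findall(r"python.python", text, re.I | re.S)

print(default)                 # ['python']
print(ignore_case)             # ['Python', 'python', 'PYTHON']
print(dot_default is None)     # True
print(dot_all is not None)     # True
print(combined)                # ['Python\npython']
```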

  25. References
        • Regular expression operations: https://docs.python.org/3/library/re.html
        • Book: Introduction to Computing Using Python (ch. 11): https://www.oreilly.com/library/view/introduction-to-computing/9781118213568/
        • Website: Regular Expressions: https://www.regular-expressions.info/examples.html
        • Regular expression at Wikipedia: https://en.wikipedia.org/wiki/Regular_expression
        • How to write Regular Expressions: https://www.geeksforgeeks.org/write-regular-expressions/ and https://www.geeksforgeeks.org/regular-expression-python-examples-set-1/
