What is a regular expression? With examples
Regular expressions are now widely used in software, including operating systems such as *nix (Linux, Unix, etc.) and HP, development environments such as PHP, C# and Java, and many applications.

Regular expressions achieve powerful results by simple means. The price of being compact, effective and powerful is that regular expression code is hard to learn, so it takes some effort. Once you have got started, using them with a reference at hand is relatively simple and effective.

Example: ^.+@.+\..+$
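The example can be tried out with a short sketch. It assumes the pattern is meant to be ^.+@.+\..+$, a loose check for the shape of an e-mail address, and it uses Python's re module; the function name looks_like_email is made up for illustration.

```python
import re

# Assumed reading of the example: some text, "@", some text, a literal ".", some text.
EMAIL_LIKE = re.compile(r"^.+@.+\..+$")


def looks_like_email(text: str) -> bool:
    """Return True if text has the rough shape name@host.tld (hypothetical helper)."""
    return EMAIL_LIKE.match(text) is not None


print(looks_like_email("alice@example.com"))  # True
print(looks_like_email("alice@example"))      # False: no "." after the "@"
print(looks_like_email("example.com"))        # False: no "@"
```

This only checks the overall shape; it is not a validator for real e-mail addresses.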

2. The history of regular expressions

The ancestors of regular expressions can be traced back to early research on how the human nervous system works. Two neurophysiologists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.

In 1956, a mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper entitled "Representation of Events in Nerve Nets" that introduced the concept of regular expressions. Regular expressions were used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Subsequently, this work found its way into some early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was in the Unix editor qed.

The rest, as they say, is history. Since then, regular expressions have been an important part of text-based editors and search tools.

3. Regular expression definition

Regular expressions describe a pattern of string matching, which can be used to check whether a string contains a substring, replace a matched substring, or extract a substring that meets certain conditions from a string.
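The three uses named above (checking, replacing, extracting) can be sketched with Python's re module. The sample text and the date pattern are invented for illustration.

```python
import re

text = "Order 66 shipped on 2024-01-15; order 67 ships on 2024-02-01."
DATE = r"\d{4}-\d{2}-\d{2}"  # four digits, "-", two digits, "-", two digits

# 1. Check whether the string contains a matching substring.
contains_date = re.search(DATE, text) is not None

# 2. Replace every matched substring.
redacted = re.sub(DATE, "<date>", text)

# 3. Extract the substrings that meet the condition.
dates = re.findall(DATE, text)

print(contains_date)  # True
print(redacted)       # Order 66 shipped on <date>; order 67 ships on <date>.
print(dates)          # ['2024-01-15', '2024-02-01']
```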

The *.txt used when listing a directory, as in dir *.txt or ls *.txt, is not a regular expression, because the meaning of * there is different from the meaning of * in a regular expression.
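The difference between the shell wildcard and the regular expression * can be seen in a small sketch. It uses Python's fnmatch module to stand in for the shell's wildcard matching, which is an assumption made for illustration; the file name is invented.

```python
import fnmatch
import re

name = "notes.txt"

# Shell-style wildcard: "*" on its own means "any string".
glob_ok = fnmatch.fnmatch(name, "*.txt")

# Regular expression: "any string" is written ".*", and the dot before
# "txt" has to be escaped to mean a literal ".".
regex_ok = re.fullmatch(r".*\.txt", name) is not None

# Used as a regular expression, "*.txt" is not even valid here,
# because "*" has nothing in front of it to repeat.
try:
    re.compile("*.txt")
    star_is_valid_regex = True
except re.error:
    star_is_valid_regex = False

print(glob_ok, regex_ok, star_is_valid_regex)  # True True False
```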

A regular expression is a text pattern composed of ordinary characters (such as the letters a to z) and special characters (called metacharacters). It acts as a template that describes the character pattern to be matched against the searched string.

3.1 Ordinary characters

Ordinary characters are all printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks and some symbols.

3.2 Non-printing characters

Character    Meaning

\cx Matches the control character indicated by x. For example, \cM matches Control-M, or carriage return. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal "c" character.

\f Matches a form feed. Equivalent to \x0c and \cL.

\n Matches a newline character. Equivalent to \x0a and \cJ.

\r Matches a carriage return. Equivalent to \x0d and \cM.

\s Matches any whitespace character, including space, tab, form feed, etc. Equivalent to [ \f\n\r\t\v].

\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t Matches a tab. Equivalent to \x09 and \cI.

\v Matches a vertical tab. Equivalent to \x0b and \cK.
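The escapes listed above can be checked with a short sketch in Python. Two assumptions apply: Python's re module has no \cx escape, so it is left out, and the re.ASCII flag is used so that \s covers only the six ASCII whitespace characters listed, since without it Python's \s also matches other Unicode whitespace.

```python
import re

sample = "a b\tc\nd\r\fe\v"

# With re.ASCII, \s is limited to space, \t, \n, \r, \f and \v.
whitespace = re.findall(r"\s", sample, flags=re.ASCII)
non_whitespace = re.findall(r"\S", sample, flags=re.ASCII)

# The explicit bracket expression finds the same characters.
same_as_class = re.findall(r"[ \f\n\r\t\v]", sample)

# Each named escape matches the same character as its hexadecimal form.
hex_pairs = {r"\x0c": "\f", r"\x0a": "\n", r"\x0d": "\r", r"\x09": "\t", r"\x0b": "\v"}
hex_equivalent = all(re.fullmatch(p, ch) is not None for p, ch in hex_pairs.items())

print(whitespace)       # [' ', '\t', '\n', '\r', '\x0c', '\x0b']
print(non_whitespace)   # ['a', 'b', 'c', 'd', 'e']
print(hex_equivalent)   # True
```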

3.3 Special characters

Special characters are characters that carry a special meaning, such as the "*" in "*.txt" above, which simply stands for any string. If you want to find a file with a * in its name, you need to escape the *, that is, put a \ in front of it: ls \*.txt. Regular expressions have the following special characters.

Special character    Description

$ Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r". To match the $ character itself, use \$.

( ) Mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).

* Matches the previous subexpression zero or more times. To match the * character, use \*.

+ Matches the previous subexpression one or more times. To match the + character, use \+.

. Matches any single character except the newline \n. To match the . character, use \.

[ Marks the beginning of a bracket expression. To match [, use \[.

? Matches the previous subexpression zero or one time, or indicates a non-greedy qualifier. To match the ? character, use \?.

\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\" and "\(" matches "(".

^ Matches the starting position of the input string, unless it is used in a bracket expression, in which case it negates the character set. To match the ^ character itself, use \^.

{ Marks the beginning of a qualifier expression. To match {, use \{.

| Indicates a choice between two items. To match |, use \|.
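Escaping can be illustrated with a minimal sketch in Python. The sample string is invented; re.escape is a helper in Python's re module that adds the backslashes automatically.

```python
import re

price_text = "Total: $5.00 (approx.)"

# Unescaped, "$" means "end of input", so this cannot match anything here.
unescaped = re.search(r"$5.00", price_text)

# Escaping each special character with "\" makes it match literally.
escaped_by_hand = re.search(r"\$5\.00 \(approx\.\)", price_text)

# re.escape builds the escaped pattern from a literal string.
escaped_auto = re.search(re.escape("$5.00 (approx.)"), price_text)

print(unescaped)                 # None
print(escaped_by_hand.group())   # $5.00 (approx.)
print(escaped_auto.group())      # $5.00 (approx.)
```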

Regular expressions are constructed in the same way as mathematical expressions: small expressions are combined with metacharacters and operators to create larger ones. The components of a regular expression can be a single character, a set of characters, a range of characters, a choice between characters, or any combination of these.
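As a rough sketch of this building-block approach, the Python snippet below assembles a pattern for signed decimal numbers from smaller pieces. The names DIGITS, SIGN, INTEGER and DECIMAL are invented for illustration.

```python
import re

DIGITS = r"\d+"                              # one or more digits
SIGN = r"[+-]?"                              # an optional leading sign
INTEGER = SIGN + DIGITS                      # e.g. "-12"
DECIMAL = INTEGER + r"(?:\." + DIGITS + ")?"  # optional fractional part

NUMBER = re.compile(DECIMAL)

results = {s: NUMBER.fullmatch(s) is not None for s in ("-12.5", "12", "12.", "abc")}
print(results)  # {'-12.5': True, '12': True, '12.': False, 'abc': False}
```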

3.4 Qualifiers

Qualifiers specify how many times a given component of a regular expression must occur for the match to succeed. There are six of them: *, +, ?, {n}, {n,} and {n,m}.

The *, + and ? qualifiers are all greedy, because they match as much text as possible. Adding a ? after them makes the match non-greedy, or minimal.
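The difference between greedy and non-greedy matching shows up clearly on a small invented string, here sketched in Python:

```python
import re

html = "<h1>Title</h1>"

# Greedy: ".*" takes as much as it can, so the match runs to the last ">".
greedy = re.search(r"<.*>", html).group()

# Non-greedy: ".*?" takes as little as it can, so the match stops at the first ">".
lazy = re.search(r"<.*?>", html).group()

print(greedy)  # <h1>Title</h1>
print(lazy)    # <h1>
```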

Qualifiers for regular expressions are:

Character    Description

* Matches the previous subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+ Matches the previous subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

? Matches the previous subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but it matches the two o's in "food".

{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but it matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times.
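The qualifier examples from the table can be run as a sketch in Python. The helper matches_whole is made up for illustration, and the {n,m} line assumes the usual meaning of at least n and at most m repetitions.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


star = [matches_whole(r"zo*", s) for s in ("z", "zo", "zoo")]
plus = [matches_whole(r"zo+", s) for s in ("z", "zo", "zoo")]
optional = [matches_whole(r"do(es)?", s) for s in ("do", "does", "doe")]

five_o = "f" + "o" * 5 + "d"   # "foooood"
six_o = "f" + "o" * 6 + "d"

exactly_two = re.findall(r"o{2}", "food")
none_in_bob = re.findall(r"o{2}", "Bob")
at_least_two = re.findall(r"o{2,}", five_o)
one_to_three = re.findall(r"o{1,3}", six_o)

print(star)          # [True, True, True]
print(plus)          # [False, True, True]
print(optional)      # [True, True, False]
print(exactly_two)   # ['oo']
print(none_in_bob)   # []
print(at_least_two)  # ['ooooo']
print(one_to_three)  # ['ooo', 'ooo']
```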

3.5 Locators

Locators describe the boundary of a string or a word: ^ and $ indicate the beginning and end of the string respectively, \b indicates the boundary before or after a word, and \B indicates a non-word boundary. Qualifiers cannot be applied to locators.
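A short sketch in Python shows where each locator matches in an invented sample string. The helper starts is made up for illustration and returns the start index of every match.

```python
import re

text = "cat concat category"


def starts(pattern: str) -> list[int]:
    """Start index of every match of pattern in text (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, text)]


whole_word = starts(r"\bcat\b")     # "cat" as a word on its own
word_start = starts(r"\bcat")       # "cat" at the beginning of a word
word_end = starts(r"cat\b")         # "cat" at the end of a word
inside_word = starts(r"\Bcat")      # "cat" not at the beginning of a word

at_string_start = re.search(r"^cat", text) is not None
at_string_end = re.search(r"category$", text) is not None
not_at_start = re.search(r"^concat", text) is None

print(whole_word, word_start, word_end, inside_word)  # [0] [0, 11] [0, 7] [7]
print(at_string_start, at_string_end, not_at_start)   # True True True
```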

3.6 Choice

Enclose all the options in parentheses and separate adjacent options with |. Using parentheses has a side effect, however: the related match is cached. Putting ?: before the first option eliminates this side effect.

Here ?: is one of the non-capturing elements. The other two are ?= and ?!, which carry additional meaning. The former is a positive lookahead, which matches the search string at any point where the pattern in parentheses begins to match. The latter is a negative lookahead, which matches at any point where the pattern in parentheses does not begin to match.
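Choice, non-capturing groups and the two lookaheads can be sketched in Python as follows. The sample strings are invented for illustration.

```python
import re

# Capturing parentheses cache the chosen option; "?:" turns that off.
capturing = re.search(r"gr(a|e)y", "grey").groups()
non_capturing = re.search(r"gr(?:a|e)y", "grey").groups()

# Positive lookahead: match "Windows" only if "95" or "98" follows.
# The lookahead text is checked but is not part of the match.
positive = re.search(r"Windows(?=95|98)", "Windows98")

# Negative lookahead: match "Windows" only if neither "95" nor "98" follows.
negative_blocked = re.search(r"Windows(?!95|98)", "Windows98")
negative_allowed = re.search(r"Windows(?!95|98)", "WindowsXP")

print(capturing)                 # ('e',)
print(non_capturing)             # ()
print(positive.group())          # Windows
print(negative_blocked)          # None
print(negative_allowed.group())  # Windows
```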

3.7 Backreferences

Putting parentheses around a regular expression pattern, or part of one, causes the related match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the regular expression pattern. The buffers are numbered from 1, up to a maximum of 99 subexpressions. Each buffer can be accessed with "\n", where n is a one- or two-digit decimal number that identifies a specific buffer.

You can use the non-capturing metacharacters "?:", "?=" or "?!" to skip saving the related match.
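A common use of backreferences is finding a word that is repeated immediately. The sketch below, in Python with an invented sample sentence, uses \1 to refer back to the first captured group, and also shows that a non-capturing group does not take up a buffer number.

```python
import re

text = "the cost of of gasoline is going up up"

# (\w+) captures a word into buffer 1; \1 requires the same word again.
doubled = re.compile(r"\b(\w+) \1\b")

repeated_words = [m.group(1) for m in doubled.finditer(text)]
cleaned = doubled.sub(r"\1", text)

# "(?:a)" is not stored, so "(b)" is buffer 1 and \1 refers to "b".
skipped = re.fullmatch(r"(?:a)(b)\1", "abb")

print(repeated_words)     # ['of', 'up']
print(cleaned)            # the cost of gasoline is going up
print(skipped.group(1))   # b
```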