Tuesday, September 28, 2010

Learn regular expressions in half an hour


Regular expressions give a lot of people headaches. Today I have combined my own understanding with a few articles from around the web, hoping to explain them in a way ordinary people can follow. Come and share the learning experience.

To begin, we have to talk about '^' and '$'. They match the beginning and the end of a string respectively. Some examples:

"^The": the string must begin with "The";
"of despair$": the string must end with "of despair";

Then,
"^abc$": the string must both begin and end with abc, so in fact it matches only "abc" itself;
"notice": matches any string that contains "notice".

As the last example shows, if you use neither of these two characters, the pattern (the regular expression) can match anywhere in the string being tested; it is not locked to either end.
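
The anchors can be tried out with a minimal sketch. It uses Python's `re` module rather than PHP, which is an assumption made for this example, and the helper name `found` is hypothetical. `re.search` scans the whole string, so only '^' and '$' tie the pattern to either end.

```python
import re

def found(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# '^' ties the match to the start, '$' ties it to the end.
starts_with_the = found(r"^The", "The end is near")
ends_with_despair = found(r"of despair$", "a cry of despair")

# With both anchors, only the exact string "abc" can match.
exactly_abc = found(r"^abc$", "abc")
abc_twice = found(r"^abc$", "abcabc")

# Without anchors the pattern may match anywhere in the string.
contains_notice = found(r"notice", "please take notice of this")
```

Here `abc_twice` is the only check that fails, because "abcabc" contains more than the anchored pattern allows.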

Next, '*', '+', and '?'.

They indicate how many times a character or sequence may occur. They mean:
"zero or more", equivalent to {0,};
"one or more", equivalent to {1,};
"zero or one", equivalent to {0,1}. Here are some examples:

"ab*": same as ab{0,}; matches a string containing an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same as ab{1,}; like the previous one, but there must be at least one b ("ab", "abbb", etc.);
"ab?": same as ab{0,1}; there may be no b, or exactly one;
"a?b+$": matches a string that ends with an optional a followed by one or more b's.

Key point: '*', '+', and '?' apply only to the character immediately before them.
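
A small sketch of the three quantifiers, again in Python's `re` as a stand-in for PHP (an assumption of this example; `found` and the sample lists are made up for illustration):

```python
import re

def found(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

samples = ["a", "ab", "abbb", "b"]

star = [s for s in samples if found(r"ab*", s)]   # a, then zero or more b's
plus = [s for s in samples if found(r"ab+", s)]   # a, then at least one b

# 'ab?' allows zero or one b; fullmatch makes the upper limit visible.
optional = [s for s in samples if re.fullmatch(r"ab?", s)]

# An optional a, then one or more b's, right at the end of the string.
tail = [s for s in ["b", "abb", "aba"] if found(r"a?b+$", s)]
```

Note that each quantifier applies to the single b in front of it, never to the whole "ab".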

You can also use braces to limit how many times a character appears, for example:

"ab{2}": the a must be followed by exactly two b's, not one fewer ("abb");
"ab{2,}": the a must be followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": the a may be followed by three to five b's ("abbb", "abbbb", or "abbbbb").

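The brace counts are easiest to see when the whole string has to match. A minimal sketch, assuming Python's `re.fullmatch` in place of the PHP functions (the helper `whole` is hypothetical):

```python
import re

def whole(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

exactly_two = [s for s in ["ab", "abb", "abbb"] if whole(r"ab{2}", s)]
two_or_more = [s for s in ["ab", "abb", "abbbb"] if whole(r"ab{2,}", s)]
three_to_five = [
    s for s in ["abb", "abbb", "abbbbb", "abbbbbb"] if whole(r"ab{3,5}", s)
]
```

The last candidate has six b's, one more than "ab{3,5}" permits, so it is rejected.
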
To apply a quantifier to several characters at once, put them in parentheses:
"a(bc)*": matches an a followed by zero or more "bc";
"a(bc){1,5}": an a followed by one to five "bc".

There is also the character '|', which acts as an OR operator:

"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of a's or b's followed by a c.

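Grouping and alternation can be checked the same way. This is a sketch in Python's `re` (an assumption of this example, as are the helper `whole` and the sample strings):

```python
import re

def whole(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# The quantifier after the parentheses repeats the whole group "bc".
repeated_group = [s for s in ["a", "abc", "abcbc", "ab"] if whole(r"a(bc)*", s)]

# '|' chooses between complete alternatives.
either_word = [s for s in ["hi", "hello", "hey"] if whole(r"hi|hello", s)]

# Parentheses limit how far the alternation reaches.
prefixed = [s for s in ["bef", "cdef", "bcdef"] if whole(r"(b|cd)ef", s)]

# Any mix of a's and b's, including none at all, followed by c.
mixed = [s for s in ["c", "abbac", "abd"] if whole(r"(a|b)*c", s)]
```
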
A period ('.') stands for any single character except "\n".

What if you want to match any single character including "\n"?
A bracket expression such as '[\n.]' does not do it, because inside brackets the period is just a literal period; use the alternation '(.|\n)' instead.

"a.[0-9]": an a, then any one character, then a digit from 0 to 9;
"^.{3}$": a string of exactly three characters.

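The behavior of the period around newlines can be sketched as follows. The example assumes Python's `re` module, where '.' skips "\n" by default; the variable names are made up for illustration.

```python
import re

# a, then any one character, then a digit.
digit_after_any = re.search(r"a.[0-9]", "xa-7y") is not None

# Exactly three characters.
three_chars = re.fullmatch(r".{3}", "abc") is not None

# By default '.' refuses to match a newline...
dot_vs_newline = re.fullmatch(r".{3}", "ab\n") is not None

# ...while the alternation (.|\n) accepts it.
any_three = re.fullmatch(r"(.|\n){3}", "ab\n") is not None

# Inside brackets the period is literal, so [\n.] only means
# "a newline or a period" and does not match an ordinary letter.
bracket_x = re.fullmatch(r"[\n.]", "x") is not None
```
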
A bracket expression matches only a single character:

"[ab]": matches a single a or b (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same as "a|b|c|d" and "[abcd]"); in general we use [a-zA-Z] to specify one English letter of either case;
"^[a-zA-Z]": matches a string that begins with a letter, upper or lower case;
"[0-9]%": matches a string containing a digit followed by a percent sign, of the form x%;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a digit or letter.

You can also list the characters you do not want inside brackets; you only need to put '^' as the first character in the brackets. "%[^a-zA-Z]%" matches a string containing two percent signs with a single non-letter between them.

Key point: when '^' is the first character inside brackets, it excludes the characters listed in the brackets.
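
The bracket expressions above can be exercised with a short sketch. It is written in Python's `re` as a stand-in for PHP (an assumption of this example; `found` is a hypothetical helper).

```python
import re

def found(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

begins_with_letter = found(r"^[a-zA-Z]", "Hello")
begins_with_digit = found(r"^[a-zA-Z]", "9lives")

has_percentage = found(r"[0-9]%", "up 5% today")

ends_comma_alnum = found(r",[a-zA-Z0-9]$", "a,b")
ends_comma_only = found(r",[a-zA-Z0-9]$", "a,b,")

# '^' as the first character in brackets negates the list.
non_letter_between = found(r"%[^a-zA-Z]%", "%4%")
letter_between = found(r"%[^a-zA-Z]%", "%a%")
```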

To have the special characters described above treated literally, you must put the escape character '\' in front of them.

Do not forget that characters inside brackets are the exception to this rule: inside brackets all special characters, including '\', lose their special meaning. "[*+?{}.]" matches a string containing any one of these characters.

Also, as the regex manual tells us: "If the list contains ']', it is best to make it the first character (possibly following a '^'). If it contains '-', it is best to put it first or last, or as the second endpoint of a range; in [a-d-0-9] the middle '-' is taken literally."
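
A brief sketch of special characters inside brackets, assuming Python's `re` module. Python follows the same placement rules for ']' and '-', but unlike the POSIX-style functions discussed here it still treats '\' as an escape inside brackets, so that part of the rule does not carry over.

```python
import re

# Inside brackets, * + ? { } . are ordinary characters.
specials = re.findall(r"[*+?{}.]", "a*b+c?d.e")

# A ']' placed first in the list is taken literally.
closing_first = re.fullmatch(r"[]a]", "]") is not None

# A '-' placed last in the list is taken literally.
dash_last = re.findall(r"[a-c-]", "b-z")
```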

Having read the examples above, you should now understand {n,m}. Note that n and m cannot be negative integers, and n must not be greater than m. The pattern then matches at least n times and at most m times. For example, "p{1,5}" applied to "pvpppppp" first matches the lone p before the v, and in the run of six p's after the v it matches the first five.
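
To see how "p{1,5}" walks through "pvpppppp", here is a minimal sketch using Python's `re.findall` (Python is an assumption of this example, not the article's language):

```python
import re

# findall returns every non-overlapping match, left to right.
runs = re.findall(r"p{1,5}", "pvpppppp")
```

The run of six p's is split into a match of five and a match of one, because a single match may take at most five.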

Now for the sequences that begin with '\'.

'\b': according to the book, it matches a word boundary. For example, 've\b' matches the ve in "love" but not the ve in "very".

'\B' is just the opposite of '\b' above. I will not give an example.
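
A short sketch of both boundary escapes, assuming Python's `re`, which supports '\b' and '\B' with the meaning described here (the helper `found` is hypothetical):

```python
import re

def found(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# \b: "ve" must be followed by a word boundary.
boundary_love = found(r"ve\b", "love")
boundary_very = found(r"ve\b", "very")

# \B: "ve" must NOT be followed by a word boundary.
inside_love = found(r"ve\B", "love")
inside_very = found(r"ve\B", "very")
```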

Well, let's build a practical application:

How to build a pattern to validate an amount of money

We build a matching pattern to check whether the input is a number representing money. We assume an amount can be written in four ways: "10000.00" and "10,000.00", or, without the fractional part, "10000" and "10,000". Now let's start building the pattern:

^[1-9][0-9]*$

This requires the value to start with a non-zero digit. But it also means that a single "0" fails the test. Here is the solution:
^(0|[1-9][0-9]*)$

"Only 0, or a number that does not begin with 0, matches." We can also allow a minus sign before the number:
^(0|-?[1-9][0-9]*)$

That is: "0, or a number that does not begin with 0 and may have a minus sign in front." OK, now let us be less strict and allow a leading 0. Let us also drop the minus sign, since we do not need it when representing money. We now add a pattern for the fractional part:
^[0-9]+(\.[0-9]+)?$

This implies that a matching string must begin with at least one digit. Note that with the pattern above "10." does not match; only "10" and "10.2" do. (Do you know why?)
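
The signed-integer pattern and the first decimal pattern can be compared side by side. A minimal sketch, assuming Python's `re` in place of PHP; the names `INTEGER`, `DECIMAL`, and `accepted` are made up for this example.

```python
import re

INTEGER = re.compile(r"^(0|-?[1-9][0-9]*)$")
DECIMAL = re.compile(r"^[0-9]+(\.[0-9]+)?$")

def accepted(pattern, candidates):
    """Return the candidates that the compiled pattern matches."""
    return [c for c in candidates if pattern.search(c)]

# Leading zeros and "-0" are rejected by the strict integer pattern.
integers = accepted(INTEGER, ["0", "42", "-7", "007", "-0"])

# The decimal pattern allows leading zeros, but a period must be
# followed by at least one digit, so "10." is rejected.
decimals = accepted(DECIMAL, ["10", "10.2", "10.", ".5", "007"])
```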

^[0-9]+(\.[0-9]{2})?$

Above we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:
^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (every three digits) for readability, which can be written as:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Do not forget that '+' can be replaced by '*' if you want an empty string to be accepted as input (why?). Also do not forget that the backslash '\' inside a PHP string can cause errors (a very common mistake).

Now that we can validate the string, we remove all the commas with str_replace(",", "", $money), convert the result to a double, and then we can do arithmetic with it.
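
The validate-then-convert steps can be sketched as one function. This is Python rather than PHP, and joining the plain and comma-grouped forms with '|' is an assumption of this sketch so that all four spellings of the amount are accepted; `MONEY` and `parse_money` are hypothetical names.

```python
import re

# Either plain digits or comma-grouped digits, then an optional
# fractional part of one or two digits. Joining the two forms with '|'
# is an assumption of this sketch, not a pattern taken from the article.
MONEY = re.compile(r"^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$")

def parse_money(text):
    """Return the amount as a float, or None if text is not valid."""
    if MONEY.search(text) is None:
        return None
    # Same idea as str_replace(",", "", $money) followed by a cast.
    return float(text.replace(",", ""))
```

Returning None for invalid input keeps the validation and the conversion in one place; a caller that prefers exceptions could raise instead.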

Encore:

Constructing a regular expression to check an email address

A complete email address has three parts:
1. the user name (everything to the left of '@'),
2. the '@',
3. the server name (the remaining part).
The user name can contain upper- and lower-case letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rules, except that underscores are not allowed.

Now, a user name cannot begin or end with a period, and neither can the server name. You also cannot have two consecutive periods; there must be at least one character between them. Let's see how to write a pattern that matches the user name:

^[_a-zA-Z0-9-]+$

This does not allow periods yet. Let's add them:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "at least one valid character (other than '.') at the beginning, followed by zero or more groups, each made of a period and at least one valid character."

To simplify, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we do not need to specify both ranges "a-z" and "A-Z"; one of them is enough:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name is the same, but without the underscore:
^[a-z0-9-]+(\.[a-z0-9-]+)*$

Good. Now just join the two parts with "@":
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete pattern for validating an email address; you only need to call
eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)
to find out whether a string is an email address.
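
The finished pattern can be tried against a few addresses. A minimal sketch in Python, where the `re.IGNORECASE` flag is assumed to play the role of eregi(); `EMAIL`, `is_email`, and the sample addresses are made up for this example.

```python
import re

EMAIL = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",
    re.IGNORECASE,  # stands in for eregi()'s case-insensitivity
)

def is_email(text):
    """Return True if text fits the user@server pattern built above."""
    return EMAIL.search(text) is not None
```

For instance, `is_email("john.doe@example.com")` is accepted, while `is_email("john..doe@example.com")` is rejected because of the two consecutive periods.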

Other uses of regular expressions

Extracting strings
ereg() and eregi() have a feature that lets you extract part of a string with a regular expression. For example, to extract the file name from a path or URL, the following code is what you need:
ereg("([^\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
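
The same extraction can be sketched in Python, where the captured group is read from the match object instead of a `$regs` array (Python and the function name `file_name` are assumptions of this example):

```python
import re

def file_name(path_or_url):
    """Return whatever follows the last slash or backslash."""
    # The group captures the longest run of non-slash characters
    # that reaches the end of the string.
    match = re.search(r"([^\\/]*)$", path_or_url)
    return match.group(1)
```

A path that ends in a slash yields an empty string, since nothing follows the last separator.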

Advanced substitution
ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace every run of whitespace with a comma:
ereg_replace("[ \n\r\t]+", ",", trim($str));
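
An equivalent substitution, sketched with Python's `re.sub` as a stand-in for ereg_replace() (an assumption of this example; `comma_separated` is a hypothetical name):

```python
import re

def comma_separated(text):
    """Trim the text, then turn each run of whitespace into one comma."""
    return re.sub(r"[ \n\r\t]+", ",", text.strip())
```

Because of the '+', several whitespace characters in a row produce a single comma rather than one comma each.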

Finally, here is another regular expression for checking email addresses, taken from another article, for you to analyze.
"^[-!#$%&'*+\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&'*+\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&'*+\./0-9=?A-Z^_`a-z{|}~]+$"
If you can read it easily, this article has achieved its purpose.





