What is the use of '\G' anchor in regex?

What is the use of '\G' anchor in regex? - php

I'm having a difficulty with understanding how \G anchor works in PHP flavor of regular expressions.
I'm inclined to think (even though I may be wrong) that \G is used instead of ^ in situations when multiple matches of the same string are taking place.
Could someone please show an example of how \Gshould be used, and explain how and why it works?

UPDATE
\G forces the pattern to only return matches that are part of a continuous chain of matches. From the first match each subsequent match must be preceded by a match. If you break the chain the matches end.
<?php
$pattern = '#(match),#';
$subject = "match,match,match,match,not-match,match";
preg_match_all( $pattern, $subject, $matches );
//Will output match 5 times because it skips over not-match
foreach ( $matches[1] as $match ) {
echo $match . '<br />';
}
echo '<br />';
$pattern = '#(\Gmatch),#';
$subject = "match,match,match,match,not-match,match";
preg_match_all( $pattern, $subject, $matches );
//Will only output match 4 times because at not-match the chain is broken
foreach ( $matches[1] as $match ) {
echo $match . '<br />';
}
?>
This is straight from the docs
The fourth use of backslash is for certain simple assertions. An
assertion specifies a condition that has to be met at a particular
point in a match, without consuming any characters from the subject
string. The use of subpatterns for more complicated assertions is
described below. The backslashed assertions are
\G
first matching position in subject
The \G assertion is true only when the current matching position is at
the start point of the match, as specified by the offset argument of
preg_match(). It differs from \A when the value of offset is non-zero.
http://www.php.net/manual/en/regexp.reference.escape.php
You will have to scroll down that page a bit but there it is.
There is a really good example in ruby but it is the same in php.
How the Anchor \z and \G works in Ruby?

\G will match the match boundary, which is either the beginning of the string, or the point where the last character of last match is consumed.
It is particularly useful when you need to do complex tokenization, while also making sure that the tokens are valid.
Example problem
Let us take the example of tokenizing this input:
input 'some input in quote' more input '\'escaped quote\'' lots#_$of_fun ' \' \\ ' crazy'stuff'
Into these tokens (I use ~ to denote end of string):
input~
some input in quote~
more~
input~
'escaped quote'~
lots#_$of_fun~
' \ ~
crazy~
stuff~
The string consists of a mix of:
Singly quoted string, which allows the escape of \ and ', and spaces are conserved. Empty string can be specified using singly quoted string.
OR unquoted string, which consists of a sequence of non-white-space characters, and does not contain \ or '.
Space between 2 unquoted string will delimit them. Space is not necessary to delimit other cases.
For the sake of simplicity, let us assume the input does not contain new line (in real case, you need to consider it). It will add to the complexity of the regex without demonstrating the point.
The RAW regex for singly quoted string is '(?:[^\\']|\\[\\'])*+'
And the RAW regex for unquoted string is [^\s'\\]++
You don't need to care too much about the 2 piece of regex above, though.
The solution below with \G can make sure that when the engine fails to find any match, all characters from the beginning of the string to the position of last match has been consumed. Since it cannot skip character, the engine will stop matching when it fails to find valid match for both specifications of tokens, rather than grabbing random stuff in the rest of the string.
Construction
At the first step of construction, we can put together this regex:
\G(?:'((?:[^\\']|\\[\\'])*+)'|([^\s'\\]++))
Or simply put (this is not regex - just to make it easier to read):
\G(Singly_quote_regex|Unquoted_regex)
This will match the first token only, since when it attempts matching for the 2nd time, the match stops at the space before 'some input....
We just need to add a bit to allow for 0 or more space, so that in the subsequent match, the space at the position left off by the last match is consumed:
\G *+(?:'((?:[^\\']|\\[\\'])*+)'|([^\s'\\]++))
The regex above will now correctly identify the tokens, as seen here.
The regex can be further modified so that it returns the rest of the string when the engine fails to retrieve any valid token:
\G *+(?:'((?:[^\\']|\\[\\'])*+)'|([^\s'\\]++)|((?s).+$))
Since the alternation is tried in order from left-to-right, the last alternative ((?s).+$) will be match if and only if the string ahead doesn't make up a valid single quoted or unquoted token. This can be used to check for error.
The first capturing group will contain the text inside single quoted string, which needs extra processing to turn into the desired text (it is not really relevant here, so I leave it as an exercise to the readers). The second capturing group will contain the unquoted string. And the third capturing group acts as an indicator that the input string is not valid.
Demo for the final regex
Conclusion
The above example is demonstrate of one scenario of usage of \G in tokenization. There can be other usages that I haven't come across.

Related

Using RegEx to find a string (as variable number)

How can I find numbers inside certain strings in php?
For example, having this text inside a page, I would like to find for
|||12345|||
or
|||354|||
I'm interested in the numbers, they always change according to the page I visit (numbers being the id of the page and 3-5 characters length).
So the only thing I know for sure is those pipes surrounding the numbers.
Thanks in advance.

Using this \|\|\|\K\d{3,5}(?=\|\|\|)
gives many advantages.
https://regex101.com/r/LtbKfM/1
First, three literals without a quantifier is a simple strncmp() c
call. Also, anytime a regex starts with an assertion it is
inherently slower. Therefore, this is the fastest match for the 3
leading pipe symbols.
Second, using the \K construct excludes whatever was previously
matched from group 0. We don't want to get the 3 pipes in the
match, but we do want to match them.
edit
Note that capture group results are not stored in a special string
buffer.
Each group is really a pointer (or offset) and a length.
The pointer (or offset) is to somewhere in the source string.
When it comes time to extract a particular group string, the overload function for braces
matches[#] uses the pointer (or offset) and length to create and return a string instance.
Using the \K construct simply sets the group 0 pointer (or offset)
to the position in the string that represents the position that
matched after the \K construct.
Third, using a lookahead assertion for 3 pipe symbols does not
consume the symbols as far as the next match is concerned. This
makes these symbols available for the next match. I.e:
|||999|||888||| would get 2 matches as would
|||999|||||888|||.
The result is an array of just the numbers.
Formatted
\|\|\| # 3 pipe symbols
\K # Exclude previous items from the match (group 0)
\d{3,5} # 3-5 digits
(?= \|\|\| ) # Assertion, not consumed, 3 pipe symbols ahead

While #S.Kablar's suggestion is pretty valid, it makes use of a syntax that may be difficult for a beginner.
The more casual way to achieve your goal would be as follows:
$text = 'your input string';
if (preg_match_all('~\|{3}(\d+)\|{3}~', $text, $matches)) {
foreach($matches[1] as $number) {
var_dump($number); // prints smth like string(3) "345"
}
}
The breakdown of the regex:
~ and ~ surround the expression
\| stands for the pipe, which is a special character in regex and must be escaped with a backslash
{3} says 'the previous (the pipe) must be present exactly three times'
( and ) enclose a subpattern so that it is stored under $matches[1]
\d requires a digit
+ says 'the previous (a digit) may be repeated but must have at least one instance'

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.

A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.

You are looking for a regular expression.
A basic pattern starts out as a fixed string. /u/ or /r/ which would match those exactly. This can be simplified to match one or another with /(?:u|r)/ which would match the same as those two patterns. Next you would want to match everything from that point up to a space. You would use a negative character group [^ ] which will match any character that is not a space, and apply a modifier, *, to match as many characters as possible that match that group. /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ) to ensure your match is preceded by a space so you're not matching a partial which results in (?<= )/(?:u|r)/[^ ]*. You wrap all of that to make a capturing group ((?<= )/(?:u|r)/[^ ]*). This will capture the contents within the parenthesis to allow for a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.

In my opinion regex would be an overkill for such a simple operation. If you just want to replace instance of "/r/x" with "[r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)" you should use str_replace although with preg_replace it'll lessen the code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
use regex for intricate search string and replace. you can also use http://www.jslab.dk/tools.regex.php regular expression generator if you have something complex to capture in the string.

PHP regular expressions - pattern error

I am trying to search for some pattern in PHP with the help of preg_match. Search pattern is like this (but this is wrong):
/[\d\s*-\s*\d\s*(usd|eur)]{1}/i
\d starts with integer,
\s* there can be any number of whitespaces,
- there must be exactly one minus sign
\s* there can be any number of whitespaces,
\d then must be integer
\s* there can be any number of whitespaces,
(usd|eur) any of the following words must be present but one
[\d\s*-\s*\d\s*(usd|eur)]{1} - in string there should be exactly one occurence
the above pattern does not work, what I am doing wrong? For testing:
<?php
$pattern = '/[\d\s*-\s*\d\s*(usd|eur)]{1}/i';
$query = '100-120 100-120';
echo $pattern.'<br/>';
echo $query.'<br/>';
if(preg_match($pattern, $query))
echo 'OK';
else
echo 'not OK!';
?>
Note:
I am trying to pull out data like this:
The price of item is 100 - 120 usd in our market

[...] is a character class. It means "match any one of these characters". [abc] will match a,b, or c. It doesn't match the string "abc".
In addition:
{1} means "match the preceding expression one time". However, matching once is the default. There is no need to explicitly tell it to match one time.
\d matches a single numeric digit. Based on your example, you want \d+ - match a number made up of at least one digit.
Here is what your pattern should look like:
/\d+\s*-\s*\d+\s*(usd|eur)/i

Regular expressions are a powerful tool for examining and modifying text. Regular expressions themselves, with a general pattern notation almost like a mini programming language, allow you to describe and parse text. They enable you to search for patterns within a string, extracting matches flexibly and precisely. However, you should note that because regular expressions are more powerful, they are also slower than the more basic string functions. You should only use regular expressions if you have a particular need.
This tutorial gives a brief overview of basic regular expression syntax and then considers the functions that PHP provides for working with regular expressions.
The Basics
Matching Patterns
Replacing Patterns
Array Processing
PHP supports two different types of regular expressions: POSIX-extended and Perl-Compatible Regular Expressions (PCRE). The PCRE functions are more powerful than the POSIX ones, and faster too, so we will concentrate on them.
The Basics
In a regular expression, most characters match only themselves. For instance, if you search for the regular expression "foo" in the string "John plays football," you get a match because "foo" occurs in that string. Some characters have special meanings in regular expressions. For instance, a dollar sign ($) is used to match strings that end with the given pattern. Similarly, a caret (^) character at the beginning of a regular expression indicates that it must match the beginning of the string. The characters that match themselves are called literals. The characters that have special meanings are called metacharacters.
The dot (.) metacharacter matches any single character except newline (). So, the pattern h.t matches hat, hothit, hut, h7t, etc. The vertical pipe (|) metacharacter is used for alternatives in a regular expression. It behaves much like a logical OR operator and you should use it if you want to construct a pattern that matches more than one set of characters. For instance, the pattern Utah|Idaho|Nevada matches strings that contain "Utah" or "Idaho" or "Nevada". Parentheses give us a way to group sequences. For example, (Nant|b)ucket matches "Nantucket" or "bucket". Using parentheses to group together characters for alternation is called grouping.
If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
To specify a set of acceptable characters in your pattern, you can either build a character class yourself or use a predefined one. A character class lets you represent a bunch of characters as a single item in a regular expression. You can build your own character class by enclosing the acceptable characters in square brackets. A character class matches any one of the characters in the class. For example a character class [abc] matches a, b or c. To define a range of characters, just put the first and last characters in, separated by hyphen. For example, to match all alphanumeric characters: [a-zA-Z0-9]. You can also create a negated character class, which matches any character that is not in the class. To create a negated character class, begin the character class with ^: [^0-9].
The metacharacters +, *, ?, and {} affect the number of times a pattern should be matched. + means "Match one or more of the preceding expression", * means "Match zero or more of the preceding expression", and ? means "Match zero or one of the preceding expression". Curly braces {} can be used differently. With a single integer, {n} means "match exactly n occurrences of the preceding expression", with one integer and a comma, {n,} means "match n or more occurrences of the preceding expression", and with two comma-separated integers {n,m} means "match the previous character if it occurs at least n times, but no more than m times".
Now, have a look at the examples:
Regular Expression Will match...
foo The string "foo"
^foo "foo" at the start of a string
foo$ "foo" at the end of a string
^foo$ "foo" when it is alone on a string
[abc] a, b, or c
[a-z] Any lowercase letter
[^A-Z] Any character that is not a uppercase letter
(gif|jpg) Matches either "gif" or "jpeg"
[a-z]+ One or more lowercase letters
[0-9\.\-] Аny number, dot, or minus sign
^[a-zA-Z0-9_]{1,}$ Any word of at least one letter, number or _
([wx])([yz]) wy, wz, xy, or xz
[^A-Za-z0-9] Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4}) Matches three letters or four numbers
Perl-Compatible Regular Expressions emulate the Perl syntax for patterns, which means that each pattern must be enclosed in a pair of delimiters. Usually, the slash (/) character is used. For instance, /pattern/.
The PCRE functions can be divided in several classes: matching, replacing, splitting and filtering.
Matching Patterns
The preg_match() function performs Perl-style pattern matching on a string. preg_match() takes two basic and three optional parameters. These parameters are, in order, a regular expression string, a source string, an array variable which stores matches, a flag argument and an offset parameter that can be used to specify the alternate place from which to start the search:
preg_match ( pattern, subject [, matches [, flags [, offset]]])
The preg_match() function returns 1 if a match is found and 0 otherwise. Let's search the string "Hello World!" for the letters "ll":
<?php
if (preg_match("/ell/", "Hello World!", $matches)) {
echo "Match was found <br />";
echo $matches[0];
}
?>
The letters "ll" exist in "Hello", so preg_match() returns 1 and the first element of the $matches variable is filled with the string that matched the pattern. The regular expression in the next example is looking for the letters "ell", but looking for them with following characters:
<?php
if (preg_match("/ll.*/", "The History of Halloween", $matches)) {
echo "Match was found <br />";
echo $matches[0];
}
?>
Now let's consider more complicated example. The most popular use of regular expressions is validation. The example below checks if the password is "strong", i.e. the password must be at least 8 characters and must contain at least one lower case letter, one upper case letter and one digit:
<?php
$password = "Fyfjk34sdfjfsjq7";
if (preg_match("/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/", $password)) {
echo "Your passwords is strong.";
} else {
echo "Your password is weak.";
}
?>
The ^ and $ are looking for something at the start and the end of the string. The ".*" combination is used at both the start and the end. As mentioned above, the .(dot) metacharacter means any alphanumeric character, and * metacharacter means "zero or more". Between are groupings in parentheses. The "?=" combination means "the next text must be like this". This construct doesn't capture the text. In this example, instead of specifying the order that things should appear, it's saying that it must appear but we're not worried about the order.
The first grouping is (?=.{8,}). This checks if there are at least 8 characters in the string. The next grouping (?=.[0-9]) means "any alphanumeric character can happen zero or more times, then any digit can happen". So this checks if there is at least one number in the string. But since the string isn't captured, that one digit can appear anywhere in the string. The next groupings (?=.[a-z]) and (?=.[A-Z]) are looking for the lower case and upper case letter accordingly anywhere in the string.
Finally, we will consider regular expression that validates an email address:
<?php
$email = firstname.lastname#aaa.bbb.com;
$regexp = "/^[^0-9][A-z0-9_]+([.][A-z0-9_]+)*[#][A-z0-9_]+([.][A-z0-9_]+)*[.][A-z]{2,4}$/";
if (preg_match($regexp, $email)) {
echo "Email address is valid.";
} else {
echo "Email address is <u>not</u> valid.";
}
?>
This regular expression checks for the number at the beginning and also checks for multiple periods in the user name and domain name in the email address. Let's try to investigate this regular expression yourself.
For the speed reasons, the preg_match() function matches only the first pattern it finds in a string. This means it is very quick to check whether a pattern exists in a string. An alternative function, preg_match_all(), matches a pattern against a string as many times as the pattern allows, and returns the number of times it matched.
Replacing Patterns
In the above examples, we have searched for patterns in a string, leaving the search string untouched. The preg_replace() function looks for substrings that match a pattern and then replaces them with new text. preg_replace() takes three basic parameters and an additional one. These parameters are, in order, a regular expression, the text with which to replace a found pattern, the string to modify, and the last optional argument which specifies how many matches will be replaced.
preg_replace( pattern, replacement, subject [, limit ])
The function returns the changed string if a match was found or an unchanged copy of the original string otherwise. In the following example we search for the copyright phrase and replace the year with the current.
<?php
echo preg_replace("/([Cc]opyright) 200(3|4|5|6)/", "$1 2007", "Copyright 2005");
?>
In the above example we use back references in the replacement string. Back references make it possible for you to use part of a matched pattern in the replacement string. To use this feature, you should use parentheses to wrap any elements of your regular expression that you might want to use. You can refer to the text matched by subpattern with a dollar sign ($) and the number of the subpattern. For instance, if you are using subpatterns, $0 is set to the whole match, then $1, $2, and so on are set to the individual matches for each subpattern.
In the following example we will change the date format from "yyyy-mm-dd" to "mm/dd/yyy":
<?php
echo preg_replace("/(\d+)-(\d+)-(\d+)/", "$2/$3/$1", "2007-01-25");
?>
We also can pass an array of strings as subject to make the substitution on all of them. To perform multiple substitutions on the same string or array of strings with one call to preg_replace(), we should pass arrays of patterns and replacements. Have a look at the example:
<?php
$search = array ( "/(\w{6}\s\(w{2})\s(\w+)/e",
"/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/");
$replace = array ('"$1 ".strtoupper("$2")',
"$3/$2/$1 $4");
$string = "Posted by John | 2007-02-15 02:43:41";
echo preg_replace($search, $replace, $string);?>
In the above example we use the other interesting functionality - you can say to PHP that the match text should be executed as PHP code once the replacement has taken place. Since we have appended an "e" to the end of the regular expression, PHP will execute the replacement it makes. That is, it will take strtoupper(name) and replace it with the result of the strtoupper() function, which is NAME.
Array Processing
PHP's preg_split() function enables you to break a string apart basing on something more complicated than a literal sequence of characters. When it's necessary to split a string with a dynamic expression rather than a fixed one, this function comes to the rescue. The basic idea is the same as preg_match_all() except that, instead of returning matched pieces of the subject string, it returns an array of pieces that didn't match the specified pattern. The following example uses a regular expression to split the string by any number of commas or space characters:
<?php
$keywords = preg_split("/[\s,]+/", "php, regular expressions");
print_r( $keywords );
?>
Another useful PHP function is the preg_grep() function which returns those elements of an array that match a given pattern. This function traverses the input array, testing all elements against the supplied pattern. If a match is found, the matching element is returned as part of the array containing all matches. The following example searches through an array and all the names starting with letters A-J:
<?php
$names = array('Andrew','John','Peter','Nastin','Bill');
$output = preg_grep('/^[a-m]/i', $names);
print_r( $output );
?>

Getting regular expression

How can i extract https://domain.com/gamer?hid=.115f12756a8641 from the below string ,i.e from url
rrth:'http://www.google.co',cctp:'323',url:'https://domain.com/gamer?hid=.115f12756a8641',rrth:'https://another.com'
P.s :I am new to regular expression, I am learning .But above string seems to be formatted..so some sort of shortcut must be there.

If your input string is called $str:
preg_match('/url:\'(.*?)\'/', $str, $matches);
$url = $matches[1];
(.*?) captures everything between url:' and ' and can later be retrieved with $matches[1].
The ? is particularly important. It makes the repetition ungreedy, otherwise it would consume everything until the very last '.
If your actual input string contains multiple url:'...' section, use preg_match_all instead. $matches[1] will then be an array of all required values.

Simple regex:
preg_match('/url\s*\:\s*\'([^\']+)/i',$theString,$match);
echo $match[1];//should be the url
How it works:
/url\s*\:\s*: matches url + [any number of spaces] + : (colon)+ [any number of spaces]But we don't need this, that's where the second part comes in
\'([^\']+)/i: matches ', then the brackets (()) create a group, that will be stored separately in the $matches array. What will be matches is [^']+: Any character, except for the apostrophe (the [] create a character class, the ^ means: exclude these chars). So this class will match any character up to the point where it reaches the closing/delimiting apostrophe.
/i: in case the string might contain URL:'http://www.foo.bar', I've added that i, which is the case-insensitive flag.
That's about it.Perhaps you could sniff around here to get a better understanding of regex's
note: I've had to escape the single quotes, because the pattern string uses single quotes as delimiters: "/url\s*\:\s*'([^']+)/i" works just as well. If you don't know weather or not you'll be dealing with single or double quotes, you could replace the quotes with another char class:
preg_match('/url\s*\:\s*[\'"]([^\'"]+)/i',$string,$match);
Obviously, in that scenario, you'll have to escape the delimiters you've used for the pattern string...

Regex Question: Matching this pattern with hard or soft quotes

I have this anchor locating regex working pretty well:
$p = '%<a.*\s+name="(.*)"\s*>(?:.*)</a>%im';
It matches <a followed by zero or more of anything followed by a space and name="
It is grabbing the names even if a class or an id precedes the name in the anchor.
What I would like to add is the ability to match on name=' with a single quote (') as well since sooner or later someone will have done this.
Obviously I could just add a second regex written for this but it seems inelegant.
Anyone know how to add the single quote and just use one regex? Any other improvements or recommendations would be very welcome. I can use all the regex help I can get!
Thanks very much for reading,
function findAnchors($html) {
$names = array();
$p = '%<a.*\s+name="(.*)"\s*>(?:.*)</a>%im';
$t = preg_match_all($p, $html, $matches, PREG_SET_ORDER);
if ($matches) {
foreach ($matches as $m) {
$names[] = $m[1];
}
return $names;
}
}

James' comment is actually a very popular, but wrong regex used for string matching. It's wrong because it doesn't allow for escaping of the string delimiter. Given that the string delimiter is ' or " the following regex works
$regex = '([\'"])(.*?)(.{0,2})(?<![^\\\]\\\)(\1)';
\1 is the starting delimeter, \2 is the contents (minus 2 characters) and \3 is the last 2 characters and the ending delimiter. This regex allows for escaping of delimiters as long as the escape character is \ and the escape character hasn't been escaped. IE.,
'Valid'
'Valid \' String'
'Invalid ' String'
'Invalid \\' String'

Try this:
/<a(?:\s+(?!name)[^"'>]+(?:"[^"]*"|'[^']*')?)*\s+name=("[^"]*"|'[^']*')\s*>/im
Here you just have to strip the surrounding quotes:
substr($match[1], 1, -1)
But using a real parser like DOMDocument would be certainly better that this regular expression approach.

Use [] to match character sets:
$p = "%<a.*\s+name=['\"](.*)['\"]\s*>(?:.*)</a>%im";

Your current solution won't match anchors with other attributes following 'name' (e.g. <a name="foo" id="foo">).
Try:
$regex = '%<a\s+\S*\s*name=["']([^"']+)["']%i';
This will extract the contents of the 'name' attribute into the back reference $1.
The \s* will also allow for line breaks between attributes.
You don't need to finish off with the rest of the 'a' tag as the negated character class [^"']+ will be lazy.

Here's another approach:
$rgx='~<a(?:\s+(?>name()|\w+)=(?|"([^"]*)"|\'([^\']*)\'))+?\1~i';
I know this question is old, but when it resurfaced just now I thought up another use for the "empty capturing groups as checkboxes" idiom from the Cookbook. The first, non-capturing group handles the matching of all "name=value" pairs under the control of a reluctant plus (+?). If the attribute name is literally name, the empty group (()) matches nothing, then the backreference (\1) matches nothing again, breaking out of the loop. (The backreference succeeds because the group participated in the match, even though it didn't consume any characters.)
The attribute value is captured each time in group #2, overwriting whatever was captured on the previous iteration. (The branch-reset construct ((?|(...)|(...)) enables us to "re-use" group #2 to capture the value inside the quotes, whichever kind of quotes they were.) Since the loop quits after the name name comes up, the final captured value corresponds to that attribute.
See a demo on Ideone

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

What is the use of '\G' anchor in regex? - php

Related

Using RegEx to find a string (as variable number)

(PHP) How to find words beginning with a pattern and replace all of them?

PHP regular expressions - pattern error

Getting regular expression

Regex Question: Matching this pattern with hard or soft quotes

Categories

Resources