I am trying to search a document for tags, to later replace them.
I am using preg_match, but am having some difficulty.
preg_match('/\[.*\]/', $haystack, $matches);
Searching through the following text
[TODAY_DATE]
Re: Demand for Payment
[ADDRESS]
[ADDRESS_LINE_2]
...etc
print_r($matches);
returns
Array ( [0] => [TODAY_DATE] )
How should I adjust my regex to return all matches?
Use preg_match_all. As the name suggests, it matches more than once.
preg_match_all('/\[.*?\]/', $haystack, $matches);
var_dump($matches);
Also use the reluctant (lazy) quantifier .*? instead of the greedy .*, so that two tags on the same line are not matched as a single tag.
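To make the difference concrete, here is a minimal sketch comparing the three calls. The sample text is the one from the question, plus one extra line containing two tags ([CITY] and [STATE] are made up for illustration) to show where the greedy pattern goes wrong.

```php
<?php
// Text from the question, plus a hypothetical last line with two tags on it.
$haystack = "[TODAY_DATE]\nRe: Demand for Payment\n[ADDRESS]\n[ADDRESS_LINE_2]\n[CITY], [STATE]";

// preg_match stops after the first match.
preg_match('/\[.*?\]/', $haystack, $first);

// preg_match_all collects every match; $all[0] holds the full matches.
preg_match_all('/\[.*?\]/', $haystack, $all);

// Greedy variant for comparison: on the last line, .* runs to the final ]
// and "[CITY], [STATE]" comes back as one match.
preg_match_all('/\[.*\]/', $haystack, $greedy);

print_r($all[0]);
print_r($greedy[0]);
```

Since . does not match a newline by default, the greedy version only misbehaves when a single line holds more than one tag.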
Related
I'm trying to use regular expression to match hashtags. When the language of a hashtag is English or Chinese, my code works fine. But when the language is Bengali, my code can't match the whole Bengali word.
Here is the code I'm testing with:
<?php
$hashtag = '#আয়াতুল_কুরসি';
preg_match('/(#\w+)/u', $hashtag, $matches);
print_r($matches);
?>
And the result is:
Array
(
[0] => #আয়
[1] => #আয়
)
I tried changing the pattern to '/(#\p{L}+)/u', but that didn't help.
The problem is that \w, even with the u flag, does not match all the combining marks (vowel signs and other diacritics) that Bengali text contains. You need to allow them explicitly with \p{M}:
preg_match('/#[\w\p{M}]+/u', $hashtag, $matches);
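A quick way to check this locally is to run both patterns against the hashtag from the question. This is only a sketch; exactly how much \w matches can depend on the PCRE version bundled with your PHP build, so the comparison below only verifies that the \p{M} variant covers the whole hashtag.

```php
<?php
$hashtag = '#আয়াতুল_কুরসি';

// \w alone: may stop at the first combining mark it does not cover.
preg_match('/#\w+/u', $hashtag, $plain);

// \w plus \p{M}: letters, digits, underscore and all combining marks.
preg_match('/#[\w\p{M}]+/u', $hashtag, $withMarks);

var_dump($plain[0]);
var_dump($withMarks[0] === $hashtag);
```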
I need the help of a regex wizard, or someone who knows more about this than me (which means there's lots of candidates :)
I am trying to match everything that occurs between the first and second slash, excluding those slashes, or nothing if there's no starting and trailing slash:
$subject = '/1234-abcd/blahblah';
$pattern = '/^\/(.*)\//';
preg_match($pattern, $subject, $matches);
print_r($matches);
Here are the results:
Array
(
[0] => /1234-abcd/
[1] => 1234-abcd
)
I'm close. $matches[1] has the result I'm after, but it is the first captured subpattern rather than the first array item ($matches[0]).
How do I exclude the starting and trailing slashes in this regex pattern?
Thanks!
You can use this regex:
$pattern = '#(?<=/)[^/]+#';
With preg_match, $matches[0] is then 1234-abcd. Use preg_match_all instead of preg_match if you want every segment.
PS: Note that you can also use explode to split your input by / and avoid using regex altogether.
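For comparison, here is a rough sketch of both approaches on the string from the question. The third pattern is my own stricter variant, assuming you really do want to require both the leading and the trailing slash; it is not part of the answer above.

```php
<?php
$subject = '/1234-abcd/blahblah';

// Lookbehind: $m[0] is the first run of non-slash characters after a slash.
preg_match('#(?<=/)[^/]+#', $subject, $m);

// explode: the leading slash produces an empty first element,
// so the segment you want sits at index 1.
$parts = explode('/', $subject);

// Stricter variant (assumption): anchor at the start, drop the leading
// slash from the match with \K, and require a trailing slash via lookahead.
preg_match('#^/\K[^/]+(?=/)#', $subject, $strict);

// No trailing slash here, so the strict pattern finds nothing.
$found = preg_match('#^/\K[^/]+(?=/)#', '/1234-abcd', $none);

echo $m[0], "\n", $parts[1], "\n", $strict[0], "\n";
```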
I have a special tag in my text, [Attachment: image;upload;url]. To parse it I need to find all of these tags, so I wrote this regular expression:
preg_match_all("/.*(\[Attachment: (.*);upload;(.*)\]).*/", $text, $matches);
It works fine and returns this:
Array
(
[0] => Array
(
Text
)
[1] => Array
(
[Attachment: image;upload;url]
)
[2] => Array
(
image
)
[3] => Array
(
url
)
)
But there is one problem: when the text contains two or more tags, it returns info only about the last tag found.
You should match only the tags, not the surrounding text:
"/\[Attachment: ([^;]*);upload;([^\]]*)\]/"
Instead of the negated character classes you could also use non-greedy matching with .*?; however, I prefer the character classes.
Remove the .* parts from the start and end of the regex. With them, the regex matches the whole line around a tag, including any other tags that you want to find, and because the leading .* is greedy, the tag that gets captured is the last one on the line. (By default . does not match newlines in PHP, so this happens per line.) After that it looks for more matches from the end of the line, but can't find any.
This regex should do it:
$regex = '/\[Attachment: (.*?);(.*?);(.*?)\]/';
preg_match_all($regex, $string, $matches);
For me, this came back with what you wanted (3 results).
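As a sanity check, here is a small sketch that runs the negated-character-class pattern over a text with two tags. The sample text and file names are invented for the example, and PREG_SET_ORDER is just one convenient way to get one array per tag.

```php
<?php
// Hypothetical sample text with two tags; the file names are made up.
$text = 'Intro [Attachment: image;upload;a.png] middle [Attachment: video;upload;b.mp4] end';

// Match only the tag itself, not the text around it.
$pattern = '/\[Attachment: ([^;]*);upload;([^\]]*)\]/';

// PREG_SET_ORDER groups the results per tag:
// $tag[0] is the whole tag, $tag[1] the type, $tag[2] the last field.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $tag) {
    echo "{$tag[1]} => {$tag[2]}\n";
}
```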
I have a problem with a regex I wrote to match shortcodes in PHP.
This is the pattern, where $shortcode is the name of the shortcode:
\[$shortcode(.+?)?\](?:(.+?)?\[\/$shortcode\])?
Now, this regex behaves pretty much fine with these formats:
[shortcode]
[shortcode=value]
[shortcode key=value]
[shortcode=value]Text[/shortcode]
[shortcode key1=value1 key2=value2]Text[/shortcode]
But it seems to have problems with the most common format,
[shortcode]Text[/shortcode]
which returns as matches the following:
Array
(
[0] => [shortcode]Text[/shortcode]
[1] => ]Text[/shortcode
)
As you can see, the second match (which should be the text, as the first is optional) includes the end of the opening tag and all the closing tag but the last bracket.
EDIT: Found out that the match returned is the first capture, not the second. See the regex in Regexr.
Can you help with this please? I'm really banging my head against the wall on this one.
In your regex:
\[$shortcode(.+?)?\](?:(.+?)?\[\/$shortcode\])?
The first capture group (.+?) must match at least 1 character.
The whole group is optional, but the ? after it is greedy, so the engine tries to match the group before skipping it. In this case the group succeeds by matching everything up to the last ].
The following regex works:
\[$shortcode(.*?)?\](?:(.+?)?\[\/$shortcode\])?
The * quantifier means 0 or more, while + means one or more.
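Here is a minimal sketch of the corrected pattern in use. shortcode_pattern() is a hypothetical helper I made up for the example, and running the name through preg_quote is my own addition so that special characters in it are treated literally.

```php
<?php
// Hypothetical helper: builds the corrected pattern for a shortcode name.
function shortcode_pattern(string $shortcode): string
{
    // Escape regex metacharacters (and the / delimiter) in the name.
    $name = preg_quote($shortcode, '/');
    return '/\[' . $name . '(.*?)?\](?:(.+?)?\[\/' . $name . '\])?/';
}

$pattern = shortcode_pattern('shortcode');

// Enclosing form: group 1 (attributes) is empty, group 2 is the content.
preg_match($pattern, '[shortcode]Text[/shortcode]', $enclosing);

// Self-closing form with a value: group 1 is "=value", group 2 is absent.
preg_match($pattern, '[shortcode=value]', $single);

// Attributes plus content.
preg_match($pattern, '[shortcode key=value]Text[/shortcode]', $both);

print_r($enclosing);
```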
Granted this is from C#, but
@"\[([\w-_]+)([^\]]*)?\](?:(.+?)?\[\/\1\])?"
should match any (?) possibly self-closing shortcode.
Or you could steal from wordpress: https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/shortcodes.php#L309
$pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';
$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);
if ( preg_match_all($pattern, $text, $match, PREG_SET_ORDER) )...
I need to extract the digits from the following string using regular expression:
pc 32444 xbox 43567
so my array will be
array ([0] => 32444 [1] => 43567)
Can someone help construct a preg_match for me?
Thanks!
Try this regular expression:
/\d+/
But you would need to use preg_match_all to get all matches:
preg_match_all('/\\d+/', $str, $matches);
$matches[0] will then contain the array you’re looking for.
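Put together as a short runnable sketch, using the string from the question. The array_map('intval', ...) step is optional and only there in case you want integers rather than strings.

```php
<?php
$str = 'pc 32444 xbox 43567';

// $matches[0] receives every run of digits, as strings.
preg_match_all('/\d+/', $str, $matches);

// Optional: convert the matched strings to integers.
$numbers = array_map('intval', $matches[0]);

print_r($matches[0]);
```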