Regular expression searching special tag - php

I have a special tag in text, [Attachment: image;upload;url]. To parse it I need to find all of these tags, so I wrote this regular expression:
preg_match_all("/.*(\[Attachment: (.*);upload;(.*)\]).*/", $text, $matches);
It works fine and returns this:
Array
(
[0] => Array
(
Text
)
[1] => Array
(
[Attachment: image;upload;url]
)
[2] => Array
(
image
)
[3] => Array
(
url
)
)
But there is one problem: when the text contains two or more tags, it returns info only about the last tag found.

You should match only the tags, not the surrounding text:
"/\[Attachment: ([^;]*);upload;([^\]]*)\]/"
Instead of the negated character classes you could also use .*? for non-greedy matching; however, I prefer the negated character classes.
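As a minimal sketch of how that pattern behaves with more than one tag (the sample text and file names are made up for illustration):

```php
<?php
// Hypothetical sample text containing two tags.
$text = "Intro [Attachment: image;upload;a.png] middle [Attachment: video;upload;b.mp4] end";

// Negated character classes stop each capture at the next ';' or ']'.
$pattern = '/\[Attachment: ([^;]*);upload;([^\]]*)\]/';

// PREG_SET_ORDER groups the results per tag instead of per capture group.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the whole tag, $m[1] the type, $m[2] the URL part.
    echo $m[1], ' => ', $m[2], PHP_EOL;
}
```

With PREG_SET_ORDER each element of $matches describes one tag, which tends to be easier to loop over than the default layout.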

Remove the .* parts from the start and the end of the regex. The leading .* is greedy, so it consumes everything up to the last tag, and the trailing .* consumes the rest, so a single match swallows all of the other substrings that you want to find. (Or at least all the ones on the same line: by default the dot does not match a newline in PHP.) After that it looks for more matches from the end of that match, but can't find any.
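To see the effect, here is a small comparison on a made-up single-line string; the second pattern is just one possible tightened variant, not necessarily the one you will end up using:

```php
<?php
// Made-up single-line text with two tags.
$text = "a [Attachment: image;upload;one] b [Attachment: file;upload;two] c";

// Pattern from the question: the surrounding .* lets one match span the whole line.
$greedy = '/.*(\[Attachment: (.*);upload;(.*)\]).*/';
$countGreedy = preg_match_all($greedy, $text, $m1);

// Same idea without the surrounding .* and with lazy captures.
$tight = '/\[Attachment: (.*?);upload;(.*?)\]/';
$countTight = preg_match_all($tight, $text, $m2);

echo $countGreedy, ' match(es) with the surrounding .*, ', $countTight, ' without', PHP_EOL;
```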

This regex should do it:
$regex = '/\[Attachment: (.*?);(.*?);(.*?)\]/';
preg_match_all($regex, $string, $matches);
For me, this came back with what you wanted (3 results).

Related

How to use a regular expression with preg_match_all to split a string into blocks following a pattern

I'm going to be working with a long string of data that is serialized into blocks using a pattern (x:y).
However, I struggle with regular expressions, and am looking for resources to help me work out how to construct a regex that identifies each of these blocks as they appear in a string.
For example, given the following string:
$s = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';
Note: the f:json at the end is missing a space on purpose, because the format might vary with how the string is eventually given to me. Each block might be spaced, and they might not.
How would I identify each block of x:y to end with the below result:
Array
(
[0] => t:user
[1] => c:red
[2] => t:admin
[3] => n:"bob doe"
[4] => s:expression
[5] => f:json
)
I've tested various expressions using my limited knowledge, but have not been terribly successful.
I can successfully match the pattern using something like this:
^[ctrns]:.+
But this unfortunately matches the entire string. The part I seem to be missing is how to break each block, while also maintaining the ability to keep spaces within the pairs (see n:"bob doe" example).
Any assistance would be super appreciated! Also, ideally any submission would be explained as to what each token in the expression was accomplishing so that I better my understanding of these techniques.
I've been using https://regexr.com/ to practice.
You may use this regex in preg_match_all:
[ctnsf]:(?:"[^"\\]*(?:\\.[^"\\]*)*"|\S+?(?=[ctnsf]:|\s|$))
RegEx Details:
[ctnsf]:: Match one of ctnsf characters followed by :
(?:"[^"\\]*(?:\\.[^"\\]*)*": Match a quoted substring. This takes care of escaped quotes as well.
|: OR
\S+?: Match 1+ not-whitespace characters (non-greedy)
(?=[ctnsf]:|\s|$): Positive lookahead asserting that what follows is the start of the next block ([ctnsf]:), a whitespace character, or the end of the line.
Code:
$re = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/m';
$str = 't:user c:red t:admin n:"bob \\"doe" s:expressionsf:json';
preg_match_all($re, $str, $matches);
// Print the entire match result
print_r($matches[0]);

Using preg_match to find all text between brackets

I am trying to search a document for tags, to later replace them.
I am using preg_match, but am having some difficulty.
preg_match('/\[.*\]/', $haystack, $matches);
Searching through the following text
[TODAY_DATE]
Re: Demand for Payment
[ADDRESS]
[ADDRESS_LINE_2]
...etc
print_r($matches);
returns
Array ( [0] => [TODAY_DATE] )
How should I adjust my regex to return all matches?
Use preg_match_all. As the name suggests, it matches more than once.
preg_match_all('/\[.*?\]/', $haystack, $matches);
var_dump($matches);
Also use the reluctant quantifier .*? instead of the greedy one, so that each match stops at the first closing bracket.
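Since the goal is to replace the tags later, the same lazy pattern can also be passed to preg_replace_callback. This is only a sketch: the $values map and its contents are hypothetical, as the thread does not say where the replacement values come from.

```php
<?php
// Sample text from the question.
$haystack = "[TODAY_DATE]\nRe: Demand for Payment\n[ADDRESS]\n[ADDRESS_LINE_2]";

// Hypothetical replacement values; the real ones are not given in the thread.
$values = [
    'TODAY_DATE'     => '2024-01-01',
    'ADDRESS'        => '1 Example Street',
    'ADDRESS_LINE_2' => 'Example Town',
];

// Capture the tag name inside the brackets; leave unknown tags untouched.
$result = preg_replace_callback(
    '/\[(.*?)\]/',
    function (array $m) use ($values) {
        return $values[$m[1]] ?? $m[0];
    },
    $haystack
);

echo $result, PHP_EOL;
```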

Ignore whitespace when using preg_match

I'm using preg_match to try to capture the 'Data' in this HTML structure, but currently it's not returning anything. I think this may be down to the whitespace.
What's wrong with the preg_match?
html
<td><strong>Title</strong></td>
<td>Data</td>
php
preg_match("~<td><strong>Title</strong></td>
<td>([a-zA-Z0-9 -_]+)</td>~", $html, $match);
Instead of trying to reproduce the exact sequence of whitespace (which may be hard or even impossible due to line endings), just use \s* to indicate "any number (including zero) of whitespace characters" - this includes spaces, tabs, newlines, carriage returns... exactly what you need here.
Sorry, did not test before. \s* gives you 0 to infinity possible spaces, so it is your solution here.
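// The hyphen is moved to the end of the class so it is literal; " -_" would be a range from space to underscore.
preg_match("/<td><strong>Title<\/strong><\/td>\s*<td>([a-zA-Z0-9 _-]+)<\/td>/",
$html, $match);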
Tested it out. It works now :)
If you want to get data from an html file, an xml parser can be a lot better.
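A rough sketch of the parser approach, assuming PHP's DOM extension is enabled. The snippet from the question is wrapped in a table here so that it parses as table cells, and the XPath expression is just one way to pick the cell after the title:

```php
<?php
// HTML from the question, wrapped in <table><tr> for the parser (an assumption).
$html = "<table><tr><td><strong>Title</strong></td>\n<td>Data</td></tr></table>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Select the first <td> that follows the cell containing <strong>Title</strong>.
$nodes = $xpath->query('//td[strong="Title"]/following-sibling::td[1]');

$data = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $data, PHP_EOL;
```

Whitespace between the cells does not matter here, because the query works on elements rather than on the raw text.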
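Anyway, note that the m modifier is not what lets a pattern span several lines: it only makes ^ and $ match at line boundaries. A pattern containing \s or a literal newline can match across lines without it. What you may need is the s modifier, which makes the dot (.) match newlines too.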
See http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Use the s modifier, so that the dot also matches newlines.
Read more about modifiers in the PHP manual.
preg_match_all('/<td><strong>Title<\/strong><\/td>.*<td>(.*)<\/td>/iUs',$cnt,$preg);
print_r($preg);
Output:
Array
(
[0] => Array
(
[0] => <td><strong>Title</strong></td>
<td>Data</td>
)
[1] => Array
(
[0] => Data
)
)

Regex pattern for shortcodes in PHP

I have a problem with a regex I wrote to match shortcodes in PHP.
This is the pattern, where $shortcode is the name of the shortcode:
\[$shortcode(.+?)?\](?:(.+?)?\[\/$shortcode\])?
Now, this regex behaves pretty much fine with these formats:
[shortcode]
[shortcode=value]
[shortcode key=value]
[shortcode=value]Text[/shortcode]
[shortcode key1=value1 key2=value2]Text[/shortcode]
But it seems to have problems with the most common format,
[shortcode]Text[/shortcode]
which returns as matches the following:
Array
(
[0] => [shortcode]Text[/shortcode]
[1] => ]Text[/shortcode
)
As you can see, the second match (which should be the text, as the first is optional) includes the end of the opening tag and all the closing tag but the last bracket.
EDIT: Found out that the match returned is the first capture, not the second. See the regex in Regexr.
Can you help with this, please? I'm really racking my brain on this one.
In your regex:
\[$shortcode(.+?)?\](?:(.+?)?\[\/$shortcode\])?
The first capture group (.+?) matches at least 1 character.
The whole group is optional, but the regex engine tries to use it before trying to skip it, so in this case it matches everything up to the last ].
The following regex works:
\[$shortcode(.*?)?\](?:(.+?)?\[\/$shortcode\])?
The * quantifier means 0 or more, while + means one or more.
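A quick way to check the difference is to run both patterns against the problematic input. In this sketch the shortcode name is just the literal example from the question, and preg_quote is added as a precaution in case the name contains regex metacharacters:

```php
<?php
$name  = preg_quote('shortcode', '/');   // example shortcode name
$input = '[shortcode]Text[/shortcode]';

// Original pattern: the first group needs at least one character.
$plus = '/\[' . $name . '(.+?)?\](?:(.+?)?\[\/' . $name . '\])?/';
// Suggested pattern: the first group may match zero characters.
$star = '/\[' . $name . '(.*?)?\](?:(.+?)?\[\/' . $name . '\])?/';

preg_match($plus, $input, $a);
preg_match($star, $input, $b);

var_dump($a[1]);        // the stray capture described in the question
var_dump($b[1], $b[2]); // empty attributes, then the enclosed text
```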
Granted this is from C#, but
@"\[([\w-_]+)([^\]]*)?\](?:(.+?)?\[\/\1\])?"
should match any (?) possibly self-closing shortcode.
Or you could steal from wordpress: https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/shortcodes.php#L309
$pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';
$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);
if ( preg_match_all($pattern, $text, $match, PREG_SET_ORDER) )...

how to find multiple occurring tags in a text with php?

I have a simple PHP example of using preg_match_all:
$str = "
Line 1: This is a string
Line 2: [img] image_path [/img] Should not be [img] image_path2 [/img] included.
Line 3: End of test [img] image_path3 [/img] string.";
preg_match_all("~\[img](.+)\[/img]~i", $str, $m);
var_dump($m);
and I would like it to return
array(
[0] =>image_path
[1] =>image_path2
[2] =>image_path3
)
For some reason I don't get this result.
Any ideas?
Change it to this:
preg_match_all("~\[img](.+?)\[/img]~i", $str, $m);
var_dump($m[1]);
The reason you need the ? is to make it "non-greedy". With your code, it matches from the first opening tag to the last closing tag. The + and * operators are greedy by default, consuming as many characters as possible. The ? modifier stops this behaviour.
You need to dump $m[1] instead of $m, since the preg_match* functions also return the entire matched string (in $m[0]), not just the marked captures.
Live example: http://ideone.com/vXk9W
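Note that the captures still include the spaces around each path. A small sketch that trims them to get exactly the array asked for in the question (the $paths variable name is only illustrative):

```php
<?php
$str = "
Line 1: This is a string
Line 2: [img] image_path [/img] Should not be [img] image_path2 [/img] included.
Line 3: End of test [img] image_path3 [/img] string.";

// Lazy quantifier so each match stops at the nearest closing tag.
preg_match_all('~\[img](.+?)\[/img]~i', $str, $m);

// The captures still carry the surrounding spaces, so trim them.
$paths = array_map('trim', $m[1]);

print_r($paths);
```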
