Ignore whitespace when using preg_match - php

I'm using preg_match to try to capture the 'Data' in this HTML structure, but currently it's not returning anything. I think this may be down to the whitespace.
What's wrong with the preg_match?
html
<td><strong>Title</strong></td>
<td>Data</td>
php
preg_match("~<td><strong>Title</strong></td>
<td>([a-zA-Z0-9 -_]+)</td>~", $html, $match);

Instead of trying to reproduce the exact sequence of whitespace (which may be hard or even impossible due to line endings), just use \s* to indicate "any number (including zero) of whitespace characters" - this includes spaces, tabs, newlines, carriage returns... exactly what you need here.

Sorry, I did not test this before. \s* matches zero or more whitespace characters, so it is your solution here.
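// The hyphen is moved to the end of the class so it is a literal "-" and not the range from " " to "_".
preg_match("/<td><strong>Title<\/strong><\/td>\s*<td>([a-zA-Z0-9 _-]+)<\/td>/",
$html, $match);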
Tested it out. It works now :)
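As a minimal sketch of the \s* approach, the snippet below runs the idea against a sample string with a Windows line ending and indentation between the two cells. The helper name extract_title_data, the sample input, and the simpler [^<]+ capture are assumptions made for illustration, not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns the cell after the "Title" cell.
function extract_title_data(string $html): ?string
{
    // "~" as the delimiter avoids escaping the slashes in the closing tags.
    // \s* absorbs any mix of spaces, tabs, "\r" and "\n" between the two cells.
    $pattern = '~<td><strong>Title</strong></td>\s*<td>([^<]+)</td>~';

    return preg_match($pattern, $html, $match) === 1 ? $match[1] : null;
}

// Sample input with "\r\n" and indentation between the cells.
$html = "<td><strong>Title</strong></td>\r\n\t  <td>Data</td>";
var_dump(extract_title_data($html)); // string(4) "Data"
```

Returning null when nothing matches keeps the "no match" case distinct from an empty cell.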

If you want to get data from an html file, an xml parser can be a lot better.
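Anyway, keep the modifiers apart: m only makes ^ and $ match at the start and end of each line, and s makes the dot (.) match newlines too. Neither is needed for \s or a literal line break in the pattern to match a newline; your pattern most likely fails because the literal line break has to match the whitespace in the HTML exactly.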
See http://php.net/manual/en/reference.pcre.pattern.modifiers.php
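A quick way to check what the m and s modifiers change is to run a few patterns against a two-line string. This is only a sketch with a made-up sample string; the variable names are illustrative.

```php
<?php
$text = "first\nsecond";

$dotPlain  = preg_match('/first.second/', $text);   // 0: "." does not match "\n" by default
$dotAll    = preg_match('/first.second/s', $text);  // 1: "s" lets "." match "\n"
$caretOnce = preg_match('/^second/', $text);        // 0: "^" only matches at the very start
$caretLine = preg_match('/^second/m', $text);       // 1: "m" lets "^" match after each "\n"
$spaces    = preg_match('/first\s*second/', $text); // 1: "\s" matches "\n" without any modifier

var_dump($dotPlain, $dotAll, $caretOnce, $caretLine, $spaces);
```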

Use s modifier
Read more about pattern modifiers in the PHP manual.
preg_match_all('/<td><strong>Title<\/strong><\/td>.*<td>(.*)<\/td>/iUs',$cnt,$preg);
print_r($preg);
Output:
Array
(
[0] => Array
(
[0] => <td><strong>Title</strong></td>
<td>Data</td>
)
[1] => Array
(
[0] => Data
)
)
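The U modifier matters as soon as more cells follow the one you want. The sketch below, using a made-up input with an extra cell, compares the greedy pattern, the U version, and an equivalent with explicit lazy quantifiers.

```php
<?php
// Hypothetical input with a second cell after "Data".
$cnt = "<td><strong>Title</strong></td>\n<td>Data</td>\n<td>More</td>";

// Greedy: the first .* runs as far as possible, so the capture is the LAST cell.
preg_match('~<td><strong>Title</strong></td>.*<td>(.*)</td>~s', $cnt, $greedy);

// U flips the default, so .* becomes lazy and the capture is the FIRST cell.
preg_match('~<td><strong>Title</strong></td>.*<td>(.*)</td>~sU', $cnt, $ungreedy);

// Same result without U, by making each quantifier lazy explicitly.
preg_match('~<td><strong>Title</strong></td>.*?<td>(.*?)</td>~s', $cnt, $lazy);

echo $greedy[1], ' ', $ungreedy[1], ' ', $lazy[1], PHP_EOL; // More Data Data
```

Explicit .*? is often easier to read than U, because U silently inverts the meaning of every quantifier in the pattern.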

Related

How to use a regular expression with preg_match_all to split a string into blocks following a pattern

I'm going to be working with a long string of data that is serialized into blocks using a pattern (x:y).
However, I struggle with regular expressions, and am looking for resources to help identify how to construct a regex to identify any/all of these blocks as they appear in a string.
For example, given the following string:
$s = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';
Note: the f:json at the end is missing a space on purpose, because the format might vary with how the string is eventually given to me. Each block might be spaced, and they might not.
How would I identify each block of x:y to end with the below result:
Array
(
[0] => t:user
[1] => c:red
[2] => t:admin
[3] => n:"bob doe"
[4] => s:expression
[5] => f:json
)
I've tested various expressions using my limited knowledge, but have not been terribly successful.
I can successfully match the pattern using something like this:
^[ctrns]:.+
But this unfortunately matches the entire string. The part I seem to be missing is how to break each block, while also maintaining the ability to keep spaces within the pairs (see n:"bob doe" example).
Any assistance would be super appreciated! Also, ideally any submission would be explained as to what each token in the expression was accomplishing so that I better my understanding of these techniques.
I've been using https://regexr.com/ to practice.
You may use this regex in preg_match_all:
[ctnsf]:(?:"[^"\\]*(?:\\.[^"\\]*)*"|\S+?(?=[ctnsf]:|\s|$))
RegEx Details:
[ctnsf]:: Match one of ctnsf characters followed by :
(?:"[^"\\]*(?:\\.[^"\\]*)*": Match a quoted substring. This takes care of escaped quotes as well.
|: OR
\S+?: Match 1+ not-whitespace characters (non-greedy)
(?=[ctnsf]:|\s|$): Positive lookahead to assert one of the conditions given in assertions.
Code:
$re = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/m';
$str = 't:user c:red t:admin n:"bob \\"doe" s:expressionsf:json';
preg_match_all($re, $str, $matches);
// Print the entire match result
print_r($matches[0]);
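For comparison, here is the same pattern run against the exact string from the question, as a small sketch. Note that the fifth block comes back as s:expressions, because that is what the sample string contains before f:json.

```php
<?php
// Same pattern as above; the m modifier is left out because no ^ or per-line $ is needed.
$re = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/';
$s  = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';

preg_match_all($re, $s, $matches);

// Prints the six blocks: t:user, c:red, t:admin, n:"bob doe", s:expressions, f:json
print_r($matches[0]);
```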

Regular expression searching special tag

I have a special tag in text, [Attachment: image;upload;url]. To parse it I need to find all of these tags, so I wrote this regular expression:
preg_match_all("/.*(\[Attachment: (.*);upload;(.*)\]).*/", $text, $matches);
It works fine and returns this:
Array
(
[0] => Array
(
Text
)
[1] => Array
(
[Attachment: image;upload;url]
)
[2] => Array
(
image
)
[3] => Array
(
url
)
)
But there is one problem: when the text contains two or more tags, it returns info only about the last tag found.
You should match only the tags, not the surrounding text:
"/\[Attachment: ([^;]*);upload;([^\]]*)\]/"
Instead of the negated character class you could also use .*? for non-greedy matching; however, I prefer the negated character class.
Remove the .* parts from both ends of the regex. The leading .* is greedy, so it consumes everything up to the last tag on the line, and the trailing .* consumes the rest of the line (by default the dot does not match newlines in PHP). After that it looks for more matches from the end of the line, but can't find any.
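The difference is easy to reproduce. The sketch below uses a made-up sample text with two tags and compares the pattern from the question with a tag-only pattern.

```php
<?php
// Hypothetical sample text containing two tags.
$text = 'a [Attachment: image;upload;url1] b [Attachment: file;upload;url2] c';

// Pattern from the question: the greedy leading .* swallows everything up to the last tag.
$n1 = preg_match_all('/.*(\[Attachment: (.*);upload;(.*)\]).*/', $text, $all);

// Tag-only pattern: each tag is matched on its own.
$n2 = preg_match_all('/\[Attachment: ([^;]*);upload;([^\]]*)\]/', $text, $tags, PREG_SET_ORDER);

echo $n1, ' ', $all[1][0], PHP_EOL; // 1 [Attachment: file;upload;url2]
foreach ($tags as $tag) {
    echo $tag[1], ' => ', $tag[2], PHP_EOL; // image => url1, then file => url2
}
```

PREG_SET_ORDER groups the results per tag, which is usually more convenient when each tag is processed separately.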
This regex should do it:
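// The square brackets must be escaped; unescaped, [...] is a character class that matches a single character.
$regex = '/\[Attachment: (.*?);(.*?);(.*?)\]/';
preg_match_all($regex, $string, $matches);
For me, this came back with what you wanted (3 results).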

Regex pattern for shortcodes in PHP

I have a problem with a regex I wrote to match shortcodes in PHP.
This is the pattern, where $shortcode is the name of the shortcode:
\[$shortcode(.+?)?\](?:(.+?)?\[\/$shortcode\])?
Now, this regex behaves pretty much fine with these formats:
[shortcode]
[shortcode=value]
[shortcode key=value]
[shortcode=value]Text[/shortcode]
[shortcode key1=value1 key2=value2]Text[shortcode]
But it seems to have problems with the most common format,
[shortcode]Text[/shortcode]
which returns as matches the following:
Array
(
[0] => [shortcode]Text[/shortcode]
[1] => ]Text[/shortcode
)
As you can see, the second match (which should be the text, as the first is optional) includes the end of the opening tag and all the closing tag but the last bracket.
EDIT: Found out that the match returned is the first capture, not the second. See the regex in Regexr.
Can you help with this please? I'm really crushing my head on this one.
In your regex:
\[$shortcode(.+?)?\](?:(.+?)?\[\/$shortcode\])?
The first capture group (.+?) must match at least 1 character.
The whole group is optional, but in this case it happens to match everything up to the last ].
The following regex works:
\[$shortcode(.*?)?\](?:(.+?)?\[\/$shortcode\])?
The * quantifier means 0 or more, while + means one or more.
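A minimal sketch of the corrected pattern in PHP follows. The helper name shortcode_pattern and the use of preg_quote are assumptions added for illustration; the regex itself is the one given above.

```php
<?php
// Hypothetical helper: builds the corrected pattern for a given shortcode name.
function shortcode_pattern(string $shortcode): string
{
    // preg_quote is an addition here, so a name containing regex characters stays literal.
    $name = preg_quote($shortcode, '/');

    return '/\[' . $name . '(.*?)?\](?:(.+?)?\[\/' . $name . '\])?/';
}

$pattern = shortcode_pattern('shortcode');

preg_match($pattern, '[shortcode]Text[/shortcode]', $a);
preg_match($pattern, '[shortcode key=value]', $b);

var_dump($a[1], $a[2]); // string(0) "" and string(4) "Text"
var_dump($b[1]);        // string(10) " key=value"
```

When the closing tag is absent, the second group does not take part in the match, so preg_match leaves index 2 out of the result array; check for it with isset() or ?? before reading it.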
Granted this is from C#, but
@"\[([\w-_]+)([^\]]*)?\](?:(.+?)?\[\/\1\])?"
should match any (?) possibly self-closing shortcode.
Or you could steal from wordpress: https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/shortcodes.php#L309
$pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';
$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);
if ( preg_match_all($pattern, $text, $match, PREG_SET_ORDER) )...

PHP regular expression not being matched - what is wrong?

I have the following regular expression:
"^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}"
I want to use it to be able to match strings like:
xabc:z123
However, when I try it with this regex tester, it does not match the pattern. Is it my pattern that is wrong, or is the online tester unreliable?
If my pattern is wrong, could someone point out why it is wrong?
Also, I want to make the pattern matching case insensitive - but I'm not too sure the best way to do that (thought better to ask rather than trial and error). How do I change the pattern so it matches irrespective of case?
Just add an i for case insensitive matching:
/^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}/i
By the way, your regular expression works!?
Output:
Array
(
[0] => xabc:z123
)
If you want to have something like:
Array
(
[0] => 'xabc:z123',
[1] => 'x',
[2] => 'abc'
...
)
You need to add groups using (), e.g.:
/^([x]{1})([a-z]{3,4}):([a-z0-9]{1,6})/i
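As a short sketch, this is what the grouped pattern returns in PHP for the sample string from the question:

```php
<?php
// Each pair of parentheses adds one entry to the result array, in order of its opening bracket.
preg_match('/^([x]{1})([a-z]{3,4}):([a-z0-9]{1,6})/i', 'xabc:z123', $m);

print_r($m); // [0] => xabc:z123, [1] => x, [2] => abc, [3] => z123
```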
In the tester, you have to enter the regex without the surrounding quotes. In PHP source code, you have to use quotes and a regex delimiter; the tester shows that in the code it generates:
$ptn = "/^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}/";
To make it case insensitive, you have two options. One is to add an i after the closing delimiter, as @middus's answer demonstrates. The other is to add (?i) to the regex itself:
(?i)^[x]{1}[a-z]{3,4}:[a-z0-9]{1,6}
The tester will accept it either way; if you don't add the delimiters yourself it adds / to either end, which means any slashes in your regex need to be escaped (i.e., it doesn't escape them for you). Be aware that PHP allows you to use other characters as the delimiters, but that tester only recognizes /.
Some further notes:
To match a single x, all you need is x. The square brackets are unnecessary when there's only one letter inside them, and the {1} quantifier never has any effect--it's pure clutter.
If you're using the regex to validate the string, you may want to add a $ anchor to the end.
End result:
/^x[a-z]{3,4}:[a-z0-9]{1,6}$/i
Here is another tester that lets you choose your own delimiters, among other things.
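To illustrate the points above, this sketch compares the i flag, the inline (?i) form, and the effect of the $ anchor on three made-up inputs. The variable names are illustrative only.

```php
<?php
$strict = '/^x[a-z]{3,4}:[a-z0-9]{1,6}$/i';    // "i" flag after the closing delimiter
$inline = '/(?i)^x[a-z]{3,4}:[a-z0-9]{1,6}$/'; // same effect via inline (?i)
$loose  = '/^x[a-z]{3,4}:[a-z0-9]{1,6}/i';     // no $ anchor, so trailing text is allowed

foreach (['xabc:z123', 'XABC:Z123', 'xabc:z123!!'] as $input) {
    printf(
        "%-12s strict=%d inline=%d loose=%d\n",
        $input,
        preg_match($strict, $input),
        preg_match($inline, $input),
        preg_match($loose, $input)
    );
}
// xabc:z123    strict=1 inline=1 loose=1
// XABC:Z123    strict=1 inline=1 loose=1
// xabc:z123!!  strict=0 inline=0 loose=1
```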

Php preg_match help

I am trying to find a php preg_match that can match
test1 test2[...] but not test1 test2 [...]
and return test2[...] as the output in $matches.
I tried
preg_match('/^[a-zA-Z0-9][\[](.*)[\]]$/i',"test1 test2[...]", $matches);
But it matches both cases and return the full sentence.
Any help appreciated.
preg_match('/([a-zA-Z0-9]+[\[][^\]]+[\]])$/i',"test1 test2[...]", $matches);
Notice the + after [a-zA-Z0-9]: it means one or more alphanumeric characters.
The ( and ) around the whole expression capture the entire match.
Since your content is inside [], I have changed .* to [^\]]+. Regular expressions are greedy, so with input such as test2[.....] test3[sadsdasdasdad] a .* would capture up to the final ].
Also note that because you are using $, the pattern only matches at the end of the string; I am not sure whether that is what you intend.
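A quick check of this pattern against both inputs from the question, as a sketch:

```php
<?php
$pattern = '/([a-zA-Z0-9]+[\[][^\]]+[\]])$/i';

// No space before "[": the word and the bracketed part are captured together.
$hit = preg_match($pattern, 'test1 test2[...]', $matches);

// Space before "[": no alphanumeric character directly precedes "[", so nothing matches.
$miss = preg_match($pattern, 'test1 test2 [...]', $none);

var_dump($hit, $matches[1]); // int(1) and string(10) "test2[...]"
var_dump($miss);             // int(0)
```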
