PHP preg_match regex capturing pattern in a string - php

I can't seem to get Regular Expressions right whenever I need to use them ...
Given a string like this one:
$string = 'text here [download="PDC" type="A"] and the text continues [download="PDS" type="B"] and more text yet again, more shortcodes might exist ...';
I need to print the "text here" part, then execute a mysql query based on the variables "PDC" and "A", then print the rest of the string... (repeating all again if more [download] exist in the string).
So far I have the following regex
$regex = '/(.*?)[download="(.*?)" type="(.*?)"](.*?)/';
preg_match($regex,$string,$res);
print_r($res);
But this is only capturing the following:
Array ( [0] => 111111 [1] => 111111 [2] => )
I'm using preg_match() ... should I use preg_match_all() instead? Anyway ... the regex is surely wrong... any help ?

[ opens a character class and ] closes it. Characters with a special meaning like these need to be either escaped or wrapped in a \Q...\E block in PCRE regex:
/(.*?)\Q[download="\E(.*?)" type="(.*?)"](.*?)/
The [ before download must be taken literally, hence the \Q...\E around it.
Also check the attribute name: the original pattern was looking for "tipo", while the string contains "type".

Try it with this "little" one:
/(?P<before>(?:(?!\[download="[^"]*" type="[^"]*"\]).)*)\[download="(?P<download>[^"]*)" type="(?P<type>[^"]*)"\](?P<after>(?:(?!\[download="[^"]*" type="[^"]*"\]).)*)/
It will provide you the keys before, after, download and type in the matches result.
Test it here: http://www.regex101.com/r/mF2vN5
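For the overall task in the question (print the text, look something up for each shortcode, then continue with the rest), one option is to let preg_replace_callback() walk the shortcodes. This is only a minimal sketch and not the method from the answers above: lookup_download() is a hypothetical placeholder for the MySQL query, the sample string is shortened, and the pattern assumes the attributes always appear in the order download, type.

```php
<?php
$string = 'text here [download="PDC" type="A"] and the text continues [download="PDS" type="B"] and more text';

// Hypothetical stand-in for the database lookup described in the question.
function lookup_download(string $name, string $type): string
{
    return '{' . $name . ':' . $type . '}';
}

// The brackets are escaped so they are matched literally.
$pattern = '/\[download="([^"]*)" type="([^"]*)"\]/';

// Each shortcode is replaced by whatever the callback returns;
// the text between shortcodes is left untouched.
$output = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return lookup_download($m[1], $m[2]);
    },
    $string
);

echo $output, "\n";
```

The callback receives the same array preg_match() would fill in, so $m[1] and $m[2] are the download and type values.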

Related

All caps string with space in php

I need to extract a string like "THE NEED OF FOLLOWING A RELIGION " from a larger string.
I can extract the individual words (THE, NEED, OF, ...), but I need the complete run of capital letters, "THE NEED OF FOLLOWING A RELIGION", and have not been able to get it. Please help.
preg_match_all("/[A-Z]*/", $html, $out);
Thanks
A very basic modification to the original code finds runs of two or more capital letters or whitespace characters at a time.
$str='This is a string WITH MIXED CASE words and WE ONLY WANT capitals';
preg_match_all("/[A-Z\s]{2,}/", $str, $out);
echo '<pre>',print_r($out,true),'</pre>';
outputs (each match also includes the whitespace around it, so trim() the results if that matters):
Array
(
[0] => Array
(
[0] => WITH MIXED CASE
[1] => WE ONLY WANT
)
)
Your regex just misses a delimiter condition:
each word has to be either followed or preceded by a space.
Converting the sentence above to a regex, we get
[A-Z]*(?=\s)|(?<=\s)[A-Z]*
The regex above matches any of
\sWORD
WORD\s
\sWORD\s
See DEMO.
You can add the space to your class, like [A-Z ]. Now you get all the runs of capitalized words, but also a bunch of single spaces. To avoid the single spaces, use ([A-Z]+[A-Z ]*[A-Z]). I added the () to capture the matched results.
You can check it in action here.
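As a quick check of that last pattern, here is a small sketch using the sample string from the earlier answer (the string is only an assumption about what the real input looks like). Because the pattern must start and end on a capital letter, the matches carry no surrounding spaces; note that it needs at least two capitals, so a lone capital letter on its own is not matched.

```php
<?php
$str = 'This is a string WITH MIXED CASE words and WE ONLY WANT capitals';

// Start on capitals, allow capitals and spaces in between, end on a capital.
preg_match_all('/[A-Z]+[A-Z ]*[A-Z]/', $str, $out);

print_r($out[0]);
```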

preg_match returns different symbols than input string

[Resolved] Adding the modifier /u to the regular expression fixes this issue if anyone is struggling with this. Credit to M.I. in the comments :)
Consider the following code:
var_dump('Trimiteţi');
preg_match('/^([\p{L}]+)/', 'Trimiteţi', $matches);
print_r($matches);
I am using it to filter a word that might contain non-Latin characters via \p{L}. Also notice that I don't use the end-of-string anchor $ in the preg_match call.
Now to the problem, when executing the code locally I receive this output:
string 'Trimiteţi' (length=10)
Array ( [0] => TrimiteÅ [1] => TrimiteÅ )
I tried executing the code in the PHP sandbox, and it outputs something similar:
string(10) "Trimiteţi"
Array
(
[0] => Trimite�
[1] => Trimite�
)
Notice that at least this time it didn't ruin the original var_dump word.
What is going on? Why does using preg_match change the word? The worst part is that if I add $ to the end of the regular expression, it will NOT MATCH, presumably because those transformed symbols cannot be followed by the end of the string. Please help.
Edit: the file encoding that I'm running is set to "text/x-php; charset=utf-8"
Edit2: Additionally, I used regex101.com, and when using REGULAR EXPRESSION "^[\p{L}]+$" and word "Trimiteţi" it seems to match. You can even switch the REGULAR EXPRESSION TO "^([\p{L}]+)$", adding the capturing group, and the site outputs:
MATCH 1
1. [0-9] `Trimiteţi`
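The fix mentioned at the top of this question can be sketched as follows. It assumes the source file is saved as UTF-8, as the question states. Without /u, PCRE treats the subject as single bytes, so the match can end in the middle of the two-byte character ţ, which is consistent with the truncated output shown above; with /u, both pattern and subject are read as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8, as stated in the question.
$word = 'Trimiteţi';

// Without /u the subject is processed byte by byte.
preg_match('/^([\p{L}]+)/', $word, $bytes);

// With /u, \p{L} sees whole UTF-8 characters, so the $ anchor can be used too.
preg_match('/^([\p{L}]+)$/u', $word, $chars);

var_dump(strlen($bytes[1]));   // bytes matched without /u
var_dump($chars[1] === $word); // bool(true)
```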

php preg_match_all between ... and

I'm trying to use preg_match_all to match anything between ... and ..., where the text in between can wrap onto another line. I've done a number of searches on Google and tried different combinations, and nothing is working. I have tried this:
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...
The s modifier allows for . to include new line characters so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ act on a per-line basis rather than per string (since your pattern has no ^ or $, it has no effect here).
You can read more about the modifiers here.
Note that the . needs to be escaped as well, because it is a special character meaning any character. The ? after the .* makes it non-greedy, so it stops at the first closing ... it finds. The {3} means three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
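The difference between the two modifiers can be checked with a small sketch like this one. The sample $rawdata is made up for illustration: one block on a single line and one block that wraps onto a second line.

```php
<?php
$rawdata = "...this is a test...\n...this is a\ntest that wraps...";

// /m only changes ^ and $, so . still stops at the newline
// and the wrapped block is not found.
preg_match_all('/\.{3}(.*?)\.{3}/m', $rawdata, $withM);

// /s lets . match newlines, so the wrapped block is captured too.
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $withS);

print_r($withM[1]); // one capture
print_r($withS[1]); // two captures
```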
Please escape the literal dots, since the dot is also a reserved character in regular expressions, as you use it that way yourself inside your pattern:
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m);
If what you meant is that there are line breaks within the content to match, you have to allow for them explicitly. Inside a character class a dot is only a literal dot, so use an alternation instead:
preg_match_all('/\.\.\.((?:.|\n)*?)\.\.\./m', $rawdata, $m);
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html
You're almost there;
you just need to update your RE:
/\.{3}(.*)\.{3}/m
RE breakdown
/: pattern delimiter (marks the start and end of the pattern)
\.: match a literal .
{3}: match exactly 3 of the preceding token (in this case exactly 3 dots)
(.*): capture everything between the opening and closing ...
m: multiline mode; ^ and $ match at line boundaries (it does not make . match newlines; that is the s modifier)
and when you're putting all things together, you'll have this
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
[0] => Array
(
[0] => ...this is a test...
)
[1] => Array
(
[0] => this is a test
)
)
DEMO

preg_split with two patterns (one of them quoted)

I would like to split a string in PHP containing quoted and unquoted substrings.
Let's say I have the following string:
"this is a string" cat dog "cow"
The resulting array should look like this:
array (
[0] => "this is a string"
[1] => "cat"
[2] => "dog"
[3] => "cow"
)
I'm struggling a bit with regex and I'm wondering if it is even possible to achieve with just one regex/preg_split-Call...
The first thing I tried was:
[[:blank:]]*(?=(?:[^"]*"[^"]*")*[^"]*$)[[:blank:]]*
But this splits only array[0] and array[3] correctly; the rest is split character by character.
Then I found this link:
PHP preg_split with two delimiters unless a delimiter is within quotes
(?=(?:[^"]*"[^"]*")*[^"]*$)
This seems like a good starting point to me. However, the result for my example is the same as with the first regex.
I tried combining both: first the one for quoted strings, and then a second sub-regex which should omit quoted strings (hence the [^"]):
(?=(?:[^"]*"[^"]*")*[^"]*$)|[[:blank:]]*([^"].*[^"])[[:blank:]]*
Therefore 2 questions:
Is it even possible to achieve what I want with just one regex/preg_split-Call?
If yes, I would appreciate a hint on how to assemble the regex correctly
Since matches cannot overlap, you could use preg_match_all like this:
preg_match_all('/"[^"]*"|\S+/', $input, $matches);
Now $matches[0] should contain what you are looking for. The regex will first try to match a quoted string, and then stop. If that doesn't do it it will just collect as many non-whitespace characters as possible. Since alternations are tried from left to right, the quoted version takes precedence.
EDIT: This will not get rid of the quotes though. To do this, you could use capturing groups:
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $matches);
Now $matches[1] will contain exactly what you are looking for. The (?| is there so that both capturing groups end up at the same index.
EDIT 2: Since you were asking for a preg_split solution, that is also possible. We can use a lookahead, that asserts that the space is followed by an even number of quotes (up until the end of the string):
$result = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);
Of course, this will not get rid of the quotes, but that can easily be done in a separate step.
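Both approaches can be tried side by side on the string from the question. This is a small sketch: the quote trimming at the end is the "separate step" mentioned above, done here with trim(), and the trailing [^"]* in the lookahead is there so that unquoted text at the end of the input is also allowed.

```php
<?php
$input = '"this is a string" cat dog "cow"';

// Branch reset (?| ... ) puts the capture of either alternative in group 1.
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $matches);
print_r($matches[1]);

// Split on whitespace that is followed by an even number of quotes.
$parts = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);

// Separate step: strip the surrounding quotes from each piece.
$parts = array_map(function ($part) {
    return trim($part, '"');
}, $parts);
print_r($parts);
```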

PHP REGEX: space & line break in parameters (custom tags)

this is my custom tag
[extract=A:B(
<div>
<p>Some content...</p>
</div>
)]
The word extract stays as it is.
Value A is a string (one word, no spaces, no line breaks).
Value B contains HTML enclosed in (). It will contain line breaks.
I am not good with regular expressions, but this is basically what I want:
\/[extract=(.*?):(.*?)/]\
I need the appropriate pattern and a foreach loop over the preg_match_all() results to return A & B.
Try this:
preg_match_all('/\[extract=(?<class>\w+):(?<method>\w+)\((?<html>.*?)\)\]/s', $content, $matches);
print_r($matches['class']);
print_r($matches['method']);
print_r($matches['html']);
Should output:
Array
(
[0] => A
)
Array
(
[0] => B
)
Array
(
[0] =>
<div>
<p>Some content...</p>
</div>
)
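Since the question also asks for a foreach loop, here is a minimal sketch built on the pattern above. The $content string is a made-up sample mirroring the tag from the question, and PREG_SET_ORDER is used so that each element of $tags holds the captures of one tag.

```php
<?php
// Sample input modelled on the tag in the question.
$content = "[extract=A:B(\n<div>\n<p>Some content...</p>\n</div>\n)]";

$pattern = '/\[extract=(?<class>\w+):(?<method>\w+)\((?<html>.*?)\)\]/s';

// PREG_SET_ORDER groups the captures per tag, which suits a foreach loop.
preg_match_all($pattern, $content, $tags, PREG_SET_ORDER);

foreach ($tags as $tag) {
    echo 'A: ', $tag['class'], "\n";
    echo 'B: ', $tag['method'], "\n";
    echo 'HTML: ', trim($tag['html']), "\n";
}
```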
This may not be perfect, but seems to work in very quick and limited testing. If nothing else, it might help you get to a better solution.
/^(?:\[extract=)(\w+):(\w+\(.*\))\]$/s
Note that the trailing s flag is used to make the dot match all characters including new lines.
I don't fully understand your question. Where should the string input for "A" be? In the place of letter "A"?
If this is what you want, then the solution is:
preg_match_all('/\[extract=([^\s]+?):(.+)\]/s', 'your custom tag', $result);
So what you might be looking for is the s modifier, which changes the meaning of the dot (".") so that it also matches line breaks.
I also recommend http://www.regular-expressions.info/ if you want to get more familiar with regular expressions.
