preg_match with multiple find - php

i have this code
$a='-t40-';
preg_match('/^-t(.*?)-$/', $a,$match);
var_dump($match);
Result:
array(2) { [0]=> array(1) { [0]=> string(5) "-t40-" }
[1]=> array(1) { [0]=> string(2) "40" } }
if i add some text after last "-" code will not be valid.
if $a='-t40-some text'; i need a result similar with:
array(3) { [0]=> array(1) { [0]=> string(5) "-t40-" }
[1]=> array(1) { [0]=> string(2) "40" }
[2]=> array(1) { [0]=> string(9) "some text" }}
How to edit pattern to find "some text"?
Thanks in advance.

$a='-t40-some text';
preg_match('/^-t(.*?)-(.*?)$/', $a,$match);
var_dump($match);
Output:
array(3) {
[0]=>
string(14) "-t40-some text"
[1]=>
string(2) "40"
[2]=>
string(9) "some text"
}
Explanation:
^ : beginning of line
-t : literally "-t"
(.*?) : group 1, 0 or more any charater but newline, not greedy
- : literally "-"
(.*?) : group 2, 0 or more any charater but newline, not greedy
$ : end of line

Related

Combine PREG_SPLIT_DELIM_CAPTURE results

I'm splitting a string following this format:
| + anything goes here + single space
The following regular expression corresponds to said pattern:
/(\|\S*)/
Using preg_split with PREG_SPLIT_DELIM_CAPTURE oddly returns the delimiter into two parts. Is there a flag or option to combine these resulting outputs?
$string = "|one |two |three this is a phrase |four";
$result = preg_split('/(\|\S*)/', $string, NULL, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
What I get:
array(7) {
[0]=>
string(4) "|one"
[1]=>
string(1) " "
[2]=>
string(4) "|two"
[3]=>
string(1) " "
[4]=>
string(6) "|three"
[5]=>
string(18) " this is a phrase "
[6]=>
string(5) "|four"
}
What I want:
array(5) {
[0]=>
string(5) "|one "
[1]=>
string(5) "|two "
[2]=>
string(7) "|three "
[3]=>
string(17) "this is a phrase "
[4]=>
string(5) "|four"
}
Simply catch another whitespace at the end of the word, and you'll get this:
/(\|\S*\h*)/ || /(\|\S*\s*)/
So your code will be:
<?php
$string = "|one |two |three this is a phrase |four";
$result = preg_split('/(\|\S*\s*)/', $string, NULL, PREG_SPLIT_NO_EMPTY |
PREG_SPLIT_DELIM_CAPTURE);
var_dump ($result);
Regex 101: https://regex101.com/r/m5M7Dv/1
Result
array(5) { [0]=> string(5) "|one " [1]=> string(5) "|two " [2]=> string(7) "|three " [3]=> string(17) "this is a phrase " [4]=> string(5) "|four" }

Regular expression that match letters from all language php

im trying for few hours to find the right regular expression in php to match any language letters but to prevent it to allow space
i have try this
[^\p{L}]
this is ok but it look like it allow the space
then i have try this
[^\w_-]
and it still look that it allow space
anyone can help with this please ?
You need to specify the Unicode modifier u to get Unicode character properties in PCRE.
For example...
$pattern = "/([\p{L}]+)/u";
$string = "你好,世界!Привет мир! !مرحبا بالعالم";
if (preg_match_all($pattern, $string, $match)) {
var_dump($match);
}
Gives us...
array(2) {
[0]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
[1]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
}

How to parse column separated key-value text with possible multiline strings

I need to parse the following text:
First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value
Value is a string OR multiline string, at the same time value could contain "key: blablabla" substring. Such subsctring should be ignored (not parsed as a separate key-value pair).
Please help me with regex or other algorithm.
Ideal result would be:
$regex = "/SOME REGEX/";
$matches = [];
preg_match_all($regex, $html, $matches);
// $mathes has all key and value parsed pairs, including multilines values
Thank you.
I tried with simple regexes but result is incorrect, because I don't know how to handle multilines:
$regex = "/(.+?): (.+?)/";
$regex = "/(.+?):(.+?)\n/";
...
You can do it with this pattern:
$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';
preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);
print_r($matches);
You can sort of do it, as long as you consider a single word followed by a colon at the start of a line to be a new key start:
$data = 'First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value';
preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);
var_dump($matches);
This gives the following result:
array(4) {
[0]=>
array(4) {
[0]=>
string(10) "First: 1
"
[1]=>
string(11) "Second: 2
"
[2]=>
string(86) "Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(13) "Fourth: value"
}
[1]=>
array(4) {
[0]=>
string(5) "First"
[1]=>
string(6) "Second"
[2]=>
string(9) "Multiline"
[3]=>
string(6) "Fourth"
}
[2]=>
array(4) {
[0]=>
string(3) "1
"
[1]=>
string(3) "2
"
[2]=>
string(75) "blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(5) "value"
}
[3]=>
array(4) {
[0]=>
string(7) "Second:"
[1]=>
string(10) "Multiline:"
[2]=>
string(7) "Fourth:"
[3]=>
string(0) ""
}
}

PHP Regex - Match multiple possibilities (pipe)

I want to grab all IDs (integers) from several URLs within a text. These URLs could look like these:
http://url.tld/index.php/p1
http://url.tld/p2#abc
http://url.tld/index.php/Page/3-xxx
http://url.tld/Page/4
For this, I've built two regexes (the URLs are enclosed by an URL bbcode):
\[url\](http\://url\.tld/index\.php/p(\d+).*?\)[/url\]
\[url\](http\://url\.tld(?:/index\.php)?/Page/(\d+).*?\)[/url\]
However, if i do a preg_match_all with every single regex, I get an array that looks like this (and which is correct):
array(3) {
[0]=>
array(2) {
[0]=>
string(62) "[url]http://url.tld/index.php/Page/6-fdgfh/[/url]"
[1]=>
string(50) "[url]http://url.tld/Page/7[/url]"
}
[1]=>
array(2) {
[0]=>
string(51) "http://url.tld/index.php/Page/6-fdgfh/"
[1]=>
string(39) "http://url.tld/Page/7"
}
[2]=>
array(2) {
[0]=>
string(1) "6"
[1]=>
string(1) "7"
}
}
But if I combine both regexes with a pipe:
\[url\](http\://url\.tld/index\.php/p(\d+).*?|http\://url\.tld(?:/index\.php)?/Page/(\d+).*?)\[/url\]
it builds an array like this (which is wrong):
array(4) {
[0]=>
array(3) {
[0]=>
string(71) "[url]http://url.tld/index.php/p9-abc#hashtag[/url]"
[1]=>
string(62) "[url]http://url.tld/index.php/Page/6-fdgfh/[/url]"
[2]=>
string(50) "[url]http://url.tld/Page/7[/url]"
}
[1]=>
array(3) {
[0]=>
string(60) "http://url.tld/index.php/t9-abc#hashtag"
[1]=>
string(51) "http://url.tld/index.php/Page/6-fdgfh/"
[2]=>
string(39) "http://url.tld/Page/7"
}
[2]=>
array(3) {
[0]=>
string(1) "9"
[1]=>
string(0) ""
[2]=>
string(0) ""
}
[3]=>
array(3) {
[0]=>
string(0) ""
[1]=>
string(1) "6"
[2]=>
string(1) "7"
}
}
====
So, my question is: How can I fix this? What I need is the array structure from the first example, while using both regular expressions as one regular expression, because I need a consistent structure to do a preg_replace_callback later.
I think you're looking for the Branch Reset group:
\[url]((?|http://url\.tld/index\.php/p(\d+).*?|http://url\.tld(?:/index\.php)?/Page/(\d+).*?))\[/url]
Or, for the line-noise-challenged among us:
\[url]
(
(?|
http://url\.tld/index\.php/p(\d+)[^[]*
|
http://url\.tld(?:/index\.php)?/Page/(\d+)[^[]*
)
)
\[/url]
This captures the numbers in group #2, no matter which part of the regex matched it. The whole URL is still captured in group #1.

PHP preg_match get content between

<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need get the content between <!--:lt--> and <!--:-->
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
added delimiters
added a match group
added ? to make it ungreedy
added [space] (there is a space in Round lighting)
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
Which will match all of them. Gives:
array(3) { [0]=> array(3) { [0]=> string(37) "Apvalus šviestuvas" [1]=> string(53) "Круглый Светильник" [2]=> string(32) "Round lighting" } [1]=> array(3) { [0]=> string(2) "en" [1]=> string(2) "ru" [2]=> string(2) "lt" } [2]=> array(3) { [0]=> string(19) "Apvalus šviestuvas" [1]=> string(35) "Круглый Светильник" [2]=> string(14) "Round lighting" } }
Use this Pattern (?<=<!--:lt-->)(.*)(?=<!--:-->)
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*)(?=<!--:-->)~', $string, $match);
var_dump($match);

Categories