Combine PREG_SPLIT_DELIM_CAPTURE results - php

I'm splitting a string following this format:
| + anything goes here + single space
The following regular expression corresponds to said pattern:
/(\|\S*)/
Using preg_split with PREG_SPLIT_DELIM_CAPTURE returns each delimiter and the space that follows it as two separate elements. Is there a flag or option to combine them?
$string = "|one |two |three this is a phrase |four";
$result = preg_split('/(\|\S*)/', $string, NULL, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
What I get:
array(7) {
[0]=>
string(4) "|one"
[1]=>
string(1) " "
[2]=>
string(4) "|two"
[3]=>
string(1) " "
[4]=>
string(6) "|three"
[5]=>
string(18) " this is a phrase "
[6]=>
string(5) "|four"
}
What I want:
array(5) {
[0]=>
string(5) "|one "
[1]=>
string(5) "|two "
[2]=>
string(7) "|three "
[3]=>
string(17) "this is a phrase "
[4]=>
string(5) "|four"
}

Simply include the trailing whitespace in the delimiter itself. Either of these patterns works (\h matches horizontal whitespace only, \s matches any whitespace):
/(\|\S*\h*)/
/(\|\S*\s*)/
So your code will be:
<?php
$string = "|one |two |three this is a phrase |four";
// -1 means "no limit"; passing NULL here is deprecated since PHP 8.1
$result = preg_split('/(\|\S*\s*)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
var_dump($result);
Regex 101: https://regex101.com/r/m5M7Dv/1
Result:
array(5) {
[0]=>
string(5) "|one "
[1]=>
string(5) "|two "
[2]=>
string(7) "|three "
[3]=>
string(17) "this is a phrase "
[4]=>
string(5) "|four"
}
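As a minimal sketch of the same idea, the split can be wrapped in a small helper. The function name splitPipeTokens is hypothetical (it does not come from the question), and the sketch assumes that only spaces and tabs should be absorbed into each token, hence \h rather than \s.

```php
<?php
// Hypothetical helper: split on "|word" tokens, keeping each token
// together with the horizontal whitespace that follows it.
function splitPipeTokens(string $input): array
{
    $parts = preg_split(
        '/(\|\S*\h*)/',
        $input,
        -1, // no limit
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
    // preg_split returns false on error; normalise that to an empty array.
    return $parts === false ? [] : $parts;
}

$result = splitPipeTokens("|one |two |three this is a phrase |four");
var_dump($result);
```

Because the trailing whitespace is now part of the captured delimiter, the text between two delimiters is either empty (and dropped by PREG_SPLIT_NO_EMPTY) or a real phrase.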

Related

preg_match with multiple find

I have this code:
$a='-t40-';
preg_match('/^-t(.*?)-$/', $a,$match);
var_dump($match);
Result:
array(2) { [0]=> array(1) { [0]=> string(5) "-t40-" }
[1]=> array(1) { [0]=> string(2) "40" } }
If I add some text after the last "-", the pattern no longer matches.
For $a='-t40-some text'; I need a result similar to:
array(3) { [0]=> array(1) { [0]=> string(5) "-t40-" }
[1]=> array(1) { [0]=> string(2) "40" }
[2]=> array(1) { [0]=> string(9) "some text" }}
How do I edit the pattern to capture "some text"?
Thanks in advance.
$a='-t40-some text';
preg_match('/^-t(.*?)-(.*?)$/', $a,$match);
var_dump($match);
Output:
array(3) {
[0]=>
string(14) "-t40-some text"
[1]=>
string(2) "40"
[2]=>
string(9) "some text"
}
Explanation:
^ : beginning of line
-t : literally "-t"
(.*?) : group 1, 0 or more of any character except newline, not greedy
- : literally "-"
(.*?) : group 2, 0 or more of any character except newline, not greedy
$ : end of line
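The breakdown above can be sketched with named groups, so the captures are read by name rather than by index. The helper name parseTag and the group names num and text are assumptions made for illustration. The second group is written greedy here because, followed directly by $, it matches the same text as the lazy form.

```php
<?php
// Hypothetical helper: returns the two captured parts, or null if no match.
function parseTag(string $input): ?array
{
    if (preg_match('/^-t(?<num>.*?)-(?<text>.*)$/', $input, $m) !== 1) {
        return null;
    }
    return ['num' => $m['num'], 'text' => $m['text']];
}

$withText = parseTag('-t40-some text');
$withoutText = parseTag('-t40-');
$noMatch = parseTag('t40');
var_dump($withText, $withoutText, $noMatch);
```

The same pattern accepts both inputs from the question: when nothing follows the last "-", the second group is simply an empty string.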

Regular expression that matches letters from all languages - php

I've been trying for a few hours to find the right regular expression in PHP to match letters from any language while disallowing spaces.
I have tried this:
[^\p{L}]
This is OK, but it looks like it allows spaces.
Then I tried this:
[^\w_-]
and it still looks like it allows spaces.
Can anyone help with this, please?
You need to specify the Unicode modifier u to get Unicode character properties in PCRE.
For example...
$pattern = "/([\p{L}]+)/u";
$string = "你好,世界!Привет мир! !مرحبا بالعالم";
if (preg_match_all($pattern, $string, $match)) {
var_dump($match);
}
Gives us...
array(2) {
[0]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
[1]=>
array(6) {
[0]=>
string(6) "你好"
[1]=>
string(6) "世界"
[2]=>
string(12) "Привет"
[3]=>
string(6) "мир"
[4]=>
string(10) "مرحبا"
[5]=>
string(14) "بالعالم"
}
}
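If the goal is to validate that a string contains letters only, so that a space is rejected, one possible sketch anchors the letter class to the whole string. The helper name isLettersOnly is hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true only if every character is a Unicode letter.
function isLettersOnly(string $input): bool
{
    // \A and \z anchor to the absolute start and end of the subject;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/\A\p{L}+\z/u', $input) === 1;
}

var_dump(isLettersOnly('Привет'));     // letters only
var_dump(isLettersOnly('Привет мир')); // contains a space
var_dump(isLettersOnly('世界'));       // letters only
```

Using a positive class (\p{L}) with anchors, rather than a negated class, means anything that is not a letter, including whitespace, makes the match fail.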

Trying to merge two separate regular expressions into one

I need to extract a string that is enclosed by both parentheses and single quotes. Currently, I am using two regex patterns to do the job. With the first regex I retrieve the string from the parentheses, still wrapped in single quotes, and with the second regex I strip the single quotes from it. Now I would like to do this in a single step. For the past hour or so I have been experimenting with some patterns without any viable results; maybe it's due to my limited regex knowledge. Any feedback you can offer will be very helpful. I also welcome solutions that don't use regular expressions.
Here is an example string that needs to be parsed.
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";
# Strip parentheses
preg_match_all('#\((.*?)\)#', $string, $match);
# I am using the first match here
echo $match[1][0]; // output = 'minute stroller workout', 9.0
# Strip single quotes and extract the string
preg_match('~(["\'])([^"\']+)\1~', $match[1][0], $matches);
echo $matches[2]; // output = minute stroller workout (i.e. what we are looking for)
If I understand you correctly:
preg_match_all('/\(\'([\s\w]*)\', ([\d.]*)\)/', $string, $match);
Output for your string
array(3) {
[0]=>
array(7) {
[0]=>
string(32) "('minute stroller workout', 9.0)"
[1]=>
string(13) "('week', 1.0)"
[2]=>
string(16) "('leaving', 1.0)"
[3]=>
string(14) "('times', 1.0)"
[4]=>
string(14) "('guilt', 1.0)"
[5]=>
string(13) "('baby', 1.0)"
[6]=>
string(18) "('beginning', 1.0)"
}
[1]=>
array(7) {
[0]=>
string(23) "minute stroller workout"
[1]=>
string(4) "week"
[2]=>
string(7) "leaving"
[3]=>
string(5) "times"
[4]=>
string(5) "guilt"
[5]=>
string(4) "baby"
[6]=>
string(9) "beginning"
}
[2]=>
array(7) {
[0]=>
string(3) "9.0"
[1]=>
string(3) "1.0"
[2]=>
string(3) "1.0"
[3]=>
string(3) "1.0"
[4]=>
string(3) "1.0"
[5]=>
string(3) "1.0"
[6]=>
string(3) "1.0"
}
}
You can use this single regex:
preg_match("#\('([^']+)#", $string, $matches);
echo $matches[1];
//=> minute stroller workout
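To collect every quoted label rather than only the first one, the same single-pattern idea can be applied with preg_match_all. This is a sketch that assumes a label never contains a single quote; the variable name $labels is illustrative.

```php
<?php
$string = "[('minute stroller workout', 9.0), ('week', 1.0), ('leaving', 1.0), ('times', 1.0), ('guilt', 1.0), ('baby', 1.0), ('beginning', 1.0)]";

// Match an opening parenthesis and quote, then capture up to the closing quote.
preg_match_all("#\('([^']+)'#", $string, $matches);

// Group 1 holds the text between the quotes for every match.
$labels = $matches[1];
print_r($labels);
```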

How to parse colon-separated key-value text with possible multiline strings

I need to parse the following text:
First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value
The value is either a single-line or a multiline string, and it may itself contain a "key: blablabla" substring. Such a substring should be ignored (not parsed as a separate key-value pair).
Please help me with a regex or another algorithm.
Ideal result would be:
$regex = "/SOME REGEX/";
$matches = [];
preg_match_all($regex, $html, $matches);
// $matches holds all parsed key-value pairs, including multiline values
Thank you.
I tried simple regexes, but the result is incorrect because I don't know how to handle multiline values:
$regex = "/(.+?): (.+?)/";
$regex = "/(.+?):(.+?)\n/";
...
You can do it with this pattern:
$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';
preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);
print_r($matches);
You can sort of do it, as long as you consider a single word followed by a colon at the start of a line to be a new key start:
$data = 'First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value';
preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);
var_dump($matches);
This gives the following result:
array(4) {
[0]=>
array(4) {
[0]=>
string(10) "First: 1
"
[1]=>
string(11) "Second: 2
"
[2]=>
string(86) "Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(13) "Fourth: value"
}
[1]=>
array(4) {
[0]=>
string(5) "First"
[1]=>
string(6) "Second"
[2]=>
string(9) "Multiline"
[3]=>
string(6) "Fourth"
}
[2]=>
array(4) {
[0]=>
string(3) "1
"
[1]=>
string(3) "2
"
[2]=>
string(75) "blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(5) "value"
}
[3]=>
array(4) {
[0]=>
string(7) "Second:"
[1]=>
string(10) "Multiline:"
[2]=>
string(7) "Fourth:"
[3]=>
string(0) ""
}
}
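The parallel arrays above can be folded into a key => value map. The following sketch reuses the pattern from this answer; $pairs is an illustrative name, and trimming the trailing line break from each value is an assumption about the desired output.

```php
<?php
// Sample text from the question, joined with "\n" to avoid line-ending ambiguity.
$data = implode("\n", [
    'First: 1',
    'Second: 2',
    'Multiline: blablablabla',
    'bla2bla2bla2',
    'bla3b and key: value in the middle if strting',
    'Fourth: value',
]);

preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);

// Pair each key with its value, trimming the trailing line break.
$pairs = array_combine($matches[1], array_map('trim', $matches[2]));
print_r($pairs);
```

The "key: value" text inside the multiline value is not treated as a new pair because "key" does not sit at the start of a line.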

PHP preg_match get content between

<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need to get the content between <!--:lt--> and <!--:-->.
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
added delimiters
added a match group
added ? to make it ungreedy
added [space] (there is a space in Round lighting)
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
Which will match all of them. Gives:
array(3) {
[0]=>
array(3) {
[0]=>
string(37) "<!--:en-->Apvalus šviestuvas<!--:-->"
[1]=>
string(53) "<!--:ru-->Круглый Светильник<!--:-->"
[2]=>
string(32) "<!--:lt-->Round lighting<!--:-->"
}
[1]=>
array(3) {
[0]=>
string(2) "en"
[1]=>
string(2) "ru"
[2]=>
string(2) "lt"
}
[2]=>
array(3) {
[0]=>
string(19) "Apvalus šviestuvas"
[1]=>
string(35) "Круглый Светильник"
[2]=>
string(14) "Round lighting"
}
}
Use this pattern: (?<=<!--:lt-->)(.*)(?=<!--:-->)
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*)(?=<!--:-->)~', $string, $match);
var_dump($match);
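A greedy (.*) between lookarounds gives the expected text here only because the lt block is the last one in the string. As a hedged alternative, the sketch below uses a lazy capture so that any language code can be requested; the helper name extractTranslation is hypothetical.

```php
<?php
// Hypothetical helper: return the text stored under a language code, or null.
function extractTranslation(string $html, string $lang): ?string
{
    // preg_quote guards against regex metacharacters in $lang;
    // s lets . match newlines, u treats the subject as UTF-8.
    $pattern = '/<!--:' . preg_quote($lang, '/') . '-->(.*?)<!--:-->/su';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";

var_dump(extractTranslation($string, 'lt'));
var_dump(extractTranslation($string, 'ru'));
var_dump(extractTranslation($string, 'de')); // no such block
```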
