How to parse colon-separated key-value text with possible multiline values - PHP

I need to parse the following text:
First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value
The value is either a single-line or a multiline string, and it may also contain a "key: blablabla" substring. Such a substring should be ignored (not parsed as a separate key-value pair).
Please help me with a regex or another algorithm.
The ideal result would be:
$regex = "/SOME REGEX/";
$matches = [];
preg_match_all($regex, $html, $matches);
// $matches holds all parsed key-value pairs, including multiline values
Thank you.
I tried simple regexes, but the result is incorrect because I don't know how to handle multiline values:
$regex = "/(.+?): (.+?)/";
$regex = "/(.+?):(.+?)\n/";
...

You can do it with this pattern:
$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';
preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);
print_r($matches);
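A minimal runnable sketch of the pattern above on the question's sample text, assuming LF line endings. The `$pairs` array is an illustrative addition (not part of the answer) that folds the named `key`/`value` groups into an associative array:

```php
<?php
// Sample input from the question (LF line endings assumed).
$txt = "First: 1\nSecond: 2\nMultiline: blablablabla\nbla2bla2bla2\n"
     . "bla3b and key: value in the middle if strting\nFourth: value";

// The answer's pattern: a value runs lazily over whole lines until the next
// line starts with "word:" or the string ends.
$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';
preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

// Hypothetical post-processing: key => value map built from the named groups.
$pairs = [];
foreach ($matches as $m) {
    $pairs[$m['key']] = $m['value'];
}
print_r($pairs);
```

Because the lookahead only accepts a key at the start of a line, the "key: value" text inside the Multiline value stays part of that value.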

You can sort of do it, as long as you treat a single word followed by a colon at the start of a line as the start of a new key:
$data = 'First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value';
preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);
var_dump($matches);
This gives the following result:
array(4) {
[0]=>
array(4) {
[0]=>
string(10) "First: 1
"
[1]=>
string(11) "Second: 2
"
[2]=>
string(86) "Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(13) "Fourth: value"
}
[1]=>
array(4) {
[0]=>
string(5) "First"
[1]=>
string(6) "Second"
[2]=>
string(9) "Multiline"
[3]=>
string(6) "Fourth"
}
[2]=>
array(4) {
[0]=>
string(3) "1
"
[1]=>
string(3) "2
"
[2]=>
string(75) "blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(5) "value"
}
[3]=>
array(4) {
[0]=>
string(7) "Second:"
[1]=>
string(10) "Multiline:"
[2]=>
string(7) "Fourth:"
[3]=>
string(0) ""
}
}
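The captured values above still carry their trailing line breaks (the string lengths show the sample used CRLF line endings). A small sketch, assuming you want a plain key => value map, that reuses the same pattern and strips those breaks with `rtrim`; the `$pairs` name is illustrative only:

```php
<?php
// Same sample text, with the CRLF line endings implied by the dump above.
$data = "First: 1\r\nSecond: 2\r\nMultiline: blablablabla\r\nbla2bla2bla2\r\n"
      . "bla3b and key: value in the middle if strting\r\nFourth: value";

preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);

// $matches[1] holds the keys, $matches[2] the raw values with trailing "\r\n";
// rtrim() removes the trailing whitespace before pairing them up.
$pairs = array_combine($matches[1], array_map('rtrim', $matches[2]));
print_r($pairs);
```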

Related

preg_match with multiple find

I have this code:
$a='-t40-';
preg_match('/^-t(.*?)-$/', $a,$match);
var_dump($match);
Result:
array(2) { [0]=> string(5) "-t40-" [1]=> string(2) "40" }
If I add some text after the last "-", the pattern no longer matches.
For $a='-t40-some text'; I need a result similar to:
array(3) { [0]=> string(5) "-t40-" [1]=> string(2) "40" [2]=> string(9) "some text" }
How can I edit the pattern to also capture "some text"?
Thanks in advance.
$a='-t40-some text';
preg_match('/^-t(.*?)-(.*?)$/', $a,$match);
var_dump($match);
Output:
array(3) {
[0]=>
string(14) "-t40-some text"
[1]=>
string(2) "40"
[2]=>
string(9) "some text"
}
Explanation:
^ : start of the string
-t : literally "-t"
(.*?) : group 1, zero or more of any character except newline, non-greedy
- : literally "-"
(.*?) : group 2, zero or more of any character except newline, non-greedy
$ : end of the string
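A quick check, as a sketch, that the answer's pattern still handles the original input without trailing text: group 2 simply comes back empty. `parseTag()` is a hypothetical wrapper added only for illustration:

```php
<?php
// Hypothetical wrapper around the answer's pattern.
function parseTag(string $a): ?array
{
    if (preg_match('/^-t(.*?)-(.*?)$/', $a, $match) !== 1) {
        return null; // input is not of the form "-t...-..."
    }
    return [$match[1], $match[2]];
}

var_dump(parseTag('-t40-'));          // group 2 is an empty string
var_dump(parseTag('-t40-some text')); // group 2 is "some text"
```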

Combine PREG_SPLIT_DELIM_CAPTURE results

I'm splitting a string following this format:
| + anything goes here + single space
The following regular expression corresponds to said pattern:
/(\|\S*)/
Using preg_split with PREG_SPLIT_DELIM_CAPTURE returns each delimiter and the space after it as separate elements. Is there a flag or option to combine them?
$string = "|one |two |three this is a phrase |four";
$result = preg_split('/(\|\S*)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
What I get:
array(7) {
[0]=>
string(4) "|one"
[1]=>
string(1) " "
[2]=>
string(4) "|two"
[3]=>
string(1) " "
[4]=>
string(6) "|three"
[5]=>
string(18) " this is a phrase "
[6]=>
string(5) "|four"
}
What I want:
array(5) {
[0]=>
string(5) "|one "
[1]=>
string(5) "|two "
[2]=>
string(7) "|three "
[3]=>
string(17) "this is a phrase "
[4]=>
string(5) "|four"
}
Simply capture the trailing whitespace as part of the delimiter, with either of these patterns (\h matches horizontal whitespace only, \s any whitespace including newlines):
/(\|\S*\h*)/ or /(\|\S*\s*)/
So your code will be:
<?php
$string = "|one |two |three this is a phrase |four";
$result = preg_split('/(\|\S*\s*)/', $string, -1, PREG_SPLIT_NO_EMPTY |
PREG_SPLIT_DELIM_CAPTURE);
var_dump ($result);
Regex 101: https://regex101.com/r/m5M7Dv/1
Result
array(5) { [0]=> string(5) "|one " [1]=> string(5) "|two " [2]=> string(7) "|three " [3]=> string(17) "this is a phrase " [4]=> string(5) "|four" }
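An alternative sketch (not from the answer): instead of splitting, match the tokens directly with `preg_match_all`. The first alternative takes a "|word" plus trailing horizontal whitespace, the second takes any run of free text up to the next "|":

```php
<?php
$string = "|one |two |three this is a phrase |four";

// \|\S*\h*  -> "|word" followed by optional horizontal whitespace
// [^|]+     -> free text between delimiters
preg_match_all('/\|\S*\h*|[^|]+/', $string, $m);
var_dump($m[0]);
```

This assumes a "|word" token never contains another "|"; since `\S*` also matches "|", input such as "|a|b" would be taken as one token.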

PHP preg_match get content between

<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need to get the content between <!--:lt--> and <!--:-->
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
added delimiters
added a match group
added ? to make it ungreedy
added [space] (there is a space in Round lighting)
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
Which will match all of them. Gives:
array(3) { [0]=> array(3) { [0]=> string(37) "<!--:en-->Apvalus šviestuvas<!--:-->" [1]=> string(53) "<!--:ru-->Круглый Светильник<!--:-->" [2]=> string(32) "<!--:lt-->Round lighting<!--:-->" } [1]=> array(3) { [0]=> string(2) "en" [1]=> string(2) "ru" [2]=> string(2) "lt" } [2]=> array(3) { [0]=> string(19) "Apvalus šviestuvas" [1]=> string(35) "Круглый Светильник" [2]=> string(14) "Round lighting" } }
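Use this pattern: (?<=<!--:lt-->)(.*)(?=<!--:-->). Because (.*) is greedy, it runs up to the last <!--:--> in the string; that works here because the lt block comes last, but use (.*?) if other blocks may follow it.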
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*)(?=<!--:-->)~', $string, $match);
var_dump($match);
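Building on the `preg_match_all` variation, a sketch that turns all blocks into a language => text map with `array_combine`. It uses a lazy `(.*?)` with the `s` modifier instead of `[^<]+`, an assumption that lets the text itself contain "<" or newlines; `$byLang` is an illustrative name:

```php
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";

// Group 1 = language code, group 2 = text up to the nearest closing <!--:-->.
preg_match_all('/<!--:([a-z]+)-->(.*?)<!--:-->/s', $string, $match);
$byLang = array_combine($match[1], $match[2]);

echo $byLang['lt']; // Round lighting
```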

PHP - REGEX TO ARRAY like MP3TAG

I would like to ask how to convert a string to an array using
a string pattern the way mp3tag does, e.g.
%ALBUM% - %SOMETHING% - %SOMETHING%,
where the ' - ' separators are custom characters, not fixed ones.
In other words, I want to turn a custom string into an array,
but the pattern itself is user-defined, not static.
Is this possible in PHP, and if so, how?
$str = "%ALBUM% & %SOMETHING% (ノ゜-゜)ノ ︵ ┬──┬ %SOMETHING%,";
preg_match_all("/%([a-z]+)%/i", $str, $matches);
var_dump($matches);
Outputs
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "%ALBUM%"
[1]=>
string(11) "%SOMETHING%"
[2]=>
string(11) "%SOMETHING%"
}
[1]=>
array(3) {
[0]=>
string(5) "ALBUM"
[1]=>
string(9) "SOMETHING"
[2]=>
string(9) "SOMETHING"
}
}
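The answer above only extracts the placeholder names. To actually apply such a user-defined format to a string, one possible sketch (an assumption, not the answer's method) converts the format into a regex: literal separators are escaped with `preg_quote`, and each %NAME% becomes a lazy `(.+?)` group. `parseWithFormat()` is a hypothetical helper, and if a placeholder name repeats, the later value overwrites the earlier one in this sketch:

```php
<?php
// Hypothetical helper: apply an mp3tag-style format to an input string.
function parseWithFormat(string $format, string $input): ?array
{
    // Split into literals (even indexes) and placeholder names (odd indexes).
    // No PREG_SPLIT_NO_EMPTY, so the even/odd alternation is preserved.
    $parts = preg_split('/%([a-z]+)%/i', $format, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    $names = [];
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $names[] = $part;
            $regex .= '(.+?)';                 // lazy: stop at the next separator
        } else {
            $regex .= preg_quote($part, '~');  // escape literal separator text
        }
    }

    if (preg_match('~^' . $regex . '$~u', $input, $m) !== 1) {
        return null; // input does not fit the format
    }

    $result = [];
    foreach ($names as $k => $name) {
        $result[$name] = $m[$k + 1];
    }
    return $result;
}

print_r(parseWithFormat('%ARTIST% - %ALBUM% - %TITLE%', 'Foo Bar - Baz - Qux'));
```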

preg_match returns identical elements only once

I am going through a string and processing all the elements between !-- and --!. But only unique elements are processed: when I have !--example--! and, a bit further in the text, !--example--! again, the second one is ignored.
This is the code:
while ($do = preg_match("/!--(.*?)--!/", $formtext, $matches)){
I know about preg_match_all, but need to do this with preg_match.
Any help? Thanks in advance!
You'll want PHP to look for matches only after the previous match. For that, you'll need to capture string offsets using the PREG_OFFSET_CAPTURE flag.
Example:
$offset = 0;
while (preg_match("/!--(.*?)--!/", $formtext, $match, PREG_OFFSET_CAPTURE, $offset))
{
// calculate next offset
$offset = $match[0][1] + strlen($match[0][0]);
// the text captured by the parentheses is accessed like this:
$paren = $match[1][0];
}
See the preg_match documentation for more info.
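The offset loop can be wrapped into a small function to see that repeated markers are all visited. `collectMarkers()` is a hypothetical name; the sketch just gathers each captured string:

```php
<?php
// Collect every !--...--! occurrence (duplicates included) using preg_match()
// and advancing $offset past each match.
function collectMarkers(string $formtext): array
{
    $found = [];
    $offset = 0;
    while (preg_match('/!--(.*?)--!/', $formtext, $match, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $found[] = $match[1][0];                       // captured inner text
        $offset = $match[0][1] + strlen($match[0][0]); // resume after this match
    }
    return $found;
}

print_r(collectMarkers('!--example--! asdasd !--example--!'));
```

The pattern can never match an empty string, so the offset always moves forward and the loop terminates.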
Use preg_match_all
Edit: after some clarification:
$string = '!--example--! asdasd !--example--!';
//either this:
$array = preg_split("/!--(.*?)--!/",$string,-1,PREG_SPLIT_DELIM_CAPTURE);
var_dump($array);
array(5) {
[0]=>
string(0) ""
[1]=>
string(7) "example"
[2]=>
string(10) " asdasd "
[3]=>
string(7) "example"
[4]=>
string(0) ""
}
//or this:
$array = preg_split("/(!--(.*?)--!)/",$string,-1,PREG_SPLIT_DELIM_CAPTURE);
var_dump($array);
array(7) {
[0]=>
string(0) ""
[1]=>
string(13) "!--example--!"
[2]=>
string(7) "example"
[3]=>
string(10) " asdasd "
[4]=>
string(13) "!--example--!"
[5]=>
string(7) "example"
[6]=>
string(0) ""
}
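For comparison, a sketch of the plain `preg_match_all` suggestion: it reports every occurrence, duplicates included, and returns the number of full matches:

```php
<?php
$string = '!--example--! asdasd !--example--!';
$count = preg_match_all('/!--(.*?)--!/', $string, $matches);

echo $count, "\n";    // 2
print_r($matches[1]); // both "example" captures
```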
while ($do = preg_match("/[!--(.*?)--!]*/", $formtext, $matches)){
Specify the * at the end of the pattern to specify more than one. They should both get added to your $matches array.
